# TinyLlama-1.1B-Katakana-Lyrics-Liaison
This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 using LoRA. It is specifically designed to convert English lyrics and phrases into Phonetic Katakana, prioritizing real-world pronunciation, linking (liaison), and rhythm over literal dictionary spelling.
## Concept: "The Training Wheels for English Rhythm"
As the creator, I generally believe that English should be learned without Katakana. However, for children and beginners, the "fear of the written word" often stops them from speaking entirely.
This model was built to provide "Supportive Katakana": not just a translation, but a phonetic guide that helps learners mimic the actual rhythm and flow of native speakers, serving as temporary "training wheels" until they are ready to rely solely on their ears.
## ✨ Key Features
- **Liaison & Linking:** Handles word connections naturally (e.g., `hold your` → `ホージョ`, `take it` → `テイキッ`).
- **Silent Letters:** Trained to ignore silent consonants (e.g., `honest` → `オネス`, `hour` → `アワー`).
- **Lyric-focused Reductions:** Strong support for informal contractions like `gonna`, `wanna`, and `gotta`.
- **Complex Phonetics:** Specifically trained to handle difficult phonetic mappings like `Scarborough Fair` → `スカーバラフェア`.
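For intuition, the unit the model maps is the connected phrase, not the individual word. A toy lookup-table sketch (pairs taken from the examples above; the actual model generalizes far beyond a fixed table):

```python
# Toy phrase-level mapping illustrating liaison: "hold your" is one
# phonetic unit, not two dictionary entries (pairs from this card).
liaison_examples = {
    "hold your": "ホージョ",
    "take it": "テイキッ",
    "honest": "オネス",
    "hour": "アワー",
    "scarborough fair": "スカーバラフェア",
}

def to_katakana(phrase: str) -> str:
    # Naive lookup sketch; the fine-tuned model handles unseen phrases.
    return liaison_examples.get(phrase.lower(), "(unknown)")
```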
## Comparison Examples
| English Phrase | Dictionary-style (Standard) | This Model (Phonetic) |
|---|---|---|
| I wanna hold your hand | アイ ウォナ ホールド ユア ハンド | アイワナホージョハン |
| I gotta be honest with you | アイ ガッタ ビー オネスト ... | アイガラビーオネスウィズユー |
| Scarborough Fair | スカーバラ フェア | スカーバラフェア |
| Take it anymore | テイク イット エニモア | テイキッエニモー |
## How to Use
To get the best results, use the following prompt format:
```
以下の英文を、聞こえたままのカタカナに変換して。
英語: [Your English Phrase Here]
カタカナ:
```

(In English: "Convert the following English text into katakana exactly as it sounds. / English: ... / Katakana:")
### Example Code (Python / Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
lora_model_path = "YOUR_USERNAME/TinyLlama-1.1B-Katakana-Lyrics-Liaison"

# Load the base model, then apply the LoRA adapter on top of it
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(model, lora_model_path)

# Few-shot prompt: the instruction plus two worked examples,
# ending with the new phrase to convert
prompt = (
    "英語を歌えるように、音のつながり（リエゾン）を考慮してカタカナに変換してください。\n\n"
    "英語: take it easy\nカタカナ: テイキッイージー\n\n"
    "英語: I wanna hold you\nカタカナ: アイワナホージュー\n\n"
    "英語: I love the way you lie\nカタカナ:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
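Note that the decoded output repeats the whole prompt before the new completion. A small post-processing helper (hypothetical, not part of this model card) can keep only the first line after the final カタカナ: marker:

```python
def extract_katakana(generated: str, marker: str = "カタカナ:") -> str:
    """Keep only the first line after the last prompt marker (hypothetical helper)."""
    tail = generated.rsplit(marker, 1)[-1].strip()
    return tail.splitlines()[0].strip() if tail else tail

# Structurally typical decoded output (example string, not real model output):
text = "英語: take it easy\nカタカナ: テイキッイージー\n\n英語: extra"
print(extract_katakana(text))  # テイキッイージー
```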
## Training Details
- Dataset: 1,200+ samples of custom-curated phonetic pairs.
- Methodology: Developed using a "human-in-the-loop" approach, focusing on capturing real-world auditory experiences rather than robotic dictionary rules.
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Base Model: TinyLlama-1.1B-Chat-v1.0
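For context, LoRA keeps the base weights frozen and learns only a small low-rank update on top of them. A minimal PyTorch sketch of the idea (toy dimensions; illustrative only, not the actual training code for this adapter):

```python
import torch

torch.manual_seed(0)
d, r, alpha = 8, 2, 16        # toy weight size, adapter rank, scaling (assumed values)

W = torch.randn(d, d)         # frozen base weight
A = torch.randn(r, d)         # trainable "down" projection
B = torch.zeros(d, r)         # trainable "up" projection, zero-initialized

# Effective weight: base plus scaled low-rank update. With B at zero,
# the adapter starts as a no-op; training only touches the small A/B matrices.
W_eff = W + (alpha / r) * (B @ A)
assert torch.allclose(W_eff, W)
```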
## ⚠️ Limitations
- Model Size: As a 1.1B model, it may occasionally hallucinate or misinterpret extremely long or rare technical terms.
- Dialect: Primarily targets General American/Standard English pronunciation as heard in global pop music.
## License
This model is licensed under the Apache 2.0 License, consistent with the base TinyLlama model.