# TinyLlama-1.1B-Katakana-Lyrics-Liaison
This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 using LoRA. It is specifically designed to convert English lyrics and phrases into Phonetic Katakana, prioritizing real-world pronunciation, linking (liaison), and rhythm over literal dictionary spelling.
## Concept: "The Training Wheels for English Rhythm"
As the creator, I generally believe that English should be learned without Katakana. However, for children and beginners, the "fear of the written word" often stops them from speaking entirely.
This model was built to provide "Supportive Katakana": not just a translation, but a phonetic guide that helps learners mimic the actual rhythm and flow of native speakers, serving as temporary "training wheels" until they are ready to rely solely on their ears.
## ✨ Key Features
- **Liaison & Linking:** Handles word connections naturally (e.g., `hold your` → `ホージョ`, `take it` → `テイキッ`).
- **Silent Letters:** Trained to ignore silent consonants (e.g., `honest` → `オネス`, `hour` → `アワー`).
- **Lyric-focused Reductions:** Strong support for informal contractions like `gonna`, `wanna`, and `gotta`.
- **Complex Phonetics:** Specifically trained to handle difficult phonetic mappings like `Scarborough Fair` → `スカーバラフェア`.
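For intuition, the unit the model maps is the connected phrase, not the individual word. A toy lookup-table sketch (pairs taken from the examples above; the actual model generalizes far beyond a fixed table):

```python
# Toy phrase-level mapping illustrating liaison: "hold your" is one
# phonetic unit, not two dictionary entries (pairs from this card).
liaison_examples = {
    "hold your": "ホージョ",
    "take it": "テイキッ",
    "honest": "オネス",
    "hour": "アワー",
    "scarborough fair": "スカーバラフェア",
}

def to_katakana(phrase: str) -> str:
    # Naive lookup sketch; the fine-tuned model handles unseen phrases.
    return liaison_examples.get(phrase.lower(), "(unknown)")
```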
## Comparison Examples
| English Phrase | Dictionary-style (Standard) | This Model (Phonetic) |
|---|---|---|
| I wanna hold your hand | アイ ウォナ ホールド ユア ハンド | アイワナホージョハン |
| I gotta be honest with you | アイ ガッタ ビー オネスト ... | アイガラビーオネスウィズユー |
| Scarborough Fair | スカーバラ フェア | スカーバラフェア |
| Take it anymore | テイク イット エニモア | テイキッエニモー |
## How to Use
To get the best results, use the following prompt format:
```
以下の英文を、聞こえたままのカタカナに変換して。
英語: [Your English Phrase Here]
カタカナ:
```

(In English: "Convert the following English text into katakana exactly as it sounds. / English: ... / Katakana:")
### Example Code (Python / Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
lora_model_path = "YOUR_USERNAME/TinyLlama-1.1B-Katakana-Lyrics-Liaison"

# Load the base model, then apply the LoRA adapter on top of it
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(model, lora_model_path)

# Few-shot prompt: the instruction plus two worked examples,
# ending with the new phrase to convert
prompt = (
    "英語を歌えるように、音のつながり（リエゾン）を考慮してカタカナに変換してください。\n\n"
    "英語: take it easy\nカタカナ: テイキッイージー\n\n"
    "英語: I wanna hold you\nカタカナ: アイワナホージュー\n\n"
    "英語: I love the way you lie\nカタカナ:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
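Note that the decoded output repeats the whole prompt before the new completion. A small post-processing helper (hypothetical, not part of this model card) can keep only the first line after the final カタカナ: marker:

```python
def extract_katakana(generated: str, marker: str = "カタカナ:") -> str:
    """Keep only the first line after the last prompt marker (hypothetical helper)."""
    tail = generated.rsplit(marker, 1)[-1].strip()
    return tail.splitlines()[0].strip() if tail else tail

# Structurally typical decoded output (example string, not real model output):
text = "英語: take it easy\nカタカナ: テイキッイージー\n\n英語: extra"
print(extract_katakana(text))  # テイキッイージー
```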
## Training Details
- Dataset: 1,200+ samples of custom-curated phonetic pairs.
- Methodology: Developed using a "human-in-the-loop" approach, focusing on capturing real-world auditory experiences rather than robotic dictionary rules.
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Base Model: TinyLlama-1.1B-Chat-v1.0
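For context, LoRA keeps the base weights frozen and learns only a small low-rank update on top of them. A minimal PyTorch sketch of the idea (toy dimensions; illustrative only, not the actual training code for this adapter):

```python
import torch

torch.manual_seed(0)
d, r, alpha = 8, 2, 16        # toy weight size, adapter rank, scaling (assumed values)

W = torch.randn(d, d)         # frozen base weight
A = torch.randn(r, d)         # trainable "down" projection
B = torch.zeros(d, r)         # trainable "up" projection, zero-initialized

# Effective weight: base plus scaled low-rank update. With B at zero,
# the adapter starts as a no-op; training only touches the small A/B matrices.
W_eff = W + (alpha / r) * (B @ A)
assert torch.allclose(W_eff, W)
```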
## ⚠️ Limitations
- Model Size: As a 1.1B model, it may occasionally hallucinate or misinterpret extremely long or rare technical terms.
- Dialect: Primarily targets General American/Standard English pronunciation as heard in global pop music.
## License
This model is licensed under the Apache 2.0 License, consistent with the base TinyLlama model.