TinyLlama-1.1B-Katakana-Lyrics-Liaison

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 using LoRA. It is specifically designed to convert English lyrics and phrases into phonetic katakana, prioritizing real-world pronunciation, linking (liaison), and rhythm over literal dictionary spellings.

🌟 Concept: "The Training Wheels for English Rhythm"

As the creator, I generally believe that English should be learned without Katakana. However, for children and beginners, the "fear of the written word" often stops them from speaking entirely.

This model was built to provide "Supportive Katakana": not just a transliteration, but a phonetic guide that helps learners mimic the actual rhythm and flow of native speakers, serving as temporary "training wheels" until they are ready to rely solely on their ears.

✨ Key Features

  • Liaison & Linking: Handles word connections naturally (e.g., hold your → ホージョ, take it → テイキッ).
  • Silent Letters: Trained to ignore silent consonants (e.g., honest → オネス, hour → アワー).
  • Lyric-focused Reductions: Strong support for informal contractions like gonna, wanna, and gotta.
  • Complex Phonetics: Specifically trained to handle difficult phonetic mappings like Scarborough Fair → スカーブラフェア.
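The example mappings above can double as a tiny spot-check set when experimenting with prompts or decoding settings. The `EXPECTED_MAPPINGS` and `score` names below are hypothetical (not part of the model's release); only the pairs themselves come from this card:

```python
# Spot-check pairs taken from the feature list above.
EXPECTED_MAPPINGS = {
    "hold your": "ホージョ",               # liaison
    "take it": "テイキッ",                 # liaison
    "honest": "オネス",                    # silent 'h'
    "hour": "アワー",                      # silent 'h'
    "Scarborough Fair": "スカーブラフェア",  # complex phonetics
}

def score(convert):
    """Fraction of spot-check pairs that a `convert` function reproduces exactly."""
    hits = sum(convert(en) == kana for en, kana in EXPECTED_MAPPINGS.items())
    return hits / len(EXPECTED_MAPPINGS)
```

Passing the model's end-to-end conversion function as `convert` gives a quick exact-match rate over these five pairs.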

📊 Comparison Examples

| English Phrase | Dictionary-style (Standard) | This Model (Phonetic) |
| --- | --- | --- |
| I wanna hold your hand | アイ ウォナ ホールド ユア ハンド | アイワナホージョハン |
| I gotta be honest with you | アイ ガッタ ビー オネスト ... | アイガラビーオネスウィズユー |
| Scarborough Fair | スカーバラ フェア | スカーブラフェア |
| Take it anymore | テイク イット エニモア | テイキッエニモー |

🚀 How to Use

To get the best results, use the following prompt format:

ไปฅไธ‹ใฎ่‹ฑๆ–‡ใ‚’ใ€่žใ“ใˆใŸใพใพใฎใ‚ซใ‚ฟใ‚ซใƒŠใซๅค‰ๆ›ใ—ใฆใ€‚

่‹ฑ่ชž: [Your English Phrase Here]
ใ‚ซใ‚ฟใ‚ซใƒŠ:
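A minimal sketch of filling that template programmatically; the `build_prompt` helper is an illustrative name, not part of the model's API:

```python
# The recommended prompt format from the card, with a placeholder for the phrase.
PROMPT_TEMPLATE = (
    "以下の英文を、聞こえたままのカタカナに変換して。\n\n"
    "英語: {phrase}\n"
    "カタカナ:"
)

def build_prompt(phrase: str) -> str:
    """Insert an English phrase into the recommended prompt template."""
    return PROMPT_TEMPLATE.format(phrase=phrase)
```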

Example Code (Python / Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
lora_model_path = "YOUR_USERNAME/TinyLlama-1.1B-Katakana-Lyrics-Liaison"

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(model, lora_model_path)

# Few-shot prompt: instruction, two solved examples, then the phrase to convert.
prompt = (
    "英語を歌いやすいように、音のつながり(リエゾン)を考慮してカタカナに変換してください。\n\n"
    "英語: take it easy\nカタカナ: テイキッイージー\n\n"
    "英語: I wanna hold you\nカタカナ: アイワナホージュー\n\n"
    "英語: I love the way you lie\nカタカナ:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
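Because the decoded output contains the prompt itself, in practice you may want to slice out only the newly generated reading. One possible post-processing sketch (the `extract_katakana` helper is hypothetical, not part of the model's release):

```python
def extract_katakana(generated: str) -> str:
    """Return the text after the last 'カタカナ:' marker, stopping at the
    first line break in case the model keeps generating further examples."""
    tail = generated.rsplit("カタカナ:", 1)[-1]
    return tail.strip().split("\n")[0].strip()
```

For example, applied to a decoded string ending in `...カタカナ: テイキッイージー\n\n英語: ...`, it returns just the katakana reading.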

🛠 Training Details

  • Dataset: 1,200+ custom-curated English-to-katakana phonetic pairs.
  • Methodology: Developed using a "human-in-the-loop" approach, focusing on capturing real-world auditory experiences rather than robotic dictionary rules.
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Base Model: TinyLlama-1.1B-Chat-v1.0
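For reference, each curated pair would presumably be serialized into a single training string in the same shape as the inference prompt. The card does not publish the dataset schema, so the field names and record format below are assumptions, a sketch only:

```python
def to_training_text(english: str, katakana: str) -> str:
    """Assumed serialization: instruction + English phrase + target katakana,
    mirroring the recommended inference prompt format."""
    return (
        "以下の英文を、聞こえたままのカタカナに変換して。\n\n"
        f"英語: {english}\n"
        f"カタカナ: {katakana}"
    )

sample = to_training_text("take it anymore", "テイキッエニモー")
```

Keeping the training and inference formats identical is what lets the short prompt in the example code elicit the phonetic behavior.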

โš ๏ธ Limitations

  • Model Size: As a 1.1B-parameter model, it may occasionally hallucinate or misinterpret very long inputs or rare technical terms.
  • Dialect: Primarily targets General American/Standard English pronunciation as heard in global pop music.

📜 License

This model is licensed under the Apache 2.0 License, consistent with the base TinyLlama model.


