Recommended length: 5–20 seconds. Avoid long silences at the start or end of the audio, since they can cause the model to get stuck in a loop and produce only silence.
Automatically trim silence from the beginning and end of input audio (for more stability)
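As a rough illustration of what trimming leading and trailing silence can look like, here is a minimal sketch using an amplitude threshold. It is not the demo's actual implementation: the function names `trim_silence` and `duration_ok`, the threshold of 0.01, and the assumption that the audio is a mono float array scaled to [-1, 1] are all hypothetical choices made for this example.

```python
import numpy as np


def trim_silence(samples: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Drop leading and trailing samples whose magnitude is at or below threshold.

    Assumes mono float audio scaled to [-1, 1]; the threshold is an
    illustrative value, not one taken from the demo.
    """
    loud = np.flatnonzero(np.abs(samples) > threshold)
    if loud.size == 0:
        # Everything is below the threshold: return an empty clip.
        return samples[:0]
    return samples[loud[0] : loud[-1] + 1]


def duration_ok(samples: np.ndarray, sample_rate: int,
                min_s: float = 5.0, max_s: float = 20.0) -> bool:
    """Check the clip against the recommended 5-20 second range."""
    seconds = len(samples) / sample_rate
    return min_s <= seconds <= max_s
```

Silence in the middle of the clip is left untouched, since only the first and last above-threshold samples define the cut points.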
Generation Parameters (100 tokens ≈ 1 second)
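The stated rate of roughly 100 tokens per second makes it easy to translate between a token budget and an approximate audio duration. The helper names below are hypothetical and the conversion is only as accurate as that approximate rate.

```python
# Approximate rate stated in the section heading: 100 tokens per second of audio.
TOKENS_PER_SECOND = 100


def seconds_to_tokens(seconds: float) -> int:
    """Approximate token budget needed for a given audio duration."""
    return round(seconds * TOKENS_PER_SECOND)


def tokens_to_seconds(tokens: int) -> float:
    """Approximate audio duration produced by a given token budget."""
    return tokens / TOKENS_PER_SECOND
```

For example, under this rate a 10-second clip corresponds to about 1000 tokens.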
Output
We thank Marin and OpenAthena for enabling this project with open-development LLM and training infrastructure.