SODA: Scaling Open Discrete Audio

Continue speech from an audio prompt!

Input (Audio)

Input Audio

Recommended length: 5–20 seconds. Avoid long silences at the start or end of the audio; they can cause the model to get stuck in a loop and produce only silence.

Trim Silence

Automatically trim silence from the beginning and end of the input audio (improves stability)
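To illustrate what trimming silence from both ends can look like, here is a minimal sketch. It is not the demo's actual implementation: the function name `trim_silence`, the amplitude threshold of 0.01, and the plain-list sample format are all assumptions made for illustration.

```python
def trim_silence(samples, threshold=0.01):
    """Return samples with leading and trailing quiet samples removed.

    `samples` is assumed to be a list of floats in [-1.0, 1.0];
    `threshold` is a hypothetical amplitude cutoff, not a value from the demo.
    """
    # Indices of every sample loud enough to count as non-silent.
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        # The whole clip is below the threshold.
        return []
    # Keep everything between the first and last loud sample, inclusive.
    # Quiet samples in the middle of the clip are preserved.
    return samples[loud[0]: loud[-1] + 1]
```

Only the edges are trimmed, so pauses inside the speech are left untouched.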

Generation Parameters (100 tokens ≈ 1 second)
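The rule of thumb above (100 tokens ≈ 1 second) can be turned into a small conversion helper. This is a sketch under that stated approximation; the names `TOKENS_PER_SECOND`, `tokens_to_seconds`, and `seconds_to_tokens` are hypothetical.

```python
TOKENS_PER_SECOND = 100  # approximate rate stated in the interface

def tokens_to_seconds(n_tokens):
    """Approximate audio duration, in seconds, for a token count."""
    return n_tokens / TOKENS_PER_SECOND

def seconds_to_tokens(seconds):
    """Approximate token budget for a desired duration in seconds."""
    return round(seconds * TOKENS_PER_SECOND)
```

Under this approximation, a Max New Tokens value of 3000 corresponds to roughly 30 seconds of generated audio, and 100 to roughly 1 second.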

Temperature

Range: 0.1 to 2

Top-p

Range: 0.1 to 1
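For readers unfamiliar with the Temperature and Top-p controls, the following sketch shows the standard way these two parameters shape sampling from a list of logits. It is a generic illustration of temperature scaling and nucleus (top-p) sampling, not the demo's own code; the function name `sample_top_p` and its defaults are assumptions.

```python
import math
import random

def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=None):
    """Sample one token index using temperature scaling and top-p filtering."""
    rng = rng or random.Random()
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Sample among the kept tokens in proportion to their probabilities.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Low values of either parameter make the output more deterministic; values near the top of each range allow more varied continuations.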

Max New Tokens

Range: 100 to 3000

Min New Tokens

Range: 0 to 1000

Random Seed

Suppress Tokens (comma-separated token IDs)

Optional: enter the IDs of tokens that should never be generated. <|text_end|>=128257, <|audio_end|>=128259
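As an illustration of how a comma-separated list of token IDs might be applied, the sketch below parses the field and masks the corresponding logits so those tokens can never be sampled. The helper names `parse_token_ids` and `suppress` are hypothetical, and this is one common approach rather than a description of how the demo handles the field.

```python
def parse_token_ids(text):
    """Parse a string such as "128257, 128259" into a list of ints."""
    # Empty pieces (for example from a trailing comma) are skipped.
    return [int(part) for part in text.split(",") if part.strip()]

def suppress(logits, token_ids):
    """Return a copy of logits with the given token IDs set to -inf."""
    banned = set(token_ids)
    # A logit of -inf gives the token zero probability after softmax.
    return [float("-inf") if i in banned else l for i, l in enumerate(logits)]
```

For instance, `parse_token_ids("128257, 128259")` yields the two IDs listed above, which could then be passed to `suppress` at each generation step.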

Output

Continued Speech

Status

We thank Marin and OpenAthena for enabling this project with open-development LLM and training infrastructure.