Estonian Audio-to-Phoneme Model

This is a Wav2Vec2 model fine-tuned for Estonian phoneme recognition. It converts Estonian speech audio directly into phoneme sequences.

Model Description

  • Language: Estonian (et-EE)
  • Task: Audio-to-Phoneme Conversion
  • Base Model: facebook/wav2vec2-lv-60-espeak-cv-ft
  • Training Data: Estonian speech corpus with phoneme annotations

Usage

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import librosa
import torch

# Load model and processor
processor = Wav2Vec2Processor.from_pretrained("vocametrix/estonian-audio-to-phoneme")
model = Wav2Vec2ForCTC.from_pretrained("vocametrix/estonian-audio-to-phoneme")

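# Load audio as mono, resampled to the 16 kHz rate the model expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform into model input values
input_values = processor(audio, sampling_rate=sr, return_tensors="pt").input_values

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(input_values).logits

# Greedy decoding: take the most likely token per frame, then decode to phonemes
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])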

print(f"Detected phonemes: {transcription}")
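
The argmax-then-decode step in the snippet above is greedy CTC decoding. As a rough illustration of what happens between the per-frame ids and the final phoneme sequence, the sketch below collapses repeated ids and drops the blank token. The toy vocabulary and `blank_id=0` are assumptions made for this example only; the real mapping ships with the model's tokenizer, where the padding token commonly plays the role of the CTC blank.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, id_to_token, blank_id=0):
    """Turn per-frame argmax ids into a token sequence (greedy CTC)."""
    # Step 1: merge runs of identical consecutive ids into one id.
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    # Step 2: drop the blank id, then map the remaining ids to tokens.
    return [id_to_token[idx] for idx in collapsed if idx != blank_id]


# Hypothetical toy vocabulary, not the model's actual one.
toy_vocab = {0: "<pad>", 1: "t", 2: "e", 3: "r", 4: "eː"}

# One id per audio frame, as torch.argmax(logits, dim=-1) would produce.
frames = [0, 1, 1, 0, 2, 2, 2, 3, 0, 2, 0, 0]
decoded = ctc_greedy_collapse(frames, toy_vocab)
print(decoded)
```

A blank between two identical ids keeps them as separate tokens, which is how CTC represents a genuinely repeated symbol.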

Estonian Phoneme Set

The model recognizes the following Estonian phonemes:

  • Consonants: p, b, t, d, k, g, f, s, ʃ, h, v, j, m, n, ŋ, r, l
  • Long consonants: pː, bː, tː, dː, kː, gː, fː, sː, ʃː, hː, vː, jː, mː, nː, ŋː, rː, lː
  • Vowels: i, y, u, e, ø, ɤ, o, æ, ɑ
  • Long vowels: iː, yː, uː, eː, øː, ɤː, oː, æː, ɑː
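
For downstream processing it can help to split a transcription into individual phonemes and check them against the inventory listed above. The following is a minimal sketch, assuming each short phoneme is a single Unicode character and length is marked with a trailing "ː". The helper names `split_phonemes` and `unknown_symbols` are hypothetical, and the input string is illustrative rather than actual model output.

```python
CONSONANTS = ["p", "b", "t", "d", "k", "g", "f", "s", "ʃ",
              "h", "v", "j", "m", "n", "ŋ", "r", "l"]
VOWELS = ["i", "y", "u", "e", "ø", "ɤ", "o", "æ", "ɑ"]
LENGTH_MARK = "ː"

SHORT = set(CONSONANTS) | set(VOWELS)
# Every short phoneme also has a long counterpart in the list above.
INVENTORY = SHORT | {p + LENGTH_MARK for p in SHORT}


def split_phonemes(text):
    """Split a phoneme string into tokens, attaching 'ː' to the previous symbol."""
    tokens = []
    for ch in text:
        if ch.isspace():
            continue  # separators carry no phoneme
        if ch == LENGTH_MARK and tokens and not tokens[-1].endswith(LENGTH_MARK):
            tokens[-1] += LENGTH_MARK
        else:
            tokens.append(ch)
    return tokens


def unknown_symbols(tokens):
    """Return the tokens that are not part of the listed inventory."""
    return [t for t in tokens if t not in INVENTORY]


tokens = split_phonemes("s ɑː r")
print(tokens, unknown_symbols(tokens))
```

The same function handles strings with or without spaces between phonemes, since whitespace is skipped and the length mark is always attached to the symbol before it.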

Model Performance

The model was trained on Estonian speech data with phoneme-level annotations.
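
To measure recognition quality on your own annotated data, a common metric for this task is the phoneme error rate (PER): the total edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The sketch below is one way to compute it, not the evaluation procedure used for this model. The function names and the example sequences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER: total edits divided by total reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_ref = sum(len(r) for r in references)
    if total_ref == 0:
        raise ValueError("references contain no phonemes")
    return total_edits / total_ref


# Made-up example: the second hypothesis misses one vowel length.
refs = [["t", "e", "r", "e"], ["s", "ɑː", "r"]]
hyps = [["t", "e", "r", "e"], ["s", "ɑ", "r"]]
per = phoneme_error_rate(refs, hyps)
print(f"PER: {per:.3f}")
```

Treating a long phoneme such as "ɑː" as a single token means a length error counts as one substitution, which matches the way the phoneme set above lists short and long variants separately.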

Citation

If you use this model, please cite:

@misc{estonian-audio-to-phoneme,
  author = {Patrick Marmaroli},
  company = {Vocametrix},
  title = {Estonian Audio-to-Phoneme Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/vocametrix/estonian-audio-to-phoneme}}
}

License

Apache 2.0
