Estonian Audio-to-Phoneme Model
This is a Wav2Vec2 model fine-tuned for Estonian phoneme recognition. It converts Estonian speech audio directly into phoneme sequences.
Model Description
- Language: Estonian (et-EE)
- Task: Audio-to-Phoneme Conversion
- Base Model: facebook/wav2vec2-lv-60-espeak-cv-ft
- Training Data: Estonian speech corpus with phoneme annotations
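Because the model follows the wav2vec 2.0 architecture, its logits contain one frame per fixed hop of input audio. The sketch below estimates how many frames a clip produces. It assumes the fine-tuned model keeps the default Hugging Face `Wav2Vec2Config` feature-encoder settings (`conv_kernel` and `conv_stride`), so check the model's `config.json` before relying on it. The helper name `num_output_frames` is illustrative.

```python
from math import prod

# Assumed feature-encoder settings: the Wav2Vec2Config defaults, not values
# confirmed for this particular checkpoint.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples, kernels=CONV_KERNEL, strides=CONV_STRIDE):
    """Number of logit frames for an input of num_samples audio samples."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        length = (length - kernel) // stride + 1  # unpadded 1-D convolution
    return max(length, 0)  # clips shorter than one receptive field give no frames


SAMPLE_RATE = 16_000
print(num_output_frames(SAMPLE_RATE))           # 49 frames for one second of audio
print(1000 * prod(CONV_STRIDE) / SAMPLE_RATE)   # 20.0 ms between frames
```

Under these assumptions, one second of 16 kHz audio yields 49 frames, roughly one every 20 ms, which is a handy rule of thumb when mapping `predicted_ids` back to time.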
Usage
```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import librosa
import torch

# Load model and processor
processor = Wav2Vec2Processor.from_pretrained("vocametrix/estonian-audio-to-phoneme")
model = Wav2Vec2ForCTC.from_pretrained("vocametrix/estonian-audio-to-phoneme")

# Load audio as mono and resample it to 16 kHz
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform into model inputs
input_values = processor(audio, sampling_rate=16000, return_tensors="pt").input_values

# Get predictions: per-frame logits, then the most likely token ID per frame
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# Decode the token IDs of the first (only) item in the batch into phonemes
transcription = processor.decode(predicted_ids[0])
print(f"Detected phonemes: {transcription}")
```
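`processor.decode` turns the per-frame argmax IDs into a phoneme string. The sketch below illustrates the greedy CTC rule behind that step: merge consecutive repeated IDs, then drop the blank token (`Wav2Vec2ForCTC` uses the pad token ID as its CTC blank). The vocabulary here is a hypothetical toy example, not the model's real tokenizer vocabulary, and the real tokenizer may apply extra post-processing.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real mapping lives in the model's tokenizer files.
ID_TO_TOKEN = {0: "<pad>", 1: "t", 2: "ɑː", 3: "l"}
BLANK_ID = 0  # assumed pad/blank ID for this toy vocabulary


def ctc_greedy_decode(frame_ids, id_to_token=ID_TO_TOKEN, blank_id=BLANK_ID):
    """Greedy CTC decoding: collapse repeated frame IDs, then remove blanks."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return [id_to_token[token_id] for token_id in collapsed if token_id != blank_id]


# Per-frame IDs, as from torch.argmax(logits, dim=-1)[0].tolist()
frame_ids = [0, 1, 1, 0, 2, 2, 3, 0, 0, 3, 0]
print(" ".join(ctc_greedy_decode(frame_ids)))  # t ɑː l l
```

The blank between the two runs of `l` frames is what lets the decoder keep a genuinely repeated phoneme instead of merging it into one.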
Estonian Phoneme Set
The model recognizes the following Estonian phonemes:
- Consonants: p, b, t, d, k, g, f, s, ʃ, h, v, j, m, n, ŋ, r, l
- Long consonants: pː, bː, tː, dː, kː, gː, fː, sː, ʃː, hː, vː, jː, mː, nː, ŋː, rː, lː
- Vowels: i, y, u, e, ø, ɤ, o, æ, ɑ
- Long vowels: iː, yː, uː, eː, øː, ɤː, oː, æː, ɑː
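When post-processing the output, it can help to split a transcription into units from the inventory above and tag each one as short or long. A minimal sketch, assuming the decoded string uses these IPA symbols with a trailing `ː` for length and that whitespace between phonemes can be ignored; the tokenizer's exact output format is not documented here, and `parse_phonemes` is a hypothetical helper.

```python
# Inventory from the lists above; a long phoneme is written as the short
# form followed by the length mark "ː".
CONSONANTS = set("p b t d k g f s ʃ h v j m n ŋ r l".split())
VOWELS = set("i y u e ø ɤ o æ ɑ".split())
LENGTH_MARK = "ː"


def parse_phonemes(transcription):
    """Split a phoneme string into (phoneme, kind, is_long) tuples.

    Hypothetical helper: assumes every base phoneme is one character,
    optionally followed by LENGTH_MARK, and skips whitespace.
    """
    units = []
    for ch in transcription:
        if ch.isspace():
            continue
        if ch == LENGTH_MARK:
            # The mark lengthens the previous phoneme; it cannot stand alone
            # or be applied twice.
            if not units or units[-1][2]:
                raise ValueError("length mark must follow a short phoneme")
            base, kind, _ = units[-1]
            units[-1] = (base + LENGTH_MARK, kind, True)
        elif ch in VOWELS:
            units.append((ch, "vowel", False))
        elif ch in CONSONANTS:
            units.append((ch, "consonant", False))
        else:
            raise ValueError(f"symbol {ch!r} is not in the listed inventory")
    return units


print(parse_phonemes("k ɑː l"))
```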
Model Performance
This card does not report evaluation metrics (for example, phoneme error rate). The model was trained on Estonian speech data with phoneme-level annotations.
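Phoneme error rate (PER) is a common way to quantify phoneme recognition accuracy: the Levenshtein distance between the reference and predicted phoneme sequences, divided by the reference length. A minimal sketch for scoring the model on your own annotated audio, assuming both strings are space-separated phoneme tokens (an assumption about the output format); `edit_distance` and `phoneme_error_rate` are hypothetical helpers, not part of `transformers`.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(reference, hypothesis):
    """Edit distance over space-separated phonemes, normalized by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one phoneme")
    return edit_distance(ref, hyp) / len(ref)


# One substitution (ɑː -> ɑ) out of three reference phonemes
print(phoneme_error_rate("k ɑː l", "k ɑ l"))
```

As with word error rate, PER can exceed 1.0 when the prediction contains many insertions.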
Citation
If you use this model, please cite:
```bibtex
@misc{estonian-audio-to-phoneme,
  author = {Patrick Marmaroli},
  company = {Vocametrix},
  title = {Estonian Audio-to-Phoneme Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/vocametrix/estonian-audio-to-phoneme}}
}
```
License
Apache 2.0