How to use Cnam-LMSSC/mimi_throat_microphone with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("feature-extraction", model="Cnam-LMSSC/mimi_throat_microphone")
# Load model directly
from transformers import AutoFeatureExtractor, AutoModel
extractor = AutoFeatureExtractor.from_pretrained("Cnam-LMSSC/mimi_throat_microphone")
model = AutoModel.from_pretrained("Cnam-LMSSC/mimi_throat_microphone")
Inference script:
import torch
import torchaudio
from datasets import load_dataset
from moshi.models import loaders
# Download the model weights and build the Mimi model in eval mode
weight_path = loaders.hf_hub_download("Cnam-LMSSC/mimi_throat_microphone", "kyutai_implementation.safetensors")
model = loaders.get_mimi(weight_path).eval()
model.set_num_codebooks(model.total_codebooks)  # use all codebooks
# Stream one throat-microphone recording from the Vibravox test split
test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)
audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.throat_microphone"]["array"])
# Resample from 48 kHz to 24 kHz, then encode and decode a (batch, channel, time) tensor
audio_24kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=24_000)
with torch.no_grad():  # inference only, no gradients needed
    enhanced_audio_24kHz = model.decode(model.encode(audio_24kHz[None, None, :]))
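The script above halves the sampling rate and adds batch and channel axes before encoding. The following is a minimal sketch of that shape arithmetic in plain Python, useful for sanity-checking tensor sizes. The helper names are hypothetical, the round-up behaviour of the resampler is an assumption, and the 12.5 Hz latent frame rate is assumed from the published Mimi codec description rather than stated in this card.

```python
import math

SOURCE_RATE_HZ = 48_000  # Vibravox sampling rate used in the script
MIMI_RATE_HZ = 24_000    # rate the script resamples to before encoding
FRAME_RATE_HZ = 12.5     # ASSUMPTION: nominal Mimi latent frame rate


def resampled_length(num_samples, orig_freq=SOURCE_RATE_HZ, new_freq=MIMI_RATE_HZ):
    """Expected sample count after resampling (assumed to round up)."""
    return math.ceil(num_samples * new_freq / orig_freq)


def mimi_input_shape(num_samples):
    """Shape of audio[None, None, :], i.e. (batch, channel, time)."""
    return (1, 1, num_samples)


def expected_code_frames(num_samples_24k, frame_rate=FRAME_RATE_HZ):
    """Approximate number of latent frames for a 24 kHz signal."""
    samples_per_frame = MIMI_RATE_HZ / frame_rate  # 1920 under the assumption above
    return math.ceil(num_samples_24k / samples_per_frame)


if __name__ == "__main__":
    n_48k = 2 * SOURCE_RATE_HZ            # a hypothetical 2-second recording
    n_24k = resampled_length(n_48k)
    print(n_24k, mimi_input_shape(n_24k), expected_code_frames(n_24k))
```

Comparing these numbers against `audio_24kHz.shape` and the shape returned by `model.encode` is a quick way to catch a missing resampling step or a missing batch/channel axis.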
For streaming usage, please refer to this script