mlx-community/diar_streaming_sortformer_4spk-v2.1-fp32
This model was converted to MLX format from nvidia/diar_streaming_sortformer_4spk-v2.1 using mlx-audio version 0.3.2.
Refer to the original model card for more details.
Use with mlx-audio
pip install -U mlx-audio
Converting from NeMo
The original model is distributed as a .nemo archive. This repo contains the pre-converted MLX weights.
python -m mlx_audio.vad.models.sortformer.convert \
--nemo-path nvidia/diar_streaming_sortformer_4spk-v2.1 \
--output-dir ./sortformer-v2.1-mlx
Python Example – Streaming Inference (Recommended):
from mlx_audio.vad import load
model = load("mlx-community/diar_streaming_sortformer_4spk-v2.1-fp32")
for result in model.generate_stream("meeting.wav", chunk_duration=5.0, verbose=True):
    for seg in result.segments:
        print(f"Speaker {seg.speaker}: {seg.start:.2f}s - {seg.end:.2f}s")
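Diarization results are commonly exchanged in RTTM format. A minimal sketch of converting segments to RTTM lines, assuming only the `seg.speaker` / `seg.start` / `seg.end` fields used above (the `Segment` dataclass here is illustrative, not part of mlx-audio):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Illustrative stand-in for the segment objects yielded above."""
    speaker: int
    start: float
    end: float

def to_rttm(segments, file_id="meeting"):
    """Format diarization segments as standard RTTM SPEAKER lines."""
    lines = []
    for seg in segments:
        dur = seg.end - seg.start
        lines.append(
            f"SPEAKER {file_id} 1 {seg.start:.3f} {dur:.3f} "
            f"<NA> <NA> speaker_{seg.speaker} <NA> <NA>"
        )
    return "\n".join(lines)

print(to_rttm([Segment(0, 0.0, 2.5), Segment(1, 2.5, 4.0)]))
```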
Python Example – Offline Inference:
from mlx_audio.vad import load
model = load("mlx-community/diar_streaming_sortformer_4spk-v2.1-fp32")
result = model.generate("meeting.wav", threshold=0.5, verbose=True)
print(result.text)
Python Example – Real-time Microphone Streaming:
from mlx_audio.vad import load
model = load("mlx-community/diar_streaming_sortformer_4spk-v2.1-fp32")
state = model.init_streaming_state()
for chunk in mic_stream():  # your audio source
    result, state = model.feed(chunk, state, sample_rate=16000)
    for seg in result.segments:
        print(f"Speaker {seg.speaker}: {seg.start:.2f}s - {seg.end:.2f}s")
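`mic_stream()` above is a placeholder for any source that yields mono 16 kHz chunks. A minimal sketch of such a source, chunking a NumPy buffer into 100 ms frames (a real microphone feed, e.g. via the sounddevice library, would yield the same shape):

```python
import numpy as np

def chunked_source(audio, chunk_samples=1600):
    """Yield fixed-size chunks (100 ms at 16 kHz) from a mono float32 array.
    Stands in for mic_stream() in the example above."""
    for i in range(0, len(audio), chunk_samples):
        yield audio[i:i + chunk_samples].astype(np.float32)

audio = np.zeros(16000, dtype=np.float32)  # 1 s of silence for illustration
chunks = list(chunked_source(audio))
print(len(chunks), len(chunks[0]))  # 10 chunks of 1600 samples each
```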
Model Details
- Architecture: FastConformer (17 layers) + Transformer Encoder (18 layers) + Sortformer Modules
- Mel bins: 128
- Max speakers: 4
- Streaming: AOSC (Arrival-Order Speaker Cache) compression retains long-range speaker context in a fixed-size cache
- Input: 16kHz mono audio
- Output: Per-frame speaker activity probabilities
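Since the model outputs per-frame speaker activity probabilities, turning them into time-stamped segments is a matter of thresholding and merging consecutive active frames. A minimal sketch (the 80 ms `frame_dur` is an assumption; check the converted model's config for the actual output frame hop):

```python
import numpy as np

def probs_to_segments(probs, threshold=0.5, frame_dur=0.08):
    """Convert a (frames x speakers) activity-probability matrix into
    (speaker, start, end) tuples by thresholding and merging runs."""
    active = probs >= threshold
    segments = []
    for spk in range(active.shape[1]):
        start = None
        for i, on in enumerate(active[:, spk]):
            if on and start is None:
                start = i                      # run begins
            elif not on and start is not None:
                segments.append((spk, start * frame_dur, i * frame_dur))
                start = None                   # run ends
        if start is not None:                  # run extends to the last frame
            segments.append((spk, start * frame_dur, active.shape[0] * frame_dur))
    return segments

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9]])
print(probs_to_segments(probs))
```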
Key Streaming Features
- Speaker Cache + FIFO buffers for long-range and recent context
- AOSC compression scores frames by per-speaker log-likelihood ratio, boosting underrepresented speakers
- Silence profiling fills cache gaps with running-mean silence embeddings
- Left/right context for chunk boundary handling in file mode
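To make the cache-compression idea concrete, here is a toy sketch (not the actual AOSC implementation) of the scoring described above: each cached frame is scored by its dominant speaker's log-likelihood ratio, frames from underrepresented speakers get a boost, and only the top-scoring frames are kept:

```python
import numpy as np

def compress_cache(embs, probs, keep=4, eps=1e-6):
    """Toy AOSC-style cache compression: keep the `keep` highest-scoring
    frames, where a frame's score is its best per-speaker log-odds plus a
    boost inversely proportional to that speaker's cache count."""
    # per-frame, per-speaker log-likelihood ratio of activity
    llr = np.log(probs + eps) - np.log(1 - probs + eps)   # (frames, speakers)
    spk = llr.argmax(axis=1)                              # dominant speaker per frame
    score = llr.max(axis=1)
    # boost frames whose speaker has few frames in the cache
    counts = np.bincount(spk, minlength=probs.shape[1]).astype(float)
    score = score + 1.0 / (counts[spk] + 1.0)
    keep_idx = np.sort(np.argsort(score)[-keep:])         # preserve arrival order
    return embs[keep_idx], keep_idx
```

The arrival-order sort at the end matters: the compressed cache must stay chronologically ordered so that speaker indices remain consistent with their first-arrival order.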
Model tree for mlx-community/diar_streaming_sortformer_4spk-v2.1-fp32
Base model: nvidia/diar_streaming_sortformer_4spk-v2.1