AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement Paper • 2511.23475 • Published 10 days ago • 41
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published Sep 23 • 22
pyannote/speaker-diarization-3.1 Automatic Speech Recognition • Updated May 10, 2024 • 15.2M • 1.35k