BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline • Paper • 2408.15079 • Published Aug 27, 2024 • 56
Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models • Paper • 2603.25750 • Published 25 days ago • 36
Cohere Multilingual ASR 🎙 • Running on CPU Upgrade • Featured • 102 • Transcribe audio clips to text in many languages
Voxtral TTS Demo ⚡ • Running • Featured • 191 • Generate realistic speech from text with custom or preset voices
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning • Paper • 2603.23483 • Published 20 days ago • 62
The Synthetic Data Playbook: Generating Trillions of the Finest Tokens 📝 • Running on CPU Upgrade • 219 • Explore synthetic data experiments on a virtual bookshelf