Welcome Gemma 4: Frontier multimodal intelligence on device • Article • merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels • Paper • 2603.19312 • Published Mar 13
LeWM • Collection • Official checkpoints and datasets related to the LeWM paper • 9 items • Updated Mar 27
Beyond Language Modeling: An Exploration of Multimodal Pretraining • Paper • 2603.03276 • Published Mar 3
Mixture of Experts (MoEs) in Transformers • Article • ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26
We Got Claude to Fine-Tune an Open Source LLM • Article • burtenshaw, evalstate • Dec 4, 2025
Transformers v5: Simple model definitions powering the AI ecosystem • Article • lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025
nanoVLM: The simplest repository to train your VLM in pure PyTorch • Article • ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025
The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix • Article • codelion • Nov 3, 2025
Why Did MiniMax M2 End Up as a Full Attention Model? • Article • MiniMax-AI • Oct 30, 2025
We’re open-sourcing our text-to-image model and the process behind it • Article • Photoroom • Nov 12, 2025
Streaming datasets: 100x More Efficient • Article • andito, lhoestq, burtenshaw, pcuenq, merve • Oct 27, 2025