view article Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 itazap, ariG23498, ArthurZ, sergiopaniego, merve, pcuenq • Dec 18, 2025 • 124
view article Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 388
view article Article Fixing Gradient Accumulation +4 lysandre, ArthurZ, muellerzr, ydshieh, BenjaminB, pcuenq • Oct 16, 2024 • 66
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge NormalUhr • Feb 7, 2025 • 293
view article Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model EuroBERT • Mar 10, 2025 • 147
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 159