On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published 6 days ago • 31
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published 10 days ago • 147
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 12 days ago • 208
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published 19 days ago • 150
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights Paper • 2512.01816 • Published 13 days ago • 88
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 13 days ago • 89
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance Paper • 2511.13254 • Published 27 days ago • 134
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published 27 days ago • 132
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published 30 days ago • 161
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12 • 195
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published Nov 9 • 50
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9 • 130
The Path Not Taken: RLVR Provably Learns Off the Principals Paper • 2511.08567 • Published Nov 11 • 32
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction Paper • 2511.07327 • Published Nov 10 • 75
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published Nov 7 • 52