Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning Paper • 2510.03259 • Published Sep 26, 2025 • 57
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published Oct 8, 2025 • 30
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published Oct 9, 2025 • 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3, 2025 • 76
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States Paper • 2510.11052 • Published Oct 13, 2025 • 52
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment Paper • 2510.10201 • Published Oct 11, 2025 • 36
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published Oct 13, 2025 • 33
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 182
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning Paper • 2510.27044 • Published Oct 30, 2025 • 6
Reverse-Engineered Reasoning for Open-Ended Generation Paper • 2509.06160 • Published Sep 7, 2025 • 151
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 161
FlowRL: Matching Reward Distributions for LLM Reasoning Paper • 2509.15207 • Published Sep 18, 2025 • 118
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4, 2025 • 76
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8, 2025 • 57
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR Paper • 2509.23808 • Published Sep 28, 2025 • 47
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models Paper • 2511.23319 • Published Nov 28, 2025 • 24
Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making Paper • 2602.06570 • Published Feb 6 • 61
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 324
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published Apr 2 • 148
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published Apr 6 • 113
Why Fine-Tuning Encourages Hallucinations and How to Fix It Paper • 2604.15574 • Published 23 days ago • 23
Hallucinations Undermine Trust; Metacognition is a Way Forward Paper • 2605.01428 • Published 7 days ago • 18
Efficient Training on Multiple Consumer GPUs with RoundPipe Paper • 2604.27085 • Published 10 days ago • 40
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization Paper • 2604.24952 • Published 12 days ago • 6
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key Paper • 2605.06638 • Published 2 days ago • 10