LLM - a Chevolier Collection

updated about 7 hours ago

Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

Paper • 2510.03259 • Published Sep 26, 2025 • 57
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Paper • 2510.07242 • Published Oct 8, 2025 • 30
First Try Matters: Revisiting the Role of Reflection in Reasoning Models

Paper • 2510.08308 • Published Oct 9, 2025 • 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3, 2025 • 76
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States

Paper • 2510.11052 • Published Oct 13, 2025 • 52
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

Paper • 2510.10201 • Published Oct 11, 2025 • 36
Making Mathematical Reasoning Adaptive

Paper • 2510.04617 • Published Oct 6, 2025 • 23
Demystifying Reinforcement Learning in Agentic Reasoning

Paper • 2510.11701 • Published Oct 13, 2025 • 33
Are Large Reasoning Models Interruptible?

Paper • 2510.11713 • Published Oct 13, 2025 • 5
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 182
Deep Self-Evolving Reasoning

Paper • 2510.17498 • Published Oct 20, 2025 • 12
Continuous Autoregressive Language Models

Paper • 2510.27688 • Published Oct 31, 2025 • 74
Higher-order Linear Attention

Paper • 2510.27258 • Published Oct 31, 2025 • 15
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

Paper • 2510.27044 • Published Oct 30, 2025 • 6
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4, 2025 • 199
Reverse-Engineered Reasoning for Open-Ended Generation

Paper • 2509.06160 • Published Sep 7, 2025 • 151
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 161
FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 118
Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published Sep 4, 2025 • 76
Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26, 2025 • 69
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models

Paper • 2509.06949 • Published Sep 8, 2025 • 57
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR

Paper • 2509.23808 • Published Sep 28, 2025 • 47
Sequential Diffusion Language Models

Paper • 2509.24007 • Published Sep 28, 2025 • 47
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models

Paper • 2511.23319 • Published Nov 28, 2025 • 24
GLM-5: from Vibe Coding to Agentic Engineering

Paper • 2602.15763 • Published Feb 17 • 149
Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 74
Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

Paper • 2602.06570 • Published Feb 6 • 61
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Paper • 2604.06628 • Published Apr 8 • 324
Self-Distilled RLVR

Paper • 2604.03128 • Published Apr 3 • 170
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

Paper • 2604.02029 • Published Apr 2 • 148
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Paper • 2604.04921 • Published Apr 6 • 113
Can LLMs Learn to Reason Robustly under Noisy Supervision?

Paper • 2604.03993 • Published Apr 5 • 42
Large Language Models Explore by Latent Distilling

Paper • 2604.24927 • Published 12 days ago • 72
Why Fine-Tuning Encourages Hallucinations and How to Fix It

Paper • 2604.15574 • Published 23 days ago • 23
Hallucinations Undermine Trust; Metacognition is a Way Forward

Paper • 2605.01428 • Published 7 days ago • 18
Co-Evolving Policy Distillation

Paper • 2604.27083 • Published 10 days ago • 62
Efficient Training on Multiple Consumer GPUs with RoundPipe

Paper • 2604.27085 • Published 10 days ago • 40
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization

Paper • 2604.24952 • Published 12 days ago • 6
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Paper • 2605.06638 • Published 2 days ago • 10
Continuous Latent Diffusion Language Model

Paper • 2605.06548 • Published 2 days ago • 52