Collections including paper arxiv:2509.04664

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
  Paper • 2510.03259 • Published • 57
- Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
  Paper • 2510.07242 • Published • 30
- First Try Matters: Revisiting the Role of Reflection in Reasoning Models
  Paper • 2510.08308 • Published • 24
- Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
  Paper • 2510.03222 • Published • 75

- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 277
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 263
- GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
  Paper • 2507.01006 • Published • 251
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 261

- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 48
- VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
  Paper • 2505.23359 • Published • 38
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Paper • 2506.02397 • Published • 36
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 144

- VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
  Paper • 2509.09372 • Published • 247
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 211
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 232
- Why Language Models Hallucinate
  Paper • 2509.04664 • Published • 196

- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 84
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 105
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 211
- Why Language Models Hallucinate
  Paper • 2509.04664 • Published • 196