Papers
arxiv:2602.08354

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Published on Feb 9
· Submitted by Yikun Ban on Feb 23
#1 Paper of the day
Abstract

Large reasoning models can implicitly determine when to stop thinking; SAGE-RL exploits this by incorporating efficient reasoning patterns into standard pass@1 inference, improving both accuracy and efficiency.

AI-generated summary

Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even be detrimental to accuracy. In a further in-depth analysis of this phenomenon, we surprisingly uncover and empirically verify that LRMs implicitly know the appropriate time to stop thinking, while this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this efficient reasoning potential. Furthermore, integrating SAGE as mixed sampling into group-based reinforcement learning (SAGE-RL) enables SAGE-RL to effectively incorporate SAGE-discovered efficient reasoning patterns into standard pass@1 inference, markedly enhancing both the reasoning accuracy and efficiency of LRMs across multiple challenging mathematical benchmarks.

Community

Paper author Paper submitter
edited about 1 month ago

Large reasoning models already implicitly know when they have reached the correct answer.
We just don’t let them stop.
Project Page: https://hzx122.github.io/sage-rl/

We would love to share our code! Please contact: [email protected]


I haven't read this paper yet (soon) but I can fairly confidently tell you that https://hf.co/Nanbeige/Nanbeige4.1-3B cannot tell when to stop


Haha, thanks for your comment. Efficient inference potential is inherently tied to the model itself; it cannot exceed the model’s inherent upper limit. Our self-aware guided efficient reasoning (SAGE) leverages cumulative self-confidence to discover concise, correct reasoning chains based on the inherent potential of the model. We further integrate SAGE into RL via SAGE-RL, a minimal modification to RLVR that incorporates efficient reasoning patterns into the model's standard pass@1 inference.
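For intuition, here is a minimal, self-contained sketch of what confidence-guided reasoning search could look like. This is our hypothetical illustration of the idea described above, not the authors' implementation: the function names, the `toy_expand` stand-in model, and the stopping rule (stop when a candidate marks itself as a final answer and the chain's running confidence clears a threshold) are all illustrative assumptions.

```python
import math

def step_confidence(logprobs):
    """Confidence of one reasoning step: the geometric-mean token
    probability under the policy (exp of the mean log-prob)."""
    return math.exp(sum(logprobs) / len(logprobs))

def confidence_guided_search(expand, width=4, max_steps=8, stop_threshold=0.9):
    """Greedy confidence-guided reasoning: at each step, sample `width`
    candidate continuations, keep the most confident one, and stop as
    soon as a candidate marks itself as a final answer while the chain's
    running confidence stays above `stop_threshold`."""
    chain, running_conf = [], 1.0
    for _ in range(max_steps):
        # expand(chain, width) -> list of (text, token_logprobs, is_answer)
        candidates = expand(chain, width)
        text, logprobs, is_answer = max(candidates,
                                        key=lambda c: step_confidence(c[1]))
        chain.append(text)
        running_conf = min(running_conf, step_confidence(logprobs))
        if is_answer and running_conf >= stop_threshold:
            break  # the model is confident it has the answer: stop thinking
    return chain, running_conf

# Toy stand-in for sampling continuations from an LRM: two confident
# "thinking" steps, then a highly confident final answer.
def toy_expand(chain, width):
    step = len(chain)
    if step < 2:
        return [(f"think-{step}-{i}", [math.log(0.95)], False)
                for i in range(width)]
    return [("answer: 42", [math.log(0.99)], True)] + [
        (f"think-{step}-{i}", [math.log(0.5)], False)
        for i in range(width - 1)]

chain, conf = confidence_guided_search(toy_expand)
print(chain)  # the search halts after the answer step instead of running to max_steps
```

The point of the sketch is only that stopping is decided by the model's own confidence signal at a given exploration width, rather than by a fixed length budget.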


arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/does-your-reasoning-model-implicitly-know-when-to-stop-thinking-1467-bceb5ae4

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications


Found a solid walkthrough of this research on when models know to stop thinking: https://arxivexplained.com/paper/does-your-reasoning-model-implicitly-know-when-to-stop-thinking. The SAGE framework for efficient reasoning is pretty clever.

I have found an unofficial reproduction code repository for this paper:
https://github.com/shenlilinghua-ux/sage-rl

You claim SAGE is more efficient than pass@1 and pass@k-style sampling, but the paper never provides a rigorous accounting of the total inference cost of the search procedure itself. Nearly all of the reported “token efficiency” gains are based on the final response length, not on the full number of tokens generated during branching, step expansion, and candidate pruning. Since SAGE repeatedly samples multiple reasoning steps per frontier and only later discards most of them, the actual token budget consumed during search could be substantially larger than what is reflected by the final output length. Without a controlled comparison that reports total generated tokens / decoding cost / FLOPs under matched accuracy targets against pass@1 and pass@k baselines, why should we accept the claim that SAGE is truly more efficient, rather than simply producing shorter final answers after a more expensive hidden search process?


We appreciate your interest in our paper. First, we do not claim that SAGE is more efficient than pass@1 or pass@k. Second, efficient inference does not come for free: SAGE itself is a search algorithm based on policy confidence. We only aim to demonstrate that, given a certain exploration width, the policy model is capable of finding efficient reasoning chains on its own; we do not position SAGE as a competitive decoding algorithm in its own right. The efficiency gains we report come mainly from the fact that SAGE-RL-tuned models improve token efficiency while also improving pass@1.
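To make the "mixed sampling into group-based RL" idea concrete, here is a minimal sketch of how SAGE-discovered rollouts might be mixed into a GRPO-style group before computing group-normalized advantages. The function names, the mixing rule (replace a fixed fraction of the group), and the normalization details are our illustrative assumptions, not the paper's actual SAGE-RL implementation.

```python
def group_advantages(rewards):
    """GRPO-style group-normalized advantages:
    (reward - group mean) / group std."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std if std > 0 else 1.0) for r in rewards]

def mixed_group(standard_rollouts, sage_rollouts, mix_ratio=0.25):
    """Replace a fraction of the standard pass@1 rollout group with
    rollouts discovered by confidence-guided (SAGE-style) search, so the
    RL objective can reinforce efficient reasoning patterns."""
    k = int(len(standard_rollouts) * mix_ratio)
    return list(sage_rollouts[:k]) + list(standard_rollouts[k:])

# Usage: mix two SAGE rollouts into a group of four standard rollouts,
# then score the mixed group with verifiable rewards as usual.
group = mixed_group(["s1", "s2", "s3", "s4"], ["g1", "g2"], mix_ratio=0.5)
advs = group_advantages([1.0, 0.0, 1.0, 0.0])
```

Because the advantage is computed relative to the group, short correct SAGE rollouts in the group can receive positive advantage and pull the policy toward more token-efficient chains.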


Get this paper in your agent:

hf papers read 2602.08354
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.08354 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.08354 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.08354 in a Space README.md to link it from this page.

Collections including this paper 16