New research from Google AI proposes a Deep-Thinking Ratio that improves LLM accuracy while cutting total inference cost in half

by root February 22, 2026

For the past few years, the AI world has followed a simple rule: if you want a large language model (LLM) to solve harder problems, make its Chain of Thought (CoT) longer. However, a new study from the University of Virginia and Google shows that “thinking long” and “thinking hard” are not the same thing.

The researchers found that simply adding more tokens to a response can actually make an AI less accurate. Instead of counting words, they introduce a new measurement: the Deep-Thinking Ratio (DTR).

The failure of “token maximization”

Engineers often use token count as a proxy for the effort an AI puts into a task. However, the researchers found that raw token count has an average correlation of r = -0.59 with accuracy.

This negative number means that the more text the model produces, the more likely it is to be wrong. This happens because of “overthinking”, where the model gets stuck in loops, repeats redundant steps, or amplifies its own mistakes. Relying on length alone wastes expensive compute on uninformative tokens.
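To make the statistic concrete, here is a minimal sketch of how a length-versus-correctness correlation could be computed. It assumes a Pearson correlation (the article only says “correlation”), and the token counts and correctness labels are made-up toy values for illustration, not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: response length in tokens, and whether the answer was correct (1) or not (0).
token_counts = [1200, 3400, 800, 5100, 2600, 4300]
correct = [1, 0, 1, 0, 1, 0]

r = pearson_r(token_counts, correct)
print(f"r = {r:.2f}")  # negative here, because the longer toy responses are the wrong ones
```

A negative r on real data would mean the same thing as in the study: longer outputs tend to go with wrong answers.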

What are deep-thinking tokens?

The research team argues that the real “thinking” happens inside the layers of the model, not just in the final output. When a model predicts a token, it processes the data through a series of transformer layers (L).

Shallow tokens: For easy words, the model’s prediction stabilizes early. The “guess” doesn’t change much from layer 5 to layer 36.
Deep-thinking tokens: For difficult logic or math symbols, the prediction shifts significantly in the deeper layers.

How to measure depth

To identify these tokens, the research team uses a technique that peeks at the model’s internal “drafts” at every layer. They project the intermediate hidden states (h_{t,l}) into the vocabulary space using the model’s unembedding matrix (W_U). This gives a probability distribution (p_{t,l}) for every layer.
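The projection step can be sketched in plain Python. This is only an illustration under stated assumptions: the 2-dimensional hidden states and the 2 x 3 matrix `W_U` are toy values, the function names are hypothetical, and any normalization a real model applies before unembedding is omitted.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def layer_distribution(hidden_state, unembedding):
    """Project a hidden state (length d) through W_U (d x V) and normalize.

    Returns p_{t,l}: the distribution over the vocabulary implied by this layer.
    """
    vocab_size = len(unembedding[0])
    logits = [
        sum(h * unembedding[i][v] for i, h in enumerate(hidden_state))
        for v in range(vocab_size)
    ]
    return softmax(logits)

# Toy setup (assumed values): hidden size d = 2, vocabulary size V = 3.
W_U = [[1.0, 0.0, -1.0],
       [0.0, 1.0, 0.5]]
h_early = [0.1, 0.1]   # hypothetical early-layer hidden state
h_final = [2.0, -1.0]  # hypothetical final-layer hidden state

p_early = layer_distribution(h_early, W_U)
p_final = layer_distribution(h_final, W_U)
```

In this toy case the early layer spreads probability fairly evenly, while the final layer concentrates it on the first vocabulary entry.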

They then calculate the Jensen-Shannon Divergence (JSD) between the intermediate-layer distribution and the final-layer distribution (p_{t,L}):

D_{t,l} := JSD(p_{t,L} || p_{t,l})
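For readers who want to see the divergence spelled out, here is a small sketch using the standard definition of JSD as the average of two KL divergences against the midpoint distribution. The base-2 logarithm is an assumption (the article does not state the base); with it, the value falls between 0 and 1.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two distributions over the same vocabulary."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Identical distributions have zero divergence; disjoint ones reach the maximum of 1 bit.
print(jsd([0.7, 0.2, 0.1], [0.7, 0.2, 0.1]))  # 0.0
print(jsd([1.0, 0.0], [0.0, 1.0]))            # 1.0
```

A layer whose distribution already matches the final layer scores near 0, and a layer that still “disagrees” with the final prediction scores closer to 1.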

A token counts as a deep-thinking token if its prediction settles only in the “late regime”, which is defined by a depth fraction (ρ). In their tests, ρ = 0.85, meaning the token stabilized only in the final 15% of the layers.

The Deep-Thinking Ratio (DTR) is the percentage of these “hard” tokens in a full sequence. Across models such as DeepSeek-R1-70B, Qwen3-30B-Thinking, and GPT-OSS-120B, DTR showed a strong average positive correlation of r = 0.683 with accuracy.
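Putting the pieces together, a rough sketch of the ratio might look like the following. Several details are assumptions because the article does not spell them out: the `threshold` below which a layer counts as “settled” (and its value of 0.5), the rule that a token settles at the first layer from which its divergence stays at or below that threshold, and the use of a ceiling for the late-regime boundary. Only ρ = 0.85 comes from the text.

```python
import math

def settling_layer(divergences, threshold):
    """1-based index of the first layer from which the JSD to the final layer
    stays at or below `threshold` for all remaining layers (assumed rule)."""
    layer = len(divergences)
    for idx in range(len(divergences), 0, -1):
        if divergences[idx - 1] <= threshold:
            layer = idx
        else:
            break
    return layer

def deep_thinking_ratio(per_token_divergences, rho=0.85, threshold=0.5):
    """Fraction of tokens whose prediction settles only in the last (1 - rho) of the layers."""
    if not per_token_divergences:
        return 0.0
    deep = 0
    for divergences in per_token_divergences:
        late_start = math.ceil(rho * len(divergences))
        if settling_layer(divergences, threshold) >= late_start:
            deep += 1
    return deep / len(per_token_divergences)

# Toy per-layer divergences for a hypothetical 20-layer model.
shallow = [0.9] + [0.1] * 18 + [0.0]  # settles at layer 2
deep = [0.8] * 17 + [0.3, 0.1, 0.0]   # settles at layer 18, inside the final 15%

print(deep_thinking_ratio([shallow, shallow, deep, shallow]))  # 0.25
```

With 20 layers and ρ = 0.85, the late regime starts at layer 17, so only the third toy token counts as deep-thinking.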

Think@n: Higher accuracy at 50% less cost

Using this approach, the research team created Think@n, a new way to scale AI performance during inference.

Most developers use Self-Consistency (Cons@n), where they sample 48 different answers and use majority voting to pick the best one. This is very expensive because every token has to be generated for every answer.

Think@n changes the game by using “early halting”:

The model starts generating multiple candidate answers.
After just 50 prefix tokens, the system calculates the DTR for each candidate.
It immediately stops generating the “unpromising” candidates with low DTR.
It finishes only the candidates with high deep-thinking scores.
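The steps above can be sketched as a simple select-then-vote routine. This is an illustration, not the authors’ implementation: the `keep_fraction` parameter, the function names, the toy DTR scores, and the final majority vote over the surviving candidates are all assumptions. Actual generation is left out, so the finished answers are hard-coded.

```python
import math
from collections import Counter

def select_promising(prefix_dtrs, keep_fraction=0.5):
    """Return the indices of candidates worth finishing, ranked by prefix DTR."""
    keep = max(1, math.ceil(len(prefix_dtrs) * keep_fraction))
    ranked = sorted(range(len(prefix_dtrs)), key=lambda i: prefix_dtrs[i], reverse=True)
    return sorted(ranked[:keep])

def majority_vote(answers):
    """Pick the most common final answer among the finished candidates."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical DTR scores measured on the first 50 tokens of six candidates.
prefix_dtrs = [0.10, 0.32, 0.05, 0.28, 0.30, 0.08]
survivors = select_promising(prefix_dtrs)  # candidates 1, 3 and 4 keep generating

# Hypothetical final answers of the surviving candidates once they finish.
final_answers = {1: "42", 3: "42", 4: "17"}
print(survivors, majority_vote([final_answers[i] for i in survivors]))
```

The saving comes from the candidates that are dropped: they cost only their short prefix instead of a full reasoning trace.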

Results on AIME 2025

Method	Accuracy	Average cost (k tokens)
Cons@n (majority vote)	92.7%	307.6
Think@n (DTR-based selection)	94.7%	155.4

On the AIME 25 math benchmark, Think@n achieved higher accuracy than standard voting while reducing inference cost by 49%.
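As a quick sanity check, the savings can be recomputed directly from the accuracy and cost figures in the table. The variable names are just for illustration.

```python
# Figures taken from the AIME 2025 table above.
cons_accuracy, cons_cost = 92.7, 307.6    # Cons@n: accuracy in %, cost in k tokens
think_accuracy, think_cost = 94.7, 155.4  # Think@n: accuracy in %, cost in k tokens

cost_reduction = 1 - think_cost / cons_cost
accuracy_gain = think_accuracy - cons_accuracy

print(f"Cost reduction: {cost_reduction:.1%}")        # Cost reduction: 49.5%
print(f"Accuracy gain: {accuracy_gain:.1f} points")  # Accuracy gain: 2.0 points
```

The result, just under 49.5%, is consistent with the roughly 49% to 50% savings quoted in the article.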

Key takeaways

Token count is a poor predictor of accuracy: Raw output length has an average negative correlation with performance (r = -0.59), meaning that longer reasoning traces often signal “overthinking” rather than higher quality.
Deep-thinking tokens define true effort: Unlike simple tokens that stabilize in early layers, deep-thinking tokens are those whose internal predictions undergo significant revision in deeper model layers before converging.
The Deep-Thinking Ratio (DTR) is a superior metric: DTR measures the proportion of deep-thinking tokens in a sequence and shows a robust positive correlation with accuracy (average r = 0.683), consistently outperforming both length-based and confidence-based baselines.
Think@n enables efficient test-time scaling: By prioritizing and finishing only the samples with a high deep-thinking ratio, the Think@n strategy matches or exceeds the performance of standard majority voting (Cons@n).
Large cost savings through early halting: Because DTR can be estimated from a short prefix of just 50 tokens, unpromising generations can be rejected early, reducing total inference cost by about 50%.
