NVIDIA has announced the release of Nemotron-Cascade 2, an open-weight 30B Mixture-of-Experts (MoE) model with 3B activated parameters. The model focuses on maximizing "intelligence density" and delivers advanced reasoning capabilities at a fraction of the parameter scale used by frontier models. Nemotron-Cascade 2 is the second open-weight LLM to achieve gold-medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals.

Targeted performance and strategic trade-offs
Nemotron-Cascade 2's main value proposition is specialized performance in mathematical reasoning, coding, alignment, and instruction following. Although it achieves state-of-the-art results in these reasoning-intensive areas, it does not win on every benchmark.
The model performs better in several targeted categories than both the recently released Qwen3.5-35B-A3B (February 2026) and the larger Nemotron-3-Super-120B-A12B:
- Mathematical reasoning: outperforms Qwen3.5-35B-A3B on AIME 2025 (92.4 vs. 91.9) and HMMT February 2025 (94.6 vs. 89.0).
- Coding: leads on LiveCodeBench v6 (87.2 vs. 74.6) and IOI 2025 (439.28 vs. 348.6+).
- Alignment and instruction following: scores significantly higher on Arena-Hard v2 (83.5 vs. 65.4+) and IFBench (82.9 vs. 70.2).


Technical architecture: Cascade RL and Multi-domain On-Policy Distillation (MOPD)
The model's reasoning capabilities come from a post-training pipeline that starts from the Nemotron-3-Nano-30B-A3B-Base model.
1. Supervised Fine-Tuning (SFT)
During SFT, the NVIDIA research team used a carefully curated dataset in which samples were packed into sequences of up to 256K tokens. The dataset included:
- 1.9 million Python reasoning traces and 1.3 million Python tool-calling samples for competitive coding.
- 816K samples for mathematical natural-language proofs.
- A specialized Software Engineering (SWE) blend consisting of 125K agentic samples and 389K agentless samples.
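The article does not say how samples were packed, so the following is only a minimal sketch of one common approach: greedy first-fit packing of samples under a token budget. The function name `pack_samples` and the first-fit strategy are assumptions for illustration. Only the 256K-token limit comes from the text.

```python
from typing import List

MAX_TOKENS = 256_000  # packing limit reported in the article


def pack_samples(sample_lengths: List[int], max_tokens: int = MAX_TOKENS) -> List[List[int]]:
    """Greedy first-fit packing (assumed strategy, not NVIDIA's confirmed method).

    Returns one list of sample indices per packed sequence.
    """
    packed: List[List[int]] = []   # sample indices in each packed sequence
    free: List[int] = []           # remaining token budget per packed sequence
    for idx, length in enumerate(sample_lengths):
        if length > max_tokens:
            raise ValueError(f"sample {idx} exceeds the {max_tokens}-token limit")
        for slot, budget in enumerate(free):
            if length <= budget:
                # Place the sample in the first sequence that still has room.
                packed[slot].append(idx)
                free[slot] -= length
                break
        else:
            # No existing sequence fits, so open a new one.
            packed.append([idx])
            free.append(max_tokens - length)
    return packed
```

Packing several short samples into one long sequence reduces padding waste. A real pipeline would also need attention masking between packed samples, which this sketch omits.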
2. Cascade reinforcement learning
Following SFT, the model undergoes Cascade RL, which applies sequential, domain-wise training. This allows hyperparameters to be tuned for one domain without destabilizing the others, which mitigates catastrophic forgetting. The pipeline includes stages for instruction following (IF-RL), multi-domain RL, RLHF, long-context RL, and specialized Code RL and SWE RL.
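The sequential structure can be sketched as a loop in which each stage starts from the checkpoint produced by the previous one and carries its own hyperparameters. The `Stage` class, `run_cascade`, and the `train_stage` callback are hypothetical names for illustration. The stage list follows the order given in the article, and no real hyperparameter values are implied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stage:
    name: str
    # Per-stage hyperparameters (hypothetical container; left empty here).
    hparams: Dict[str, float] = field(default_factory=dict)


# Stage names in the order the article lists them.
CASCADE: List[Stage] = [
    Stage("IF-RL"),
    Stage("multi-domain RL"),
    Stage("RLHF"),
    Stage("long-context RL"),
    Stage("Code RL"),
    Stage("SWE RL"),
]


def run_cascade(
    checkpoint: str,
    stages: List[Stage],
    train_stage: Callable[[str, Stage], str],
) -> str:
    """Run stages one after another; each stage resumes from the previous checkpoint."""
    for stage in stages:
        checkpoint = train_stage(checkpoint, stage)
    return checkpoint


def stub_trainer(checkpoint: str, stage: Stage) -> str:
    # Stand-in for a real RL training job: only records the stage in the checkpoint name.
    return f"{checkpoint}->{stage.name}"
```

Calling `run_cascade("sft", CASCADE, stub_trainer)` chains the stage names onto the starting checkpoint, which makes the strictly sequential hand-off visible.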


3. Multi-domain on-policy distillation (MOPD)
A key innovation in Nemotron-Cascade 2 is the integration of MOPD into the Cascade RL process. MOPD provides the benefits of dense token-level distillation by using the best-performing intermediate "teacher" models, which are already derived from the same SFT initialization. The token-level advantage is defined as:
$$a_{t}^{\mathrm{MOPD}}=\log \pi^{\mathrm{domain}_{t}}(y_{t}\mid s_{t})-\log \pi^{\mathrm{train}}(y_{t}\mid s_{t})$$
The research team found that MOPD is significantly more sample-efficient than sequence-level reward algorithms such as Group Relative Policy Optimization (GRPO). For example, on AIME25, MOPD reached teacher-level performance (92.0) within 30 steps, whereas GRPO reached only 91.0 in the same number of steps.
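The advantage formula above is a per-token difference of log-probabilities: the domain teacher's log-probability of a token minus that of the model being trained. A minimal sketch of that arithmetic follows. The function name `mopd_advantages` and the example probabilities are made up for illustration, and the tokens are assumed to be sampled from the model being trained (the on-policy setting).

```python
import math
from typing import List


def mopd_advantages(teacher_logprobs: List[float], student_logprobs: List[float]) -> List[float]:
    """Per-token advantage: teacher log-prob minus student log-prob for each sampled token."""
    if len(teacher_logprobs) != len(student_logprobs):
        raise ValueError("teacher and student must score the same tokens")
    return [t - s for t, s in zip(teacher_logprobs, student_logprobs)]


# Illustrative (made-up) probabilities that each model assigns to two sampled tokens.
teacher = [math.log(0.9), math.log(0.5)]
student = [math.log(0.6), math.log(0.5)]
advantages = mopd_advantages(teacher, student)
# First token: log(0.9 / 0.6) = log(1.5) > 0, so the teacher prefers it more than the student.
# Second token: both models agree, so the advantage is 0.
```

Because every token gets its own signal, the feedback is denser than a single sequence-level reward. This is consistent with the sample-efficiency result reported above.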
Inference features and agentic interaction
Nemotron-Cascade 2 supports two main modes of operation through its chat template.
- Thinking mode: initiated by a single <think> token followed by a newline. This enables deep reasoning for complex math and code tasks.
- Non-thinking mode: activated by prepending an empty <think></think> block for a more efficient, direct response.
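The two modes differ only in the text that opens the assistant's turn, which can be sketched as a small helper. The function name `assistant_prefix` is hypothetical, and the exact placement of this prefix within the full chat template is an assumption, so the released template should be consulted for the real layout.

```python
def assistant_prefix(thinking: bool) -> str:
    """Return the text that opens the assistant turn for the chosen mode.

    Thinking mode: a single <think> tag followed by a newline.
    Non-thinking mode: an empty <think></think> block.
    """
    return "<think>\n" if thinking else "<think></think>"
```

In practice the tokenizer's chat template would normally insert this prefix. The helper only makes the difference between the two modes explicit.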
For agentic tasks, the model uses a structured tool-calling protocol defined in the system prompt. The available tools are listed inside <tools> tags, and the model is instructed to emit each tool call wrapped in <tool_call> tags so that execution feedback can be verified.
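A caller has to pull those tagged tool calls out of the model's output before executing them. The sketch below assumes the payload inside each `<tool_call>` tag is a JSON object, which the article does not confirm. The function name `extract_tool_calls` and the example tool `run_python` are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# Non-greedy match so that several tool calls in one response are captured separately.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Parse every <tool_call>...</tool_call> block, assuming a JSON object payload."""
    calls: List[Dict[str, Any]] = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            parsed = json.loads(payload.strip())
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of failing the whole response
        if isinstance(parsed, dict):
            calls.append(parsed)
    return calls


# Hypothetical model output containing one tool call.
example_output = (
    "Let me run this.\n"
    "<tool_call>\n"
    '{"name": "run_python", "arguments": {"code": "print(1)"}}\n'
    "</tool_call>"
)
example_calls = extract_tool_calls(example_output)
```

Skipping malformed payloads keeps the agent loop running. A stricter caller might instead return the parse error to the model as feedback.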
By focusing on "intelligence density," Nemotron-Cascade 2 demonstrates that specialized reasoning capabilities, previously thought to be the exclusive domain of frontier-scale models, are achievable at the 30B scale through domain-specific reinforcement learning.

