Immediately, we’re excited to announce availability at Day Zero. NVIDIA Nemotron 3 Extremely With Amazon SageMaker JumpStart.
With this launch, now you can deploy Nemotron 3 Extremely fashions utilizing a one-click deployment expertise. Nemotron 3 Extremely is an open mannequin constructed for frontier inference and orchestration in long-running autonomous brokers, delivering 5x quicker inference and as much as 30% price financial savings for agent workloads. Nemotron 3 Extremely is optimized for the NVFP4 format, making it considerably quicker and more cost effective to host the mannequin.
NVIDIA Nemotron 3 Extremely Overview
NVIDIA Nemotron 3 Extremely is an open large-scale language mannequin with 550 billion whole parameters and 55 billion energetic parameters. It’s constructed on a hybrid Transformer-Mamba Combination-of-Specialists (MoE) structure and is designed to ship frontier intelligence at a fraction of the computational price of dense fashions of comparable high quality.
| specification | element |
|---|---|
| structure | Hybrid Transformer – Mamba MoE |
| parameters | Whole 550B / Energetic 55B |
| context size | As much as 1 million tokens |
| enter/output | Textual content enter, textual content output |
| accuracy | NVFP4 |
| Inference pace | Lengthy-running agent workflows are actually 5x quicker |
| Charge | As much as 30% discount for advanced agent duties |
Why agent AI wants a devoted mannequin
Brokers do not reply simply as soon as. They make plans, invoke instruments, delegate work to subagents, assessment outcomes, and repeat a whole lot of turns. Each step provides tokens and compute, so the important thing metrics are process completion with helpful accuracy, time to completion, and price per process.
Nemotron 3 Extremely immediately addresses this. Its MoE structure prompts solely 55B of 550B parameters per ahead move, sustaining excessive throughput even at context lengths of 1 million tokens. This implies brokers can plan, name instruments, and preserve a self-correcting loop spanning a whole lot of turns whereas serving to preserve consistency and management prices.
Enterprise use case
Nemotron 3 Extremely excels in workloads that require sustained, multi-step inference.
- agent orchestrator – Coordinate a number of subagents and handle state throughout lengthy instrument name chains
- coding agent – Generate, check, debug, and iterate code throughout giant repositories
- deep analysis – Combine info from a number of sources and preserve constant inferences throughout prolonged contexts
- Complicated enterprise workflows – Automate multi-step enterprise processes with resolution branching and error restoration.
Strive utilizing SageMaker JumpStar
You may deploy Nemotron 3 Extremely with one click on by way of Amazon SageMaker JumpStart, eliminating the necessity to handle infrastructure or configure a serving framework.
Stipulations
Earlier than you start, be sure to have the next:
- AWS account
- Applicable scope of permissions for SageMaker JumpStart
- Enough service quota for GPU cases (comparable to ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
vital: Deploying this mannequin creates a SageMaker endpoint that incurs expenses whereas operating. GPU cases like ml.p5en.48xlarge can price a number of {dollars} per hour. For extra info, see Amazon SageMaker AI Pricing. Make sure to delete the endpoint if you’re completed to keep away from ongoing expenses.
Deploy utilizing SageMaker Studio
- Open Amazon SageMaker Studio
- Within the left navigation pane, choose SageMaker JumpStart
- Seek for Nemotron 3 Extremely
- Please choose a mannequin card
- Choose deployment
- Choose an occasion kind (supported occasion sorts are ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge)
- Test your deployment settings (defaults are adequate for many use instances)
- Choose Deploy to create the endpoint
- Wait till the endpoint standing reveals InService earlier than continuing with inference.

Deploy utilizing the SageMaker Python SDK
carry out inference
cleansing
Delete the SageMaker endpoint if you’re carried out to keep away from pointless expenses.predictor.delete_endpoint()
conclusion
NVIDIA Nemotron 3 Extremely brings frontier-class inference to Amazon SageMaker JumpStart, making inference 5x quicker and decreasing prices for agent workloads by as much as 30%. With a hybrid Transformer-Mamba MoE structure and a 1 million token context window, it’s purpose-built for the persistent multi-step inference required by manufacturing brokers.
Whether or not you are constructing an agent orchestrator, coding agent, deep analysis system, or advanced enterprise automation, you’ll be able to deploy Nemotron 3 Extremely at the moment with SageMaker JumpStart.
Seek for Nemotron 3 Extremely on Amazon SageMaker JumpStart to get began at the moment.
Concerning the writer
Dan Ferguson I am an AWS options architect primarily based in New York, USA. Dan is a machine studying companies professional devoted to serving to clients combine ML workflows effectively, successfully, and sustainably.
Malaf Shastri He’s a software program growth engineer at AWS and is a part of the Amazon SageMaker JumpStart and Amazon Bedrock groups. His function focuses on enabling clients to reap the benefits of cutting-edge open supply and proprietary foundational fashions. Malav holds a grasp’s diploma in laptop science.
Vivek Gangasani World chief in answer structure, SageMaker Inference. He leads SageMaker Inference’s answer structure, technical go-to-market (GTM), and outbound product technique. We additionally assist enterprises and startups deploy and optimize GenAI fashions and construct AI workflows utilizing SageMaker and GPUs. At the moment, he focuses on growing methods and content material to optimize inference efficiency and use instances comparable to agentic workflows and RAGs. In my free time, I get pleasure from climbing, watching films, and sampling completely different cuisines.

