Run NVIDIA Nemotron 3 Nano as a completely managed serverless mannequin on Amazon Bedrock

by root March 10, 2026

written by root March 10, 2026 0 comment 56 views

This submit was co-authored with NVIDIA’s Abdullahi Olaoye, Curtis Lockhart, and Nirmal Kumar Juluru.

I’m happy to announce this NVIDIA’s Nemotron 3 Nano is now obtainable as a completely managed serverless mannequin on Amazon Bedrock. This follows our earlier announcement at AWS re:Invent to help the NVIDIA Nemotron 2 Nano 9B and NVIDIA Nemotron 2 Nano VL 12B fashions.

and NVIDIA Nemotron Amazon Bedrock’s open mannequin lets you speed up innovation and ship tangible enterprise worth with out managing infrastructure complexity. You need to use Nemotron’s capabilities by Amazon Bedrock’s inference capabilities to energy your generated AI functions and make the most of its intensive capabilities and instruments.

On this submit, we discover the technical options of the NVIDIA Nemotron 3 Nano mannequin and talk about potential software use instances. Moreover, we offer technical steerage that can assist you get began utilizing this mannequin on your generated AI functions inside your Amazon Bedrock surroundings.

About Nemotron 3 Nano

NVIDIA Nemotron 3 Nano is a small language mannequin (SLM) with a hybrid mixed-of-experts (MoE) structure that delivers excessive computational effectivity and accuracy that builders can use to construct specialised agent AI techniques. The mannequin is totally open with open weights, datasets, and recipes, selling transparency and belief for builders and companies. In comparison with different equally sized fashions, the Nemotron 3 Nano excels at coding and inference duties, main in benchmarks corresponding to SWE Bench Verified, AIME 2025, Area Onerous v2, and IFBench.

Mannequin overview:

Structure:
- Combined Experience (MoE) with Hybrid Transformer-Mamba Structure
- Helps token budgets and offers precision whereas avoiding overthinking
Accuracy:
- Glorious accuracy in coding, scientific reasoning, arithmetic, calling instruments, following directions, and chatting
- Nemotron 3 Nano outperforms in benchmarks like SWE Bench, AIME 2025, Humanity Final Examination, IFBench, RULER, Area Onerous (in comparison with different open language fashions with MoE under 30 billion)
Mannequin measurement: 30 B (with 3 B energetic parameters)
Context size: 256K
Mannequin enter: textual content
Mannequin output: textual content

Nemotron 3 Nano combines Mamba, Transformer, and Combination-of-Specialists layers into one spine to assist steadiness effectivity, inference accuracy, and scale. Mamba allows long-range sequence modeling with low reminiscence overhead, whereas the Transformer layer helps present exact consideration to structured reasoning duties corresponding to code, math, and planning. MoE routing additional improves scalability by activating solely a subset of specialists per token, which helps enhance latency and throughput. This makes Nemotron 3 Nano significantly appropriate for agent clusters that run many light-weight workflows concurrently.

For extra data on the Nemotron 3 Nano structure and the way to practice it, see: Inside the NVIDIA Nemotron 3: Techniques, tools, and data that make your NVIDIA Nemotron 3 efficient and accurate.

Benchmarking the mannequin

The next picture exhibits the Nemotron 3 Nano main in essentially the most enticing quadrant. artificial analysis Openness index and intelligence index. Why openness issues: Openness builds belief by transparency. Builders and enterprises can construct with confidence on Nemotron, which offers clear visibility into fashions, information pipelines, and information traits for straightforward auditing and governance.

title: Graph exhibiting that Nemotron 3 Nano is in essentially the most enticing quadrant of synthetic evaluation openness and intelligence index (supply: artificial analysis)

As proven within the picture under, the Nemotron 3 Nano affords superior accuracy with the best effectivity of any open mannequin, incomes a formidable rating of 52 factors, considerably larger than the earlier Nemotron 2 Nano mannequin. As agent AI will increase the demand for tokens, the power to “assume quick” (reaching the suitable reply shortly utilizing fewer tokens) is necessary. Nemotron 3 Nano delivers excessive throughput with an environment friendly Hybrid Transformer-Mamba and MoE structure.

title: NVIDIA Nemotron 3 Nano affords the best accuracy and highest effectivity of any open mannequin, with a formidable rating of 52 factors within the Synthetic Analytics Intelligence vs. Output Pace Index. (sauce: artificial analysis)

NVIDIA Nemotron 3 Nano utilization instance

Nemotron 3 Nano helps energy numerous use instances in numerous industries. Examples of use embody:

Finance – Speed up mortgage processing by extracting information, analyzing income patterns, detecting fraud, and lowering cycle time and danger.
Cybersecurity – Robotically prioritize vulnerabilities, carry out deep malware evaluation, and proactively seek for safety threats.
Software program improvement – Help with duties corresponding to code summarization.
Retail – Optimize stock administration and improve in-store service with real-time personalised product suggestions and help.

Strive utilizing NVIDIA Nemotron 3 Nano on Amazon Bedrock

To check the NVIDIA Nemotron 3 Nano on Amazon Bedrock, observe these steps:

Transfer to. Amazon Bedrock Console and choose Chat/Textual content Playground From the menu on the left ( check part).
select Please choose a mannequin It’s positioned within the higher left nook of the playground.
select Nvidia Choose from the class checklist. NVIDIA Nemotron 3 Nano.
select apply Click on to load the mannequin.

After making your choice, you’ll be able to instantly check your mannequin. Let’s generate a unit check in Python code utilizing the next immediate. pytest Framework:

Write a pytest unit check suite for a Python operate known as calculate_mortgage(principal, price, years). Embrace check instances for: 1) A regular 30-year mounted mortgage 2) An edge case with 0% curiosity 3) Error dealing with for damaging enter values.

Complicated duties like this immediate can profit from a chain-of-thought strategy to generate correct outcomes primarily based on the inference capabilities natively constructed into the mannequin.

Utilizing the AWS CLI and SDKs

You may entry the mannequin programmatically utilizing the mannequin ID. nvidia.nemotron-nano-3-30b. This mannequin helps each InvokeModel and Converse APIs by the AWS Command Line Interface (AWS CLI) and AWS SDKs nvidia.nemotron-nano-3-30b as a mannequin ID. Moreover, it helps Amazon Bedrock OpenAI SDK appropriate APIs.

Invoke the mannequin straight from the terminal by operating the next command: AWS Command Line Interface (AWS CLI) and InvokeModel API:

aws bedrock-runtime invoke-model  
 --model-id nvidia.nemotron-nano-3-30b  
 --region us-west-2  
 --body '{"messages": [{"role": "user", "content": "Type_Your_Prompt_Here"}], "max_tokens": 512, "temperature": 0.5, "top_p": 0.9}'  
 --cli-binary-format raw-in-base64-out  
invoke-model-output.txt

To name the mannequin by the AWS SDK for Python (boto3), Immediate the mannequin utilizing the next script. On this case, use the Converse API.

import boto3 
from botocore.exceptions import ClientError 

# Create a Bedrock Runtime consumer within the AWS Area you need to use. 
consumer = boto3.consumer("bedrock-runtime", region_name="us-west-2") 

# Set the mannequin ID
model_id = "nvidia.nemotron-nano-3-30b" 

# Begin a dialog with the person message. 

user_message = "Type_Your_Prompt_Here" 
dialog = [ 
   { 
       "role": "user", 

       "content": [{"text": user_message}], 
   } 
]  

attempt: 
   # Ship the message to the mannequin utilizing a fundamental inference configuration. 
   response = consumer.converse( 
        modelId=model_id, 

       messages=dialog, 
        inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9}, 
   ) 
 
   # Extract and print the response textual content. 
    response_text = response["output"]["message"]["content"][0]["text"] 
   print(response_text)

besides (ClientError, Exception) as e: 
    print(f"ERROR: Cannot invoke '{model_id}'. Motive: {e}") 
    exit(1)

To name a mannequin through Amazon Bedrock OpenAI Compatibility ChatCompletions Endpoints let you do that utilizing the OpenAI SDK.

# Import OpenAI SDK
from openai import OpenAI

# Set surroundings variables
os.environ["OPENAI_API_KEY"] = "<insert your bedrock API key>"
os.environ["OPENAI_BASE_URL"] = "https://bedrock-runtime.<AWS area>.amazon.com/openai/v1"

# Set the mannequin ID
model_id = "nvidia.nemotron-nano-3-30b"

# Set prompts
system_prompt = “Type_Your_System_Prompt_Here”
user_message = "Type_Your_User_Prompt_Here"


# Use ChatCompletionsAPI
response = consumer.chat.completions.create(
    mannequin= mannequin _ID,                 
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user",   "content": user_message}
    ],
    temperature=0,
    max_completion_tokens=1000
)
 
# Extract and print the response textual content
print(response.selections[0].message.content material)

Utilizing NVIDIA Nemotron 3 Nano with Amazon Bedrock Options

Energy your generative AI functions by combining Nemotron 3 Nano with Amazon Bedrock administration instruments. Implement security measures utilizing Amazon Bedrock Guardrails and create sturdy retrieval extension technology (RAG) workflows utilizing Amazon Data Base.

Amazon bedrock guardrail

Guardrails is a managed security layer that helps implement accountable AI by filtering dangerous content material, redacting delicate data (PII), and blocking particular subjects throughout prompts and responses. Works throughout a number of fashions to assist detect rapid injection assaults and hallucinations.

Utilization instance: For those who’re constructing a mortgage assistant, you’ll be able to select to not present normal funding recommendation. Configuring a filter for the phrase “shares” instantly blocks person prompts containing that phrase and permits them to obtain a customized message.

To arrange guardrails, observe these steps:

in Amazon Bedrock ConsoleTransfer to construct Choose the part on the left guardrail.
Create new guardrails and configure the filters wanted on your use case.

After configuration, check your guardrails utilizing numerous prompts to test efficiency. You may then fine-tune settings corresponding to denied subjects, phrase filters, and PII enhancing to fit your particular security necessities. For extra data, see Create Guardrails.

Amazon Bedrock Data Base

Amazon Bedrock Data Bases automates the entire RAG workflow. It handles ingesting content material from information sources, chunking it into searchable segments, changing it to vector embedding, and saving it to a vector database. Then, when a person submits a question, the system matches the enter towards the saved vectors to seek out semantically related content material. This content material is used to enhance the prompts despatched to the underlying mannequin.

On this instance, we uploaded a PDF (e.g. buy a new house, Mortgage Toolkit, Buy a Mortgage) to Amazon Easy Storage Service (Amazon S3) and chosen Amazon OpenSearch Serverless because the vector retailer. The next code exhibits the way to use the RetrieveAndGenerate API to question this data base to routinely facilitate security compliance changes by a particular Guardrail ID.

import boto3
bedrock_agent_runtime_client = boto3.consumer('bedrock-agent-runtime')
response = bedrock_agent_runtime_client.retrieve_and_generate(
    enter={
        'textual content': 'I'm curious about buying a house. What steps ought to I take to verify I'm ready to tackle a mortgage?'
    },
    retrieveAndGenerateConfiguration={
        'knowledgeBaseConfiguration': {
            'generationConfiguration': {
                'guardrailConfiguration': {
                    'guardrailId': '<INSERT GUARDRAIL ID>',
                    'guardrailVersion': '1'
                }
            },
            'knowledgeBaseId': '<INSERT KNOWLEDGE BASE ID>',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/nvidia.nemotron-nano-3-30b',
            "generationConfiguration": {
                "promptTemplate": {
                    "textPromptTemplate": (
                        "You're a useful assistant that solutions questions on mortgages"
                        "search outcomes.nn"
                        "Search outcomes:n$search_results$nn"
                        "Person question:n$question$nn"
                        "Reply clearly and concisely."
                    )
                },
            },
            "orchestrationConfiguration": {
                "promptTemplate": {
                    "textPromptTemplate": (
                        "You might be very educated on mortgages"
                        "Dialog to date:n$conversation_history$nn"
                        "Person question:n$question$nn"
                        "$output_format_instructions$"
                    )
                }
            }
        },
        'kind': 'KNOWLEDGE_BASE'
    }
)
print(response)

This instructs the NVIDIA Nemotron 3 Nano mannequin to make use of a customized immediate template to synthesize retrieved documentation into clear, well-founded solutions. To arrange your individual pipeline, try the entire tutorial within the Amazon Bedrock Person Information.

conclusion

On this submit, we confirmed you the way to get began utilizing NVIDIA Nemotron 3 Nano with Amazon Bedrock for totally managed serverless inference. We additionally confirmed you the way to use the mannequin with Amazon Bedrock Data Bases and Amazon Bedrock Guardrails. This mannequin is at present obtainable within the following AWS Areas: US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Mumbai), South America (São Paulo), Europe (London), and Europe (Milan). Test the entire area checklist for future updates. If you want to study extra, please go to right here NVIDIA Nemotron Strive NVIDIA Nemotron 3 Nano on Amazon Bedrock console as we speak.

In regards to the writer

Welcome to Ivugangingo!

At Ivugangingo, we're passionate about delivering insightful content that empowers and informs our readers across a spectrum of crucial topics. Whether you're delving into the world of insurance, navigating the complexities of cryptocurrency, or seeking wellness tips in health and fitness, we've got you covered.

Run NVIDIA Nemotron 3 Nano as a completely managed serverless mannequin on Amazon Bedrock

About Nemotron 3 Nano

Mannequin overview:

Benchmarking the mannequin

NVIDIA Nemotron 3 Nano utilization instance

Strive utilizing NVIDIA Nemotron 3 Nano on Amazon Bedrock

Utilizing the AWS CLI and SDKs

Utilizing NVIDIA Nemotron 3 Nano with Amazon Bedrock Options

Amazon bedrock guardrail

Amazon Bedrock Data Base

conclusion

In regards to the writer

Ladies Leaders in Insurance coverage within the USA | Elite Ladies

Lengthy-lost pages of Archimedes’ writings rediscovered in France

Converter

Editors Pick

Newsletter

Categories

Related Posts