AI is advancing quickly, and for many customers the real opportunity lies not in experimenting with AI, but in running it in production and delivering meaningful business outcomes. That means building systems that perform reliably, operate at scale, and meet your organization's security and compliance requirements.
Today at NVIDIA GTC 2026, AWS and NVIDIA announced an expanded collaboration with new technology integrations to support growing AI compute demands and help you build and run production-ready AI solutions. These integrations span accelerated computing, interconnect technologies, and model fine-tuning and inference, and are described in the announcements that follow.
Key announcements at NVIDIA GTC 2026
Grow your AI infrastructure with expanded GPU options and optimized interconnects
Accelerating compute power in the age of agentic AI
Starting in 2026, AWS will add more than 1 million NVIDIA GPUs, including Blackwell and Rubin GPU architectures, across global cloud Regions. AWS offers the broadest selection of NVIDIA GPU-based instances of any cloud provider to power a wide range of AI/ML workloads. AWS and NVIDIA are also collaborating on Spectrum networking and other infrastructure areas, furthering the companies' more than 15 years of joint innovation.
AWS's advanced cloud and AI infrastructure gives enterprises, startups, and researchers the foundation they need to build and scale agentic AI systems that can reason, plan, and operate autonomously across complex workflows.
New Amazon EC2 instances with NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs
Today, we announced upcoming Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs, arriving soon. AWS is the first major cloud provider to announce support for RTX PRO 4500 Blackwell Server Edition GPUs. These instances are suitable for a wide range of workloads including data analytics, conversational AI, content generation, recommender systems, video streaming, video rendering, and other graphics workloads.
Amazon EC2 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs are built on the AWS Nitro System, a combination of purpose-built hardware and a lightweight hypervisor that gives each instance nearly all of the compute and memory resources of the host hardware, improving overall resource utilization and performance. The Nitro System's specialized hardware, software, and firmware are designed to enforce restrictions that prevent anyone, including AWS employees, from accessing sensitive AI workloads and data. Additionally, the Nitro System supports firmware updates, bug fixes, and optimizations while the system continues to operate. Together, these Nitro System capabilities provide the resource efficiency, security, and stability needed for AI, analytics, and graphics workloads in production environments.
Distributed LLM inference interconnect acceleration with NVIDIA NIXL on AWS EFA and Trainium
As your model size increases, communication overhead between GPUs or Trainium chips can become a bottleneck. Today, we announced support for the NVIDIA Inference Xfer Library (NIXL) over AWS Elastic Fabric Adapter (EFA) to accelerate distributed large language model (LLM) inference across NVIDIA GPUs on Amazon EC2 and AWS Trainium. Granular inference acceleration is key to scaling modern AI workloads because it enables efficient overlap of communication and computation while minimizing communication latency and maximizing GPU utilization. This integration enables high-throughput, low-latency movement of KV cache data between the GPU compute nodes that perform token generation and the distributed memory resources that store KV cache state. It also provides the flexibility to build inference clusters using any combination of GPU- and Trainium-based, EFA-enabled EC2 instances. NIXL with EFA integrates natively with popular open source frameworks such as NVIDIA Dynamo, vLLM, and SGLang to improve token-to-token latency and increase KV cache memory utilization efficiency.
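To make this concrete, the following minimal sketch shows how a framework such as vLLM is typically configured to use a NIXL-backed KV cache connector, with the underlying transport (EFA in this announcement) handled below the framework layer. The model name, connector name, and exact configuration fields follow recent open source vLLM releases and are assumptions for illustration, not part of the announcement; check your vLLM version's documentation before using them.

```python
# Illustrative sketch: enabling a NIXL-backed KV cache connector in vLLM.
# Field names ("NixlConnector", kv_role values) follow recent open source
# vLLM releases and may differ in your version; the model is a placeholder.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# The NIXL connector moves KV cache blocks between workers; on EC2, NIXL's
# transport layer can use EFA-enabled interconnects transparently.
kv_config = KVTransferConfig(
    kv_connector="NixlConnector",  # NIXL-backed KV cache transfer
    kv_role="kv_both",             # this worker both produces and consumes KV cache
)

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    kv_transfer_config=kv_config,
)

outputs = llm.generate(
    ["Explain KV cache transfer in one sentence."],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```

In a disaggregated deployment you would typically run separate prefill and decode workers with complementary roles; the single-process configuration above only illustrates where the connector setting lives.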
Accelerate data analysis with Amazon EMR and NVIDIA GPUs
Run Apache Spark 3x faster with Amazon EMR on Amazon EKS with G7e instances
Data engineers and data scientists often face multi-hour data processing pipelines that slow down AI/ML model iteration and business intelligence generation. The performance of these workloads has now improved significantly: AWS and NVIDIA accelerate Apache Spark workloads by 3x using Amazon EMR on EKS with G7e instances. This performance is the result of a joint engineering collaboration between AWS and NVIDIA that optimizes GPU-accelerated analytics by combining Amazon EMR on EKS with NVIDIA's RTX PRO 6000 architecture. Amazon EMR and G7e instances let data engineers and data scientists accelerate time to insight for AI/ML feature engineering, complex ETL transformations, and real-time analytics at scale. Customers running large data processing pipelines can reduce the time required for analytics while maintaining full compatibility with existing Spark applications.
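As a rough sketch of what running such a job looks like, the example below submits a Spark job to an existing Amazon EMR on EKS virtual cluster with boto3, with GPU acceleration enabled through the open source RAPIDS Accelerator for Apache Spark. The cluster ID, role ARN, release label, and S3 paths are placeholders, and the specific Spark settings are illustrative rather than the exact configuration behind the announced 3x result.

```python
# Illustrative sketch: submitting a GPU-accelerated Spark job to Amazon EMR
# on EKS with boto3. IDs, ARNs, the release label, and S3 paths are
# placeholders; RAPIDS settings follow the open source RAPIDS Accelerator
# for Apache Spark (GPU discovery/scheduling details are omitted).
import boto3

emr = boto3.client("emr-containers", region_name="us-east-1")

response = emr.start_job_run(
    virtualClusterId="your-virtual-cluster-id",          # placeholder
    name="spark-rapids-etl",
    executionRoleArn="arn:aws:iam::111122223333:role/EMRContainersJobRole",
    releaseLabel="emr-7.5.0-latest",                      # placeholder release label
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://your-bucket/jobs/etl_job.py",
            "sparkSubmitParameters": (
                "--conf spark.plugins=com.nvidia.spark.SQLPlugin "
                "--conf spark.rapids.sql.enabled=true "
                "--conf spark.executor.resource.gpu.amount=1 "
                "--conf spark.task.resource.gpu.amount=0.25"
            ),
        }
    },
)
print("Started job run:", response["id"])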
Expanded support for NVIDIA Nemotron models on Amazon Bedrock
Fine-tune Nemotron models in Amazon Bedrock using Reinforcement Fine-Tuning (coming soon)
Developers will soon be able to fine-tune NVIDIA Nemotron models directly on Amazon Bedrock using Reinforcement Fine-Tuning (RFT). This matters for teams that need to tailor model behavior to a specific domain, such as legal, medical, financial, or other professional fields. Reinforcement Fine-Tuning lets you shape not just what your model knows, but how it reasons and responds. It also runs natively on Amazon Bedrock, so there is zero infrastructure overhead: define your tasks, provide feedback signals, and Bedrock takes care of the rest. Learn more about Reinforcement Fine-Tuning in Amazon Bedrock.
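Because RFT on Bedrock is still coming soon, there is no public API shape for it yet. Purely as a hypothetical sketch of the "define tasks, provide feedback, let Bedrock handle the rest" workflow, the example below reuses Bedrock's existing model customization API; the customization type value, base model identifier, hyperparameters, and data paths are invented placeholders and will almost certainly differ from the eventual RFT interface.

```python
# Hypothetical sketch only: RFT on Amazon Bedrock has no public API yet.
# This reuses the existing create_model_customization_job call; the
# customizationType value, model IDs, hyperparameters, and S3 paths below
# are illustrative placeholders, not real values.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

job = bedrock.create_model_customization_job(
    jobName="nemotron-rft-legal-assistant",
    customModelName="nemotron-legal-rft",
    roleArn="arn:aws:iam::111122223333:role/BedrockCustomizationRole",
    baseModelIdentifier="nvidia.nemotron-placeholder-model-id",   # placeholder
    customizationType="REINFORCEMENT_FINE_TUNING",                # hypothetical value
    trainingDataConfig={"s3Uri": "s3://your-bucket/rft/tasks-and-feedback.jsonl"},
    outputDataConfig={"s3Uri": "s3://your-bucket/rft/output/"},
    hyperParameters={"epochCount": "1"},                          # placeholder
)
print("Customization job:", job["jobArn"])
```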
Nemotron 3 Super (coming soon) on Amazon Bedrock
NVIDIA Nemotron 3 Super, a hybrid MoE model built for multi-agent workloads and scaled inference, is coming soon to Amazon Bedrock. It is designed to help AI agents maintain accuracy across complex multi-step workflows, powering use cases across finance, cybersecurity, retail, and software development, and it delivers fast, cost-effective inference through fully managed APIs.
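For a sense of what "fully managed APIs" means in practice, the sketch below calls a Bedrock-hosted model through the Converse API with boto3. The Converse API and its request/response shape exist today; the Nemotron 3 Super model ID is a placeholder, since the model is announced as coming soon and its identifier has not been published.

```python
# Illustrative sketch: invoking a Bedrock-hosted model through the fully
# managed Converse API. The model ID is a placeholder for Nemotron 3 Super,
# which is not yet available.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.converse(
    modelId="nvidia.nemotron-3-super-placeholder",  # hypothetical model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize today's suspicious login activity."}],
        }
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```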
Improving energy efficiency and sustainability
As AI workloads scale, performance per watt becomes more than just a sustainability metric; it becomes a competitive advantage. In this NVIDIA GTC session, Amazon CSO Kara Hurst joins sustainability leaders from Equinix and PepsiCo to discuss how AI is transforming enterprise energy and infrastructure at scale, from the data center as an active grid participant to AI as an enterprise efficiency engine, and how AWS infrastructure can help you achieve optimal energy efficiency, with AWS infrastructure being 4.1 times more energy efficient than on-premises data centers.
Built to run together
What makes these announcements compelling is not any single feature, but how they fit together. AWS and NVIDIA's 15-year partnership has produced a full stack of end-to-end optimized AI infrastructure, from GPUs to networking to the managed services layer. There is no need to stitch it together yourself; it is ready to run.
If you're at GTC this week, stop by the AWS booth. Check out live demos, watch in-booth theater sessions, and pick up your customized swag at the AWS Swag Factory.
Visit AWS at NVIDIA GTC 2026 to see everything AWS is doing at the conference.
About the author