Organizations implementing agents and agent-based systems often experience challenges such as implementing multiple tools, function calling, and orchestrating tool-calling workflows. An agent uses a function call to invoke an external tool (like an API or database) to perform specific actions or retrieve information it doesn’t possess internally. These tools are integrated as an API call inside the agent itself, leading to challenges in scaling and tool reuse across an enterprise. Customers looking to deploy agents at scale need a consistent way to integrate these tools, whether internal or external, regardless of the orchestration framework they’re using or the function of the tool.
The Model Context Protocol (MCP) aims to standardize how these channels, agents, tools, and customer data can be used by agents, as shown in the following figure. For customers, this translates directly into a more seamless, consistent, and efficient experience compared to dealing with fragmented systems or agents. By making tool integration simpler and standardized, customers building agents can now focus on which tools to use and how to use them, rather than spending cycles building custom integration code. We dive deep into the MCP architecture later in this post.
For an MCP implementation, you need scalable infrastructure to host the MCP servers, as well as infrastructure to host the large language model (LLM) that will perform actions with the tools implemented by the MCP server. Amazon SageMaker AI provides the ability to host LLMs without worrying about scaling or managing the undifferentiated heavy lifting. You can deploy your model or LLM to SageMaker AI hosting services and get an endpoint that can be used for real-time inference. Moreover, you can host MCP servers on the compute environment of your choice on AWS, including Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and AWS Lambda, according to your preferred level of managed service: whether you want full control of the machine running the server, or you prefer not to worry about maintaining and managing these servers.
In this post, we discuss the following topics:
- Understanding the MCP architecture, why you should use MCP compared to implementing microservices or APIs, and two popular ways of implementing MCP using LangGraph adapters:
- FastMCP for prototyping and simple use cases
- FastAPI for complex routing and authentication
- Recommended architecture for scalable deployment of MCP
- Using SageMaker AI with FastMCP for rapid prototyping
- Implementing a loan underwriter MCP workflow with LangGraph and SageMaker AI with FastAPI for custom routing
Understanding MCP
Let’s dive deep into the MCP architecture. Developed by Anthropic as an open protocol, MCP provides a standardized way to connect AI models to virtually any data source or tool. Using a client-server architecture (as illustrated in the following figure), MCP helps developers expose their data through lightweight MCP servers while building AI applications as MCP clients that connect to these servers.

MCP uses a client-server architecture containing the following components:
- Host – A program or AI tool that requires access to data through the MCP protocol, such as Anthropic’s Claude Desktop, an integrated development environment (IDE), or other AI applications
- Client – Protocol clients that maintain one-to-one connections with servers
- Server – Lightweight programs that expose capabilities through standardized MCP or act as tools
- Data sources – Local data sources such as databases and file systems, or external systems available over the internet through APIs (web APIs) that MCP servers can connect to
Based on these components, we can define the protocol as the communication backbone connecting the MCP client and server within the architecture. It includes the set of rules and standards defining how clients and servers should interact, what messages they exchange (using JSON-RPC 2.0), and the roles of the different components.
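To make the JSON-RPC 2.0 framing concrete, here is a minimal sketch of the kind of messages an MCP client and server exchange for a tool invocation. The `tools/call` method name follows the MCP specification; the tool name, arguments, and result shown are illustrative only:

```python
import json

# JSON-RPC 2.0 request an MCP client might send to invoke a tool.
# "tools/call" is the MCP method for tool execution; the tool name
# and arguments below are purely illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "top_song",
        "arguments": {"sign": "WZPZ"},
    },
}

# The server replies with a result object carrying the same id,
# so the client can match responses to in-flight requests.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Elemental Hotel"}],
    },
}

wire_message = json.dumps(request)
print(wire_message)
```

The `id` correlation is what lets a single connection multiplex several outstanding requests, which is why every MCP message carries the JSON-RPC envelope rather than a bare payload.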
Now let’s walk through the MCP workflow and how it interacts with an LLM to deliver a response, using the example of a travel agent. You ask the agent to “Book a 5-day trip to Europe in January and we like warm weather.” The host application (acting as an MCP client) identifies the need for external data and connects through the protocol to specialized MCP servers for flights, hotels, and weather information. These servers return the relevant data through MCP, which the host then integrates with the original prompt, providing enriched context to the LLM to generate a comprehensive, augmented response for the user. The following diagram illustrates this workflow.

When to use MCP instead of implementing microservices or APIs
MCP marks a significant advancement compared to traditional monolithic APIs and complex microservices architectures. Traditional APIs often bundle functionalities together, leading to challenges where scaling requires upgrading the entire system, updates carry high risks of system-wide failures, and managing different versions for various applications becomes overly complex. Although microservices offer more modularity, they usually demand separate, often complex, integrations for each service and significant management overhead.
MCP overcomes these limitations by establishing a standardized client-server architecture specifically designed for efficient and secure integration. It provides a real-time, two-way communication interface enabling AI systems to seamlessly connect with diverse external tools, API services, and data sources using a “write once, use anywhere” philosophy. Using transports like standard input/output (stdio) or streamable HTTP under the unifying JSON-RPC 2.0 standard, MCP delivers key advantages such as improved fault isolation, dynamic service discovery, consistent security controls, and plug-and-play scalability, making it exceptionally well-suited for AI applications that require reliable, modular access to multiple resources.
FastMCP vs. FastAPI
In this post, we discuss two different approaches for implementing MCP servers: FastAPI with SageMaker, and FastMCP with LangGraph. Both are fully compatible with the MCP architecture and can be used interchangeably, depending on your needs. Let’s understand the difference between the two.
FastMCP is suited to rapid prototyping, educational demos, and scenarios where development speed is a priority. It’s a lightweight, opinionated wrapper built specifically for quickly standing up MCP-compliant endpoints. It abstracts away much of the boilerplate, such as input/output schemas and request handling, so you can focus entirely on your model logic.
For use cases where you need to customize request routing, add authentication, or integrate with observability tools like Langfuse or Prometheus, FastAPI gives you the flexibility to do so. FastAPI is a full-featured web framework that gives you finer-grained control over server behavior. It’s well-suited for more complex workflows, advanced request validation, detailed logging, middleware, and other production-ready features.
You can safely use either approach for your MCP servers; the choice depends on whether you prioritize simplicity and speed (FastMCP) or flexibility and extensibility (FastAPI). Both approaches conform to the same interface expected by agents in the LangGraph pipeline, so your orchestration logic remains unchanged.
Solution overview
In this section, we walk through a reference architecture for scalable deployment of MCP servers and MCP clients, using SageMaker AI as the hosting environment for the foundation models (FMs) and LLMs. Although this architecture uses SageMaker AI as its reasoning core, it can be quickly adapted to support Amazon Bedrock models as well. The following diagram illustrates the solution architecture.

The architecture decouples the client from the server by using streamable HTTP as the transport layer. With this approach, clients and servers can scale independently, making it a great fit for serverless orchestration powered by Lambda, AWS Fargate for Amazon ECS, or Fargate for Amazon EKS. An additional benefit of decoupling is that you can better control authorization of applications and users by managing AWS Identity and Access Management (IAM) permissions of clients and servers separately, and propagating user access to the backend. If you’re running client and server in a monolithic architecture on the same compute, we suggest using stdio as the transport layer instead to reduce networking overhead.
Use SageMaker AI with FastMCP for rapid prototyping
With the architecture defined, let’s analyze the application flow as shown in the following figure.

In terms of usage patterns, MCP follows a logic similar to tool calling, with an initial step added to discover the available tools:
- The client connects to the MCP server and obtains a list of available tools.
- The client invokes the LLM using a prompt engineered with the list of tools available on the MCP server (message of type “user”).
- The LLM reasons about which tools it needs to call and how many times, and replies (“assistant” type message).
- The client asks the MCP server to execute the tool call and provides the result to the LLM (“user” type message).
- This loop iterates until a final answer is reached and can be given back to the user.
- The client disconnects from the MCP server.
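The loop above can be sketched in a few lines of plain Python. This is a simplified skeleton with a stubbed LLM and a stubbed tool executor standing in for the SageMaker endpoint and the MCP session; the `<tool_call>` tag convention is our own illustrative choice, not part of MCP or SageMaker:

```python
import json
import re


def chat_loop(invoke_llm, call_tool, user_prompt, system_prompt, max_turns=5):
    """Alternate between LLM calls and MCP tool calls until a final answer."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = invoke_llm(system_prompt, messages)           # "assistant" message
        messages.append({"role": "assistant", "content": reply})
        match = re.search(r"<tool_call>(.*?)</tool_call>", reply, re.DOTALL)
        if match is None:
            return reply                                      # final answer for the user
        call = json.loads(match.group(1))
        result = call_tool(call["name"], call.get("arguments", {}))
        # Feed the tool result back to the model as a "user" message.
        messages.append({"role": "user", "content": json.dumps(result)})
    raise RuntimeError("No final answer within max_turns")


# Stubs standing in for a SageMaker-hosted LLM and an MCP session:
def fake_llm(system_prompt, messages):
    if any(m["role"] == "assistant" for m in messages):
        return "The top song is Elemental Hotel."
    return '<tool_call>{"name": "top_song", "arguments": {"sign": "WZPZ"}}</tool_call>'


def fake_tool(name, arguments):
    return {"song": "Elemental Hotel", "artist": "8 Storey Hike"}


answer = chat_loop(fake_llm, fake_tool, "What is the top song on WZPZ?", "You can call tools.")
print(answer)
```

In a real deployment, `invoke_llm` would call the SageMaker endpoint and `call_tool` would call the MCP session's tool-execution API, but the control flow stays exactly this shape.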
Let’s start with the MCP server definition. To create an MCP server, we use the official Model Context Protocol Python SDK. As an example, let’s create a simple server with just one tool. The tool will simulate searching for the most popular song played at a radio station, and return it in a Python dictionary. Make sure to add a proper docstring and input/output typing, so that both the server and client can discover and consume the resource correctly.
As we discussed earlier, MCP servers can be run on AWS compute services (Amazon EC2, Amazon ECS, Amazon EKS, or Lambda) and can then be used to securely access other resources in the AWS Cloud, for example databases in virtual private clouds (VPCs) or an enterprise API, as well as external resources. For example, a simple way to deploy an MCP server is to use Lambda support for Docker images to install the MCP dependency on the Lambda function, or to use Fargate.
With the server set up, let’s turn our focus to the MCP client. Communication starts with the MCP client connecting to the MCP server using streamable HTTP:
When connecting to the MCP server, good practice is to ask the server for the list of available tools with the list_tools() API. With the tool list and their descriptions, we can then define a system prompt for tool calling:
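One way to build such a system prompt is sketched below. The function is ours for illustration; the `tools` structure mirrors the name, description, and input schema fields that a `list_tools()` call returns, and the `<tool_call>` convention is an assumption we make for parsing, not an MCP requirement:

```python
import json


def build_system_prompt(tools: list[dict]) -> str:
    """Build a tool-calling system prompt from discovered tool metadata."""
    tool_lines = "\n".join(
        f"- {t['name']}: {t['description']}\n"
        f"  input schema: {json.dumps(t['inputSchema'])}"
        for t in tools
    )
    return (
        "You have access to the following tools:\n"
        f"{tool_lines}\n"
        "To call a tool, reply with a single line of the form\n"
        '<tool_call>{"name": "<tool name>", "arguments": {...}}</tool_call>\n'
        "Otherwise, reply with the final answer for the user."
    )


# Illustrative metadata for a single discovered tool:
tools = [{
    "name": "top_song",
    "description": "Get the most popular song played on a radio station.",
    "inputSchema": {"type": "object", "properties": {"sign": {"type": "string"}}},
}]
system_prompt = build_system_prompt(tools)
print(system_prompt)
```

Because the prompt is generated from the server's own metadata, adding a tool to the MCP server makes it available to the model with no client-side code changes.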
Tools are usually defined using a JSON schema similar to the following example. This tool is called top_song and its function is to get the most popular song played on a radio station:
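A schema along these lines, written as a Python dictionary, might look like the following (the field layout follows the common tool-use schema shape; the call signs are fictitious):

```python
# Illustrative tool definition: name, human-readable description, and a
# JSON Schema describing the expected input arguments.
top_song_tool = {
    "name": "top_song",
    "description": "Get the most popular song played on a radio station.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "sign": {
                "type": "string",
                "description": (
                    "The call sign for the radio station for which you want "
                    "the most popular song. Example call signs are WZPZ and WKRP."
                ),
            }
        },
        "required": ["sign"],
    },
}
```

The `description` fields are not decoration: they are what the LLM actually reads when deciding whether and how to call the tool, so they should be written as carefully as the prompt itself.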
With the system prompt configured, you can run the chat loop as long as needed, alternating between invoking the hosted LLM and calling the tools powered by the MCP server. You can use packages such as the SageMaker Boto3 client, the Amazon SageMaker Python SDK, or a third-party library such as LiteLLM.
A model hosted on SageMaker doesn’t support function calling natively in its API. This means that you’ll need to parse the content of the response using a regular expression or similar methods:
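One way to do this parsing is sketched below, assuming the system prompt instructed the model to wrap each call in `<tool_call>` tags (a convention of ours, not a SageMaker feature):

```python
import json
import re

# Non-greedy match so multiple tool calls in one completion are split correctly.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(completion: str) -> list[dict]:
    """Parse every <tool_call>{...}</tool_call> block out of an LLM completion."""
    calls = []
    for raw in TOOL_CALL_RE.findall(completion):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            pass  # malformed block: skip it and let the loop re-prompt the model
    return calls


completion = (
    "I need station data first.\n"
    '<tool_call>{"name": "top_song", "arguments": {"sign": "WZPZ"}}</tool_call>'
)
parsed = extract_tool_calls(completion)
print(parsed)
# → [{'name': 'top_song', 'arguments': {'sign': 'WZPZ'}}]
```

Tolerating malformed JSON here matters in practice: rather than crashing, the loop simply returns no calls and the model's next turn can self-correct.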
After no more tool requests are found in the LLM response, you can consider the content the final answer and return it to the user. Finally, you close the stream to finalize interactions with the MCP server.
Implement a loan underwriter MCP workflow with LangGraph and SageMaker AI with FastAPI for custom routing
To demonstrate the power of MCP with SageMaker AI, let’s explore a loan underwriting system that processes applications through three specialized personas:
- Loan officer – Summarizes the application
- Credit analyst – Evaluates creditworthiness
- Risk manager – Makes final approval or denial decisions
We’ll walk you through these personas using the following architecture for a loan processing workflow using MCP. The code for this solution is available in the following GitHub repo.

In this architecture, the MCP client and servers are running on EC2 instances and the LLM is hosted on SageMaker endpoints. The workflow consists of the following steps:
- The user enters a prompt with loan input details such as name, age, income, and credit score.
- The request is routed to the loan parser MCP server by the MCP client.
- The loan parser sends its output as input to the credit analyzer MCP server.
- The credit analyzer sends its output as input to the risk manager MCP server.
- The final prompt is processed by the LLM and sent back to the MCP client to provide the output to the user.
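Conceptually, this routing is a simple sequential pipeline. Stripped of the MCP and SageMaker plumbing, it can be sketched with plain functions standing in for the three persona servers; the field names, thresholds, and decision rules below are illustrative stubs, not the repository's actual logic:

```python
def loan_parser(application: dict) -> dict:
    """Loan officer: summarize the raw application (stub logic)."""
    summary = (
        f"{application['name']}, age {application['age']}, "
        f"income {application['income']}, score {application['credit_score']}"
    )
    return {"summary": summary, **application}


def credit_analyzer(parsed: dict) -> dict:
    """Credit analyst: evaluate creditworthiness (stub threshold)."""
    parsed["creditworthy"] = parsed["credit_score"] >= 660
    return parsed


def risk_manager(analyzed: dict) -> dict:
    """Risk manager: final approval or denial decision (stub rule)."""
    analyzed["decision"] = "approved" if analyzed["creditworthy"] else "denied"
    return analyzed


def run_pipeline(application: dict) -> dict:
    # Each step's output feeds the next, mirroring the MCP server chain.
    result = application
    for step in (loan_parser, credit_analyzer, risk_manager):
        result = step(result)
    return result


application = {"name": "Jane Doe", "age": 35, "income": 85000, "credit_score": 712}
print(run_pipeline(application)["decision"])
# → approved
```

In the actual solution, each of these functions is replaced by an LLM-backed agent calling a dedicated MCP server, but the data flow between personas has exactly this shape.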
You can use LangGraph’s built-in human-in-the-loop feature at the points where the credit analyzer sends its output to the risk manager and where the risk manager issues its decision. For this post, we have not implemented this workflow.
Each persona is powered by an agent with LLMs hosted by SageMaker AI, and its logic is deployed using a dedicated MCP server. Our MCP server implementation in this example uses the Awesome MCP FastAPI project, but you can also build a standard MCP server implementation according to the original Anthropic package and specification. The dedicated MCP servers in this example run in a local Docker container, but they can be quickly deployed to the AWS Cloud using services like Fargate. To run the servers locally, use the following code:
When the servers are running, you can start creating the agents and the workflow. You will need to deploy the LLM endpoint by running the following command:
This example uses LangGraph, a popular open source framework for agentic workflows, designed to support seamless integration of language models into complex workflows and applications. Workflows are represented as graphs made of nodes (actions, tools, or model queries) and edges describing the flow of information between them. LangGraph provides a structured yet dynamic way to execute tasks, making it straightforward to build AI applications involving natural language understanding, automation, and decision-making.
In our example, the first agent we create is the loan officer:
The goal of the loan officer (or LoanParser) is to perform the tasks defined in its MCP server. To call the MCP server, we can use the httpx library:
With that done, we can run the workflow using the scripts/run_pipeline.py file. We configured the repository to be traceable using LangSmith. If you have correctly configured the environment variables, you will see a trace similar to this one in your LangSmith UI.
Configuring the LangSmith UI for experiment tracing is optional. You can skip this step.
After running python3 scripts/run_pipeline.py, you should see the following in your terminal or log.
We use the following input:
We get the following output:
Tracing with the LangSmith UI
LangSmith traces contain the complete information of all the inputs and outputs of each step of the application, giving users full visibility into their agent. This is an optional step, applicable if you’ve configured LangSmith for tracing the MCP loan processing application. You can go to the LangSmith login page and log in to the LangSmith UI. Then choose Tracing Projects and open the LoanUnderwriter run. You should see a detailed flow of each MCP server, such as the loan parser, credit analyzer, and risk assessor inputs and outputs processed by the LLM, as shown in the following screenshot.

Conclusion
MCP, proposed by Anthropic, offers a standardized way of connecting FMs to data sources, and now you can use this capability with SageMaker AI. In this post, we presented an example of combining the power of SageMaker AI and MCP to build an application that offers a new perspective on loan underwriting through specialized roles and automated workflows.
Organizations can now streamline their AI integration processes by minimizing custom integrations and maintenance bottlenecks. As AI continues to evolve, the ability to securely connect models to your organization’s critical systems will become increasingly valuable. Whether you’re looking to transform loan processing, streamline operations, or gain deeper business insights, the SageMaker AI and MCP integration provides a flexible foundation for your next AI innovation.
The following are some examples of what you can build by connecting your SageMaker AI models to MCP servers:
- A multi-agent loan processing system that coordinates between different roles and data sources
- A developer productivity assistant that integrates with enterprise systems and tools
- A machine learning workflow orchestrator that manages complex, multi-step processes while maintaining context across operations
If you’re looking for ways to optimize your SageMaker AI deployment, learn more about how to unlock cost savings with the new scale down to zero feature in SageMaker Inference, as well as how to unlock cost-effective AI inference using Amazon Bedrock serverless capabilities with a SageMaker trained model. For application development, refer to Build agentic AI solutions with DeepSeek-R1, CrewAI, and Amazon SageMaker AI.
About the Authors
Mona Mona currently works as a Sr. Worldwide Gen AI Specialist Solutions Architect at Amazon, focusing on Gen AI solutions. She was a Lead Generative AI Specialist in Google Public Sector at Google before joining Amazon. She is a published author of two books: Natural Language Processing with AWS AI Services and Google Cloud Certified Professional Machine Learning Study Guide. She has authored 19 blogs on AI/ML and cloud technology, and is a co-author of a research paper on CORD-19 Neural Search that won the Best Research Paper award at the prestigious AAAI (Association for the Advancement of Artificial Intelligence) conference.
Davide Gallitelli is a Senior Worldwide Specialist Solutions Architect for Generative AI at AWS, where he empowers global enterprises to harness the transformative power of AI. Based in Europe but with a worldwide scope, Davide partners with organizations across industries to architect custom AI agents that solve complex business challenges using the AWS ML stack. He is particularly passionate about democratizing AI technologies and enabling teams to build practical, scalable solutions that drive organizational transformation.
Surya Kari is a Senior Generative AI Data Scientist at AWS, specializing in developing solutions leveraging state-of-the-art foundation models. He has extensive experience working with advanced language models including DeepSeek-R1, the Llama family, and Qwen, focusing on their fine-tuning and optimization for specific scientific applications. His expertise extends to implementing efficient training pipelines and deployment strategies using Amazon SageMaker, enabling the scaling of foundation models from development to production. He collaborates with customers to design and implement generative AI solutions, helping them navigate model selection, fine-tuning approaches, and deployment strategies to achieve optimal performance for their specific use cases.
Giuseppe Zappia is a Principal Solutions Architect at AWS, with over 20 years of experience in full stack software development, distributed systems design, and cloud architecture. In his spare time, he enjoys playing video games, programming, watching sports, and building things.

