Tuesday, May 19, 2026
banner
Top Selling Multipurpose WP Theme

Design patterns for scalable voice brokers matter for organizations that must ship quick, pure, and dependable voice experiences. Many groups face challenges like excessive latency, managing real-time audio, and coordinating a number of brokers in complicated workflows.

On this publish, you’ll learn to use Amazon Nova Sonic, Amazon Bedrock AgentCore, and Strands BidiAgent to construct scalable, maintainable voice brokers that deal with these challenges effectively, leading to extra responsive and clever buyer interactions.

We’ll discover three fashionable architectural patterns for voice brokers, highlighting their trade-offs and greatest practices for minimizing latency.

The constructing blocks

Earlier than diving deeper into the structure patterns, right here’s a fast overview of the three key parts used because the pattern answer on this publish.

Amazon Nova Sonic is a basis mannequin that creates pure, human-like speech-to-speech conversations for generative AI purposes. Customers can work together with AI by means of voice in actual time, with capabilities for understanding tone, pure conversational circulate, and performing actions.

Amazon Bedrock AgentCore Runtime is a serverless internet hosting surroundings for AI brokers. You package deal your agent as a container, deploy to AgentCore Runtime, and it handles scaling, session isolation, and billing. For voice brokers, it supplies bidirectional WebSocket streaming with SigV4 auth, microVM-level session isolation to keep away from noisy-neighbor latency spikes, AgentCore Gateway for shared instrument internet hosting utilizing the Mannequin Context Protocol (MCP) open supply protocol, persistent reminiscence throughout periods, and telemetry for voice-specific metrics like time-to-first-audio.

Strands Agents is an open supply framework for constructing AI brokers. Its BidiAgent class is one integration choice between Nova Sonic and your utility. It manages the bidirectional stream lifecycle, routes instrument calls, and handles session administration, simplifying the voice agent utility by means of the mannequin SDK interface.

Three integration patterns: instrument, agent-as-tool (sub-agent), and session segmentation

As an alternative of constructing one omnipotent agent, trendy voice techniques are more and more composed of tool-driven brokers, sub-agents performing as instruments and session segmentation methods that isolate prompts, reminiscence, and permissions. These patterns permit groups to decompose giant assistants into smaller, specialised, and reusable parts whereas sustaining clear safety boundaries.

Earlier than operating the samples within the following sections, set up Python and the required dependencies, together with strands-agents and boto3, and ensure your IAM setup has the required permissions for the required providers. For the total instance, check with the GitHub repository.

Sample 1: AgentCore Gateway – instrument choice for low latency

A instrument name is when a voice agent sends enter to an exterior perform or service, which processes it and returns output. It lets the agent carry out duties like querying a database or triggering a service shortly and securely, with out additional reasoning steps.

With AgentCore Gateway, you expose your present enterprise logic as instruments, discrete features that Nova Sonic can name instantly throughout a dialog. The voice mannequin selects which instrument to invoke, passes parameters, will get a consequence, and speaks it again. There’s no intermediate reasoning layer between the mannequin and the instrument.

Architecture diagram showing AgentCore Gateway tool selection pattern with Nova Sonic calling MCP tools directly

AgentCore Gateway hosts MCP servers as managed endpoints. MCP is the protocol, AgentCore Gateway is the AWS function that runs them. The voice agent connects through Gateway ARNs.

# Nova Sonic calls instruments instantly through AgentCore Gateway
mannequin = BidiNovaSonicModel(
    model_id="amazon.nova-2-sonic-v1:0",
    mcp_gateway_arn=[
        "arn:aws:bedrock-agentcore:us-east-1:123456789012:gateway/auth-tools",
        "arn:aws:bedrock-agentcore:us-east-1:123456789012:gateway/banking-tools",
        "arn:aws:bedrock-agentcore:us-east-1:123456789012:gateway/mortgage-tools",
    ],
)

When a consumer says “What’s my account steadiness?”, Nova Sonic:

  1. Understands the intent from speech.
  2. Selects get_account_balance from the accessible MCP instruments.
  3. Calls the instrument with the proper parameters.
  4. Speaks the consequence again.

Commerce-off: Nova Sonic makes all the selections. If a instrument name requires multi-step validation, conditional logic, or chaining a number of operations collectively, that reasoning burden falls fully on the voice mannequin’s system immediate. For easy instruments that is superb. For complicated workflows, it will get brittle.

Sample 2: Sub-agent – further reasoning with decoupled brokers

With the sub-agent or agent-as-tool sample, your present enterprise logic runs in autonomous brokers, every with its personal mannequin, system immediate, instruments, and reasoning capabilities. The voice orchestrator delegates entire duties to those sub-agents as a substitute of calling particular person instruments.

There are a lot of methods to hook up with a sub-agent out of your voice agent. Agent-to-Agent (A2A) and Strands Agent-as-Software are two frequent approaches:

  • Native agent-as-tool: The sub-agent runs in-process, wrapped as a @instrument perform utilizing the Brokers as Instruments sample in Strands. That is essentially the most easy method with no community hop and no separate deployment. The trade-off is that the sub-agent shares the identical course of and scales with the orchestrator.
  • Distant agent through A2A protocol: The sub-agent is deployed as an impartial A2A server on AgentCore Runtime (or a distant server) and invoked over the community. A2A is an open protocol for agent-to-agent communication. As MCP connects brokers to instruments, A2A connects brokers to different brokers. Because the AWS weblog on A2A protocol assist in AgentCore Runtime explains, brokers constructed with totally different frameworks (Strands, OpenAI, LangGraph, Google ADK) can share context and reasoning in a standard format. This supplies full deployment independence and cross-framework interoperability.

Architecture diagram showing the sub-agent pattern with Nova Sonic delegating to specialized agents

Strands Brokers has built-in assist for each protocols, MCP for instrument entry and A2A for agent-to-agent communication. For a hands-on walkthrough, see the neighborhood information on Agent Collaboration: Strands Agents, MCP, and the Agent2Agent Protocol.

Right here’s the native agent-as-tool method, every sub-agent is a @instrument wrapping a full Strands Agent:

# sub_agents.py — Outline sub-agents as Strands instruments utilizing the Brokers-as-Instruments sample
from strands import Agent, instrument
from strands.fashions import BedrockModel

# Every sub-agent is a full Strands Agent wrapped as a @instrument
# The BidiAgent orchestrator calls these through Nova Sonic's instrument use

@instrument
def authenticate_customer(account_id: str, date_of_birth: str) -> str:
    """Authenticate a buyer utilizing their account ID and date of beginning.
    Handles the total verification circulate together with identification checks and retry logic.
    Returns authentication standing and token."""
    auth_agent = Agent(
        mannequin=BedrockModel(model_id="amazon.nova-lite-v1:0"),
        system_prompt="""You're an authentication agent. Confirm the shopper's identification
        utilizing the supplied account ID and date of beginning. Name verify_identity to test
        credentials. Return a transparent auth standing in 1-2 sentences.""",
        instruments=[verify_identity, check_account_exists],  # Sub-agent's personal instruments
    )
    consequence = auth_agent(f"Authenticate account {account_id}, DOB: {date_of_birth}")
    return str(consequence)


@instrument
def handle_banking_inquiry(question: str, auth_token: str) -> str:
    """Deal with banking questions — balances, transactions, transfers.
    Validates permissions and returns a conversational abstract."""
    banking_agent = Agent(
        mannequin=BedrockModel(model_id="amazon.nova-lite-v1:0"),
        system_prompt="""You're a banking agent. Use the supplied instruments to reply
        the shopper's question. Summarize ends in 2-3 pure sentences.
        Don't return uncooked JSON.""",
        instruments=[get_account_balance, get_recent_transactions, transfer_funds],
    )
    consequence = banking_agent(question)
    return str(consequence)


@instrument
def handle_mortgage_inquiry(question: str) -> str:
    """Deal with mortgage questions — charges, calculations, eligibility, utility standing.
    Performs its personal calculations and reasoning."""
    mortgage_agent = Agent(
        mannequin=BedrockModel(model_id="amazon.nova-lite-v1:0"),
        system_prompt="""You're a mortgage specialist. Assist with price inquiries,
        fee calculations, and eligibility assessments. Maintain responses concise
        and conversational — this can be spoken aloud.""",
        instruments=[get_mortgage_rates, calculate_payment, check_eligibility],
    )
    consequence = mortgage_agent(question)
    return str(consequence)

The voice orchestrator then makes use of BidiAgent with these sub-agent instruments:

# voice_orchestrator.py — BidiAgent with sub-agents as instruments
from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.fashions.nova_sonic import BidiNovaSonicModel
from sub_agents import authenticate_customer, handle_banking_inquiry, handle_mortgage_inquiry

mannequin = BidiNovaSonicModel(
    area="us-east-1",
    model_id="amazon.nova-2-sonic-v1:0",
    provider_config={"audio": {"voice": "tiffany", "input_sample_rate": 16000, "output_sample_rate": 16000}},
)

agent = BidiAgent(
    mannequin=mannequin,
    instruments=[authenticate_customer, handle_banking_inquiry, handle_mortgage_inquiry],
    system_prompt="""You're a banking voice assistant. Route buyer requests to the
    applicable specialist. At all times authenticate earlier than accessing account information.
    Maintain your personal responses transient — the sub-agents deal with the main points.""",
)

await agent.run(inputs=[ws_input], outputs=[ws_output])

The sub-agent does its personal considering. Nova Sonic doesn’t must orchestrate the person steps. It delegates and speaks the consequence.

Commerce-off: Every sub-agent name provides latency: the sub-agent’s personal mannequin inference plus its instrument calls. In a voice dialog, this implies longer silence whereas the sub-agent causes. The AWS weblog on multi-agent voice assistants recommends beginning with smaller, environment friendly fashions like Amazon Nova 2 Lite for sub-agents to scale back latency whereas nonetheless dealing with specialised duties successfully.

Amazon Nova 2 Sonic helps asynchronous instrument calling, so the dialog continues naturally whereas instruments run within the background. It retains accepting enter, can run a number of instruments in parallel, and gracefully adapts if the consumer modifications their request mid-process, delivering all outcomes whereas specializing in what’s nonetheless related.

Sample 3: Session segmentation for ultra-low latency

There’s a 3rd method price contemplating. It doesn’t map neatly to the MCP or sub-agent patterns, however is purpose-built for voice eventualities the place latency is the overriding concern.

As an alternative of delegating exterior instruments or sub-agents, you section the dialog into logical phases, every with its personal Nova Sonic session, system immediate, and gear set. When the dialog transitions from one part to the following (for instance, from authentication to account inquiry), you shut the present session and open a brand new one with a unique immediate and instruments, throughout the identical WebSocket connection. Every sub-voice-agent can use its personal MCP gateways, instruments, and even sub-agents — the variations that it operates with a centered immediate and minimal instrument floor, lowering reasoning overhead and latency.

Architecture diagram showing session segmentation pattern with separate Nova Sonic sessions per conversation phase

Consider a banking voice assistant with three dialog phases: authentication, account administration, and mortgage inquiry. Slightly than loading one large system immediate with each instrument, you run every part as a centered Nova Sonic session:

# Section 1: Authentication
auth_session = BidiNovaSonicModel(
    model_id="amazon.nova-2-sonic-v1:0",
    mcp_gateway_arn=["arn:...gateway/auth-tools"],  # Solely auth instruments
)
auth_agent = BidiAgent(
    mannequin=auth_session,
    instruments=[],
    system_prompt="""You're an authentication assistant. 
    Gather the consumer's account ID and date of beginning. 
    Name verify_identity to authenticate. 
    As soon as verified, say 'You are all set' and cease.""",
)
# Run till authentication completes
await auth_agent.run(inputs=[ws_input], outputs=[ws_output])

# Section 2: Account administration (new session, new immediate, new instruments)
banking_session = BidiNovaSonicModel(
    model_id="amazon.nova-2-sonic-v1:0",
    mcp_gateway_arn=["arn:...gateway/banking-tools"],  # Solely banking instruments
)
banking_agent = BidiAgent(
    mannequin=banking_session,
    instruments=[],
    system_prompt="""You're a banking assistant. The consumer is already authenticated.
    Assist with steadiness inquiries, transactions, and transfers.
    Maintain responses to 1 or two sentences.""",
)
await banking_agent.run(inputs=[ws_input], outputs=[ws_output])

Every part will get a clear Nova Sonic session with:

  • A centered system immediate: Shorter, extra particular, much less room for the mannequin to get confused.
  • Solely the related instruments: through MCP gateways, native instruments, or each. The mannequin doesn’t waste reasoning cycles selecting between 15 instruments when it solely wants 3.
  • Optionally its personal sub-agents: a part that requires deeper reasoning can use Sample 2 internally, whereas less complicated phases keep tool-only.
  • The earlier session context could be handed into the brand new session as chat historical past, so the general dialog retains continuity.

In comparison with instrument, sub-agent, and session segmentation patterns

Issue Software Sub-Agent (Agent-as-Software) Session Segmentation
Latency Low Increased (sub-agent reasoning) Lowest (with latency throughout session transitions)
Software set per flip Instruments loaded Sub-agent’s instruments Solely phase-relevant instruments
System immediate One giant immediate Orchestrator + sub-agent prompts Small, phase-specific prompts
Reasoning depth Voice mannequin solely Voice mannequin + sub-agent Voice mannequin solely (per part)
Reuse of present brokers Excessive (identical MCP instruments) Highest (identical sub-agents) Medium (composes instruments/sub-agents per part)
Dialog continuity Seamless Seamless Requires handoff logic between phases

Latency greatest practices for voice brokers

Latency is a key consideration when constructing voice versus textual content brokers. Listed below are sensible strategies to maintain response instances quick and responsive:

Begin with small fashions for sub-agents. Your voice orchestrator makes use of Nova Sonic for the dialog, however sub-agents don’t want a big mannequin. Begin with Amazon Nova 2 Lite or Nova 2 Micro. They’re quick, value optimized, and deal with most specialised duties nicely. You may at all times improve a particular sub-agent to a bigger mannequin if high quality requires it, however default to small.

Design stateful sub-agents with caching. A stateless sub-agent that hits a database or API on each name provides latency each time. As an alternative, design sub-agents to cache outcomes from information sources (APIs, AWS Lambda features, databases) inside a session. If the banking sub-agent fetches account particulars as soon as, it ought to maintain that information in reminiscence and serve subsequent questions (steadiness, transactions, abstract) from cache quite than making repeated backend calls.

Prefetch information after authentication. That is particularly precious for contact middle eventualities. After a buyer authenticates, you already know who they’re. Don’t await them to ask earlier than pulling their information. Instantly fetch account balances, current transactions, pending alerts, and mortgage standing within the background. When the shopper asks “What’s my steadiness?”, the reply is already in reminiscence.

Parallelize impartial instrument calls. If the consumer asks “Give me an summary of my accounts”, don’t name get_checking_balance, then get_savings_balance, then get_credit_card_balance sequentially. Use concurrent execution so three calls occur without delay. Strands helps this natively. The agent’s instrument executor runs impartial calls in parallel by default.

Use filler phrases to masks instrument latency. When a instrument name or sub-agent delegation is unavoidable, instruct the voice mannequin to talk a quick filler whereas ready: “Let me test that for you…” or “One second whereas I look that up…” This retains the dialog feeling alive as a substitute of dropping into silence.

Reduce instrument rely per session. Software choice will get slower because the variety of accessible instruments grows. In case your agent has 15 instruments however a typical dialog solely makes use of 3 to 4, contemplate the session segmentation sample to load solely the related instruments per part.

Clear up

After you end testing the pattern, keep in mind to scrub up the sources you created to keep away from pointless prices. Comply with the repository directions to cease providers and delete any deployed infrastructure.

Conclusion

Migrating a textual content chatbot to a voice assistant isn’t an easy wrapper job. The interplay mannequin is basically totally different, from response design to latency budgets to turn-taking habits. However with a well-structured multi-agent structure and Amazon Bedrock AgentCore, the enterprise logic layer stays intact.

The sub-agents you’ve already constructed are your largest asset. Reuse them.

For a working instance of a Strands BidiAgent voice assistant deployed on AgentCore Runtime with WebSocket streaming, see the AgentCore bidirectional streaming sample.

Subsequent steps

Subsequent, you’ll be able to prolong the pattern to suit your personal use case, combine your corporation instruments, refine prompts for voice interactions, and check the agent in real-world eventualities to arrange for manufacturing deployment. To be taught extra about voice brokers on AWS, go to:


In regards to the authors

Lana Zhang

Lana Zhang

Lana Zhang is a Senior Specialist Options Architect for Generative AI at AWS throughout the Worldwide Specialist Group. She focuses on AI/ML, with a give attention to use circumstances comparable to AI voice assistants and multimodal understanding. She works intently with prospects throughout various industries, together with media and leisure, gaming, sports activities, promoting, monetary providers, and healthcare, to assist them remodel their enterprise options by means of AI.

Osman Ipek

Osman Ipek

Osman Ipek is a Options Architect on Amazon’s AGI crew specializing in Nova basis fashions. He guides groups to speed up improvement by means of sensible AI implementation methods, with experience spanning voice AI, NLP, and MLOps.

banner
Top Selling Multipurpose WP Theme

Converter

Top Selling Multipurpose WP Theme

Newsletter

Subscribe my Newsletter for new blog posts, tips & new photos. Let's stay updated!

banner
Top Selling Multipurpose WP Theme

Leave a Comment

banner
Top Selling Multipurpose WP Theme

Latest

Best selling

22000,00 $
16000,00 $
6500,00 $

Top rated

6500,00 $
22000,00 $
900000,00 $

Products

Knowledge Unleashed
Knowledge Unleashed

Welcome to Ivugangingo!

At Ivugangingo, we're passionate about delivering insightful content that empowers and informs our readers across a spectrum of crucial topics. Whether you're delving into the world of insurance, navigating the complexities of cryptocurrency, or seeking wellness tips in health and fitness, we've got you covered.