Accelerate enterprise AI development with Weights & Biases and Amazon Bedrock AgentCore

by root January 5, 2026


This post was co-authored by Thomas Capelle and Ray Strickland of Weights & Biases (W&B).

The adoption of generative artificial intelligence (AI) is accelerating across the enterprise, evolving from simple foundation model interactions to sophisticated agentic workflows. As organizations move from proof of concept to production deployment, they need robust tools to develop, evaluate, and monitor AI applications at scale.

This post shows you how to use Amazon Bedrock foundation models (FMs) and the newly launched Amazon Bedrock AgentCore together with W&B Weave to build, evaluate, and monitor your enterprise AI solutions. We cover the complete development lifecycle, from tracking individual FM calls to monitoring complex agent workflows in production.

Overview of W&B Weave

Weights & Biases (W&B) is an AI developer platform that provides comprehensive tools for training, fine-tuning, and leveraging foundation models for companies of all sizes across a variety of industries.

W&B Weave provides an integrated suite of developer tools to support every stage of your agentic AI workflow. It enables:

Tracing and monitoring: Track large language model (LLM) calls and application logic to debug and analyze production systems.
Systematic iteration: Refine and iterate on prompts, datasets, and models.
Experimentation: Experiment with different models and prompts in the LLM Playground.
Evaluation: Use custom or pre-built scorers with comparison tools to systematically evaluate and improve application performance. Collect user and expert feedback for real-world testing and evaluation.
Guardrails: Protect your applications with content moderation, prompt safety, and other safeguards. Use custom or third-party guardrails, including Amazon Bedrock Guardrails, or W&B Weave’s native guardrails.

W&B Weave can be fully managed by Weights & Biases in a multi-tenant or single-tenant environment, or deployed directly into a customer’s Amazon Virtual Private Cloud (VPC). Additionally, W&B Weave’s integration into the W&B development platform gives organizations a seamlessly integrated experience between model training/fine-tuning workflows and agentic AI workflows.

To get started, subscribe to the Weights & Biases AI development platform through AWS Marketplace. Individuals and academic teams can subscribe to W&B at no additional cost.

Tracking Amazon Bedrock FMs using the W&B Weave SDK

W&B Weave integrates with Amazon Bedrock through the Python and TypeScript SDKs. Once you install the library and patch your Bedrock client, W&B Weave automatically tracks LLM calls.

!pip install weave
import weave
import boto3
import json
from weave.integrations.bedrock.bedrock_sdk import patch_client

weave.init("my_bedrock_app")

# Create and patch the Bedrock client
client = boto3.client("bedrock-runtime")
patch_client(client)

# Use the client as usual
response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }),
    contentType="application/json",
    accept="application/json"
)
# The response body is a stream; read and decode it before accessing the text
response_dict = json.loads(response.get('body').read())
print(response_dict["content"][0]["text"])

This integration automatically versions your experiments and tracks configurations, giving you full visibility into your Amazon Bedrock applications without changing your core logic.
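To make the request and response shapes from the example above easier to reuse, the following is a minimal sketch that builds the Anthropic Messages request body and pulls the text out of a decoded response. The helper names `build_messages_body` and `extract_text` are hypothetical (they are not part of boto3 or W&B Weave), and the sample response is hand-written to mirror the `content[0]["text"]` access shown above.

```python
import json


def build_messages_body(prompt: str, max_tokens: int = 100) -> str:
    """Return the JSON request body used in the invoke_model example above."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })


def extract_text(response_dict: dict) -> str:
    """Join the text blocks of a decoded response, skipping non-text blocks."""
    blocks = response_dict.get("content", [])
    return "".join(b.get("text", "") for b in blocks if b.get("type", "text") == "text")


# Hand-written sample that mirrors the response shape accessed above
sample_response = {"content": [{"type": "text", "text": "Paris"}]}
body = build_messages_body("What is the capital of France?")
print(json.loads(body)["messages"][0]["content"])
print(extract_text(sample_response))
```

Passing the returned string as `body=` to the patched client would keep the traced call the same as in the example above, while guarding against responses that carry no text block.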

Experimenting with Amazon Bedrock FMs in the W&B Weave Playground

The W&B Weave Playground accelerates prompt engineering with an intuitive interface for testing and comparing Bedrock models. Key features include:

Direct prompt editing and message retry
Side-by-side model comparison
Access from trace views for rapid iteration

To begin, add your AWS credentials in the Playground settings, select the Amazon Bedrock FMs you want, and start experimenting. The interface enables rapid iteration on prompts while maintaining full traceability of experiments.

Evaluating Amazon Bedrock FMs with W&B Weave Evaluations

W&B Weave Evaluations provides dedicated tools to evaluate generative AI models effectively. By using W&B Weave Evaluations with Amazon Bedrock, users can efficiently evaluate these models, analyze outputs, and visualize performance across key metrics. Users can use W&B Weave’s built-in scorers, third-party or custom scorers, and human/expert feedback. This combination provides a deeper understanding of the trade-offs between models, including differences in cost, accuracy, speed, and output quality.

W&B Weave has a first-class way to track evaluations with the Model and Evaluation classes. To set up an evaluation job, customers can:

Define a dataset or a list of dictionaries with the examples to evaluate
Create a list of scoring functions. Each function receives the model output and, optionally, other inputs from the example, and must return a dictionary containing the scores
Define an Amazon Bedrock model using the Model class
Evaluate this model by calling Evaluation

The following is an example of setting up an evaluation job.

import weave
from weave import Evaluation
import asyncio

# Collect your examples
examples = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
    {"question": "What is the square root of 64?", "expected": "8"},
]

# Define any custom scoring function
@weave.op()
def match_score1(expected: str, output: dict) -> dict:
    # Here is where you'd define the logic to score the model output
    return {'match': expected == output['generated_text']}

@weave.op()
def function_to_evaluate(question: str):
    # Here is where you would add your LLM call and return the output
    return {'generated_text': 'Paris'}

# Score your examples using scoring functions
evaluation = Evaluation(
    dataset=examples, scorers=[match_score1]
)

# Start tracking the evaluation
weave.init('intro-example')
# Run the evaluation
asyncio.run(evaluation.evaluate(function_to_evaluate))
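Exact string equality can be too strict for free-form LLM output. As a sketch, a more forgiving scorer could normalize case and whitespace and also report whether the expected answer appears anywhere in the generated text. The name `lenient_match_score` is hypothetical; it follows the same `(expected, output)` signature as `match_score1` above and assumes the model returns a dictionary with a `generated_text` key. It is written as plain Python so it runs without a W&B account; to use it in an evaluation you would decorate it with `@weave.op()` and add it to `scorers`.

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def lenient_match_score(expected: str, output: dict) -> dict:
    """Score one example with exact and substring matching on normalized text."""
    generated = _normalize(output.get("generated_text", ""))
    target = _normalize(expected)
    return {
        "exact_match": generated == target,
        # An empty expected answer never counts as "contained"
        "contains_expected": bool(target) and target in generated,
    }


print(lenient_match_score("Paris", {"generated_text": "  paris "}))
print(lenient_match_score("Harper Lee", {"generated_text": "It was written by Harper Lee."}))
```

Returning two keys keeps the strict and lenient signals separate, so a dashboard can show how often the model is right but verbose.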

Evaluation dashboards visualize performance metrics and help you make informed decisions about model selection and configuration. For detailed guidance, see our previous post, Evaluating LLM summaries using Amazon Bedrock and Weave.
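Before launching a tracked evaluation, it can help to dry-run the scoring logic locally. The sketch below loops over the examples, calls a model function and a scorer, and reports the fraction of matches. `dry_run`, `fake_model`, and `exact_match` are hypothetical stand-ins written for this illustration; the code does not use or reproduce the Weave `Evaluation` class.

```python
from typing import Callable, Dict, List


def exact_match(expected: str, output: Dict) -> Dict:
    """Same scoring rule as the evaluation example: strict string equality."""
    return {"match": expected == output["generated_text"]}


def fake_model(question: str) -> Dict:
    """Stand-in for an LLM call; always answers 'Paris'."""
    return {"generated_text": "Paris"}


def dry_run(examples: List[Dict], model: Callable[[str], Dict],
            scorer: Callable[[str, Dict], Dict]) -> float:
    """Return the fraction of examples whose score has match == True."""
    results = [scorer(ex["expected"], model(ex["question"])) for ex in examples]
    return sum(r["match"] for r in results) / len(results)


examples = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
    {"question": "What is the square root of 64?", "expected": "8"},
]
print(f"match rate: {dry_run(examples, fake_model, exact_match):.2f}")
```

With the placeholder model only the first example matches, which is a quick way to confirm that the scorer and dataset keys line up before any model calls are made.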

Enhancing Amazon Bedrock AgentCore observability with W&B Weave

Amazon Bedrock AgentCore is a complete set of services for deploying and operating highly capable agents more securely at enterprise scale. It provides a more secure runtime environment, workflow execution tools, and operational controls that work with popular frameworks such as Strands Agents, CrewAI, LangGraph, and LlamaIndex, as well as many LLM models, whether from Amazon Bedrock or external sources.

AgentCore includes built-in observability through Amazon CloudWatch dashboards that track key metrics such as token usage, latency, session duration, and error rates. It also traces workflow steps, showing which tools were called and how the model responded, which provides essential visibility for debugging and quality assurance in production.

Using AgentCore and W&B Weave together lets teams rely on the operational monitoring and security foundation built into AgentCore, while also using W&B Weave where it fits their existing development workflows. Organizations that have already invested in a W&B environment can choose to incorporate W&B Weave’s visualization tools alongside AgentCore’s native functionality. This approach gives teams the flexibility to use the observability solution that best fits their established processes and preferences when developing complex agents that chain multiple tools and reasoning steps.

There are two main approaches to adding W&B Weave observability to AgentCore agents: using the native W&B Weave SDK, or integrating through OpenTelemetry.

Native W&B Weave SDK

The simplest approach is to use W&B Weave’s @weave.op decorator to automatically track function calls. Initialize W&B Weave with your project name and wrap the functions you want to monitor.

import os
from typing import Any, Dict

import weave
# Assumption: Agent comes from the Strands Agents SDK; swap in your framework's agent class
from strands import Agent

os.environ["WANDB_API_KEY"] = "your_api_key"
weave.init("your_project_name")

@weave.op()
def word_count_op(text: str) -> int:
    return len(text.split())

@weave.op()
def run_agent(agent: Agent, user_message: str) -> Dict[str, Any]:
    # Each call is recorded as a trace with its inputs and returned dictionary
    result = agent(user_message)
    return {"message": result.message, "model": agent.model.config["model_id"]}

AgentCore runs as a Docker container, so add W&B Weave to your dependencies (for example, uv add weave) to include it in your container image.
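To build intuition for what a tracing decorator records, the following is a simplified, standard-library-only illustration. It is not how `@weave.op` is implemented; `traced` and `TRACE_LOG` are hypothetical names, and the sketch only captures the function name, inputs, output or error, and latency in an in-memory list.

```python
import functools
import time
from typing import Any, Callable, Dict, List

TRACE_LOG: List[Dict[str, Any]] = []  # in-memory stand-in for a trace backend


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record name, inputs, output or error, and latency for each call."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        record: Dict[str, Any] = {"name": func.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            record["output"] = func(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so every call leaves a record
            record["latency_s"] = time.perf_counter() - start
            TRACE_LOG.append(record)
    return wrapper


@traced
def word_count(text: str) -> int:
    return len(text.split())


word_count("observability for agent workflows")
print(TRACE_LOG[-1]["name"], TRACE_LOG[-1]["output"])
```

A hosted tracing tool adds persistence, nesting of parent and child calls, and a UI on top of this basic idea, which is why wrapping only the functions you care about is usually enough to get useful traces.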

OpenTelemetry integration

For teams already using OpenTelemetry or needing vendor-neutral instrumentation, W&B Weave supports OTLP (OpenTelemetry Protocol) directly.

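import base64
import json

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# WANDB_API_KEY, WEAVE_PROJECT, agent, and user_message are assumed to be defined elsewhere
auth_b64 = base64.b64encode(f"api:{WANDB_API_KEY}".encode()).decode()
exporter = OTLPSpanExporter(
    endpoint="https://trace.wandb.ai/otel/v1/traces",
    headers={"Authorization": f"Basic {auth_b64}", "project_id": WEAVE_PROJECT}
)

# Register the exporter with a tracer provider and obtain a tracer
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Create spans to track execution
with tracer.start_as_current_span("invoke_agent") as span:
    span.set_attribute("input.value", json.dumps({"prompt": user_message}))
    result = agent(user_message)
    span.set_attribute("output.value", json.dumps({"message": result.message}))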
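The header construction in the exporter example can be isolated into a small helper so it is easy to check before wiring up an exporter. This is a sketch: `build_otlp_headers` is a hypothetical name, the `api:<key>` Basic-auth scheme and the `project_id` header are taken from the example above, and the key and project values below are placeholders.

```python
import base64
from typing import Dict


def build_otlp_headers(api_key: str, project_id: str) -> Dict[str, str]:
    """Build the Basic-auth and project headers used in the OTLP example above."""
    if not api_key or not project_id:
        raise ValueError("api_key and project_id must both be non-empty")
    token = base64.b64encode(f"api:{api_key}".encode()).decode()
    return {"Authorization": f"Basic {token}", "project_id": project_id}


# Placeholder values; in practice read the key from the environment
headers = build_otlp_headers("example-key", "my-team/my-project")
decoded = base64.b64decode(headers["Authorization"].split(" ", 1)[1]).decode()
print(decoded)
```

Failing early on an empty key or project tends to be easier to debug than a rejected export request inside a running container.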

The OpenTelemetry approach routes traces to W&B Weave for visualization while remaining compatible with AgentCore’s existing OpenTelemetry infrastructure.

When using AgentCore and W&B Weave together, teams have several options for observability. AgentCore’s CloudWatch integration traces agent reasoning and tool selection while monitoring system health, resource utilization, and error rates. W&B Weave provides visualization capabilities that present execution data in a format familiar to teams already using a W&B environment. Both solutions provide visibility into how agents process information and make decisions, so organizations can choose the observability approach that best fits their existing workflows and preferences. This two-tier approach lets users:

Monitor production service level agreements (SLAs) through CloudWatch alerts
Debug complex agent behavior with W&B Weave’s trace explorer
Optimize token usage and latency with detailed execution breakdowns
Compare agent performance across different prompts and configurations
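As an illustration of the kind of breakdown the list above describes, the sketch below summarizes a few trace records into an error rate, total tokens, and latency percentiles. The record fields (`latency_ms`, `tokens`, `error`) and the sample values are made up for this example; they are not the CloudWatch or W&B Weave schema.

```python
import math
from typing import Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list (pct in the range 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records: List[Dict]) -> Dict[str, float]:
    """Aggregate hypothetical trace records into a few headline metrics."""
    latencies = [r["latency_ms"] for r in records]
    return {
        "calls": len(records),
        "error_rate": sum(1 for r in records if r["error"]) / len(records),
        "total_tokens": sum(r["tokens"] for r in records),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
    }


# Made-up sample records for illustration only
records = [
    {"latency_ms": 120.0, "tokens": 310, "error": False},
    {"latency_ms": 95.0, "tokens": 280, "error": False},
    {"latency_ms": 900.0, "tokens": 1500, "error": True},
    {"latency_ms": 150.0, "tokens": 420, "error": False},
]
print(summarize(records))
```

Reporting a high percentile next to the median makes a single slow, token-heavy call visible even when most calls are fast, which is the sort of outlier an SLA alert or a trace drill-down would target.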

The integration requires minimal code changes, preserves your existing AgentCore deployment, and scales with agent complexity. Whether you’re building a simple tool-calling agent or orchestrating a multi-step workflow, this observability stack provides the insights you need to iterate quickly and deploy with confidence.

For implementation details and complete code examples, see our previous post.

Conclusion

In this post, we demonstrated how Amazon Bedrock FMs and AgentCore can be combined with W&B Weave’s comprehensive observability toolkit to build and optimize enterprise-grade agentic AI solutions. We explored how W&B Weave can support every stage of the LLM development lifecycle, from initial experimentation in the Playground to systematic evaluation of model performance and, finally, production monitoring of complex agent workflows.

The integration of Amazon Bedrock and W&B Weave provides several key capabilities:

Automatic tracking of Amazon Bedrock FM calls with minimal code changes using the W&B Weave SDK
Rapid experimentation through the W&B Weave Playground’s intuitive interface for testing prompts and comparing models
Systematic evaluation with custom scoring functions to compare different Amazon Bedrock models
Comprehensive observability for AgentCore deployments, with CloudWatch metrics providing robust operational monitoring, complemented by detailed execution traces

To get began:

Request a free trial or subscribe to the Weights & Biases AI development platform through AWS Marketplace
Install the W&B Weave SDK and follow our code examples to start tracking your Bedrock FM calls
Experiment with different models in the W&B Weave Playground by adding your AWS credentials and testing different Amazon Bedrock FMs
Set up evaluations using the W&B Weave Evaluation framework to systematically compare model performance for your use cases
Enhance your AgentCore agents by adding W&B Weave observability using either the native SDK or the OpenTelemetry integration

Start with a simple integration to track your Amazon Bedrock calls, then gradually adopt more advanced features as your AI applications grow in complexity. The combination of Amazon Bedrock and W&B Weave’s comprehensive development tools provides the foundation you need to build, evaluate, and maintain production-ready AI solutions at scale.

About the authors

James Yee is a Senior AI/ML Partner Solutions Architect at AWS. He spearheads AWS’s strategic partnerships in emerging technologies and leads engineering teams to design and develop cutting-edge joint solutions in generative AI. He enables field and technical teams to seamlessly deploy, operate, secure, and integrate partner solutions on AWS. James works closely with business leaders to define and execute joint go-to-market strategies that drive growth for cloud-based businesses. Outside of work, he enjoys playing soccer, traveling, and spending time with his family.

Ray Strickland is a Senior Partner Solutions Architect at AWS, specializing in AI/ML, agentic AI, and intelligent document processing. He enables partners to deploy scalable generative AI solutions using AWS best practices and drives innovation through strategic partner enablement programs. Ray collaborates across multiple AWS teams to accelerate AI adoption and has extensive experience in partner evaluation and enablement.

Thomas Capelle is a machine learning engineer at Weights & Biases. He is responsible for keeping the www.github.com/wandb/examples repository live and up to date. He also builds content on MLOps, applications of W&B to industry, and fun deep learning in general. Previously, he used deep learning to solve short-term forecasting problems for solar energy. He has a background in urban planning, combinatorial optimization, transportation economics, and applied mathematics.

Scott Juan is Alliance Director at Weights & Biases. Prior to joining W&B, he led a number of strategic partnerships at AWS and Cloudera. Scott studied materials engineering and is passionate about renewable energy.
