This submit was co-authored by Thomas Capelle and Ray Strickland of Weights & Biases (W&B).
The adoption of generative synthetic intelligence (AI) is accelerating throughout the enterprise, evolving from easy underlying mannequin interactions to classy agent workflows. As organizations transfer from proof of idea to manufacturing deployment, they want strong instruments to develop, consider, and monitor AI functions at scale.
This submit reveals you how you can use Amazon Bedrock’s Basis Mannequin (FM) with the newly launched Amazon Bedrock AgentCore. W&B Weave We provide help to construct, consider, and monitor your enterprise AI options. We cowl your complete growth lifecycle, from monitoring particular person FM calls to monitoring complicated agent workflows in manufacturing.
Overview of W&B Weave
Weights and Bias (W&B) is an AI developer system that gives complete instruments for coaching, fine-tuning, and leveraging underlying fashions for corporations of all sizes in quite a lot of industries.
W&B Weave supplies an built-in suite of developer instruments to assist each stage of your agent AI workflow. This lets you:
- Tracing and monitoring: Monitor massive language mannequin (LLM) calls and software logic to debug and analyze manufacturing techniques.
- Systematic iteration: Modify and iterate on prompts, datasets, and fashions.
- experiment: Experiment with totally different fashions and prompts. LLM Playground.
- analysis: Use customized or pre-built scorers with comparability instruments to systematically consider and improve software efficiency. Accumulate person and knowledgeable suggestions for real-world testing and analysis.
- guardrail: Defend your functions with content material administration, speedy security, and different safeguards. Use customized or third-party guardrails, together with Amazon Bedrock Guardrails, or W&B Weave’s native guardrails.
W&B Weave will be absolutely managed by Weights & Biases in a multi-tenant or single-tenant setting, or deployed instantly right into a buyer’s Amazon Digital Non-public Cloud (VPC). Moreover, W&B Weave’s integration into the W&B growth platform supplies organizations with a seamlessly built-in expertise between mannequin coaching/fine-tuning workflows and agent AI workflows.
To get began, subscribe to the Weights & Biases AI growth platform by way of AWS Market. People and tutorial groups can subscribe to W&B at no extra cost.
Monitoring Amazon Bedrock FM utilizing W&B Weave SDK
W&B Weave seamlessly integrates with Amazon Bedrock by way of the Python and TypeScript SDKs. As soon as you put in the library and patch your Bedrock shopper, W&B Weave mechanically tracks LLM calls.
This integration mechanically creates variations of your experiments and tracks configuration, providing you with full visibility into your Amazon Bedrock functions with out altering your core logic.
Experiment with Amazon Bedrock FM at W&B Weave Playground
of W&B Weave Playground Speed up speedy engineering with an intuitive interface for testing and evaluating Bedrock fashions. The primary options are:
- Direct immediate modifying and message retry
- Examine fashions aspect by aspect
- Entry from hint view for speedy iteration
First, add yours AWS credentials Within the playground settings, select the one you want Amazon Bedrock FMbegin the experiment. This interface permits speedy iteration at prompts whereas sustaining full traceability of experiments.
Amazon Bedrock FM ranking by W&B Weave ranking
Evaluation of W&B weave We offer specialised instruments to successfully consider generative AI fashions. Utilizing W&B Weave Analysis with Amazon Bedrock, customers can effectively consider these fashions, analyze output, and visualize efficiency throughout key metrics. Customers can use W&B Weave’s built-in scorers, third-party or customized scorers, and human/knowledgeable suggestions. This mixture supplies a deeper understanding of trade-offs between fashions, together with variations in value, accuracy, pace, and output high quality.
W&B Weave has a good way to trace evaluations utilizing the Mannequin & Analysis class. To arrange an evaluation job, prospects can:
- outline dataset or a dictionary record of examples to judge
- Create a listing of scoring capabilities. Every perform has a model_output and optionally different inputs from the pattern, and should return a dictionary containing the scores.
- Outline an Amazon Bedrock mannequin utilizing the Mannequin class
- Consider this mannequin by calling Analysis
The next is an instance of the settings for an analysis job.
Analysis dashboards visualize efficiency metrics and provide help to make knowledgeable mannequin choice and configuration choices. Please see our earlier submit for detailed steerage. Evaluating LLM summaries using Amazon Bedrock and Weave.
Amazon Bedrock AgentCore Observability Enhancements with W&B Weave
Amazon Bedrock AgentCore is an entire set of providers to extra securely deploy and function high-performance brokers at enterprise scale. It supplies a safer runtime setting, workflow execution instruments, and operational controls that work with widespread frameworks similar to: strand agentCrewAI, LangGraph, LlamaIndex, and plenty of different LLM fashions from Amazon Bedrock or exterior sources.
AgentCore has built-in observability by way of the Amazon CloudWatch dashboard that tracks key metrics similar to token utilization, latency, session period, and error charges. It additionally tracks workflow steps, exhibiting which instruments have been referred to as and the way the mannequin responded, offering important visibility for debugging and high quality assurance in manufacturing.
Utilizing AgentCore and W&B Weave collectively permits groups to make use of the operational monitoring and safety basis constructed into AgentCore, whereas additionally utilizing W&B Weave when it matches their present growth workflows. Organizations which have already invested in a W&B setting can select to include W&B Weave’s visualization instruments together with AgentCore’s native performance. This strategy offers groups the pliability to make use of the observability resolution that most closely fits their established processes and preferences when growing complicated brokers that chain a number of instruments and inference steps.
There are two foremost approaches to including W&B Weave observability to AgentCore brokers. Both utilizing the native W&B Weave SDK or integrating by way of OpenTelemetry.
Native W&B Weave SDK
The only strategy is to make use of W&B Weave’s @weave.op decorator to mechanically observe perform calls. Initialize W&B Weave together with your challenge title and wrap the capabilities you need to monitor.
AgentCore runs as a Docker container, so add W&B Weave (for instance, uv add Weave) to your dependencies to incorporate it in your container picture.
OpenTelemetry integration
For groups already utilizing OpenTelemetry or needing vendor-neutral instrumentation, W&B Weave instantly helps OTLP (OpenTelemetry Protocol).
This strategy routes traces to W&B Weave for visualization whereas remaining appropriate with AgentCore’s present OpenTelemetry infrastructure. When utilizing each AgentCore and W&B Weave collectively, groups have a number of choices for observability. AgentCore’s CloudWatch integration supplies tracing of agent inference and gear choice whereas monitoring system well being, useful resource utilization, and error charges. W&B Weave supplies visualization capabilities that show execution knowledge in a format acquainted to groups already utilizing a W&B setting. Each options present visibility into how brokers course of data and make choices, permitting organizations to decide on the observability strategy that most closely fits their present workflows and configurations. This two-tier strategy permits customers to:
- Monitor manufacturing service stage agreements (SLAs) by way of CloudWatch alerts
- Debug complicated agent conduct with W&B Weave’s Hint Explorer
- Optimize token utilization and latency with detailed execution breakdowns.
- Examine agent efficiency throughout totally different prompts and configurations
The combination requires minimal code adjustments and your present AgentCore deployment will be maintained and scaled based on agent complexity. Whether or not you are constructing a easy device invocation agent or orchestrating a multi-step workflow, this observability stack supplies the insights you could iterate rapidly and deploy with confidence.
For implementation particulars and full code examples, see the next documentation: Previous post.
conclusion
On this submit, we demonstrated how Amazon Bedrock’s FM and AgentCore will be mixed with W&B Weave’s complete observability toolkit to construct and optimize enterprise-grade agent AI options. We thought-about how W&B Weave can energy each stage of the LLM growth lifecycle, from preliminary experimentation within the playground to systematic analysis of mannequin efficiency and, in the end, manufacturing monitoring of complicated agent workflows.
The combination of Amazon Bedrock and W&B Weave supplies a number of vital options.
- Mechanically observe Amazon Bedrock FM calls with minimal code adjustments utilizing W&B Weave SDK
- Take a look at prompts and examine fashions for speedy experimentation utilizing W&B Weave Playground’s intuitive interface
- Systematic analysis with customized scoring capabilities to judge totally different Amazon Bedrock fashions
- Complete observability of AgentCore deployments utilizing CloudWatch metrics supplies extra strong operational monitoring, complemented by detailed execution traces.
To get began:
- Request a free trial or subscribe to the Weights &Biases AI growth platform by way of AWS Market
- Set up W&B Weave SDK Observe our code examples to begin monitoring your Bedrock FM calls
- Experiment with totally different fashions within the W&B Weave Playground by including your AWS credentials and testing totally different Amazon Bedrock FMs.
- Arrange an analysis utilizing the W&B Weave analysis framework to systematically examine the efficiency of fashions on your use circumstances.
- Improve your AgentCore agent by including W&B Weave observability utilizing the native SDK or OpenTelemetry integration
Begin with a easy integration to trace Amazon Bedrock calls and steadily undertake extra superior options as your AI software turns into extra complicated. The mixture of Amazon Bedrock and W&B Weave’s complete growth instruments supplies the inspiration you could construct, consider, and preserve production-ready AI options at scale.
In regards to the writer
James Yee I’m a Senior AI/ML Accomplice Options Architect at AWS. He spearheads AWS’ strategic partnerships in rising applied sciences and leads engineering groups to design and develop cutting-edge collaborative options in generative AI. He allows discipline and technical groups to seamlessly deploy, function, safe, and combine companion options on AWS. James works intently with enterprise leaders to outline and execute collaborative go-to-market methods to drive progress for cloud-based companies. Outdoors of labor, I get pleasure from enjoying soccer, touring, and spending time with my household.
ray strickland He’s a Senior Accomplice Options Architect at AWS, specializing in AI/ML, Agenttic AI, and Clever Doc Processing. He allows companions to deploy scalable generative AI options utilizing AWS finest practices and drives innovation by way of strategic companion help applications. Ray collaborates throughout a number of AWS groups to speed up AI adoption and has intensive expertise in companion evaluation and enablement.
thomas capel I am a machine studying engineer at Weights & Biases. He’s liable for maintaining the www.github.com/wandb/examples repository stay and updated. We’re additionally constructing content material about MLOPS, W&B functions to business, and enjoyable deep studying normally. Beforehand, he used deep studying to resolve short-term predictions for photo voltaic power. He has a background in city planning, combinatorial optimization, transportation economics, and utilized arithmetic.
scott juan I’m Alliance Director at Weights & Biases. Previous to becoming a member of W&B, he led quite a few strategic partnerships at AWS and Cloudera. Scott studied supplies engineering and is keen about renewable power.




