Amazon's Finance Technology (FinTech) teams build and operate systems that help Amazon businesses handle regulatory inquiries in compliance with different jurisdictions. These teams process regulatory inquiries from government authorities, each presenting different requirements, document formats, and complexity levels.
Processing these regulatory inquiries involves reviewing documentation, extracting relevant information, retrieving supporting data from multiple systems within Amazon's infrastructure, and compiling responses within regulatory timeframes. As inquiry frequency and business complexity grew, Amazon needed a more scalable approach.
In this post, we demonstrate how Amazon FinTech teams are using Amazon Bedrock and other AWS services to build a scalable AI application that transforms how regulatory inquiries are handled. Each team using this solution creates and maintains its own dedicated knowledge base, populated with that team's specific documents and reference materials.
Challenges
The scale and complexity of managing regulatory inquiries presented several interconnected challenges:
Knowledge fragmentation and retrieval complexity
Regulatory inquiries require synthesizing information from thousands of historical documents. These documents exist in various formats (PDF, PPT, Word, CSV) and contain domain-specific terminology. Teams needed a way to quickly locate relevant precedents and supporting information across this large corpus while maintaining accuracy and regulatory compliance.
Conversational context and state management
Regulatory inquiries require multi-turn conversations where context from earlier interactions is essential for generating accurate responses. Maintaining conversational state across sessions and tracking how responses evolve as team members refine answers through iterative interactions presents significant complexity.
Observability and continuous improvement
With generative AI systems, understanding why a particular response was generated is as important as the response itself. Teams required comprehensive visibility into the retrieval process, model choices, and user interactions to identify areas for improvement and maintain compliance with responsible AI principles. For example, teams must detect when the model hallucinates information that isn't present in source documents, or catch when the system retrieves outdated compliance guidelines that could lead to regulatory violations. AI systems experience accuracy drift over time as models, prompts, and the document corpus change, requiring continuous monitoring.
Solution overview
To address these challenges, the Amazon FinTech team built an intelligent regulatory response automation system using Amazon Bedrock, AWS Lambda, and supporting AWS services. The solution implements Retrieval Augmented Generation (RAG) with Amazon Bedrock Knowledge Bases and Amazon OpenSearch Serverless for vector storage, enabling information retrieval from thousands of historical documents. Real-time chat interactions powered by Claude Sonnet 4.5 through the Converse Stream API, combined with Amazon DynamoDB for conversation history management, provide contextually aware multi-turn conversations. Comprehensive observability through OpenTelemetry and self-hosted Langfuse ensures continuous monitoring and improvement of the AI system's performance. The system doesn't cache large language model (LLM) responses or intermediate results because regulatory inquiries are highly contextual and would yield a low cache hit rate.
The following diagram shows how you can use Amazon Bedrock Knowledge Bases in a workflow, alongside the Converse API and other tools, to provide critical information for regulatory inquiries:
Knowledge base ingestion flow
The knowledge base ingestion flow provides an automated document processing pipeline that starts when a user uploads a document and embeds the document's data into an Amazon Bedrock knowledge base. You can use this workflow to upload documents in bulk and transform them into searchable vector embeddings; the detailed flow, illustrated in the preceding figure, is as follows:
- Document Upload by User: Users upload documents through the client application.
- Pre-Signed URL Generation: The client application sends a request to Amazon API Gateway, which invokes the knowledge base ingestion AWS Lambda function to generate a pre-signed S3 URL.
- Document Upload: The client application uses the generated pre-signed URL to upload the document.
- Ingestion Trigger and Data Processing: After the document is successfully uploaded to Amazon Simple Storage Service (Amazon S3), the client application calls Amazon API Gateway to initiate the document processing AWS Lambda function, which handles format conversion and manages the concurrent ingestion of documents. We don't need to pre-process the images, charts, and tables in these documents because the Amazon Bedrock knowledge base is configured with Amazon Bedrock Data Automation (BDA) to effectively extract this multimodal content. The AWS Lambda function then calls Amazon Bedrock Knowledge Bases.
- Vector Storage: The Amazon Bedrock knowledge base chunks the document content using a hierarchical chunking strategy, generates embeddings using Amazon Titan Text Embeddings, and stores the resulting vectors in OpenSearch Serverless. Hierarchical chunking creates nested parent-child relationships that mirror the sectioned structure of financial documents. This strategy works well for structured and complex documents because it indexes small chunks for precise retrieval while returning larger parent chunks to provide sufficient context for coherent responses.
Building an automated ingestion pipeline addresses the core challenge of knowledge fragmentation by efficiently processing thousands of historical documents across multiple formats while optimizing content indexing for relevant AI responses. This parallelized approach enables the system to scale effectively, accommodating growing year-over-year regulatory inquiry activity while maintaining consistent processing performance across large document volumes.
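To make the parent-child idea concrete, here is a minimal, self-contained sketch of hierarchical chunking. It is not the Bedrock Knowledge Bases implementation: the word-count chunk sizes, the split heuristic, and the toy term-overlap scorer are all assumptions for illustration; a real system embeds the child chunks and uses vector similarity.

```python
def hierarchical_chunks(text, parent_size=400, child_size=100):
    """Split text into large parent chunks, then small child chunks.

    Child chunks are what gets indexed for precise search; each child
    keeps a pointer back to its parent so retrieval can return the
    larger parent span for fuller context.
    """
    words = text.split()
    parents = []   # large spans returned to the model as context
    index = []     # flat list of small chunks, each linked to a parent
    for p_start in range(0, len(words), parent_size):
        parent_text = " ".join(words[p_start:p_start + parent_size])
        parent_id = len(parents)
        parents.append(parent_text)
        p_words = parent_text.split()
        for c_start in range(0, len(p_words), child_size):
            child_text = " ".join(p_words[c_start:c_start + child_size])
            index.append({"child": child_text, "parent_id": parent_id})
    return parents, index


def retrieve(query_terms, parents, index):
    """Toy retrieval: score children by term overlap, return parent text."""
    def score(chunk):
        return sum(term in chunk["child"] for term in query_terms)
    best = max(index, key=score)
    return parents[best["parent_id"]]
```

The key property the sketch demonstrates is the asymmetry: matching happens against small, precise children, but the text handed back as context is the larger parent.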
Chat application
The chat application provides a real-time conversation interface powered by AWS serverless architecture, enabling natural language interactions with the system. We chose to stream responses so users can start reading the AI response sooner in real time, implementing this capability through WebSocket connections. Through these WebSocket connections and the Claude Sonnet 4.5 model, it delivers contextually relevant responses while maintaining conversation state in DynamoDB. The workflow operates as follows:
- Initiate Chat Conversation: Users start a new chat session or open an existing one through the client application.
- WebSocket Connection: The application uses WebSockets to establish a persistent, bi-directional connection with Amazon API Gateway.
- Message Submission: The application posts the user's questions through the WebSocket connection, which are propagated to the Chat Service AWS Lambda function.
- Query Enhancement: The Chat Service AWS Lambda function uses the Claude 3.5 Haiku model with a query expansion strategy to generate multiple variations of the user's question.
- Knowledge Retrieval: The Chat Service Lambda invokes the Amazon Bedrock Knowledge Bases Retrieve API for each expanded query. The API performs vector similarity searches against the underlying OpenSearch Serverless index and returns the most relevant document chunks along with their source metadata and relevance scores.
- Context Assembly: The Chat Service AWS Lambda function retrieves conversation history from Amazon DynamoDB (for existing conversations, based on that specific conversation ID) and combines it with the retrieved knowledge base results and the user's question.
- Response Generation: The Chat Service AWS Lambda function uses the Converse Stream API with Claude Sonnet 4.5 and a response generator prompt to produce a contextually relevant answer based on the assembled context.
- User Engagement: The Chat Service AWS Lambda function streams the generated response back to the client application in Markdown format through the WebSocket connection and stores the complete conversation in the Conversation History table in Amazon DynamoDB.
- Observability: Throughout the process, the Chat Service publishes end-to-end traces to a self-hosted Langfuse instance using the OpenTelemetry (OTEL) SDK. This captures detailed telemetry data including latency metrics, token usage, prompt templates, and model responses.
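The context assembly and response generation steps above can be sketched as a pure function that builds the message list shape used by the Bedrock Converse and ConverseStream APIs. The prompt wording, the chunk field names, and the choice to put retrieved context in the system prompt are assumptions for illustration, not the team's actual prompt design.

```python
def assemble_messages(history, retrieved_chunks, question):
    """Return (system_prompt, messages) for a Converse-style call.

    history: list of {"role": "user"|"assistant", "text": str}
    retrieved_chunks: list of {"text": str, "source": str, "score": float}
    """
    # Put retrieved context in the system prompt, highest score first,
    # with source names so the model can cite its grounding documents.
    context_lines = [
        f"[{c['source']}] (score {c['score']:.2f}): {c['text']}"
        for c in sorted(retrieved_chunks, key=lambda c: -c["score"])
    ]
    system_prompt = (
        "Answer using only the context below; cite sources.\n\n"
        + "\n".join(context_lines)
    )
    # Converse-API message shape: each turn is a role plus content blocks.
    messages = [
        {"role": m["role"], "content": [{"text": m["text"]}]}
        for m in history
    ]
    messages.append({"role": "user", "content": [{"text": question}]})
    return system_prompt, messages
```

The returned pair would then be passed to a `converse_stream` call on the Bedrock Runtime client, with the response tokens relayed to the WebSocket connection as they arrive.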
Multi-turn conversational experience
Regulatory inquiry discussions often progress through multiple exchanges as teams refine responses and reference additional data sources. To support this iterative process, the Amazon FinTech team implemented a multi-turn conversational workflow using Amazon API Gateway (WebSocket APIs), AWS Lambda, and Amazon DynamoDB, integrated with the Amazon Bedrock ConverseStream API for low-latency, context-aware dialogue. Each chat session is securely authenticated through Amazon Cognito and assigned a unique conversation ID. DynamoDB stores messages in chronological order to preserve context across sessions, so users can resume prior discussions seamlessly and maintain continuity.
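One way to model the per-message persistence just described is one DynamoDB item per turn, keyed by conversation ID with a timestamp sort key so a single Query returns the thread in chronological order. The attribute names below are hypothetical, not the team's actual table schema.

```python
import time

def build_message_item(conversation_id, role, text, user_id):
    """Sketch of a per-turn item for a conversation history table.

    conversation_id is the partition key; created_at (epoch ms) is the
    sort key, which keeps a Query's results in chronological order.
    """
    return {
        "conversation_id": conversation_id,     # partition key
        "created_at": int(time.time() * 1000),  # sort key, ms precision
        "role": role,                           # "user" or "assistant"
        "text": text,
        "user_id": user_id,                     # e.g., a Cognito identity
    }
```

Writing each turn would then be a `put_item` call, and resuming a session a `Query` on `conversation_id`, which is what lets the Chat Service rebuild context for an existing conversation ID.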
When a user submits a query, the system sanitizes inputs to prevent prompt injection attacks. After sanitization, the system classifies intent and determines whether retrieval from the Amazon Bedrock knowledge base is required. This determination is made through an LLM call that classifies the user query as either conversational or knowledge intensive. For complex, knowledge-intensive questions, the workflow employs a query expansion strategy that addresses users' prevalent use of acronyms and abbreviated questions. This layer generates up to five query variations using Claude 3.5 Haiku, then makes parallel Retrieve API calls to the knowledge base, retrieving relevant results using OpenSearch vector similarity search. To maintain performance at scale, the workflow implements parallel processing for these retrieval calls using multi-threading. This optimization decreased retrieval latency from 10 seconds (sequential processing) to under 2 seconds, enabling responsive conversations. The retrieved information, combined with recent conversation history, is passed to Claude Sonnet 4.5 through the ConverseStream API augmented with Amazon Bedrock Guardrails, which enforce sensitive information filters to automatically detect and remove PII and financial data from both inputs and outputs. This is crucial for safeguarding regulatory documentation. When prompt injection attempts are detected, the system responds with "Sorry, the model cannot answer that question," maintaining secure and compliant interactions while preserving conversational fluency.
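The fan-out retrieval pattern described above can be sketched with a thread pool. Here `expand_fn` stands in for the Claude 3.5 Haiku query-expansion call and `retrieve_fn` for the Knowledge Bases Retrieve API; the merge-by-best-score de-duplication is an assumption about how overlapping results might be combined.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_retrieve(question, expand_fn, retrieve_fn,
                      max_variations=5, top_k=5):
    """Retrieve for up to five query variations in parallel, then merge."""
    variations = expand_fn(question)[:max_variations]
    # Retrieval calls are I/O-bound, so threads run them concurrently;
    # this is the optimization that cut latency from ~10s to under 2s.
    with ThreadPoolExecutor(max_workers=max_variations) as pool:
        result_lists = list(pool.map(retrieve_fn, variations))
    # De-duplicate chunks retrieved by multiple variations, keeping the
    # best relevance score observed for each chunk.
    best = {}
    for results in result_lists:
        for chunk in results:
            key = chunk["id"]
            if key not in best or chunk["score"] > best[key]["score"]:
                best[key] = chunk
    return sorted(best.values(), key=lambda c: -c["score"])[:top_k]
```

Because each variation's Retrieve call is independent, total latency approaches that of the slowest single call rather than the sum of all of them.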
This architecture delivers continuity, transparency, and scalability. Users receive real-time, streaming responses with status updates throughout the retrieval and generation phases, improving engagement and reducing perceived latency. Persistent logs in DynamoDB provide an immutable audit trail for compliance review, while the serverless, event-driven design scales automatically to support concurrent sessions. Together, these capabilities enable the Amazon FinTech team to conduct complex, iterative conversations, producing contextually relevant, secure, and regulatory-compliant responses powered by Amazon Bedrock.
Observability
Observability plays a critical role in understanding and improving AI-driven workflows. To gain full visibility into the regulatory inquiry response system, the Chat Service AWS Lambda function integrates OpenTelemetry (OTEL) with a self-hosted Langfuse instance to capture detailed, end-to-end traces of each interaction. This setup gives engineers and applied scientists fine-grained telemetry on how prompts are processed, knowledge is retrieved, and responses are generated, enabling near-continuous refinement of the system's performance and accuracy. The decision to use OTEL over the native Langfuse SDK provides vendor-neutral flexibility, allowing telemetry data to be routed to multiple observability backends and adapted to evolving monitoring requirements.
At runtime, each stage of the Chat Service AWS Lambda function is manually instrumented using the OTEL Java SDK to record latency, token usage, model choices, and prompt metadata following the OTEL generative AI semantic conventions. Spans are published to Langfuse in near real time, giving the team a transparent view of how the Amazon Bedrock ConverseStream API, knowledge base retrieval, and Claude Sonnet 4.5 interact within a single request. The detailed telemetry allows the team to identify performance bottlenecks, optimize prompt strategies, and improve retrieval precision while maintaining responsible AI practices.
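Although the service instruments spans in Java, the OTEL generative AI semantic convention attribute names are language-neutral. This sketch, shown in Python for brevity, builds the attribute map a model-invocation span might carry; the model ID and token counts are illustrative, and a real implementation would set these via the OTEL SDK's span API rather than a plain dict.

```python
def genai_span_attributes(model_id, input_tokens, output_tokens):
    """Attribute map for a chat-model span, per OTEL GenAI conventions.

    Attribute names follow the (still incubating) gen_ai.* semantic
    conventions; consult the current spec before relying on them.
    """
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.system": "aws.bedrock",
        "gen_ai.request.model": model_id,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
```

Latency is deliberately absent: a span carries it implicitly as the gap between its start and end timestamps, which is how the team's per-stage latency metrics surface in Langfuse.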
This observability framework maintains trust and accountability in the system's behavior. Engineers can correlate user actions with model outcomes, trace data lineage across multiple services, and fine-tune configurations without disrupting operations. By combining OpenTelemetry's interoperability with Langfuse's visualization and analytics, the Amazon FinTech team gains a scalable, extensible foundation for evaluating generative AI systems at scale, turning each interaction into actionable insight for continuous improvement.
The following screenshot illustrates an end-to-end trace captured in Langfuse, showing how the observability solution captures the complete workflow, from query expansion and knowledge retrieval to model prompts, responses, and latency metrics. It also highlights source document citations, offering a transparent view of how contextual information flows through the system during response generation.
Reference: End-to-end trace captured in Langfuse
Conclusion
In this post, you saw how the Amazon FinTech team built a scalable AI solution using Amazon Bedrock, designed to support regulatory inquiries by automating knowledge retrieval, conversational workflows, and response generation. By combining a document ingestion pipeline, multi-turn stateful conversations, and detailed observability via OpenTelemetry and Langfuse, the architecture empowers teams to handle regulatory inquiries in a governed, traceable, and compliant manner.
Because the entire stack is built on AWS serverless services, it offers the operational scalability, security, and elasticity required for enterprise-grade deployment. Whether you're dealing with legal compliance, regulatory inquiries, or high-volume internal knowledge workflows, this pattern offers a practical foundation that you can tailor and extend to your business domain.
If you're ready to modernize your knowledge-intensive processes with generative AI, explore the Amazon Bedrock documentation to discover how you can begin building your own secure, governed, and scalable AI-powered workflows.
About the authors


