Many corporations have massive volumes of paper or digital paperwork that include untapped enterprise intelligence. With the development of generative AI, varied massive language fashions can be utilized to precisely extract related knowledge from these paperwork. This publish demonstrates an clever doc processing pipeline that consists of each on-demand inference and batch inference choices on Amazon Bedrock to allow the pliability on the doc processing time and value. For time-sensitive requests, one can use the on-demand inference choice, whereas the batch inference choice is most value optimized. It additionally explains methods to dynamically specify the big language mannequin and prompts on the doc stage, enabling you to extract knowledge from a number of kinds of paperwork utilizing the identical pipelines.
Answer overview
In the event you, like one in all our clients, have lots of of hundreds of thousands land lease paperwork in scanned PDF format (PDF that incorporates solely photographs with out editable textual content, e.g. on this case, scanned land lease saved as PDF) within the backlog, and new paperwork are nonetheless piling up day by day, it is a answer you should utilize to successfully extract knowledge from these paperwork. As proven within the following diagram, this answer builds two inference pipelines, on-demand and batch, with a mechanism to invoke them dynamically. By utilizing successfully designed prompts managed in Amazon Bedrock Immediate Administration, the information might be extracted and standardized from scan PDFs, which frequently have various codecs and conventions, or from textual content recordsdata.
The pipeline on the left is the on-demand pipeline that extracts knowledge from paperwork one-by-one, returning outcomes inside seconds. This makes it appropriate for time-sensitive requests.
The pipeline on the fitting is the batch inference pipeline that processes a number of doc requests in a single Amazon Bedrock batch inference job, the place your mannequin invocation will likely be processed asynchronously. Customers can specify the immediate ID and model within the request in each pipelines, and the corresponding immediate textual content will likely be retrieved from Amazon Bedrock Immediate Administration.
The next sections present detailed descriptions of each pipelines.
1. On-demand inference pipeline
An AWS SQS First-In, First-Out (FIFO) queue is created within the on-demand inference pipeline. When a queue message containing the doc ID, LLM mannequin ID, immediate ID/model, and system immediate ID/model arrives, it triggers an AWS Lambda perform. This perform retrieves the PDF doc from the required Amazon S3 bucket, converts the PDF pages to PNG photographs, retrieves the related prompts from Amazon Bedrock Immediate Administration, composes the message to name the LLM, and saves the end result into an Amazon DynamoDB desk.
1.1. AWS SQS FIFO queue
An AWS SQS FIFO queue is used to set off Amazon Bedrock inference when a single doc arrives. The important thing causes for utilizing a FIFO queue are:
- Dependable Message Supply – Makes positive that every message is delivered precisely as soon as.
- First-In, First-Out (FIFO) Processing – Maintains a strict ordering, offering higher predictability for processing.
- Message Grouping – The Message Group ID attribute makes positive the messages are processed so as inside every group. Every producer can use a novel Message Group ID to keep up order for associated messages.
How is a queue message created?
The queue messages might be created externally with AWS CLI or AWS SDK API. The next is an AWS CLI command instance:
The file message_txt.txt on this instance is a JSON file containing the message attributes wanted for the applying. See particulars within the Testing the pipelines part under.
The Lambda perform will delete the queue message after Amazon Bedrock has returned the extracted knowledge.
1.2. Lambda perform – queue message processing and inferencing
1.2.1 Retrieving the paperwork, changing to pictures, and splitting massive recordsdata
The Lambda perform downloads the doc utilizing the s3_location attribute within the queue message. If the doc is scanned PDF, it’s then transformed to pictures for the multimodal mannequin to know.
As of this writing, the Claude 4 Sonnet mannequin solely permits a most of 20 photographs per multimodal invocation. Due to this fact, if a doc incorporates greater than 20 pages of photographs, it have to be break up into chunks of 20 pages. The doc_id, chunk_count and chunk_id are saved in an Amazon DynamoDB desk, together with the extracted outcomes and the mannequin efficiency metrics.
doc_id: the identifier of the docchunk_count: the entire variety of chunks for that docchunk_id: the identifier of every chunk of the doc
1.2.2. Retrieving prompts from Amazon Bedrock Immediate Administration
Land lease paperwork fluctuate in format – some current land tract attributes in numbered listing, others in tables, and a few even in land drawings. Therefore, utilizing totally different prompts tailor-made to every doc format enhances extraction accuracy.
The prompts used within the LLM name are saved in Amazon Bedrock Immediate Administration. Every immediate has a novel ID and is versioned. The SQS messages should specify the related immediate ID and model, that are then used to retrieve the immediate physique throughout Lambda execution.
Observe: There’s a service restrict of fifty prompts per area and 10 variations per immediate.
1.2.3 Composing message for LLM calls and processing the response
The Lambda perform continues with the next steps:
- Compose the messages for LLM by concatenating the immediate physique and pictures.
- Ship request(s) to Amazon Bedrock utilizing the Converse API.
The LLM will return the extract knowledge in a JSON string, you possibly can study the lead to your DynamoDB desk as illustrated within the following testing the pipelines part.
1.2.4 Saving the outcomes
Lastly, the Lambda perform completes the method by:
- Parsing the JSON and storing the land tract attributes to the DynamoDB desk.
- If the doc has been efficiently processed and the outcomes are saved, the SQS message is deleted from the queue.
2. Batch inference pipeline
A normal AWS SQS queue is used for the batch inference pipeline due to its excessive throughput. The queue messages are created in the same manner as within the on-demand pipeline, besides the message-group-id attribute is just not required.
The principle parts within the batch inference pipeline contains:
- Amazon EventBridge Scheduler.
- Batch Inference AWS Lambda perform to pre-process the scanned PDFs, create JSONL recordsdata and submit the batch inference job.
- Amazon EventBridge rule.
- Put up-processing AWS Lambda perform.
The next sections describe the small print of the batch inference pipeline.
2.1. Amazon EventBridge scheduler
An Amazon EventBridge Scheduler begins the batch inference Lambda perform on a schedule.
2.2. Batch inference Lambda perform
The perform first checks if there are sufficient messages within the queue earlier than continuing. On the time of writing, there’s a minimal variety of information of 100 for Amazon Bedrock batch inference job.
2.2.1 Receiving queue messages
The Lambda perform loops by the messages within the queue and extracts the doc ID, LLM mannequin ID, immediate ID/model, and system promptID/model.
2.2.2 Retrieving the paperwork with out duplicates, changing to picture, and splitting massive recordsdata
The Lambda perform then retrieves the paperwork, converts them to pictures if they’re scanned PDF, and splits the big recordsdata if essential – simply as within the on-demand pipeline. As a result of the usual SQS queues don’t assure exactly-once message supply, the perform additionally makes positive that duplicate messages are ignored.
2.2.3 Permitting totally different prompts in a batch inference job
Just like the on-demand pipeline, totally different doc codecs require totally different consumer prompts for more practical knowledge extraction.
The supposed immediate ID and model for every doc are specified within the SQS messages. Throughout Lambda execution, the perform retrieves the immediate physique from Amazon Bedrock Immediate Administration.
2.2.4 Creating JSONL artifacts for batch inference job
The Lambda perform then handles the next duties:
- Making a
metadata.jsonwithin the Batch Inference Information S3 bucket to retailer the message attributes, together with the SQS message ID,doc_id, immediate ID/model, system immediate ID/model, and different project-related attributes. This file is later utilized by the Put up-Processing Lambda to populate the DynamoDB desk. - Processing the paperwork to create the JSONL recordsdata required for the Amazon Bedrock batch inference job. This course of is parallelized utilizing Python’s multiprocessing module for effectivity. The JSONL recordsdata are uploaded to the Batch Inference Information S3 bucket.
- Deleting the SQS messages after the paperwork have been ready and uploaded to the S3 bucket. This requires setting a big Visibility Timeout for the queue.
2.2.5 Composing messages and submits batch inference job
Lastly, the batch inference Lambda perform creates the Amazon Bedrock batch inference job utilizing the JSONL artifacts from the earlier step. Observe that every batch job can solely course of paperwork utilizing one mannequin, which means the SQS messages inside the identical batch job should specify the identical mannequin ID. If there are multiple mannequin ID specified within the incoming messages, the Lambda perform makes use of a polling mechanism that selects probably the most incessantly specified mannequin ID to make use of.
2.3. The Amazon Bedrock batch inference job
When Amazon Bedrock receives the batch inference job, it locations it in a queue. As soon as the job begins, it proceeds with the next steps.
2.3.1 Retrieving JSONL artifacts for batch inference job
Amazon Bedrock retrieves the JSONL artifacts specified throughout job creation.
2.3.2 Storing batch inference outputs
Upon completion, Amazon Bedrock shops the outputs to the Batch Inference Information S3 bucket, which can also be specified within the job creation.
2.3.3 Notifying Amazon EventBridge
After job completion, Amazon Bedrock sends a job standing change occasion to Amazon EventBridge, which is captured by an EventBridge rule.
2.4. Amazon EventBridge rule triggers the post-inference Lambda perform
The EventBridge rule triggers the post-processing Lambda perform to deal with additional mannequin output processing.
2.5. Put up-processing Lambda perform
2.5.1 Retrieving the output JSONL
The Lambda perform fetches the inference output JSONL from the batch inference knowledge S3 bucket.
2.5.2 Saving the inference output
The perform parses the JSONL recordsdata and saves the extracted land tract attributes to a DynamoDB desk.
Conditions
If you wish to do that instance your self, be sure you meet these stipulations:
- An AWS account with entry to the AWS Administration Console
- Applicable IAM permissions to create and handle CloudFormation stacks, which usually embrace:
- cloudformation:CreateStack
- cloudformation:DescribeStacks
- cloudformation:UpdateStack
- cloudformation:DeleteStack
Deploying the CloudFormation stacks
Deploy the on-demand pipeline:
![]()
Once you select the Launch Stack hyperlink, you’ll be taken to AWS CloudFormation to launch the CloudFormation stack:
- On the Create stack web page, select Subsequent
- On the Specify stack particulars web page, select Subsequent
- On the Configure stack choices web page, select Subsequent
- On the Evaluate and create web page, choose I acknowledge that AWS CloudFormation would possibly create IAM sources
- Select Submit
After it’s submitted, you possibly can observe some particulars in regards to the stack corresponding to Stack data, Occasions, Useful resource, and extra. The next screenshot is the Occasions on your reference:

You may also deploy the batch pipeline following the identical steps.
![]()
Testing the pipelines
The next steps information you to check the on-demand pipeline. The batch pipeline may also be examined in the same steps if in case you have at lease 100 paperwork.
- Obtain the information to your native atmosphere. There are three land paperwork from Winkler County, Andrews County, and Sutton County which can be bought from the Texas Land Records and County Records web site.
- Add downloaded PDF file(s) to the S3 artifact bucket ondemand-data-pipeline-bucket-${account_id} that’s created in CloudFormation stack.
- Create a textual content file message_txt.json utilizing the next instance by changing the immediate ID, system immediate ID and S3 bucket which can be created out of your CloudFormation stack.
- Create a shell script send2queue.sh by utilizing the above AWS CLI instance by changing the queue identify in and execute it. You will note a message to your SQS queue ondemand-data-pipeline-queue.fifo.
- The queue message will set off the Lambda perform ondemand-data-pipeline-queue-processor.
- Study the Lambda log in Amazon CloudWatch, the log group is /aws/lambda/ondemand-data-pipeline-queue-processor.
- Study the Amazon Bedrock inference output within the DynamoDB ondemand-data-pipeline-table desk. The JSON end result within the
model_responsecolumn for the Winkler County instance ought to appear to be the next:
Cleanup
To scrub up the sources:
- Sign up to the AWS Administration Console
- Navigate to the CloudFormation service
- Within the CloudFormation dashboard, discover and choose the stack you need to delete
- Select the “Delete” button on the high of the web page
- Affirm the deletion when prompted
CloudFormation will routinely delete the sources that had been created as a part of the stack within the appropriate order, dealing with dependencies appropriately.
Deleting the CloudFormation stacks doesn’t delete the S3 buckets and the DynamoDB as a result of their deletion coverage is about to retain to assist stop knowledge loss. To delete these sources, go to every service’s web page within the AWS Administration Console and delete them.
Conclusion
The on-demand and batch Amazon Bedrock inference pipelines introduced on this publish clarify how one can dynamically course of paperwork primarily based on the time sensitivity and knowledge quantity. You also needs to think about the associated fee details when deciding which pipeline to make use of. With the batch pipeline, as present in our checks, the price of Amazon Bedrock is 50% decrease in comparison with on-demand pipeline.
One other key function on this answer is the flexibility to specify the big language mannequin (for on-demand pipeline) and immediate on the particular person doc stage, enabling these pipelines to assist varied kinds of clever doc processing.
With parallelism enabled utilizing the Python’s multiprocessing module, each Lambda capabilities of the batch inference pipeline can course of 1,000 paperwork inside quarter-hour.
Name to motion
Amazon Bedrock can allow you to construct many generative AI functions. We advocate following the fast begin within the following GitHub repo and familiarizing your self with constructing generative AI functions. For superior readers, you possibly can look into methods to scale the answer additional. One thought is to run the Lambda code in AWS Batch as a substitute, permitting tens of hundreds of paperwork to be processed in a single Amazon Bedrock batch inference job.
Concerning the authors

