Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies, such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon, through a single API. To give FMs up-to-date and proprietary information, organizations use Retrieval Augmented Generation (RAG), a technique that fetches data from company data sources and enriches the prompt to provide more relevant and accurate responses. Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation. However, information about one dataset can reside in another dataset, called metadata. Without metadata, the retrieval process can return unrelated results, which decreases FM accuracy and increases the cost of FM prompt tokens.
On March 27, 2024, Amazon Bedrock announced an important new feature called metadata filtering and also changed the default engine. This change lets you use metadata fields during the retrieval process. However, the metadata fields must be configured during the knowledge base ingestion process. Often, you have tabular data where details about one field are available in another field. You might also need to cite the exact text document or text field to prevent hallucinations. In this post, we show you how to use the new metadata filtering feature with Knowledge Bases for Amazon Bedrock for such tabular data.
Solution overview
The solution consists of the following high-level steps:
- Prepare data for metadata filtering.
- Create and populate the knowledge base with data and metadata.
- Use metadata filtering to retrieve data from the knowledge base.
Prepare data for metadata filtering
At the time of writing, Amazon Bedrock knowledge bases support Amazon OpenSearch Serverless, Amazon Aurora, Pinecone, Redis Enterprise, and MongoDB Atlas as underlying vector store providers. In this post, we use the Amazon Bedrock Boto3 SDK to create and access an OpenSearch Serverless vector store. For more information, see Set up a vector index for your knowledge base in a supported vector store.
For this post, we create a knowledge base using the public dataset Food.com – Recipes and Reviews. The following screenshot shows an example of the dataset.
The TotalTime field is in ISO 8601 format. You can convert it to minutes using the following logic:
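A minimal sketch of such a conversion, assuming the durations look like `PT30M`, `PT1H30M`, or `P1DT2H` (days, hours, minutes, and seconds only). The function name `iso8601_to_minutes` is illustrative and not taken from the original post.

```python
import re

# Matches durations such as "PT45M", "PT1H30M", or "P1DT2H".
# Weeks, months, and years are not handled; this sketch assumes recipe
# times never use them.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso8601_to_minutes(duration: str) -> float:
    """Convert an ISO 8601 duration string to a number of minutes."""
    text = duration.strip()
    match = _DURATION.match(text)
    # "P" and "PT" match the pattern but carry no components, so reject them.
    if match is None or text in ("P", "PT"):
        raise ValueError(f"Unsupported ISO 8601 duration: {duration!r}")
    parts = {name: int(value) if value else 0
             for name, value in match.groupdict().items()}
    return (parts["days"] * 24 * 60
            + parts["hours"] * 60
            + parts["minutes"]
            + parts["seconds"] / 60)


print(iso8601_to_minutes("PT1H30M"))
```

With a pandas data frame named `df` (a hypothetical name), the new column could then be derived with `df["TotalTimeInMinutes"] = df["TotalTime"].apply(iso8601_to_minutes)`.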
After converting some of the features, such as CholesterolContent, SugarContent, and RecipeInstructions, the data frame should look like the following screenshot:

To enable the FM to point to a specific menu with a link (that is, to cite the document), we split each row of the tabular data into its own text file. Each file contains RecipeInstructions as the data field, while TotalTimeInMinutes, CholesterolContent, and SugarContent are stored as metadata. The metadata is kept in a separate JSON file with the same name as the data file and .metadata.json appended to it. For example, if the data file name is 100.txt, the metadata file name is 100.txt.metadata.json. For more information, see Add metadata to your files to allow for filtering. In addition, the content of the metadata file must be in the following format:
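As an illustration of that format, the sketch below builds the metadata document for one recipe and prints it as JSON. The top-level `metadataAttributes` key is the structure that Amazon Bedrock knowledge bases expect in a `.metadata.json` file; the helper names and the sample values are hypothetical.

```python
import json


def build_metadata(total_time_in_minutes, cholesterol_content, sugar_content):
    """Return the metadata document for one recipe file."""
    # Values are kept numeric so that numeric filters such as "lessThan"
    # can be applied to them at retrieval time.
    return {
        "metadataAttributes": {
            "TotalTimeInMinutes": total_time_in_minutes,
            "CholesterolContent": cholesterol_content,
            "SugarContent": sugar_content,
        }
    }


def metadata_file_name(data_file_name: str) -> str:
    """Return the metadata file name that pairs with a data file name."""
    return f"{data_file_name}.metadata.json"


# Sample values below are made up for illustration.
print(metadata_file_name("100.txt"))
print(json.dumps(build_metadata(27, 0.0, 3.2), indent=2))
```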
For the sake of simplicity, we process only the top 2,000 rows to create the knowledge base.
- After you import the necessary libraries, create a local directory using the following Python code:
- Iterate over the top 2,000 rows and create a data file and a metadata file for each row, saving them in the local folder:
- Create an Amazon Simple Storage Service (Amazon S3) bucket named food-kb and upload the files:
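The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the original listing: rows are passed as a list of dictionaries (for example from `df.head(2000).to_dict("records")`), the local directory name `food_kb_files` and both function names are hypothetical, and the upload step requires `boto3` and AWS credentials.

```python
import json
from pathlib import Path


def write_recipe_files(rows, out_dir="food_kb_files", limit=2000):
    """Write one data file and one metadata file per row; return data paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the local directory
    written = []
    for index, row in enumerate(rows[:limit]):
        # Data file: the recipe instructions only.
        data_path = out / f"{index}.txt"
        data_path.write_text(str(row["RecipeInstructions"]), encoding="utf-8")
        # Metadata file: same name with ".metadata.json" appended.
        metadata = {
            "metadataAttributes": {
                "TotalTimeInMinutes": row["TotalTimeInMinutes"],
                "CholesterolContent": row["CholesterolContent"],
                "SugarContent": row["SugarContent"],
            }
        }
        meta_path = out / f"{data_path.name}.metadata.json"
        meta_path.write_text(json.dumps(metadata), encoding="utf-8")
        written.append(data_path)
    return written


def upload_directory(out_dir="food_kb_files", bucket="food-kb",
                     region="us-east-1"):
    """Create the bucket and upload every file in out_dir (needs AWS access)."""
    import boto3  # imported lazily so the file-writing step runs without it

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket)
    else:
        # Outside us-east-1, S3 requires an explicit location constraint.
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    for path in sorted(Path(out_dir).iterdir()):
        if path.is_file():
            s3.upload_file(str(path), bucket, path.name)
```

Because S3 bucket names are globally unique, the name food-kb may already be taken; in that case a different bucket name has to be used.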
Create and populate the knowledge base with data and metadata
When the S3 folder is ready, you can create the knowledge base on the Amazon Bedrock console using the SDK by following this sample notebook.
Use metadata filtering to retrieve data from the knowledge base
Now let's retrieve data from the knowledge base. For this post, we use Anthropic Claude Sonnet on Amazon Bedrock as the FM, but you can choose from a variety of Amazon Bedrock models. First, you need to set the following variables, where kb_id is the ID of your knowledge base. You can find the knowledge base ID programmatically, as shown in the sample notebook, or on the Amazon Bedrock console by navigating to the individual knowledge base, as shown in the following screenshot.

Use the following code to set the required Amazon Bedrock parameters:
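A minimal sketch of such a setup is shown here. The Region, the timeout values, the placeholder knowledge base ID, and the function name `make_agent_runtime_client` are assumptions; the model ID is the Amazon Bedrock identifier for Anthropic Claude 3 Sonnet.

```python
REGION = "us-east-1"  # assumption: use the Region that hosts your knowledge base
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"
MODEL_ARN = f"arn:aws:bedrock:{REGION}::foundation-model/{MODEL_ID}"
KB_ID = "YOUR_KB_ID"  # placeholder: replace with your knowledge base ID


def make_agent_runtime_client(region=REGION):
    """Create the client used for the Retrieve and RetrieveAndGenerate APIs."""
    import boto3  # imported lazily; requires boto3 and AWS credentials
    from botocore.config import Config

    # Generous timeouts, because generation calls can take a while.
    config = Config(connect_timeout=120, read_timeout=120,
                    retries={"max_attempts": 0})
    return boto3.client("bedrock-agent-runtime", region_name=region,
                        config=config)
```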
The following is the output of the results retrieved from the knowledge base without any metadata filtering for the query “What recipes can I make in under 30 minutes that have less than 10 cholesterol?” As you can see, the preparation times of the two recipes are 30 and 480 minutes, respectively, and the cholesterol contents are 86 and 112.4, respectively. Therefore, the retrieval does not follow the query accurately.

The following code shows how to use the Retrieve API for the same query with metadata filters set to a cholesterol content of less than 10 and a preparation time of less than 30 minutes:
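One way to express this call is sketched below, assuming `client` is a Boto3 `bedrock-agent-runtime` client and that the metadata keys match the names used at ingestion (CholesterolContent and TotalTimeInMinutes). The function names and the default of two results are illustrative choices.

```python
def build_recipe_filter(max_cholesterol=10, max_minutes=30):
    """Combine both numeric conditions with a logical AND."""
    return {
        "andAll": [
            {"lessThan": {"key": "CholesterolContent",
                          "value": max_cholesterol}},
            {"lessThan": {"key": "TotalTimeInMinutes",
                          "value": max_minutes}},
        ]
    }


def retrieve_with_filter(client, kb_id, query, number_of_results=2):
    """Call the Retrieve API with the metadata filter applied."""
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": number_of_results,
                "filter": build_recipe_filter(),
            }
        },
    )
    # Each result holds the matched text, its S3 location, and its metadata.
    return response["retrievalResults"]
```

The filter is evaluated against the metadata stored with each file, so only chunks whose attributes satisfy both conditions are candidates for the vector search results.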
As you can see from the following results, the preparation times of the two recipes are 27 and 20 minutes, respectively, and the cholesterol contents are 0 and 0, respectively. With metadata filtering, you get more accurate results.

The following code shows how to use the same metadata filtering with the retrieve_and_generate API to get accurate output. First configure the prompt, then configure the API with metadata filtering:
As you can see in the following output, the model returns a detailed recipe that respects the metadata filtering: a preparation time of less than 30 minutes and a cholesterol content of less than 10.
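A sketch of that two-step setup follows, assuming `client` is a Boto3 `bedrock-agent-runtime` client. The prompt wording and the function name are hypothetical; `$search_results$` is the placeholder that the knowledge base prompt template replaces with the retrieved passages.

```python
# Hypothetical prompt wording; only the $search_results$ placeholder is
# required by the knowledge base prompt template.
PROMPT_TEMPLATE = (
    "You are a helpful cooking assistant. Answer the user's question using "
    "only the recipes in the search results below.\n\n"
    "$search_results$"
)


def retrieve_and_generate_with_filter(client, kb_id, model_arn, query,
                                      max_cholesterol=10, max_minutes=30):
    """Call RetrieveAndGenerate with a custom prompt and a metadata filter."""
    metadata_filter = {
        "andAll": [
            {"lessThan": {"key": "CholesterolContent",
                          "value": max_cholesterol}},
            {"lessThan": {"key": "TotalTimeInMinutes",
                          "value": max_minutes}},
        ]
    }
    response = client.retrieve_and_generate(
        input={"text": query},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 2,  # assumption: two passages
                        "filter": metadata_filter,
                    }
                },
                "generationConfiguration": {
                    "promptTemplate": {
                        "textPromptTemplate": PROMPT_TEMPLATE
                    }
                },
            },
        },
    )
    # The generated answer is returned under output.text.
    return response["output"]["text"]
```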

Clean up
If you plan to use the knowledge base you created to build a RAG application, make sure to comment out the following section. If you only wanted to try creating a knowledge base using the SDK, make sure to delete all the resources you created, because storing documents in an OpenSearch Serverless index incurs costs. See the following code:
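The sketch below shows one possible cleanup sequence rather than the original listing. It assumes you pass in Boto3 clients for `bedrock-agent`, `opensearchserverless`, and `s3`, along with the IDs created earlier; the function name is hypothetical. IAM roles and OpenSearch Serverless security and access policies created during setup are not covered here and would need to be deleted separately.

```python
def delete_kb_resources(bedrock_agent, oss, s3, kb_id, data_source_id,
                        collection_id, bucket):
    """Delete the data source, knowledge base, vector collection, and bucket."""
    # Remove the data source first, then the knowledge base that owns it.
    bedrock_agent.delete_data_source(knowledgeBaseId=kb_id,
                                     dataSourceId=data_source_id)
    bedrock_agent.delete_knowledge_base(knowledgeBaseId=kb_id)

    # Delete the OpenSearch Serverless collection that holds the index.
    oss.delete_collection(id=collection_id)

    # A bucket must be empty before it can be deleted.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})
    s3.delete_bucket(Bucket=bucket)
```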
Conclusion
In this post, we showed how to split a large tabular dataset into rows, set up a knowledge base with metadata for each file, and use metadata filtering to retrieve results. We also showed that retrieval with metadata filtering is more accurate than retrieval without it. Finally, we showed how to use the FM to obtain accurate output.
To further explore the capabilities of Knowledge Bases for Amazon Bedrock, see the following resources:
About the Author
Tanay Choudhury is a Data Scientist at the Generative AI Innovation Center at Amazon Web Services, where he helps customers solve business problems using generative AI and machine learning.

