Introducing document-level sync reports: Enhanced data sync visibility in Amazon Q Business

by root August 18, 2024


Amazon Q Business is a fully managed, generative artificial intelligence (AI)-powered assistant that helps enterprises unlock the value of their data and knowledge. With Amazon Q, you can quickly find answers to questions, generate summaries and content, and complete tasks by using the information and expertise stored across your company’s various data sources and enterprise systems. At the core of this capability are native data source connectors that seamlessly integrate and index content from multiple repositories into a unified index. This enables the Amazon Q large language model (LLM) to provide accurate, well-written answers by drawing from the consolidated data and information. The data source connectors act as a bridge, synchronizing content from disparate systems like Salesforce, Jira, and SharePoint into a centralized index that powers the natural language understanding and generative abilities of Amazon Q.

Customers appreciate that Amazon Q Business securely connects to over 40 data sources. While using their data source, they want better visibility into the document processing lifecycle during data source sync jobs. They want to know the status of each document they attempted to crawl and index, as well as the ability to troubleshoot why certain documents were not returned with the expected answers. Additionally, they want access to metadata, timestamps, and access control lists (ACLs) for the indexed documents.

We are pleased to announce a new feature now available in Amazon Q Business that significantly improves visibility into data source sync operations. The latest release introduces a comprehensive document-level report incorporated into the sync history, providing administrators with granular indexing status, metadata, and ACL details for every document processed during a data source sync job. This enhancement to sync job observability enables administrators to quickly investigate and resolve ingestion or access issues encountered while setting up an Amazon Q Business application. The detailed document reports are persisted in the new SYNC_RUN_HISTORY_REPORT log stream under the Amazon Q Business application log group, so critical sync job details are available on demand when troubleshooting.

Lifecycle of a document in a data source sync run job

In this section, we examine the lifecycle of a document within a data source sync in Amazon Q Business. This provides valuable insight into the sync process. The data source sync comprises three key stages: crawling, syncing, and indexing. Crawling involves the connector connecting to the data source and extracting documents that meet the defined sync scope according to the data source configuration. These documents are then synced to Amazon Q Business during the syncing stage. Finally, indexing makes the synced documents searchable within the Amazon Q Business environment.

The following diagram shows a flowchart of a sync run job.

Crawling stage

The first stage is the crawling stage, where the connector crawls all documents and their metadata from the data source. During this stage, the connector also compares the checksum of the document against the Amazon Q index to determine whether a particular document needs to be added, modified, or deleted from the index. This operation corresponds to the CrawlAction field in the sync run history report.

If the document is unmodified, it’s marked as UNMODIFIED and skipped in the rest of the stages. If any document fails in the crawling stage, for example due to throttling errors, broken content, or because the document size is too big, that document is marked as failed in the sync run history report with the CrawlStatus as FAILED. If the document was skipped due to any validation errors, its CrawlStatus is marked as SKIPPED. These documents are not sent forward to the next stage. All successful documents are marked as SUCCESS and are sent forward.

We also capture the ACLs and metadata on each document in this stage so that they can be added to the sync run history report.
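
The checksum comparison described above can be illustrated with a minimal sketch. This is not the connector's actual implementation: the SHA-256 digest and the action names ADD, MODIFY, and DELETE are assumptions made for illustration (only UNMODIFIED is named in this post), and the function names are hypothetical.

```python
import hashlib
from typing import Optional


def checksum(content: bytes) -> str:
    """Return a hex digest of the content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def crawl_action(source_content: Optional[bytes], indexed_checksum: Optional[str]) -> str:
    """Decide what a sync run would do with one document.

    source_content is None when the document no longer exists in the data source.
    indexed_checksum is None when the document is not yet in the index.
    The returned labels other than UNMODIFIED are hypothetical.
    """
    if source_content is None and indexed_checksum is None:
        raise ValueError("document is unknown to both the data source and the index")
    if source_content is None:
        return "DELETE"      # present in the index, gone from the source
    if indexed_checksum is None:
        return "ADD"         # present in the source, not yet indexed
    if checksum(source_content) == indexed_checksum:
        return "UNMODIFIED"  # same content, skipped in later stages
    return "MODIFY"          # content changed since the last sync
```

For example, `crawl_action(b"v2", checksum(b"v1"))` yields the hypothetical MODIFY label, while passing identical content yields UNMODIFIED.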

Syncing stage

During the syncing stage, the document is sent to Amazon Q Business ingestion service APIs like BatchPutDocument and BatchDeleteDocument. After a document is submitted to these APIs, Amazon Q Business runs validation checks on the submitted documents. If any document fails these checks, its SyncStatus is marked as FAILED. If there is an irrecoverable error for a particular document, it is marked as SKIPPED and other documents are sent forward.

Indexing stage

In this step, Amazon Q Business parses the document, processes it according to its content type, and persists it in the index. If the document fails to be persisted, its IndexStatus is marked as FAILED; otherwise, it’s marked as SUCCESS.

After the statuses of all the stages have been captured, we emit these statuses as an Amazon CloudWatch event to the customer’s AWS account.

Key features and benefits of document-level reports

The following are the key features and benefits of the new document-level report in Amazon Q Business applications:

Enhanced sync run history page – A new Actions column has been added to the sync run history page, providing access to the document-level report for each sync run.
Dedicated log stream – A new log stream named SYNC_RUN_HISTORY_REPORT has been created in the Amazon Q Business CloudWatch log group, containing the document-level report.
Comprehensive document information – The document-level report includes the following information for each document.
Document ID – This is the document ID that is inherited directly from the data source or mapped by the customer in the data source field mappings.
Document title – The title of the document is taken from the data source or mapped by the customer in the data source field mappings.
Consolidated document status (SUCCESS, FAILED, or SKIPPED) – This is the final consolidated status of the document. If the document was successfully processed in all stages, the value is SUCCESS. If the document failed or was skipped in any of the stages, the value of this field is FAILED or SKIPPED.
Error message (if the document failed) – This field contains the error message with which a document failed. If a document was skipped due to throttling errors or any internal errors, this is shown in the error message field.
Crawl status – This field denotes whether the document was crawled successfully from the data source. This status correlates to the syncing-crawling state in the data source sync.
Sync status – This field denotes whether the document was sent for syncing successfully. This correlates to the syncing-indexing state in the data source sync.
Index status – This field denotes whether the document was successfully persisted in the index.
ACLs – This field contains a list of document-level permissions that were crawled from the data source. The details of each element in the list are:
- Global name: This is the email or user name of the user. This field is mapped across multiple data sources. For example, if a user has three data sources (Confluence, SharePoint, and Gmail) with the local user IDs confluence_user, sharepoint_user, and gmail_user respectively, and their email address user@email.com is the globalName in the ACL for all of them, then Amazon Q Business understands that all of these local user IDs map to the same global name.
- Name: This is the local unique ID of the user, which is assigned by the data source.
- Type: This field indicates the principal type. This can be either USER or GROUP.
- Is Federated: This is a boolean flag that indicates whether the group is of INDEX level (true) or DATASOURCE level (false).
- Access: This field indicates whether the user has access allowed or denied explicitly. Values can be either ALLOWED or DENIED.
- Data source ID: This is the data source ID. For federated groups (INDEX level), this field is null.
Metadata – This field contains the metadata fields (other than ACL) that were pulled from the data source. This list also includes the metadata fields mapped by the customer in the data source field mappings, as well as additional metadata fields added by the connector.
Hashed document ID (for troubleshooting assistance) – To safeguard your data privacy, we present a secure, one-way hash of the document identifier. This hashed value enables the Amazon Q Business team to efficiently locate and analyze the specific document within our logs, should you encounter any issue that requires further investigation and resolution.
Timestamp – The timestamp indicates when the document status was logged in CloudWatch.
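
As a rough illustration of how the consolidated status relates to the three per-stage statuses, consider the following sketch. It assumes the first FAILED or SKIPPED status in stage order determines the consolidated value. This post does not state a precedence rule, and the function name is hypothetical.

```python
from typing import Optional


def consolidated_status(
    crawl: Optional[str],
    sync: Optional[str] = None,
    index: Optional[str] = None,
) -> str:
    """Combine CrawlStatus, SyncStatus, and IndexStatus into one value.

    Assumption: a document that fails or is skipped in a stage is not sent
    forward, so the first non-SUCCESS status decides the result.
    """
    for status in (crawl, sync, index):
        if status in ("FAILED", "SKIPPED"):
            return status
        if status != "SUCCESS":
            # A missing or unknown status before any failure is unexpected here.
            raise ValueError(f"unexpected or missing stage status: {status!r}")
    return "SUCCESS"
```

A document with CrawlStatus SUCCESS and SyncStatus SKIPPED would therefore be reported as SKIPPED without an IndexStatus being consulted.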

In the following sections, we explore different use cases for the logging feature.

Troubleshoot “Sorry, I could not find relevant information” with the new logging feature

The new document-level logging feature in Amazon Q Business can help troubleshoot common issues related to the “Sorry, I could not find relevant information to complete your request” response.

Let’s explore an example scenario. A mutual fund manager uses Amazon Q Business chat for knowledge retrieval and insights extraction across their enterprise data stores. When the fund manager asks, “What is the CAGR of the multi-asset fund?” in the Amazon Q chat, they receive the “Sorry, I could not find relevant information to complete your request” response.

As the administrator managing their Amazon Q Business application, you can troubleshoot the issue using the following approach with the new logging feature. First, determine whether the multi-asset fund document was successfully indexed in the Amazon Q Business application. Next, verify whether the fund manager’s user account has the required permission to read the information from the multi-asset fund document. Amazon Q Business enforces the document permissions configured in its data source, and you can use this new feature to verify that the document ACL settings are synced in the Amazon Q Business application index.

You can use the following CloudWatch Logs Insights query string to check the document ACL settings:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and DocumentTitle = "your-document-title"
| fields DocumentTitle, ConnectorDocumentStatus.Status, Acl
| sort @timestamp desc
| limit 1

This query filters on the per-document-level logging stream SYNC_RUN_HISTORY_REPORT and displays the document title, its consolidated status, and its associated ACL settings from the most recent log entry. By verifying the document indexing and permissions, you can identify and resolve potential issues that may be causing the “Sorry, I could not find relevant information” response.

The following screenshot shows an example result.
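
If you prefer to run such a query from a script rather than the console, the following is a minimal sketch. It assumes `logs_client` is a CloudWatch Logs client such as `boto3.client("logs")` and uses its `start_query` and `get_query_results` operations. The helper name, the polling budget, and the log group name you pass in are placeholders for illustration, not values taken from this post.

```python
import time
from typing import Any, Dict, List


def run_insights_query(
    logs_client: Any,
    log_group: str,
    query: str,
    start_time: int,
    end_time: int,
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> List[Dict[str, str]]:
    """Run a Logs Insights query and return each result row as a dict.

    start_time and end_time are epoch seconds. logs_client is assumed to
    behave like boto3.client("logs").
    """
    started = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=started["queryId"])
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("query did not complete within the polling budget")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.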

Determine the optimal boosting period for recent documents using document-level reporting

When it comes to generating accurate answers, you may want to fine-tune the way Amazon Q prioritizes its content. For instance, you may prefer to boost recent documents over older ones to make sure the most up-to-date passages are used to generate an answer. To achieve this, you can use the relevance tuning feature in Amazon Q Business to boost documents based on the last update date attribute, with a specified boosting duration. However, determining the optimal boosting period can be challenging when dealing with a large number of frequently changing documents.

You can now use the per-document-level report to obtain the _last_updated_at metadata field information for your documents, which can help you determine the appropriate boosting period. For this, you use the following CloudWatch Logs Insights query to retrieve the _last_updated_at metadata attribute for machine learning documents from the SYNC_RUN_HISTORY_REPORT log stream:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and Metadata like 'Machine Learning'
| parse Metadata '{"key":"_last_updated_at","value":{"dateValue":"*"}}' as @last_updated_at
| sort @last_updated_at desc, @timestamp desc
| dedup DocumentTitle

With the preceding query, you can gain insights into the last updated timestamps of your documents, enabling you to make informed decisions about the optimal boosting period. This approach makes sure your chat responses are generated using the most recent and relevant information, enhancing the overall accuracy and effectiveness of your Amazon Q Business implementation.

The following screenshot shows an example result.
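
One possible way to turn the retrieved timestamps into a candidate boosting period is to look at how old most documents are. The sketch below is only a heuristic and not a method prescribed by Amazon Q Business. It assumes the `dateValue` strings are ISO 8601 timestamps ending in `Z`, and the function name and the nearest-rank percentile choice are illustrative.

```python
from datetime import datetime, timezone
from typing import Iterable


def age_days_percentile(last_updated: Iterable[str], now: datetime, percent: int = 80) -> int:
    """Return the document age (in whole days) at the given percentile.

    Uses the nearest-rank method with integer arithmetic. The timestamp
    format (ISO 8601 with a trailing "Z") is an assumption.
    """
    ages = sorted(
        (now - datetime.fromisoformat(ts.replace("Z", "+00:00"))).days
        for ts in last_updated
    )
    if not ages:
        raise ValueError("no timestamps supplied")
    rank = (percent * len(ages) + 99) // 100  # ceiling of percent/100 * n
    return ages[max(0, rank - 1)]


# Example with made-up timestamps: 80% of these documents are at most 100 days old.
sample = [
    "2024-08-17T00:00:00Z",
    "2024-08-08T00:00:00Z",
    "2024-07-19T00:00:00Z",
    "2024-05-10T00:00:00Z",
    "2023-08-19T00:00:00Z",
]
reference = datetime(2024, 8, 18, tzinfo=timezone.utc)
candidate_days = age_days_percentile(sample, reference, 80)
```

A value such as `candidate_days` is only a starting point; you would still validate the chosen boosting duration against the answers your users actually receive.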

Common document indexing observability and troubleshooting methods

In this section, we explore some common admin tasks for observing and troubleshooting document indexing using the new document-level reporting feature.

List all successfully indexed documents from a data source

To retrieve a list of all documents that have been successfully indexed from a specific data source, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort @timestamp desc | dedup DocumentTitle, DocumentId

The following screenshot shows an example result.

List all successfully indexed documents from a data source sync job

To retrieve a list of all documents that have been successfully indexed during a specific sync job, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort DocumentTitle

The following screenshot shows an example result.

List all failed indexed documents from a data source sync job

To retrieve a list of all documents that failed to index during a specific sync job, along with the error messages, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, ErrorMsg, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "FAILED"
| sort @timestamp desc

The following screenshot shows an example result.
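
Because the status queries above differ only in the log stream prefix and the status value, a small helper can generate them. This is a convenience sketch. The function name is hypothetical, and the data source ID and run ID are placeholders you supply.

```python
def sync_report_query(data_source_id: str, run_id: str, status: str) -> str:
    """Build a Logs Insights query for one sync run and one consolidated status."""
    if status not in {"SUCCESS", "FAILED", "SKIPPED"}:
        raise ValueError(f"unsupported status: {status!r}")
    # Log stream names follow SYNC_RUN_HISTORY_REPORT/<data-source-id>/<run-id>.
    stream_prefix = f"SYNC_RUN_HISTORY_REPORT/{data_source_id}/{run_id}"
    return (
        "fields DocumentTitle, DocumentId, ErrorMsg, @timestamp\n"
        f"| filter @logStream like '{stream_prefix}'\n"
        f'and ConnectorDocumentStatus.Status = "{status}"\n'
        "| sort @timestamp desc"
    )
```

Calling `sync_report_query("your-data-source-id", "run-id", "FAILED")` produces a query equivalent in intent to the failed-documents query shown above.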

List all documents that contain a particular user name ACL permission from an Amazon Q Business application

To retrieve a list of documents that have a specific user’s ACL permission, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and Acl like 'aneesh@mydemoaws.onmicrosoft.com'
| display DocumentTitle, SourceUri

The following screenshot shows an example result.

List the ACL of an indexed document from a data source sync job

To retrieve the ACL information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Acl

The following screenshot shows an example result.
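
Once you have the Acl value for a document, you might want to check it for a particular user in a script. The sketch below assumes the Acl field decodes as a JSON list of entries. Apart from `globalName`, which is named in this post, the key names `type` and `access` are guesses at the field casing. The deny-over-allow precedence is also an assumption, and group membership is not resolved.

```python
import json
from typing import Optional


def user_access(acl_json: str, global_name: str) -> Optional[str]:
    """Return "DENIED", "ALLOWED", or None for a user's direct ACL entries.

    Only USER entries whose globalName matches are considered. An explicit
    DENIED entry takes precedence over ALLOWED (an assumption).
    """
    entries = json.loads(acl_json)
    decisions = {
        entry.get("access")
        for entry in entries
        if entry.get("type") == "USER" and entry.get("globalName") == global_name
    }
    if "DENIED" in decisions:
        return "DENIED"
    if "ALLOWED" in decisions:
        return "ALLOWED"
    return None  # no direct entry; access may still come from a group
```

A result of `None` does not prove the user lacks access, because the user may be covered by a GROUP entry that this sketch does not expand.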

List metadata of an indexed document from a data source sync job

To retrieve the metadata information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Metadata

The following screenshot shows an example result.

Conclusion

The newly introduced document-level report in Amazon Q Business provides enhanced visibility and observability into the document processing lifecycle during data source sync jobs. This feature addresses a critical need expressed by customers for better troubleshooting capabilities and access to detailed information about the indexing status, metadata, and ACLs of individual documents.

The document-level report is stored in a dedicated log stream named SYNC_RUN_HISTORY_REPORT within the Amazon Q Business application CloudWatch log group. This report contains comprehensive information for each document, including the document ID, title, overall document sync status, error messages (if any), ACLs, and metadata retrieved from the data sources. The data source sync run history page now includes an Actions column, providing access to the document-level report for each sync run. This feature significantly improves the ability to troubleshoot issues related to document ingestion, access control, and metadata relevance, and provides better visibility into the documents synced with an Amazon Q index.

To get started with Amazon Q Business, explore the Getting started guide. To learn more about data source connectors and best practices, see Configuring Amazon Q Business data source connectors.

About the authors

Aneesh Mohan is a Senior Solutions Architect at Amazon Web Services (AWS), bringing two decades of experience in creating impactful solutions for business-critical workloads. He is passionate about technology and loves working with customers to build well-architected solutions, focusing on the financial services industry, AI/ML, security, and data technologies.

Ashwin Shukla is a Software Development Engineer II on the Amazon Q for Business and Amazon Kendra engineering team, with 6 years of experience in developing enterprise software. In this role, he works on designing and developing foundational features for Amazon Q for Business.
