Automate the creation of handout notes utilizing Amazon Bedrock Information Automation

by root July 31, 2025

written by root July 31, 2025 0 comment 265 views

Organizations throughout varied sectors face vital challenges when changing assembly recordings or recorded displays into structured documentation. The method of making handouts from displays requires a lot of handbook effort, akin to reviewing recordings to establish slide transitions, transcribing spoken content material, capturing and organizing screenshots, synchronizing visible parts with speaker notes, and formatting content material. These challenges impression productiveness and scalability, particularly when coping with a number of presentation recordings, convention periods, coaching supplies, and academic content material.

On this submit, we present how one can construct an automatic, serverless answer to remodel webinar recordings into complete handouts utilizing Amazon Bedrock Information Automation for video evaluation. We stroll you thru the implementation of Amazon Bedrock Information Automation to transcribe and detect slide modifications, in addition to the usage of Amazon Bedrock basis fashions (FMs) for transcription refinement, mixed with customized AWS Lambda capabilities orchestrated by AWS Step Features. By means of detailed implementation particulars, architectural patterns, and code, you’ll learn to construct a workflow that automates the handout creation course of.

Amazon Bedrock Information Automation

Amazon Bedrock Information Automation makes use of generative AI to automate the transformation of multimodal knowledge (akin to pictures, movies and extra) right into a customizable structured format. Examples of structured codecs embrace summaries of scenes in a video, unsafe or express content material in textual content and pictures, or organized content material based mostly on commercials or manufacturers. The answer offered on this submit makes use of Amazon Bedrock Information Automation to extract audio segments and totally different photographs in movies.

Answer overview

Our answer makes use of a serverless structure orchestrated by Step Features to course of presentation recordings into complete handouts. The workflow consists of the next steps:

The workflow begins when a video is uploaded to Amazon Easy Storage Service (Amazon S3), which triggers an occasion notification by means of Amazon EventBridge guidelines that initiates our video processing workflow in Step Features.
After the workflow is triggered, Amazon Bedrock Information Automation initiates a video transformation job to establish totally different photographs within the video. In our case, that is represented by a change of slides. The workflow strikes right into a ready state, and checks for the transformation job progress. If the job is in progress, the workflow returns to the ready state. When the job is full, the workflow continues, and we now have extracted each visible photographs and spoken content material.
These visible photographs and spoken content material feed right into a synchronization step. On this Lambda operate, we use the output of the Amazon Bedrock Information Automation job to match the spoken content material to the correlating photographs based mostly on the matching of timestamps.
After operate has matched the spoken content material to the visible photographs, the workflow strikes right into a parallel state. One of many steps of this state is the technology of screenshots. We use a FFmpeg-enabled Lambda operate to create pictures for every recognized video shot.
The opposite step of the parallel state is the refinement of our transformations. Amazon Bedrock processes and improves every uncooked transcription part by means of a Map state. This helps us take away speech disfluencies and enhance the sentence construction.
Lastly, after the screenshots and refined transcript are created, the workflow makes use of a Lambda operate to create handouts. We use the Python-PPTX library, which generates the ultimate presentation with synchronized content material. These remaining handouts are saved in Amazon S3 for distribution.

The next diagram illustrates this workflow.

If you wish to check out this answer, we’ve got created an AWS Cloud Growth Equipment (AWS CDK) stack obtainable within the accompanying GitHub repo which you could deploy in your account. It deploys the Step Features state machine to orchestrate the creation of handout notes from the presentation video recording. It additionally gives you with a pattern video to check out the outcomes.

To deploy and check the answer in your personal account, comply with the directions within the GitHub repository’s README file. The next sections describe in additional element the technical implementation particulars of this answer.

Video add and preliminary processing

The workflow begins with Amazon S3, which serves because the entry level for our video processing pipeline. When a video is uploaded to a devoted S3 bucket, it triggers an occasion notification that, by means of EventBridge guidelines, initiates our Step Features workflow.

Shot detection and transcription utilizing Amazon Bedrock Information Automation

This step makes use of Amazon Bedrock Information Automation to detect slide transitions and create video transcriptions. To combine this as a part of the workflow, you have to create an Amazon Bedrock Information Automation venture. A venture is a grouping of output configurations. Every venture can include commonplace output configurations in addition to customized output blueprints for paperwork, pictures, video, and audio. The venture has already been created as a part of the AWS CDK stack. After you arrange your venture, you possibly can course of content material utilizing the InvokeDataAutomationAsync API. In our answer, we use the Step Features service integration to execute this API name and begin the asynchronous processing job. A job ID is returned for monitoring the method.

The workflow should now verify the standing of the processing job earlier than persevering with with the handout creation course of. That is achieved by polling Amazon Bedrock Information Automation for the job standing utilizing the GetDataAutomationStatus API frequently. Utilizing a mixture of the Step Features Wait and Alternative states, we will ask the workflow to ballot the API on a set interval. This not solely provides you the power to customise the interval relying in your wants, however it additionally helps you management the workflow prices, as a result of each state transition is billed in Customary workflows, which this answer makes use of.

When the GetDataAutomationStatus API output exhibits as SUCCESS, the loop exits and the workflow continues to the subsequent step, which is able to match transcripts to the visible photographs.

Matching audio segments with corresponding photographs

To create complete handouts, you have to set up a mapping between the visible photographs and their corresponding audio segments. This mapping is essential to verify the ultimate handouts precisely signify each the visible content material and the spoken narrative of the presentation.

A shot represents a collection of interrelated consecutive frames captured in the course of the presentation, usually indicating a definite visible state. In our presentation context, a shot corresponds to both a brand new slide or a major slide animation that provides or modifies content material.

An audio section is a particular portion of an audio recording that accommodates uninterrupted spoken language, with minimal pauses or breaks. This section captures a pure movement of speech. The Amazon Bedrock Information Automation output gives an audio_segments array, with every section containing exact timing info akin to the beginning and finish time of every section. This permits for correct synchronization with the visible photographs.

The synchronization between photographs and audio segments is vital for creating correct handouts that protect the presentation’s narrative movement. To realize this, we implement a Lambda operate that manages the matching course of in three steps:

The operate retrieves the processing outcomes from Amazon S3, which accommodates each the visible photographs and audio segments.
It creates structured JSON arrays from these elements, making ready them for the matching algorithm.
It executes an identical algorithm that analyzes the totally different timestamps of the audio segments and the photographs, and matches them based mostly on these timestamps. This algorithm additionally considers timestamp overlaps between photographs and audio segments.

For every shot, the operate examines audio segments and identifies these whose timestamps overlap with the shot’s length, ensuring the related spoken content material is related to its corresponding slide within the remaining handouts. The operate returns the matched outcomes on to the Step Features workflow, the place it can function enter for the subsequent step, the place Amazon Bedrock will refine the transcribed content material and the place we’ll create screenshots in parallel.

Screenshot technology

After you get the timestamps of every shot and related audio section, you possibly can seize the slides of the presentation to create complete handouts. Every detected shot from Amazon Bedrock Information Automation represents a definite visible state within the presentation—usually a brand new slide or vital content material change. By producing screenshots at these exact moments, we be certain that our handouts precisely signify the visible movement of the unique presentation.

That is achieved with a Lambda operate utilizing the ffmpeg-python library. This library acts as a Python binding for the FFmpeg media framework, so you possibly can run FFmpeg terminal instructions utilizing Python strategies. In our case, we will extract frames from the video at particular timestamps recognized by Amazon Bedrock Information Automation. The screenshots are saved in an S3 bucket for use in creating the handouts, as described within the following code. To make use of ffmpeg-python in Lambda, we created a Lambda ZIP deployment containing the required dependencies to run the code. Directions on find out how to create the ZIP file will be present in our GitHub repository.

The next code exhibits how a screenshot is taken utilizing ffmpeg-python. You may view the total Lambda code on GitHub.

## Taking a screenshot at a particular timestamp 
ffmpeg.enter(video_path, ss=timestamp).output(screenshot_path, vframes=1).run()

Transcript refinement with Amazon Bedrock

In parallel with the screenshot technology, we refine the transcript utilizing a big language mannequin (LLM). We do that to enhance the standard of the transcript and filter out errors and speech disfluencies. This course of makes use of an Amazon Bedrock mannequin to reinforce the standard of the matched transcription segments whereas sustaining content material accuracy. We use a Lambda operate that integrates with Amazon Bedrock by means of the Python Boto3 shopper, utilizing a immediate to information the mannequin’s refinement course of. The operate can then course of every transcript section, instructing the mannequin to do the next:

Repair typos and grammatical errors
Take away speech disfluencies (akin to “uh” and “um”)
Keep the unique that means and technical accuracy
Protect the context of the presentation

In our answer, we used the next immediate with three instance inputs and outputs:

immediate=""'That is the results of a transcription. 
I need you to take a look at this audio section and repair the typos and errors current. 
Be at liberty to make use of the context of the remainder of the transcript to refine (however do not omit any data). 
Pass over elements the place the speaker misspoke. 
Be certain that to additionally take away works like "uh" or "um". 
Solely make change to the information or sentence construction when there are errors. 
Solely give again the refined transcript as output, do not add anything or any context or title. 
If there aren't any typos or errors, return the unique object enter. 
Don't clarify why you might have or haven't made any modifications; I simply need the JSON object. 

These are examples: 
Enter: <an example-input> 
Output: <an example-output>

Enter: <an example-input> 
Output: <an example-output>

Enter: <an example-input> 
Output: <an example-output>

Right here is the item: ''' + textual content

The next is an instance enter and output:

Enter: Yeah. Um, so let's discuss somewhat bit about recovering from a ransomware assault, proper?

Output: Sure, let's discuss somewhat bit about recovering from a ransomware assault.

To optimize processing velocity whereas adhering to the utmost token limits of the Amazon Bedrock InvokeModel API, we use the Step Features Map state. This permits parallel processing of a number of transcriptions, every similar to a separate video section. As a result of these transcriptions have to be dealt with individually, the Map state effectively distributes the workload. Moreover, it reduces operational overhead by managing integration—taking an array as enter, passing every factor to the Lambda operate, and mechanically reconstructing the array upon completion.The Map state returns the refined transcript on to the Step Features workflow, sustaining the construction of the matched segments whereas offering cleaner, extra skilled textual content content material for the ultimate handout technology.

Handout technology

The ultimate step in our workflow entails creating the handouts utilizing the python-pptx library. This step combines the refined transcripts with the generated screenshots to create a complete presentation doc.

The Lambda operate processes the matched segments sequentially, creating a brand new slide for every screenshot whereas including the corresponding refined transcript as speaker notes. The implementation makes use of a customized Lambda layer containing the python-pptx package deal. To allow this performance in Lambda, we created a customized layer utilizing Docker. Through the use of Docker to create our layer, we be certain that the dependencies are compiled in an atmosphere that matches the Lambda runtime. You’ll find the directions to create this layer and the layer itself in our GitHub repository.

The Lambda operate implementation makes use of python-pptx to create structured displays:

import boto3
from pptx import Presentation
from pptx.util import Inches
import os
import json

def lambda_handler(occasion, context):
    # Create new presentation with particular dimensions
    prs = Presentation()
    prs.slide_width = int(12192000)  # Customary presentation width
    prs.slide_height = int(6858000)  # Customary presentation top
    
    # Course of every section
    for i in vary(num_images):
        # Add new slide
        slide = prs.slides.add_slide(prs.slide_layouts[5])
        
        # Add screenshot as full-slide picture
        slide.shapes.add_picture(image_path, 0, 0, width=slide_width)
        
        # Add transcript as speaker notes
        notes_slide = slide.notes_slide
        transcription_text = transcription_segments[i].get('transcript', '')
        notes_slide.notes_text_frame.textual content = transcription_text
    
    # Save presentation
    pptx_path = os.path.be a part of(tmp_dir, "lecture_notes.pptx")
    prs.save(pptx_path)

The operate processes segments sequentially, making a presentation that mixes visible photographs with their corresponding audio segments, leading to handouts prepared for distribution.

The next screenshot exhibits an instance of a generated slide with notes. The total deck has been added as a file in the GitHub repository.

Conclusion

On this submit, we demonstrated find out how to construct a serverless answer that automates the creation of handout notes from recorded slide displays. By combining Amazon Bedrock Information Automation with customized Lambda capabilities, we’ve created a scalable pipeline that considerably reduces the handbook effort required in creating handout supplies. Our answer addresses a number of key challenges in content material creation:

Automated detection of slide transitions, content material modifications, and correct transcription of spoken content material utilizing the video modality capabilities of Amazon Bedrock Information Automation
Clever refinement of transcribed textual content utilizing Amazon Bedrock
Synchronized visible and textual content material with a customized matching algorithm
Handout technology utilizing the ffmpeg-python and python-pptx libraries in Lambda

The serverless structure, orchestrated by Step Features, gives dependable execution whereas sustaining cost-efficiency. Through the use of Python packages for FFmpeg and a Lambda layer for python-pptx, we’ve overcome technical limitations and created a strong answer that may deal with varied presentation codecs and lengths. This answer will be prolonged and customised for various use circumstances, from academic establishments to company coaching applications. Sure steps such because the transcript refinement may also be improved, as an example by including translation capabilities to account for various audiences.

To study extra about Amazon Bedrock Information Automation, consult with the next sources:

Concerning the authors

Laura Verghote is the GenAI Lead for PSI Europe at Amazon Net Providers (AWS), driving Generative AI adoption throughout public sector organizations. She companions with prospects all through Europe to speed up their GenAI initiatives by means of technical experience and strategic planning, bridging advanced necessities with progressive AI options.

Elie Elmalem is a options architect at Amazon Net Providers (AWS) and helps Training prospects throughout the UK and EMEA. He works with prospects to successfully use AWS companies, offering architectural finest practices, recommendation, and steerage. Exterior of labor, he enjoys spending time with household and mates and loves watching his favourite soccer crew play.

Welcome to Ivugangingo!

At Ivugangingo, we're passionate about delivering insightful content that empowers and informs our readers across a spectrum of crucial topics. Whether you're delving into the world of insurance, navigating the complexities of cryptocurrency, or seeking wellness tips in health and fitness, we've got you covered.

Automate the creation of handout notes utilizing Amazon Bedrock Information Automation

Amazon Bedrock Information Automation

Answer overview

Video add and preliminary processing

Shot detection and transcription utilizing Amazon Bedrock Information Automation

Matching audio segments with corresponding photographs

Screenshot technology

Transcript refinement with Amazon Bedrock

Handout technology

Conclusion

Concerning the authors

High 5 challenges going through P&C Insurance coverage MGAs and the way AMS may help

Be part of Wired’s AI Energy Summit

Converter

Editors Pick

Newsletter

Categories

Related Posts