This post is co-written with Aurélien Capdecomme and Bertrand d’Aure from 20 Minutes.
With 19 million monthly readers, 20 Minutes is a major player in the French media landscape. The media group delivers useful, relevant, and accessible information to an audience that consists primarily of young and active urban readers. Every month, nearly 8.3 million 25–49-year-olds choose 20 Minutes to stay informed. Established in 2002, 20 Minutes consistently reaches more than a third (39 percent) of the French population each month across print, web, and mobile platforms.
As 20 Minutes’s technology team, we’re responsible for developing and operating the organization’s web and mobile offerings and for driving innovative technology initiatives. For several years, we have been actively using machine learning and artificial intelligence (AI) to improve our digital publishing workflow and to deliver a relevant, personalized experience to our readers. With the arrival of generative AI, and in particular large language models (LLMs), we have adopted an AI-by-design strategy, evaluating the application of AI for every new technology product we develop.
One of our key goals is to provide our journalists with a best-in-class digital publishing experience. Our newsroom journalists work on news stories using Storm, our custom in-house digital editing experience. Storm serves as the front end for Nova, our serverless content management system (CMS). These applications are a focal point for our generative AI efforts.
In 2023, we identified several challenges where we see the potential for generative AI to have a positive impact. These include new tools for newsroom journalists, ways to increase audience engagement, and a new way for advertisers to confidently assess the brand safety of our content. To implement these use cases, we rely on Amazon Bedrock.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon Web Services (AWS) through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
This blog post outlines several use cases where we use generative AI to address digital publishing challenges. We dive into the technical details of our implementation and explain our decision to choose Amazon Bedrock as our foundation model provider.
Identifying challenges and use cases
Today’s fast-paced news environment presents both challenges and opportunities for digital publishers. At 20 Minutes, a key goal of our technology team is to develop new tools for our journalists that automate repetitive tasks, improve the quality of reporting, and allow us to reach a wider audience. Based on this goal, we have identified three challenges and corresponding use cases where generative AI can have a positive impact.
The first use case is using automation to minimize the repetitive manual tasks that journalists perform as part of the digital publishing process. The core work of developing a news story revolves around researching, writing, and editing the article. However, once the article is complete, supporting information and metadata must be defined, such as an article summary, categories, tags, and related articles.
While these tasks can feel like a chore, they are critical to search engine optimization (SEO) and therefore to the audience reach of the article. If we can automate some of these repetitive tasks, this use case has the potential to free up time for our newsroom to focus on core journalistic work while increasing the reach of our content.
The second use case is how we republish news agency dispatches at 20 Minutes. Like most news outlets, 20 Minutes subscribes to news agencies, such as Agence France-Presse (AFP) and others, that publish a feed of news dispatches covering national and international news. 20 Minutes journalists select stories relevant to our audience and rewrite, edit, and expand on them to fit the editorial standards and the distinctive tone our readership is used to. Rewriting these dispatches is also necessary for SEO, because search engines rank duplicate content low. Because this process follows a repeatable pattern, we decided to build an AI-based tool to simplify the republishing process and reduce the time spent on it.
The third and final use case we identified is improving transparency around the brand safety of our published content. As a digital publisher, 20 Minutes is committed to providing a brand-safe environment for potential advertisers. Content can be categorized as brand-safe or not brand-safe based on its appropriateness for advertising and monetization. Depending on the advertiser and brand, different types of content may be considered appropriate. For example, some advertisers may not want their brand to appear next to news content about sensitive topics such as military conflicts, while others may not want to appear next to content about drugs and alcohol.
Organizations such as the Interactive Advertising Bureau (IAB) and the Global Alliance for Responsible Media (GARM) have developed comprehensive guidelines and frameworks for classifying the brand safety of content. Based on these guidelines, data providers such as the IAB and others conduct automated brand safety assessments of digital publishers by regularly crawling websites such as 20minutes.fr and calculating a brand safety score.
However, this brand safety score is site-wide and does not break down the brand safety of individual news articles. Given the reasoning capabilities of LLMs, we decided to develop an automated per-article brand safety assessment based on industry-standard guidelines to provide advertisers with a real-time, granular view of the brand safety of 20 Minutes content.
Our technical solution
At 20 Minutes, we’ve been using AWS since 2017, and we aim to build on top of serverless services whenever possible.
The digital publishing front-end application Storm is a single-page application built with React and Material Design and deployed using Amazon Simple Storage Service (Amazon S3) and Amazon CloudFront. Our CMS backend Nova is implemented using Amazon API Gateway and several AWS Lambda functions. Amazon DynamoDB serves as the primary database for 20 Minutes articles. New articles and changes to existing articles are captured using DynamoDB Streams, which invokes processing logic in AWS Step Functions and feeds our search service based on Amazon OpenSearch Service.
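For illustration, the following is a minimal sketch of how a stream-triggered Lambda function could hand changed articles to a Step Functions workflow; the environment variable, state machine ARN, and key attribute name are assumptions for this example, not details of our actual implementation.

```python
import json
import os

import boto3

# Hypothetical environment variable pointing at the article-processing state machine.
STATE_MACHINE_ARN = os.environ["ARTICLE_PIPELINE_STATE_MACHINE_ARN"]

sfn = boto3.client("stepfunctions")


def handler(event, context):
    """Invoked by DynamoDB Streams when articles are created or modified."""
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"]["NewImage"]
        # Hand the article off to the workflow that processes it and indexes it into OpenSearch.
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps({"articleId": new_image["id"]["S"]}),  # assumed key attribute
        )
```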
We integrate Amazon Bedrock using AWS PrivateLink, which allows us to create a private connection between our Amazon Virtual Private Cloud (Amazon VPC) and Amazon Bedrock without traversing the public internet.
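As a sketch, a boto3 client could be pointed at such an interface endpoint explicitly; the endpoint DNS name below is a placeholder, and with private DNS enabled on the endpoint the default client configuration already resolves to the private connection.

```python
import boto3

# Placeholder DNS name of a Bedrock runtime interface endpoint created with AWS PrivateLink.
BEDROCK_VPCE_URL = "https://vpce-0123456789abcdef0-abcdefgh.bedrock-runtime.eu-west-1.vpce.amazonaws.com"

# With private DNS enabled on the endpoint, the endpoint_url override below is unnecessary.
bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="eu-west-1",
    endpoint_url=BEDROCK_VPCE_URL,
)
```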
When working on articles in Storm, journalists have access to several AI tools implemented using Amazon Bedrock. Storm is a block-based editor that lets journalists combine multiple blocks of content, such as title, lede, text, image, social media quotes, and more, into a complete article. With Amazon Bedrock, journalists can use AI to generate an article summary suggestion block and place it directly into the article. We use a single-shot prompt with the full article text in context to generate the summary.
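The following sketch shows what such a single-shot summary call could look like with Anthropic’s Claude through the Amazon Bedrock InvokeModel API; the model ID, prompt wording, and token limit are illustrative assumptions rather than our production values.

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="eu-west-1")


def suggest_summary(article_text: str) -> str:
    """Generate a French summary suggestion from the full article text in a single-shot prompt."""
    prompt = (
        "Voici un article de presse :\n\n"
        f"{article_text}\n\n"
        "Rédige un résumé concis de cet article en deux ou trois phrases."
    )
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]
```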
Storm CMS also provides journalists with suggestions for article metadata. This includes recommendations for appropriate categories, tags, and even in-text links. These references to other 20 Minutes content are critical to increasing audience engagement, because search engines rank content with relevant internal and external links higher.
To implement this, we use a combination of Amazon Comprehend and Amazon Bedrock to extract the most relevant terms from an article’s text and then perform a search against our internal taxonomic database in OpenSearch. Based on the results, Storm offers several suggestions of terms that should be linked to other articles or topics, which users can accept or reject.
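A simplified sketch of this pipeline is shown below; the OpenSearch host, index name, and field name are hypothetical, and authentication against the OpenSearch domain is omitted for brevity.

```python
import boto3
from opensearchpy import OpenSearch

comprehend = boto3.client("comprehend", region_name="eu-west-1")

# Hypothetical OpenSearch domain and taxonomy index; authentication is omitted for brevity.
opensearch = OpenSearch(hosts=[{"host": "search-taxonomy.example.com", "port": 443}], use_ssl=True)


def suggest_links(article_text: str):
    """Extract French key phrases and match them against the internal taxonomy index."""
    detected = comprehend.detect_key_phrases(Text=article_text, LanguageCode="fr")
    suggestions = []
    for phrase in detected["KeyPhrases"]:
        hits = opensearch.search(
            index="taxonomy",
            body={"query": {"match": {"label": phrase["Text"]}}, "size": 1},  # assumed field name
        )
        if hits["hits"]["hits"]:
            suggestions.append(
                {"phrase": phrase["Text"], "topic": hits["hits"]["hits"][0]["_source"]}
            )
    return suggestions
```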
News dispatches become available in Storm as soon as we receive them from partners such as AFP. Journalists can browse the dispatches and select them for republication on 20minutes.fr. Every dispatch is manually reworked by our journalists before publication. To do so, journalists first invoke a rewrite of the article by an LLM using Amazon Bedrock. For this, we use a low-temperature single-shot prompt that instructs the LLM not to reinterpret the article during the rewrite and to keep the word count and structure as similar as possible. The rewritten article is then manually edited by a journalist in Storm like any other article.
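As an illustration, a low-temperature rewrite call could look like the following; the French prompt wording and model ID are assumptions, not our exact production prompt.

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="eu-west-1")


def rewrite_dispatch(dispatch_text: str) -> str:
    """Rewrite an agency dispatch without reinterpreting it, keeping length and structure close."""
    prompt = (
        "Réécris la dépêche suivante avec une formulation différente, sans la réinterpréter, "
        "en conservant un nombre de mots et une structure aussi proches que possible de l'original :\n\n"
        f"{dispatch_text}"
    )
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "temperature": 0.1,  # low temperature keeps the rewrite close to the source text
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```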
To implement our new brand safety feature, we process every new article published on 20minutes.fr. Currently, we use a single-shot prompt that includes both the article text and the IAB brand safety guidelines in context to get a sentiment analysis from the LLM. We then parse the response, store the sentiment, and make it publicly available for each article so that it can be accessed by ad servers.
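A hedged sketch of such a per-article call is shown below; the guidelines placeholder, response format, and model ID are illustrative, the actual prompt includes the full IAB guidelines text, and storing and exposing the result is left out.

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="eu-west-1")

# Placeholder for the IAB brand safety guidelines text that is included in the prompt context.
IAB_GUIDELINES = "..."


def assess_brand_safety(article_text: str) -> dict:
    """Classify one article against the brand safety guidelines in a single-shot prompt."""
    prompt = (
        "You are a brand safety classifier. Apply the following guidelines:\n\n"
        f"{IAB_GUIDELINES}\n\n"
        "Classify the article below as BRAND_SAFE or NOT_BRAND_SAFE and name the most relevant "
        "category. Answer on a single line in the form <verdict>;<category>.\n\n"
        f"Article:\n{article_text}"
    )
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "temperature": 0,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    completion = json.loads(response["body"].read())["content"][0]["text"].strip()
    verdict, _, category = completion.partition(";")
    return {"verdict": verdict.strip(), "category": category.strip()}
```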
Lessons learned and outlook
When we started working on generative AI use cases at 20 Minutes, we were surprised by how quickly we were able to iterate on features and get them into production. Thanks to the unified Amazon Bedrock API, it’s easy to switch between models for experimentation and find the best model for each use case.
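For example, the Amazon Bedrock Converse API accepts the same request shape across models, which makes this kind of side-by-side experimentation straightforward; the model IDs below are examples, not necessarily the models we evaluated.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="eu-west-1")

# Example model IDs to compare for a given prompt.
CANDIDATE_MODELS = [
    "anthropic.claude-3-haiku-20240307-v1:0",
    "anthropic.claude-3-sonnet-20240229-v1:0",
]


def compare_models(prompt: str) -> dict:
    """Run the same prompt against several models through the unified Converse API."""
    results = {}
    for model_id in CANDIDATE_MODELS:
        response = bedrock_runtime.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": 512, "temperature": 0.2},
        )
        results[model_id] = response["output"]["message"]["content"][0]["text"]
    return results
```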
For the use cases described above, we use Anthropic’s Claude in Amazon Bedrock as our primary LLM because of its overall high quality and, in particular, its ability to understand French prompts and generate French completions. Because 20 Minutes content is almost exclusively French, these multilingual capabilities are key for us. We have found that careful prompt engineering is a key success factor, and we closely follow Anthropic’s prompt engineering resources to maximize completion quality.
Even without relying on approaches such as fine-tuning or Retrieval Augmented Generation (RAG) so far, we can implement use cases that deliver real value to our journalists. Based on data collected from our newsroom journalists, our AI tools save them an average of eight minutes per article. With around 160 pieces of content published every day, this already adds up to a significant amount of time that can now be spent reporting the news to our readers rather than performing repetitive manual tasks.
The success of these use cases depends not only on technical efforts, but also on close collaboration between our product, engineering, newsroom, marketing, and legal teams. Together, representatives from these roles make up our AI Committee, which establishes clear policies and frameworks to ensure the transparent and responsible use of AI at 20 Minutes. For example, every use of AI is discussed and approved by this committee, and all AI-generated content must undergo human validation before being published.
We believe that generative AI is still in its infancy when it comes to digital publishing, and we look forward to bringing more innovative use cases to our platform this year. We are currently working on deploying fine-tuned LLMs using Amazon Bedrock to accurately match the tone and voice of our publication and to further improve our brand safety assessment capabilities. We also plan to use Bedrock models to tag our existing image library and provide automated suggestions for article images.
Why Amazon Bedrock?
Based on our evaluation of several generative AI model providers and our experience implementing the use cases described above, we selected Amazon Bedrock as our primary provider for all our foundation model needs. The key reasons that influenced this decision were:
- Choice of models: The market for generative AI is evolving rapidly, and the AWS approach of working with multiple leading model providers ensures that we have access to a large and growing set of foundation models through a single API.
- Inference performance: Amazon Bedrock delivers low-latency, high-throughput inference. With on-demand and provisioned throughput, the service can consistently meet all of our capacity needs.
- Private model access: We use AWS PrivateLink to establish a private connection to Amazon Bedrock endpoints without traversing the public internet, ensuring that we maintain full control over the data we send for inference.
- Integration with AWS services: Amazon Bedrock is tightly integrated with AWS services such as AWS Identity and Access Management (IAM) and the AWS Software Development Kit (AWS SDK). As a result, we were able to quickly integrate Bedrock into our existing architecture without having to adopt any new tools or conventions.
Conclusion and outlook
In this blog post, we described how 20 Minutes uses generative AI on Amazon Bedrock to empower our journalists in the newsroom, reach a broader audience, and make brand safety transparent to our advertisers. With these use cases, we’re using generative AI to bring more value to our journalists today, and we’ve built a foundation for promising new AI use cases in the future.
To learn more about Amazon Bedrock, start with Amazon Bedrock Resources for documentation, blog posts, and more customer success stories.
About the authors
Aurélien Capdecomme is the Chief Technology Officer at 20 Minutes, where he leads the IT development and infrastructure teams. With over 20 years of experience in building efficient and cost-optimized architectures, he has a strong focus on serverless strategy, scalable applications, and AI initiatives. He has implemented innovation and digital transformation strategies at 20 Minutes, overseeing the complete migration of digital services to the cloud.
Bertrand d’Aure is a software developer at 20 Minutes. An engineer by training, he designs and implements the backend of 20 Minutes applications, with a focus on the software journalists use to create their stories. Among other things, he is responsible for adding generative AI features to the software to simplify the authoring process.
Dr. Pascal Vogel is a Solutions Architect at Amazon Web Services. He collaborates with enterprise customers across EMEA to build cloud-native solutions with a focus on serverless and generative AI. As a cloud enthusiast, Pascal loves learning new technologies and connecting with like-minded customers who want to make a difference in their cloud journey.