Mistral AI has launched Mistral OCR 3, a state-of-the-art optical character recognition service that powers the company’s Document AI stack. The model, named mistral-ocr-2512, is built to extract interleaved text and images from PDFs and other documents while preserving structure. It is priced at $2 per 1,000 pages, with a 50% discount when used through the Batch API.
What’s Mistral OCR 3 optimized for?
Mistral OCR 3 targets common enterprise document workloads. The model is tailored for forms, scanned documents, complex tables, and handwriting. It is evaluated on internal benchmarks drawn from real-world business use cases and achieves a 74% overall win rate over Mistral OCR 2 across these document categories, using fuzzy-match metrics against ground truth.
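To make the win-rate figure concrete, here is a minimal sketch of how such a metric could be computed. The exact fuzzy-match metric Mistral uses is not specified, so difflib.SequenceMatcher stands in for it as an assumption, and the sample strings are invented for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_score(prediction: str, truth: str) -> float:
    """Similarity in [0, 1]. SequenceMatcher is a stand-in for the unspecified metric."""
    return SequenceMatcher(None, prediction, truth).ratio()


def win_rate(new_outputs, old_outputs, ground_truth) -> float:
    """Fraction of documents where the new output scores strictly higher than the old one."""
    wins = sum(
        fuzzy_score(new, truth) > fuzzy_score(old, truth)
        for new, old, truth in zip(new_outputs, old_outputs, ground_truth)
    )
    return wins / len(ground_truth)


# Invented sample data: four short "documents" with their reference text.
truth = ["Invoice 1042", "Total: $310.00", "Signed by A. Rossi", "Date: 2024-05-01"]
new = ["Invoice 1042", "Total: $310.00", "Signed by A. Rossi", "Date: 2024-05-01"]
old = ["Invoice 1O42", "Total: $310.00", "Signed by A Rossi", "Date: 2024-05-01"]

rate = win_rate(new, old, truth)
print(f"win rate: {rate:.0%}")
```

In this sketch a tie counts as a non-win, which is one of several reasonable conventions; a benchmark could also split ties or exclude them.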
The model outputs markdown that preserves the document’s layout and, if table formatting is enabled, enriches the output with an HTML-based table representation. This combination gives downstream systems both the content and the structural information needed for retrieval pipelines, analytics, and agent workflows.
Role in Mistral Document AI
OCR 3 sits inside Mistral Document AI, the company’s document processing capability that combines OCR with structured data extraction and Document QnA.
It now powers the Document AI Playground in Mistral AI Studio. This interface lets users upload PDFs or images without writing any code and returns clean text or structured JSON. The same underlying OCR pipeline is accessible through the public API, allowing teams to move from interactive exploration to production workloads without changing the core model.
Inputs, outputs, and structure
The OCR processor accepts multiple document formats through a single API. The document field can point to:
- document_url for PDF, pptx, docx, and so on
- image_url for image types such as png, jpeg, and avif
- A PDF or image that is uploaded or Base64-encoded, passed through the same schema
This is described in the OCR Processor section of Mistral’s Document AI documentation.
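The input options above can be sketched as request bodies. This is an illustration under assumptions, not the definitive schema: the "type" discriminator key, the data-URI form for Base64 content, and the helper names are not taken from the article, and the URL is a placeholder. Only the model name, the document field, document_url, image_url, and table_format="html" come from the text.

```python
import base64
import json

MODEL = "mistral-ocr-2512"  # model name given in the article


def url_document(url: str, is_image: bool = False) -> dict:
    """Reference a hosted file. The "type" key is an assumed discriminator."""
    kind = "image_url" if is_image else "document_url"
    return {"type": kind, kind: url}


def inline_image(data: bytes, mime: str = "image/png") -> dict:
    """Embed image bytes as Base64. The data-URI form is an assumption."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": f"data:{mime};base64,{encoded}"}


def build_ocr_request(document: dict, html_tables: bool = False) -> dict:
    """Assemble a JSON body for the OCR endpoint (hypothetical helper)."""
    body = {"model": MODEL, "document": document}
    if html_tables:
        body["table_format"] = "html"
    return body


request = build_ocr_request(
    url_document("https://example.com/report.pdf"),  # placeholder URL
    html_tables=True,
)
print(json.dumps(request, indent=2))
```

Keeping payload construction separate from the HTTP call makes it easy to check the body against the official documentation before sending anything.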
The response is a JSON object with a pages array. Each page contains an index, a markdown string, a list of images, a list of tables when table_format="html" is used, detected hyperlinks, optional header and footer fields if header or footer extraction is enabled, and a dimensions object with the page size. There is also a document_annotation field for structured annotations and a usage_info block with accounting information.
When images and HTML tables are extracted, the markdown contains placeholders for them; a table placeholder looks like [tbl-3.html](tbl-3.html). These placeholders map to entries in the images and tables arrays of the response, which simplifies downstream reconstruction.
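The placeholder mechanism can be illustrated with a small sketch that swaps table placeholders in a page's markdown for the corresponding HTML. The "id" and "content" keys on each table entry are assumptions about the response shape, and the sample page is invented.

```python
import re

# Matches [label](target) links but skips image placeholders of the form ![...](...).
PLACEHOLDER = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)]+)\)")


def inline_tables(page: dict) -> str:
    """Replace table placeholders in page["markdown"] with their HTML content.

    Assumes each entry in page["tables"] has "id" and "content" keys.
    """
    tables = {t["id"]: t["content"] for t in page.get("tables", [])}

    def swap(match: re.Match) -> str:
        # Leave ordinary links untouched when the target is not a known table id.
        return tables.get(match.group(2), match.group(0))

    return PLACEHOLDER.sub(swap, page["markdown"])


# Invented sample page for illustration.
page = {
    "index": 0,
    "markdown": "Quarterly totals\n\n[tbl-3.html](tbl-3.html)\n\nSee [docs](https://example.com).",
    "tables": [{"id": "tbl-3.html", "content": "<table><tr><td>Q1</td></tr></table>"}],
}

merged = inline_tables(page)
print(merged)
```

Ordinary hyperlinks in the markdown pass through unchanged because only link targets that match a known table id are replaced.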
Upgrades over Mistral OCR 2
Mistral OCR 3 introduces several specific upgrades compared with OCR 2. The public release notes highlight four key areas:
- Handwriting: Mistral OCR 3 more accurately interprets cursive, mixed-content annotations, and handwritten text placed on top of printed templates.
- Forms: It improves detection of boxes, labels, and handwritten entries in dense layouts such as invoices, receipts, compliance forms, and government documents.
- Scanned and complex documents: The model is more robust to compression artifacts, skew, distortion, low DPI, and background noise in scanned pages.
- Complex tables: It reconstructs table structures, including headers, merged cells, multi-row blocks, and column hierarchies, and returns HTML tables with the appropriate colspan and rowspan attributes so that layout is preserved.
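To show why colspan and rowspan matter downstream, the following sketch expands an HTML table with merged cells into a rectangular grid using only Python's standard html.parser. The sample table is invented, and repeating a merged cell's text in every position it covers is one design choice among several.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect cells row by row, recording each cell's colspan and rowspan."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            a = dict(attrs)
            self._cell = {
                "text": "",
                "colspan": int(a.get("colspan") or 1),
                "rowspan": int(a.get("rowspan") or 1),
            }

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(self._cell)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def to_grid(html: str) -> list:
    """Expand merged cells so every (row, column) position holds a value."""
    parser = TableCells()
    parser.feed(html)
    grid = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for cell in row:
            while (r, c) in grid:  # skip positions filled by an earlier rowspan
                c += 1
            for dr in range(cell["rowspan"]):
                for dc in range(cell["colspan"]):
                    grid[(r + dr, c + dc)] = cell["text"].strip()
            c += cell["colspan"]
    if not grid:
        return []
    height = max(r for r, _ in grid) + 1
    width = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(width)] for r in range(height)]


# Invented sample table with a two-row header.
sample = """
<table>
<tr><th rowspan="2">Region</th><th colspan="2">Revenue</th></tr>
<tr><th>Q1</th><th>Q2</th></tr>
<tr><td>EMEA</td><td>10</td><td>12</td></tr>
</table>
"""

grid = to_grid(sample)
for line in grid:
    print(line)
```

A flat grid like this can be loaded into a dataframe or spreadsheet, whereas a markdown pipe table would lose the merged header structure.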

Pricing, batch inference, and annotations
The OCR 3 model card lists pricing of $2 per 1,000 pages for standard OCR and $3 per 1,000 annotated pages when using structured annotations.
Mistral also exposes OCR 3 through its Batch Inference API, /v1/batch, which is documented in the platform’s batch processing section. Batch processing applies a 50% discount to jobs run through the batch pipeline, cutting the effective price of standard OCR in half to $1 per 1,000 pages.
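The pricing arithmetic above can be written as a small estimator. The rates come from the article; the function name is hypothetical, and applying the batch discount to annotated pages is an assumption, since the article only states the discounted price for standard OCR.

```python
# Prices in USD per 1,000 pages, as quoted in the article.
PRICE_PER_1000 = {"ocr": 2.00, "annotated": 3.00}
BATCH_DISCOUNT = 0.5  # 50% off for jobs run through the Batch API


def estimate_cost(pages: int, annotated: bool = False, batch: bool = False) -> float:
    """Estimate the cost in USD for a given page count (hypothetical helper).

    Assumption: the batch discount applies equally to annotated pages.
    """
    rate = PRICE_PER_1000["annotated" if annotated else "ocr"]
    if batch:
        rate *= 1 - BATCH_DISCOUNT
    return pages / 1000 * rate


standard = estimate_cost(250_000)
batched = estimate_cost(250_000, batch=True)
print(f"standard: ${standard:,.2f}  batch: ${batched:,.2f}")
```

For a 250,000-page archive, the estimate is $500 at the standard rate and $250 through the batch pipeline.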
The model integrates with two important features on the same endpoint: structured annotations and BBox extraction. These let developers attach schema-driven labels to regions of a document and capture bounding boxes for text and other elements. This is useful when mapping content to downstream systems or UI overlays.
Key takeaways
- Model and role: Mistral OCR 3, named mistral-ocr-2512, is a new OCR service that powers Mistral’s Document AI stack for page-based document understanding.
- Improved accuracy: On internal benchmarks covering forms, scanned documents, complex tables, and handwritten text, OCR 3 achieved a 74% overall win rate over Mistral OCR 2, and Mistral positions OCR 3 as state-of-the-art against both traditional and AI-native OCR systems.
- Structured output for RAG: The service extracts interleaved text and embedded images and returns markdown enriched with HTML-reconstructed tables, preserving layout and table structure so the output can be fed directly into RAG, agent, and search pipelines with minimal additional parsing.
- API and document formats: Developers access OCR 3 through the /v1/ocr endpoint or the SDK, passing PDFs as document_url and images such as png and jpeg as image_url. They can enable options such as HTML table output, header or footer extraction, and Base64 images in the response.
- Pricing and batch processing: OCR 3 is priced at $2 per 1,000 pages and $3 per 1,000 annotated pages. When used through the Batch API, the effective price of standard OCR drops to $1 per 1,000 pages for large-scale processing.

Michal Sutter is a data science professional with a master’s degree in data science from the University of Padova. With a strong foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

