Mistral AI Unveils OCR 4: Documents as Semantic Maps
25 Jun
Summary
- OCR 4 transforms documents into structured representations with bounding boxes.
- Model supports 170 languages and offers self-hosted deployment for sensitive data.
- Release aligns with European AI sovereignty and data protection concerns.

Mistral AI released OCR 4 on Tuesday, a new document intelligence model that returns structured representations of documents, including bounding boxes, block-type classification, and per-word confidence scores. It is Mistral's fourth OCR generation in approximately 15 months, and the release emphasizes European AI sovereignty.
The model supports 170 languages across 10 groups and accepts document formats such as PDF and DOC. A key feature is self-hosted deployment, which appeals to regulated industries concerned about routing sensitive documents through U.S. cloud APIs.
OCR 4 departs from traditional OCR by treating documents as semantic maps rather than flat text. It outputs localized blocks with classifications such as title, table, or signature, along with confidence scores. This structure helps downstream systems such as RAG pipelines and compliance workflows in two ways: each piece of extracted content can be traced back to its location in the source document, and content can be routed programmatically according to its block type.
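To make the routing and traceability idea concrete, here is a minimal Python sketch. It does not use Mistral's API or its actual response schema: the `Block` fields, the route names, the 0.90 threshold, and the choice to score a block by its lowest per-word confidence are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical shape of one localized block; not Mistral's real schema."""
    block_type: str                           # e.g. "title", "table", "signature"
    text: str
    bbox: tuple[float, float, float, float]   # assumed (x0, y0, x1, y1) on the page
    page: int
    word_confidences: list[float]             # assumed per-word scores in [0, 1]


# Hypothetical downstream destinations keyed by block type.
ROUTES = {"table": "table_parser", "signature": "compliance_check"}


def block_confidence(block: Block) -> float:
    """Score a block by its weakest word (an assumed, conservative policy)."""
    return min(block.word_confidences, default=0.0)


def route(block: Block, min_confidence: float = 0.90) -> str:
    """Send low-confidence blocks to review, the rest to a type-specific handler."""
    if block_confidence(block) < min_confidence:
        return "human_review"
    return ROUTES.get(block.block_type, "rag_index")


def to_rag_chunk(block: Block) -> dict:
    """Keep page, bounding box, and type with the text so answers stay traceable."""
    return {
        "text": block.text,
        "source": {"page": block.page, "bbox": block.bbox, "type": block.block_type},
    }


blocks = [
    Block("title", "Master Services Agreement", (72.0, 60.0, 540.0, 90.0), 1, [0.99, 0.98, 0.99]),
    Block("table", "Fee | Amount", (72.0, 120.0, 540.0, 300.0), 1, [0.97, 0.95, 0.96]),
    Block("signature", "J. Doe", (300.0, 700.0, 540.0, 740.0), 3, [0.62, 0.71]),
]

for b in blocks:
    print(b.block_type, "->", route(b))
```

With this sample data, the title goes to the RAG index, the table goes to a table parser, and the low-confidence signature is held for human review. A real pipeline would replace the route names and threshold with whatever its compliance rules require.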
The release is strategically timed, following an export ban on Anthropic's AI models, which highlighted European reliance on U.S. technology. Mistral's self-hosted option offers a compliant alternative under the EU AI Act, with enforcement beginning August 2.
Baidu released an open-weight OCR model, Unlimited-OCR, around the same time, highlighting a market split between self-hosted enterprise solutions and free, open-weight tools. OCR 4 is positioned as a commercial product for enterprise procurement, with a focus on SLAs and audits, while Baidu's model is better suited to research needs.
Mistral views OCR 4 as an entry point into enterprise AI budgets, feeding into its broader AI stack. The company aims to compete with larger U.S. rivals by offering a differentiated enterprise solution centered on sovereignty and structured document intelligence, targeting significant revenue growth.