Baidu’s Unlimited OCR Transforms Long Document Reading with Flat Memory

Artimouse Prime

Baidu launched Unlimited OCR on June 22, 2026. The new model reads long documents in a single pass, handling dozens of pages at once, including entire PDFs and multi-page scans.

What sets it apart is its memory behavior. Rather than letting memory grow with document length, Unlimited OCR keeps it flat by replacing the usual decoder attention with Reference Sliding Window Attention (R-SWA). As a result, the key-value cache stays at a constant size no matter how long the document is.
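To make the flat-memory idea concrete, here is a toy sketch of a cache whose size stops growing once a window fills up. The article does not describe how R-SWA works internally, so everything here is an assumption for illustration: the class name FlatKVCache, the split into a fixed block of "reference" entries plus a bounded window of recent entries, and the use of strings in place of real key-value tensors.

```python
from collections import deque


class FlatKVCache:
    """Toy cache: fixed reference entries plus a bounded window of recent entries.

    Hypothetical illustration only; not Baidu's implementation of R-SWA.
    """

    def __init__(self, reference_entries, window_size):
        # Assumed: reference entries are stored once and never evicted.
        self.reference = list(reference_entries)
        # A deque with maxlen drops its oldest item when a new one arrives.
        self.window = deque(maxlen=window_size)

    def append(self, entry):
        """Add the entry for a newly decoded token."""
        self.window.append(entry)

    def visible(self):
        """Entries the next decoding step would attend to."""
        return self.reference + list(self.window)

    def __len__(self):
        return len(self.reference) + len(self.window)


cache = FlatKVCache(reference_entries=["ref0", "ref1", "ref2"], window_size=4)
sizes = []
for step in range(10):
    cache.append(f"tok{step}")
    sizes.append(len(cache))

print(sizes)            # [4, 5, 6, 7, 7, 7, 7, 7, 7, 7]
print(cache.visible())  # ['ref0', 'ref1', 'ref2', 'tok6', 'tok7', 'tok8', 'tok9']
```

Under these assumptions the cache tops out at the reference size plus the window size, which is the sense in which memory stays constant as output grows.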

The model has 3 billion parameters but activates only 500 million during inference, which helps it run faster and use less memory. It supports a context length of 32,768 tokens, enough to parse very long texts.

Unlimited OCR comes in two configurations. The “base” mode uses an image size of 1024, while the “gundam” mode uses 640. Both handle long documents efficiently, but the base mode offers higher throughput.

Speed and Accuracy Advantages

Unlimited OCR beats the previous DeepSeek OCR model on multiple benchmarks. On OmniDocBench v1.5 it scored 93.23, which is 6.22 points higher than DeepSeek OCR. On the newer OmniDocBench v1.6 it reached 93.92.

In base mode it reaches 5,580 tokens per second (TPS), a 12.7% increase over DeepSeek OCR’s 4,951 TPS. When generating 6,144 output tokens, Unlimited OCR has a 35% throughput advantage.
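The 12.7% figure follows directly from the two throughput numbers quoted above. A short check, where the helper name percent_increase is only illustrative:

```python
def percent_increase(new: float, old: float) -> float:
    """Relative gain of new over old, expressed as a percentage."""
    return (new - old) / old * 100.0


unlimited_tps = 5580  # base-mode throughput reported for Unlimited OCR
deepseek_tps = 4951   # throughput reported for DeepSeek OCR

gain = percent_increase(unlimited_tps, deepseek_tps)
print(f"{gain:.1f}%")  # 12.7%
```

The 35% advantage at 6,144 output tokens is a separate measurement, and the article does not give the raw numbers behind it, so it cannot be rechecked the same way.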

This speed boost means it can handle large batches or longer documents faster. Parsing 40-plus pages in one pass is now possible without running out of memory or slowing down.

Open Source and Community Response

Baidu open-sourced Unlimited OCR and shared the weights under the MIT license. The model is available on GitHub, ModelScope, and Hugging Face. It supports popular tools like Hugging Face Transformers, vLLM, SGLang, and Docker Model Runner.

The release sparked quick interest. Within 24 hours of launch, the GitHub repo collected 1,800 stars. This response shows strong excitement from the developer community.

The model was produced by continued training from the DeepSeek OCR checkpoint. Baidu used about 2 million document samples and ran 4,000 training steps.

It uses PyMuPDF for converting PDFs into images before processing. This step is crucial for handling multi-page scans and ensuring the model reads the entire document in one pass.
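A minimal sketch of that PDF-to-image step with PyMuPDF is shown below. The function names, the output file naming scheme, and the 144 DPI default are assumptions for illustration; the article does not say which settings Unlimited OCR's own pipeline uses.

```python
from pathlib import Path


def page_image_name(stem: str, page_number: int) -> str:
    """Build a zero-padded file name so pages sort in reading order."""
    return f"{stem}_page{page_number:04d}.png"


def pdf_to_images(pdf_path: str, out_dir: str, dpi: int = 144) -> list[Path]:
    """Render every page of a PDF to a PNG file and return the written paths."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf_path).stem

    written = []
    with fitz.open(pdf_path) as doc:
        for number, page in enumerate(doc, start=1):
            # Rasterize the page; a higher dpi gives sharper but larger images.
            pix = page.get_pixmap(dpi=dpi)
            target = out / page_image_name(stem, number)
            pix.save(str(target))
            written.append(target)
    return written
```

Calling pdf_to_images("contract.pdf", "pages") would write pages/contract_page0001.png, pages/contract_page0002.png, and so on, which could then be passed to the model in order.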

Unlimited OCR’s flat memory design and impressive performance make it a breakthrough for document AI. It opens doors for faster, more accurate reading of long papers, contracts, books, and more.
