Baidu’s Unlimited OCR Transforms Long Document Reading with Flat Memory

Baidu launched Unlimited OCR on June 22, 2026. The new model reads long documents, including entire PDFs and multi-page scans, in a single pass that can cover dozens of pages.
What makes it special is its memory system. Instead of growing memory as the document gets longer, Unlimited OCR keeps the memory size flat. It does this by replacing the usual decoder attention with something called Reference Sliding Window Attention, or R-SWA. This keeps the key-value cache constant, no matter how long the document is.
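To make the flat-memory idea concrete, here is a toy sketch of a cache whose size stops growing once it fills up. It is not Baidu's implementation. The split into a pinned set of "reference" entries plus a bounded window of recent entries is an assumption read off the name R-SWA, and the class name, entry labels, and sizes are all hypothetical.

```python
from collections import deque


class RefSlidingWindowCache:
    """Toy KV cache: pinned reference entries plus a bounded window of recent entries.

    Assumption: this layout is inferred from the name "Reference Sliding
    Window Attention"; the real model may organize its cache differently.
    """

    def __init__(self, reference_entries, window_size):
        self.reference = list(reference_entries)  # kept for the whole document
        self.window = deque(maxlen=window_size)   # oldest entry is dropped automatically

    def append(self, kv_entry):
        self.window.append(kv_entry)

    def visible(self):
        """Entries a new token could attend to at this step."""
        return self.reference + list(self.window)

    def __len__(self):
        return len(self.reference) + len(self.window)


cache = RefSlidingWindowCache([f"ref{i}" for i in range(4)], window_size=8)
sizes = []
for step in range(100):
    cache.append(f"tok{step}")
    sizes.append(len(cache))

print(max(sizes))  # 12: four pinned entries plus an eight-entry window
```

After 100 appended entries the cache still holds 12 items, where a conventional cache would hold 104. That bounded size is the property the article describes as a constant key-value cache.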
The model has 3 billion parameters but activates only about 500 million of them during inference, which helps it run faster. It supports a context length of 32,768 tokens, enough to parse very long texts in one pass.
Unlimited OCR comes in two configurations. The “base” mode uses an image size of 1024, while the “gundam” mode uses 640. Both can handle long documents efficiently, but the base mode offers higher throughput.
Speed and Accuracy Advantages
Unlimited OCR beats the previous DeepSeek OCR model on multiple benchmarks. On OmniDocBench v1.5, it scored 93.23, which is 6.22 points higher than DeepSeek OCR. On the newer OmniDocBench v1.6, it reached 93.92.
Speed-wise, it hits 5,580 tokens per second (TPS) in base mode. That’s a 12.7% increase over DeepSeek’s 4,951 TPS. When generating 6,144 output tokens, Unlimited OCR has a 35% throughput advantage.
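The reported figures can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this article. The DeepSeek OCR score it prints is implied by the 6.22-point gap and is not a separately reported result, and the helper name `percent_gain` is made up for illustration.

```python
def percent_gain(new, baseline):
    """Relative improvement of new over baseline, in percent."""
    return (new - baseline) / baseline * 100.0


# Throughput figures quoted above, in tokens per second.
tps_gain = percent_gain(5580, 4951)

# OmniDocBench v1.5: Unlimited OCR's score minus the quoted gap.
implied_deepseek_score = round(93.23 - 6.22, 2)

print(f"Throughput gain: {tps_gain:.1f}%")  # Throughput gain: 12.7%
print(f"Implied DeepSeek OCR score on v1.5: {implied_deepseek_score}")  # 87.01
```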
This speed boost means it can handle large batches or longer documents faster. Parsing 40-plus pages in one pass is now possible without running out of memory or slowing down.
Open Source and Community Response
Baidu open-sourced Unlimited OCR and shared the weights under the MIT license. The model is available on GitHub, ModelScope, and Hugging Face. It supports popular tools like Hugging Face Transformers, vLLM, SGLang, and Docker Model Runner.
The release sparked quick interest. Within 24 hours of launch, the GitHub repo collected 1,800 stars. This response shows strong excitement from the developer community.
The model was produced by continuing training from the DeepSeek OCR checkpoint. Baidu used about 2 million document samples and ran 4,000 training steps.
The pipeline uses PyMuPDF to convert PDF pages into images before processing. This step is what lets the model take in multi-page scans and read the entire document in one pass.
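A PDF-to-image step along these lines might look like the sketch below. The PyMuPDF calls are real, but everything around them is an assumption: the function names are hypothetical, and scaling each page so its longest side matches the mode's image size (1024 or 640) is only one plausible preprocessing choice, not a confirmed detail of Baidu's pipeline.

```python
def zoom_for_target(page_width, page_height, target_long_side):
    """Zoom factor that scales the page's longest side to target_long_side pixels."""
    return target_long_side / max(page_width, page_height)


def render_pdf_pages(pdf_path, target_long_side=1024):
    """Yield one PNG byte string per page of the PDF.

    Assumption: resizing by the longest side is illustrative only.
    """
    import fitz  # PyMuPDF; imported here so the helper above works without it

    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Page dimensions are in points; a zoom of 1.0 renders one pixel per point.
            zoom = zoom_for_target(page.rect.width, page.rect.height, target_long_side)
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            yield pix.tobytes("png")
```

Calling `list(render_pdf_pages("contract.pdf", 640))` on a hypothetical file would produce one PNG per page, ready to pass to the model in page order.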
Unlimited OCR’s flat memory design and impressive performance make it a breakthrough for document AI. It opens doors for faster, more accurate reading of long papers, contracts, books, and more.
Based on
- Baidu Releases Unlimited OCR, a 3B Model That Keeps the KV Cache Flat for Long-Document Parsing (marktechpost.com)
- Baidu’s Unlimited OCR: Beats DeepSeek OCR, Parses entire book in one go, by Mehul Gupta, Data Science in Your Pocket, Jun 2026 (medium.com)
- Baidu Unlimited-OCR: One-Shot PDF Parsing Is Here (byteiota.com)
- Baidu Unlimited-OCR: One-Shot Long-Horizon Document Parsing Explained (explainx.ai)
- Unlimited OCR parses entire PDFs in one pass with a 3B open model (topaiproduct.com)




