NVIDIA’s DFlash Delivers Up to 15x Higher Throughput on Blackwell GPUs

Ready for a breakthrough in AI inference speed? NVIDIA is pushing the limits with DFlash, its latest speculative decoding technology, which delivers up to 15 times higher throughput on NVIDIA Blackwell GPUs. The gains are not just hype: they are reported across multiple models and tasks, and they could reshape how large language models are served.
What Makes DFlash So Fast?
DFlash breaks away from the token-by-token drafting used by earlier speculative decoding methods. Instead, its draft model proposes an entire block of tokens in a single pass, and the target model then verifies that block. Because several tokens can be accepted per verification step, the model spends far less time generating text. NVIDIA AI calls this method the “definitive 2026 framework” for unlocking extreme throughput on Blackwell GPUs.
DFlash is a lightweight, open source block diffusion model built for speculative decoding, and it is designed to speed up inference without hurting responsiveness. In NVIDIA AI’s words: “Increase inference performance by up to 15x without sacrificing responsiveness.” In practice, that means higher throughput while each user still sees tokens arrive at the same pace.
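To make the draft-then-verify idea concrete, here is a minimal toy sketch of block-level speculative decoding with greedy verification. It is not DFlash's implementation: `toy_target`, `toy_drafter`, and `speculative_generate` are hypothetical names, the "models" are simple arithmetic rules, and verification is written as a loop where a real system would score the whole block in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical target model: greedy next token is (last + 1) mod 10."""
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[Token], block_size: int) -> List[Token]:
    """Hypothetical drafter: proposes a whole block at once, with deliberate
    mistakes whenever the correct token would be a multiple of 3."""
    block, last = [], ctx[-1]
    for _ in range(block_size):
        nxt = (last + 1) % 10
        if nxt % 3 == 0:
            nxt = (nxt + 1) % 10  # deliberate drafting error
        block.append(nxt)
        last = nxt
    return block


def greedy_generate(target: Callable, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_generate(
    target: Callable, drafter: Callable, prompt: List[Token], n: int, block_size: int = 4
) -> Tuple[List[Token], int]:
    """Draft a block, keep the longest prefix the target agrees with, then
    append one token from the target itself (a correction or a bonus token)."""
    out, cycles = list(prompt), 0
    while len(out) - len(prompt) < n:
        draft = drafter(out, block_size)
        accepted: List[Token] = []
        for tok in draft:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: take the target's token
                break
            accepted.append(tok)
        else:
            accepted.append(target(out + accepted))  # whole block accepted: bonus token
        remaining = n - (len(out) - len(prompt))
        out.extend(accepted[:remaining])
        cycles += 1
    return out[len(prompt):], cycles


baseline_tokens = greedy_generate(toy_target, [0], 12)
spec_tokens, spec_cycles = speculative_generate(toy_target, toy_drafter, [0], 12)
print(spec_tokens == baseline_tokens, spec_cycles)
```

The output of the speculative loop matches plain greedy decoding token for token, which is what "lossless" means here, while needing fewer draft-and-verify cycles than the baseline needs target calls. How much is saved depends entirely on how often the drafter is right.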
Performance That Speaks Volumes
The numbers are striking. Across a range of models and tasks, DFlash achieves up to roughly 6× lossless acceleration, meaning the generated text matches what the target model would produce on its own. Specifically, on NVIDIA Blackwell GPUs, it hits:
- Up to 15× higher throughput for gpt-oss-120b at the same user interactivity target
- An average 4.86× speedup on Qwen3-8B using greedy decoding
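- A peak 6.08× boost on the challenging MATH-500 task, with an average τ = 6.49 across various benchmarks (τ is the symbol conventionally used in speculative decoding for the average acceptance length, the number of tokens accepted per draft-and-verify cycle)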
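The relationship between acceptance length and speedup in the figures above can be sketched with a simple cost model. This is a back-of-the-envelope assumption, not a formula from NVIDIA: it treats one verification pass as costing the same as one ordinary decode step, and lumps drafting and other per-cycle costs into a single hypothetical `draft_overhead` ratio. Pairing τ = 6.49 with the 4.86× Qwen3-8B average also assumes both numbers describe the same runs, which the article does not confirm.

```python
def estimated_speedup(tau: float, draft_overhead: float) -> float:
    """Tokens gained per cycle divided by the cycle's cost.

    Cost is measured in units of one target forward pass: 1 for verification
    plus `draft_overhead` for drafting and bookkeeping (an assumed lump sum).
    """
    if tau <= 0 or draft_overhead < 0:
        raise ValueError("tau must be positive and draft_overhead non-negative")
    return tau / (1.0 + draft_overhead)


def implied_overhead(tau: float, speedup: float) -> float:
    """Invert the model: the overhead that would reconcile tau with a speedup."""
    if tau <= 0 or speedup <= 0:
        raise ValueError("tau and speedup must be positive")
    return tau / speedup - 1.0


# Assumed pairing of the two reported averages (see the caveat above).
overhead = implied_overhead(tau=6.49, speedup=4.86)
print(f"implied per-cycle overhead: {overhead:.2f} target passes")
print(f"round trip: {estimated_speedup(6.49, overhead):.2f}x")
```

Under those assumptions the per-cycle overhead works out to about a third of a target forward pass, which illustrates why a lightweight drafter matters: the closer the overhead is to zero, the closer the speedup gets to τ itself.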
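DFlash outpaces competing approaches too. At matched concurrency, it delivers an average 2.3× speedup on gpt-oss-120b, compared to EAGLE-3’s 1.7×, a lead of roughly 35%. On Llama 3.1 8B Instruct, DFlash averages 2.8× against EAGLE-3’s 2.2×, a lead of roughly 27%. These averages sit well below the 15× headline because they measure something different: the 15× figure is a peak throughput gain at a fixed user interactivity target, not an average speedup at matched concurrency.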
JetFlow and the Broader Ecosystem
Alongside DFlash, the JetFlow team from UC San Diego, ByteDance, and MSRA is pushing performance boundaries too. JetFlow achieves up to 9.64× speedup on MATH-500 and 4.58× speedup on open-ended conversational workloads running on NVIDIA H100 GPUs.
These advances signal a fast-evolving landscape where NVIDIA’s hardware and software teams, as well as external research groups, race to unlock new efficiency levels. DFlash’s results are reported on Blackwell GPUs and JetFlow’s on H100s, so the two sets of numbers are not directly comparable, but both point the same way: LLM inference is getting much faster.
What This Means for AI and You
Higher throughput means more users can interact with large language models simultaneously on the same hardware. Because the gains are reported at the same interactivity target, each user’s experience stays just as responsive. For AI developers and enterprises, this opens the door to deploying larger, more complex models without sacrificing speed.
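As a rough illustration of what "more users at the same interactivity" means, the sketch below divides a GPU's aggregate token throughput by a fixed per-user rate. The function name and all numbers are hypothetical, chosen only to show the arithmetic; real capacity also depends on batching, memory, and sequence lengths.

```python
def users_served(aggregate_tokens_per_s: float, per_user_tokens_per_s: float) -> int:
    """Concurrent users supportable when every user gets the same token rate."""
    if per_user_tokens_per_s <= 0:
        raise ValueError("per-user rate must be positive")
    return int(aggregate_tokens_per_s // per_user_tokens_per_s)


# Hypothetical figures: 2,000 tokens/s aggregate, 50 tokens/s per user.
baseline_users = users_served(2_000, 50)
# The same interactivity target with a 15x throughput gain (the reported best case).
boosted_users = users_served(2_000 * 15, 50)
print(baseline_users, boosted_users)
```

With the per-user rate held constant, a throughput multiplier translates directly into the same multiplier on concurrent users, which is why the interactivity target matters when reading the headline figure.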
NVIDIA AI sums it up perfectly: “Deploying DFlash to propose an entire token block in a single pass instead of brittle token-by-token drafting is the definitive 2026 framework to unlock 15x higher throughput on NVIDIA Blackwell.”
The technology is poised to accelerate a wide range of AI applications, from chatbots and virtual assistants to complex reasoning tasks. As decoding techniques like these mature on GPUs such as Blackwell and H100, they will help drive the next wave of AI-powered tools that feel instant and intelligent.
The race for faster, smarter AI inference just kicked into overdrive. With DFlash leading the pack, NVIDIA is setting a new standard for performance that will ripple across the AI ecosystem this year and beyond.
Based on
- DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel for Up to 15x Higher Throughput on NVIDIA Blackwell — marktechpost.com
- NVIDIA Blackwell GPUs Achieve 15x AI Inference Boost With DFlash – Blockchain.News — blockchain.news
- Increase inference performance by up to 15x without sacrificing responsiveness. DFlash, an open source lightweight block diffusion model designed for speculative decoding, delivers up to 15x higher… | NVIDIA AI — linkedin.com
- JetFlow: 9.64x faster LLM inference with parallel tree… | The Neural Feed — theneuralfeed.com
- Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding – Technical Blog – NVIDIA Developer Forums — forums.developer.nvidia.com




