Nvidia bets on open infrastructure for the agentic AI era with Nemotron 3

News | December 16, 2025 | Artifice Prime

AI agents must be able to cooperate, coordinate, and execute across large contexts and long time horizons, and that, Nvidia says, demands a new type of infrastructure: one that is open.

The company says it has the answer with its new Nemotron 3 family of open models.

Developers and engineers can use the new models to create domain-specific AI agents or applications without having to build a foundation model from scratch. Nvidia is also releasing most of its training data and its reinforcement learning (RL) libraries for use by anyone looking to build AI agents.

“This is Nvidia’s response to DeepSeek disrupting the AI market,” said Wyatt Mayham of Northwest AI Consulting. “They’re offering a ‘business-ready’ open alternative with enterprise support and hardware optimization.”

Introducing Nemotron 3 Nano, Super, and Ultra

Nemotron 3 features what Nvidia calls a “breakthrough hybrid latent mixture-of-experts (MoE) architecture”. The model comes in three sizes:

  • Nano: The smallest and most “compute-cost-efficient,” intended for targeted, highly efficient tasks like quick information retrieval, software debugging, content summarization, and AI assistant workflows. The 30-billion-parameter model activates 3 billion parameters at a time for speed and has a 1-million-token context window, allowing it to remember and connect information over multi-step tasks.
  • Super: An advanced, high-accuracy reasoning model with roughly 100 billion parameters, up to 10 billion of which are active per token. It is intended for applications that require many collaborating agents to tackle complex tasks, such as deep research and strategy planning, with low latency.
  • Ultra: A large reasoning engine intended for complex AI applications. It has 500 billion parameters, with up to 50 billion active per token.

Nemotron 3 Nano is now available on Hugging Face and through other inference service providers and enterprise AI and data infrastructure platforms. It will soon be available on AWS via Amazon Bedrock and will be supported on Google Cloud, CoreWeave, Microsoft Foundry, and other public clouds. It is also offered as a pre-built Nvidia NIM microservice.
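
For developers who want to try it, the standard Hugging Face workflow should apply. Here is a minimal sketch, assuming the weights ship under a repo ID like nvidia/Nemotron-3-Nano (a placeholder; the exact ID, and whether the hybrid layers require trust_remote_code, should be checked on the model card):

```python
# Minimal sketch of loading Nemotron 3 Nano via Hugging Face transformers.
# "nvidia/Nemotron-3-Nano" is a placeholder repo ID -- check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-3-Nano"  # hypothetical repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # let the checkpoint choose its precision
    device_map="auto",       # spread layers across available GPUs
    trust_remote_code=True,  # hybrid Mamba/MoE layers may need custom code
)

prompt = "Summarize the key risks in this incident report:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```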

Nemotron 3 Super and Ultra are expected to be available in the first half of 2026.

Positioned as an infrastructure layer

The strategic positioning here is fundamentally different from that of the API providers, experts note.

“Nvidia isn’t trying to compete with OpenAI or Anthropic’s hosted services — they’re positioning themselves as the infrastructure layer for enterprises that want to build and own their own AI agents,” said Mayham.

Brian Jackson, principal research director at Info-Tech Research Group, agreed that the Nemotron models aren’t intended as a ready-baked product. “They are more like a meal kit that a developer can start working with,” he said, “and make desired modifications along the way to get the exact flavor they want.”

Hybrid architecture enhances performance

So far, Nemotron 3 is showing impressive gains in efficiency and performance: according to third-party benchmarking firm Artificial Analysis, Nano is the most efficient model in its size class and also leads it in accuracy.

Nvidia attributes this efficiency to Nano’s hybrid Mamba-Transformer MoE design, which integrates three architectures into a single backbone: Mamba layers offer efficient sequence modeling, transformer layers provide precision reasoning, and MoE routing gives scalable compute efficiency. The company says this design delivers 4X higher token throughput than Nemotron 2 Nano while reducing reasoning-token generation by up to 60%.
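
Nvidia has not published the exact layer recipe, but the general shape of such a hybrid is straightforward to sketch. The toy PyTorch version below, with layer counts, dimensions, and the interleaving pattern all assumed for illustration, shows why only a fraction of parameters is active per token:

```python
# Toy sketch of a hybrid Mamba/attention/MoE backbone. The pattern, sizes,
# and routing below are illustrative assumptions, not Nvidia's published design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sparse FFN: each token runs through only its top-k experts, which is
    how a 30B-parameter model can activate ~3B parameters per token."""
    def __init__(self, d_model=1024, d_ff=4096, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                         # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():                    # only selected experts do any work
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

def build_backbone(d_model=1024, n_blocks=12):
    """Interleave cheap sequence mixing (Mamba, stubbed here), periodic
    attention for precision, and sparse MoE FFNs for capacity."""
    layers = []
    for i in range(n_blocks):
        layers.append(nn.Identity())              # stand-in for a Mamba-2 layer
        if i % 4 == 3:                            # occasional full attention
            layers.append(nn.MultiheadAttention(d_model, num_heads=8, batch_first=True))
        layers.append(TopKMoE(d_model))
    return nn.ModuleList(layers)
```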

“Throughput is the critical metric for agentic AI,” said Mayham. “When you’re orchestrating dozens of concurrent agents, inference costs scale dramatically. Higher throughput means lower cost per token and more responsive real-time agent behavior.”

The 60% reduction in reasoning-token generation addresses the “verbosity problem,” where chain-of-thought (CoT) models generate excessive internal reasoning before producing useful output, he noted. “For developers building multi-agent systems, this translates directly to lower latency and reduced compute costs.”
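
The two numbers compound. A back-of-envelope calculation, using invented baseline figures purely to show the arithmetic:

```python
# How 4X throughput and 60% fewer reasoning tokens compound per agent step.
# Every baseline number below is invented for illustration.
baseline_tps = 1_000       # tokens/sec per GPU on the previous generation
reasoning_tokens = 5_000   # internal chain-of-thought tokens per agent step
answer_tokens = 500        # useful output tokens per step
agents = 50                # concurrent agents in an orchestrated workflow

old_step = reasoning_tokens + answer_tokens
new_step = reasoning_tokens * (1 - 0.60) + answer_tokens  # 60% fewer reasoning tokens
new_tps = baseline_tps * 4                                # 4X token throughput

old_latency = old_step / baseline_tps                     # seconds per agent step
new_latency = new_step / new_tps
print(f"per-step latency: {old_latency:.1f}s -> {new_latency:.2f}s "
      f"({old_latency / new_latency:.1f}x faster)")
print(f"tokens per step across {agents} agents: "
      f"{agents * old_step:,} -> {agents * new_step:,.0f}")
```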

The upcoming Nemotron 3 Super, Nvidia says, excels at applications that require many collaborating agents to achieve complex tasks with low latency, while Nemotron 3 Ultra will serve as an advanced reasoning engine for AI workflows that demand deep research and strategic planning.

Mayham explained that these as-yet-unreleased models feature latent MoE, which projects tokens into a smaller latent dimension before expert routing, “theoretically” enabling 4X more experts at the same inference cost because it reduces communication overhead between GPUs.
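
A rough sketch of that idea, reusing the TopKMoE layer from the earlier sketch and with all dimensions assumed: tokens are down-projected before routing, experts operate in the latent space, and the result is projected back, so the tensors shuffled between GPUs during expert dispatch are a fraction of the model width.

```python
# Rough sketch of latent MoE: route and run experts in a smaller latent space
# so inter-GPU dispatch traffic shrinks. All dimensions are assumptions.
import torch.nn as nn

class LatentMoE(nn.Module):
    def __init__(self, d_model=4096, d_latent=1024, n_experts=64, top_k=4):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)  # 4x less data to ship per token
        self.moe = TopKMoE(d_model=d_latent, d_ff=4 * d_latent,
                           n_experts=n_experts, top_k=top_k)
        self.up = nn.Linear(d_latent, d_model)

    def forward(self, x):
        # With routing and expert compute at d_latent instead of d_model,
        # the same communication budget can fund roughly 4x more experts.
        return self.up(self.moe(self.down(x)))
```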

The hybrid architecture behind Nemotron 3 that combines Mamba-2 layers, sparse transformers, and MoE routing is “genuinely novel in its combination,” Mayham said, although each technique exists individually elsewhere.

Ultimately, Nemotron pricing is “attractive,” he said; open weights are free to download and run locally. Third-party API pricing on DeepInfra starts at $0.06/million input tokens for Nemotron 3 Nano, which is “significantly cheaper” than GPT-4o, he noted.

Openness is the differentiator

To underscore its commitment to open source, Nvidia is revealing some of Nemotron 3’s inner workings, releasing a dataset with real-world telemetry for safety evaluations as well as 3 trillion tokens from Nemotron 3’s pretraining, post-training, and RL datasets.

In addition, Nvidia is open-sourcing its NeMo Gym and NeMo RL libraries, which provide Nemotron 3’s training environments and post-training foundation, and NeMo Evaluator, to help builders validate model safety and performance. All are now available on GitHub and Hugging Face. Of these, Mayham noted, NeMo Gym might be the most “strategically significant” piece of this release.

Pre-training teaches models to predict tokens, not to complete domain-specific tasks, and traditional RL from human feedback (RLHF) doesn’t scale for complex agentic behaviors, Mayham explained. NeMo Gym enables RL with verifiable rewards — essentially computational verification of task completion rather than subjective human ratings. That is, did the code pass tests? Is the math correct? Were the tools called properly?
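
In practice, a verifiable reward is just a programmatic check. A minimal sketch, with function shapes that are illustrative rather than NeMo Gym's actual API:

```python
# Minimal sketch of verifiable rewards: programmatic pass/fail checks instead
# of human preference scores. Function shapes are illustrative, not NeMo Gym's API.
import subprocess
import tempfile

def code_reward(candidate_code: str, unit_tests: str) -> float:
    """1.0 if the model's code passes the supplied tests, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + unit_tests)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=30)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

def math_reward(model_answer: str, ground_truth: str) -> float:
    """1.0 on an exact match of the final answer, else 0.0."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def tool_call_reward(calls_made: list[str], calls_expected: list[str]) -> float:
    """Fraction of expected tool calls the agent actually made, in order."""
    hits = sum(1 for a, b in zip(calls_made, calls_expected) if a == b)
    return hits / max(len(calls_expected), 1)
```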

This gives developers building domain-specific agents the infrastructure to train models on their own workflows without having to understand the full RL training loop.

“The idea is that NeMo Gym will speed up the setup and execution of RL jobs for models,” explained Jason Andersen, VP and principal analyst with Moor Insights & Strategy. “The important distinction is NeMo Gym decouples the RL environment from the training itself, so it can easily set up and create multiple training instances (or ‘gyms’).”
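
That decoupling can be pictured as a narrow interface between trainer and environment. A hypothetical sketch (these names are illustrative, not NeMo Gym's real API):

```python
# Hypothetical sketch of environment/trainer decoupling: the trainer sees only
# a narrow rollout interface, so any number of "gyms" can be created
# independently. These names are not NeMo Gym's actual API.
from dataclasses import dataclass
from typing import Callable, Protocol

@dataclass
class Rollout:
    prompt: str
    response: str
    reward: float      # produced by the env's own verifiable-reward check

class AgentEnv(Protocol):
    def run_episode(self, policy: Callable[[str], str]) -> Rollout: ...

def collect_rollouts(envs: list[AgentEnv], policy: Callable[[str], str]) -> list[Rollout]:
    # The trainer never needs to know how each environment scores the agent;
    # it simply gathers rollouts from however many instances were spun up.
    return [env.run_episode(policy) for env in envs]
```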

Mayham called this “unprecedented openness” the real differentiator of the Nemotron 3 release. “No major competitor offers that level of completeness,” he said. “For enterprises, this means full control over customization, on-premises deployment, and cost optimization that closed providers simply can’t match.”

But there is a tradeoff in capability, Mayham pointed out: Claude and GPT-4o still outperform Nemotron 3 on specialized tasks like coding benchmarks. However, Nemotron 3 seems to be targeting a different buyer: enterprises that need deployment flexibility and don’t want vendor lock-in.

“The value proposition for enterprises isn’t raw capability, it’s the combination of open weights, training data, deployment flexibility, and Nvidia ecosystem integration that closed providers can’t match,” he said.

This article originally appeared on InfoWorld.

Original Link: https://www.computerworld.com/article/4106805/nvidia-bets-on-open-infrastructure-for-the-agentic-ai-era-with-nemotron-3-2.html
Originally Posted: Tue, 16 Dec 2025 04:18:31 +0000


Artifice Prime

Artifice Prime is an AI enthusiast with over 25 years of experience as a Linux Sys Admin. They have an interest in Artificial Intelligence, its use as a tool to further humankind, as well as its impact on society.
