AI Inference Takes Center Stage at CES 2026
At this year’s CES in Las Vegas, the focus shifted from building massive AI models to making AI more practical and accessible. Tech companies have historically spent billions on training huge models, but the spotlight is now on inference, the phase in which a trained model applies what it has learned to new data. This marks a significant shift in how businesses and technology providers approach AI development and deployment.
The Changing Landscape of AI Spending
According to Lenovo CEO Yuanqing Yang, most AI investments used to go toward training models. About 80% of spending was on creating large language models that power generative AI, with only 20% directed at inference. But that balance is expected to flip soon. Yang predicts that in the near future, 80% of AI investment will focus on inference, and only 20% on training. This shift reflects a move towards deploying AI in real-world applications rather than just developing the models.
To support this transition, Lenovo introduced three new inference servers at CES. These servers aim to help businesses run AI models more efficiently and at scale. The company’s goal is to lead this emerging trend and provide tools that match the growing demand for AI inference capabilities.
Industry Experts Confirm the Trend
Experts agree that the shift from training to inference is already happening. A November report from Deloitte estimates that inference workloads accounted for about half of all AI compute in 2025, a share expected to rise to about two-thirds in 2026. While infrastructure spending on these workloads is still catching up, the trend is clear: companies are moving from experimenting with AI to deploying it in real products and services.
As enterprises start to deploy AI tools like chatbots and automation, they tend to begin with small investments and scale up over time. This incremental approach means that the initial costs are lower, but as demand grows, so does the need for more powerful inference servers and infrastructure. This dynamic supports the rapid adoption of AI in industries like healthcare, manufacturing, and finance.
This year’s CES showcased how companies are adapting to these changes. Lenovo’s new servers are designed to meet diverse needs — from large-scale language models used in complex applications to compact systems suited for retail or industrial environments. The focus is on making AI deployment more flexible, affordable, and scalable for different types of businesses.
The Future of AI Infrastructure
Looking ahead, industry analysts predict that 2026 will be a pivotal year for AI inference. A recent report from the Futurum Group suggests that revenue from inference workloads will soon surpass revenue from training. That would mark a major shift in how AI hardware and software are developed and marketed.
As companies move from testing AI models to deploying them widely, the demand for inference servers will continue to grow. Many enterprises are also exploring hybrid and edge deployments, bringing AI closer to where data is generated. This trend not only improves efficiency but also opens up new possibilities for real-time applications in retail, manufacturing, and telecommunications.
Lenovo’s launch of new inference servers at CES reflects this broader industry momentum. By spanning full-sized language models in data centers and compact devices for retail or industrial use, the lineup aims to make AI more accessible and easier to implement across different sectors.
Overall, CES 2026 made it clear that AI is entering a new phase. Instead of just building bigger models, the industry is now focusing on making AI usable in everyday applications. This shift promises to accelerate AI adoption and create new opportunities for innovation across many industries.














