Edge AI: The future of AI inference is smarter local compute

News | January 19, 2026 | Artifice Prime

“The global edge AI market is on a steep upward trajectory,” says Joshua David, senior director of edge project management at Red Hat. To his point, the edge AI market is anticipated to climb to a $143 billion valuation by 2034.

The bump in edge AI goes hand in hand with a broader shift in focus from AI training, the act of preparing machine learning (ML) models with the right data, to inference, the practice of actively using models to apply knowledge or make predictions in production.

“Advancements in powerful, energy-efficient AI processors and the proliferation of IoT (internet of things) devices are also fueling this trend, enabling complex AI models to run directly on edge devices,” says Sumeet Agrawal, VP of product management at Informatica, an enterprise data management and integration company.

With that, the AI industry is entering a “new and potentially much larger phase: AI inference,” explains an article on the Morgan Stanley blog. The article characterizes this phase as one of widespread AI model adoption throughout consumer and enterprise applications.

While public clouds grant elasticity and usability, they have downsides for inference: added latency, data privacy concerns, and increased costs for processing, data ingress, and data egress.

Running AI on the edge solves many of these issues. “Edge AI provides several key benefits such as reduced latency, lower costs, enhanced security, and privacy,” says Red Hat’s David.

Amazon recently hiked prices by 15% for GPUs primarily used for certain ML training jobs, signaling that cloud AI costs, particularly for centralized training, may be unpredictable. IDC predicts that by 2027, 80% of CIOs will turn to edge services from cloud providers to meet the demands of AI inference.

However, the shift won’t come without hurdles. Real-time performance demands, the large footprint of AI stacks, and a fragmented edge ecosystem remain the top challenges.

Below, we’ll look at the development progress around edge AI, explore the emerging technologies and practices for running AI on the edge, and consider how the future of computing will evolve at large in the AI age.

What’s driving edge AI growth

“The primary driver behind the edge AI boom is the critical need for real-time data processing,” says David. The ability to analyze data on the edge, rather than using centralized cloud-based AI workloads, helps direct immediate decisions at the source.

Others agree. “Interest in edge AI is experiencing massive growth,” says Informatica’s Agrawal. For him, reduced latency is a key factor, especially in industrial or automotive settings where split-second decisions are critical.

There is also the desire to feed ML models personal or proprietary context without sending such data to the cloud. “Privacy is one powerful driver,” says Johann Schleier-Smith, senior staff software engineer and AI tech lead at Temporal Technologies, provider of an open-source application platform. For heavily regulated sectors like healthcare or finance, processing such sensitive information locally becomes necessary for compliance.

“Interest in edge AI is indeed growing,” says Keith Basil, VP and GM of the edge business unit at SUSE. He points to the manufacturing sector, in which companies are exploring edge AI for various use cases, from running large servers for production lines to processing data from small sensors.

According to Rockwell Automation, 95% of manufacturers have invested, or plan to invest, in AI/ML, generative AI, or causal AI within the next five years. Meanwhile, 74% of manufacturing leaders say AI has the potential to help them grow revenue, according to an Intel-sponsored CIO report from 2024.

The major outcome of local AI computation? Reduced cost. “This leads to significant cost and bandwidth optimization, as less data needs to be transmitted,” explains Agrawal.

Tapping the edge for certain workloads correlates with lower costs and reduced energy consumption. A January 2025 paper published on arXiv, Quantifying Energy and Cost Benefits of Hybrid Edge Cloud, found that using a hybrid edge cloud for agentic AI workloads, as opposed to pure cloud processing, can, under modeled conditions, yield energy savings of up to 75% and cost reductions exceeding 80%.
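
As a back-of-the-envelope illustration of how such savings are typically modeled (the figures below are made-up assumptions for demonstration, not the paper’s inputs), shifting most requests to cheaper, lower-energy local processing pulls the blended per-request cost and energy down sharply:

    # Illustrative hybrid edge-cloud savings model; all numbers are assumptions.
    edge_fraction = 0.9        # share of inference requests served on local devices
    cloud_energy_wh = 4.0      # assumed energy per request in a centralized data center (Wh)
    edge_energy_wh = 0.4       # assumed energy per request on an edge device (Wh)
    cloud_cost_usd = 0.0020    # assumed cloud cost per request (compute plus egress)
    edge_cost_usd = 0.0001     # assumed marginal cost per request on owned edge hardware

    hybrid_energy = edge_fraction * edge_energy_wh + (1 - edge_fraction) * cloud_energy_wh
    hybrid_cost = edge_fraction * edge_cost_usd + (1 - edge_fraction) * cloud_cost_usd

    # Roughly 81% energy savings and 85% cost savings with these made-up inputs.
    print(f"Energy savings vs. pure cloud: {1 - hybrid_energy / cloud_energy_wh:.0%}")
    print(f"Cost savings vs. pure cloud:   {1 - hybrid_cost / cloud_cost_usd:.0%}")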

As the paper’s author, Siavash Alamouti, writes, “Edge processing directly utilizes the local context to minimize computational complexity and avoids these cloud-scale energy demands.”

The tech making local AI possible

The drivers seem clear, but what technology will it take to realize edge AI? Running AI computation in constrained edge environments will likely take a combination of smaller models, lightweight frameworks, and optimized deployment patterns.

Smaller models

To date, most enterprises centralize AI around large language models (LLMs) accessed through public products, such as Anthropic’s Claude, Google’s Gemini, and OpenAI’s GPT models. But recent advancements in AI models are enabling localization.

Specifically, the rise of self-deployable small language models (SLMs) decreases reliance on cloud AI platforms for certain cases. “Small models are getting more powerful,” says Temporal’s Schleier-Smith. He points to OpenAI’s GPT-OSS and Hierarchical Reasoning Model as examples of recent advances.

Optimization strategies

A smaller footprint for local AI is helpful for edge devices, where resources like processing capacity and bandwidth are constrained. As such, techniques for optimizing SLMs will be a key enabler of AI on the edge.

One strategy is quantization, a model compression technique that reduces model size and processing requirements. “This enables small language models to run on specialized hardware like NPUs, Google’s Edge TPU, Apple’s Neural Engine, and NVIDIA Jetson devices,” says Agrawal.
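
For a concrete, if simplified, sense of how quantization works, the sketch below applies PyTorch’s post-training dynamic quantization to a toy model; the tiny network is a stand-in for a real SLM, and the file names are arbitrary.

    # Minimal post-training dynamic quantization sketch (PyTorch).
    # The tiny model below is a placeholder; a real edge deployment would quantize an actual SLM.
    import os
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128)).eval()

    # Convert Linear-layer weights to int8; activations are quantized on the fly at inference time.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    def size_mb(m: nn.Module, path: str) -> float:
        torch.save(m.state_dict(), path)
        return os.path.getsize(path) / 1e6

    print(f"fp32 model: {size_mb(model, 'fp32.pt'):.2f} MB")
    print(f"int8 model: {size_mb(quantized, 'int8.pt'):.2f} MB")

    # The quantized module is a drop-in replacement for CPU inference.
    with torch.no_grad():
        out = quantized(torch.randn(1, 512))

The same idea, applied with more aggressive schemes, is what lets multi-billion-parameter models fit within the memory budgets of NPUs and single-board computers.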

Self-contained packages could also help operationalize edge AI at scale. For Red Hat’s David, this equates to readily deployable base images that combine the OS, hardware drivers, and AI models all in one.

Edge runtimes and frameworks

New runtimes and frameworks can also help optimize edge inference. For this purpose, David highlights llama.cpp, a lightweight generative AI runtime, and frameworks like OpenVINO and LiteRT (formerly known as TensorFlow Lite) for inference using models on local hardware.

“Projects like llama.cpp, along with the GGUF model format, are enabling high-performance inference on a wide range of consumer devices,” adds Agrawal. “Similarly, MLC LLM and WebLLM are expanding possibilities for running AI directly in web browsers and on various native platforms.”
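
For a rough sense of what that looks like in practice, the following sketch runs a quantized GGUF model through llama-cpp-python, the Python bindings for llama.cpp; the model path, thread count, and prompt are illustrative assumptions.

    # Minimal local inference sketch using llama-cpp-python (bindings for llama.cpp).
    # "model.gguf" is a placeholder path to whatever quantized GGUF model is on the device.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/model.gguf",  # assumed local file
        n_ctx=2048,                        # context window
        n_threads=4,                       # tune to the edge device's CPU
    )

    result = llm(
        "Summarize the last sensor reading in one sentence:",
        max_tokens=64,
        temperature=0.2,
    )
    print(result["choices"][0]["text"])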

Cloud-native compatibility

There is also a big incentive for edge AI to be compatible with the cloud-native ecosystem and Kubernetes, which is increasingly deployed at the edge. For example, KServe, described as “the open-source standard for self-hosted AI,” is a framework that can aid edge inferencing on Kubernetes.
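
As a rough illustration, a device or gateway client could call a KServe-hosted model over KServe’s standard V1 inference protocol; the endpoint, model name, and payload below are hypothetical.

    # Sketch of calling a KServe-hosted model via its V1 inference protocol.
    # The host, model name, and input vector are illustrative assumptions.
    import requests

    KSERVE_URL = "http://edge-cluster.local/v1/models/sensor-classifier:predict"  # hypothetical endpoint

    payload = {"instances": [[0.12, 0.53, 0.91, 0.04]]}  # one feature vector per instance

    resp = requests.post(KSERVE_URL, json=payload, timeout=5)
    resp.raise_for_status()
    print(resp.json()["predictions"])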

Another enabling technology is Akri, a sandbox project hosted by the Cloud Native Computing Foundation (CNCF). “Akri addresses a critical challenge at the edge: making a wide variety of dynamic and intermittently available leaf devices easily usable by Kubernetes,” SUSE’s Basil explains. By exposing IP cameras, sensors, USB devices, or other endpoints as Kubernetes resources with Akri, you could more easily deploy edge AI workloads that rely on such hardware, and monitor them in Kubernetes.

Open standards

Lastly, open industry standards will likely play a role in edge AI. “The rapidly expanding edge AI hardware and software landscape presents significant interoperability challenges,” says Basil. He believes projects like Margo, a Linux Foundation initiative, will be important for setting standards in industrial edge automation.

ONNX (Open Neural Network Exchange) is another standard aimed at easing interoperability challenges among the competing frameworks used for on-device AI inference.
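
A minimal sketch of that portability path: export a toy PyTorch model to ONNX once, then run the same artifact with ONNX Runtime, which ships builds for many edge targets. The model and file name here are illustrative.

    # Sketch: export a toy PyTorch model to ONNX, then run it with ONNX Runtime.
    import torch
    import torch.nn as nn
    import onnxruntime as ort

    model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)).eval()
    dummy = torch.randn(1, 8)

    # Export once; the resulting .onnx artifact can be served by many runtimes.
    torch.onnx.export(model, dummy, "tiny.onnx", input_names=["input"], output_names=["output"])

    session = ort.InferenceSession("tiny.onnx")
    outputs = session.run(None, {"input": dummy.numpy()})
    print(outputs[0])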

Barriers to edge AI in practice

Although the tech is available, edge practices are still emerging. It will take some effort to overcome the challenges of moving edge AI applications from the proof-of-concept stage to production reality.

“A primary limitation is the resource-constrained nature of edge devices,” says Agrawal. “Their limited memory and processing power make it difficult to deploy large, complex AI models, which often require substantial computational resources.” 

Optimizing model size for resource-constrained hardware, while still delivering the accuracy users have come to expect from computationally intensive top-tier models, remains another sticking point.

Practices for edge AI operation are still nascent. “A primary hurdle is the complex hardware enablement required for specialized edge devices which often don’t work out-of-the-box,” says David. For him, the lack of an end-to-end platform for deploying, monitoring, and managing models at the far edge currently forces some complex manual solutions.

“A major challenge for edge AI is the fragmented ecosystem,” adds Basil. “Unlike the standardized, mature environment of cloud computing, edge AI lacks a common framework for hardware, software, and communication protocols.” Fragmentation in the industry leads to competing device-specific software and techniques, which cause compatibility issues and custom workarounds on the edge. 

Finally, managing a distributed network of AI models presents a complex logistical challenge, says Agrawal. “Securely updating, versioning, and monitoring the performance of models across countless deployed devices is a difficult task that organizations must solve to effectively scale their edge AI implementations.” 

To get over some of these hurdles, experts recommend the following actions:

  • Adopt edge AI only where it makes sense (such as inference in low-connectivity environments).
  • Continually communicate business value to non-technical leadership.
  • Consider a hybrid cloud-edge strategy rather than fully edge or fully cloud deployments (see the routing sketch after this list).
  • Abstract architectural software layers from specific hardware dependencies.
  • Choose models optimized for edge constraints.
  • Envision the full model life cycle, including updates, monitoring, and maintenance, from the outset.
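
To make the hybrid recommendation above concrete, here is a minimal, entirely hypothetical routing sketch that prefers on-device inference and falls back to a cloud endpoint; run_local, CLOUD_URL, and the payload format are all placeholder assumptions.

    # Hypothetical hybrid edge-cloud routing sketch; all names and endpoints are illustrative.
    import requests

    CLOUD_URL = "https://inference.example.com/v1/predict"  # placeholder cloud endpoint

    def run_local(payload: dict) -> dict:
        """Placeholder for on-device inference (for example, an ONNX Runtime or llama.cpp call)."""
        raise NotImplementedError

    def infer(payload: dict) -> dict:
        try:
            return run_local(payload)  # prefer low-latency, private local inference
        except Exception:
            # Fall back to the cloud when the local model is unavailable or under-resourced.
            resp = requests.post(CLOUD_URL, json=payload, timeout=10)
            resp.raise_for_status()
            return resp.json()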

From centralized to distributed intelligence

Although interest in edge AI is heating up, similar to the shift toward alternative clouds, experts don’t expect local processing to reduce reliance on centralized clouds in a meaningful way. “Edge AI will have a breakout moment, but adoption will lag that of cloud,” says Schleier-Smith. 

Rather, we should expect edge AI to complement the public clouds with new edge capabilities. “Instead of replacing existing infrastructure, AI will be deployed at the edge to make it smarter, more efficient, and more responsive,” says Basil. This could equate to augmenting endpoints running legacy operating systems, or optimizing on-premises server operations, he says.

The general consensus is that edge devices will become more empowered in short order. “We will see rapid advancements in hardware, optimized models, and deployment platforms, leading to deeper integration of AI into IoT, mobile devices, and other everyday applications,” says Agrawal.

“Looking ahead, edge AI is poised for massive growth, driving a fundamental shift toward distributed, user-centric intelligence.”

Original Link:https://www.infoworld.com/article/4117620/edge-ai-the-future-of-ai-inference-is-smarter-local-compute.html
Originally Posted: Mon, 19 Jan 2026 09:00:00 +0000


Artifice Prime

Artifice Prime is an AI enthusiast with over 25 years of experience as a Linux sysadmin. They are interested in artificial intelligence, its use as a tool to further humankind, and its impact on society.
