Positron targets Middle East’s AI infrastructure with energy-efficient inference
As the Middle East accelerates investment in sovereign AI, one U.S. startup is rethinking how models are deployed—helping improve efficiency, manage costs, and support broader access to AI at scale.
September 11, 2025 | 03:11 PM
Across the Gulf, governments are investing billions in sovereign AI initiatives, data center expansions, and smart city ecosystems. From NEOM's greenfield infrastructure to Abu Dhabi's G42-backed cloud services and Dubai's push to embed AI in public services, the region is making AI central to its long-term economic leadership. But as these ambitions scale, they demand energy-efficient, high-performance infrastructure capable of running AI at national levels, without breaking the grids or the budgets. This challenge is prompting a closer look at energy-efficient alternatives for running AI models.

This shift is already underway in the U.S., where trillions of dollars are being poured into data center infrastructure. With it, a wave of new players is emerging with hardware optimized for efficiency, scalability, and cost-effectiveness in AI workloads. One of the most promising is Positron, a startup with deep connections to capital partners behind well-known brands in two of the most compute-intensive sectors in AI. The company recently raised $51M to scale its inference accelerators, designed for high power efficiency, strong performance-per-dollar, and optimized memory bandwidth, key factors for large-scale inference deployment.

As Middle Eastern nations plant their flags and chart the course of their AI futures, it is worth looking for solutions beyond the obvious hardware, such as expensive, non-specialized GPUs. Positron's recent deployments with leading U.S. neoclouds like Cloudflare may be an indicator of the powerful and efficient infrastructure that will soon be deployed across many countries in the Middle East.

Where AI Gets Expensive

While hardware choices often focus on raw power or training benchmarks, the real costs and energy demands come after the model is trained. Inference, the phase where AI models are actually deployed and used, is what makes or breaks infrastructure strategies.
And it is where the gap between traditional GPUs and purpose-built systems becomes most apparent.

In the training phase, models learn from data to make predictions and perform tasks. Inference is the subsequent phase, in which a trained model applies what it has learned to new data, generating outputs such as responses to user prompts and synthetic content, or powering customer-facing AI assistants. Where training is a one-time cost, inference is a continuous process that scales with every user interaction on every AI platform. Today, inference has become the largest and fastest-growing source of AI's ongoing energy consumption.

Positron is directly addressing this challenge with its inference system, designed from the ground up to maximize memory capacity and support large-scale deployment across neocloud platforms and enterprise environments. By focusing on efficiency, memory optimization, and compatibility, Positron is helping AI builders and infrastructure providers improve performance while working toward more energy-efficient operations.

Rethinking AI's Cost Curve

Today, inference accounts for nearly 90 percent of all model-related workloads across both consumer and enterprise applications. Despite this, most inference still runs on power-hungry general-purpose GPUs, which were originally designed to train large, flexible models, not to serve real-time, high-throughput inference workloads.

This misalignment has created a hidden cost structure in which enterprises and cloud providers rely on expensive GPU hardware to perform repetitive, memory-bound inference tasks, often utilizing only a fraction of each chip's capacity. This inefficiency inflates operating costs and puts additional strain on power grids in regions where energy management and infrastructure optimization are critical concerns.

Positron's answer to this problem is Atlas, a full-stack inference system that leverages reprogrammable FPGAs to maximize memory bandwidth and throughput.
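The memory-bound nature of inference can be illustrated with rough arithmetic. When generating text one token at a time, the accelerator must stream the model's weights from memory for each token, so decode throughput is capped by memory bandwidth divided by model size. The figures below are illustrative assumptions for the sake of the calculation, not specifications of Positron's or any other vendor's hardware:

```python
# Back-of-the-envelope upper bound on single-stream decode throughput
# for a memory-bandwidth-limited LLM. All numbers are illustrative.

def max_tokens_per_second(params_billion: float,
                          bytes_per_param: float,
                          mem_bandwidth_gbs: float) -> float:
    """Bound assuming every generated token streams all weights once."""
    model_bytes_gb = params_billion * bytes_per_param  # weight footprint in GB
    return mem_bandwidth_gbs / model_bytes_gb

# Hypothetical 70B-parameter model stored in FP16 (2 bytes per parameter)
# on an accelerator with 2,000 GB/s of memory bandwidth.
bound = max_tokens_per_second(params_billion=70, bytes_per_param=2,
                              mem_bandwidth_gbs=2000)
print(f"Decode bound: ~{bound:.1f} tokens/s per stream")
```

In this regime the chip's arithmetic units sit largely idle, which is why memory bandwidth, rather than raw FLOPS, tends to dictate both cost and energy per token.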
Atlas is designed to make more efficient use of memory bandwidth than standard GPUs, with a focus on improving cost-effectiveness and reducing energy consumption. The platform is already in use at neocloud companies including Cloudflare and Parasail, where it powers large-scale inference workloads with no code modifications required.

Strategic Funding to Scale AI Access

Positron recently closed a $51.6 million Series A funding round led by Valor Equity Partners, Atreides Management, and DFJ Growth, with participation from 1517 Fund, Flume Ventures, and Unless. The round brings the company's total funding to $75 million and reflects growing investor confidence in inference-optimized infrastructure as the next major wave in AI.

This funding will support the continued deployment of Positron's first-generation product and accelerate the rollout of Titan, the company's upcoming ASIC-based inference platform slated for release in 2026. Titan is being developed to support some of the largest and most complex AI models in use today, with a planned capacity of up to 16 trillion parameters and two terabytes of high-speed memory per chip. Its modular, high-throughput design is intended to address the growing demand for scalable, sovereign AI infrastructure in both enterprise and government deployments. Unlike traditional solutions, Positron's hardware supports existing model binaries and APIs, minimizing the need for code rewrites or major system changes.

Enabling Innovation in Emerging Markets

As AI deployment expands beyond Big Tech and into broader enterprise and public-sector applications, the economic burden of inference is becoming increasingly visible.
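To put figures like "16 trillion parameters" in perspective, a quick calculation shows how much memory the weights alone would occupy at common numeric precisions. This is illustrative arithmetic only, not a description of Titan's actual design, and it suggests why multi-chip sharding and low-precision formats matter at this scale:

```python
# Weight-memory footprint of very large models at different precisions.
# Illustrative arithmetic only; not a statement of any vendor's design.

def weight_footprint_tb(params_trillion: float, bytes_per_param: float) -> float:
    """Terabytes needed just to hold the model weights."""
    return params_trillion * 1e12 * bytes_per_param / 1e12

for precision, nbytes in [("FP16", 2), ("FP8", 1), ("INT4", 0.5)]:
    tb = weight_footprint_tb(16, nbytes)
    print(f"16T parameters at {precision}: {tb:.1f} TB of weights")
```

Even at 4-bit precision, a 16-trillion-parameter model would need roughly 8 TB for its weights, so a platform offering two terabytes per chip would presumably spread such a model across several chips.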
AI-driven tools are now reaching users across every geography and industry, but their accessibility still hinges on the cost and efficiency of the underlying compute infrastructure. Making inference more affordable and energy-efficient extends AI beyond the walls of Big Tech and elite research labs, opening the door for a broader range of users to fine-tune open-source models and run them locally. This unlocks entirely new categories of use cases, from healthcare diagnostics in rural clinics to real-time translation in schools: applications that become viable in power-constrained environments only with energy-efficient inference.

This transformation is especially relevant in the Middle East, where governments are prioritizing both AI leadership and sustainability. By decoupling inference performance from power-hungry GPUs, Positron offers a path forward for building AI infrastructure that is both economically and environmentally aligned with the region's strategic goals.

The Future of AI Infrastructure

Looking ahead, the next phase of AI evolution will be determined not by who can train the largest model, but by who can deploy it most efficiently. Inference is already the dominant workload, and its impact on cost, access, and infrastructure design is only accelerating. As AI becomes a foundational layer across sectors, the ability to serve models at scale, securely, affordably, and sustainably, will shape the region's global economic posture.

Positron's inference-first architecture presents a credible alternative to GPU-dominated infrastructure. By designing systems tailored to how AI is actually used in production, the company is unlocking new opportunities for cloud providers, national governments, and enterprise platforms alike.
In the Middle East, a region increasingly focused on energy efficiency, technological development, and AI infrastructure, this shift in architecture could power the next chapter of AI growth in digitally ambitious, resource-aware economies.