AI Inference Bottleneck: How Gimlet Labs Just Changed Everything
If you have been following the AI industry, you already know the dirty secret nobody wants to talk about: most of the expensive, power-hungry hardware running AI workloads sits idle between 70 and 85 percent of the time. That is hundreds of billions of dollars burning a hole in the data center floor. A new startup called Gimlet Labs just raised $80 million to fix that, and their approach is turning heads across Silicon Valley.
The Problem Nobody Was Solving Fast Enough
AI adoption has exploded. Enterprises are deploying agents, running inference at scale, and chaining together multi-step workflows that touch dozens of tools in a single session. The hardware industry scrambled to keep up, producing a sprawling, diverse ecosystem of chips — traditional CPUs, AI-optimized GPUs, high-memory systems, and specialized silicon from a growing list of manufacturers.
The catch? None of that hardware was designed to work together. Each chip does something brilliantly, but no single chip does everything well. Prompt prefill is compute-bound. Token decoding is memory-bandwidth-bound. Tool calls are network-bound. The result is a fragmented, inefficient fleet of hardware that most software cannot fully exploit.
That fragmentation is costing the industry dearly. Current estimates put data center spending on a trajectory toward nearly $7 trillion by 2030. Yet the utilization rate of deployed hardware hovers somewhere between 15 and 30 percent. The math is brutal.
Meet the Team That Saw the Gap
Zain Asgar is a Stanford adjunct professor and a founder who has been through the acquisition cycle before. He previously co-built Pixie, an open-source observability tool for Kubernetes, alongside Michelle Nguyen, Omid Azizi, and Natalie Serrino. Pixie was acquired by New Relic in 2020, just two months after launching with a Series A led by Benchmark. The technology was later contributed to the Cloud Native Computing Foundation, the open-source organization that oversees Kubernetes itself.
When that chapter closed, the same team began asking a different question. Not how to observe what hardware is doing, but how to get every piece of hardware working at full capacity at the same time. That question became Gimlet Labs.
What a Multi-Silicon Inference Cloud Actually Means
Gimlet Labs calls its product the first and only multi-silicon inference cloud. That phrase sounds technical, but the concept is elegantly practical. The software acts as an orchestration layer that slices up an AI workload and distributes each slice to the hardware best suited to handle it simultaneously.
Think of it this way. When a large language model agent receives a query, it does not just generate text in one step. It performs retrieval, reasoning, decoding, and tool calls in sequence or in parallel. Each of those tasks has different hardware preferences. Gimlet's software maps those preferences in real time and routes work accordingly, across CPUs, GPUs, high-memory systems, and purpose-built AI chips — all at once.
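Gimlet has not published its scheduler, but the core idea of phase-aware routing can be sketched in a few lines. Everything below is illustrative: the phase names, pool labels, and the `route` function are assumptions for the sake of the example, not Gimlet's actual API.

```python
from enum import Enum

class Phase(Enum):
    """Stages of an agent request, each bound by a different resource."""
    RETRIEVAL = "retrieval"    # memory-bound: embedding lookup, vector search
    PREFILL = "prefill"        # compute-bound: processing the prompt
    DECODE = "decode"          # memory-bandwidth-bound: token generation
    TOOL_CALL = "tool_call"    # network-bound: calls to external services

# Hypothetical hardware pools, keyed by the phase each is best suited for.
HARDWARE_POOLS = {
    Phase.RETRIEVAL: "high-memory-cpu",
    Phase.PREFILL: "gpu",
    Phase.DECODE: "inference-asic",
    Phase.TOOL_CALL: "cpu",
}

def route(phase: Phase) -> str:
    """Map a workload phase to the pool best suited to run it."""
    return HARDWARE_POOLS[phase]

if __name__ == "__main__":
    for phase in Phase:
        print(f"{phase.value:>10} -> {route(phase)}")
```

A real orchestrator would make this decision dynamically, factoring in live queue depths, memory pressure, and interconnect bandwidth rather than a static table, but the mapping from phase characteristics to silicon is the heart of the pitch.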
The company goes even further. It claims to slice the underlying model itself, running different portions of the same model on different chip architectures simultaneously. That is a significant technical claim, and the company says the numbers back it up: Gimlet reports reliably achieving 3x to 10x inference speed improvements at the same cost and power consumption.
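Slicing one model across several chip types resembles pipeline parallelism: assign contiguous ranges of the model's layers to different devices, sized to each device's throughput. The sketch below shows the partitioning arithmetic only; device names and shares are made-up assumptions, and Gimlet's actual method is not public.

```python
def partition_layers(num_layers: int, device_shares: dict[str, float]) -> dict[str, tuple[int, int]]:
    """Split a model's layers into contiguous half-open ranges, one per
    device, proportional to each device's relative throughput share."""
    total = sum(device_shares.values())
    ranges: dict[str, tuple[int, int]] = {}
    devices = list(device_shares)
    start = 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            end = num_layers  # last device absorbs rounding remainder
        else:
            end = start + round(num_layers * device_shares[dev] / total)
        ranges[dev] = (start, end)
        start = end
    return ranges

if __name__ == "__main__":
    # A 32-layer model split 3:1 between a hypothetical GPU and inference ASIC.
    print(partition_layers(32, {"gpu": 3.0, "asic": 1.0}))
```

In practice the hard part is not the arithmetic but moving activations between heterogeneous devices fast enough that the split pays for itself, which is presumably where the engineering effort lives.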
The Hardware Partners Are Already On Board
Gimlet Labs has already secured partnerships with NVIDIA, AMD, Intel, Arm, Cerebras, and d-Matrix. That roster covers the dominant players in general-purpose compute, the leading AI-specialized silicon manufacturers, and the emerging wave of inference-focused chip startups. The breadth of those partnerships signals that this is not a pitch-deck promise — these are production integrations.
As new chips roll out and older GPUs get redeployed into second-tier infrastructure, the multi-silicon fleet grows more complex and harder to manage. Gimlet's pitch is that the hardware puzzle is already being assembled. The missing piece was always the software to make it coherent.
$80 Million and Eight-Figure Revenue Out of the Gate
The Series A round was led by Menlo Ventures, with participation from Eclipse Ventures, Prosperity7, Triatomic, and Factory, which led the prior seed round. Angel investors include names from the highest levels of the technology industry — Sequoia's Bill Coughran, Stanford Professor Nick McKeown, former VMware CEO Raghu Raghuram, and Intel CEO Lip-Bu Tan. With the seed included, total funding now stands at $92 million.
What makes the funding story more compelling than most is the revenue behind it. Gimlet publicly launched in October and announced eight-figure revenues immediately out of the gate — meaning at least $10 million. In the four months since launch, the customer base has more than doubled. Current clients reportedly include a major AI model lab and an extremely large cloud computing company, neither of which Asgar is naming yet.
Asgar described the fundraising process as moving quickly once momentum built. After running into lead investor Tim Tully of Menlo Ventures by chance, and after receiving angel interest from Stanford professors, term sheets started arriving. When word spread that Asgar was reviewing offers, the round became oversubscribed fast.
Why This Matters Beyond the Funding Headline
The story of Gimlet Labs is not just a fundraising announcement. It is a signal about where enterprise AI infrastructure is heading in 2026.
The era of solving AI performance problems by simply buying more GPUs is running into economic and physical limits. Power costs are rising. Data center capacity is constrained. The pressure to justify AI spending with real efficiency gains is intensifying from boards and CFOs who have been patient but are now asking hard questions about return on investment.
Gimlet's approach reframes the conversation entirely. Instead of asking how much new hardware to buy, it asks how much more value can be extracted from the hardware already deployed. That is a fundamentally different and more sustainable model, and it resonates with enterprise buyers who are tired of being told the answer to every AI problem is more spend.
What Comes Next for AI Infrastructure
The 30-person team at Gimlet Labs is positioned at an interesting intersection: not a chip company, not a cloud provider, but a software layer that makes both work harder. That positioning gives it optionality. As the AI hardware ecosystem continues to fragment and diversify, the value of an abstraction layer that unifies it only increases.
Asgar's stated goal is to make AI workloads 10x more efficient than they are today. Given that current utilization rates are so low, the headroom to achieve that goal already exists in deployed infrastructure. The question is whether the software can consistently deliver on the promise at the scale of the world's largest model labs and cloud providers.
Early signals suggest it can. The customer list is growing. The revenue is real. The hardware partnerships are in place. And the problem being solved — wasted AI compute at industrial scale — is not going away anytime soon.
For an industry spending trillions to run AI at scale, the most valuable company might turn out to be the one that finally makes that scale worth what it costs.