CompactifAI by Multiverse Computing Could Slash AI Costs by 80%
AI costs are skyrocketing, especially for companies running large language models (LLMs). But a Spanish startup, Multiverse Computing, may have just changed the game with a breakthrough compression technology called CompactifAI. This innovation promises to cut model size by up to 95%—without sacrificing performance. In other words, AI models could soon be faster, cheaper, and even portable enough to run on devices like phones and Raspberry Pi. In this post, we’ll explore what CompactifAI is, how it works, and why it’s poised to redefine AI infrastructure. If you're wondering how to reduce LLM inference costs or deploy powerful AI models on edge devices, this might be your answer.
Image Credits: Vithun Khamsong / Getty Images

What is CompactifAI and Why It Matters
At its core, CompactifAI is a quantum-inspired compression technology developed by Multiverse Computing. Unlike traditional optimization techniques such as quantization or pruning, it is built on tensor networks, a mathematical framework from quantum physics that makes it possible to represent very large, highly structured objects compactly on classical hardware. CompactifAI compresses open-source LLMs like Llama 4 Scout, Llama 3.3 70B, and Mistral Small 3.1, shrinking them by as much as 95% while preserving accuracy. That is revolutionary at a time when training and running large models like GPT-4 can cost millions. By reducing model size, organizations can cut inference costs by up to 80% and deploy AI in ways that were previously impossible.
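Multiverse has not published the details of its algorithm, but the general idea behind tensor-network compression can be illustrated with a much simpler relative: factoring a large weight matrix into smaller pieces. The sketch below is a conceptual illustration only, assuming NumPy, a made-up layer size, and a plain truncated SVD rather than CompactifAI’s actual method; it shows how factorization shrinks a layer’s parameter count.

```python
# Conceptual sketch only: CompactifAI's real algorithm is not public.
# A large weight matrix is replaced by two thin factors, cutting the
# parameter count; real systems factor trained weights and then
# fine-tune ("heal") the model to recover accuracy.
import numpy as np

def compress_layer(weight: np.ndarray, rank: int):
    """Factor a (d_out, d_in) matrix into (d_out, rank) and (rank, d_in) pieces via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # (d_out, rank), columns scaled by singular values
    b = vt[:rank, :]             # (rank, d_in)
    return a, b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((2048, 2048))   # stand-in for one transformer weight matrix
    a, b = compress_layer(w, rank=128)

    original = w.size
    compressed = a.size + b.size
    print(f"original params:   {original:,}")
    print(f"compressed params: {compressed:,}")
    print(f"size reduction:    {1 - compressed / original:.1%}")
```

Tensor networks generalize this idea from a single two-way factorization to chains and grids of small tensors, which is what allows much deeper compression with limited accuracy loss.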
How CompactifAI Lowers AI Costs and Boosts Speed
Multiverse’s compressed “Slim” models aren’t just smaller: according to the company, they are up to 12x faster at inference and 4x more efficient than their uncompressed counterparts. For example, the Llama 4 Scout Slim model costs just $0.10 per million tokens on AWS, compared to $0.14 for the standard version. A four-cent gap per million tokens sounds trivial per request, but it compounds across the billions of tokens a production service processes each month. The models are available through Amazon Web Services or can be licensed for on-premise deployment, which makes them well suited to enterprises, startups, and developers looking to optimize LLM usage without compromising output quality.
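To make that concrete, here is a quick back-of-the-envelope calculation using the per-million-token prices quoted above. The monthly token volume is purely an assumption chosen for illustration, not a figure from Multiverse or AWS.

```python
# Back-of-the-envelope comparison of the two per-token prices quoted in the article.
# The token volume is an assumed workload, not a published figure.
PRICE_STANDARD = 0.14  # USD per million tokens, Llama 4 Scout
PRICE_SLIM = 0.10      # USD per million tokens, Llama 4 Scout Slim

def monthly_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Token cost for one month at a flat per-million-token price."""
    return tokens_per_month / 1_000_000 * price_per_million

tokens_per_month = 100_000_000_000  # assumed: 100B tokens/month for a busy production service
standard = monthly_cost(tokens_per_month, PRICE_STANDARD)
slim = monthly_cost(tokens_per_month, PRICE_SLIM)
print(f"standard: ${standard:,.0f}/mo  slim: ${slim:,.0f}/mo  "
      f"saved: ${standard - slim:,.0f}/mo ({1 - slim / standard:.0%})")
```

At that assumed volume the per-token gap alone is worth a few thousand dollars a month; the larger operational savings come from the reduced compute and energy needed to serve the same traffic with a much smaller model.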
Compact AI for the Edge: Phones, PCs, and Raspberry Pi
Here’s where things get really exciting: some of Multiverse’s compact AI models are so efficient they can run on phones, laptops, vehicles, drones, and even Raspberry Pi devices. That opens doors for on-device AI—real-time interactions, low latency, privacy, and offline functionality. Imagine smart home devices with ChatGPT-level capabilities or AI copilots in cars that don’t rely on the cloud. Multiverse is currently expanding its supported models to include DeepSeek R1 and other advanced reasoning models. While proprietary models like GPT-4 or Claude are not yet supported, the ability to deploy powerful, open-source LLMs at the edge is a massive leap forward.
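The article doesn’t say how the Slim models are packaged for local use, but on-device inference with today’s open-source tooling gives a sense of what this looks like. The sketch below assumes a quantized GGUF checkpoint and the llama-cpp-python library; the model file name is hypothetical and is not a Multiverse release.

```python
# Illustrative only: local inference with a small quantized open-source model
# on a low-power device such as a Raspberry Pi. The checkpoint name below is
# hypothetical; Multiverse's Slim models ship via AWS or on-premise licensing.
from llama_cpp import Llama

llm = Llama(
    model_path="slim-model-q4.gguf",  # hypothetical local checkpoint
    n_ctx=2048,                       # modest context window to fit in a few GB of RAM
    n_threads=4,                      # e.g. the four cores of a Raspberry Pi 5
)

result = llm(
    "Q: What is the capital of Spain?\nA:",
    max_tokens=32,
    stop=["\n", "Q:"],
)
print(result["choices"][0]["text"].strip())
```

The quantized weights and the small context window are what make this workable in a few gigabytes of RAM, which is exactly the regime aggressive compression is meant to unlock.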
The Minds Behind the Magic: Science Meets Startups
Multiverse Computing was co-founded by Román Orús, a professor at the Donostia International Physics Center who is internationally recognized for his work on tensor networks, the technique previously used mostly in quantum physics and now at the heart of CompactifAI’s compression. His team blends deep scientific expertise with real-world AI applications, an ideal combination for a startup tackling such a complex problem. Their approach doesn’t just offer incremental improvement; it rethinks how and where AI models can run. With its €189 million Series B funding, the company is set to scale CompactifAI globally and help businesses build faster, greener, and more cost-efficient AI systems.