DeepSeek Unveils Sparse Attention Model That Halves API Costs

DeepSeek releases ‘sparse attention’ model that cuts API costs in half, making long-context AI more affordable and efficient.
Matilda
DeepSeek Releases ‘Sparse Attention’ Model That Cuts API Costs in Half

DeepSeek is shaking up the AI landscape with a breakthrough release. On September 29, 2025, the company announced DeepSeek V3.2-exp, an experimental model designed to lower inference costs. The highlight: its new sparse attention mechanism cuts API costs in half, a move that could make long-context AI more affordable for developers worldwide.

Image Credits: VCG / Getty Images

What Makes Sparse Attention Different?

At the core of V3.2-exp is DeepSeek Sparse Attention. Instead of processing every token equally, the model uses a “lightning indexer” to prioritize key sections of text. A “fine-grained token selection system” then drills deeper, pulling only the most relevant tokens into the limited attention window (a rough sketch of the idea follows below). This two-step process lets the model handle long-context inputs with far less server strain. In short, it is about smarter token prioritization, not brute-force computing power.

API Cost…
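To make the two-step process concrete, here is a minimal, hypothetical sketch in Python/NumPy of how an indexer-then-select attention pass could work. It is not DeepSeek's actual implementation: the plain dot-product scoring, the fixed top-k cutoff, and every name in the snippet are illustrative assumptions. The point is only that a cheap scoring pass shrinks the attention window before the expensive softmax attention runs.

```python
import numpy as np

def sparse_attention_sketch(query, keys, values, k=64):
    """Illustrative two-stage sparse attention (not DeepSeek's code):
    1) a cheap 'indexer' scores every key against the query,
    2) only the top-k tokens enter the softmax attention window."""
    d = query.shape[-1]

    # Stage 1: lightweight indexer. A plain dot product stands in for
    # whatever learned scoring module the "lightning indexer" actually uses.
    index_scores = keys @ query                      # shape: (seq_len,)

    # Stage 2: fine-grained selection. Keep only the k highest-scoring tokens.
    top = np.argsort(index_scores)[-k:]              # indices of top-k tokens
    k_sel, v_sel = keys[top], values[top]

    # Standard scaled dot-product attention, but over the reduced window.
    logits = (k_sel @ query) / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ v_sel                           # shape: (d_model,)

# Toy usage: a 4,096-token context reduced to a 64-token attention window.
rng = np.random.default_rng(0)
seq_len, d_model = 4096, 128
q = rng.standard_normal(d_model)
K = rng.standard_normal((seq_len, d_model))
V = rng.standard_normal((seq_len, d_model))
out = sparse_attention_sketch(q, K, V, k=64)
print(out.shape)  # (128,)
```

Under these assumptions, the expensive softmax attention runs over k tokens instead of the full sequence; only the cheap scoring pass still touches every token, which is where the serving-cost savings on long contexts would come from.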