
DeepSeek has made permanent a 75% price cut on its flagship V4 Pro model, in a move that directly targets the cost structures behind today’s largest AI systems. The company’s new pricing tiers put its models far below comparable offerings from major Western labs that are widely used in enterprise production.
According to the company’s published pricing comparisons, DeepSeek V4 Pro now comes in at around seven times cheaper on input tokens and 17 times cheaper on output tokens than models such as Anthropic’s Claude Sonnet and OpenAI’s GPT 5.5-Med. For organisations running large-scale workloads, that gap translates into significantly lower operating costs for similar classes of capability.
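How much a given organisation actually saves depends on its mix of input and output tokens. The sketch below works through that arithmetic using only the two ratios quoted above (roughly 7x on input, 17x on output). The function name `blended_cost_ratio` and the spend shares are hypothetical and purely illustrative; they are not figures published by DeepSeek.

```python
def blended_cost_ratio(input_share_of_spend: float,
                       input_ratio: float = 7.0,
                       output_ratio: float = 17.0) -> float:
    """Estimate how many times cheaper a total bill becomes.

    input_share_of_spend: fraction of the *current* bill that goes to
        input tokens (0.0 to 1.0); the remainder is assumed to be output.
    input_ratio / output_ratio: the "times cheaper" figures quoted in
        the article (about 7x on input, 17x on output).
    """
    if not 0.0 <= input_share_of_spend <= 1.0:
        raise ValueError("input_share_of_spend must be between 0 and 1")

    output_share = 1.0 - input_share_of_spend
    # New bill as a fraction of the old one: each part shrinks by its own ratio.
    new_bill_fraction = (input_share_of_spend / input_ratio
                         + output_share / output_ratio)
    return 1.0 / new_bill_fraction


if __name__ == "__main__":
    # Hypothetical spend mixes, for illustration only.
    for share in (0.25, 0.5, 0.75):
        ratio = blended_cost_ratio(share)
        print(f"{share:.0%} of spend on input -> about {ratio:.1f}x cheaper overall")
```

Under these assumptions the overall saving always falls between 7x and 17x, and output-heavy workloads such as long-form generation sit closer to the upper end.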
DeepSeek is also pushing aggressively at the lower end of the stack. Its V4 Flash model, a lighter, speed-optimised variant, is priced to undercut entry-tier options like Claude Haiku by roughly 10x to 25x. That positions V4 Flash as a budget-conscious choice for use cases that prioritise throughput and latency while still drawing on the same overall model family.
The pricing shifts are not positioned as a temporary promotion but as the result of architectural changes. DeepSeek attributes the cuts to a set of hardware–software optimisations, particularly around caching, that make its models more efficient to run at scale. While the company has not detailed every element of the stack, it links the lower per-token prices directly to these efficiency gains.
The cost differential is especially stark when DeepSeek’s models are hosted natively in China. In that configuration, the company’s cache-read pricing is described as being 87 times cheaper than Western cloud offerings. That level of discount effectively sets a new price floor for cached inference in those regions, with implications for anyone running long-context or cache-heavy workloads.
The ripple effects are already visible among Chinese hardware and platform providers. Handset maker Xiaomi has moved to match DeepSeek’s cache-read pricing tier for its newly deployed MiMo architecture, mirroring the same level rather than trying to undercut it further. That indicates at least one major player sees DeepSeek’s pricing as a new reference point for AI infrastructure in its home market.
DeepSeek is pairing its pricing story with benchmark data aimed at showing that V4 Pro is not just cheaper, but competitive on quality. The company’s model card for DeepSeek V4 Pro highlights external evaluations placing it close to Western frontier systems on several technical measures.
On coding-agent tasks, DeepSeek V4 Pro records a score of 80.6% on the SWE-bench Verified leaderboard, a benchmark that tracks performance on software engineering-related challenges. That result is presented as putting the model almost on par with top-tier Western systems that target similar workloads in enterprise development and automation.
For broader reasoning and technical understanding, DeepSeek cites an 87.5 score on the advanced MMLU-Pro technical index, a demanding benchmark used to assess higher-level reasoning across specialised domains. That figure places V4 Pro in what DeepSeek describes as the “elite” range on that test, reinforcing the argument that its pricing does not come at the expense of capability.
Both V4 Pro and V4 Flash belong to the same model family, with V4 Pro aimed at more demanding tasks and V4 Flash tuned for speed. DeepSeek characterises V4 Flash as a hyper-optimised, fast variant intended for deployments where responsiveness and cost-per-call are critical.
The combination of aggressive token pricing, cache-read discounts in China, and benchmarked performance near Western frontier models positions DeepSeek as a cost-focused challenger in the global AI ecosystem. How far that pressure reshapes pricing and infrastructure strategies elsewhere remains to be seen, but the new floor it has set, particularly around cached inference, is now public and explicit.







