DeepSeek's new chatbot boasts an impressive introduction: "Hi, I was created so you can ask anything and get an answer that might even surprise you." This AI, a product of the Chinese startup DeepSeek, has quickly become a major player, even contributing to a significant drop in NVIDIA's stock price. Its success stems from a unique combination of architectural innovation and training methodologies.
DeepSeek's model distinguishes itself through several key technological advancements:
- Multi-token Prediction (MTP): Rather than predicting one token at a time, MTP trains the model to predict several upcoming tokens at once, boosting both accuracy and efficiency.
- Mixture of Experts (MoE): Instead of passing every input through one large network, this architecture routes each token to a small subset of specialized sub-networks ("experts"), accelerating training and improving overall performance. DeepSeek V3 uses 256 experts and activates eight of them for each token.
- Multi-head Latent Attention (MLA): This attention mechanism compresses the keys and values it stores into a compact latent representation, cutting memory use while aiming to minimize information loss and preserve a nuanced understanding of the input.
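To make the Mixture of Experts idea concrete, here is a minimal sketch of top-k routing using the 256-expert, eight-active figures cited above. It is an illustration only, not DeepSeek's implementation: the function names (`route_token`, `moe_forward`), the softmax weighting over the selected experts, and the toy "experts" that merely scale their input are all assumptions made for the example.

```python
import math
import random

NUM_EXPERTS = 256  # expert count cited in the article for DeepSeek V3
TOP_K = 8          # experts activated per token, per the article


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Assumption: weights are a softmax over the selected scores only.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    top = ranked[:top_k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, router_scores, experts, top_k=TOP_K):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in route_token(router_scores, top_k))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy experts: each one scales its input by a fixed factor.
    experts = [(lambda x, s=rng.uniform(0.5, 1.5): s * x)
               for _ in range(NUM_EXPERTS)]
    scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route_token(scores)
    print(len(chosen), "of", NUM_EXPERTS, "experts active for this token")
    print("layer output:", moe_forward(1.0, scores, experts))
```

The point of the design is that only eight of the 256 expert functions are evaluated for a given token, so the compute per token stays small even though the total parameter count is large.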
DeepSeek initially claimed a remarkably low training cost of just $6 million for its powerful DeepSeek V3 model, using only 2,048 GPUs. However, SemiAnalysis reported a far more extensive infrastructure: approximately 50,000 NVIDIA Hopper GPUs, including 10,000 H800s, 10,000 H100s, and additional H20 units, spread across multiple data centers. This represents a total server investment of roughly $1.6 billion, with operational expenses estimated at $944 million.
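A quick back-of-the-envelope calculation shows how far apart the two pictures are. The snippet below uses only the figures quoted above. The variable names are made up for the example, and the per-GPU figure is simply the server investment divided by the GPU count, so it also absorbs non-GPU server costs.

```python
# Figures as quoted in the article (DeepSeek's claim vs. the SemiAnalysis estimate).
claimed_gpus = 2_048
estimated_gpus = 50_000
server_investment_usd = 1.6e9

# Share of the estimated fleet that the claimed training run would represent.
claimed_share = claimed_gpus / estimated_gpus

# Average server spend per GPU (includes non-GPU hardware, so an upper bound on GPU price).
implied_spend_per_gpu = server_investment_usd / estimated_gpus

print(f"Claimed GPUs as share of estimated fleet: {claimed_share:.1%}")  # 4.1%
print(f"Implied server spend per GPU: ${implied_spend_per_gpu:,.0f}")    # $32,000
```

In other words, the 2,048 GPUs in the original claim would be roughly 4% of the hardware the company is estimated to operate.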
A subsidiary of the Chinese hedge fund High-Flyer, DeepSeek owns its data centers, which gives it tight control over model optimization and lets it roll out innovations faster. Being self-funded also makes the company more agile in its decision-making. It attracts top talent as well: DeepSeek recruits primarily from Chinese universities, and some of its researchers earn over $1.3 million annually.
DeepSeek's initial $6 million training cost claim appears misleading: it covers only pre-training and excludes research, refinement, data processing, and infrastructure. In total, the company has invested over $500 million in AI development. Even so, its lean structure allows it to innovate more efficiently than larger, more bureaucratic corporations.
DeepSeek's story highlights the potential of a well-funded, independent AI company to compete with industry giants. However, its success is undeniably linked to substantial investment, technological breakthroughs, and a strong team. While claims of revolutionary budget efficiency are arguably exaggerated, the company's costs remain significantly lower than those of its competitors. For example, DeepSeek spent $5 million on R1, while ChatGPT4 cost $100 million.