AI experts say that DeepSeek's emergence has upended a key dogma underpinning the industry's approach to growth — showing that bigger isn't always better.
nvn news
Mon Feb 03 2025
Chinese startup DeepSeek is shaking up the AI industry with its latest models, claiming they match or outperform top U.S. models at a fraction of the cost. The company trained its DeepSeek-V3 model with under $6 million in computing power, far less than what U.S. firms spend. Its AI assistant has surpassed ChatGPT as the top-rated free app on Apple’s U.S. App Store.
The efficiency of DeepSeek-V3 and DeepSeek-R1 has drawn praise from Silicon Valley but also skepticism. Critics question its low training costs and speculate that the company may secretly own 50,000 Nvidia H100 chips, violating U.S. export controls—an allegation DeepSeek has not addressed.
DeepSeek is based in Hangzhou and controlled by Liang Wenfeng, co-founder of hedge fund High-Flyer. In 2023, the fund shifted its focus from trading to AI research, leading to DeepSeek’s creation. High-Flyer is believed to have invested in DeepSeek and owns patents related to AI training chips.
DeepSeek's rise has caught Beijing’s attention. On January 20, Chinese Premier Li Qiang invited Liang to a high-level meeting, signaling DeepSeek’s role in China’s push for AI self-sufficiency amid U.S. chip restrictions.
What AI Experts Say
"The fact that DeepSeek could be built for less money, less computation and less time and can be run locally on less expensive machines, argues that as everyone was racing towards bigger and bigger, we missed the opportunity to build smarter and smaller," Kristian Hammond, a professor of computer science at Northwestern University, said recently.
"In some ways, DeepSeek's advances are more evolutionary than revolutionary," Ambuj Tiwari , a professor of statistics and computer science at the University of Michigan, said . "They are still operating under the dominant paradigm of very large models (100s of billions of parameters) on very large datasets (trillions of tokens) with very large budgets."
"This allows for faster training with fewer computational resources," Thomas Cao, a professor of technology policy at Tufts University, said. "DeepSeek has also refined nearly every step of its training pipeline — data loading, parallelization strategies, and memory optimization — so that it achieves very high efficiency in practice."
Similarly, while it is common to train AI models using human-provided labels to score the accuracy of answers and reasoning, R1's reasoning is unsupervised. It uses only the correctness of final answers in tasks like math and coding for its reward signal, which frees up training resources to be used elsewhere.
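The outcome-only reward idea can be illustrated with a minimal sketch. The function names, the prompt format, and the exact-match rule below are assumptions for illustration, not DeepSeek's actual implementation:

```python
# Sketch of an outcome-based reward signal: rather than scoring every
# reasoning step with human-provided labels, the model is rewarded only
# on whether its FINAL answer is correct, as in verifiable tasks like
# math. All names and formats here are illustrative assumptions.

def extract_final_answer(completion: str) -> str:
    """Assume the model is prompted to end with a line 'Answer: <value>'."""
    for line in reversed(completion.strip().splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return ""

def outcome_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the final answer matches the reference, else 0.0.
    Note: intermediate reasoning is never scored -- no step-level labels."""
    return 1.0 if extract_final_answer(completion) == reference else 0.0

completion = "Let x = 3 + 4.\nSo x is seven.\nAnswer: 7"
print(outcome_reward(completion, "7"))  # 1.0 -- final answer is correct
print(outcome_reward(completion, "8"))  # 0.0 -- final answer is wrong
```

Because only the end result is checked, no human annotation of the intermediate reasoning is needed, which is where the training-resource savings the article describes come from.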