DeepSeek Explodes on the Scene


The release of the new Chinese chatbot DeepSeek-R1, which uses lower-powered processors, sent shockwaves throughout the industry. Some have called it a Sputnik moment for AI.

Chinese startup DeepSeek saw DeepSeek-R1, the artificial intelligence chatbot it announced last week, jump to the top of the Apple App Store downloads on Monday. The stocks of major chip players, including NVIDIA, Arm, Broadcom, and more, were hit. (NVIDIA’s stock dropped more than 13%.) Additionally, the Nasdaq stock market fell by more than 3% on Monday, with the drop at one point wiping more than $1 trillion off the index of technology stocks, according to industry reports.

What’s behind such a strong reaction? According to Technology Review, “DeepSeek aimed for accurate answers rather than detailing every logical step, significantly reducing computing time while maintaining a high level of effectiveness.” As such, the company’s large language model (LLM) touts powerful performance at a fraction of competitors’ steep training costs. Perhaps more importantly, the open-source AI assistant accomplishes its results using less advanced (and lower cost) chips than rival LLMs.

That latter point has significant implications in a number of ways. First, it means rather than requiring vast arrays of high-end GPUs, DeepSeek can run on more modest processors, potentially opening up AI to a wide range of organizations that may have been locked out in the past due to costs.

A second implication relates to the ongoing U.S./China trade wars. The U.S. has been trying to limit China’s access to advanced technology for AI. Earlier this month, before President Biden left office, his administration introduced export controls intended to limit China’s access to powerful GPUs, which underpin advanced AI projects. It appears DeepSeek works with processors that are still readily available.

However, NVIDIA said in a statement on Monday, “DeepSeek’s work illustrates how new models can be created using [test time scaling], leveraging widely-available models and compute that is fully export control compliant.” The company stressed that “inference still requires significant numbers of NVIDIA GPUs and high-performance networking.”

See also: 2025 Predictions: Year of the Commoditization of Large Language Models (LLMs)

What makes DeepSeek different?

Scientific American reported that DeepSeek “reportedly had a stockpile of high-performance NVIDIA A100 chips from times prior to the U.S. ban. So, its engineers could have used those to develop the model. But in a key breakthrough, the startup says it instead used much lower-powered NVIDIA H800 chips to train the new model, dubbed DeepSeek-R1.” Additionally, the same article noted that because the solution requires less computational power, the cost of running DeepSeek-R1 is a tenth of the cost of similar competitors.

Other industry reports cite different numbers on cost savings: “DeepSeek claims its V3 large language model cost just $5.6 million to train, a fraction of ChatGPT’s reported training costs of more than $100 million. With comparable performance to OpenAI’s o1 model, a 95% cost cut may be especially attractive to cash-strapped companies looking to leverage generative AI (GenAI).”
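The reported figures are consistent with the quoted savings: $5.6 million against a $100 million baseline works out to roughly a 94% reduction, close to the ~95% cited. A quick sketch of the arithmetic, using the article's reported numbers:

```python
# Reported training costs from the article (both figures are estimates).
deepseek_v3_cost = 5.6e6   # DeepSeek V3: ~$5.6 million
chatgpt_cost = 100e6       # ChatGPT: reportedly more than $100 million

# Fractional cost reduction relative to the larger training budget.
savings = 1 - deepseek_v3_cost / chatgpt_cost
print(f"Cost reduction: {savings:.1%}")  # prints "Cost reduction: 94.4%"
```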

A further distinction is that the company has made the code behind the product open source. It is available on GitHub.

The model differs from others, such as o1, in how it reinforces learning during training. “While many LLMs have an external ‘critic’ model that runs alongside them, correcting errors and nudging the LLM toward verified answers, DeepSeek-R1 uses a set of rules internal to the model to teach it which of the possible answers it generates is best,” the Scientific American article noted.
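The distinction can be sketched in miniature: instead of a second neural network scoring each candidate answer, fixed rules do the ranking. This is a toy illustration only, not DeepSeek-R1's actual training code, and the two scoring rules below are hypothetical stand-ins for whatever checks the real system applies:

```python
def rule_based_score(answer: str, reference: str) -> float:
    """Score a candidate answer with fixed rules instead of a critic model.

    Both rules are hypothetical examples: one rewards matching a verifiable
    reference answer, the other rewards visible reasoning in the text.
    """
    score = 0.0
    if answer.strip() == reference:   # rule 1: answer matches a known-correct result
        score += 1.0
    if "because" in answer:           # rule 2: answer shows its reasoning
        score += 0.1
    return score


def pick_best(candidates: list[str], reference: str) -> str:
    """Choose the highest-scoring answer among those the model generated."""
    return max(candidates, key=lambda a: rule_based_score(a, reference))


candidates = ["4", "5", "4 because 2 + 2 = 4"]
print(pick_best(candidates, "4 because 2 + 2 = 4"))
# prints "4 because 2 + 2 = 4"
```

The point of the contrast is cost: rules like these run in constant time per answer, whereas an external critic is itself a model whose forward passes add to the training bill.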

Salvatore Salamone

About Salvatore Salamone

Salvatore Salamone is a physicist by training who has been writing about science and information technology for more than 30 years. During that time, he has been a senior or executive editor at many industry-leading publications including High Technology, Network World, Byte Magazine, Data Communications, LAN Times, InternetWeek, Bio-IT World, and Lightwave, The Journal of Fiber Optics. He also is the author of three business technology books.
