
Why DeepSeek could change what Silicon Valley believes about AI

The artificial intelligence breakthrough that is sending shock waves through stock markets, spooking Silicon Valley giants, and generating breathless takes about the end of America’s technological dominance arrived with an unassuming, wonky title: “Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” The 22-page paper, released last week by a scrappy Chinese AI startup called DeepSeek, didn’t immediately set off alarm bells. It took a few days for researchers to digest the paper’s claims, and the implications of what it described. The company had created a new AI model called DeepSeek-R1, built by a team of researchers who claimed to have used a modest number of second-rate AI chips to match the performance of leading American AI models at a fraction of the cost.

DeepSeek said it had done this by using clever engineering to substitute for raw computing horsepower. And it had done it in China, a country many experts thought was in a distant second place in the global AI race.

Some industry watchers initially reacted to DeepSeek’s breakthrough with disbelief. Surely, they thought, DeepSeek had cheated to achieve R1’s results, or fudged their numbers to make their model look more impressive than it was. Maybe the Chinese government was promoting propaganda to undermine the narrative of American AI dominance. Maybe DeepSeek was hiding a stash of illicit Nvidia H100 chips, banned under US export controls, and lying about it. Maybe R1 was actually just a clever re-skinning of American AI models that didn’t represent much in the way of real progress.

Eventually, as more people dug into the details of DeepSeek-R1 (which, unlike most leading AI models, was released as open-source software, allowing outsiders to examine its inner workings more closely), their skepticism morphed into worry.


And late last week, when lots of Americans started to use DeepSeek’s models for themselves, and the DeepSeek mobile app hit the number one spot on Apple’s App Store, it tipped into full-blown panic.



I’m skeptical of the most dramatic takes I’ve seen over the past few days — such as the claim, made by one Silicon Valley investor, that DeepSeek is an elaborate plot by the Chinese government to destroy the US tech industry. I also think it’s plausible that the company’s shoestring budget has been badly exaggerated, or that it piggybacked on advancements made by American AI firms in ways it hasn’t disclosed. But I do think that DeepSeek’s R1 breakthrough was real. Based on conversations I’ve had with industry insiders, and a week’s worth of experts poking around and testing the paper’s findings for themselves, it appears to be throwing into question several major assumptions the US tech industry has been making.

The first is the assumption that in order to build cutting-edge AI models, you need to spend huge amounts of money on powerful chips and data centers.

It’s hard to overstate how foundational this dogma has become. Companies like Microsoft, Meta and Google have already spent tens of billions of dollars building out the infrastructure they thought was needed to build and run next-generation AI models. They plan to spend tens of billions more — or, in the case of OpenAI, as much as $500 billion through a joint venture with Oracle and SoftBank that was announced last week.

DeepSeek appears to have spent a small fraction of that building R1. We don’t know the exact cost, and there are plenty of caveats to make about the figures they’ve released so far. It’s almost certainly higher than $5.5 million, the number the company claims it spent training a previous model.

But even if R1 cost 10 times more to train than DeepSeek claims, and even if you factor in other costs they may have excluded, like engineer salaries or the costs of doing basic research, it would still be orders of magnitude less than what American AI companies are spending to develop their most capable models.
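To make the "orders of magnitude" comparison concrete, here is a rough back-of-the-envelope sketch. It uses the $5.5 million figure the company cites for a previous model, the hypothetical tenfold markup, and the $500 billion joint venture figure. The $10 billion value is an assumption standing in for the low end of "tens of billions," and the comparison sets a training cost against infrastructure spending, so treat the output as illustrative only.

```python
import math

# Figures quoted in the article (USD).
CLAIMED_TRAINING_COST = 5.5e6   # what the company says it spent on a previous model
MARKUP = 10                     # the article's "10 times more" hypothetical
JOINT_VENTURE_SPEND = 500e9     # upper figure cited for the OpenAI joint venture

# Assumption for illustration: low end of "tens of billions" of dollars.
ASSUMED_LOW_US_SPEND = 10e9


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Return how many powers of ten separate two amounts."""
    return math.log10(larger / smaller)


# Even the inflated estimate is only $55 million.
adjusted_cost = CLAIMED_TRAINING_COST * MARKUP

gap_low = orders_of_magnitude(ASSUMED_LOW_US_SPEND, adjusted_cost)
gap_high = orders_of_magnitude(JOINT_VENTURE_SPEND, adjusted_cost)

print(f"Adjusted cost: ${adjusted_cost:,.0f}")
print(f"Gap vs. assumed low-end US spend: {gap_low:.1f} orders of magnitude")
print(f"Gap vs. joint venture figure:     {gap_high:.1f} orders of magnitude")
```

Under these assumptions the gap is roughly two to four orders of magnitude, which is why the conclusion holds even if the company's own number is far too low.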

The obvious conclusion to draw is not that US tech giants are wasting their money. It’s still expensive to run powerful AI models once they’re trained, and there are reasons to think that spending hundreds of billions of dollars will still make sense for companies like OpenAI and Google, which can afford to pay dearly to stay at the head of the pack.

But DeepSeek’s breakthrough on cost challenges the “bigger is better” narrative that has driven the AI arms race in recent years by showing that relatively small models, when trained properly, can match or exceed the performance of much bigger models.

That, in turn, means that AI companies may be able to achieve very powerful capabilities with far less investment than previously thought. And it suggests that we may soon see a flood of investment into smaller AI startups, and much more competition for the giants of Silicon Valley. (Which, because of the enormous costs of training their models, have mostly been competing with each other until now.)

There are other, more technical reasons that everyone in Silicon Valley is paying attention to DeepSeek. In the research paper, the company reveals some details about how R1 was actually built, which include some cutting-edge techniques in model distillation. (Basically, that means compressing big AI models down into smaller ones, making them cheaper to run without losing much in the way of performance.)
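For readers who want a feel for what distillation means in practice, here is a minimal sketch of the classic "soft label" formulation, in which a small student model is trained to imitate a large teacher model's output probabilities. This is a textbook illustration, not a description of DeepSeek's actual method, and the function names and the temperature value are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw model scores into probabilities; higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Zero when the student matches the teacher exactly; larger as they diverge.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate next words.
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]   # close to the teacher
poor_student = [0.5, 1.0, 4.0]   # ranks the words in the opposite order

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```

Training the student means nudging its scores to shrink this loss across many examples, so the smaller model inherits much of the larger one's behavior at a lower running cost.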

DeepSeek also included details that suggested that it had not been as hard as previously thought to convert a “vanilla” AI language model into a more sophisticated reasoning model, by applying a technique known as reinforcement learning on top of it. (Don’t worry if these terms go over your head — what matters is that methods for improving AI systems that were previously closely guarded by US tech companies are now out there on the web, free for anyone to take and replicate.)

Even if the stock prices of US tech giants recover in the coming days, the success of DeepSeek raises important questions about their long-term AI strategies. If a Chinese company is able to build cheap, open-source models that match the performance of expensive US models, why would anyone pay for ours? And if you’re Meta — the only US tech giant that releases its models as free open-source software — what prevents DeepSeek or another startup from simply taking your models, which you spent billions of dollars on, and distilling them into smaller, cheaper models that they can offer for pennies?

DeepSeek’s breakthrough also undercuts some of the geopolitical assumptions many American experts had been making about China’s position in the AI race.

First, it challenges the narrative that China is meaningfully behind the frontier when it comes to building powerful AI models. For years, many AI experts (and the policymakers who listen to them) have assumed that the United States had a lead of at least several years, and that copying the advancements made by US tech firms was prohibitively hard for Chinese companies to do quickly.

But DeepSeek’s results show that China has advanced AI capabilities that can match or exceed models from OpenAI and other American AI companies, and that breakthroughs made by US firms may be trivially easy for Chinese firms — or, at least, one Chinese firm — to replicate in a matter of weeks.

The results also raise questions about whether the steps the US government has been taking to limit the spread of powerful AI systems to our adversaries — namely, the export controls used to prevent powerful AI chips from falling into China’s hands — are working as designed, or whether those regulations need to adapt to take into account new, more efficient ways of training models.

And, of course, there are concerns about what it would mean for privacy and censorship if China took the lead in building powerful AI systems used by millions of Americans. Users of DeepSeek’s models have noticed that they routinely refuse to respond to questions about sensitive topics inside China, such as the Tiananmen Square massacre and Uyghur detention camps. If other developers build on top of DeepSeek’s models, as is common with open-source software, those censorship measures may get embedded across the industry.

Privacy experts have also raised concerns about the fact that data shared with DeepSeek models may be accessible by the Chinese government. If you were worried about TikTok being used as an instrument of surveillance and propaganda, the rise of DeepSeek should worry you, too.

I’m still not sure what the full impact of DeepSeek’s breakthrough will be, or whether we will consider the release of R1 a “Sputnik moment” for the AI industry, as some have claimed.

But it seems wise to take seriously the possibility that we are in a new era of AI brinkmanship now — that the biggest and richest US tech companies may no longer win by default, and that containing the spread of increasingly powerful AI systems may be harder than we thought.

At the very least, DeepSeek has shown that the AI arms race is truly on, and that after several years of dizzying progress, there are still more surprises left in store.
