ChatGPT Maker Suspects China's Dirt Cheap DeepSeek AI Models Were Built Using OpenAI Data, and the Irony Is Not Lost on the Internet

OpenAI has voiced concerns that China's DeepSeek AI models, known for their remarkably low cost, may have been developed using data from OpenAI. This week, Donald Trump called DeepSeek a wake-up call for the U.S. tech industry, following a significant drop in Nvidia's market value—nearly $600 billion—triggered by DeepSeek's emergence.

The debut of DeepSeek sent shockwaves through the AI sector, causing a sharp decline in the stock prices of major AI players. Nvidia, a dominant force in the GPU technology crucial for AI model development, suffered the largest single-day loss of market value in Wall Street history as its shares fell 16.86%. Microsoft, Meta Platforms, Alphabet (Google's parent company), and Dell Technologies also declined, by between 2.1% and 8.7%.

DeepSeek promotes its R1 model as a significantly cheaper alternative to Western AI offerings like ChatGPT. Built upon the open-source DeepSeek-V3, it reportedly requires less computing power and had an estimated training cost of just $6 million—a claim that has been disputed by some. Regardless of the accuracy of this cost figure, DeepSeek has raised questions about the billions invested by American tech companies in AI, unsettling investors. The model's perceived effectiveness propelled it to the top of the U.S. free app download charts.

Bloomberg reported that OpenAI and Microsoft are investigating whether DeepSeek used OpenAI's API to integrate OpenAI's AI models into its own. OpenAI told Bloomberg that it is aware of efforts by Chinese and other companies to extract data from leading U.S. AI companies. The technique in question, known as "distillation," involves training a model on the outputs of a larger, more capable model. Doing so with OpenAI's models is a violation of its terms of service.
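For readers unfamiliar with the term, distillation in its textbook form trains a smaller "student" model to match the output probabilities of a larger "teacher" model. The sketch below is a minimal, generic illustration of that loss in plain Python. It makes no claim about what DeepSeek or any other company actually did; the function names, the logit values, and the temperature setting are all hypothetical. A variant that relies only on generated text would instead fine-tune the student on prompt and response pairs.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Hypothetical helper for illustration: the loss is zero when the student
    reproduces the teacher's distribution and grows as the two diverge.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up scores over three candidate answers, purely for illustration.
teacher = [2.0, 1.0, 0.1]
close_student = [1.9, 1.0, 0.2]
far_student = [0.1, 1.0, 2.0]

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

Training the student amounts to adjusting its parameters so this loss shrinks across many inputs, which is why access to a stronger model's outputs can stand in for much of the original training effort.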

OpenAI emphasized its commitment to protecting its intellectual property, including careful selection of capabilities included in released models, and stressed the importance of collaboration with the U.S. government to safeguard advanced models from adversarial actions. David Sacks, President Donald Trump's AI czar, suggested that DeepSeek may have employed this distillation method, a practice that OpenAI is reportedly unhappy about. He anticipates that leading AI companies will implement measures to prevent such actions.

DeepSeek is accused of using distillation to train its competing model on OpenAI's models. Image credit: Andrey Rudakov/Bloomberg via Getty Images.

The irony of OpenAI's position has not gone unnoticed, given accusations that ChatGPT itself was built using data scraped from the internet. Ed Zitron, a tech PR professional and writer, pointed out this hypocrisy on Twitter.

OpenAI's previous stance on the use of copyrighted material in AI training further complicates the issue. In a submission to the UK's House of Lords, OpenAI stated in January 2024 that creating AI tools like ChatGPT without copyrighted material was "impossible." That statement followed lawsuits from the New York Times and 17 authors, including George R. R. Martin, alleging the unlawful use of their copyrighted work. OpenAI maintains that its training practices constitute "fair use." The legal landscape surrounding AI training data and copyright remains complex and contested, as shown by the U.S. Copyright Office's position, upheld by a federal judge in August 2023, that art generated entirely by AI cannot be copyrighted.