AI Labs Under Attack: Distillation Threats Emerge
Leading US AI labs like Google, OpenAI, and Anthropic report facing sophisticated 'distillation' attacks aimed at illicitly extracting proprietary model capabilities. These attacks raise national security concerns and could reshape the future of AI development and access.
Leading artificial intelligence laboratories in the United States, including Google DeepMind, OpenAI, and Anthropic, have reported experiencing sophisticated ‘distillation’ attacks. These attacks, allegedly orchestrated by foreign entities, aim to illicitly extract proprietary AI model capabilities, raising significant national security and intellectual property concerns.
Understanding AI Distillation
AI distillation is a legitimate and widely used technique in machine learning. It involves training a smaller, less complex model on the outputs of a larger, more powerful model. This process allows developers to create more efficient and cost-effective AI applications. For instance, Google DeepMind has used distillation to create versions of its powerful Gemini models, making advanced AI capabilities accessible at a lower cost.
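To make the mechanics concrete, here is a minimal sketch of soft-label distillation in Python (PyTorch). The model, data, and hyperparameters are hypothetical stand-ins chosen to keep the example self-contained; it illustrates the general technique, not any lab's actual pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallStudent(nn.Module):
    """A deliberately tiny stand-in for a compact 'student' model."""
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        # Mean-pool token embeddings, then project to vocabulary logits.
        return self.head(self.embed(tokens).mean(dim=1))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions: the core of soft-label distillation."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)

student = SmallStudent()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# In a real pipeline, teacher_logits would come from the larger model's
# outputs; random tensors stand in here to keep the sketch runnable.
tokens = torch.randint(0, 1000, (8, 16))   # batch of 8 sequences, 16 tokens
teacher_logits = torch.randn(8, 1000)      # stand-in teacher outputs

optimizer.zero_grad()
loss = distillation_loss(student(tokens), teacher_logits)
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```

The key design choice is the temperature: softening the teacher's distribution exposes the relative probabilities it assigns to wrong answers, which is much of the "knowledge" a student absorbs.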
However, distillation can also be exploited for malicious purposes. Competitors can use this method to acquire advanced AI functionalities in a fraction of the time and cost required for independent development. Anthropic, in its recent report, detailed how entities like DeepSeek, Moonshot AI, and Miniax allegedly created thousands of fraudulent accounts and generated millions of interactions with its Claude models to extract their capabilities.
National Security Risks Highlighted
Anthropic emphasized that models created through illicit distillation pose significant national security risks. Unlike models developed with built-in safeguards against misuse, distilled models may lack these crucial protections. This could enable malicious actors, including state-sponsored groups, to develop dangerous capabilities, such as advanced cyber weapons or even biological agents, without the ethical constraints present in the original models.
The potential for these unprotected AI capabilities to proliferate, especially if the distilled models are open-sourced, is a major concern. Such a scenario could empower authoritarian regimes with advanced AI for offensive cyber operations, disinformation campaigns, and mass surveillance, amplifying global security threats.
The Timeline of Revelations
The reports of distillation attacks emerged in rapid succession in mid-February. On February 12th, Google DeepMind announced it had identified an increase in model extraction attempts violating its terms of service. On the same day, OpenAI warned U.S. lawmakers that Chinese AI startup DeepSeek was attempting to replicate AI models from leading U.S. labs, including OpenAI itself, for its own training purposes.
A week later, Anthropic publicly detailed its findings, naming DeepSeek, Moonshot AI, and Miniax as perpetrators of large-scale distillation campaigns against its Claude models. The sheer volume of alleged interactions, reportedly more than 16 million exchanges with Claude in total, 13 million of them attributed to Miniax alone, underscores the industrial scale of these operations.
Alleged Perpetrators and Tactics
- DeepSeek: Allegedly conducted over 150,000 exchanges, focusing on reasoning capabilities. Tactics included synchronized traffic across accounts and coordinated timing, suggesting efforts to maximize throughput and evade detection (a simplified detection sketch follows this list). A notable technique involved prompting Claude to articulate its internal reasoning, effectively generating chain-of-thought training data at scale.
- Moonshot AI: Reportedly used hundreds of fraudulent accounts targeting agentic reasoning, coding, computer vision, and agent development. Anthropic reportedly matched the campaign's request metadata to public profiles of Moonshot's senior staff; the company later allegedly attempted to reconstruct Claude's reasoning traces.
- Miniax: Detected through request metadata and infrastructure indicators, Miniax allegedly conducted 13 million exchanges. The company was identified while actively training a model, providing Anthropic with visibility into the entire lifecycle of the distillation attack. Notably, when Anthropic released a new model during Miniax’s active campaign, the attackers reportedly pivoted within 24 hours to target the new system.
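The synchronized-traffic pattern described above lends itself to simple statistical checks. Below is a minimal, hypothetical sketch (not Anthropic's actual tooling) that buckets each account's requests into fixed time windows and flags account pairs with near-identical activity; the window size and similarity threshold are illustrative assumptions.

```python
# Hypothetical detector for synchronized cross-account traffic.
from itertools import combinations

WINDOW_SECONDS = 60          # bucket size; illustrative value
SIMILARITY_THRESHOLD = 0.9   # Jaccard similarity cutoff; illustrative value

def activity_buckets(timestamps, window=WINDOW_SECONDS):
    """Map raw request timestamps (in seconds) to a set of time-window IDs."""
    return {int(ts // window) for ts in timestamps}

def jaccard(a, b):
    """Overlap between two accounts' active windows, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def flag_synchronized_accounts(request_log):
    """request_log: {account_id: [timestamp, ...]}.
    Returns account pairs whose active windows overlap suspiciously."""
    buckets = {acct: activity_buckets(ts) for acct, ts in request_log.items()}
    flagged = []
    for (a, ba), (b, bb) in combinations(buckets.items(), 2):
        score = jaccard(ba, bb)
        if score >= SIMILARITY_THRESHOLD:
            flagged.append((a, b, round(score, 3)))
    return flagged

# Toy example: accounts "u1" and "u2" fire in lockstep; "u3" does not.
log = {
    "u1": [0, 65, 130, 195],
    "u2": [2, 63, 131, 199],
    "u3": [10, 400, 900],
}
print(flag_synchronized_accounts(log))  # [('u1', 'u2', 1.0)]
```

Real abuse-detection systems would combine many such signals (metadata, infrastructure indicators, prompt patterns), but timing correlation alone illustrates why coordinated campaigns across "thousands of fraudulent accounts" leave a statistical footprint.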
Public Reaction and Counterarguments
The revelations were met with a mixed public reaction, particularly on social media platforms like Twitter. Some users accused Anthropic of hypocrisy, drawing parallels between the alleged distillation attacks and the methods used to train AI models on vast datasets scraped from the internet, including copyrighted material. Critics argued that if AI companies can train on the internet's content, other companies should be able to train on those models' outputs.
Counterarguments suggest that the scale of the alleged attacks might be overstated for political purposes, particularly in the context of policy debates surrounding AI export controls. Some analysts note that 16 million conversations, while substantial, may be a small fraction of the total daily interactions handled by leading AI models, and question whether that volume would suffice to extract frontier capabilities unless the attacks were highly targeted and surgical.
Why This Matters: The Geopolitical and Economic Implications
The core issue transcends simple intellectual property theft. The ability to illicitly distill advanced AI capabilities could significantly alter the global AI landscape. If foreign entities can bypass existing export controls and rapidly acquire cutting-edge AI technology, it could erode the competitive advantage held by U.S. AI labs.
This situation arises amid sensitive geopolitical discussions. A month prior to the distillation attack revelations, the Trump administration reportedly considered shifting U.S. semiconductor policy to allow the export of advanced AI chips to China under certain conditions. Such a move, if enacted, could provide Chinese firms with the computational power needed to develop and deploy advanced AI systems, potentially narrowing the gap with U.S. leaders.
The timing of these public disclosures has led some to speculate that they may serve as a lobbying effort to influence government policy, urging tighter restrictions rather than looser ones. The argument is that by highlighting the vulnerability of public APIs to capability extraction, AI companies could push for a more controlled, possibly private, ecosystem for their most advanced models.
The Future of AI: A Two-Tiered System?
The revelations of distillation attacks could accelerate a trend towards a more guarded AI development model. As AI capabilities become increasingly powerful, particularly in areas with dual-use potential (like bioweapons development or advanced cyber warfare), the ethical and security calculus for releasing models via public APIs shifts dramatically.
This could lead to a two-tiered AI system: one for vetted corporate and government entities with access to frontier capabilities, and another public tier that operates with models several generations behind. Such a scenario would concentrate immense AI power within a select group, a prospect that has long worried privacy advocates and open-source AI proponents.
Ultimately, the ongoing saga of distillation attacks highlights the complex interplay between technological advancement, national security, intellectual property, and international policy. The coming months will likely see intense debate and potential policy shifts as governments and AI developers grapple with these emerging threats and their profound implications.
Source: Google, OpenAI & Anthropic All Reported the Same Threat (YouTube)