AI 'Distillation': The Shady Side of Chatbot Cloning
13 Feb
Summary
- Commercially motivated actors attempted to clone Google's Gemini AI by prompting it over 100,000 times.
- This practice is known as 'model extraction' or 'distillation'.
- Attackers seek a competitive edge by replicating AI capabilities.

Google disclosed on Thursday that malicious actors have targeted its Gemini AI, attempting to "clone" its capabilities through extensive prompting. "Commercially motivated" entities reportedly submitted over 100,000 prompts, largely in non-English languages, to collect data for training less expensive, copycat AI models. Google has labeled this tactic "model extraction," equating it to intellectual property theft, although the company itself has faced similar accusations in the past.
The industry refers to this method as "distillation." It involves feeding a target AI model numerous prompts and using the resulting input-output pairs to train a smaller, more cost-effective model that mimics the original's behavior. This technique allows developers to bypass the substantial investment required for training large language models from scratch. While the cloned model does not access the original's code or data, it can replicate many functionalities by analyzing outputs.
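To make the mechanics concrete, here is a minimal Python sketch of the collection loop just described. The `TEACHER_URL` endpoint, API key, and response format are hypothetical placeholders rather than any vendor's real API; only the overall pattern of prompting a target model and recording input-output pairs reflects the technique.

```python
# Hypothetical sketch of the data-collection half of distillation.
# The endpoint, API key, and response format are placeholders, not any
# vendor's real API; only the overall loop reflects the technique described.
import json

import requests

TEACHER_URL = "https://api.example.com/v1/generate"  # hypothetical teacher endpoint
API_KEY = "placeholder-key"                          # illustrative credential

def query_teacher(prompt: str) -> str:
    """Send one prompt to the target ("teacher") model and return its reply."""
    resp = requests.post(
        TEACHER_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["text"]

# Harvest input-output pairs at scale -- the "100,000 prompts" phase.
prompts = ["Explain HTTP caching.", "Summarize the French Revolution."]  # ...many more
with open("distillation_pairs.jsonl", "w") as f:
    for prompt in prompts:
        completion = query_teacher(prompt)
        f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
```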
Google's threat intelligence group observed a rise in these distillation attacks, with many campaigns focusing on Gemini's simulated reasoning capabilities. The company said it identified one campaign involving more than 100,000 prompts and has since hardened Gemini's defenses, though it has not disclosed details. Such practices are not unique to Google: OpenAI has accused competitors of similar methods, and distillation has become a common approach for building smaller AI models across the industry.
Competitors have been cloning AI language model capabilities since at least the GPT-3 era. For instance, Stanford researchers created the Alpaca model for approximately $600 by fine-tuning Meta's leaked LLaMA model on GPT-3.5 outputs. More recently, Elon Musk's xAI faced scrutiny when its Grok chatbot produced responses eerily similar to ChatGPT's, which an engineer attributed to ChatGPT outputs accidentally ingested in Grok's training data. As long as large language models remain publicly accessible via APIs, determined actors can gradually replicate their functions, a challenge Google appears to be actively addressing.
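For the student-side training step that projects like Alpaca popularized, a minimal sketch using the open-source Hugging Face stack might look like the following. The GPT-2 checkpoint is only a stand-in for whatever small model a copycat would actually fine-tune, and `distillation_pairs.jsonl` is assumed to be the file produced by a collection loop like the one above.

```python
# Hedged sketch of student-side fine-tuning on harvested teacher outputs.
# "gpt2" is a stand-in student model; the JSONL file is assumed to hold
# {"prompt": ..., "completion": ...} records from the collection step.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Each record holds one input-output pair harvested from the teacher model.
dataset = load_dataset("json", data_files="distillation_pairs.jsonl", split="train")

def tokenize(example):
    # Join prompt and completion into one supervised training sequence.
    return tokenizer(example["prompt"] + "\n" + example["completion"],
                     truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="student-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the student learns to imitate the teacher's responses
```

Note that the student never touches the teacher's weights or training data; ordinary supervised learning on the collected pairs is what lets it approximate the original's behavior, which is why API access alone is enough for this kind of cloning.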