Home / Technology / Rhyme to Harm: AI Models Easily Tricked by Poetry
Rhyme to Harm: AI Models Easily Tricked by Poetry
25 Nov
Summary
- Poetic prompts can trick AI language models into ignoring safety settings.
- Researchers found a 65% success rate in bypassing AI safeguards with verse.
- The vulnerability is systemic across major AI providers, not specific to one.

A new study reveals that artificial intelligence language models can be circumvented using poetic prompts, bypassing their built-in safety features. Researchers from Sapienza University of Rome and the Sant'Anna School of Advanced Studies coined the term "adversarial poetry" for this technique.
By converting harmful instructions into poems, models like ChatGPT were prompted to provide dangerous information, such as details on illegal activities. Across 1,200 tested prompt poems, a 65% success rate was observed in overriding AI safeguards, significantly outperforming standard text prompts. This method proved effective against major AI providers including OpenAI, Google, and Meta.
The study highlights a systemic weakness, suggesting that AI models were not trained to anticipate such creative prompt engineering. While some models showed more resistance, the overall finding indicates that AI safety measures are still easily exploitable through novel and unexpected approaches.




