Open-Weight AI Fails Under Sustained Chat Attacks
2 Dec
Summary
- Multi-turn AI attacks bypass defenses previously thought secure.
- Attack success rates jump from roughly 13% to 64% on average with conversational probing, exceeding 92% for some models.
- Security researchers highlight the need for context-aware AI guardrails.

Open-weight AI models exhibit a critical weakness: although they hold up against single-turn attacks, they collapse under sustained conversational pressure. Cisco's research quantifies the gap, showing attack success rates climbing from an average of 13.11% for single prompts to 64.21% for multi-turn attacks, and exceeding 92% against some models. This stark contrast underscores that current safety benchmarks fail to capture real-world adversarial tactics.
These multi-turn attacks exploit conversational persistence through techniques like information decomposition, contextual ambiguity, and role-playing. Researchers found that models struggle to maintain contextual defenses over extended dialogues, allowing attackers to refine prompts and bypass safeguards. This vulnerability is systemic, affecting numerous leading open-weight models tested, regardless of their alignment focus, though capability-first models show wider gaps.
To close this security gap, enterprises should prioritize context-aware guardrails, model-agnostic runtime protections, and continuous red-teaming focused on multi-turn strategies. Ignoring this systemic vulnerability invites serious failures; securing AI conversations, not just individual prompts, is what mitigates the risk and unlocks wider adoption.
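To make the "conversations, not prompts" distinction concrete, here is a minimal illustrative sketch of a context-aware guardrail. It is not Cisco's method or any production system: the keyword heuristic, function names, and threshold are placeholder assumptions standing in for a real safety classifier. The point it demonstrates is structural: risk is scored over the full conversation history, so a request decomposed across turns still trips the guardrail even when no single turn looks harmful.

```python
# Illustrative sketch only. SUSPECT_TERMS, turn_risk, and the 0.4
# threshold are hypothetical placeholders for a real safety classifier.
SUSPECT_TERMS = {"bypass", "synthesize", "exploit", "payload"}

def turn_risk(message: str) -> float:
    """Score one message: fraction of suspect terms it contains."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return len(words & SUSPECT_TERMS) / len(SUSPECT_TERMS)

def conversation_risk(history: list[str]) -> float:
    """Accumulate risk across all turns, capped at 1.0, so that
    decomposed requests are judged by their combined context."""
    return min(1.0, sum(turn_risk(m) for m in history))

def allow(history: list[str], threshold: float = 0.4) -> bool:
    """A context-aware check: gate on the whole dialogue's score."""
    return conversation_risk(history) < threshold

# A single-turn filter sees only the last, innocuous-looking message;
# the conversation-level view catches the decomposed request.
benign_last = "And how would I combine those steps?"
probing = [
    "How do I bypass a content filter?",
    "What payload format does it expect?",
    benign_last,
]
```

In this sketch, `allow([benign_last])` passes while `allow(probing)` is refused, because the earlier turns' accumulated score pushes the dialogue over the threshold. A real deployment would replace the keyword heuristic with a trained classifier, but the design choice, scoring the dialogue rather than the turn, is the one the research argues for.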




