AI Health Bot Fails 52% of Emergency Cases
27 Feb
Summary
- The AI tool under-triaged more than half of simulated emergency medical cases.
- It missed 52% of gold-standard emergencies, directing users to non-urgent care instead.
- Mental health crisis alerts were triggered inconsistently in suicidal ideation scenarios.

A recent study published in Nature Medicine has raised significant safety concerns about ChatGPT Health, OpenAI's AI tool for health guidance. The research found that the AI under-triaged over half of simulated emergency medical cases it was tested on. The tool, launched on January 7, 2026, was evaluated using 60 clinician-authored vignettes across various clinical domains.
While the AI performed adequately on moderate-acuity cases, it failed at the extremes. It missed 52% of gold-standard emergencies, incorrectly recommending non-urgent care for critical conditions such as diabetic ketoacidosis and impending respiratory failure. The tool was also inconsistent in raising mental health crisis alerts in suicidal ideation scenarios.
The study, published on February 23, 2026, was based on synthetic cases, and the researchers emphasized the need for prospective, real-world validation. OpenAI acknowledged the research, noting the study's limitations and stating that the model undergoes continuous updates. Experts warn that such triage failures could cause harmful delays in critical care, fueling the ongoing debate about AI's readiness for direct-to-consumer health decision-making.