AI Health Bot Fails 52% of Emergency Cases
27 Feb
Summary
- The AI tool under-triaged more than half of simulated emergency medical cases.
- It missed 52% of gold-standard emergencies, directing users to non-urgent care instead.
- Mental health crisis alerts were triggered inconsistently in suicidal ideation scenarios.

A recent study published in Nature Medicine has raised significant safety concerns about ChatGPT Health, OpenAI's AI tool for health guidance. The research found that the AI under-triaged over half of simulated emergency medical cases it was tested on. The tool, launched on January 7, 2026, was evaluated using 60 clinician-authored vignettes across various clinical domains.
While the AI performed adequately on moderate-acuity cases, it failed at the extremes. It missed 52% of gold-standard emergencies, incorrectly recommending non-urgent care for critical conditions such as diabetic ketoacidosis and impending respiratory failure. The tool was also inconsistent in raising mental health crisis alerts in suicidal ideation scenarios.
The study, published on February 23, 2026, was based on synthetic cases, and the researchers emphasized the need for prospective, real-world validation. OpenAI acknowledged the research, noting the study's limitations and stating that the model undergoes continuous updates. Experts warn that such triage failures could cause harmful delays in critical care, fueling the ongoing debate about AI's readiness for direct-to-consumer health decision-making.