AI Health Bot Fails 52% of Emergency Cases
27 Feb
Summary
- The AI tool under-triaged 52% of gold-standard emergency cases in simulation, directing users to non-urgent care.
- Failures included critical conditions such as diabetic ketoacidosis and impending respiratory failure.
- Mental health crisis alerts triggered inconsistently in suicidal ideation scenarios.

A recent study published in Nature Medicine has raised serious safety concerns about ChatGPT Health, OpenAI's AI tool for health guidance. Launched on January 7, 2026, the tool was evaluated against 60 clinician-authored vignettes spanning a range of clinical domains.
While the AI handled moderate cases adequately, it failed at the extremes: it missed 52% of gold-standard emergencies, incorrectly recommending non-urgent care for critical conditions such as diabetic ketoacidosis and impending respiratory failure. It was also inconsistent in triggering mental health crisis alerts for suicidal ideation.
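The 52% figure is a simple proportion over the emergency subset of the vignettes. As a rough illustration only, here is a minimal Python sketch of how such an under-triage rate could be computed; the acuity scale and vignette data below are hypothetical, not the study's actual methodology or data.

    # Illustrative only: hypothetical acuity scale and made-up vignette data,
    # not the study's scoring code.
    EMERGENCY = 3  # 1 = self-care, 2 = non-urgent care, 3 = emergency care

    vignettes = [
        # (gold_standard_acuity, model_recommended_acuity)
        (3, 2),  # diabetic ketoacidosis routed to non-urgent care -> under-triage
        (3, 3),  # emergency correctly flagged
        (2, 2),  # moderate case handled adequately
        (3, 1),  # impending respiratory failure routed to self-care -> under-triage
    ]

    emergencies = [v for v in vignettes if v[0] == EMERGENCY]
    under_triaged = [v for v in emergencies if v[1] < EMERGENCY]

    rate = len(under_triaged) / len(emergencies)
    print(f"Gold-standard emergencies missed: {rate:.0%}")  # 67% on this toy data

On this toy data, two of three emergency vignettes are routed below emergency-level care, a 67% under-triage rate; the study's 52% is the same kind of ratio, computed over its own emergency vignettes.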