AI Doctors Fail Medical Test: Study Finds Low Accuracy
12 Mar
Summary
- Large language models correctly identified medical conditions in fewer than 34.5% of cases.
- Users often provided insufficient information, hindering diagnostic accuracy.
- AI chatbots recommended correct follow-up health steps only 44.2% of the time.

A study involving 1,298 participants in the UK found that large language models (LLMs), such as ChatGPT, correctly identified medical conditions in fewer than 34.5% of cases when used for medical advice. Although LLMs demonstrate medical knowledge sufficient to pass the US Medical Licensing Exam, their practical application revealed significant shortcomings.
Participants often failed to provide adequate information in their initial queries: 16 of 30 sampled interactions contained only partial details. Even when an initial response was correct, subsequent user input sometimes introduced new, incorrect information. The study also found that LLMs recommended accurate follow-up health steps only 44.2% of the time, underscoring their unreliability in serious medical situations.