AI Doctors Fail Medical Test: Study Finds Low Accuracy
12 Mar
Summary
- Large language models correctly identified medical conditions in fewer than 34.5% of cases.
- Users often provided insufficient information, hindering diagnostic accuracy.
- AI chatbots recommended correct follow-up health steps only 44.2% of the time.

A study involving 1,298 participants in the UK found that large language models (LLMs), such as ChatGPT, correctly identified medical conditions in fewer than 34.5% of cases when used for medical advice. Although LLMs demonstrate medical knowledge sufficient to pass the US Medical Licensing Exam, their practical application revealed significant shortcomings.
Participants often failed to provide adequate information in their initial queries: 16 of 30 sampled interactions contained only partial details. Even when an initial response was correct, subsequent user input sometimes introduced new, incorrect information. The study also found that LLMs recommended accurate follow-up health steps only 44.2% of the time, underscoring their unreliability in serious medical situations.