Do AI models like ChatGPT provide accurate women's health advice?

No. A recent benchmark test found that leading AI models, including ChatGPT, gave inadequate answers to roughly 60% of women's health queries.

Why do AI models fail on women's health questions?

Experts suggest that AI models inherit biases from historical training data, which leaves gaps in their understanding of sex- and gender-specific medical needs.

Who developed the test for AI medical expertise in women's health?

The test was created by a team of 17 women's health researchers, pharmacists, and clinicians from the US and Europe.


AI Models Miss Crucial Women's Health Advice

7 Jan


Summary

AI models gave inadequate answers to about 60% of women's health queries.
Experts found AI advice insufficient for urgent medical needs.
Bias in training data contributes to AI's gender health gaps.


Leading AI models frequently fail to give adequate advice on women's health concerns. In a benchmark test of 13 large language models, approximately 60% of answers to women's health queries were judged insufficient. The findings are particularly concerning because medical professionals designed the queries to represent situations requiring urgent attention, spanning specialties such as gynaecology and neurology.

The research team, motivated by concern that AI could amplify existing gender bias in medical knowledge, found that performance varied widely across models. GPT-5 performed best but still had a 47% failure rate, while Mistral 8B had the highest failure rate at 73%. Experts suggest that historical training data, laden with societal biases, limits how well AI models handle sex- and gender-related medical information.