AI Models Miss Crucial Women's Health Advice
7 Jan
Summary
- AI models gave insufficient answers to about 60% of women's health queries.
- Experts found AI advice insufficient for urgent medical needs.
- Bias in training data contributes to AI's gender health gaps.

Leading AI models fail at a significant rate when asked for advice on women's health concerns. In a benchmark test of 13 large language models, approximately 60% of medical queries related to women's health received insufficient answers. The findings are particularly concerning because medical professionals designed the queries to represent situations requiring urgent attention, spanning specialties such as gynaecology and neurology.
The research team, motivated by concerns that AI could amplify existing gender bias in medical knowledge, highlighted how widely performance varied between models. GPT-5 performed best, with a 47% failure rate, while Mistral 8B had the highest failure rate at 73%. Experts suggest that historical training data, laden with societal biases, contributes to AI's limitations in understanding sex- and gender-related medical information.
While some experts question the broad applicability of the 60% figure due to the specific nature of the test queries, the researchers emphasize their benchmark's conservative approach. They aim to establish a clinical standard for evaluation, acknowledging that even minor omissions in healthcare advice can have significant consequences. Companies like OpenAI state their models are designed to support, not replace, medical care and are continuously improving accuracy and context awareness for users.




