AI Outsmarted: New Exam Tests True Intelligence
13 Mar
Summary
- A new 2,500-question exam, Humanity's Last Exam, was developed by global researchers.
- The test covers advanced math, humanities, science, and specialized fields.
- Leading AI models like GPT-4o scored below 3% on the challenging assessment.

A worldwide collaboration of nearly 1,000 researchers has developed "Humanity's Last Exam" (HLE), a comprehensive 2,500-question assessment to gauge advanced AI capabilities. The exam aims to overcome the limitations of older benchmarks that AI systems now easily surpass.
Details of HLE, which covers mathematics, humanities, natural sciences, and specialized academic areas, were published in Nature. The questions are designed to require deep knowledge and context, preventing simple internet searches for answers.
Leading AI models struggle markedly with HLE. GPT-4o scored just 2.7%, and even more advanced systems such as Gemini 3.1 Pro and Claude Opus 4.6 managed only 40% to 50% accuracy.
Dr. Tung Nguyen, a professor at Texas A&M University, emphasized that HLE is not a threat but a crucial tool for understanding AI's limitations, ensuring safer technology development and underscoring the enduring value of human expertise.
The researchers have made some questions public while withholding the majority, so that HLE remains a durable and effective benchmark for future AI development. Current results confirm that a wide gap remains between AI systems and human expert-level knowledge.