AI Outsmarted: New Exam Tests True Intelligence
13 Mar
Summary
- A new 2,500-question exam, Humanity's Last Exam, was developed by global researchers.
- The test covers advanced math, humanities, science, and specialized fields.
- Leading AI models like GPT-4o scored below 3% on the challenging assessment.

A worldwide collaboration of nearly 1,000 researchers has developed "Humanity's Last Exam" (HLE), a comprehensive 2,500-question assessment to gauge advanced AI capabilities. The exam aims to overcome the limitations of older benchmarks that AI systems now easily surpass.
Details of HLE, which covers mathematics, humanities, natural sciences, and specialized academic areas, were published in Nature. The questions are designed to require deep knowledge and context, preventing simple internet searches for answers.
Leading AI models struggle markedly with HLE. GPT-4o scored just 2.7%, and even more advanced systems such as Gemini 3.1 Pro and Claude Opus 4.6 managed only 40% to 50% accuracy.
Dr. Tung Nguyen, a professor at Texas A&M University, emphasized that HLE is not a threat but a crucial tool for understanding AI's limitations, ensuring safer technology development and underscoring the enduring value of human expertise.
The researchers have made some questions public while withholding the majority, so that HLE remains a durable and effective benchmark for future AI development. Current results confirm that a wide gap remains between AI systems and human expert-level knowledge.