
AI Tests Obsolete: New Benchmark Redefines Progress

7 Jan

Summary

  • New AI benchmark focuses on economically useful actions, not just recall.
  • GDPval-AA tests AI on real-world tasks across 44 occupations.
  • Scientific reasoning tests reveal AI still struggles with deep discovery.

The rapidly evolving field of artificial intelligence is grappling with a significant challenge: existing benchmarks are failing to accurately measure the progress of increasingly sophisticated AI models. Artificial Analysis, a key benchmarking organization, has responded by releasing its Intelligence Index v4.0. This updated index fundamentally redefines how AI capabilities are assessed, moving beyond simple recall to evaluate "economically useful action" across agents, coding, scientific reasoning, and general knowledge.

Central to the new index is GDPval-AA, an evaluation designed to test AI models on real-world tasks relevant to 44 different occupations and nine industries. Unlike previous benchmarks, this test assesses AI's ability to produce professional deliverables such as documents, slides, and spreadsheets. Concurrently, the CritPT benchmark, focusing on graduate-level physics problems, highlights that even advanced AI systems still struggle with deep scientific reasoning, scoring below 11.5% on complex research challenges.

These changes in AI evaluation come at a critical juncture for major players such as OpenAI, Google, and Anthropic, which have recently launched new models. The revised benchmarks aim to give enterprise buyers clearer insight, particularly into AI hallucination rates, which the index now weighs as a distinct factor. The shift signals a move towards evaluating AI not just on its theoretical capabilities, but on its practical utility in performing tasks that professionals are paid to do.

Disclaimer: This story has been auto-aggregated and auto-summarised by a computer program. This story has not been edited or created by the Feedzop team.
GDPval-AA is a new AI benchmark that tests models on real-world, economically valuable tasks across 44 occupations and 9 industries, assessing their ability to produce professional deliverables.
Leading AI models have become so advanced that they now master traditional tests, making it difficult to differentiate between them and hindering enterprise decision-making.
Artificial Analysis's Intelligence Index v4.0 now emphasizes economically useful actions and real-world task completion, incorporating new evaluations like GDPval-AA and CritPT.
