


AI Fails White-Collar Job Test: New Benchmark Reveals Flaws

23 Jan


Summary

  • New Apex-Agents benchmark shows AI models failing white-collar tasks.
  • Models struggle most with multi-domain information retrieval.
  • Top models achieved only 24% accuracy on complex professional queries.

Despite predictions that AI will replace knowledge work, recent research suggests the shift is happening slowly. Mercor's new Apex-Agents benchmark, designed to mimic real professional tasks in consulting, investment banking, and law, found that today's leading AI models fall short: even the best of them answered no more than about a quarter of the complex queries correctly.

The models' main weakness is multi-domain information retrieval: finding and combining information spread across tools such as Slack and Google Drive, which is a core part of human knowledge work. The limitation was most evident in queries that required in-depth analysis of company policies and the relevant laws.

Whereas OpenAI's GDPVal benchmark assesses general knowledge, Apex-Agents measures sustained task performance in specific high-value professions. Even the top performers, such as Gemini 3 Flash at 24% accuracy, show that AI is not yet ready to automate these roles. However, given the rapid pace of AI development, models may soon surpass this benchmark.


Disclaimer: This story has been auto-aggregated and auto-summarised by a computer program. This story has not been edited or created by the Feedzop team.
Apex-Agents is a new benchmark testing AI models on real-world white-collar tasks from fields like law and investment banking.
AI models struggle primarily with integrating information from multiple sources and domains, a key aspect of complex professional work.
Currently, AI models score poorly on professional tasks, with top performers achieving only about 24% accuracy, indicating they are not yet ready for full automation.
