Feedzop

© 2025 Advergame Technologies Pvt. Ltd. ("ATPL"). Gamezop ® & Quizzop ® are registered trademarks of ATPL.


Nine-Month-Old AI Startup Unveils Benchmark Challenging Industry Giants

18 Nov

Summary

  • Artificial Analysis announces new benchmark for AI knowledge and hallucination
  • Benchmark covers over 40 topics, with most models more likely to hallucinate than provide correct answers
  • Claude 4.1 Opus takes first place in the benchmark's key metric
In a surprising move, a little-known nine-month-old AI company called Artificial Analysis has announced the launch of its new benchmark, AA-Omniscience, which evaluates knowledge and hallucination across more than 40 topics. The benchmark, revealed just last month, has already made waves in the industry.

The results of the AA-Omniscience benchmark are quite startling. According to the data, all but three of the language models tested were more likely to hallucinate, or provide incorrect information, than to give a correct answer. This highlights the significant challenges that still exist in developing AI systems with robust and reliable knowledge.
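The scoring idea described above — counting a model's correct answers against its hallucinated ones, so that a model hallucinating more often than it answers correctly lands in negative territory — can be sketched as follows. This is an illustrative reconstruction, not Artificial Analysis's published methodology; the function name, the response labels, and the treatment of abstentions are all assumptions.

```python
def omniscience_index(responses):
    """Net knowledge-minus-hallucination score (illustrative sketch).

    responses: list of per-question labels, each one of
      'correct', 'incorrect' (a hallucination), or 'abstain'.
    Returns the percentage of correct answers minus the percentage of
    incorrect answers; abstaining is assumed not to be penalized.
    """
    n = len(responses)
    correct = sum(r == "correct" for r in responses) / n
    incorrect = sum(r == "incorrect" for r in responses) / n
    return 100 * (correct - incorrect)


# A model that hallucinates twice for every correct answer scores
# below zero, matching the pattern the article reports for most models.
print(omniscience_index(["correct", "incorrect", "incorrect", "abstain"]))
```

Under this kind of metric, a model can improve its score either by knowing more or by declining to answer instead of guessing, which is what makes it a hallucination measure rather than a plain accuracy measure.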

Despite the sobering findings, there were some bright spots. Anthropic's Claude 4.1 Opus model took first place in the benchmark's key metric, demonstrating its relative strength in accurately conveying information. That even the strongest models are only now pulling ahead on the nine-month-old startup's benchmark is a testament to how quickly the field of artificial intelligence is advancing, and to how much room remains.

As the industry continues to grapple with the complexities of building truly knowledgeable and trustworthy AI systems, the AA-Omniscience benchmark promises to play a crucial role in guiding future research and development efforts. With its comprehensive coverage and insightful results, this new tool could help shape the future of the AI landscape.

Disclaimer: This story has been auto-aggregated and auto-summarised by a computer program. This story has not been edited or created by the Feedzop team.
