

AI Trust Soars: Gemini 3 Tops Real-World Tests

4 Dec

Summary

  • Gemini 3's trust score surged from 16% to 69% in blind user testing.
  • The AI model ranked first in performance, interaction, and trust/safety categories.
  • Prolific's HUMAINE benchmark tests real-world user preferences, not just technical specs.

A recent vendor-neutral evaluation by Prolific has placed Google's Gemini 3 model at the forefront of AI performance, particularly in user trust and real-world applicability. The HUMAINE benchmark, developed by researchers from the University of Oxford, used blind testing with 26,000 users to assess AI models on practical attributes, moving beyond traditional academic benchmarks. Gemini 3's trust score rose from 16% to 69%, the highest score Prolific has ever recorded.

In the comprehensive evaluation, Gemini 3 secured the top position in three out of four key categories: performance and reasoning, interaction and adaptiveness, and trust and safety. While it slightly lagged in communication style, its overall consistency across 22 diverse demographic user groups, including variations in age, sex, and ethnicity, highlights its broad appeal. Users were found to be five times more likely to select Gemini 3 in head-to-head blind comparisons, underscoring its significant advancement over its predecessor.
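The article does not describe HUMAINE's scoring mechanics, but the "five times more likely" figure implies a simple tally of blind pairwise votes. As a hypothetical illustration (the vote labels, model names, and counts below are assumptions, not Prolific's data), the ratio could be computed like this:

```python
from collections import Counter

def preference_ratio(votes, model_a, model_b):
    """Return how many times more often model_a was chosen over model_b
    in blind pairwise comparisons (other labels, e.g. ties, are ignored)."""
    counts = Counter(votes)
    if counts[model_b] == 0:
        raise ValueError(f"{model_b} received no votes; ratio undefined")
    return counts[model_a] / counts[model_b]

# Hypothetical blind-comparison outcomes (model identities hidden from raters).
votes = ["gemini-3"] * 50 + ["predecessor"] * 10 + ["tie"] * 5
print(preference_ratio(votes, "gemini-3", "predecessor"))  # → 5.0
```

Because raters never see model names, a ratio like this reflects output quality alone rather than brand preference.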

Prolific's methodology emphasizes the importance of human data and blind testing to reveal how AI models truly perform across varied user populations. This approach contrasts with static technical benchmarks, offering insights into how model performance can differ based on audience demographics. Enterprises seeking to deploy AI are advised to adopt similar rigorous evaluation frameworks, focusing on consistency across use cases and user groups to ensure effective AI integration.
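For enterprises adopting such an evaluation framework, one minimal consistency check is to compute a model's win rate separately per demographic group and look at the spread between groups. The sketch below is an assumption about how this could be done — the group labels and records are illustrative, not HUMAINE's actual 22-group breakdown:

```python
def group_win_rates(results):
    """Compute per-group win rates from (group, won) records.

    results: iterable of (group_name, bool) pairs, one per blind comparison.
    Returns (rates, spread): a dict mapping each group to its win rate,
    and the max-minus-min spread as a crude cross-group consistency signal.
    """
    totals, wins = {}, {}
    for group, won in results:
        totals[group] = totals.get(group, 0) + 1
        wins[group] = wins.get(group, 0) + (1 if won else 0)
    rates = {g: wins[g] / totals[g] for g in totals}
    spread = max(rates.values()) - min(rates.values())
    return rates, spread

# Illustrative records: whether the model won each blind comparison, by age group.
records = [("18-29", True), ("18-29", True), ("18-29", False),
           ("30-49", True), ("30-49", False),
           ("50+", True), ("50+", True)]
rates, spread = group_win_rates(records)
print(rates)   # win rate per age group
print(spread)  # smaller spread = more consistent across groups
```

A model that wins overall but shows a large spread across groups may perform poorly for some audiences, which is exactly what aggregate benchmarks can hide.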

Disclaimer: This story has been auto-aggregated and auto-summarised by a computer program. This story has not been edited or created by the Feedzop team.
What is the HUMAINE benchmark?
The HUMAINE benchmark is a vendor-neutral evaluation by Prolific that uses blind testing with real users to assess AI models on attributes like trust, performance, and adaptability in real-world scenarios.

What did Gemini 3 achieve?
Gemini 3 achieved its highest-ever trust score of 69% and ranked first in the performance, interaction, and trust/safety categories of Prolific's HUMAINE benchmark.

Why does blind testing matter?
Blind testing removes brand bias, allowing users to judge AI model outputs solely on their quality and relevance, providing a more accurate measure of performance across diverse demographics.
