
AI is Only 30% Away From Matching Human-Level General Intelligence on GAIA Benchmark

H2O.ai
  • H2O.ai sets the world record on the GAIA agentic AI benchmark with h2oGPTe
  • H2O.ai beats Microsoft and Google researchers by more than 15 points on GAIA, a benchmark widely hailed as the ultimate test of real-world intelligence

 


MOUNTAIN VIEW, Calif.--BUSINESS WIRE--

H2O.ai, the leader in open-source Generative AI and maker of the most accurate Predictive AI platforms, today announced that h2oGPTe Agent has secured the #1 position on the GAIA (General AI Assistants) benchmark leaderboard with an unprecedented score of 65%, outperforming the leading entries from Google (Langfun Agent, 49%), Microsoft Research (38%), and Hugging Face (33%). This remarkable achievement underscores H2O.ai's dominance in the emerging domain of general-purpose AI agents, setting a new gold standard for the industry.

This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20241223840924/en/

h2oGPTe Agent Tops GAIA Benchmark Test Results Dec 2024 (Graphic: Business Wire)

Why GAIA Matters

The GAIA benchmark measures how useful AI systems are at solving real-world tasks that demand considerable time, thought, and effort from skilled humans. It consists of hundreds of challenges requiring laborious research, data analysis, document handling, and reasoning. Degree-holding human respondents score 92% and need several human-days to solve all 300 test-set problems.

h2oGPTe Agent outpaced competitors by delivering consistent robustness, accuracy and efficiency, highlighting its readiness for enterprise use cases that depend heavily on skilled human assistants.

Enterprise h2oGPTe Agent: A Landmark Achievement

This achievement solidifies H2O.ai’s leadership in the global race to build intelligent, adaptable AI assistants capable of transforming businesses.

Sri Ambati, Founder and CEO of H2O.ai, shared his enthusiasm:

“Today we are announcing that AI is only 30% away from matching human-level general intelligence on the GAIA benchmark. Open-ended questions in GAIA are a better measure of intelligence than MMLU, which relies on multiple choice. To share how exciting this is: the entire Gen AI ecosystem was barely able to pass a tenth in accuracy on one of the toughest AGI benchmarks merely a year ago.

“Makers at H2O.ai built h2oGPTe Agentic AI wielding the best models in the world for reasoning, multi-modal image, video, language understanding, code generation and execution to ace the GAIA benchmark with a stunning 15% accuracy leap over the previous record set by researchers from Google DeepMind using the same Claude-3.5-Sonnet. h2oGPTe Agent also beat Microsoft Research’s agent Magentic-One that used OpenAI’s o1 model by 27%.

“Agentic AI is eating SaaS and with h2oGPTe Agentic AI now being generally available, all our enterprise customers can solve a wide range of sophisticated business and research problems.”

H2O.ai's success on GAIA reflects its philosophy of simplicity and adaptability, built on three capabilities:

  • Advanced reasoning and planning for solving complex, real-world tasks
  • Multimodal comprehension across text, images, and audio for seamless context understanding
  • Integration of enterprise tools such as Python execution and Driverless AI for predictive analytics and decision-making

H2O.ai’s win reaffirms its leadership in AI innovation, particularly in agentic systems poised to reshape business workflows.

Enterprise h2oGPTe 1.6 includes the Agent feature and is available on all public clouds, virtual private clouds and for on-premise deployments: https://h2o.ai/platform/enterprise-h2ogpte/

Read the technical blog: https://h2o.ai/blog/2024/h2o-ai-tops-gaia-leaderboard/

About H2O.ai

Founded in 2012, H2O.ai is at the forefront of the AI movement to democratize Generative AI. H2O.ai’s open-source Generative AI and Enterprise h2oGPTe, combined with Document AI and the award-winning autoML Driverless AI, have transformed more than 20,000 global organizations, and over half of the Fortune 500, including AT&T, Commonwealth Bank of Australia, Chipotle, Singtel, Workday, Progressive Insurance, and AES.

H2O.ai partners include Dell, Deloitte, Ernst & Young (EY), PricewaterhouseCoopers (PwC), NVIDIA, Snowflake, AWS, Google Cloud Platform (GCP) and Microsoft Azure. H2O.ai’s AI for Good program supports nonprofit groups, foundations, and communities in advancing education, healthcare, and environmental conservation. With a vibrant community of 2 million data scientists worldwide, H2O.ai aims to co-create valuable AI applications for all users.

H2O.ai has raised $256 million from investors, including Commonwealth Bank, Nvidia, Goldman Sachs, Wells Fargo, Capital One, Nexus Ventures and New York Life.


Contact details:

Media Contact

Betty Candel
VP GTM
betty.candel@h2o.ai
