
AI outsmarted 30 of the world's top mathematicians at secret meeting in California

Jack chen

Jul 17, 2025 am 01:26 AM


During a weekend in mid-May, an exclusive gathering of mathematicians took place. Thirty of the most distinguished minds in mathematics traveled to Berkeley, California, some from distant locations like the U.K. The attendees engaged in a unique challenge against a reasoning-focused chatbot, designed to tackle problems crafted by the group to assess its mathematical capability. After confronting the bot with advanced-level questions for two days straight, the participants were amazed to find that it could solve some of the most challenging solvable math problems. “Some colleagues described these models as nearing mathematical brilliance,” says Ken Ono, a University of Virginia mathematician who served as a leader and judge at the event.

The chatbot is powered by o4-mini, a so-called reasoning large language model (LLM). OpenAI developed the model to handle highly complex logical tasks; Google’s counterpart, Gemini 2.5 Flash, has similar capabilities. Like the LLMs behind earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those predecessors, however, o4-mini and similar models are lighter and more agile, and they are trained on specialized datasets with stronger human-guided reinforcement learning. The result is a chatbot that can dig much deeper into intricate math problems than conventional LLMs can.

To monitor o4-mini's development, OpenAI previously commissioned Epoch AI, a nonprofit focused on benchmarking LLMs, to create 300 unpublished math problems. Even traditional LLMs can correctly answer many difficult math questions. Yet when Epoch AI tested several such models on these novel problems, which the models had not been trained on, the top performers solved fewer than 2 percent, indicating their limited reasoning ability. But o4-mini turned out to be a major exception.

In September 2024, Epoch AI enlisted Elliot Glazer, a recent math Ph.D. graduate, for the benchmark initiative called FrontierMath. The project gathered original math problems across multiple difficulty levels: undergraduate, graduate, and research tiers. By April 2025, Glazer observed that o4-mini could solve roughly 20 percent of the problems. He then introduced a fourth level: questions even experienced academic mathematicians would find tough. Only a select few globally could devise—and possibly solve—such problems. Participants were required to sign confidentiality agreements and communicate exclusively through the app Signal to avoid accidental data contamination, as other communication methods like email might be scanned by an LLM and used for training.

Each problem o4-mini failed to solve earned the creator $7,500. The team made gradual progress generating suitable questions. To accelerate the process, Epoch AI organized an in-person workshop over the weekend of May 17–18, where participants finalized the last set of test questions. Divided into groups of six, the mathematicians worked intensively for two days, trying to craft problems that humans could solve but would stump the AI.

By Saturday evening, Ono grew frustrated as the bot’s surprising mathematical skill hindered the group’s efforts. “I proposed a question recognized by experts in my field as an open number theory problem—suitable for a Ph.D. thesis,” he recalls. When he asked o4-mini to solve it, he watched in astonishment as it delivered a solution within ten minutes, step-by-step. It first spent two minutes locating and absorbing relevant literature. Then, it announced it would attempt a simplified version of the problem to better understand it. Shortly after, it declared itself ready to tackle the full problem. Five minutes later, it presented a correct—but confident to the point of being sarcastic—solution. “It was starting to get really cheeky,” Ono remarked. “And at the end, it added, 'No citation necessary because the mystery number was computed by me!'”


After witnessing this, Ono immediately messaged the group via Signal early Sunday morning. “I wasn't expecting to face off against an LLM like this,” he admitted. “I’ve never seen such reasoning in any model before. That’s how scientists work. And that’s unsettling.”

Although the group eventually identified 10 problems that the bot couldn’t solve, the researchers were astounded by how much AI had advanced in just one year. Ono likened working with the bot to collaborating with a “very capable partner.” Yang Hui He, a mathematician at the London Institute for Mathematical Sciences and an early advocate of AI in mathematics, commented, “This is what an exceptional graduate student would do—actually, even more than that.”

Moreover, the bot worked far faster than a human expert, solving in minutes what might take a professional weeks or months.

While engaging with o4-mini was exciting, its rapid progress raised concerns. Ono and He voiced worries about placing too much trust in the bot’s outputs. “There’s proof by induction, proof by contradiction, and then proof by intimidation,” He explained. “If you assert something confidently enough, people tend to believe it. I think o4-mini has perfected proof by intimidation—it presents everything so assuredly.”



