
Cutting-edge AI models from OpenAI and DeepSeek undergo 'complete collapse' when problems get too difficult, study reveals

Jul 07, 2025 am 01:02 AM


Artificial intelligence (AI) reasoning models aren't quite as capable as they appear. In reality, their performance breaks down completely when tasks become too complex, according to researchers at Apple.

Reasoning models like Anthropic's Claude, OpenAI's o3, and DeepSeek's R1 are advanced large language models (LLMs) designed to spend more time and computational resources to deliver more accurate responses compared to standard models.

The emergence of these models has led some major tech companies to make new claims that they may be close to achieving artificial general intelligence (AGI), which refers to systems that can surpass humans in most cognitive tasks.

However, a paper published June 7 on Apple's Machine Learning Research website challenges these assertions and amounts to a pointed rebuttal of rival firms' claims. According to the study, not only do reasoning models fail to demonstrate generalized reasoning ability, but their reasoning also degrades sharply once tasks exceed a certain level of complexity.

"Through extensive testing across various puzzles, we show that leading LRMs experience a total accuracy collapse beyond specific complexity thresholds," the researchers noted. "Additionally, they display an unexpected scaling limitation: their reasoning effort increases with problem difficulty up to a point, then diminishes even when sufficient token capacity is available."

LLMs are trained on massive amounts of human-generated text. When prompted, they generate responses by predicting, token by token, the statistically most likely continuation of the input, based on patterns encoded in their neural networks.
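
In other words, at each step the model assigns a probability to every token in its vocabulary and samples the next token from that distribution. The toy snippet below illustrates that single sampling step; the vocabulary and probabilities are invented for the example, and real models score tens of thousands of tokens at once:

```python
import random

# Toy next-token sampling step. The vocabulary and probabilities here are
# made up for illustration only; they are not from any real model.
vocab_probs = {"Paris": 0.72, "London": 0.15, "Rome": 0.09, "Berlin": 0.04}

def sample_next_token(probs: dict) -> str:
    """Draw one token according to the model's probability distribution."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Most draws return "Paris", but the other tokens remain possible.
print(sample_next_token(vocab_probs))
```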


Reasoning models aim to enhance AI precision using a method known as "chain-of-thought," which involves generating multi-step responses that simulate how humans apply logic to solve problems.

This process enables chatbots to review and refine their reasoning, allowing them to handle more challenging tasks with greater accuracy. During chain-of-thought processing, models articulate their logic step-by-step in natural language, making it easier to trace their decision-making process.
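
To make the contrast concrete, here is a minimal sketch of a chain-of-thought prompt alongside a direct one. The wording is an illustrative assumption, not the prompting scheme of any particular vendor:

```python
# Minimal sketch of chain-of-thought vs. direct prompting. The prompt
# phrasing is an illustrative assumption, not any provider's actual API
# or system prompt.

def make_direct_prompt(question: str) -> str:
    """Baseline prompt: ask for the final answer only."""
    return f"Question: {question}\nGive only the final answer."

def make_cot_prompt(question: str) -> str:
    """Chain-of-thought prompt: ask the model to show each step."""
    return (
        f"Question: {question}\n"
        "Reason through the problem step by step, writing out each "
        "intermediate conclusion, then state the final answer."
    )

if __name__ == "__main__":
    q = "A train leaves at 3:40 pm and the trip takes 95 minutes. When does it arrive?"
    print(make_direct_prompt(q))
    print("---")
    print(make_cot_prompt(q))
```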

Nevertheless, since this approach relies on statistical inference rather than genuine comprehension, chatbots often produce incorrect answers, make things up when they lack information, and sometimes offer strange or even dangerous advice.

An OpenAI technical report revealed that reasoning models are particularly susceptible to hallucinations—more so than regular models—and the issue worsens as models evolve.

For example, when asked to summarize factual information about individuals, the company’s o3 and o4-mini models generated false content 33% and 48% of the time, respectively, compared to just 16% for the earlier o1 model. OpenAI officials admitted they’re unsure why this occurs, stating that "more research is needed to understand the cause of these results."

"We believe the absence of thorough investigations into these issues stems from shortcomings in current evaluation methods," the authors of Apple's new study wrote. "Most existing evaluations center around well-known math and coding benchmarks, which, although useful, often face data contamination and lack controlled experimental conditions across varying complexities. Furthermore, they don’t provide insights into the structure and quality of the reasoning paths generated."

Peeking inside the black box

To better understand these limitations, the researchers tested both generic and reasoning models (including OpenAI's o1 and o3, DeepSeek's R1, Anthropic's Claude 3.7 Sonnet, and Google's Gemini) on four classic puzzles: river crossing, checker jumping, block stacking, and the Tower of Hanoi. Each puzzle's complexity could be scaled up by increasing the number of elements involved, such as the number of disks in the Tower of Hanoi.
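
These puzzles are attractive test beds because their difficulty can be dialed up precisely. In the Tower of Hanoi, for example, the shortest solution doubles in length with every disk added, so a small increase in element count forces an exponential jump in the reasoning required. The snippet below illustrates that standard result; it is not code from the study:

```python
# Tower of Hanoi: the shortest solution for n disks takes 2**n - 1 moves,
# so each added disk roughly doubles the work. This is the standard
# textbook result, not the study's evaluation code.

def hanoi_min_moves(n_disks: int) -> int:
    """Minimum number of moves to solve Tower of Hanoi with n disks."""
    return 2 ** n_disks - 1

for n in range(1, 11):
    print(f"{n:2d} disks -> {hanoi_min_moves(n):5d} moves")
```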


In low-complexity scenarios, generic models outperformed reasoning models by solving problems without the added computational burden of reasoning chains. As the puzzles grew more complex, reasoning models initially gained an edge—but this advantage vanished entirely under high-complexity conditions, where both types of models saw their performance drop to zero.

Once a critical threshold was crossed, reasoning models allocated fewer tokens (the units of text a model reads and writes) to the more complex tasks, indicating that they engaged in less reasoning and faced fundamental limits in maintaining long chains of thought. These limitations persisted even when correct solutions were provided.

"When we gave the solution algorithm for the Tower of Hanoi to the models, their performance on this puzzle did not improve," the authors stated. "Moreover, examining the first incorrect move made by the models revealed surprising behavior. For instance, they could successfully complete up to 100 correct moves in the Tower of Hanoi but struggle to complete more than 5 correct moves in the River Crossing puzzle."
