Apple Study Reveals AI Models Face Accuracy Collapse in Complex Tasks

June 11, 2025

A recent study by Apple researchers has revealed significant limitations in advanced artificial intelligence (AI) models, particularly in their performance on complex problem-solving tasks. The paper, released on June 9, 2025, reports that large reasoning models (LRMs) suffer a 'complete accuracy collapse' when confronted with sufficiently intricate challenges, raising doubts about whether current approaches offer a viable path to artificial general intelligence (AGI).

The research highlights that standard AI models outperform LRMs on low-complexity tasks, that LRMs hold an advantage only at moderate complexity, and that both types of model fail outright on high-complexity tasks. According to the paper, although LRMs are designed to tackle complex queries by deconstructing problems into smaller, manageable parts, they paradoxically reduce their reasoning effort as problem complexity increases. This trend suggests a fundamental scaling limitation in the models' reasoning capabilities.

Gary Marcus, a noted AI scholar and co-founder of Geometric Intelligence, described the findings from the Apple paper as 'pretty devastating.' In his commentary, Marcus emphasized that these insights challenge the prevailing assumptions regarding the path to AGI, stating, 'Anyone who thinks LLMs are a direct route to AGI is kidding themselves.' His concerns are echoed by Andrew Rogoyski of the Institute for People-Centred AI at the University of Surrey, who suggested that the industry might be at a 'cul-de-sac' in its current approaches to AI development.

The study tested several models, including OpenAI's o3, Google's Gemini Thinking, Anthropic's Claude 3.7 Sonnet-Thinking, and DeepSeek-R1, using puzzles such as the Tower of Hanoi and River Crossing to evaluate their reasoning abilities. The results indicated that at moderate complexity the models wasted computational resources by exploring incorrect paths before arriving at valid answers, and that beyond a critical threshold of complexity they failed to produce correct solutions at all.
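
Puzzles like these suit such a study because their difficulty can be scaled precisely and any proposed answer can be verified mechanically. The Python sketch below is purely illustrative, not the researchers' actual evaluation harness; the peg labels and function names are assumptions. It generates the optimal Tower of Hanoi move sequence, whose length (2^n − 1 for n discs) doubles with each added disc, and checks whether an arbitrary move list, such as one produced by a model, legally solves the puzzle.

```python
# Illustrative sketch (not Apple's evaluation code): Tower of Hanoi as a
# reasoning benchmark. Difficulty scales with disc count n, since the
# optimal solution requires exactly 2**n - 1 moves.

def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Return the optimal move list for n discs from src to dst."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, dst, aux)     # clear n-1 discs onto aux
            + [(src, dst)]                        # move the largest disc
            + hanoi_moves(n - 1, aux, src, dst))  # restack n-1 discs on top

def is_valid_solution(moves, n):
    """Simulate a proposed move list and check it legally solves the puzzle."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # largest disc at bottom
    for src, dst in moves:
        if not pegs[src]:
            return False                  # illegal: moving from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False                  # illegal: larger disc onto smaller
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))      # all discs on the goal peg

if __name__ == "__main__":
    for n in (3, 7, 10):
        moves = hanoi_moves(n)
        print(f"{n} discs: {len(moves)} optimal moves, "
              f"valid={is_valid_solution(moves, n)}")
```

Because the solution length grows exponentially with disc count, an evaluator can dial complexity up one disc at a time and pinpoint the threshold at which a model's accuracy collapses.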

The Apple researchers noted that this 'critical threshold' at which accuracy collapses is particularly troubling, as it undermines a foundational goal of AI development: building systems capable of human-like reasoning. They remarked, 'These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalisable reasoning.'

The implications of the study extend beyond technical limitations, raising significant questions about the future of AI and its potential to transform society. As the race towards AGI comes under increasing scrutiny, experts are urging a reevaluation of the industry's current methodologies and goals.

In conclusion, while AI continues to advance, the revelations from Apple's study serve as a stark reminder of the challenges that lie ahead. As the industry grapples with these findings, the quest for a truly intelligent machine capable of matching human cognitive abilities remains fraught with uncertainty.

Tags

Apple, artificial intelligence, AI models, large reasoning models, accuracy collapse, Gary Marcus, artificial general intelligence, AGI, machine learning, OpenAI, Google, Anthropic, DeepSeek, complex problem solving, computational resources, research study, technology limitations, AI challenges, University of Surrey, Institute for People-Centred AI, technology industry, cognitive capabilities, reasoning models, AI development, puzzle solving, Tower of Hanoi, River Crossing puzzle, computational intelligence, AI ethics, future of AI
