• Archives
  • Cryptocurrency
  • Earnings
  • Enterprise
  • About TechBooky
  • Submit Article
  • Advertise Here
  • Contact Us
TechBooky
  • African
  • AI
  • Metaverse
  • Gadgets
Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
Search in posts
Search in pages
  • African
  • AI
  • Metaverse
  • Gadgets
Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
Search in posts
Search in pages
TechBooky
Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
Search in posts
Search in pages
Home Artificial Intelligence

Apple Warns AI Models Struggle with Complex Problem-Solving

Paul Balo by Paul Balo
June 9, 2025
in Artificial Intelligence
Share on FacebookShare on Twitter

Researchers look at the advantages and disadvantages of freshly available reasoning models in a study published by Apple on Saturday. These models, also referred to as large reasoning models (LRMs), “think” by using more computation to resolve challenging issues. Nevertheless, the study discovered that a complexity problem plagues even the most potent models. Instead of using more computation, as the models are trained to do, researchers found that when an issue is extremely complicated, the models completely collapse and give up on it.

Researchers claim that when faced with three regimes of complexity, both LRMs and large language models (LLMs) without thinking capability behave differently in a paper titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,” which was posted on Apple’s website.

Low, medium, and high complexity problems are the three complexity regimes that have been discussed in the study. The researchers chose to employ a number of puzzles that can increase in difficulty in order to examine how LLMs and LRMs perform while handling a broad range of complications. The Tower of Hanoi was one such puzzle.

Three pegs and multiple disks make up the Tower of Hanoi, a mathematical puzzle. To form a pyramid, disks are stacked in decreasing order of size. The goal of the puzzle is to move each disk one at a time from the leftmost peg to the rightmost peg. The catch is that a larger disk should never be stacked on top of a smaller disk. Children between the ages of six and fifteen are frequently the intended audience for this easy problem.

For this experiment, two reasoning models and their non-reasoning counterparts were selected by Apple researchers. Claude 3.7 Sonnet with Thinking and DeepSeek-R1 were the LRMs selected, and Claude 3.7 Sonnet and DeepSeek-V3 were the LLMs. A maximum of 64,000 tokens per person was allocated to the thinking budget. The experiment’s goal was to verify not only the ultimate accuracy but also the logic accuracy of the methods used to solve the puzzle.

Disk sizes were maintained between four and ten for the medium complexity assignment, whereas up to three disks were added for the low complexity task. Lastly, there were eleven to twenty disks in the high complexity challenge.

In completing the low complexity challenge, the researchers saw that LLMs and LRMs shown equal competence. Given the additional computational budget, reasoning models were able to solve the puzzle more precisely as the complexity grew. However, it was discovered that both models had a total collapse of reasoning when the tasks approached the high complexity zone.

It was also claimed that the same experiment was conducted again with additional models and puzzles, including Blocks World, River Crossing, and Checkers Jumping.

The issues raised by a number of other artificial intelligence (AI) researchers are emphasized in Apple’s study. While reasoning models are capable of generalizing within their distributed datasets, they struggle to “think” when faced with problems that are beyond their scope. They either attempt to discover shortcuts to solve the problem or give up and collapse entirely.

“Established mathematical and coding benchmarks are the main focus of current evaluations, which place an emphasis on final solution accuracy. However, the corporation stated in a post that this evaluation paradigm frequently suffers from data contamination and does not offer insights into the structure and quality of the reasoning traces.

Related Posts:

  • 0abf4dfc-cac6-42ee-be90-33e6f6229f53
    OpenAI o3 & o4 Mini Models Feature Visual Reasoning
  • GettyImages-1778706504
    Rumour: Microsoft Developing AI Models to Rival OpenAI
  • nvidia
    DiffUHaul, an AI Tool from Nvidia Research, Enables…
  • openai-logo-building-facade
    GPT-OSS Launch Marks OpenAI’s Shift to Open-Weight Models
  • DO3EOFAEMFNYHCIFVH2KMVCOVI
    DeepSeek Update Threatens Google and ChatGPT Dominance
  • Microsoft-datacenter-cold-aisle-server-racks-for-the-AMD-MI300X
    Microsoft Prepares for OpenAI's GPT-5 Launch
  • 1_zJIuoKQtvIUyJmaQrVK9KQ
    Understanding the Atom of Thoughts Prompting Technique
  • 2-1758799815688
    Microsoft Integrates Anthropic’s Claude AI Into Copilot

Discover more from TechBooky

Subscribe to get the latest posts sent to your email.

Tags: AppleLarge Language Modelslarge reasoning modelsllmlrm
Paul Balo

Paul Balo

Paul Balo is the founder of TechBooky and a highly skilled wireless communications professional with a strong background in cloud computing, offering extensive experience in designing, implementing, and managing wireless communication systems.

BROWSE BY CATEGORIES

Receive top tech news directly in your inbox

subscription from
Loading

Freshly Squeezed

  • Netflix Closes Its Mobile Squid Game Studio October 25, 2025
  • TikTok and Meta Breach EU Transparency Rules October 25, 2025
  • Instagram Tests AI-Powered Text Restyling Tool October 25, 2025
  • Gemini October Update Brings Veo 3.1, Flash 2.5, and Canvas October 25, 2025
  • OpenAI Adds Company Knowledge Search to ChatGPT October 24, 2025
  • WhatsApp Tests Storage Controls for Individual Chats October 24, 2025

Browse Archives

October 2025
MTWTFSS
 12345
6789101112
13141516171819
20212223242526
2728293031 
« Sep    

Quick Links

  • About TechBooky
  • Advertise Here
  • Contact us
  • Submit Article
  • Privacy Policy
Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
Search in posts
Search in pages
  • African
  • Artificial Intelligence
  • Gadgets
  • Metaverse
  • Tips
  • About TechBooky
  • Advertise Here
  • Submit Article
  • Contact us

© 2025 Designed By TechBooky Elite

Discover more from TechBooky

Subscribe now to keep reading and get access to the full archive.

Continue reading

We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it.