TechBooky

Apple Warns AI Models Struggle with Complex Problem-Solving

by Paul Balo
June 9, 2025
in Artificial Intelligence

In a study published by Apple on Saturday, researchers examine the strengths and weaknesses of recently released reasoning models. These models, also known as large reasoning models (LRMs), "think" by spending additional computation on difficult problems. However, the study found that even the most capable models hit a complexity wall. Rather than spending more computation on harder problems, as they are trained to do, the models collapse completely once a problem becomes sufficiently complex and effectively give up on it.

In the paper, titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity” and posted on Apple’s website, the researchers report that LRMs and large language models (LLMs) without thinking capability behave differently across three regimes of problem complexity.

The study covers three complexity regimes: low, medium, and high. To examine how LLMs and LRMs perform across a broad range of difficulty, the researchers used several puzzles whose difficulty can be increased step by step. One of these was the Tower of Hanoi.

The Tower of Hanoi is a mathematical puzzle made up of three pegs and a number of disks. The disks start on the leftmost peg, stacked in decreasing order of size from bottom to top to form a pyramid. The goal is to move the whole stack to the rightmost peg, one disk at a time, without ever placing a larger disk on top of a smaller one. This simple puzzle is often aimed at children between the ages of six and fifteen.
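The standard recursive solution makes the rules concrete: to move n disks, move the top n - 1 disks to the spare peg, move the largest disk to the target peg, then move the n - 1 disks back on top of it. The sketch below is a minimal Python illustration of that textbook algorithm, not code from Apple's study; the peg numbering (0 = leftmost, 2 = rightmost) and the (disk, from_peg, to_peg) move format are assumptions made for illustration. It produces the minimum possible number of moves, 2**n - 1.

```python
from typing import List, Tuple

# Hypothetical move format: (disk, from_peg, to_peg); disk 1 is the smallest.
Move = Tuple[int, int, int]


def hanoi(n: int, source: int = 0, target: int = 2, spare: int = 1) -> List[Move]:
    """Return the optimal move sequence for n disks (2**n - 1 moves)."""
    if n == 0:
        return []
    # 1. Park the n - 1 smaller disks on the spare peg.
    # 2. Move the largest disk straight to the target peg.
    # 3. Restack the n - 1 smaller disks on top of it.
    return (
        hanoi(n - 1, source, spare, target)
        + [(n, source, target)]
        + hanoi(n - 1, spare, target, source)
    )


if __name__ == "__main__":
    moves = hanoi(3)
    print(len(moves))  # 7
    for disk, src, dst in moves:
        print(f"move disk {disk} from peg {src} to peg {dst}")
```

Recursion keeps the sketch short; its depth equals the number of disks, so even the 20-disk case stays well within Python's default recursion limit.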

For this experiment, Apple researchers selected two reasoning models and their non-reasoning counterparts: Claude 3.7 Sonnet with Thinking and DeepSeek-R1 as the LRMs, and Claude 3.7 Sonnet and DeepSeek-V3 as the LLMs. The thinking budget was capped at 64,000 tokens. The goal was to evaluate not only the accuracy of the final answer but also the correctness of the reasoning steps the models used to solve the puzzle.
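Checking the reasoning rather than only the final answer means replaying a model's proposed moves one by one. The following is a minimal sketch of such a checker, not Apple's evaluation code. It assumes a hypothetical (disk, from_peg, to_peg) move format with disk 1 as the smallest and pegs numbered 0 to 2 from left to right. It returns the index of the first illegal move (or None) and whether the puzzle ends solved.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical move format: (disk, from_peg, to_peg); disk 1 is the smallest.
Move = Tuple[int, int, int]


def check_solution(n: int, moves: Sequence[Move]) -> Tuple[Optional[int], bool]:
    """Replay moves on a 3-peg board.

    Returns (index of first illegal move or None, solved flag).
    """
    # Each peg is a list from bottom to top; all disks start on peg 0.
    pegs: List[List[int]] = [list(range(n, 0, -1)), [], []]
    for i, (disk, src, dst) in enumerate(moves):
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return i, False  # invalid peg or a no-op move
        if not pegs[src] or pegs[src][-1] != disk:
            return i, False  # the named disk is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return i, False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    # Solved only if every disk ends on the rightmost peg in order.
    return None, pegs[2] == list(range(n, 0, -1))


if __name__ == "__main__":
    good = [(1, 0, 2), (2, 0, 1), (1, 2, 1), (3, 0, 2), (1, 1, 0), (2, 1, 2), (1, 0, 2)]
    bad = [(1, 0, 2), (2, 0, 2)]  # second move puts disk 2 on disk 1
    print(check_solution(3, good))  # (None, True)
    print(check_solution(3, bad))   # (1, False)
```

Reporting the first illegal step, rather than only pass or fail, is what lets an evaluation say where a reasoning trace went wrong, not just whether the final answer was right.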

The low complexity task used up to three disks, the medium complexity task used four to ten disks, and the high complexity task used eleven to twenty disks.
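For a sense of scale, the optimal Tower of Hanoi solution for n disks takes 2**n - 1 moves, so the move count roughly doubles with every added disk. The sketch below simply applies that formula to the disk ranges described above. It assumes the low regime starts at one disk, and the numbers are plain arithmetic rather than results reported in the study.

```python
# Disk ranges per complexity regime as described in the article
# (assumption: the low regime starts at a single disk).
REGIMES = {
    "low": range(1, 4),
    "medium": range(4, 11),
    "high": range(11, 21),
}


def min_moves(n: int) -> int:
    """Minimum number of moves needed to solve the puzzle with n disks."""
    return 2**n - 1


if __name__ == "__main__":
    for name, disks in REGIMES.items():
        lo, hi = disks[0], disks[-1]
        print(f"{name}: {lo}-{hi} disks -> {min_moves(lo)} to {min_moves(hi)} moves")
```

For example, a complete 20-disk solution runs to 1,048,575 moves, compared with just 7 moves for three disks.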

The researchers observed that LLMs and LRMs performed comparably on the low complexity task. As complexity rose into the medium range, the reasoning models, helped by their additional computational budget, solved the puzzle more accurately. However, once the tasks reached the high complexity regime, reasoning collapsed completely for both types of models.

The researchers say they repeated the experiment with additional models and puzzles, including Blocks World, River Crossing, and Checker Jumping.

Apple’s study reinforces concerns raised by a number of other artificial intelligence (AI) researchers. While reasoning models can generalize within the distribution of their training data, they struggle to “think” when faced with problems outside it. In those cases, they either look for shortcuts to the answer or give up and collapse entirely.

“Established mathematical and coding benchmarks are the main focus of current evaluations, which place an emphasis on final solution accuracy. However, this evaluation paradigm frequently suffers from data contamination and does not offer insights into the structure and quality of the reasoning traces,” the company stated in a post.

Tags: Apple, Large Language Models, large reasoning models, llm, lrm
Paul Balo

Paul Balo is the founder of TechBooky and a highly skilled wireless communications professional with a strong background in cloud computing, offering extensive experience in designing, implementing, and managing wireless communication systems.
