June 20, 2025


🧠 The Illusion of AI “Thinking” (and Why It Matters)

TL;DR

  • Apple’s latest study shows reasoning LLMs collapse under high complexity—blowing apart the myth of “thinking” AI.
  • Real-world mortgage use cases — like doc processing, templated comms, and standardized QC rules — play to AI strengths.
  • Avoid relying on AI for nuanced underwriting or edge-case compliance.
  • Best strategy: accelerate adoption in low-risk zones, add oversight everywhere, and keep complexity-critical decisions human-centric.

What Apple Discovered

Apple researchers tested sophisticated AI models designed to exhibit “reasoning” behavior on controllable puzzle challenges such as Tower of Hanoi, River Crossing, Checker Jumping, and Blocks World. These large reasoning models (LRMs), such as OpenAI o3, Anthropic Claude 3.7 “Thinking,” DeepSeek-R1, and Gemini, were pitted against simpler non-reasoning LLMs.

The findings were startling:

  • Low complexity: non-reasoning LLMs sometimes outperformed the reasoning models.
  • Sweet spot: in medium-complexity problems, reasoning models held a clear edge, but only there.
  • Complete collapse: faced with high-complexity puzzles, reasoning models often quit, cutting their own reasoning effort short even though they still had token budget to spare.

This demonstrated a fundamental weakness: current LLMs show “illusions of thinking” rather than genuine, scalable reasoning. They can mimic chain-of-thought but collapse when the problem demands true compositional thinking.

Why This Isn’t Just Another AI Critique

Apple’s extensive tests show that LLMs have a hard ceiling on their ability to “reason,” at least for now. Notably, the puzzles used are problems that conventional software solved decades ago, yet the reasoning models could not work through them reliably.

  • Genuinely hard logic: these weren’t surface-level tests; they required abstract, multi-step reasoning.
  • Models “knew” their limits: they actually reduced effort when overwhelmed, the inverse of how humans attack hard problems.
  • Supplying algorithms didn’t help: even handing models an explicit, step-by-step solution procedure didn’t rescue performance.

Industry and Expert Reactions

AI experts like Gary Marcus called the findings a “knockout blow” for the AGI dream—arguing that LLMs are pattern-matchers, not thinkers. Steven Sinofsky cautioned against anthropomorphizing AI: “It’s better to have a designed machine that works, rather than a human-like one that trips.” Reactions converged: LLM hype merits skepticism, especially when complexity exceeds a certain threshold.

The expert analysis of Apple’s study puts this in context, suggesting we may be headed for another “AI Winter” as the reality of what LLMs can achieve catches up with the hype. The findings also all but destroy the notion that we are nearing Artificial General Intelligence (AGI).

❄️ Sidebar: What Is an AI Winter?

An AI Winter refers to a period when enthusiasm and funding for artificial intelligence significantly decline—often due to underwhelming results, overpromised capabilities, or failures to meet commercial expectations.

Key Characteristics:

  • Investor Pullback: Funding dries up due to disappointing performance or inflated expectations not being met.
  • Research Slowdown: Universities and labs deprioritize AI projects as interest wanes.
  • Public Skepticism: Media and public opinion shift from hype to doubt, often ridiculing AI’s failed promises.
  • Layoffs and Pivoting: Companies shut down AI divisions or pivot toward more traditional tech.

Notable AI Winters:

  • First AI Winter (1970s): Caused by the failure of symbolic AI (e.g., logic-based systems) to handle real-world complexity.
  • Second AI Winter (late 1980s–1990s): Triggered by the collapse of expert systems and unmet expectations in machine learning.


Could It Happen Again?
Possibly—but today’s AI has broader commercial use, like language models, image recognition, and robotic process automation. Still, if reasoning models continue to fail at complexity, we could see a funding and credibility contraction focused on “AGI” or advanced cognitive AI.


Lesson: Stay grounded. Use AI for what it does well—and don’t overinvest in the illusion of “thinking machines.”


🛠 What This Means for Mortgage Executives

Reality Check: LLMs Are Powerful—But Limited

  • Great for patterns: ChatGPT-style models excel at summarization, templated tasks, document analysis, customer chatbots, even underwriting assistance—but only when tasks aren’t overly complex.
  • Weak for novel logic: They fail at tasks requiring deep, unseen reasoning—e.g., unusual edge-case scenarios, comp structure nuance, rare compliance exceptions.

How to Proceed: The Middle Path

  1. Adopt Narrow AI Now
    • Document processing: Automate W-2 interpretation, verification, and simple extraction.
    • Comms support: Assist loan officers by drafting follow-ups, notifications, or borrower engagement notes.
    • Compliance checks: Detect mismatches in forms where the rules are known and repeatable.
  2. De‑Risk Complex Use Cases
    • Be cautious with AI for underwriter-level decisions, pricing analytics that require nuanced judgment, or portfolio stress scenarios that go beyond routine logic.
    • Keep subject matter experts “in the loop” for higher-order logic and edge-case analysis.
  3. Invest in Verification
    • Enable systems that flag when AI reasoning confidence is low (e.g., signature mismatches, unusual fee structures, data anomalies).
    • Build oversight into your workflows: human review for any conclusion where model confidence falls below a set threshold (a minimal routing sketch appears after this list).
  4. Use Hybrid Models
    • Consider combining symbolic rule engines and domain-specific logic with LLMs. The latter excel at processing natural language and documents; the former provide deterministic reasoning where consistency matters (a second sketch after this list shows the pattern).
  5. Emphasize Explainability
    • “Show your work” is more than a slogan. Use models that expose their confidence, intermediate steps, or logics, so you can audit and understand decisions.
  6. Stay Tactical; Don’t Grasp for AGI
    • The lauded technology isn’t fake; it’s just bounded. Use AI where it’s reliable, and avoid relying on it where complexity risks “collapse.” Build your roadmap on narrow, dependable capabilities rather than on a general intelligence that hasn’t arrived.
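
To make the verification step concrete, here is a minimal Python sketch of confidence-threshold routing. It assumes a hypothetical extraction pipeline that reports a per-field confidence score; the field names and the 0.90 threshold are illustrative, not any specific vendor’s API.

```python
from dataclasses import dataclass

# Hypothetical threshold; tune it against your review-queue capacity.
REVIEW_THRESHOLD = 0.90

@dataclass
class ExtractedField:
    name: str          # e.g., "borrower_income" (illustrative field name)
    value: str         # the value the model pulled from the document
    confidence: float  # model-reported confidence, 0.0 to 1.0

def route_for_review(fields):
    """Split extracted fields into auto-accept and human-review queues."""
    routed = {"auto_accept": [], "human_review": []}
    for field in fields:
        queue = "auto_accept" if field.confidence >= REVIEW_THRESHOLD else "human_review"
        routed[queue].append(field)
    return routed

sample = [
    ExtractedField("borrower_income", "84,500", 0.97),
    ExtractedField("signature_present", "unclear", 0.41),  # anomaly: a person decides
]
for queue, items in route_for_review(sample).items():
    print(queue, [f.name for f in items])
```

The point is not the threshold itself but the shape of the workflow: every model conclusion carries a score, and anything below the bar lands in front of a human.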
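
And for the hybrid pattern, a minimal sketch in the same spirit: the LLM handles the unstructured side (stubbed here as a plain dictionary of extracted fields), while a small deterministic rule engine applies the checks where consistency matters. The rule names, fields, and the 0.43 debt-to-income cap are illustrative assumptions, not a statement of current regulation.

```python
# LLM side (stubbed): unstructured document -> structured fields.
# In production this dict would come from your extraction step.
extracted = {"monthly_income": 7200, "monthly_debt": 3900, "term_months": 360}

# Symbolic side: deterministic, auditable rules.
RULES = [
    ("income_positive", lambda d: d.get("monthly_income", 0) > 0),
    ("dti_within_cap", lambda d: d.get("monthly_debt", 0) <= 0.43 * d.get("monthly_income", 0)),
    ("term_is_standard", lambda d: d.get("term_months") in (180, 240, 360)),
]

def evaluate(loan):
    """Return the names of every rule the extracted loan data fails."""
    return [name for name, check in RULES if not check(loan)]

print("rule failures:", evaluate(extracted) or "none")  # -> ['dti_within_cap']
```

Because the rules are ordinary code, they produce the same answer every time and can be audited line by line, which is exactly the property the LLM cannot guarantee.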

✅ Recommendations: AI Adoption Strategy for Mortgage Executives

| Area | AI Role | Complexity Risk | Recommended Action |
| --- | --- | --- | --- |
| Document Ingestion & QC | Auto-extracting data, flagging mismatches | Low–Med | ✔ Highly effective; scale fast |
| Borrower Communication | Auto-drafts, sentiment analysis | Low–Med | ✔ Great fit, especially with human review layers |
| Pricing/Underwriting | Edge-case analysis, unique scenario pricing | Med–High | ⚠ Limit; AI can support, not replace, expertise |
| Compliance / Regulation | Standard rule automation, form completeness checks | Low–Med | ✔ Great candidate, especially with oversight |
| Strategic Decision-Making | Risk modeling, scenario planning | High | ✖ Avoid; too risky and legally sensitive |

🎯 Final Takeaways

  1. Be Aggressive — Where It Works
    Boost productivity with AI in clear, repetitive, well-structured tasks—like intake, extraction, document classification, and templated communication—where reasoning models stay well within safe bounds.
  2. Stay Cautious — Where It Doesn’t
    Keep underwriting, novel compliance issues, and pricing decisions firmly in human hands where deeper reasoning and domain knowledge matter.
  3. Build for the AI–Human Future
    The winning approach blends AI tools into human workflows rather than replacing them. Automate the well-defined, rule-based steps, then present the results clearly so your team can make well-informed decisions.
  4. Monitor for “Reasoning Collapse”
    Use confidence thresholds, log analysis, and alerts to flag when AI “gives up,” particularly around edge-case loan scenarios or unusual borrower complexity (a heuristic sketch follows this list).
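
What might such a flag look like? Below is a minimal heuristic sketch: it scans a logged model response for give-up language and for suspiciously short output on a case you already know is complex. The phrases, token floor, and complexity labels are illustrative assumptions, not any vendor’s API.

```python
# Illustrative give-up markers; extend these from your own production logs.
GIVE_UP_PHRASES = ("i cannot", "unable to determine", "insufficient information")
MIN_EXPECTED_TOKENS = 50  # complex cases should yield substantive output

def collapse_flags(output_text, output_tokens, case_complexity):
    """Flag responses that look like the model quit rather than reasoned."""
    flags = []
    if any(phrase in output_text.lower() for phrase in GIVE_UP_PHRASES):
        flags.append("give-up language")
    if case_complexity == "high" and output_tokens < MIN_EXPECTED_TOKENS:
        flags.append("short answer on a complex case")
    return flags

print(collapse_flags("Unable to determine eligibility.", 8, "high"))
# -> ['give-up language', 'short answer on a complex case']
```

Any flagged response gets routed to the same human-review queue as a low-confidence extraction; the goal is to catch collapse before it reaches a borrower.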

🔮 The Path Ahead: Steady, Smart AI Integration

Apple’s findings serve as a potent reminder: LLMs are impressive, but not omniscient. For mortgage leaders, the AI imperative is clear—but so is the caution. Make smart, scaffolded investments. Lean into the clear wins, build oversight for everything else, and wait for the next wave of dependable, deeper AI reasoning before banking decisions on it.

In short: Adopt fast where it empowers, pause where it exposes, and always keep humans at the helm when complexity demands.

