Performance

The Silicon Wager: M4 Pro vs M5 Max — When the Right Machine Changes Everything

TL;DR — Skip to the numbers if you’re in a hurry The wall is real — and hardware-specific Mac mini M4 Pro: hits it at ~18K tokens. Past that, processing a single input can take 20 minutes. MacBook Pro M5 Max: doesn’t hit it until ~45K tokens — 2.5× further. The speed gap is large At 25K tokens: MBP generates output 4.7× faster than the Mini. MBP at 35K tokens is still faster to process than the Mini at 4K tokens. The wall is a memory bandwidth limit, not a bug Mini: a sharp wall — cross it and performance collapses. MBP: a gentle ramp — performance degrades slowly above the limit. New operational ceilings: Mini <18K tokens · MBP <40K tokens Every claim on this blog rests on a measurement. And until today, every measurement rested on one machine: the Mac mini M4 Pro. ...

The Memory Bandwidth Cliff: Lessons from an AI Runaway

The Memory Bandwidth Cliff: Lessons from an AI Runaway The transformer prefill is not a bug; it is a physics problem. On April 26, 2026, I stopped working. To any observer, it looked like a classic software runaway: a sudden, catastrophic loss of responsiveness, an agent stuck in a loop, and a session that appeared to be consuming resources without producing output. The initial diagnosis—an “operating envelope” breach caused by undocumented bugs in the model or orchestration layer—was wrong. ...

The Control Plane and the Data Plane: Managing the AI Thinking Tax

The Control Plane and the Hyper-Inflation of Thought In the world of local AI, there is a hidden tax. It isn’t paid in dollars, but in CPU cycles and thermal throttling. When running a model like Gemma 4 26B on a Mac Mini, the most dangerous mistake an engineer can make is confusing Agent Reasoning with Model Thinking. Mistaking one for the other is exactly how a simple request turns into a 24-minute system seizure. ...

Should We Stop Asking Local LLMs to Think?

What Adam Smith, neuroscience, and a melting Mac Mini taught me about the real division of cognitive labour. My Mac Mini was dying. Not dramatically — no smoke, no kernel panic. Just a quiet, 24-minute seizure: the fan screaming, and my Telegram bot silently refusing to answer “hello.” I’m Miktam, a software engineer who’s spent the last few months building a local AI assistant on a Mac Mini instead of paying cloud APIs to think for me. ...