We Tried to Replace Claude with a Local Critic. Here's Exactly Where It Failed.

Human project reviews are slow. The bottleneck is not judgment — it is context reconstruction. Before you can criticise anything, you spend twenty minutes remembering where you left off. The question we asked: can a local 26B model serve as a recurring adversarial QA critic that catches real problems, not just surfaces obvious gaps? Enter Experiment 009. The Setup Two critics. Same project context. Fixed evaluation schema. No collaboration between runs. ...

June 6, 2026 · 5 min · Nestor