Technical Context
I wouldn't chalk this up to "the model suddenly got dumber." The picture is more down-to-earth: in Claude Code, there had already been officially acknowledged quality regressions due to product settings, not the model weights themselves. Anthropic explicitly stated in spring 2026 that the issue was a reasoning-effort change, a bug where old thinking was lost after idle, and a flawed system prompt.
Now, Ultracode. I see the same trap many stumble into: it was pitched as simply the "most powerful thinking level," while in fact it's closer to an orchestration mode. That means it involves not just reasoning but a dynamic workflow with subagents, and for AI integration into work processes, that's a completely different class of behavior.
This is what creates the odd effect: on a routine task, the mode starts overcomplicating its own life. Instead of following instructions linearly, it builds a tree of checks, branching, and delegation. As a result, I don't get "smarter," I get "noisier": context gets smeared, steps get lost, and the sequence drifts.
The most telling symptom isn't in the answer, but in the session telemetry. If the mode spins up 20, 30, or 50+ subagents for a small code review, that's not magic—it's architectural overkill. And yes, in that scenario, the daily limit evaporates before your eyes.
That's why the advice from the community sounds reasonable: you shouldn't compare "Opus 4.8 is bad" but rather Max versus Ultracode on the same task. It's quite possible that for most everyday scenarios, Max provides more stable AI integration because it doesn't drag along unnecessary orchestration.
What This Means for Business and Automation
If I'm building AI automation for production, I wouldn't make this mode the default. It's good where genuine parallel decomposition is needed: a large code audit, multi-file migration, or complex verification.
Who wins? Teams with rare, heavy tasks where the cost of an error is higher than the token cost. Who loses? Anyone running regular reviews, fixes, and routine chains through this mode.
Money-wise, it's simple: extra subagents hit your limits, and missed instructions hit your engineers' time. I usually treat such issues not with "faith in a new mode" but with sound AI architecture: picking the mode by task class, limiting orchestration, and setting explicit stop rules for the agent.
If your Claude Code has already started burning limits while losing steps, I'd look at the workflow itself, not just the model. At Nahornyi AI Lab, we're exactly about dissecting these bottlenecks: where one strong agent is enough, where automation with AI is needed, and where it's best not to touch Ultracode at all—so your business doesn't pay for chaos.