
Claude Opus Slows Down: How It Affects LLM Selection

Users are complaining loudly about Claude Opus's slow performance in research and complex coding scenarios. Meanwhile, the new Grok 4.20 model is perceived as a significantly faster alternative. For businesses, this is a critical moment: it changes how they approach LLM selection, SLA frameworks, and overall AI automation architecture.

Technical Context: I Look at the Infrastructure Layer, Not the Noise

I do not read this as news about a "bad model," but as a signal about instability in the inference layer. User feedback points to a simple fact: Claude Opus has begun losing on perceived speed in research tasks, while Grok 4.20, according to reviews, has pulled decisively ahead precisely in working pace. For teams, this matters more than polished demos, because real productivity is measured not in screenshots but in the time it takes to get a useful response.

I separately checked the confirmed facts regarding Claude. Anthropic has already had documented degradations in 2025 and early 2026: misrouting, inference stack errors, quality drops in Claude Code, and outages that affected Opus, Sonnet, and Haiku. This is a crucial detail: the market too often attributes problems to the "dumbing down of the model," whereas in practice, I regularly see that the root cause lies in routing, rollout procedures, and tool-calling orchestration.

Regarding Grok 4.20 and GLM 5, I do not see a verified technical baseline in the source data at the level of release notes, API metrics, or independent tokens/sec benchmarks. Therefore, I will not replace analytics with rumors. I only note what is present: a strong user signal about the speed of Grok 4.20, the mention of sub-agents, and the opinion that GLM 5 looks better in benchmarks.

For me, the conclusion is straightforward: if a model is integrated into a chain involving research, code, agents, and source verification, I evaluate not a single benchmark, but the combination of latency, stability, tool reliability, and rollback history. This is exactly how the architecture of AI solutions should be built, rather than through the fan club of a specific brand.

Impact on Business and Automation: The Most Predictable Model Wins, Not the Smartest

I have been telling clients an unpleasant but useful truth for a long time: in production, the winner is not the test leader, but the leader in operational predictability. If ChatGPT manages to complete a second research cycle while Opus is still thinking about the first, the cost of each analytical task rises for the business. This immediately impacts the unit economics of AI automation.

Who wins in this situation? Platforms that have better organized their inference pipeline, agent orchestration, and degradation control. Who loses? Companies that tied their AI integration to a single vendor without a backup route, without A/B routing, and without their own metrics for pipeline stages.

In our practice at Nahornyi AI Lab, I almost never design a critical process on a single model. I assemble a cascade: a fast model for initial search and routing, a more expensive one for verification, a separate layer for code tasks, and a fallback in case a provider degrades. This is exactly how AI integration stops being a lottery.
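The cascade described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the three model callers here are hypothetical stubs standing in for real provider SDK calls, and the timeout value is an arbitrary placeholder.

```python
import time


# Hypothetical model callers; in production these would wrap real provider SDKs.
def fast_model(prompt: str) -> str:
    """Cheap, fast model for initial drafting and routing."""
    return f"draft: {prompt[:40]}"


def verifier_model(draft: str) -> str:
    """More expensive model used only to verify the draft."""
    return f"verified({draft})"


def fallback_model(prompt: str) -> str:
    """Backup route used when the primary provider degrades."""
    return f"fallback: {prompt[:40]}"


def run_pipeline(prompt: str, timeout_s: float = 5.0) -> str:
    """Cascade: fast model drafts, expensive model verifies,
    fallback takes over if the primary path raises or exceeds the budget."""
    start = time.monotonic()
    try:
        draft = fast_model(prompt)
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("primary path exceeded latency budget")
        return verifier_model(draft)
    except Exception:
        # Any primary-path failure falls through to the backup route.
        return fallback_model(prompt)
```

The key design choice is that degradation handling lives in the pipeline, not in any single model call: swapping a provider means replacing one stub, not rewriting the process.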

If the reviews about the speed of Grok 4.20 are confirmed in production, I expect growing interest in it for research automation scenarios, analyst assistants, and agentic systems. However, I would not advise businesses to migrate based on emotion. First, you need to run your own workload: identical prompts, identical tools, identical limits, and measure the time, quality, and cost.
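A minimal harness for that kind of like-for-like comparison might look as follows. This is a sketch under stated assumptions: the models are passed in as plain callables (in practice, wrappers around real APIs), and only latency is measured here; quality and cost scoring would be added per workload.

```python
import statistics
import time


def benchmark(models: dict, prompts: list, runs: int = 3) -> dict:
    """Run identical prompts against each model and record latency stats.

    `models` maps a model name to a callable prompt -> response,
    so every candidate sees exactly the same workload.
    """
    results = {}
    for name, call in models.items():
        latencies = []
        for prompt in prompts:
            for _ in range(runs):
                t0 = time.perf_counter()
                call(prompt)  # response discarded; add quality scoring here
                latencies.append(time.perf_counter() - t0)
        latencies.sort()
        results[name] = {
            "median_latency_s": statistics.median(latencies),
            "p95_latency_s": latencies[int(0.95 * (len(latencies) - 1))],
        }
    return results
```

Running this against two stubbed "models" with the same prompt list yields one latency profile per candidate, which is the comparison that actually informs a migration decision.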

Strategic Outlook: The Market Is Shifting from Choosing the "Best Model" to the "Best Route"

I believe that the main shift in 2026 is the end of the era of the monolithic LLM stack. Today, a model might be strong in reasoning but weak in latency; another might be fast in sub-agents but uneven in complex fact-checking; a third might look beautiful in benchmarks but be inconvenient for enterprise deployment. Therefore, AI development is increasingly turning into a routing problem rather than a single API selection problem.

On Nahornyi AI Lab projects, I already see a repeating pattern. Where a business builds AI automation around model roles—"researcher," "validator," "executor"—the system weathers market fluctuations calmly. Where everything is hung on a single top LLM, any dip turns into an incident for the team and the client.
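The role-based pattern above reduces to a small routing table. The model identifiers below are invented placeholders; the point is the structure, in which a dip in any one provider is a one-line configuration change rather than an incident.

```python
# Hypothetical role-to-model mapping; identifiers are placeholders.
ROLE_MODEL_MAP = {
    "researcher": "fast-search-model",
    "validator": "high-accuracy-model",
    "executor": "code-specialized-model",
}

DEFAULT_MODEL = "fast-search-model"


def route(task_role: str) -> str:
    """Pick a model by the role a task plays in the pipeline,
    falling back to a safe default for unknown roles."""
    return ROLE_MODEL_MAP.get(task_role, DEFAULT_MODEL)
```

Because tasks are addressed by role rather than by vendor, replacing the "validator" model during a provider degradation touches only the map, not the business logic.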

This is precisely why discussing "Opus is slow, Grok flies, GLM 5 is better on benchmarks" is not about fandom for me, but about AI architecture. I would currently advise companies to audit their stack around three questions: where is your time bottleneck, where do you lack a backup route, and where are you paying for intelligence that does not convert into results. Usually, right after such an audit, it becomes clear how to make AI automation faster and cheaper.

This analysis was prepared by Vadym Nahornyi — lead expert at Nahornyi AI Lab on AI architecture, AI integration, and AI automation for real businesses. If you want to review your current model stack, design backup routing, or build a resilient system for your process, I invite you to discuss your project with me and the Nahornyi AI Lab team.
