What Exactly Did Anthropic Tweak in Claude?
Instead of relying on rumors, I dug into community discussions and Anthropic's official statements, and the picture is straightforward: the limits on Claude.ai didn't disappear in March 2026, but they were significantly tightened during peak hours. We're talking about 5-hour session limits, not the clean, transparent TPM (tokens per minute) accounting many are used to with the API.
The peak window is clearly defined: weekdays, 8 AM–2 PM ET. During this time, some users, especially those on the Pro plan, have started hitting the ceiling much earlier. Anthropic itself states that about 7% of its audience is affected, but if you're running Claude Code, agentic chains, and long conversations, your chances of being in that 7% are pretty high.
I wouldn't chalk it all up to 'platform greed.' Several factors are at play here.
- Agentic workflows in Claude Code perform many hidden steps.
- The 1M context window encourages keeping too much junk in the session.
- The memory feature adds another persistent layer of tokens.
- On Claude.ai, the limit feels like a shared session resource rather than clear per-request pricing.
This explains the strange feeling people have: you're working 'just like before,' but the limit evaporates noticeably faster. It's particularly frustrating with Opus and on tasks where the agent navigates files, reformulates steps, and runs long reasoning cycles.
There was also a temporary bonus for off-peak hours that ran until March 28, during which usage allowances were partially doubled. But that was a promotion, not the new normal. If you're reading this after March 28, 2026, take it as a signal: the more generous era is over, and we have to live with the new reality.
Why 1M Context and Memory Are More Costly Than They Seem
What bothers me most here isn't the rate limit itself, but how people design their interactions with the model. A 1M context window sounds like a dream, but in practice, it often just gives you permission not to clean up after yourself.
If a session is holding 150,000-200,000 tokens, every new turn becomes more expensive. And if memory is enabled on top of that, the model also pulls in saved facts. Technically, it's convenient. In reality, you can get a silent budget leak where the context doesn't seem huge, but the session burns out in a flash.
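The leak is easy to underestimate because each turn resends the persistent context plus the growing history. A toy calculation makes it concrete (the numbers are illustrative assumptions, not Anthropic's accounting):

```python
def session_input_tokens(base_context: int, tokens_per_turn: int, turns: int) -> int:
    """Total input tokens consumed across a session where every turn
    resends the persistent context plus all prior exchanges.

    base_context: tokens pinned into every request (system prompt, memory, files)
    tokens_per_turn: average new tokens added per exchange
    """
    total = 0
    for t in range(1, turns + 1):
        total += base_context + t * tokens_per_turn
    return total

# Hypothetical: a 150k-token working context, 2k new tokens per turn.
# After 20 turns the session has consumed ~3.4M input tokens,
# even though the visible context "only" grew to ~190k.
print(session_input_tokens(150_000, 2_000, 20))  # → 3420000
```

That gap between "context size" and "tokens actually consumed" is exactly why a session can feel small while burning through its budget.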
I'd put it more bluntly: a large context window without discipline is almost always worse than a proper AI solution architecture with buffering, summarization, and breaking down tasks into stages.
What This Means for Business and Automation
For pet projects, this is an annoyance. For businesses, it's an architectural issue.
If your AI automation relies on Claude.ai as a 'manual multi-tool' for your team, sudden limits break the workflow. A developer or analyst hits the cap, the agent stops, and the process stalls. Internally, this doesn't look like a pricing problem but a drop in productivity.
The winners are those who already separate their usage modes. They move heavy tasks to the API, batch their processing, clean the context, disable memory where it's not needed, and don't force a single model to handle the entire pipeline. The losers are those building AI implementations with the mindset that 'the model is smart, it'll figure it out'.
At Nahornyi AI Lab, we deal with these kinds of issues regularly in practice: sometimes, rewriting a prompt and enforcing strict summarization every N steps is enough, but other times, we need to completely overhaul the AI architecture and move heavy agentic tasks from a subscription UI to a proper backend setup.
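The "strict summarization every N steps" pattern is simple to sketch. Here `execute` and `summarize` are placeholders for real model calls; the structure, not the stubs, is the point:

```python
from typing import Callable

def run_with_rolling_summary(
    steps: list[str],
    execute: Callable[[list[str], str], str],
    summarize: Callable[[list[str]], str],
    every_n: int = 5,
) -> list[str]:
    """Run agent steps, collapsing the transcript into a summary every N steps
    so the active context stays bounded instead of growing monotonically."""
    transcript: list[str] = []
    outputs: list[str] = []
    for i, step in enumerate(steps, start=1):
        outputs.append(execute(transcript, step))
        transcript.append(f"{step} -> {outputs[-1]}")
        if i % every_n == 0:
            # Replace the raw transcript with one compact summary entry.
            transcript = [summarize(transcript)]
    return outputs
```

With real model calls plugged in, the transcript never exceeds `every_n` entries plus one summary, which is the whole trick: the context cost per step stays roughly flat instead of compounding.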
What I would check right now:
- Disable memory in token-heavy scenarios.
- Check the actual size of the active context, not just an estimate.
- Separate interactive work from background processing.
- Shift heavy runs to off-peak hours.
- Budget for a cost increase, even if new pricing hasn't been announced.
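The second item on that list, checking the actual size of the active context, can be sketched with a rough heuristic. Everything here is an assumption for illustration: the function names are mine, and the 4-characters-per-token ratio is a crude English-text approximation; for exact numbers, use your provider's token-counting endpoint instead.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real token-counting endpoint should replace this in production.
    return max(1, len(text) // 4)

def check_context_budget(messages: list[str], memory: str = "",
                         limit: int = 150_000) -> dict:
    """Estimate the active context size and flag when it crosses a soft limit."""
    mem_tokens = approx_tokens(memory) if memory else 0
    total = mem_tokens + sum(approx_tokens(m) for m in messages)
    return {
        "approx_tokens": total,
        "over_budget": total > limit,
        # How much of the budget the memory layer alone is eating.
        "memory_share": mem_tokens / total if total else 0.0,
    }
```

Even a crude check like this surfaces the pattern described above: memory quietly claiming a fixed slice of every request, and sessions drifting over budget long before the context window itself is full.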
And yes, the talk about future $400-500 subscriptions is still just talk for now. But the direction is clear: the more powerful the models and the longer the context, the more expensive it will be to pretend that tokens are infinite.
This analysis was written by me, Vadim Nahornyi, at Nahornyi AI Lab. I don't just rephrase press releases; I build and implement AI solutions for businesses hands-on, including AI integration, agentic pipelines, and AI-powered automation for real teams.
If your Claude usage is already hitting limits or you want to build AI automation without any budget surprises, get in touch. We can review your case and build a reliable system together.