
Vercel Exposed a Major Flaw in AI Agent Skills

Vercel tested two methods for guiding AI agents and found a clear winner: the constant context of AGENTS.md decisively outperformed dynamic skills on real-world evaluations. This matters for business because it shifts AI architecture away from unreliable 'magic' invocations towards more deterministic, reliable systems for automation.

What was tested and why the numbers aren't just for show

I love publications like this not for the flashy headline, but for the moment where you can pinpoint an architectural flaw. Vercel ran agents on tasks requiring Next.js documentation, with some of the knowledge intentionally left out of the model's pre-training. This is a proper test of system behavior, not just a pretty demo.

The setup was simple. A baseline agent without external help achieved a 53% pass rate on evals. Then, it was given either skills or an AGENTS.md file, and that's where things got interesting.

In this design, skills were structured as folders with SKILL.md, metadata, instructions, and additional files. The agent first had to realize that the necessary skill existed, then decide to call it, and only then load its content. On paper, it looks neat. In reality, the agent often failed to even reach that step.
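To make the moving parts concrete, here is a hedged sketch of what such a skill might look like. The folder name, frontmatter fields, and file names are my assumptions based on Vercel's description of "folders with SKILL.md, metadata, instructions, and additional files", not a confirmed spec:

```markdown
<!-- skills/nextjs-caching/SKILL.md — hypothetical example, layout assumed -->
---
name: nextjs-caching
description: Caching semantics in the Next.js App Router
---
When the task involves caching behavior, read ./reference.md
(the full documentation) before answering.
```

The failure mode lives in that first hop: the agent only ever sees the name and description until it decides, on its own, to open the file.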

According to Vercel, skills alone also resulted in a 53% pass rate. That's zero improvement. Even worse, in about 56% of cases, the agent didn't call the skill at all, even when it was relevant.

But AGENTS.md, acting as a constant context in the system prompt, worked wonders. No need to search for anything, no need to make an intermediate decision to load. If you put a condensed summary of docs or an index in this file, the agent always sees it. In Vercel's evals, the version with the full compressed context in AGENTS.md reached a 100% pass rate.
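As an illustration of what "condensed summary plus index" can mean in practice, here is a hypothetical AGENTS.md fragment (the specific rules and paths are my own example, not Vercel's file):

```markdown
<!-- AGENTS.md — illustrative condensed index, always in the system prompt -->
## Next.js rules (condensed)
- New routes use the App Router: pages in app/**/page.tsx,
  route handlers in app/**/route.ts.
## Where the full docs live
- Caching reference: skills/nextjs-caching/reference.md
```

Because this text is injected unconditionally, there is no retrieval step to skip and no relevance judgment to get wrong.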

What caught my eye wasn't the markdown magic itself. It was that markdown won not because it's elegant, but because it removed an unnecessary point of failure. The model couldn't forget to call a tool, because it was never given that decision to make.

How this changes AI architecture and implementation

Translating this from the language of benchmarks to the language of production, the conclusion is very down-to-earth. When critical knowledge for a task is hidden behind an optional mechanic, you're building a fragile system. It might look elegant on a diagram, but it will be unstable in practice.

I see this constantly in AI automation projects. A team gives an agent a set of tools, skills, memory layers, routers, and a dash of hope on top. Then everyone is surprised when the agent is sometimes brilliant and other times seems to have forgotten where it is.

The AGENTS.md approach suggests a more practical architecture for AI solutions. Core rules, a domain knowledge index, constraints, response formats, and key routes should be kept in the constant context. Skills and tools should be reserved for what truly needs to be pulled on demand: large reference materials, external APIs, and infrequent procedures.

So it's not an 'either-or' situation, but a proper hybrid. I'd frame it this way: AGENTS.md for determinism, skills for extensibility. This starts to look like a mature AI architecture, not just a collection of features that happened to end up in the same repository.
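The hybrid can be sketched in a few lines. This is a minimal illustration of the split, not anyone's real API: `build_system_prompt` and `load_skill` are hypothetical names, and the AGENTS.md and skill contents are invented placeholders.

```python
# Hypothetical sketch of the hybrid: condensed AGENTS.md is ALWAYS in the
# system prompt; skills hold bulky material behind an explicit lookup.

AGENTS_MD = """\
## Core rules (condensed)
- Use the App Router for new Next.js routes.
## Docs index
- Full caching reference: skill 'nextjs-caching'
"""

SKILLS = {
    # Large reference material stays on the pull-on-demand path.
    "nextjs-caching": "Full caching docs: per-route fetch caching, ...",
}

def build_system_prompt(task: str) -> str:
    # No decision point: the condensed context is injected unconditionally,
    # so the model cannot "forget" to load it.
    return f"{AGENTS_MD}\nTask: {task}"

def load_skill(name: str) -> str:
    # Deterministic, explicit fetch for infrequent or bulky material.
    return SKILLS[name]

prompt = build_system_prompt("Add caching to the /products route")
assert "Core rules" in prompt           # constant context: always present
docs = load_skill("nextjs-caching")     # extensibility: loaded on demand
```

The design point is simply which side of the line each piece of knowledge sits on: anything task-critical goes on the unconditional side, anything large or rare goes behind the lookup.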

There's an honest limitation, too. You can't endlessly expand the constant context. Vercel explicitly states that the point isn't to stuff all the documentation into AGENTS.md, but to condense it into a short, useful, well-indexed summary. They mention a size of around 8 KB instead of the original 40 KB of material, which sounds very sensible.
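That size constraint is easy to enforce mechanically. A tiny guard like the following (my own sketch, using Vercel's reported ~8 KB target as the default budget) can run in CI to keep AGENTS.md from creeping back toward full-documentation size:

```python
def within_budget(agents_md: str, budget_bytes: int = 8 * 1024) -> bool:
    # Vercel's reported target: roughly 8 KB of condensed summary
    # instead of the original ~40 KB of source material.
    return len(agents_md.encode("utf-8")) <= budget_bytes

summary = "## Next.js routing (condensed)\n- App Router conventions: ...\n"
assert within_budget(summary)  # a short index fits comfortably
```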

Who benefits from this shift? Teams that need predictable AI automation: support, internal copilots, agentic workflows for development, ops, and document processing. Who loses? Projects whose architecture relies on the belief that the model will 'figure out to call the right module on its own.'

I wouldn't treat this as a universal law of nature. These are Vercel's results on specific evals around Next.js, and the outcome might vary on other tasks. But the signal is very strong: when implementing AI in real processes, you need to design not only the agent's knowledge but also the path to access that knowledge.

At Nahornyi AI Lab, this is precisely where we most often trim the fat. We don't add ten more abstractions to an agent; we remove one unnecessary decision that it makes unreliably. And suddenly, everything starts working better, cheaper, and more smoothly.

This analysis was done by me, Vadym Nahornyi, at Nahornyi AI Lab. I develop AI solutions and build agentic systems with my own hands, so for me, these details are not theory but daily engineering routine. If you want to discuss your case, implement AI, or build AI automation without fragile magic, feel free to contact me. Let's look at your project together.
