
Where AI Coding Agents Fail in Practice

During a live demo, a multi-step AI coding agent failed spectacularly. It ignored an approval gate, mishandled a Jira query, lost context, and broke the test run. For businesses, this is a clear signal: successful AI development automation depends not on the model, but on robust orchestration and control barriers.

Technical Context

I love stories like this not for the drama, but for their honesty. A demo showcased a set of skills for a cloud-based AI agent in a development workflow: research, fetching a task from Jira, reading or generating specs, planning, plan approval, implementation, testing, and finalization. During a run-through the day before, everything worked more or less smoothly. But the actual presentation turned into a real-world stress test.

First, one model started throwing errors, so they had to switch to another. From there, the issues that surfaced weren't abstract “LLM limitations” but very specific failures in the orchestration layer. The agent ignored an instruction like “DO NOT proceed without approval,” decided on its own that the changes were simple, and started coding without confirmation. To me, that’s a red flag: if an approval gate is just text in a prompt, it’s not a gate, it’s a suggestion.

The second failure centered on Jira. Instead of fetching a ticket by its key, or at least via a targeted JQL API call, the agent started pulling everything and trying to guess which board contained the right task. I’ve seen this before in projects where AI integration with internal systems relies too much on “trust.” A model's memory of context is unstable, so critical identifiers must be stored and passed outside of free-form text.
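To make that concrete, here is a minimal sketch of the opposite approach: the agent gets a direct lookup by ticket key and refuses to proceed without one. The function name and key pattern are my own illustration; the `/rest/api/2/issue/{key}` path is the standard Jira REST endpoint for fetching a single issue.

```python
import re

# Jira ticket keys look like PROJ-123: project prefix, dash, number.
ISSUE_KEY = re.compile(r"^[A-Z][A-Z0-9]+-\d+$")

def issue_url(base_url: str, key: str) -> str:
    """Build a direct Jira REST lookup for exactly one ticket.

    If the key is missing or malformed, fail loudly so the agent has to
    ask the user -- instead of crawling every board and guessing.
    """
    if not ISSUE_KEY.match(key or ""):
        raise ValueError(f"need an explicit ticket key, got {key!r}: ask the user")
    return f"{base_url}/rest/api/2/issue/{key}"
```

The point is not the three lines of regex; it's that the identifier lives in structured state, validated before any network call, rather than floating around in the model's context window.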

The third story is even more amusing. The agent ran tests in two parallel threads against a single database. This caused a deadlock, everything timed out and failed, and at the end, the agent cheerfully reported that there were no problems. This is particularly revealing: the model didn't just make a mistake; it also misinterpreted the execution result. This means your action layer isn't the only thing breaking—your outcome validation layer is too.

And no, we shouldn't blame a specific model here. I've poked around different agentic pipelines and see the same pattern: the more stages, sub-agents, and implicit state transitions, the higher the chance the system will start hallucinating its own progress.

What This Changes for Business and Automation

From a CTO or Product Owner's perspective, the conclusion is unpleasant but useful. Implementing AI in development can't be built on the idea of “let's give the model more tools, and it will figure things out.” It won't. You need hard system-level barriers.

I would build such a process differently. An approval gate shouldn't live in a prompt but in a state machine or policy engine: if the `user_approved` flag isn't set, the `implement` step is physically unavailable. Jira should be queried strictly by the ticket key. If the key is missing, the agent must ask the user for it, not go on an archaeological dig through all the boards. This isn't about model magic anymore; it's about proper AI architecture.
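A minimal sketch of what I mean by a hard gate. The class and step names here are hypothetical; only the `user_approved` flag comes from the text above. The transition table, not the prompt, decides whether implementation is reachable:

```python
from enum import Enum, auto

class Step(Enum):
    PLAN = auto()
    AWAIT_APPROVAL = auto()
    IMPLEMENT = auto()

class Workflow:
    """Hypothetical orchestrator: 'implement' is physically unreachable
    until a human has set user_approved -- no prompt text involved."""

    def __init__(self) -> None:
        self.step = Step.PLAN
        self.user_approved = False

    def submit_plan(self) -> None:
        self.step = Step.AWAIT_APPROVAL

    def approve(self) -> None:
        # Only an explicit human action flips the flag.
        self.user_approved = True

    def implement(self) -> str:
        if self.step is not Step.AWAIT_APPROVAL or not self.user_approved:
            raise PermissionError("approval gate not passed; implement unavailable")
        self.step = Step.IMPLEMENT
        return "implementing"
```

However eager the model gets, calling `implement()` without approval raises an error instead of quietly producing code.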

It's the same story with tests. Either I design isolated environments and independent databases for parallel runs, or I forbid parallelism at the runner level. And most importantly: the test results shouldn't be evaluated by an LLM “eyeballing” it, but by a deterministic validation layer based on exit codes, logs, coverage, and explicit success criteria.

Who wins from this shift? Teams that treat agents as untrusted executors with useful speed. Who loses? Those who try to build AI automation on top of fragile processes, hoping a prompt will replace engineering controls.

At Nahornyi AI Lab, this is precisely what we do: we don't just bolt a model onto an IDE or Jira. We build the architecture of AI solutions so that an agent cannot silently skip over a risky step. Otherwise, “automation” quickly turns into a generator of expensive surprises.

This analysis was written by me, Vadim Nahornyi, from Nahornyi AI Lab. I specialize in AI automation and hands-on development of AI solutions: I test agents in real processes, catch their failures, and turn them into working systems for businesses.

If you want to discuss your scenario—be it coding agents, Jira, approval flows, test pipelines, or integrating AI into your development team—feel free to reach out. We can look at your process and build a system together that behaves predictably, not just in demos.
