It is tempting to treat every AI coding problem as a model problem. Sometimes it is. A stronger model can reason better, recover from mistakes faster, and handle more context. But a lot of agent failures are configuration failures: unclear instructions, missing test commands, weak permissions, vague task boundaries, or tools that are easy to call incorrectly.
That shows up in the research as well as in day-to-day use. One recent paper on engineering pitfalls in AI coding tools looked at thousands of reported bugs across Claude Code, Codex, and Gemini CLI, and many of those issues clustered around tool invocation and command execution. Another paper on configuring agentic AI coding tools treats configuration as its own design space across tools like Claude Code, Copilot, Cursor, Gemini, and Codex.
This matches what I see in real workflows. An agent works much better when the repo tells it how to work: where the tests are, which commands are safe, what style matters, how to handle generated files, when to ask, and what counts as done. Without that, the agent wastes effort rediscovering the rules or confidently applying the wrong ones.
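To make that list concrete, here is a minimal sketch of how a team might audit its own repo guidance. Everything in it is an assumption for illustration: the `RepoGuidance` class, its field names, and the `make test` and `make lint` commands are hypothetical and do not correspond to any particular tool's configuration format. It only shows the idea of treating "what the repo tells the agent" as a checklist with visible gaps.

```python
from dataclasses import dataclass, field


@dataclass
class RepoGuidance:
    """Hypothetical record of what a repo tells an agent.

    The fields mirror the six items in the paragraph above; the names
    are illustrative, not part of any real tool's schema.
    """
    test_command: str = ""                                     # where the tests are
    safe_commands: list[str] = field(default_factory=list)     # which commands are safe
    style_notes: str = ""                                      # what style matters
    generated_paths: list[str] = field(default_factory=list)   # generated files to leave alone
    ask_before: list[str] = field(default_factory=list)        # when to ask
    done_criteria: list[str] = field(default_factory=list)     # what counts as done


def missing_guidance(guidance: RepoGuidance) -> list[str]:
    """Return the names of fields left empty, in declaration order."""
    return [name for name, value in vars(guidance).items() if not value]


# Example: a repo that documents its tests and its definition of done,
# but says nothing about style, generated files, or when to ask.
guidance = RepoGuidance(
    test_command="make test",                  # assumed command
    safe_commands=["make lint", "make test"],  # assumed commands
    done_criteria=["tests pass", "diff is reviewable"],
)
print(missing_guidance(guidance))
# → ['style_notes', 'generated_paths', 'ask_before']
```

The empty fields are exactly the places where an agent would have to guess, so a check like this could run in CI or as a pre-flight step before handing a task over.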
Better models are still useful, but they are not a replacement for a well-shaped engineering environment. If a team wants reliable agent output, it should invest in the same boring things that help humans: clear docs, fast tests, predictable scripts, good errors, and reviewable changes. The agent just makes the payoff more obvious.