The easiest way to get bad work from a coding agent is to ask for too much and then act surprised when it wanders. That feels familiar. It is what happens to any developer handed a task with no proper boundary, except the agent is faster, cheaper to restart, and somehow even more confident.
The paper “Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks” looks at agents making changes beyond the requested task. That is the failure mode I worry about most in day-to-day use. Not dramatic sabotage. Just a model deciding that while it is here, it should also “clean up” something nearby.
This is why I keep coming back to bounded prompts and tests. If I am handing work to an agent, the task should say what to change, what not to change, what evidence is expected, and where to stop. The review should look for scope creep as carefully as it looks for failing code. If the agent fixed the bug but rewrote unrelated behaviour, that is not a free bonus.
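The "what to change, what not to change" part can be written down and checked mechanically. What follows is a minimal sketch of that idea, not a tool from the paper: `TaskScope`, `out_of_scope`, the glob patterns, and the file paths are all hypothetical names I made up for illustration. It assumes you already have the list of files the agent touched, for example from `git diff --name-only`.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class TaskScope:
    """Hypothetical description of a bounded task (illustrative field names)."""
    allowed: list[str]  # glob patterns the agent may touch
    forbidden: list[str] = field(default_factory=list)  # off limits even if allowed


def out_of_scope(changed_files: list[str], scope: TaskScope) -> list[str]:
    """Return the changed paths that fall outside the stated task boundary."""
    flagged = []
    for path in changed_files:
        is_allowed = any(fnmatchcase(path, pat) for pat in scope.allowed)
        is_forbidden = any(fnmatchcase(path, pat) for pat in scope.forbidden)
        # A path is flagged if it is explicitly forbidden or simply not listed.
        if is_forbidden or not is_allowed:
            flagged.append(path)
    return flagged


# Example task: fix a billing bug, leave migrations and everything else alone.
scope = TaskScope(
    allowed=["src/billing/*", "tests/billing/*"],
    forbidden=["src/billing/migrations/*"],
)
changed = [
    "src/billing/invoice.py",
    "tests/billing/test_invoice.py",
    "src/auth/session.py",
    "src/billing/migrations/0007_add_column.py",
]
print(out_of_scope(changed, scope))
# ['src/auth/session.py', 'src/billing/migrations/0007_add_column.py']
```

A check like this only catches scope creep at the file level. An agent can still rewrite unrelated behaviour inside a file it was allowed to touch, which is why the tests and the human review still matter.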
The practical answer is boring: smaller tasks, explicit constraints, good tests, and human review. Agents are useful when they behave like focused contributors. When they are overtasked, they become a new version of an old lead-dev problem with nicer screenshots.