AI isn’t ready to replace human coders for debugging, researchers say

Even when given access to tools, AI agents can't reliably debug software.

Apr 11, 2025 - 23:56

There are few areas where AI has seen more robust deployment than software development. From "vibe" coding to GitHub Copilot to startups building quick-and-dirty applications with support from LLMs, AI is already deeply integrated into how software gets written.

However, those claiming we're mere months away from AI agents replacing most programmers should adjust their expectations: models aren't good enough at debugging, and debugging occupies most of a developer's time. That's the suggestion of Microsoft Research, which built a new tool called debug-gym to test and improve how well AI models can debug software.

Debug-gym (available on GitHub and detailed in a blog post) is an environment that lets AI models try to debug any existing code repository with access to debugging tools that haven't historically been part of the process for these models. Microsoft found that without this approach, models are notably bad at debugging tasks. With the approach, they're better but still a far cry from what an experienced human developer can do.
