Developing a CLI Tool to Evaluate Code Cognitive Complexity with LLMs

A while ago, a friend and I were discussing code cognitive complexity and maintainability. He wished for a tool that could automatically evaluate whether a piece of code is hard to maintain. I wasn’t sure this was even possible — maintainability is notoriously hard to quantify programmatically.
But LLMs can understand and generate human-like text and even code, and I wondered whether that same capability could be applied to interpreting and evaluating code quality, going beyond what traditional static analysis tools can do.
That thought led to the tool I eventually built. It’s now available on PyPI, and I believe it could be a valuable addition to any CI pipeline.
In a previous post, I shared some early thoughts on maintainability and cognitive complexity that emerged while working on this tool. In this post, I’d like to go deeper and walk through the development process, using my CLI tool as a case study for building an LLM-based application.