
When AI writes code, let us automatically monitor the process


Žiga Vukčevič

5 min read

As AI writes more and more code, a paradox emerges: development speed increases, but so does the risk that process quality becomes a blind spot. The question is no longer whether AI helps us—the question is whether we still understand what we are building.

Code is produced faster. But who is overseeing the process?

Generative AI is increasing the efficiency of development teams in ways that were unthinkable until recently. At first glance, this is exclusively good news. But it raises a question we often skip in the excitement: code quality has become more measurable than ever, while the quality of the development process has become less visible than ever.

When AI produces code that works but no one truly understands; when tests are written after deployment instead of before; when source code review becomes a formality over generated content—process debt accumulates within the team. This is more insidious than technical debt because it cannot be measured with a code analysis tool. It only becomes apparent when something goes wrong.

Our answer: an error reporting agent

Our team addressed this problem in a practical way. We developed an AI agent that automatically collects, analyzes, and structures error data from the issue tracking system and places it in the context of project documentation. Its purpose is not merely to collect statistics—it is to detect patterns that the human team overlooks in the pace of day-to-day work.

The idea stems from a simple observation: errors are not only a technical signal, but also a process signal. If the same errors keep recurring in the same module, that says something not only about the code but also about how the functionality was specified, how it was implemented, and how it was reviewed. The agent reads these patterns automatically, without anyone on the team having to do it manually.

How the agent works

The agent operates in four phases.

First, it loads the context: It reads functional specifications from the project documentation and builds a project model—what functionalities exist, how they are described, what was agreed. This is a key step that distinguishes the agent from a simple error statistics tool.

Next, it retrieves issues: From the issue tracking system, it pulls all issues for the selected time period, along with all metadata—creation date, reporter, status, change history, and screenshots.

Then comes classification: The agent classifies each issue by type (visual, content, logical, security, performance, integration), severity (critical, high, medium, low), and area (frontend, backend, integration, infrastructure). The classification is based on a combination of metadata from the issue tracking system and semantic analysis of the issue title and description. At the same time, the agent links each issue to the relevant user story from the specification.

Finally, it compiles a report: A structured document with findings.
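The four phases above can be sketched roughly as follows. Everything here is an illustrative assumption, not the team's actual implementation: the Issue fields, the run_report function, and the naive keyword rules all stand in for the real tracker API and the semantic analysis the article describes.

```python
from dataclasses import dataclass

# Hypothetical issue record; field names mirror the metadata the article
# mentions (creation date, reporter, status) but are not a real tracker API.
@dataclass
class Issue:
    key: str
    title: str
    description: str
    created: str          # ISO date
    reporter: str
    status: str
    issue_type: str = ""  # filled in during classification
    severity: str = ""
    area: str = ""
    linked_story: str = ""

def run_report(spec_docs: list[str], issues: list[Issue]) -> dict:
    """Four phases: load context, retrieve issues, classify, compile report."""
    # Phase 1: build a simple project model from the specifications
    # (here just a line index of user stories; the real agent is richer).
    story_index = [line.strip() for doc in spec_docs
                   for line in doc.splitlines() if line.strip()]

    # Phase 2: issues are assumed to be already fetched from the tracker,
    # metadata included, so nothing to do in this sketch.

    # Phase 3: classify each issue with crude keyword rules as a stand-in
    # for the semantic analysis of title and description.
    for issue in issues:
        text = f"{issue.title} {issue.description}".lower()
        issue.issue_type = ("security" if "auth" in text else
                            "performance" if "slow" in text else "logical")
        issue.area = "backend" if "api" in text else "frontend"
        issue.severity = "high" if "crash" in text else "medium"
        # Link the issue to the first story that shares a word with it.
        issue.linked_story = next(
            (s for s in story_index
             if any(w in s.lower() for w in text.split())), "")

    # Phase 4: compile a structured report.
    by_type: dict[str, int] = {}
    for issue in issues:
        by_type[issue.issue_type] = by_type.get(issue.issue_type, 0) + 1
    return {"total": len(issues), "by_type": by_type}
```

In a real pipeline, phase 3 would replace the keyword rules with an embedding- or LLM-based classifier; the point of the sketch is only the shape of the data flow.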

What the agent detects

In addition to basic statistics (how many issues, which types, who reported them), the agent looks for patterns that would be too laborious to find through manual analysis.

It detects recurring patterns: phrases that appear in issue descriptions again and again (e.g., validation, date, import, saving). Such recurrence often indicates a systemic deficiency rather than an isolated bug.

It detects spikes: An unusually high concentration of issues in a short period, which often indicates a risky release or a poorly reviewed change.

It also assesses the quality of the reports themselves: Issues without a description or without steps to reproduce are a signal that the team lacks a shared reporting standard.
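Each of the three detections can be approximated with a simple heuristic. The thresholds and field names below are illustrative guesses, not the agent's actual parameters:

```python
from collections import Counter
from datetime import date

def recurring_phrases(descriptions: list[str], min_count: int = 3) -> list[str]:
    """Words that recur across many issue descriptions, which may hint
    at a systemic deficiency rather than isolated bugs."""
    words = Counter(w for d in descriptions
                    for w in d.lower().split() if len(w) > 3)
    return [w for w, n in words.most_common() if n >= min_count]

def detect_spikes(created_dates: list[date], threshold: int = 5) -> list[date]:
    """Days with an unusually high number of new issues, a possible sign
    of a risky release or a poorly reviewed change."""
    per_day = Counter(created_dates)
    return [d for d, n in per_day.items() if n >= threshold]

def weak_reports(issues: list[dict]) -> list[str]:
    """Issue keys missing a description or reproduction steps, a signal
    that the team lacks a shared reporting standard."""
    return [i["key"] for i in issues
            if not i.get("description")
            or "steps" not in i.get("description", "").lower()]
```

A production version would normalize the text (stemming, stop words) and compare spike days against a rolling baseline instead of a fixed threshold, but the intent is the same.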

Particularly interesting is the contextual analysis section generated by AI: whether the distribution of issues indicates poor specification, technical debt, or signs of regression; which areas require attention; and what prioritization is recommended for the next sprint.
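The contextual analysis presumably comes from prompting a language model with the compiled statistics and the specification context. A minimal, hypothetical sketch of that prompt assembly (the wording is invented, not the team's actual prompt):

```python
def build_analysis_prompt(stats: dict, spec_summary: str) -> str:
    """Assemble the context an LLM needs for the narrative analysis section.
    Both the structure and the wording are a hypothetical sketch."""
    return (
        "You are reviewing a software project's issue statistics.\n"
        f"Specification summary:\n{spec_summary}\n"
        f"Issue statistics: {stats}\n"
        "Assess whether the distribution points to poor specification, "
        "technical debt, or regression; name the areas that need attention "
        "and recommend a prioritization for the next sprint."
    )
```

Keeping the prompt a pure function of the report data makes the analysis reproducible: the same statistics and specification summary always yield the same prompt.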

What we learned

It is common knowledge, but still: the quality of the agent’s analysis depends directly on the quality of the input data. If reported issues lack descriptions or steps to reproduce, the agent cannot compensate for that. The agent is therefore not only an analysis tool—it is a mirror of the team’s process discipline.

Likewise, classifying issues by area is limited by the precision of the description: the agent infers based on text and does not know the system’s internal architecture. Therefore, results are best where issues are well described and where project documentation is up to date.

Conclusion

Autonomous code generation is a reality that makes no sense to reject. The next step is not slowing down—it is deciding to give the process the same attention as the code itself. The error reporting agent is a step in that direction: it does not replace the team’s judgment, but provides data the team would not have without automation.

Software quality has never been only about what the code looks like. It has always been about how it is produced—and about the team responsible for it.


31 March 2026
