How IT Leaders Can Rebuild Trust in AI-Generated Code

AI is helping software teams produce code faster than ever, but the productivity gains are creating a new challenge for IT leaders: maintaining confidence in what reaches production.

A Tricentis survey of more than 2,500 CEOs, CIOs, CTOs, engineering leaders, DevOps professionals, QA teams and developers found that 60% of organizations are deploying untested code into production environments.

While that figure is largely unchanged from the previous year, the reasoning behind it has shifted. In 2025, organizations primarily attributed quality failures to accidental oversights.

Today, many acknowledge they are knowingly accepting the risk. Leadership pressure to prioritize speed was cited by 32% of respondents, while 30% said the volume of AI-generated code has become too large to test completely.

The findings suggest a growing tension inside software organizations: AI is accelerating development cycles and enabling teams to deliver features more quickly, but testing, governance and quality assurance processes are struggling to keep pace.

As a result, organizations increasingly find that software can be produced faster than it can be confidently validated.

AI Is Flooding Software Pipelines

The challenge extends across industries, with more than half of organizations in every major sector surveyed reporting deployment of untested code. Financial services organizations reported the highest rate at 64%, followed by retailers at 63% and energy and utilities providers at 58%.

For many organizations, the problem is not a lack of awareness.

“The fact that a majority of teams know their code isn’t fully tested, but ship it anyway, shows how much pressure enterprise software development teams are under to move faster,” says David Colwell, vice president of AI and machine learning at Tricentis.

AI-generated code is changing the economics of software delivery. Development teams can now create substantially more code in less time, but validation processes remain rooted in workflows designed for slower, human-driven development.

“Many organizations are still relying on testing and governance processes built for slower, human-led development cycles,” Colwell says. “That creates a gap between how quickly software can be produced today and how confidently it can be validated under traditional quality systems.”

The survey suggests AI adoption itself is contributing to the challenge. Nearly half of organizations reported fully implementing AI internally, yet more than half of those organizations say their AI tools and processes change regularly.

One-third cited tool complexity and sprawl as a major obstacle to achieving continuous software quality at scale. Another third pointed to skills gaps, while more than a quarter (28%) said code volume is increasing faster than they can manage.

The Trust Gap Between Leadership and Engineering

The operational strain is exposing a growing disconnect between executive leadership and the teams responsible for maintaining software quality.

While 81% of CEOs reported high confidence in AI-driven systems and tools, that figure falls to 56% among QA and DevOps professionals.

Similarly, 44% of C-level executives believe their organizations are highly prepared to operationalize, govern and scale AI agents across the software development lifecycle, compared with just 23% of QA and DevOps teams.

Aaron Reich, CTO at Avanade, says the two groups are often measuring success differently.

“Executives are measuring velocity and delivery; practitioners are measuring whether what ships actually works, doesn’t hallucinate in production, and doesn’t create downstream risk,” he explains.

Reich argues AI is exposing tensions that have existed inside software organizations for years.

“The deeper issue is that software quality has always been dependent on what metrics an organization chooses to measure and reward,” he says. “If the business is celebrating speed and throughput, that’s what teams optimize for. AI doesn’t change that dynamic, it amplifies it.”

Colwell agrees that executive optimism can sometimes obscure operational realities.

“Leadership and practitioners are seeing AI readiness differently because they are measuring different things,” he says. “Executives often focus on strategic outcomes and investment returns, while practitioners are dealing with day-to-day implementation challenges.”

Traditional Testing Can No Longer Keep Up

The problem facing software organizations is one of scale and timing: historically, testing has been concentrated at the end of the software development lifecycle (SDLC).

That model becomes increasingly difficult to sustain when AI systems can generate code, tests and application changes at speeds that dramatically outpace manual review processes.

“If testing only happens at the end of the SDLC, it will never keep up with AI-generated code volumes,” Colwell says.

As software delivery accelerates, organizations are finding that quality assurance can no longer function as a final checkpoint. Instead, validation needs to become an ongoing process embedded throughout development.

The challenge is particularly acute for teams already struggling with tool sprawl and changing AI workflows. Every new platform, model and automation layer creates additional complexity that must be monitored, tested and governed.

Meanwhile, organizations are experimenting with AI agents capable of making decisions within development and release pipelines, adding yet another layer whose behavior must be validated and governed.

Building Quality Into AI-Native Development

Rather than attempting to test every line of AI-generated code, Colwell and Reich say organizations must rethink how quality is incorporated into software delivery.

“It must move earlier in the lifecycle with clear validation criteria, risk models, and release standards defined before the first line of code gets written,” Colwell says.

That shift requires organizations to focus testing efforts where they matter most. AI-assisted testing, impact analysis and risk-based validation strategies can help teams identify which changes carry the greatest operational risk rather than treating every update equally.
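To make the idea of risk-based validation more concrete, the following is a minimal, hypothetical sketch of how a team might rank incoming changes so that testing effort goes to the riskiest ones first. The `Change` fields, the flag names and the `WEIGHTS` values are illustrative assumptions, not part of the Tricentis survey or any vendor's product; a real risk model would be calibrated against an organization's own defect and incident history.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; a real risk model would be
# calibrated on the organization's own defect and incident history.
WEIGHTS = {
    "critical_path": 5.0,  # touches customers, regulated data or infrastructure
    "ai_generated": 2.0,   # change produced largely by an AI assistant
    "no_tests": 3.0,       # change lands without accompanying tests
}


@dataclass
class Change:
    name: str
    lines_changed: int
    critical_path: bool
    ai_generated: bool
    has_tests: bool


def risk_score(change: Change) -> float:
    """Combine a capped size factor with weighted risk flags into one score."""
    score = min(change.lines_changed / 100, 3.0)  # cap the size contribution at 3
    if change.critical_path:
        score += WEIGHTS["critical_path"]
    if change.ai_generated:
        score += WEIGHTS["ai_generated"]
    if not change.has_tests:
        score += WEIGHTS["no_tests"]
    return score


def prioritize(changes: list[Change]) -> list[tuple[str, float]]:
    """Return (name, score) pairs ordered from highest to lowest risk."""
    scored = [(c.name, risk_score(c)) for c in changes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    changes = [
        Change("update-footer-copy", 12, False, True, True),
        Change("payments-refund-logic", 240, True, True, False),
        Change("internal-report-query", 80, False, False, True),
    ]
    for name, score in prioritize(changes):
        print(f"{name}: {score:.1f}")
```

In this example the large, untested, AI-generated change to payment logic rises to the top of the queue, while a small copy edit and an internal query fall below it, which is the point of treating updates unequally rather than giving each one the same scrutiny.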

Reich believes organizations should take an even broader view.

“The real practical step is rethinking SDLC itself as an AI-native process for your organization,” he says. “Not adapting the old model around AI but rebuilding the operating model with AI as a first-class citizen from the start.”

That requires integrating governance, quality controls and observability directly into development workflows instead of treating them as separate activities that occur later. Reich also argues that trust must be calibrated according to business impact.

“Not all code carries the same stakes,” he says. “What’s good enough for a proof of concept or pilot is a very different bar than what you need for something touching customers, regulated data, or critical infrastructure.”
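One way to picture Reich's point is a release gate whose bar rises with the stakes of the code. The sketch below is hypothetical: the tier names, the coverage and pass-rate thresholds, and the `release_gate` function are assumptions chosen for illustration, not figures from the survey or from Avanade.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseBar:
    min_coverage: float   # fraction of lines exercised by tests
    min_pass_rate: float  # fraction of tests that must pass
    human_review: bool    # whether a person must approve the change


# Hypothetical tiers and thresholds, chosen only to show that
# higher-stakes code faces a stricter bar.
RELEASE_BARS = {
    "proof_of_concept": ReleaseBar(min_coverage=0.0, min_pass_rate=0.90, human_review=False),
    "internal_tool": ReleaseBar(min_coverage=0.50, min_pass_rate=0.98, human_review=False),
    "customer_facing": ReleaseBar(min_coverage=0.70, min_pass_rate=1.0, human_review=True),
    "regulated_data": ReleaseBar(min_coverage=0.85, min_pass_rate=1.0, human_review=True),
}


def release_gate(tier: str, coverage: float, pass_rate: float, reviewed: bool) -> list[str]:
    """Return reasons a change fails its tier's bar; an empty list means it may ship."""
    bar = RELEASE_BARS[tier]
    failures = []
    if coverage < bar.min_coverage:
        failures.append(f"coverage {coverage:.0%} below {bar.min_coverage:.0%}")
    if pass_rate < bar.min_pass_rate:
        failures.append(f"pass rate {pass_rate:.0%} below {bar.min_pass_rate:.0%}")
    if bar.human_review and not reviewed:
        failures.append("human review required")
    return failures


if __name__ == "__main__":
    # The same metrics evaluated against two different tiers.
    metrics = {"coverage": 0.60, "pass_rate": 0.97, "reviewed": False}
    print(release_gate("proof_of_concept", **metrics))
    print(release_gate("customer_facing", **metrics))
```

With these assumed thresholds, a change at 60% coverage and a 97% pass rate with no human sign-off clears the proof-of-concept bar but fails all three customer-facing checks, mirroring the idea that "good enough" depends on what the code touches.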