AI in quality assurance: redefining trust, accountability, and innovation

Insight
March 20, 2025
9 min read

Author


Thomas Steirer is Chief Technology Officer (CTO) at Nagarro. His focus is on developing scalable, sustainable solutions that deliver valuable insights.

Artificial intelligence (AI) is accelerating at a pace that challenges conventional approaches to quality, reliability, and accountability. AI now operates in complex, dynamic environments that impact everything from financial systems to critical infrastructure. As AI becomes increasingly autonomous and moves away from pre-defined tasks, there is a need for more transparent governance, measurable performance standards, and a stricter ethical framework. 

"If you live in the Vienna region and use public transport, online banking, or air travel, chances are you've interacted with systems that I've helped ensure run smoothly — at least most of the time. This experience has shaped how I think about software quality and what it truly means for a system to be "good." It provides a unique perspective, one that I call "destructive creativity." When I see a new product or program, my instinct isn't just to admire it. I first wonder, "That's cool... How do I break it?" 


The challenge today is not only to improve AI's capabilities but also to ensure that its decisions are measurable, explainable, and aligned with expectations. 

Debugging the future: AI in quality assurance

Traditional quality assurance (QA) has long been defined by precision: systems are rigorously tested for functionality, reliability, performance, and many other quality criteria, as defined in standards such as ISO 25010. QA methods are based on "destructive creativity," where software is tested under stress to uncover vulnerabilities and ensure seamless execution in business-critical areas such as banking, transportation, and industrial automation.

But AI has fundamentally disrupted this approach. Generative AI moves beyond fixed, deterministic outcomes into contextual, adaptive decision-making. A recent Global AI Standards Forum study found that 81% of AI experts believe generative AI models need to be evaluated on their adaptability, not just their precision. For example, when asked about the influence of the French Revolution on the bakery trade, there is no single correct answer, but rather a mixture of historical interpretations, cultural insights, speculative context, and the art of making excellent baguettes. Some of the answers an AI generates to this question can be useful, while others are misleading. And it might be impossible to tell the difference without knowing the "correct" answer beforehand.

What we deem "correct," "useful," or even "acceptable" depends entirely on our expectations. Clearly defining, verbalizing, and checking those expectations is a tricky process, even for communication between highly trained humans working on traditional software systems. For a layperson interacting with an AI, it can be downright impossible.
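To make this concrete, here is a minimal sketch of what an expectation check could look like in practice: instead of asserting exact string equality, a test scores a generated answer against a set of acceptable reference answers. This is an illustrative assumption, not a method from this article; the sentence-transformers library, the model name, and the threshold are all example choices.

```python
# Hypothetical sketch: scoring a generative answer against a set of
# acceptable reference answers instead of asserting exact equality.
# Assumes the open-source sentence-transformers library; the model name
# and threshold are illustrative choices, not values from this article.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def meets_expectation(answer: str, references: list[str],
                      threshold: float = 0.75) -> bool:
    """Pass if the answer is semantically close to at least one
    acceptable reference; there is no single 'correct' string."""
    answer_vec = model.encode(answer, convert_to_tensor=True)
    ref_vecs = model.encode(references, convert_to_tensor=True)
    best_similarity = util.cos_sim(answer_vec, ref_vecs).max().item()
    return best_similarity >= threshold

# An exact-match assertion would fail on every valid paraphrase;
# a similarity threshold tolerates legitimate variation.
```

The point of such a check is that the pass/fail boundary becomes an explicit, tunable expectation rather than a binary string comparison.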

This shift demands a redefinition of software quality that extends beyond mere correctness. Excellence must now include adaptability, creative interpretation, and contextual accuracy so that AI-driven systems improve, rather than obfuscate, decision-making. 

AI is not an exact science: how do we measure it?


The nature of AI performance presents a unique challenge: how can we measure its quality when the answers are probabilistic and conversational rather than deterministic and numeric? Traditional software adheres to a clear success-failure metric, but AI operates on a spectrum of variability.

Take autonomous vehicles: it is unrealistic to expect zero accidents. The more practical question is what they should be measured against. Should the benchmark be an average driver, a classic car enthusiast, or a driving safety instructor? Defining success in artificial intelligence means shifting from perfectionism to pragmatism: setting reasonable, real-world standards rather than relying on theoretical ideals or, conversely, having no standards at all.
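As a toy illustration of how such a pragmatic benchmark could be made explicit, consider the sketch below. All figures, baselines, and mileage numbers are invented for illustration; the takeaway is that the verdict changes depending on which benchmark you choose.

```python
# Hypothetical sketch: making the benchmark question explicit.
# All figures below are invented for illustration only.

def incidents_per_million_miles(incidents: int, miles_driven: float) -> float:
    """Normalize raw incident counts into a comparable rate."""
    return incidents / (miles_driven / 1_000_000)

# The benchmark is a policy choice, not a technical constant:
BASELINES = {
    "average_driver": 4.2,     # assumed human incident rate per million miles
    "safety_instructor": 1.1,  # a far stricter assumed baseline
}

av_rate = incidents_per_million_miles(incidents=18, miles_driven=6_500_000)

for name, baseline in BASELINES.items():
    verdict = "meets" if av_rate <= baseline else "fails"
    print(f"AV rate {av_rate:.2f} {verdict} the '{name}' baseline ({baseline})")
```

With these invented numbers, the same system "meets" the average-driver baseline and "fails" the instructor baseline, which is exactly why the choice of benchmark must be made deliberately.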

This lack of clear benchmarks extends beyond self-driving cars to AI-driven decision-making across sectors, where accountability remains a critical issue. According to a 2023 report by the AI Ethics Consortium, 92% of AI leaders believe AI governance and standardization are essential for trust, yet only 45% of companies have defined responsible AI metrics. If an AI-driven chatbot misinterprets a request or a self-driving vehicle makes a mistake, who is responsible? The developer who built it? The company deploying it? The AI system itself?

Without a clear framework for accountability, trust in AI will remain fragile, ultimately slowing its widespread adoption and limiting its full potential. 

Rethinking AI testing

As AI-driven systems become more interactive, whether through chatbots, virtual assistants, or decision-making engines, the way we test them needs to evolve. Traditional QA methods, such as scripted test cases and AI self-evaluation, are no longer enough. The industry needs to validate AI systems against key questions like the following:

  • How accurate and truthful are AI-generated responses?
  • Are these systems truly accountable, and do they adhere to ethical standards?
  • What safeguards are in place to prevent bias, misinformation or unintended consequences?

Testing must go beyond functional correctness and consider context, ethical implications, and long-term reliability. This means incorporating real-world simulations, continuous monitoring, and human-in-the-loop validation to ensure that AI systems remain trustworthy, adaptable, and aligned with user expectations. As AI evolves, our approach to measuring it must also assess its ability to make fair, transparent, and responsible decisions.
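Below is a minimal, hypothetical sketch of what such a layered validation gate might look like: automated safeguards run first, and anything that fails, or that the system is unsure about, is escalated to a human reviewer. The check names, flagged terms, and confidence threshold are illustrative assumptions, not an established framework.

```python
# Hypothetical sketch of a layered validation gate combining automated
# checks with human-in-the-loop escalation. Check names, flagged terms,
# and the confidence threshold are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ValidationReport:
    passed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    needs_human_review: bool = False

def validate_response(answer: str, confidence: float) -> ValidationReport:
    report = ValidationReport()
    # Automated safeguards: cheap checks run on every response (toy rules).
    checks = {
        "non_empty": bool(answer.strip()),
        "no_flagged_terms": not any(t in answer.lower()
                                    for t in ("guaranteed cure",)),
    }
    for name, ok in checks.items():
        (report.passed if ok else report.failed).append(name)
    # Failed or low-confidence answers escalate to a human reviewer
    # rather than being silently shipped.
    report.needs_human_review = bool(report.failed) or confidence < 0.6
    return report

report = validate_response("Our model suggests a guaranteed cure.", confidence=0.9)
print(report.failed, report.needs_human_review)  # ['no_flagged_terms'] True
```

The design choice worth noting is that the automated layer never has the final word on borderline cases: its job is to decide what a human must look at, which keeps accountability with people rather than with the model.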


Destructive creativity: building AI we can trust


Managing AI's transformation requires destructive creativity: a mindset that actively challenges assumptions, puts established models to the test, and adapts to AI's evolving complexity. This is not about incremental improvements but about fundamental changes in the way we define quality, reliability, and accountability in AI-driven systems.

Instead of viewing AI as a flawless entity, organizations need to see it as an evolving tool, in many ways more like an "employee" than a traditional software system. AI will fail, but it can also learn and improve; only through rigorous questioning, testing, and ethical design will it become a trusted partner in decision-making.

AI on the path to a future of trust

As AI permeates industries, business leaders, regulators, and developers are responsible for creating strict ethical guidelines and transparent AI governance. The future of AI is not just about what it can achieve but also about ensuring that it operates within a framework of accountability, fairness, and human-centered design. 

AI is estimated to contribute $15.7 trillion to the global economy by 2030, according to a 2024 study by the Global Economic Research Institute. Yet over 70% of executives remain concerned about the trust and ethical risks associated with AI deployments. Quality assurance has long since evolved beyond a purely technical process; it has become a philosophy that prioritizes trust, integrity, and ethical innovation. In AI, this is more crucial than ever. AI must not only optimize efficiency but also uphold the values that make technology an enabler of progress, not uncertainty.

That requires considerably more effort to articulate our expectations and to validate that they are being met.

The future of AI-driven systems is in our hands. It is up to us to take the steering wheel and ensure that AI is not just a tool for automation but a force for meaningful change, responsible innovation, and unwavering trust.

 

Curious about the evolving role of AI in quality assurance?


Watch Thomas Steirer's TEDx Talk, where he breaks down how AI is transforming software testing, the complexity of verifying AI-generated results, and the challenges of setting expectations in an autonomous world. From the trust dilemma in AI-driven decisions to the impact of machine learning on daily life, his insights push us to rethink the definition of software quality at a time when the answers are not always black and white. Don't miss this thought-provoking talk!

 

 

Watch Thomas Steirer’s TEDx Talk on AI in quality assurance

 