IT Brief UK - Technology news for CIOs & IT decision-makers
Andrew Power

From assistance to autonomy: Why AI now needs continuous quality intelligence

Mon, 24th Nov 2025

The first documented case of an AI-assisted cyber-espionage operation, using Anthropic's Claude AI, has changed the conversation about enterprise AI. In this instance, a commercially available model, connected to external tools through the Model Context Protocol (MCP), was directed to explore systems, interpret results, and adjust its behaviour with minimal further input. Crucially, the AI was not merely used to generate code or content; it contributed to the operational flow of the attack.

While many organisations still treat AI as a productivity or analysis layer, agentic models equipped with planning cycles, memory, and tool access now sit much closer to the execution layer than the advisory layer. At this point, the risk can no longer be evaluated purely through prompt control, policy libraries, or output review. The imperative has shifted to understanding and assuring a system's behaviour over time, rather than relying solely on what the model presents as knowledge. 

From output accuracy to behavioural confidence

Traditional AI safety frameworks have been built around evaluating the quality of output. The focus has been on whether a model's response is correct, ethical, data-secure, and safe to publish or act on. Those disciplines remain essential, but they do not address what happens when the AI is no longer the final step in the chain, but an active participant.

Agentic AI interprets goals, breaks down tasks, calls external tools, and adapts based on the signals it receives. It behaves less like a search interface and more like a persistent, self-directing service component. The concern is not that these systems are inherently malicious, nor that enabling technologies like MCP are unsafe by design; the challenge is that autonomy is gaining ground faster than the mechanisms designed to validate and observe it.
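To make the shape of that behaviour concrete, a simplified agent loop might look something like the sketch below; the class, tool names, and interfaces are purely illustrative, not any particular vendor's API.

```python
# Illustrative agent loop: plan a step, call an external tool, record the
# result, adapt, repeat. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Agent:
    goal: str
    memory: list = field(default_factory=list)   # persistent context across steps

    def plan_next_step(self):
        # In a real system this would be a model call that turns the goal and
        # accumulated memory into the next action; here it is a fixed stub.
        return {"tool": "fetch_metrics", "args": {"service": "billing"}}

    def run(self, tools, max_steps=10):
        for _ in range(max_steps):
            step = self.plan_next_step()
            result = tools[step["tool"]](**step["args"])   # external side effect
            self.memory.append((step, result))             # adapt to the signals received
            if result.get("done"):                         # stop when the goal is met
                break
        return self.memory
```

The point of the sketch is that each pass through the loop is an action taken in the world, not just a piece of text returned to a user.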

Why existing assurance models don't scale

Software engineering has faced and solved similar challenges before. The early era of continuous delivery unlocked unprecedented development speed, but also created new categories of reliability risk. The solution was not to centralise or slow change; it was to build automated testing, observability, and real-time guardrails directly into delivery workflows.

Today, AI sits at a similar inflection point. Many organisations still treat AI as an external tool rather than part of the system architecture. But if an AI agent can introduce change, influence configuration, or trigger a downstream process, it must be handled with the same structural discipline as production code. Point-in-time review, manual approval, or isolated red-teaming cannot provide confidence for systems that are dynamic, adaptive, and continuously learning.

The software development industry has already demonstrated that confidence is not gained at the moment of deployment; it is sustained continuously through validation and feedback. And AI deserves the same lifecycle.

Applying testing discipline to autonomous AI

It's projected that by 2028, more than 1.3 billion AI agents will be in operation, all aimed at supercharging efficiency. That scale promises a new level of reward, but it also introduces a new level of risk. If this growing number of AI agents is to operate effectively inside expanding enterprise systems, they require assurance models grounded in software testing principles.

The shift is not philosophical; it is procedural. Instead of validating only whether a model can generate correct responses, organisations must validate whether an agent behaves correctly under different conditions, over time, and across environments.

That means thinking in terms of behavioural regression testing, coverage of autonomous actions, repeatability, and explainability, just as mature teams do with software services today. The difference is that an AI agent's "version" is not always as discrete as a build number, which makes continuous, runtime insight essential.
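As an illustration, a behavioural regression test for such an agent asserts on the shape of a recorded run rather than a single output; the agent, scenario, and trace objects here are hypothetical.

```python
# Hypothetical behavioural regression test: the assertion targets are the
# agent's actions over time, not one response string.
def test_agent_stays_within_approved_behaviour(agent, scenario):
    trace = agent.run_scenario(scenario)                   # recorded sequence of actions
    tools_used = {step.tool for step in trace.steps}
    assert tools_used <= scenario.approved_tools           # no out-of-policy tool calls
    assert len(trace.steps) <= scenario.step_budget        # behaviour stays bounded
    assert trace.steps[-1].outcome == scenario.expected_outcome
```

Run continuously against recorded or replayed scenarios, checks of this kind supply the runtime insight that a discrete build number cannot.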

The role of Quality Intelligence

As autonomy expands, no organisation can enumerate every failure mode or foresee every emergent behaviour. Quality Intelligence provides continuous analysis of agent behaviour rather than retrospective analysis. It gives visibility into how a decision path formed, what context influenced it, what external actions were triggered, and whether any of those diverged from accepted patterns or policies.
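A minimal sketch of what that runtime visibility could look like, assuming a hypothetical policy and trace interface:

```python
# Sketch of runtime Quality Intelligence: every agent action is recorded with
# the context that shaped it and checked against accepted patterns.
import time

def record_action(trace, action, context, policy):
    event = {
        "timestamp": time.time(),
        "action": action.name,                           # what the agent did
        "influences": context.summary(),                 # what context shaped the decision
        "effects": action.side_effects,                  # what external actions were triggered
        "divergent": not policy.allows(action, context), # outside accepted patterns?
    }
    trace.append(event)
    if event["divergent"]:
        policy.alert(event)                              # surface the divergence immediately
    return event
```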

In autonomous systems, trust cannot be declared; it must be demonstrated. Behaviour, not intention, becomes the artefact of assurance. The objective is not to constrain functional autonomy, but to ensure that confidence grows in parallel with capability. When visibility and validation are in place, autonomy becomes an asset, not an ambiguity.

Companies need to invest more in the orchestration layer that sits above autonomous AI, as this is where teams can set guardrails, monitor how AI agents operate, and step in when necessary.
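In practice, a guardrail at that layer might look something like the following sketch; the policy, audit log, and escalation hooks are illustrative assumptions rather than a specific product's API.

```python
# Illustrative orchestration-layer guardrail: actions pass a policy gate before
# execution, every decision is logged, and risky actions are escalated.
def execute_with_guardrails(action, policy, audit_log, escalate):
    decision = policy.evaluate(action)         # "allow", "needs_approval", or "block"
    audit_log.record(action, decision)         # keep behaviour observable
    if decision == "allow":
        return action.run()
    if decision == "needs_approval":
        return escalate(action)                # a human steps in when necessary
    raise PermissionError(f"Blocked by policy: {action.name}")
```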

A more resilient path forward

There is no benefit in slowing or rejecting autonomous AI. In many organisations, autonomy may become essential for scale, resilience, and operational efficiency. However, maturity must accompany capability. The pragmatic way forward mirrors the lessons already learned by engineering teams: treat AI agents as part of the production ecosystem, embed testing and observability into their lifecycle, and base trust on evidence rather than assumptions.

AI has begun to perform work, not merely describe it, and when technology performs work, quality becomes the mechanism that keeps innovation safe, scalable, and reliable.
