Datadog launches Bits AI SRE Agent to speed up IT incident fixes
Datadog has introduced Bits AI SRE Agent, an autonomous tool designed to assist software engineering teams during IT incidents by rapidly identifying root causes and suggesting resolution paths. The system has been tested in more than 2,000 customer environments, including deployments at Uber Freight and DelightRoom.
Incident automation
Bits AI SRE Agent is designed to work as a 24/7 on-call assistant that investigates alerts autonomously and provides actionable information for faster remediation. It is integrated with the Datadog platform and draws on telemetry, organisational knowledge, and architecture data to support incident response.
The service validates its findings and delivers conclusions directly to third-party collaboration platforms. According to Datadog, investigations that previously required hours of manual effort may now be completed in minutes through autonomous processes.
Enterprise support
The agent addresses challenges in complex environments, where fragmented systems and teams make the source of IT issues harder to find. It includes support for HIPAA-regulated workloads, role-based access controls, and enterprise-level agreements with AI partners. These features are intended to meet the needs of organisations with stringent compliance and security requirements.
Customer adoption
Uber Freight and DelightRoom are among the organisations that have tested the agent in live scenarios. The agent has investigated tens of thousands of incidents across a range of alert severities and environments, including both routine notifications and high-priority issues.
"This launch represents a pivotal expansion of Datadog's AI strategy as our first generally available AI agent, and signals a new phase of intelligent, automated reliability," said Yanbing Li, Chief Product Officer, Datadog. "Bits AI SRE allows companies to mitigate issues faster, reduce customer impact, and adopt AI safely. It has already been tested against more than 2,000 customer environments, including both global enterprises and fast-growing start-ups with a diverse range of production environments. Tens of thousands of investigations have run to date, from routine alerts to high-severity incidents, with organizations already reporting positive outcomes. This reflects the tangible and immediate value, tied directly to operational and business outcomes, that we are delivering."
Operational feedback
Some customers report that Bits AI SRE reduces the time needed to resolve incidents and helps engineers manage high-stress situations, particularly in the first moments after an alert is triggered.
"During an incident, the first five minutes are critical. Bits AI helps us cut through the noise by instantly surfacing the right context and correlations across our systems," said Thiyagarajan Anandan, Senior Engineering Manager, Uber Freight. "With smart tagging and naming, it automatically guides engineers to the right information, reducing cognitive load and giving us clarity and control when it matters most."
"With Bits AI SRE being on-call 24/7 for us, MTTR for our services have improved significantly," said Andrew Seok Ju Kim, Data Engineer, DelightRoom. "For most cases, the investigation is already taken care of well before our engineers sit down and open their laptops to assess the issue."