IT Brief UK - Technology news for CIOs & IT decision-makers
ExtraHop open sources its machine learning dataset
Thu, 14th Sep 2023

ExtraHop, a provider of cloud-native network detection and response (NDR), has announced it is open-sourcing its 16 million-row dataset, which it describes as one of the most robust available, to help defend against domain generation algorithms (DGAs). The aim is to level the playing field for defenders and help businesses of all sizes strengthen their defences against malware and botnet operations.

Amid a widening cybersecurity skills gap (up 26% in the last year) and dwindling resources, the cyber landscape is evolving rapidly. As new threats appear, open-sourced research and datasets can help security teams overcome the challenges they face daily.

“The challenges we face in security are formidable and dynamic, and, with this initiative, we’re democratising the tools needed for threat research detection for security teams of all sizes, backgrounds, and industries,” says Raja Mukerji, chief scientist and co-founder of ExtraHop.

“Collaboration among the cybersecurity community is invaluable - coming together to share our best work is the only way to remain on the offence and put attackers at a disadvantage. Our research will be a game-changer for the community, and we encourage other teams to open source their own insights that will similarly benefit the industry at large.”

Striving for industry collaboration, ExtraHop is releasing its DGA detector dataset, made up of more than 16 million rows of data, on GitHub to help security teams identify malicious activity in their environments before it becomes a business problem.

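Threat actors use DGAs to maintain control within an organisation's environment after entering a network. Because the malware algorithmically generates large numbers of domain names through which to reach its command-and-control servers, blocking individual domains is rarely enough, making these attacks challenging to detect and stop.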

Initially built for ExtraHop's award-winning NDR platform, Reveal(x), this research can now be used by any security researcher to build their own machine learning (ML) classifier model, identify DGAs more quickly, and intervene in attacks with greater speed and precision. Since its implementation in Reveal(x), the ExtraHop DGA model has demonstrated more than 98% accuracy.
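For readers unfamiliar with how a DGA detector might work, the sketch below illustrates one common starting point: measuring how random the characters in a domain name look. It is not ExtraHop's model and does not use the released dataset. The function names, the 3.5-bit threshold, and the 10-character minimum are hypothetical values chosen for illustration only, and a real classifier would instead be trained on labelled data.

```python
import math
from collections import Counter


def label_entropy(domain: str) -> float:
    """Shannon entropy, in bits per character, of the first label of a domain."""
    label = domain.lower().split(".")[0]
    if not label:
        return 0.0
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_generated(domain: str, threshold: float = 3.5, min_length: int = 10) -> bool:
    """Crude heuristic: long, high-entropy labels are flagged as possibly generated.

    The threshold and min_length defaults are illustrative assumptions,
    not values taken from ExtraHop's research.
    """
    label = domain.lower().split(".")[0]
    return len(label) >= min_length and label_entropy(domain) > threshold


if __name__ == "__main__":
    for name in ["google.com", "xkqzjwpvbthmrd.net"]:
        print(name, round(label_entropy(name), 2), looks_generated(name))
```

A heuristic like this will misclassify many legitimate domains, which is why researchers typically feed features of this kind, along with many others, into a trained ML classifier rather than relying on a fixed cut-off.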

ExtraHop's Reveal(x) 360 platform is the only network detection and response platform that delivers the 360-degree visibility needed to uncover the cyber truth. When organisations have full network transparency with ExtraHop, they see more, know more, and stop more cyberattacks.

“Giving threat actors the ability to operate undetected and an uptick in these types of attacks, DGAs are increasingly considered a major threat to businesses today,” says Todd Kemmerling, director of data science at ExtraHop. “As we began developing a model for detecting DGAs, it became apparent there was a lack of public datasets accessible to security teams with a wide-ranging set of resources. With this dataset, we are filling that gap, giving any security team access to the pivotal data needed to detect DGAs swiftly.”