Manchester, Oxford universities use AI to track COVID-19 variants
Mathematicians from the Universities of Manchester and Oxford are using artificial intelligence (AI) to track and identify emerging variants of COVID-19, as well as other potential future infections. Their approach uses an AI framework that combines dimension-reduction techniques with an explainable clustering algorithm called CLASSIX.
Published this week in the journal PNAS, the study could complement existing methods for tracking viral evolution, such as phylogenetic analysis, which currently relies on painstaking manual curation. Using the CLASSIX algorithm, the researchers can quickly identify, within vast amounts of data, clusters of viral genomes that may pose a threat in future.
Roberto Cahuantzi, a researcher at The University of Manchester and the paper's first and corresponding author, said: "Since the emergence of COVID-19, we have seen multiple waves of new variants, heightened transmissibility, evasion of immune responses, and increased severity of illness." He emphasised the need to detect emerging variants quickly and reliably, so that proactive measures such as tailored vaccine development can follow.
The GISAID database currently holds almost 16 million sequences, and the number keeps growing. Using traditional analysis to map the evolution and history of every COVID-19 genome in this data consumes vast amounts of computer and human time. The new AI method allows this information to be analysed automatically.
Thomas House, Professor of Mathematical Sciences at The University of Manchester, warns that unless a tangible benefit from curating this data can be demonstrated, there is a risk that it will be discarded. Acknowledging that expert time is limited, he envisages the AI approach working alongside human effort, streamlining analysis and freeing experts to focus on other crucial developments.
The method breaks the genetic sequences of the COVID-19 virus into '3-mers' (substrings of three nucleotides), represents them numerically, and groups similar sequences together using machine learning techniques. Professor Stefan Güttel of The University of Manchester explained that CLASSIX is "much less computationally demanding than traditional methods and is fully explainable, meaning that it provides textual and visual explanations of the computed clusters."
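To illustrate the general idea of turning genomes into numeric 3-mer vectors and grouping similar ones, here is a minimal Python sketch. It is not the researchers' pipeline or the CLASSIX library: the function names (`kmer_vector`, `first_pc_scores`, `greedy_groups`), the use of normalised 3-mer frequencies, the power-iteration projection onto a first principal axis, and the `radius` parameter are all hypothetical choices made for illustration. The grouping step loosely imitates the sort-then-aggregate idea behind CLASSIX and omits its later merging phase.

```python
from itertools import product
from math import dist, sqrt

K = 3
# All 4**K possible k-mers over A, C, G, T in a fixed order, so every genome
# maps onto the same vector layout (64 dimensions for K = 3).
# Illustrative assumption, not the published method.
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_vector(seq):
    """Normalised K-mer frequencies; windows containing other symbols (e.g. 'N') are skipped."""
    counts = [0.0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        idx = INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def first_pc_scores(X, iters=100):
    """Centre the rows of X and project them onto an approximate first principal axis.

    The axis is found by power iteration on C^T C and always has unit length.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[x - m for x, m in zip(row, mean)] for row in X]
    # Non-uniform start: centred frequency rows sum to zero, so an all-ones
    # start vector would be mapped straight to zero.
    v = [float(j + 1) for j in range(d)]
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        cv = [sum(a * b for a, b in zip(row, v)) for row in C]            # C v
        w = [sum(cv[i] * C[i][j] for i in range(n)) for j in range(d)]    # C^T (C v)
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in C]
    return scores, C


def greedy_groups(X, radius):
    """Hypothetical simplified aggregation: each unassigned point (in sorted order)
    starts a group and absorbs unassigned points within `radius` of it."""
    scores, C = first_pc_scores(X)
    order = sorted(range(len(X)), key=lambda i: scores[i])
    labels = [-1] * len(X)
    group = 0
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue
        labels[i] = group
        for j in order[pos + 1:]:
            if scores[j] - scores[i] > radius:
                break  # every later point is even further along the axis
            if labels[j] == -1 and dist(C[i], C[j]) <= radius:
                labels[j] = group
        group += 1
    return labels


if __name__ == "__main__":
    # Toy "genomes": two near-identical pairs with unrelated 3-mer content.
    genomes = ["ACGT" * 25, "ACGT" * 24 + "ACGA", "AATT" * 25, "AATT" * 24 + "AATA"]
    print(greedy_groups([kmer_vector(g) for g in genomes], radius=0.1))
```

Sorting by the projection lets the inner loop stop early: projecting onto a unit vector never increases distances, so two points whose scores differ by more than `radius` cannot lie within `radius` of each other. This is what keeps sorting-based aggregation cheap compared with comparing every pair of sequences.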
Cahuantzi added that the analysis serves as a proof of concept, demonstrating the potential of machine learning techniques for the early detection of major emerging variants. He explained: "Whilst phylogenetics remains the gold standard for understanding the viral ancestry, these machine learning methods can accommodate several orders of magnitude more sequences than the current phylogenetic methods and at a low computational cost."