Written by Charlotte Harrison, Science Writer
Researchers from the European Bioinformatics Institute and collaborators have developed a tool that allows phylogenetic trees to be rapidly constructed from large genomic datasets. The new method, published in Nature Genetics, is a key step towards better understanding the evolution and epidemiology of viruses such as SARS-CoV-2.
During the COVID-19 pandemic, researchers realized that existing computational tools for analysing genomic data were not suitable for handling the large datasets that emerged for sequenced SARS-CoV-2 genomes. This drawback made it challenging to study how SARS-CoV-2 was evolving and spreading.
The authors of the study looked for ways to improve the efficiency of probability-based phylogenetic methods. They aimed to reduce the excessive computational time and memory resources that hamper existing methods. They developed a new algorithm that takes advantage of the fact that viral genomes in epidemiology studies are likely to be closely related; for example, SARS-CoV-2 genomes may differ from one another by only dozens of nucleotides.
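To see why close relatedness can save time and memory, consider storing each genome as a short list of differences from a reference rather than as a full sequence. The Python sketch below is purely illustrative: the function names and the toy 20-base sequences are assumptions made for this example and are not taken from the study or from MAPLE's code.

```python
def diff_against_reference(reference: str, genome: str) -> list[tuple[int, str]]:
    """Return (1-based position, observed base) for every site that differs."""
    if len(reference) != len(genome):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, base)
        for index, (ref_base, base) in enumerate(zip(reference, genome))
        if ref_base != base
    ]


def rebuild_genome(reference: str, diffs: list[tuple[int, str]]) -> str:
    """Reconstruct the full genome from the reference and its difference list."""
    sequence = list(reference)
    for position, base in diffs:
        sequence[position - 1] = base
    return "".join(sequence)


# Toy data (hypothetical): two aligned sequences that differ at two sites.
reference = "ACGTACGTACGTACGTACGT"
sample = "ACGTACCTACGTACGTAAGT"

diffs = diff_against_reference(reference, sample)
print(diffs)  # [(7, 'C'), (18, 'A')]
print(rebuild_genome(reference, diffs) == sample)  # True
```

For a genome of roughly 30,000 nucleotides that differs from the reference at a few dozen positions, a list like this holds a few dozen entries instead of tens of thousands of characters.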
The new tool that incorporates the algorithm, dubbed MAPLE (maximum parsimonious likelihood estimation), is tailored for large-scale genomic epidemiology. Specifically, the tool performs maximum likelihood phylogenetic inference and uses explicit probabilistic models of sequence evolution, which are combined with several features of maximum parsimony methods.
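The difference between the two approaches can be shown on a single pair of sequences. Parsimony simply counts the observed differences, whereas a likelihood method scores the data under a probabilistic model of substitution. The sketch below uses the Jukes-Cantor (JC69) model, chosen here only because it is the simplest such model; the article does not say which models MAPLE uses, and the function names and toy sequences are assumptions for this example.

```python
import math


def count_differences(seq_a: str, seq_b: str) -> int:
    """Parsimony-style score: the number of sites at which the sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def jc69_log_likelihood(seq_a: str, seq_b: str, branch_length: float) -> float:
    """Log-likelihood of two aligned sequences under JC69.

    branch_length is the expected number of substitutions per site and must be > 0.
    """
    decay = math.exp(-4.0 * branch_length / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability a site is unchanged
    p_diff = 0.25 - 0.25 * decay  # probability of one specific different base
    n_sites = len(seq_a)
    n_diff = count_differences(seq_a, seq_b)
    # log(0.25) accounts for the equal base frequencies assumed by JC69.
    return (
        n_sites * math.log(0.25)
        + (n_sites - n_diff) * math.log(p_same)
        + n_diff * math.log(p_diff)
    )


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Maximum likelihood branch length under JC69 (needs under 75% differing sites)."""
    proportion = count_differences(seq_a, seq_b) / len(seq_a)
    return -0.75 * math.log(1.0 - 4.0 * proportion / 3.0)


# Toy data (hypothetical): 20 aligned sites, two of which differ.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACCTACGTACGTAAGT"

n_changes = count_differences(seq_a, seq_b)
best_length = jc69_distance(seq_a, seq_b)
print(n_changes, round(best_length, 4))
```

When sequences are this similar, the two views nearly agree: the likelihood estimate of the branch length is only slightly larger than the raw proportion of differing sites, which gives some intuition for why parsimony-style shortcuts can work well on closely related genomes.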
The authors used real and simulated SARS-CoV-2 genome data to show that MAPLE could infer phylogenetic trees more rapidly and from much larger collections of genomes than existing methods. MAPLE could also perform more extensive phylogenetic tree searches due to its reduced computational demand, which resulted in phylogenetic trees with greater accuracy.
Overall, MAPLE enables phylogenetic analysis of genomic epidemiology datasets that are at least 1–2 orders of magnitude larger than was previously possible, meaning scientists will be able to analyse millions of viral genomes at once.
The authors note that MAPLE is suited for analysing many sequences at short divergence from each other. As such, it will be useful for studying the evolution and spread of pathogens such as SARS-CoV-2, influenza viruses and Mycobacterium tuberculosis.
MAPLE is available on GitHub.