Base editing outcomes predicted by machine learning model

Single nucleotide variants (SNVs) are implicated in about half of all genetic diseases, so accurately targeting and editing these nucleotides is a promising therapeutic approach. To date, a number of base editing (BE) tools have been developed and used to install targeted point mutations. However, the factors behind the highly variable outcomes and precision of these tools are poorly understood.

In a study published in Cell, researchers at the Broad Institute of MIT and Harvard characterised the activity of eleven existing cytosine and adenine base editors and used the data to develop a machine learning model that accurately predicts base editing genotypic outcomes and efficiency. Using the model, BE-Hive, the team was able to correct 3,388 disease-associated SNVs with over 90% precision. Many of these SNVs had previously been considered intractable.

The individual eccentricities of BEs tend to make the tools unpredictable: they can undershoot or overshoot the target position by a number of bases, producing a host of different products. David Liu, senior author of the paper, commented on the range of available options in a Broad Institute news post: “New base editors come out seemingly every week… The progress is terrific, but it leaves researchers with a bewildering array of choices for what base editor to use.”

To analyse BE activity, the researchers first built libraries of 38,538 paired single guide RNAs and target sequences and integrated them into mammalian cells. The group had already established that BE activity depends on the interplay of three factors: the particular base editor itself, the paired guide RNA, and the surrounding DNA sequence. They therefore reasoned that they needed to test each BE, paired with its guide RNA, against the DNA target sites in living cells and then sequence the treated cells to build a catalogue of how each BE altered its targets.

To analyse this catalogue, author and PhD candidate Max Shen designed an ML system that predicts the optimal BE to use for a given target DNA site. The model, BE-Hive, can also accurately predict previously unforeseen outcomes, including rare and potentially valuable transversion edits to the target sequence. These changes, in which, for example, a cytosine that would normally be converted to a thymine is instead converted to a guanine or an adenine, can be used to correct pathogenic transversion SNVs. The study corrected 174 such SNVs, again with over 90% precision.
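The difference between the expected edit and a transversion can be made concrete with a short Python sketch. This is an illustration only and is not part of BE-Hive; the function name `classify_substitution` is hypothetical. It relies on the standard definition: a transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), while a transversion swaps one class for the other.

```python
# Illustrative only: classify a single-base substitution.
# Not taken from BE-Hive; the function name is a hypothetical helper.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition', 'transversion', or 'no change' for ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"expected DNA bases A/C/G/T, got {ref!r} -> {alt!r}")
    if ref == alt:
        return "no change"
    # Both bases in the same chemical class means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    # C -> T is the canonical cytosine base editor outcome (a transition);
    # C -> G and C -> A are the rarer transversion outcomes.
    for alt in "TGA":
        print(f"C -> {alt}: {classify_substitution('C', alt)}")
```

Under this definition, the usual cytosine base editor product (C to T) is a transition, whereas the rarer C to G and C to A products are transversions.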

Finally, the team used the tool to engineer novel cytosine BE variants optimised to modulate editing outcomes. BE-Hive, the suite of ML algorithms that the team developed, has been made openly accessible through a web app to assist researchers who are designing CRISPR experiments.

In conclusion, the authors remarked on the significance of the work, which provides “both refined and novel insights into base editor functionality, advancing the targeting scope, biological understanding, precision, and overall effectiveness of base editing.”

Journal reference: Determinants of base editing outcomes from target library analysis and machine learning


