No, this is not EVE from the Disney Pixar film WALL-E – but it is close. Researchers have developed an AI model called EVE that can interpret the meaning of gene variants in humans.
Gene variants and interpretation
With advances in sequencing technologies, there has been exponential growth in the amount of genetic variation identified within the human population. Understanding this genetic variation in the context of health and disease has the potential to transform healthcare. Through access to sequencing, researchers have been able to conduct studies that link genetic variation not only to diseases but also to biochemical and cellular phenotypes. However, linking specific genetic changes to disease phenotypes remains an open challenge due to the sheer number of variants in the human population.
Due to this challenge, researchers have developed several new experimental technologies that can assess the effects of thousands of mutations in parallel. However, these technologies do not easily scale to thousands of proteins. They also depend on the availability of assays that are relevant to human disease phenotypes.
To date, the overwhelming majority of protein variants in human disease-related genes still have unknown consequences. Researchers hope that computational methods can help close this gap by accelerating clinical variant interpretation at scale. However, state-of-the-art methods have typically relied on training machine-learning models on known disease labels. Unfortunately, these labels are sparse, biased and of variable quality, which has left these models insufficiently reliable.
An AI tool called EVE
In a recent study, published in Nature, researchers proposed an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. The team specifically modelled the distribution of sequence variation across organisms and then used these models to make predictions about the meaning of variations in human genes.
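To make the label-free idea concrete, here is a minimal sketch in Python. It is not EVE itself: EVE uses a deep generative model, whereas this toy version swaps in a much simpler model that only counts how often each amino acid appears at each position in a set of aligned sequences from different organisms. The alignment, the function names and the pseudocount are all made up for illustration. The underlying intuition is the same, though: a variant that evolution has rarely or never tolerated at a position receives a lower score than one that appears in other organisms, and no disease labels are needed.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_site_frequencies(alignment, pseudocount=1.0):
    """Estimate per-position amino-acid frequencies from aligned sequences.

    Hypothetical helper: a site-independent stand-in for a deep generative
    model. The pseudocount keeps unseen amino acids from having zero probability.
    """
    length = len(alignment[0])
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        # Only count standard amino acids (gap characters are ignored).
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs


def variant_score(freqs, wild_type, pos, mutant_aa):
    """Log-likelihood ratio of the mutant versus the wild-type residue.

    More negative means the substitution is less compatible with the
    variation observed across organisms.
    """
    return math.log(freqs[pos][mutant_aa]) - math.log(freqs[pos][wild_type[pos]])


# Toy alignment of the same protein region from five organisms (made-up data).
alignment = ["MKVLA", "MKVLG", "MRVLA", "MKILA", "MKVLA"]
wild_type = "MKVLA"
freqs = fit_site_frequencies(alignment)

# Position 0 is fully conserved (always M); position 1 tolerates K or R.
conserved_site_score = variant_score(freqs, wild_type, 0, "W")
variable_site_score = variant_score(freqs, wild_type, 1, "R")

print(f"M1W (conserved site): {conserved_site_score:.2f}")
print(f"K2R (variable site):  {variable_site_score:.2f}")
```

In this toy example, the change at the conserved position scores lower than the change at the variable position. A real model would also need to capture dependencies between positions, which a simple per-position count cannot do.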
Their model, EVE (evolutionary model of variant effect), outperformed computational approaches that rely on labelled data. It also performed on par with, if not better than, predictions from high-throughput experiments, which experts have typically used as evidence for variant classification. The model predicted the pathogenicity of more than 36 million variants across 3,219 disease genes. It also enabled more than 256,000 variants that are currently of unknown significance to be reclassified as either benign or disease-causing.
While the researchers note that EVE is not a diagnostic test, they believe it could be used to augment current clinical methods that are used to determine the meaning of genetic variants. They also emphasise that when used in combination with such tools, the model could boost the precision and accuracy of diagnostic, prognostic and treatment decisions.
Although this type of modelling is still in its infancy, the researchers plan on extending their work beyond protein-coding regions. In the meantime, the researchers are also working on making clinical use of the genetic variation for which we have some understanding. For example, the team is participating in the Atlas of Variant Effects Alliance, a global research effort which aims to map the effects of variation across the genome and create a comprehensive atlas of all possible human gene variants and their effects on protein function and physiology.