
What are the implications of solving the ‘protein folding problem’ for genomics?

At the Festival of Genomics and Biodata 2021 we brought together four of the brightest minds in the field of protein folding, with decades of combined experience, to discuss the recent protein folding problem breakthrough and its implications for genomics.

50-year-old protein folding problem

Proteins are the building blocks of life – responsible for most functions within a cell. Their functions depend largely on their unique 3D structure. For decades, researchers have longed to understand how a protein’s constituent parts map out the many twists and folds of that structure. Late last year, Google AI offshoot DeepMind took a pioneering step towards solving this great challenge: determining a protein’s 3D shape from its amino acid sequence. In the 14th CASP (Critical Assessment of protein Structure Prediction) assessment, DeepMind’s AlphaFold 2 system achieved a median score of 92.4 GDT (Global Distance Test) across all targets. These results open the door for biologists to use computational structure prediction as a tool in critical research.
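To give a feel for what a GDT score measures, here is a minimal Python sketch of the commonly used GDT_TS variant: the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose predicted alpha-carbon position lies within that cutoff of the experimental position. It is an illustration only, not the CASP assessors’ code. It assumes the two structures have already been superimposed (the official procedure searches for the best superposition at each cutoff), and the function name `gdt_ts` and the toy coordinates are made up for this example.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    Assumption: `predicted` and `reference` are equal-length lists of
    (x, y, z) alpha-carbon coordinates in angstroms that have ALREADY
    been superimposed. The real GDT calculation also optimises the
    superposition for each cutoff, which this sketch skips.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")

    # Distance between each predicted residue and its experimental position.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]

    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Toy example (invented coordinates): four residues that are off by
# 0.5, 1.5, 3.0 and 9.0 angstroms respectively.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 9.0, 0.0)]
score = gdt_ts(predicted, reference)
print(score)  # 56.25
```

A perfect prediction scores 100 under this definition, so a median of 92.4 across targets means most residues in a typical target were placed very close to their experimentally determined positions.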

The genome is a bit dull

At our recent Festival we were joined by John Moult (Professor, University of Maryland and co-founder of CASP), John Jumper (Senior Staff Research Scientist, DeepMind), Dame Janet Thornton (Director Emeritus of EMBL-EBI and Senior Scientist, European Bioinformatics Institute) and Tim Hubbard (Professor of Bioinformatics, Head of Department of Medical and Molecular Genetics and Head of Genome Analysis, King’s College London and Genomics England), who discussed this recent breakthrough and what it means for genomics.

Although it was slightly unusual for a festival about genomics, Moult humorously started by stating: “To me the genome is a bit dull because all it does is store information that makes proteins.” As the co-founder of CASP, Moult described the protein folding problem and why he launched the assessment. CASP’s mission is to advance computational methods for the protein folding problem, and Moult described it as this field’s analogue of clinical trials. He noted that this is the first time these types of methods have solved a serious scientific problem.

Turning to the future of CASP, Moult said that the organisers have decided to start looking at RNA structures. However, because there is not enough ground truth data to test methods thoroughly, Moult believes disorder is still too difficult to incorporate.

Second time around

DeepMind’s AlphaFold 2 algorithm significantly outperformed other teams at CASP14 and also outperformed the previous version’s performance at the last CASP. Jumper discussed how this iteration of the algorithm differs from the team’s submission two years ago. He described how they were excited about solving this important protein problem to transform the genomics revolution by turning those sequences into structures. He also believes that this technology will lead to progress on other problems enabled by large-scale genomics work. He imagines that these techniques will make protein bioinformatics structural, so that we can think about things within a structural context, for example by mapping human variation onto proteins.

Jumper also discussed the limitations of the method, specifically in relation to dynamics. He explained that the team currently has no control over the state that is predicted; therefore, the method may not respond precisely to small energetic shifts that could be caused by mutations. Jumper suggested that this may limit the immediate prediction of variant effects, but not for very long. Nonetheless, the team is still investigating this and hopes to discuss it further in its upcoming paper explaining how the AlphaFold 2 algorithm was developed.

How big of a deal is it?

Thornton referred to AlphaFold 2 as a “tremendous achievement” and said she had not believed that she would see this problem solved within her lifetime. The algorithm has a broad range of implications across the life sciences. However, Thornton noted that it will first be important to validate it across many more structures. She argued that computational biologists should initially aim to learn from the successes of AlphaFold 2 and build similar or better predictors that can be made freely available to all. Ultimately, in Thornton’s view, this breakthrough will lead to an encyclopaedia of the structures of all known protein domains – the so-called Lego blocks of protein structures. This in turn would give us complete structural coverage of proteomes and also power future structural analyses.

Thornton said that the algorithm will have important applications in improving the prediction of protein interactions, which she believes will be widely used because interactions are numerous and essential for function. For the genomics community, Thornton explained that complete structures are currently available for only a few human proteins. This innovation will give us coverage of human protein structures, which will improve the interpretation of genetic coding variants for health and disease. It can help us tease apart benign and pathogenic variants. In the long term, Thornton suggested that this knowledge will improve our ability to design novel proteins with new or modulated functions, such as antibodies or green enzymes that clean up plastics.

Nonetheless, Thornton cautioned that these methods are computationally hungry, so the amount of computing power available in academia, medicine and environmental science will need to increase. She stated that we have opened Pandora’s box and believes that “it will have a huge impact going forward on the shape of life science research.”

Registration for on-demand access to watch this talk and all our other talks from the Festival will end on February 12th.

