A new review paper, published in the Journal of Molecular Biology, has investigated the impact of advanced protein structure prediction technology on our knowledge of protein function. Computational biology is now stronger than ever, but will it actually answer unresolved questions in the field?
Protein structure prediction
Since the first human genome was sequenced, sequencing technology has progressed rapidly. An individual can now have their genome sequenced for under $1000, and sequencing is routinely performed in clinical and research settings. As a result, thousands of genetic variants can be uncovered every day. A major current challenge is to make sense of these variants in terms of their pathology or biological mechanisms.
3D structures of proteins can provide insights into how genetic changes impact protein function. However, until recently, these structures were not always accurate or even available. Google AI offshoot DeepMind recently went a long way towards solving this problem with its algorithm AlphaFold2, which predicts 3D protein structure from amino acid sequence with unprecedented accuracy.
It was thought that the arrival of AlphaFold2 would solve the problem of understanding how genetic variants affect protein function, and how this relates to phenotype or disease. Surprisingly, the authors of the review argue that it will have little to no impact on current approaches to this problem.
Structure prediction technology was able to advance so quickly mainly because of the large number of known structures available to train in silico methods. In other words, there was already an abundance of structures to work from. This, together with the limitations that even the most advanced prediction approaches have, is why the authors of this paper were sceptical about AlphaFold2 having a major impact. In addition, the structural resolution required to assess variant impact is high, perhaps even too high for AlphaFold2!
Unstructured and disordered regions
AlphaFold2 is sophisticated enough to predict structures for large repeat proteins, which cause difficulty for other methods because repeats are often missed. However, AlphaFold2 cannot confidently predict the regions between distinct repeats, which affects the prediction of the overall conformation of the full-length protein.
These linker regions are often unstructured or disordered. They are not generally thought to adopt one single structure, and their structures are mostly unknown, which makes them very difficult to predict. Though they may seem unimportant, these regions could play large roles in biological function. For example, genetic variants that disrupt unstructured regions may be key to understanding some diseases.
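For readers who want to see where a prediction is weakest, AlphaFold2 models report a per-residue confidence score (pLDDT, from 0 to 100), which is written into the B-factor column of the output PDB file. The Python sketch below is purely illustrative and is not taken from the review paper. The function names, the toy input and the cut-off of 70 are assumptions made for this example. It reads the score from each alpha-carbon (CA) line of a single-chain model and reports runs of consecutive low-confidence residues, which is where disordered linkers would typically show up.

```python
def toy_ca_line(resnum, plddt):
    """Build one fixed-column PDB ATOM line for a CA atom (made-up toy data)."""
    return (
        f"ATOM  {resnum:5d}  CA  GLY A{resnum:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


def per_residue_plddt(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms.

    Assumes a single-chain model in standard fixed-column PDB format.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])    # columns 23-26: residue number
            scores[resnum] = float(line[60:66])  # columns 61-66: B-factor
    return scores


def low_confidence_segments(scores, threshold=70.0):
    """Return (first, last) residue ranges whose pLDDT is below the threshold.

    The default of 70 is an assumed cut-off for "low confidence".
    """
    segments = []
    start = prev = None
    for resnum in sorted(scores):
        if scores[resnum] < threshold:
            # Start a new run if none is open or the numbering is not contiguous.
            if start is None or resnum != prev + 1:
                if start is not None:
                    segments.append((start, prev))
                start = resnum
            prev = resnum
        elif start is not None:
            # A confident residue closes the current low-confidence run.
            segments.append((start, prev))
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments


# Toy model: two confident stretches joined by a low-confidence linker.
toy_pdb = "\n".join(
    [toy_ca_line(i, 92.0) for i in range(1, 5)]
    + [toy_ca_line(i, 45.0) for i in range(5, 8)]
    + [toy_ca_line(i, 88.0) for i in range(8, 11)]
)

if __name__ == "__main__":
    print(low_confidence_segments(per_residue_plddt(toy_pdb)))
```

A low score only says that the model is unsure about a region. It does not prove that the region is disordered in the cell, and it says nothing about whether a variant there is damaging.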
Experimental vs predicted protein structure
To solve the problems discussed, even stronger computational methods of structure prediction will be required. However, because of the cost and environmental impact of such computation, the authors suggested that it may be better to solve protein structures experimentally rather than predict them. The proteins currently predicted with such high accuracy are relatively small, typically single globular proteins or domains. To scale up, huge computational needs and costs would have to be met.
The paper argued that even if experimental methods are just as costly, ultimately an experimental structure would be more valuable than any prediction. This is because an experimental structure offers the opportunity to study the protein in greater detail.
Structural information is rarely enough to assess whether genetic variants are having a damaging effect on protein function. Contextual knowledge, such as reaction mechanisms and other bound molecules, is usually also required. An ideal solution would be a tool that can study precise side-chain orientations, non-protein molecules and system dynamics.
The authors suggested that in the future, focus should shift to the challenges that stand in the way of understanding genetic variants. Perhaps advanced molecular dynamics approaches can be used with AlphaFold2 predictions to understand how structures change in response to cellular conditions. This could then help to determine whether particular versions of proteins are active, or whether variants are oncogenic. It is hoped that solutions to these issues will be developed in the years to come, paving the way for an even stronger understanding of proteins.
Image: Pleiotrope on Wikimedia Commons