The term ‘proteomics’ was coined in 1996 by Marc Wilkins to describe the large-scale analysis of all the proteins in a cell, tissue or organism. The goal of proteomics was to obtain a more holistic view of biology than could be gained by studying each protein individually.
Proteomics has evolved considerably over the past decades. This is mostly due to the growth of protein and DNA databases, the algorithms developed to search through all the information generated, and improvements in technologies such as mass spectrometry. Today, proteomics is important for early disease diagnosis and monitoring. It also plays a crucial role in identifying target molecules for drug discovery and is used to understand complex gene functions.
Why is proteomics useful?
Often, the most useful insight cannot be obtained by only studying genes – much more can be found out by looking at proteins too. The proteome is highly dynamic, as proteins can be modified in response to internal and external cues and different proteins are constructed by the cell as circumstances change. This is why proteomics examinations can be described as a ‘snapshot’ of the protein environment at any given time.
Applications of proteomics
Annotation of the genome
Genomic information needs to be integrated with data obtained from protein studies to confirm the existence of a particular gene.
Protein modifications and post-transcriptional control can elucidate the underlying mechanisms for several biological processes. Proteomics enables these changes in the proteins expressed by a cell to be analysed simultaneously.
Mislocalization of proteins is known to impair cellular function and can even contribute to diseases such as cystic fibrosis. Proteomics can identify where proteins are located, helping to create a 3D map of the cell and providing insight into protein regulation.
Cell growth, cell death and the cell cycle all involve signal transduction through protein-protein interactions. Proteomics can map these interactions using methods such as the yeast two-hybrid system, in which an interaction between two proteins reconstitutes a transcription factor that binds an upstream activating sequence and switches on a reporter gene.
Protein structure determination
Protein folding is essential for protein function, and hence cell function, and proteomics studies can be used to predict protein structures from sequences. Investigating the structure-function relationship within individual proteins, along with their interactions with other proteins, will help to locate genome modifications and to reveal the effects of disease-causing mutations.
Types of proteomics
- Expression proteomics
- Structural proteomics
- Functional proteomics
Expression proteomics is the quantitative comparison of protein expression throughout the entire proteome of different samples. Essentially, all the changes in protein expression are investigated.
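As a rough illustration of what such a quantitative comparison can look like, the sketch below computes a log2 fold change for each protein measured in two samples. The protein names, abundance values and the pseudocount are hypothetical and chosen only for illustration; real studies add normalisation and statistical testing on top of this.

```python
import math

def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return {protein: log2(treated / control)} for proteins found in both samples.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for proteins with zero measured abundance.
    """
    changes = {}
    for protein in control.keys() & treated.keys():
        ratio = (treated[protein] + pseudocount) / (control[protein] + pseudocount)
        changes[protein] = math.log2(ratio)
    return changes

# Hypothetical abundances in arbitrary units, not real measurements
control = {"PROT_A": 99.0, "PROT_B": 399.0, "PROT_C": 49.0}
treated = {"PROT_A": 399.0, "PROT_B": 99.0, "PROT_C": 49.0}

changes = log2_fold_changes(control, treated)
for protein in sorted(changes):
    print(protein, round(changes[protein], 2))
```

A positive value indicates higher expression in the treated sample, a negative value lower expression, and a value near zero no change.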
Structural proteomics, or cell-map proteomics, maps out the structure of the proteins present in a specific cellular organelle. It allows for the identification of all proteins, determines where they are located and characterizes all of their interactions. In turn, this helps to understand the overall architecture of cells and provides an explanation for why certain proteins result in unique phenotypes.
Functional proteomics is very broad – it focusses on any specific and directed proteomics approach. Most emerging research in this area is aimed at discovering the biological function of unknown proteins and defining cellular mechanisms at the molecular level. Proteome mining is a functional proteomics approach that is used to extract as much protein information as possible.
A mind map of the types of proteomics and their applications. Image credit: P. Graves and T. Haystead, 2002
A typical proteomics experiment: Step-by-step
- Step #1: Separation and isolation of proteins
- Step #2: Gain protein information
- Step #3: Database utilization
Separation and isolation of proteins
Complex protein mixtures must be resolved into individual components to allow proteins to be visualized, identified and characterized. A popular technology for protein separation and isolation is called polyacrylamide gel electrophoresis, which separates proteins depending on their size, structure and molecular weight.
Gain protein information
Mass spectrometry is a method used to gain structural information about a protein, such as peptide masses or amino acid sequences. It can also determine the type and location of protein modifications. There are three stages to mass spectrometry – sample preparation, sample ionization and mass analysis. The results can be used to identify proteins by searching DNA and protein databases.
Database utilization
Databases enable protein structure information to be used for protein identification. They require high-quality data and optimised searching methods. Peptide mass fingerprinting, amino acid sequence searching, de novo peptide information searching and uninterpreted mass spectrometry data searching are all possible approaches, depending on the type of data that has been generated and the application of the proteomics study.
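Peptide mass fingerprinting can be sketched in a few lines. The example below assumes trypsin as the protease (cleaving after K or R unless the next residue is P), uses standard monoisotopic residue masses, and searches a two-entry ‘database’ whose sequences and names are invented for illustration. The observed masses are simulated from one candidate with a small offset rather than taken from a real spectrum, and the simple match count stands in for the statistical scoring that real search engines use.

```python
# Monoisotopic residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_digest(sequence):
    """Cleave after K or R, except when the following residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def fingerprint_score(observed_masses, sequence, tolerance=0.02):
    """Count observed masses that match a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        1 for obs in observed_masses
        if any(abs(obs - theo) <= tolerance for theo in theoretical)
    )

# Hypothetical database; sequences are invented for illustration
database = {
    "candidate_1": "MKTAYIAKQRPISFVK",
    "candidate_2": "GASPVTKCLNDEQMHFR",
}

# Simulated measurement: candidate_1 peptides with a 0.005 Da error
observed = [peptide_mass(p) + 0.005 for p in tryptic_digest(database["candidate_1"])]

scores = {name: fingerprint_score(observed, seq) for name, seq in database.items()}
best_match = max(scores, key=scores.get)
print(scores, best_match)
```

The candidate whose theoretical peptide masses explain the most observed peaks is reported as the best match.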
Mass spectrometry revolutionised proteomics
The first major technology for the identification of proteins was Edman degradation, introduced in 1949, which involves the N-terminal sequencing of proteins. Until the early 1990s, protein sequencing was mainly accomplished by Edman degradation. However, developing membranes that are compatible with the sequencing chemicals proved laborious, and the applications were often limited by the N-terminal modification of proteins – it is challenging to determine whether a protein is N-terminally blocked before sequencing, meaning samples were often lost in failed attempts.
Therefore, today most proteomics experiments rely on mass spectrometry for protein analysis. Over the last decade, the sensitivity and accuracy of mass spectrometry have increased greatly, so it is not surprising that there is now a great deal of literature considering how the tool can be applied to proteomics.
Step-by-step: Mass spectrometry
- Step #1: Sample preparation
- Step #2: Sample ionization
- Step #3: Mass analysis
Typically, a protein is isolated from a mixture using polyacrylamide gel electrophoresis. Extracting whole proteins from this gel is often laborious and inefficient, so a protease is added to digest the protein within the gel, and the resulting peptides are extracted instead. This process is called ‘in-gel digestion’. Next, the peptides need to be purified to remove any contaminants, such as salts, buffers and detergents. The peptides often also need concentrating, which can be done through reverse-phase liquid chromatography.
Molecules must be dry and charged before being analysed by mass spectrometry. Therefore, the peptides are converted into ions by the addition or loss of one or more protons. ‘Soft’ ionization methods are often preferred, as they allow the formation of ions without compromising sample integrity, which is important for obtaining information about the proteins in their native states. Electrospray ionization and matrix-assisted laser desorption/ionization are both soft ionization methods.
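The effect of protonation on what the instrument measures can be shown with a short calculation. This sketch assumes positive-mode ionization, where a peptide of neutral mass M carrying z extra protons is observed at m/z = (M + z × 1.007276) / z. The neutral mass of 1000.0 Da is a hypothetical value used only for illustration.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons

def mass_to_charge(neutral_mass, charge):
    """Return m/z for a peptide that has gained `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

# Hypothetical peptide with a neutral mass of 1000.0 Da
for z in (1, 2, 3):
    print(z, round(mass_to_charge(1000.0, z), 4))
```

The same peptide therefore appears at a different m/z value for each charge state, which is why the charge has to be taken into account when converting a peak back to a mass.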
Typically, mass spectrometers consist of an ionization source, one or more mass analysers, an ion mirror and a detector. Mass analysis, carried out by mass analysers, resolves the molecular ions on the basis of their mass-to-charge ratio in a vacuum. There are several different types of mass analysers and mass spectrometers, each suited to different study designs. Peptide ions are introduced into a collision chamber, where they interact with either nitrogen or argon gas and undergo fragmentation. The type of fragmentation that the peptides undergo can then be used to determine what type of ions have been generated. For example, after fragmentation, if the charge is retained on the C-terminal fragment, the fragment is termed a y-ion. Alternatively, if the charge is retained on the N-terminal fragment, it is a b-ion. The mass differences between consecutive y- or b-ions are then used to identify the amino acid sequence.
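The way a sequence is read from a fragment ladder can be sketched as follows. The example assumes singly charged fragments and standard monoisotopic residue masses, and uses an invented peptide, SAMPLER, purely for illustration. It computes the theoretical b- and y-ion series and then recovers residues from the mass gaps between neighbouring ions; it is a simplified model, not a full de novo sequencing method.

```python
# Monoisotopic residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276

def fragment_ions(peptide):
    """Return singly charged b-ion and y-ion m/z lists (b1.., y1..)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_series, y_series = [], []
    running = 0.0
    for mass in masses[:-1]:            # N-terminal fragments
        running += mass
        b_series.append(running + PROTON)
    running = 0.0
    for mass in reversed(masses[1:]):   # C-terminal fragments
        running += mass
        y_series.append(running + WATER + PROTON)
    return b_series, y_series

def read_sequence_from_ladder(ladder, tolerance=0.01):
    """Infer residues from the mass gaps between consecutive ions."""
    residues = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        gap = heavier - lighter
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tolerance]
        residues.append("/".join(matches) if matches else "?")
    return residues

# Hypothetical peptide used only for illustration
b_ions, y_ions = fragment_ions("SAMPLER")
print(read_sequence_from_ladder(b_ions))  # read in the N-to-C direction
print(read_sequence_from_ladder(y_ions))  # read in the C-to-N direction
```

Leucine and isoleucine have identical masses, so the gap between two ions cannot distinguish them and the sketch reports both as candidates.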
More recently, protein microarrays have been established for high-throughput and rapid protein expression analysis. These technologies have the ability to characterize hundreds of thousands of proteins in parallel. However, these tools have not yet progressed enough to easily explore the function of a complete genome. Nevertheless, these platforms are likely to find their way increasingly into proteomics studies in the near future.
AlphaFold advancing proteomics
The AlphaFold Protein Structure Database is a freely available library of information consisting of hundreds of thousands of protein sequences and their structures. The collaboration between DeepMind, an AI technology company that’s part of Google, and several other research partners resulted in a significant leap forward in our understanding of the proteome. Essentially, the AlphaFold algorithm was shown to predict the structure of proteins accurately within a matter of days. Historically, this was a hugely complex task that usually took years to complete, partly because of the computing resources required, which also made the procedure extremely expensive.
Today, the protein structural coverage has been expanded to almost the entire human proteome using AlphaFold – 98.5% of human proteins. Learn more about AlphaFold by checking out: AlphaFold 2 Open up Protein Structure Prediction Software for All.
Image credit: Biocompare