
How can proteogenomics inform pan-cancer analysis?

The central dogma of molecular biology states that DNA is transcribed into RNA, which is then translated into protein. But every rule has exceptions: post-transcriptional modifications and other regulatory mechanisms can affect this central process, weakening the correlation between RNA and protein levels. This means that transcriptomics alone cannot fully capture the proteomic landscape, especially in cancer cells.

Chad Creighton, Professor, Baylor College of Medicine, spoke at a recent Front Line Genomics webinar about pan-cancer mass-spectrometry-based proteogenomic analyses and the application of combining proteomics and genomics data to assess tumours. To hear Chad’s talk in full, as well as the other talks on ‘Harnessing Mass Spectrometry-Based Proteogenomics to Advance Precision Medicine’, please follow the webinar link.

Finding correlations

Pan-cancer analyses aim to define molecular subtypes across diverse tissue-based cancer types. Until now, transcriptomics has taken the lead in this field thanks to the early adoption of the relevant technology, leading to an abundance of data. However, recent updates to mass spectrometry methods have allowed large-scale proteomics analysis of different cancer types.

A number of projects have now made data from over 2,000 human tumours publicly available. These datasets comprise proteomics data alongside corresponding genomic and transcriptomic information, and span 14 different tissue-based cancer types.

Chad and his team compiled these data into a compendium to assess how well protein expression correlates with mRNA levels, and to find out whether pan-cancer molecular subtypes exist at one level but not the other, a finding that could influence treatment decisions.

Figure 1. Annotated gene sets of the ~10,000 genes included in the pan-cancer compendium. Red indicates high correlation between protein and mRNA levels and blue indicates poor correlation. Screenshot taken directly from FLG webinar.

They discovered that protein expression and mRNA levels broadly correlate, but with notable exceptions, namely in the humoral immune response.

Chad: When we look at this compendium, when we look across all these tumours, we find that protein expression broadly correlates with corresponding mRNA expression, but with notable exceptions… We find a median protein vs mRNA correlation R-value of about 0.4, so that’s a positive correlation. It’s not a perfect correlation, a far cry from a very high correlation. There’s lots of variation that is captured at the protein level but is simply not captured at the mRNA level.
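As a rough illustration of how a median protein vs mRNA correlation could be computed, the sketch below calculates a per-gene Pearson correlation across tumours and takes the median. It is not the team's pipeline: the choice of Pearson (the talk does not say which coefficient was used), the function name `per_gene_correlation`, and the simulated matrices are all assumptions. Real proteomics matrices also contain missing values, which this sketch does not handle.

```python
import numpy as np


def per_gene_correlation(protein, mrna):
    """Pearson r for each gene (row) across tumours (columns).

    Both inputs are genes x tumours matrices with matching row/column order.
    Assumes no missing values and non-constant rows.
    """
    protein = np.asarray(protein, dtype=float)
    mrna = np.asarray(mrna, dtype=float)
    # Centre each gene's profile across tumours.
    p = protein - protein.mean(axis=1, keepdims=True)
    m = mrna - mrna.mean(axis=1, keepdims=True)
    numerator = (p * m).sum(axis=1)
    denominator = np.sqrt((p ** 2).sum(axis=1) * (m ** 2).sum(axis=1))
    return numerator / denominator


# Simulated toy data (hypothetical): 200 genes, 100 tumours, with protein
# levels constructed to correlate with mRNA at roughly r = 0.4.
rng = np.random.default_rng(0)
n_genes, n_tumours = 200, 100
mrna = rng.normal(size=(n_genes, n_tumours))
noise = rng.normal(size=(n_genes, n_tumours))
protein = 0.4 * mrna + np.sqrt(1 - 0.4 ** 2) * noise

r = per_gene_correlation(protein, mrna)
median_r = float(np.median(r))
print(f"median protein vs mRNA correlation: {median_r:.2f}")
```

Looking at the full distribution of `r`, rather than only its median, is what would surface gene sets with unusually poor agreement between the two levels.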

Proteomics-based molecular subtypes

Proteomics-based molecular subtypes can provide insights into the pathways and processes that appear deregulated in tumour subsets. Chad and his team have published two studies to define these subtypes.

The first study, from 2019, identified 10 proteome-based molecular subtypes spanning 532 tumours from five tissues of origin. Using proteomics data together with mRNA-based molecular atlases across 32 cancer types from The Cancer Genome Atlas, they took the same sets of tumours and classified them by both mRNA-based and protein-based subtype. The matrix in Figure 2 describes the concordance between these subtypes.

Again, the team observed high correlation but with some exceptions. These included the K2 (adaptive immune response) and K3 (humoral immune response) proteomic subtypes, which overlapped with mRNA-based immune subtypes. However, these subtypes had distinctions that existed only at the protein level, despite looking homogeneous at the mRNA level.

Figure 2. Concordance between proteomic- and mRNA-based subtypes. Screenshot taken directly from FLG webinar.
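A concordance matrix of this kind is essentially a cross-tabulation of two labellings of the same tumours. The sketch below shows the idea on a handful of made-up tumours. The function name `concordance_table`, the mRNA class names ("immune", "other") and the label assignments are hypothetical and are not taken from the study.

```python
from collections import Counter


def concordance_table(protein_labels, mrna_labels):
    """Count tumours for every (protein subtype, mRNA subtype) pair.

    The two lists must describe the same tumours in the same order.
    Returns (row labels, column labels, table of counts).
    """
    if len(protein_labels) != len(mrna_labels):
        raise ValueError("label lists must have the same length")
    counts = Counter(zip(protein_labels, mrna_labels))
    rows = sorted(set(protein_labels))
    cols = sorted(set(mrna_labels))
    table = [[counts[(row, col)] for col in cols] for row in rows]
    return rows, cols, table


# Hypothetical labels for five tumours: two protein-based subtypes (K2, K3)
# fall into the same mRNA-based class.
protein_subtype = ["K2", "K2", "K3", "K3", "K1"]
mrna_subtype = ["immune", "immune", "immune", "immune", "other"]

rows, cols, table = concordance_table(protein_subtype, mrna_subtype)
for row, counts_for_row in zip(rows, table):
    print(row, dict(zip(cols, counts_for_row)))
```

In this toy table, K2 and K3 occupy the same mRNA column, which is the pattern described above: a distinction visible in the protein-based labels but not in the mRNA-based ones.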

As for gene expression-based signatures, K2 is associated with T cells and K3 with the complement cascade. Given their impact on the immune system, these proteomic distinctions could impact a patient’s chance of responding to therapy. This highlights a need to assess proteomics alongside other omics in order to get a full picture of cancer, given that these alterations don’t appear at the transcriptomic level.

Mining proteomics data for gene targets

Another study from the team involved searching for potential drug targets for uterine cancer.

They examined molecular correlates of cancer grade (which is indicative of prognosis and treatment efficacy) and looked at both protein- and mRNA-grade correlations for kinases, which are good drug targets. Some of the kinase correlations were, once again, only significant in the protein data.

Four of these kinases were selected for further study; three were seen to have a functional impact in cancer cell lines, including on cell viability. This highlighted the existence of associations that only appear in protein data and, ultimately, the value of the compendium, as researchers can combine datasets and identify genes with potential functional roles or therapeutic responses to certain drugs.

Chad: These experimental results provided a proof-of-concept for the resource value of our compiled proteomics results for guiding functional studies. Furthermore, publicly available proteomics and multiomics data from cancer cell line models may also aid in identifying therapeutic strategies for different cancer subsets.

Other applications of proteogenomics include assessing drug sensitivity, and Chad’s team have made their compendium available as part of the UALCAN data portal for ease of access.

Q&A Highlights:

Q: Are there any other (non-mass spectrometry-based) approaches to proteogenomics or are there any particular advantages to using mass spectrometry?

A: I’m an end-user so I don’t actually generate the proteomics data, but what’s great about mass spectrometry is that you get thousands of proteins. Recently, it’s ramped up. As a grad student, I remember maybe you would [only] get a few hundred.

I worked with Gordon Mills over several years looking at RPPA, and that’s an antibody-based platform. And I like both. RPPA is targeted; maybe you’ve got 200 proteins, but what’s nice about that is there are certain very important phosphoproteins, like the PI3K kinase pathway. Those are well represented in the RPPA platform. And sometimes mass-spectrometry doesn’t pick up the phosphoproteins, but you do get a lot of other proteins.

I think Apollo has been doing RPPA and mass-spec and that’s the best of both worlds, to use multiple platforms. Because each platform has features that the other platform doesn’t.

Q: For using protein and mRNA to identify candidates in uterine cancer, what is the value of looking at the correlations of protein and mRNA to identify candidates of disease outcome? Wouldn’t one omic by itself be enough?

A: I can think of two reasons why it’s good to do the integrative analysis. I think, historically, we just looked at mRNA data and kind of used it as a good surrogate for protein, because we didn’t have protein data in most cases. But I think if you’re looking for new targets, something understudied, some whole genome studies will come up with hundreds of significant genes at the mRNA level, and how many of these are actually, really significant? Well, the protein data can be a good filter. So, that can narrow the search space.

And if you’re looking for new targets, another reason would maybe be the false discovery rate. If you look at one omic platform, a nominal P-value is often not enough to say something is differentially expressed. So, you might have a set of genes with a 20% FDR rate, maybe 20% of these could be false. But if you take it to another platform and overlap the results, and it shows up in both, that really cuts down your expected FDR.
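The arithmetic behind this point can be sketched with a small worked example. Every number below is hypothetical, and the calculation assumes that false positives on the two platforms are independent and that each platform detects true hits independently. That is a strong simplification, since mRNA and protein measurements from the same tumours share biology and sample handling, so the real reduction would likely be smaller.

```python
def expected_fdr_single(n_true, power, n_null, p_false):
    """Expected false discovery rate on one platform."""
    true_calls = n_true * power
    false_calls = n_null * p_false
    return false_calls / (true_calls + false_calls)


def expected_fdr_overlap(n_true, power, n_null, p_false):
    """Expected FDR among genes called on BOTH platforms.

    Assumes the two platforms behave identically and err independently.
    """
    true_calls = n_true * power ** 2
    false_calls = n_null * p_false ** 2
    return false_calls / (true_calls + false_calls)


# Hypothetical scenario: 10,000 genes, 500 of them truly associated.
# Each platform finds 80% of the true genes (400) plus 100 false ones,
# i.e. 100 false calls out of 500 calls: a 20% FDR.
n_true, n_null = 500, 9500
power = 0.8
p_false = 100 / n_null

single = expected_fdr_single(n_true, power, n_null, p_false)
overlap = expected_fdr_overlap(n_true, power, n_null, p_false)
print(f"single platform FDR: {single:.1%}")
print(f"overlap of two platforms FDR: {overlap:.2%}")
```

Under these assumptions the overlap keeps about 320 true genes and roughly one false one, so the expected FDR falls well below 1%. The cost is reduced sensitivity, as the 80 true genes found by only one platform are dropped.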

Q: Have you got plans to use other omics – metabolomics, epigenomics etc.?

A: Yeah, we’d definitely like to get the full picture of cancer and for that you need to look at other omics. Metabolism pathways often come up, you see what the cancer cells are trying to do, but it doesn’t always show up at the protein level and that doesn’t necessarily reflect the level of metabolites. So, I’m sure as that data becomes available, we’ll definitely be looking at it.


Zhang, Y., Chen, F., Chandrashekar, D.S., Varambally, S., Creighton, C.J. Proteogenomic characterization of 2002 human cancers reveals pan-cancer molecular subtypes and associated pathways. Nature Communications. 13, 2669 (2022). Erratum in: Nature Communications. 13, 4688 (2022).

Chen, F., Chandrashekar, D.S., Varambally, S. et al. Pan-cancer molecular subtypes revealed by mass-spectrometry-based proteomic characterization of more than 500 human cancers. Nature Communications. 10, 5679 (2019).

Chen, F., Zhang, Y., Gibbons, DL., Deneen, B., Kwiatkowski, DJ., Ittmann, M., Creighton, CJ. Pan-cancer molecular classes transcending tumor lineage across 32 cancer types, multiple data platforms, and over 10,000 cases. Clinical Cancer Research 24, no. 9 (2018). pp.2182-2193.

Monsivais, D., Vasquez, YM., Chen, F., Zhang, Y., Chandrashekar, DS., Faver, JC., Masand, RP., Scheurer, ME., Varambally, S., Matzuk, MM., Creighton, CJ. Mass-spectrometry-based proteomic correlates of grade and stage reveal pathways and kinases associated with aggressive human cancers. Oncogene, 40(11) (2021). pp.2081-2095.