Making sense of Proteogenomic Data for Human Cancer

Multi-omics data is essential to advancing cancer diagnosis and treatment, but making sense of that data (proteogenomic data, for example) is a challenge in itself. Fortunately, many experts in the field are developing tools and approaches that make the task easier for all of us.

As part of our recent multi-omics webinar series, Prof. Bing Zhang, Professor of Molecular and Human Genetics at Baylor College of Medicine, presented his group’s recent work on making sense of proteogenomic data for human cancer.

In this article, we give an overview of these proteogenomic maps of human cancer and of LinkedOmicsKB, the new web portal built by Bing Zhang’s group, which enables easy and effective multi-omics data analysis and visualisation.

To hear the full set of talks on-demand from our 3-part MULTI-OMICS ONLINE webinar series, please follow the link.

Bringing multi-omics to cancer research

Cancer is a disease of genetic mechanisms, yet many processes downstream of the genome influence the cancer phenotype. Integrating mass spectrometry (MS)-based proteomic data with next-generation sequencing (NGS) genomic and transcriptomic data gives a more comprehensive understanding of human cancer (Figure 1), which in turn helps improve cancer diagnosis and treatment.

Figure 1. Proteogenomics – from DNA to cancer phenotype. Image taken directly from FLG webinar (January 23rd, 2024), full credit – Prof. Bing Zhang.

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a major initiative in this form of cancer proteogenomics. The consortium gathers blood, tumour and adjacent normal samples from patient cohorts across a variety of cancer types and profiles them with a range of proteogenomic assays, including:

  • Whole genome sequencing
  • Whole exome sequencing
  • Methylation
  • miRNA
  • RNA-seq
  • Global proteomics
  • PTM proteomics

Proteogenomic Atlas for cancer

In 2023, CPTAC performed a standardised reanalysis of all its multi-omics data to generate a harmonised multi-omics map across 10 cancer types and >1000 patients [1]. This data is available in the Proteomic Data Commons. However, accessing and using it is challenging given its size.

Hence, Bing’s group have released a new computational web portal, LinkedOmicsKB, to assist with using this data. Through this portal, you can search for a gene, mutation or phenotype of interest and directly access the associated data (Figure 2) [2].

Figure 2. Overview of LinkedOmicsKB. Image taken directly from FLG webinar (January 23rd, 2024), full credit – Prof. Bing Zhang.

The next question is how to interpret these results. This is where Bing’s group have introduced new visualisation methods within their portal. These methods include:

  • A pan-cancer multi-omics Manhattan plot. This plot maps protein, RNA and somatic copy number variation (SCNV) data alongside phenotypes and cancer types to find significant markers for specific cancers.
  • An interactive table and heatmap viewer to visualise how gene/SCNV/protein measurements correlate with a particular outcome.
  • A correlogram to visualise cis-associations, i.e., how correlated protein, RNA, SCNV and methylation are for a particular outcome.
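
To make the idea behind the correlogram concrete, here is a minimal sketch of the underlying calculation: pairwise Pearson correlations between omics layers measured for a single gene across the same samples. This is an illustration only and not LinkedOmicsKB code. The function names (`pearson`, `cis_correlations`), the layer labels and the toy values are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for a constant layer
    return cov / sqrt(vx * vy)


def cis_correlations(layers):
    """Pairwise correlations between omics layers of ONE gene.

    `layers` maps a layer name to per-sample values, with every layer
    listed in the same sample order.
    """
    return {
        (a, b): pearson(layers[a], layers[b])
        for a, b in combinations(layers, 2)
    }


# Toy values for one hypothetical gene across five samples (not real data).
toy_gene = {
    "SCNV": [-0.5, 0.0, 0.2, 0.6, 1.0],
    "RNA": [-1.0, 0.0, 0.4, 1.2, 2.0],  # exactly 2 x SCNV in this toy case
    "protein": [-0.2, 0.1, 0.0, 0.5, 0.7],
}

for pair, r in cis_correlations(toy_gene).items():
    print(pair, round(r, 3))
```

A correlogram is essentially a plot of such a pairwise table. A real analysis would also need to handle missing values and report significance, which this sketch leaves out.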

Overall, this portal presents a valuable data resource, with 40,000 web pages dedicated to genes, proteins, mutations and more, alongside user-friendly tools for data visualisation, exploration and analysis.

Use Cases of the Proteogenomic Atlas

The first use case involves CALHM5, an underappreciated druggable protein target. In the proteogenomic portal, CALHM5 was associated with cancer markers at both the RNA and protein level across all cancer types. Pathway enrichment analysis found this protein enriched in pathways linked to cancer progression.

Furthermore, one phosphorylation site was found for this protein, which was over-expressed across all cancer types. The protein kinase most correlated to this site was PRKG1, and this presented a possible drug candidate.

For the next use case, the group compared the general molecular differences between tumour tissue and normal tissue, and found several genes in which mRNA and protein were not correlated. This discordance ran in both directions: in some genes protein levels were higher than RNA levels, and in others the reverse (Figure 3).
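
One simple way to picture this comparison is to compute, for each gene, the tumour-versus-normal shift at the RNA level and at the protein level, and then flag genes where the two shifts point in opposite directions. The sketch below does this on toy data. It is a simplification under stated assumptions, not the group's method: the function names, the `min_shift` threshold, the gene labels and the values (imagined as log-scale abundances) are all hypothetical, and a real analysis would use a proper statistical test rather than a difference of means.

```python
from statistics import mean


def tumour_vs_normal_shift(values):
    """Difference in mean abundance (tumour minus normal) for one gene."""
    return mean(values["tumour"]) - mean(values["normal"])


def discordant_genes(rna, protein, min_shift=0.5):
    """Flag genes whose RNA and protein shifts have opposite signs.

    `rna` and `protein` map gene -> {"tumour": [...], "normal": [...]}.
    `min_shift` is an arbitrary illustrative cut-off on the absolute shift.
    """
    flagged = {}
    for gene in rna.keys() & protein.keys():
        rna_shift = tumour_vs_normal_shift(rna[gene])
        prot_shift = tumour_vs_normal_shift(protein[gene])
        opposite = rna_shift * prot_shift < 0
        large = abs(rna_shift) >= min_shift and abs(prot_shift) >= min_shift
        if opposite and large:
            flagged[gene] = (rna_shift, prot_shift)
    return flagged


# Toy data for two hypothetical genes (not real measurements).
toy_rna = {
    "GENE_A": {"tumour": [-1.0, -0.8, -1.2], "normal": [0.0, 0.1, -0.1]},
    "GENE_B": {"tumour": [1.0, 1.1, 0.9], "normal": [0.0, 0.1, -0.1]},
}
toy_protein = {
    "GENE_A": {"tumour": [1.0, 1.2, 0.8], "normal": [0.0, 0.1, -0.1]},
    "GENE_B": {"tumour": [0.9, 1.0, 1.1], "normal": [0.0, 0.1, -0.1]},
}

flagged = discordant_genes(toy_rna, toy_protein)
print(flagged)  # only GENE_A moves in opposite directions at RNA and protein level
```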

One example of these dysregulated markers is SMARCA5. In tumours, the protein was highly over-expressed while the RNA was in fact downregulated. A closer look at this gene showed almost zero correlation between RNA and protein levels across all cancer types. The best correlate of SMARCA5 protein levels was the level of another protein, BAZ1B, which turns out to be a binding partner of SMARCA5. This may explain why SMARCA5 protein levels remain high when its RNA levels do not: binding to BAZ1B may protect the protein from degradation.
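
The "best correlate" search described here can be sketched as ranking candidate features by the strength of their correlation with a target protein across samples. The example below is illustrative only: `rank_correlates`, the feature labels and the toy values are assumptions, and it does not reproduce how LinkedOmicsKB computes or ranks its associations.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def rank_correlates(target, candidates):
    """Sort candidate features by absolute correlation with the target.

    `target` holds per-sample values for the protein of interest;
    `candidates` maps a feature name to values in the same sample order.
    Returns (name, r) pairs, strongest association first.
    """
    scored = [(name, pearson(target, vals)) for name, vals in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Toy abundances across five samples (hypothetical, not real data).
target_protein = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_candidates = {
    "own_rna": [2.0, -1.0, 0.5, -1.0, 2.0],          # uncorrelated by construction
    "partner_protein": [1.1, 1.9, 3.2, 3.9, 5.1],    # tracks the target closely
    "unrelated_protein": [0.3, -0.2, 0.1, 0.4, -0.1],
}

for name, r in rank_correlates(target_protein, toy_candidates):
    print(f"{name}: r = {r:.3f}")
```

In this toy setup the partner protein ranks first while the gene's own RNA shows no correlation, mirroring the pattern described for SMARCA5 and BAZ1B. Ranking by absolute value means strong negative associations would also surface, which is a design choice of this sketch.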

Figure 3. Example of the various visualisation tools from LinkedOmicsKB for the SMARCA5 use case. Image taken directly from FLG webinar (January 23rd, 2024), full credit – Prof. Bing Zhang.


1. Li, Y. et al. Pan-cancer proteogenomics connects oncogenic drivers to functional states. Cell 186, 3921-3944.e25 (2023).

2. Liao, S. et al. Integrated Spatial Transcriptomic and Proteomic Analysis of Fresh Frozen Tissue Based on Stereo-seq. bioRxiv, 2023.04.28.538364 (2023).