Proteogenomics in Drug Discovery: Appreciating the Big Picture

The use of genomics in drug discovery has been heralded as a revolutionary tool. With most drugs ultimately failing at some point in the pipeline, and an investment of 10-15 years and billions of dollars into those that do eventually make the cut, it has been suggested that the integration of genomic data could potentially double the likelihood of a drug making it to the clinic.

In our recent webinar series, we were joined by a group of experts who discussed the integration of functional genomics into drug discovery. Among these speakers was Ryan Dhindsa (Assistant Professor, PI, Baylor College of Medicine), who gave a presentation entitled ‘From genes to mechanism: population-level proteogenomics’. Below, we summarise Ryan’s exciting talk. To view the full webinar series, click here.

How can phenotype data inform drug discovery?

Data accumulated over the years show a clear benefit to using genomics in the drug discovery journey. This has been further revolutionised by the integration of phenotype data into the process. A great example of this is the UK Biobank cohort, where phenotype data is linked to genetic data, and phenome-wide association studies (PheWAS) using this information are an effective way to observe rare variation across phenotypes. In one such study performed by Ryan and his colleagues, the team found an interesting loss-of-function mutation in the MAP3K15 gene that protected against diabetes. Because the gene is located on the X chromosome, they were also able to look at gene dosage effects, which gave them further insight into its role. Protective loss-of-function mutations are promising targets for new therapeutics, and some of the best drugs on the market target these variants.

However, protective loss-of-function mutations are the exception, not the norm: genetic variants usually increase disease risk. When a variant increases risk, the disease mechanism needs to be understood, and functional genomics and disease models are a common way to do this. Using multi-omics data from humans is also a promising new avenue for gaining a better understanding of disease.

In a study that Ryan describes in this webinar, the team looked at rare variant associations found by assessing plasma protein levels in UK Biobank data. The plasma proteome can reflect the current state of human health because it characterises all proteins in the bloodstream, including those that are disease associated or hail from damaged cells, meaning it is an accurate reflection of phenotype. Additionally, plasma is one of the most accessible tissues, so finding biomarkers here is desirable for clinical reasons.

Study design

The team used the Olink Explore assay, which measures protein levels for 3,000 plasma protein analytes, as part of the UK Biobank Pharma Proteomics Project. This cohort also includes whole exome sequencing data for the majority of participants. The researchers performed rare variant studies, as in the previous PheWAS, looking at variant and gene level associations.

Figure 1. Flowchart describing study design. Screenshot taken directly from Front Line Genomics webinar, accessible here.

In a variant level association test, the team found over 5,000 rare genotype-protein level associations across 1,200 proteins. 80% of these associations had not been identified in a companion GWAS, highlighting the importance of whole exome and genome studies to find rare variation. The team then validated the results by looking at how cis-pQTLs (variants that change the level of the protein encoded by the same gene) affect protein levels. They found that over 97% of protein truncating or loss-of-function variants were associated with lower plasma protein abundance. The effects of trans-pQTLs (where the protein-altering variant is found in a separate gene) were more varied.
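
The cis/trans distinction above can be sketched in a few lines of Python. This is an illustrative sketch only, not the team's pipeline: the `Association` record, the gene names and the effect sizes are all made up, and it uses the same-gene definition of cis given in this article (other analyses may define cis by genomic distance instead).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical record for one variant-protein association."""
    variant_gene: str  # gene in which the rare variant lies
    protein_gene: str  # gene encoding the measured plasma protein
    beta: float        # effect on protein level (negative = lower abundance)


def classify(assoc: Association) -> str:
    """Label an association 'cis' if the variant lies in the protein's own gene."""
    return "cis" if assoc.variant_gene == assoc.protein_gene else "trans"


def fraction_lowering(assocs, kind):
    """Fraction of associations of the given kind that lower protein abundance."""
    selected = [a for a in assocs if classify(a) == kind]
    if not selected:
        return None
    return sum(a.beta < 0 for a in selected) / len(selected)


# Made-up records for illustration only; these are not results from the study.
toy = [
    Association("GENE_A", "GENE_A", -1.2),
    Association("GENE_B", "GENE_B", -0.8),
    Association("GENE_C", "GENE_A", 0.5),
    Association("GENE_C", "GENE_D", -0.3),
]

counts = Counter(classify(a) for a in toy)
cis_lowering = fraction_lowering(toy, "cis")
trans_lowering = fraction_lowering(toy, "trans")
```

Summarising the direction of effect separately for each class mirrors the validation step described above, where loss-of-function cis variants are expected to lower abundance while trans effects can go either way.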

The implications

The team looked at an allelic series in the gene NLRC4, in which gain-of-function variants have been previously linked to autoinflammatory syndromes. Using UK Biobank data, they found three mutations associated with changes to IL18 levels – some of which decreased expression and others of which increased it. They then explored how these mutations were linked to clinical outcomes in the previous PheWAS; none were associated with a serious clinical phenotype. This means that antagonism of the gene could have a favourable risk profile, and that a catalogue of missense pQTLs is important for understanding normal variation in the population. Additionally, knowing that these variants can alter levels of IL18 without causing a serious effect on phenotype could help in a diagnostic setting.

Gene level association tests led to the identification of another 500 gene-protein level associations and hotspots associated with downstream changes. Additionally, the team looked at whether pQTLs could help to better understand disease biology, specifically in the context of clonal haematopoiesis. They found several pQTLs that were associated with distinct clinical phenotypes. Having access to these pQTLs is also helpful for gene discovery, as missense variants can have very different effects on a protein.
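
The article does not say which gene level test was used. One common approach is a collapsing (burden-style) analysis, in which all qualifying rare variants in a gene are pooled into a single carrier indicator. The sketch below illustrates that idea under stated assumptions: the function names, participant IDs, variant IDs and protein levels are hypothetical, and a real analysis would fit a regression with covariates and compute a proper test statistic rather than a raw difference in means.

```python
from statistics import mean


def collapse_carriers(genotypes, qualifying_variants):
    """Return participants carrying at least one qualifying variant in the gene.

    genotypes: dict mapping participant ID -> set of variant IDs carried.
    """
    qualifying = set(qualifying_variants)
    return {pid for pid, variants in genotypes.items() if variants & qualifying}


def carrier_effect(protein_levels, carriers):
    """Difference in mean protein level between carriers and non-carriers."""
    carrier_vals = [v for pid, v in protein_levels.items() if pid in carriers]
    other_vals = [v for pid, v in protein_levels.items() if pid not in carriers]
    if not carrier_vals or not other_vals:
        raise ValueError("need at least one carrier and one non-carrier")
    return mean(carrier_vals) - mean(other_vals)


# Made-up data for illustration only.
genotypes = {
    "p1": {"v1"},
    "p2": set(),
    "p3": {"v2", "v9"},  # v9 is not a qualifying variant
    "p4": set(),
}
protein_levels = {"p1": -1.0, "p2": 0.5, "p3": -2.0, "p4": 0.5}

carriers = collapse_carriers(genotypes, ["v1", "v2"])
effect = carrier_effect(protein_levels, carriers)
```

Pooling variants in this way trades per-variant resolution for statistical power, which matters when each individual variant is carried by only a handful of participants.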

Ultimately, the work of Ryan and the rest of the team shows that the combination of functional genomics, phenotypic and multi-omics data, alongside an appreciation of the impact of rare variation, is necessary for understanding the full picture of human health and, therefore, the drug discovery process.

Ryan: ‘Rare genetic variation clearly plays a role in complex human traits. Linking EHR and sequencing data is transforming therapeutic target identification and large-scale multi-omics and functional genomics can help elucidate disease mechanisms at scale.’

Q&A Highlights

Do you anticipate PheWAS becoming more widely used than GWAS in therapeutic discovery?

I think rare variant studies will become more popular, or integrating both rare and common variants, because WGS now is more affordable, so you don’t have to limit yourself to common variants, assuming you have WGS data available.

You performed a proteogenomic analysis to understand disease mechanisms. Which of either cis or trans pQTLs would you recommend for targeting disease biology?

I think both are helpful. Cis associations are helpful if you’re trying to pick up an allelic series, because you can see the effect the mutations have on the protein. But I think the clonal haematopoiesis example is a nice one to show how trans associations can help identify which pathways are perturbed in a genetic setting.