Using large-scale population cohorts to identify novel disease biomarkers

The value of large-scale population cohorts has come to the fore in recent years as these cohorts have matured. They now provide a wealth of longitudinal data linked to different disease and life outcomes, and research on these cohorts is proving invaluable for identifying risk factors for diseases such as cancer. Examples of these longitudinal studies include:

  • the UK Biobank,
  • the Avon Longitudinal Study of Parents and Children (ALSPAC), and
  • the European Prospective Investigation into Cancer and Nutrition (EPIC).

Marc Gunter, Professor and Chair in Cancer Epidemiology and Prevention at Imperial College London, is working on the proteomic aspect of the EPIC cohort and recently spoke at a Front Line Genomics webinar. In this article, we give an overview of the EPIC study and its potential for identifying novel biomarkers of chronic diseases, with quotes taken directly from the webinar. To hear Marc’s talk in full, as well as the other talks on Oncoproteomics as a tool to advance cancer diagnosis and therapy, please follow the webinar link.

That’s EPIC

The EPIC study is one of the largest cohort studies in the world. It was established between 1992 and 1999 and recruited ~520,000 people, aged 35 and over, from 10 European countries. The original aim was to capture variation in diet and lifestyle and how it is associated with cancer development. As of the last follow-up, over 70,000 people within the cohort have been diagnosed with cancer, a number that will only increase over time (see Figure 1 for cancer incidence).

Figure 1. Current incidence of cancers across the EPIC cohort. Image taken directly from FLG webinar (September 14th, 2023). Full credit – Prof. Marc Gunter.

EPIC also took pre-diagnostic blood samples at recruitment and has collected detailed epidemiological data on these individuals over the 20 years the study has been running. These blood samples, collected before disease onset, are a treasure trove for identifying biomarkers of disease onset. EPIC is now in a position to begin applying novel technologies, specifically multi-omics methodologies, to these existing blood samples, with the hope of identifying the underlying biological pathways that determine cancer risk.

Marc: “We now have this very exciting opportunity to apply some of these newer molecular technologies to these blood samples, with the hope of identifying, for example, biomarkers of cancer and underlying biological pathways, which underlie the association between many cancer risk factors.”

Proteomic disease biomarkers

Within EPIC, the overall goal is to create a growing resource of ‘omics’ data and biomarkers, with three aims. These are to identify:

  • Aetiological markers for cancer and other diseases
  • Early detection markers for cancer
  • Biomarkers/pathways associated with cancer survival

EPIC currently has 70,000 individuals with GWAS data, 40,000 with metabolomics data (targeted and untargeted) and 50,000 with proteomics data. For proteomics, a case-cohort study has been created within EPIC in which 30,000 participants are undergoing high-dimensional proteomic profiling (using the SomaScan 7K) to identify proteomic biomarkers of cancer and other chronic diseases (such as Alzheimer’s disease and diabetes).
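For readers unfamiliar with the case-cohort design mentioned above, the general idea can be sketched in a few lines of Python. This is only an illustration of the textbook design (a random subcohort drawn from the full cohort, plus every incident case, whether or not that case falls inside the subcohort). It is not EPIC's actual sampling procedure, and the function name, the toy data and the 10% sampling fraction are all hypothetical.

```python
import random


def draw_case_cohort(participants, subcohort_fraction, seed=0):
    """Select participants for profiling under a generic case-cohort design.

    participants: list of (participant_id, is_case) tuples (hypothetical format).
    Returns (subcohort_ids, profiled_ids), where profiled_ids is the random
    subcohort plus all cases, including cases outside the subcohort.
    """
    rng = random.Random(seed)  # fixed seed so the toy draw is reproducible
    ids = [pid for pid, _ in participants]
    n_subcohort = round(len(ids) * subcohort_fraction)

    # The subcohort is sampled at random, regardless of disease status.
    subcohort_ids = set(rng.sample(ids, n_subcohort))

    # Every participant who developed the disease is profiled as well.
    case_ids = {pid for pid, is_case in participants if is_case}

    return subcohort_ids, subcohort_ids | case_ids


# Toy cohort (not EPIC data): 1,000 participants, every tenth one a case.
toy_cohort = [(i, i % 10 == 0) for i in range(1000)]
subcohort, profiled = draw_case_cohort(toy_cohort, subcohort_fraction=0.1)
```

The appeal of this design is that the same random subcohort can serve as the comparison group for many different disease end-points, so expensive assays only need to be run on a fraction of the full cohort.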

These individuals have had 7,000 proteins profiled, with targets spanning every major biological pathway of interest (cell signalling, metabolism, immune signalling etc.). This approach has identified new protein markers of disease end-points for several diseases, including various cancers and type 2 diabetes.

Multiple proteins were found to be associated with the development of colon cancer in the EPIC study (see Figure 2). The ANTXR1 protein had a strong negative association. Small cohort studies had already suggested that this protein has a role in colon cancer, but this is the first time it has been shown at this scale. Immune signalling and complement proteins were also identified, along with other interesting markers. Rectal cancer is shown in Figure 2 as another example. Because of its lower incidence rate, the analysis has lower power, but it still produced some unique proteomic hits.

Marc: “Next steps are the replication of these results in independent sets … potentially working with other independent cohorts to replicate these findings. We’re particularly interested in following some of these hits up for potential causal pathways, using triangulation of genetic data and working with the experimental models.”

Future outlook

As these cohort studies start to deploy the latest technologies, this wealth of biological data will be a stunning resource for anyone interested in disease development and progression. The collaborative efforts of these groups mean that deep profiling data from millions of people across decades could soon be available for widespread use.

The value of these cohorts cannot be overstated, and it’s clear that continual monitoring of existing cohorts and the establishment of new cohorts (such as Our Future Health, UK or Connect, USA) are needed to truly get to the bottom of disease aetiology.

Figure 2. Proteomic markers associated with the development of Colon (left) and Rectal (right) cancer. Image taken directly from FLG webinar (September 14th, 2023). Full credit – Prof. Marc Gunter.


Webinar Questions and Answers

Q: Is there overlap between the protein hits for the different cancers and diseases?

A: Good question. There is. I can’t recall them offhand but there are certain proteins that come up for several different cancers. Many of them are related to the immune system and inflammatory pathways, which is not unexpected. We’ll be publishing some of this work quite soon and we’re aiming to have a paper where we look at the overlap between the diseases and also multi-morbidity, since we have patients with multiple diseases, so we aim to look at shared pathways between those.

Q: Is the EPIC database freely accessible to anyone?

A: The cohort is open to anyone to make a request for the data and to access the biological samples. If you go to the EPIC website, there is a particular form that needs to be completed, and it’s reviewed by the EPIC scientific steering committee. We have many collaborations from all over the world in which different groups have been accessing the EPIC data. We’re actually trying to make it more open access and more freely available.

To hear from the other speakers at this webinar and for the full Q&A please use this link.
