FLG: What is the UK Biobank and how does it contribute to biomedical research?
UK Biobank was established by the Medical Research Council and the Wellcome Trust at the start of the century with the ambition of recruiting half a million men and women aged 40–69 from all around the UK and following their health for the next 30+ years. The data resource is available to any researcher anywhere in the world to use for health-related research that’s in the public interest. There are over 10,000 researchers using it in about 1,500 ongoing projects.
FLG: How was the patient data collected and analysed?
Geography was carefully considered during recruitment to ensure that there was a heterogeneous mix of participants. Other measures were taken to ensure the involvement of typically difficult-to-recruit groups, such as younger men and those from lower socioeconomic backgrounds.
Volunteers completed detailed questionnaires about their lifestyle and gave medical history interviews, which are being supplemented with data from their NHS health records. A wide array of physical measurements was made, and biological samples were collected and stored for future assays.
Subsequently, web questionnaires have allowed the capture of additional information about lifestyle (such as occupational history and diet diaries), as well as about health outcomes that may be less well identified through health record systems (such as mood, depression and cognitive decline).
Genotyping of all half a million participants has already been completed, and samples have been assayed for 30-40 clinically relevant biomarkers (such as those indicative of diabetes or thought to be associated with various cancers). One interesting dataset that will be available soon is the results of telomere assays – significant since, for example, telomerase activity has been implicated in the aging process.
Regeneron, in alliance with several other industry partners, is undertaking exome sequencing of all of the participants’ samples. Data on the first 50,000 exome-sequenced participants have been made available to all researchers, and exome data for about 300,000 will be available during 2020. More recently, a consortium of the UK government, the Wellcome Trust charity and four companies has funded whole genome sequencing (WGS) of the entire cohort. Again, those data will be made available to all approved researchers during the next few years.
FLG: How is the UK Biobank able to provide this information to researchers while ensuring that patients’ privacy is protected?
Firstly, the data that researchers receive have had all the patient identifiers removed. Secondly, at UK Biobank we check the objectives of the researchers during the access application stage. They must indicate in their application what they will be doing with the data, and their institution must sign a material transfer agreement. This states that they must not try to identify any participant, and that breaching these terms will result in penalties (including no further access).
FLG: What have been the main technical challenges of the project?
As we add the WGS information to the resource over the next 2-3 years, the sheer scale of the data will limit the feasibility of sending them to approved researchers, and computing power would limit which groups could download them. To overcome these challenges, UK Biobank has been tasked with implementing a data storage and analysis platform on which researchers will be able to do their analyses. This will be important in democratising access, so that the resource is accessible to all researchers worldwide.
FLG: Can you talk a bit about how WGS could transform the research stemming from this resource?
The true beauty of this project is how we’re seeing scientists from all walks of life using these data for purposes that no single research team could manage … or even imagine. With the genotype data alone, we’ve seen different research groups approach the data in interestingly different ways. Consequently, we learn much more than would be the case if it was just being used by one group. We don’t constrain how the data can be used in ways that might have been typical in the past.
One of the aims of Biobank is to create problems! For example, when we decided to image 100,000 people, we created the problem of image data on a scale that was unprecedented. This has driven the development of automated algorithms to analyse those images rather than doing it manually, which is creating a lot of novel derived variables that can be used for research.
I think the sequence data will be the same. WGS data on half a million individuals are on a scale that no one has previously had to deal with. So, it’s important that they are in an accessible resource that allows as many people as possible to work out how to use them. That there will be interesting things that come out of these data is almost certain – what they will be is much less clear. We are creating a problem for the world’s scientific minds to solve – how to use a wealth of complex data to get clearer ideas about the many different causes of many different diseases.
FLG: One of the premises of the Festival of Genomics is to showcase all the amazing research that the UK is undertaking in genomics. How do you see the UK building on its status as a world leader in genomics and what role will the UK Biobank play in this?
It’s clear that in terms of research funding through both the charity and government sectors, the UK has taken an international lead. UK Biobank is creating opportunities for research, but careful consideration is now needed as to how the lessons learnt are incorporated into healthcare systems.
Sequence data may help to determine what treatments are given to particular patients, allowing more personalised treatment, particularly in cancer. But the idea that one could also use genotype-based genetic risk scores to identify a few percent of the population who are at equivalent risk to that of individuals with single-gene disorders (as, for example, with BRCA and breast cancer) – and that multiple genetic risk scores can identify perhaps as much as a quarter of the population who are at that level of risk for one or more common diseases – is very exciting. It could help direct screening services more effectively … one can imagine younger women who score as high risk being invited for breast cancer screening earlier. Likewise, the use of cardio-protective treatments might be initiated earlier in individuals with a genetic risk score equivalent to that of familial hypercholesterolemia.
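As a rough illustration of the genetic risk score idea described above (and not UK Biobank's or any research group's actual method), such a score is commonly computed as a weighted sum of the number of risk alleles a person carries at each scored variant, after which the highest-scoring fraction of the population can be flagged. The variant names, weights, genotypes and the 25% cut-off in this Python sketch are all hypothetical and chosen only to keep the example small.

```python
from typing import Dict, List


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2) over the scored variants."""
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


def top_fraction(scores: Dict[str, float], fraction: float) -> List[str]:
    """Return the IDs of the highest-scoring `fraction` of people (at least one)."""
    n_flagged = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_flagged]


# Hypothetical per-variant effect weights; real scores draw on far more variants.
WEIGHTS = {"variant_a": 0.30, "variant_b": 0.10, "variant_c": 0.25}

# Hypothetical genotypes: number of risk alleles carried at each variant.
COHORT = {
    "person_1": {"variant_a": 0, "variant_b": 1, "variant_c": 0},
    "person_2": {"variant_a": 2, "variant_b": 2, "variant_c": 1},
    "person_3": {"variant_a": 1, "variant_b": 0, "variant_c": 0},
    "person_4": {"variant_a": 0, "variant_b": 0, "variant_c": 1},
}

SCORES = {person: polygenic_risk_score(g, WEIGHTS) for person, g in COHORT.items()}

# Flag the top quarter of this toy cohort; a screening programme would choose
# its own threshold (the interview mentions "a few percent" for a single score).
HIGH_RISK = top_fraction(SCORES, fraction=0.25)
```

In this toy cohort only `person_2` is flagged. In practice the weights would come from large association studies, and any threshold for earlier screening or treatment would need clinical validation.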
So, I think that what we’re seeing in the UK is leadership from the research funders: committing large-scale investment in the generation of data for research, and now driving its implementation in the healthcare system. There are few places in the world that would be able to do that. UK Biobank is playing its role by being an international resource for researchers from all around the world to use. And increasingly, funding is being attracted from outside of the UK to help enhance UK Biobank – as with the sequencing projects – making it even more valuable for researchers.
FLG: In your opinion what is the most exciting thing happening in genomics right now?
The opportunity to implement polygenic risk scores in health care systems is something that could have a really big impact on the way in which we provide public health. As I mentioned, it’s potentially cost-neutral in that it could enhance the efficiency of screening by targeting those at high risk earlier and those at low risk later, or by using preventive treatments more efficiently. What’s exciting is that this is a deliverable from genetics which would have an enormous impact on population health. The findings being derived from UK Biobank are directly relevant to the UK population, and the NHS is well-placed to implement them.
FLG: Why have you chosen to speak at the Festival of Genomics this year?
We really do now have something to celebrate in terms of genetic findings that are ready to be introduced on a population scale, and the UK has been central to those discoveries – at least partially through research done using UK Biobank – and the NHS seems likely to be central to demonstrating the benefit of those findings for our population.