Karoline Kuchenbaecker, Professor of Genetic Epidemiology at University College London, joins us to discuss her work to increase representation in genetics studies and her broader research into the genetics of complex traits such as major depressive disorder.
Please note the transcript has been edited for brevity and clarity.
FLG: Hello, and welcome to the latest “A Spotlight On” interview. Today, I’m joined by Karoline Kuchenbaecker, Professor of Genetic Epidemiology at UCL. She’s going to talk to us about her research into the genetics of complex traits, and the importance of increasing diversity in genetic studies. So, without further ado, Karoline, could you introduce yourself and perhaps give us an overview of your work?
Karoline: Hi Lauren, thanks a lot. My name is Karoline, and I am a genomics researcher at UCL, as you mentioned. I’ve been conducting research about the genetic causes of diseases for quite a long time, and I’ve actually worked on different diseases over time to do this. The main focus, or the one thing that connects all of my work, is questions around diversity (especially people of different ancestry) and whether that matters and how it could impact findings in genomics – especially when we think about genomic applications for medicine.
FLG: Brilliant, thank you for that. I’d love to know how you first got into this line of work?
Karoline: Actually, that’s a little bit of an anecdote. I mean, I got into genomics in general, just because I thought it was really interesting and so new – and I studied psychology previously alongside statistics. But they’re all relatively older fields. Genomics, on the other hand, was so exciting, because you could find new things. But this diversity question emerged, literally on one particular day, when I was teaching in Cambridge, and I discussed how we do these genetic studies. There was a student who had a huge impact on me – I was explaining how we’re looking at ancestry in these studies and how we try to use genomics to infer the ancestry a person has, and then we usually exclude every single person who has any non-European ancestry. At the time, it was just completely routine – that’s how genetic studies were done then and are still done today. And that student I mentioned, who herself was not White, raised her hand and was completely shocked, and simply asked why are you excluding everybody? And it hadn’t occurred to me if I’m completely honest, the kind of massive implications of doing this and what it means to somebody who’s maybe not self-identifying as White.
FLG: We’ll get into this a little later, but I also wanted to ask you about your role as Scientific Lead for Diverse Data at Genomics England. Could you tell us a bit about what that role involves?
Karoline: Yes, sure. That’s actually a great new initiative at Genomics England. Genomics England, just for a bit of background, is a government-owned company that’s overseen a massive sequencing project in the UK – the 100,000 Genomes Project, which has now been completed. And they are really a key player in clinical genomic medicine implementation – so how the NHS is using whole genome sequencing. But more recently, there were new ideas for really massive projects. One very exciting initiative is the newborn screening project, and the other one is dedicated to diverse data. And as you can imagine, that’s the one I contribute to – so there’s significant funding to look at diversity and improve things from different perspectives, and make sure that genomic medicine in the UK is as fair and equitable as possible.
FLG: That sounds like an amazing project. Perhaps I’ll have some time to ask you about that later on. Obviously a big focus of your work is on filling this gap in the diversity of genomics data. And I know you recently published a sort of “roadmap” on how to fix this in Nature. So, before we delve into that a little more, could you perhaps give us some context? So how diverse are current genetic studies? And is this a trend that’s improving?
Karoline: Yeah, important question. Currently, genomic studies are not very diverse. Around 80% or so of published studies use data from people who have European ancestry – so the vast majority. And if you compare that to, you know, proportions of the world’s population, obviously, it’s not the case that 80% of the world’s population is White. There are other major ancestry groups, and there’s a big imbalance – for example the second largest group is people of East Asian ancestry. And some groups like this are really severely underrepresented. The thing that really surprised me was when we looked at these numbers in that paper you mentioned, we looked at it over time, and it hasn’t really changed. So things haven’t improved in terms of proportions in recent years, despite repeated calls, and despite a lot more awareness, compared to the days when I started out in genomic research.
FLG: I remember reading the same thing in your editorial. And it was also a massive surprise for me; I had assumed that it might have improved, especially over the last few years. So yeah, I wanted to go back to what might seem like quite a basic question. But why is it so important to include different populations in genetic research?
Karoline: Yeah, I don’t think it’s a basic question at all. There is ongoing research to really pin this down. So fundamentally, all people are genetically very, very similar. And the differences between different ancestry groups are minor. But by focusing on predominantly one, we’re still excluding a lot of genomic diversity. So we’re not really getting the full picture. And there are a lot of discoveries we’re missing out on, because there are so many people we’re not including. Then there are also some practical consequences.
So, for example, if we’re systematically not studying people of African descent, or from certain regions, there may be particular mutations in those groups that we’re not actually aware of, and then if we do genomic screening in the NHS, we may be missing really important mutations a person may have just because we haven’t studied them, or we might just not be sure if this is really impacting on the disease or not. And that’s really problematic for people. And then the third very important bit is predicting risk based on genetics. It’s been well established that this doesn’t work very well across different ancestry groups. So, as we do most research in White people, predicting risk works by far best in White people.
FLG: Which is obviously a massive problem for healthcare generally as well. I think one of the main takeaways from that Nature paper you published was that it’s not just important for those populations that are currently being missed, but actually, for everyone. So I wanted to ask you as well about what sort of factors have contributed to underrepresented populations being absent from research?
Karoline: It’s clearly a really complex story. From the probably quite naive perspective of the researcher – having been myself part of the more mainstream genomics research groups – the reason we gave, and the reason we believed in, was that if you have a lot of genetic diversity in your study, there is a risk that you find variants that are not really linked to disease – we call it population structure. And since studies had started to focus on people of European ancestry, it was the easiest thing to keep going and to build on those existing collections. And then it was just, yeah, a lot of the code had been written for this group. And so, when only a very small number of participants from another ancestry group were available, it was just not worth the hassle of all this added complexity – or so people thought.
And yeah, I think it’s just spiraled into a situation where people would just always go the easiest way. The bigger picture, for sure, is that we’re living in a society that has values and norms. If suddenly, people of European ancestry weren’t included in genomic research, that wouldn’t have been possible, it wouldn’t have happened, right. So I think social values, bias, has clearly contributed, and they impact on research at every level, right? From the politics, down to the communities of researchers – if they’re not representative and diverse, then there’s nobody to change my mind. It needed a student who, you know, did not identify with the majority White ethnic group to change my mind. So we need a lot of these people also in positions of influence. We need to value diversity. And yeah, that clearly hasn’t happened for a very long time.
FLG: Like you say, it’s definitely a systemic issue. And there was a point that you sort of picked up on there – how does lack of diversity among researchers also drive this bias?
Karoline: Yeah, exactly. As I said, I mean, I think this is complex, but first of all, if there’s a more diverse group in research, I think the norms are going to shift. And it just won’t be acceptable to continue in this current way anymore. And the other thing is that even the research questions you ask may be slightly different. The types of diseases you focus on may not be identical. Being from a particular (ancestral) group gives you a certain mindset, and we’re influenced by it in all sorts of subtle ways that we don’t realize, and that impacts massively on our research – there is no such thing as objective science, that doesn’t exist.
FLG: Does this lack of representation impact on research around rare variants more? How does that influence things?
Karoline: That’s a really interesting question. You are right, I mean, there are probably some differences between the rare variant field and the more common variant field. I think there has probably been more diversity in the rare variant field because certain populations provide certain advantages, especially in recessive disorders. Groups with more consanguinity are just more likely to have recessive inherited rare variants. And so there has been a long-standing tradition of doing research in communities or countries with more consanguinity.
The other thing is that there is a general assumption in the rare variant field that findings would generally be more portable. But I’m not entirely sure it’s always true. So, inversely, maybe there wasn’t the same level of outcry in the research groups as there was in the common variant field, where people realized with the big numbers, the big studies, how biased they were. So, yeah, maybe the change hasn’t quite arrived in that field as much as in the common variant field.
FLG: Yeah, so obviously, you’ve talked about some of the factors that have contributed to this. But the other main focus of your piece, and generally your work, is what we can do now. How can we ensure that populations aren’t left out in the future?
Karoline: Yeah, that’s a great question. So the first thing is, I would say, surprisingly simple. Lots of researchers use existing data, existing studies. And many of these studies have at least some participants who are more diverse. The most widely used genetics research resource is the UK Biobank. It has half a million participants, so it’s gigantic. But it also has around 38,000 participants with non-European ancestry. And the vast majority, over 90% of studies, exclude all those participants. So a fairly simple starting point would be to not use a default analysis model where you exclude everybody else, and methods have caught up with it so it’s not actually necessary to exclude everybody else. So that’s a very simple first thing. And I think that more samples, better generalizability… it should benefit everybody.
Equally, what you can do now, if you’re a reviewer, and you have a study that only focuses on European ancestry, you should at least ask why. And I think we need to move on from a perspective where that’s the widely acceptable approach. And then, of course, when designing a study, I don’t see any reason anymore to focus it on European ancestry only. There are lots of arguments to possibly even focus on other groups, sometimes it’s a higher burden of disease, or actually more potential to discover new things that we didn’t know about because they have been under-studied. And, of course, try to find collaborators who are different in lots of ways from you, because eventually the research is very likely to be better. And there might be questions that you haven’t considered before. Community engagement is also important, and it’s something that any researcher could do fairly easily – to work with those communities, interact with them somehow, reach out, talk to patient representatives. You know, again, I think the indirect impact, even if you don’t instantly have a massive study that is very diverse, it’s likely to change your mindset. And then, of course, funding – we need to look at funding, and it’s important that funders value diversity as much as the research community does.
FLG: You mentioned the UK Biobank data. This is quite a tricky question, but what do you think the steps are to encourage researchers to actually use that diverse data? Is it a question of just general education or is it better tools?
Karoline: Yeah, better tools are definitely the one thing that would make a gigantic difference. Also, and I think this is actually happening more and more, but better education, communication, changing the norms, discussing the value of these data. But practically, most of the time busy research groups with different levels of expertise – they have their scripts and their way to analyze data in place, and that’s been great for genomic research in that it has become easier and easier to do analysis. The problem is it’s not yet as easy to handle more diversity in data sets, and people are very unsure about it. And indeed, some of the more specialized software that is good at managing diversity is really difficult to use. So it’s very important to create pipelines that have really easy-to-use software. Not everybody is a coder or statistician in the field, lots of people have a medical background, everyone comes from different areas. So they’re not always super computer savvy. It’s great that there are lots of different backgrounds in the field, but we need an easy-to-use tool. I have tried to find funding to create that but haven’t been successful so far. So, let’s see.
FLG: I’d like to speak to you a bit more about some of your work and how you’re working to fill this gap in your own research. Could you tell us a little about that?
Karoline: In several ways. We are working on our own tools that are helping to address some of these questions. In my group and with our collaborators, we’re asking more fundamental questions beyond a specific disease, for example, if you do have lots of findings already, are they generally transferable – do they apply to everybody else? Or do we have to be careful about that? And then we specifically do research using existing datasets that are more diverse on a number of outcomes, the biggest one at the moment being major depression. And the final one is doing a large study. So recruiting data ourselves and supporting others to collect new data and to recruit. So yeah, those are the various approaches we’re using.
FLG: I guess part of that, and something you mentioned previously, is global collaboration and initiatives. I wondered if you could tell us a bit about the H3 Africa Initiative?
Karoline: Yes, sure. I’m not directly a part of H3 Africa, but it is really, really a massive step-change. It’s a gigantic collaborative effort in Africa. It’s funded by Wellcome and the NIH. And it is the biggest project around genetics in different African countries. It’s led by African investigators, which is really extremely important (rather than Western investigators that go into African countries and take out samples). And the aim is to cover a lot of the diversity across different regions in Africa.
There are a large number of study sites in different countries and investigators running projects. And the scale is really large. So I think they’re targeting to get to genotyping of around 70,000 participants, which is amazing. So yeah, I think that is going to change the field to some extent. And the African continent in particular has been one of the most left out in terms of doing genetic research, although it’s the area in the world where there’s by far the most genetic diversity.
FLG: Yeah, and I guess the next step will just be making sure people actually use that resource. It sounds like an amazing initiative. We’ll focus in on some of your own research now. So, could you tell us a little about some of the key findings from your recent work into depression in diverse populations?
Karoline: Yes, absolutely. Depression is an interesting disease to study. It is a mental illness, it’s very, very common, and it’s actually very severe, it has a gigantic burden. Some people are not aware, but you know, it is an actual disease. And it’s one of the diseases that has probably the biggest burden on the world. Depression is obviously a global disease, it impacts people everywhere, although there are some differences in prevalence. And just like with almost any other complex disease, the majority of research has so far been done in White people. Depression is also interesting because it is heritable. About 30 to 40% of the variation in risk is genetic. But it’s very complex. And that meant that the first genetic studies really struggled to find anything.
There have been lots of discoveries for other diseases but depression has taken ages. Eventually, it turned out that there isn’t one major gene involved in depression, it looks like there are loads of genes, and they just have a really small impact. Interestingly, the first study that found a gene linked to depression was done in China, in Chinese people. But after that effort, not much happened in terms of diversity. So there have been a lot of big studies, and they have now become really gigantic. The published work has more than 100,000 cases, and the next upcoming study includes over a million people in total. So really big, really fantastic. And loads of depression genes have now been identified, published – it’s over 100 genetic variants. But in the latest release, we’re talking about 500 or so.
So with all of this wonderful progress, we were asking: What about people of diverse ancestry? And so we set out to look for data and mostly used existing data. So they were actually already there, anybody could have done this. And we just went around to other researchers, all sorts of biobanks and online resources to get every little bit of data. And we put all of that together. It took us five years, and in the end we had data from 80,000 cases (so people with depression) and loads and loads of unaffected people with diverse ancestry. And it was great, we used that for discovery, and we found around 50 new variants linked to depression.
We also tested if the existing findings are transferable. And we were quite surprised to see that the picture is mixed. Many of the genes are not actually universally impacting on the research question. We don’t quite understand why yet. Based on the research we’ve done for other diseases, things tend to be fairly transferable. But for depression, it just seems to be more complex. And there are lots of potential explanations, maybe the environment plays a role, maybe depression is actually not just one disease, but lots of different little subgroups of diseases and maybe it is different in different countries and within different ethnic groups. You may be recruiting slightly different cases. We don’t know. But yeah, I think it’s an important reminder that you can’t just do research in one group, as your findings may not be generalizable.
FLG: That’s really interesting. Is there future work that you might do to look into whether depression is lots and lots of different diseases that are just being grouped under depression?
Karoline: Really good question. So we’re actually hoping that genetics could help with that. So it’s extremely complex to just look at the symptoms somebody has, because they can be so diverse that patients might have almost no overlap in symptoms, if, you know, it is a very heterogeneous disease. But if we use genetics, we’re hoping we could actually look at patient groups that share the same genetic risk factors, and that might be relevant to inform treatment. The other complicated area is that current treatments for depression don’t work for everybody. So hopefully, genetics can shed some light on that.
FLG: That would be great. I believe you’ve done another study in Pakistan on mental health disorders more broadly. Can you tell us a little bit about that? And perhaps some of the key findings from that study?
Karoline: Yeah, that’s right. So in this current depression paper that I mentioned, the smallest major ancestry group were people of South Asian ancestry. And that’s true more generally, there’s actually very little research on the South Asian continent. And equally here in the UK, or the US, for people with South Asian ancestry. These things are about to change, but it was really important to us to make a difference there as well. So I’m collaborating with an American researcher, Jim Knowles, and with two amazing researchers from Pakistan: Professor Muhammad Ayub (who also recently joined UCL) and Dr. Arsalan Hassan. And we found several sources of funding to put together this really large study of mental illnesses in Pakistan. So we are recruiting around 10,000 patients with depression, 10,000 patients with schizophrenia, and 10,000 with bipolar disorder, so 30,000 cases in total, and we are recruiting controls as well. The recruitment is happening right now, we have a few thousand for depression, schizophrenia has actually been completed, and bipolar is ongoing as well. And it’s just been an incredibly eye-opening experience to do this study.
FLG: Yeah, I can imagine. This links back to something you mentioned before, actually, which might be more specific to depression. But you mentioned that prevalence kind of differs across populations. I wonder if you could tell us a bit more about that?
Karoline: Yeah, so there are some groups or countries where prevalence of depression is really, really high. In the UK, it’s on the high end, as it is in the US and some other European countries. And there are other countries in the world where it’s much lower – Japan, for example. And it’s very, very difficult to study why that may be the case. There’s also a number of countries like Pakistan, where it’s actually not really well known how high the prevalence is. There’s been work, for example, by the WHO to try and do these big global comparisons. But they also acknowledge that the data is so weak at the moment, there haven’t been enough large studies that were comparable and well conducted. So we don’t really know.
My hypothesis is not that these are genetic differences. I think it’s much more likely that these are the immense environmental differences that impact on people’s lives and increase their risk of depression. And they can be quite complex. I mean, there may be quite a few in Western countries where the way we live exposes us to things like stress (a really well-established risk factor), for example. But in Pakistan, there’s definitely an accumulation of risk factors that tend to be slightly different. For example, in certain regions, there is an ongoing problem of terrorism. So loads of people will have been exposed to violence of some form, witnessing bomb blasts, losing a loved one, you know, really, really awful traumatic experiences. So it is clear that on a population level, these events will impact on people’s lives and their mental health and these are not equal across countries. Equally with poverty, there are things linked to it – fear of being hungry tonight, or not being able to pay the rent… there are lots of refugees in Pakistan right now as well. People have lost loved ones, lost their homes. So it’s easy to imagine how such very difficult life circumstances increase somebody’s risk of not just depression, but also other mental illnesses.
FLG: That’s incredibly interesting. I guess maybe in the UK and Western countries we are more aware of what depression is, would that have an effect on how the disease is reported as well?
Karoline: Absolutely. So whether somebody reports it and how it’s reported in the UK, even if you go back and look at reporting by age groups – a few generations ago, reporting of depression was much less. When you look at different ages, there will be very different likelihoods that somebody in a certain decade would say, yes I have depression. And people also reported differently. And there’s a lot of research on that. For example, in the UK, and in the US, the sort of emotional side of it, sadness, and so on, that’s a key symptom that is reported. And it’s also important for diagnosing it. But traditionally, in China, that was not a commonly reported symptom. And there would be a bigger focus on the somatic symptoms or more of the physical symptoms, but also how these emotional changes would be described and recognized can be quite different. So it is very hard to do this internationally. But again, you know, it is really important that we do. It’s truly possible for people to have the disease and just experience it quite differently. We’re all shaped by how we view the world, our life experiences, how we talk about ourselves, how we perceive and interpret our emotions.
FLG: Yeah, and I guess that makes it very difficult to study as well. So, we’ve spent a bit of time on mental health disorders and that aspect of your research, but you also do research on lots of other complex diseases. Just as an example, I wondered if you could tell us about some of the interesting findings about cholesterol in Uganda?
Karoline: Yeah, that was an interesting earlier study we did. We looked at, as you say, cholesterol levels – most people will be quite familiar with these. It’s such an important risk factor for cardiovascular disease. So having high cholesterol is bad. In fact, what we mostly focus on is having high LDL cholesterol. And it’s such a commonly used biomarker, it’s also easy to measure from blood, it’s really one of the most important clinical blood biomarkers.
So we didn’t actually expect to see what we found, we assumed that the genetics of it would be fairly constant across the world. But we did one of the first studies with a truly global source of different datasets, including, as you mentioned, a study from Uganda. And we just compared findings, both in terms of individual variants, if they impacted on cholesterol levels in different groups, as well as genetic risk prediction. And what we found was that specifically for one type of these lipid biomarkers, triglycerides (a type of fat molecule in your actual blood), there were really big differences. So the findings from big European studies were not transferable to the Ugandans. And that was quite surprising to us.
And it’s hard to understand. But if I may speculate, I suspect that this may be related to diet. And again, it’s something we don’t usually account for in genetic research, but the diet of those people from this Ugandan study is so different to that of Europeans. This was a rural Ugandan cohort. They were also younger than the other studies, in general, they had a more favorable cholesterol profile, there were very low rates of obesity. If you compare that to a UK-based general cohort of the same, we look quite different. And, yeah, so I suspect that some of these genes only impact on your blood lipids in the context where people eat certain things and have a certain diet. I think it is biologically plausible, but we don’t generally account for the environment at all. It’s more like an annoying thing that makes things more complicated. And we generally assume if we have a finding that it will just always be valid across all types of people and environments. That’s clearly not always the case.
FLG: Do you think this is an area that will be sort of looked at more in the future?
Karoline: I think this is definitely where we want to be heading. The last 15 or so years we’ve largely spent on finding genetic variants in very simple study designs. And for the vast majority, we don’t understand the mechanism, we don’t understand if they are absolutely universally applicable or not. And that’s really what we want, what we need, what the greatest benefit of genetics would be – finding the mechanism. If you think about all of these complex diseases, for many of them, we have a very bad understanding of mechanism. Depression is one of them. But you know, lots of others are equally complex, and even for well-studied diseases, like common cancers or cardiovascular disease, there are still a lot of question marks. So what we have to do now is be more specific, be more diverse and have the breadth and diversity of participants (it’s not just ancestry, it’s also diversity in terms of lifestyle, age, sex, socioeconomic background) and to start looking at these things, and then really try and study the mechanisms behind different genetic associations.
FLG: One more area that I haven’t asked you about yet, I think some of your current work is focused on creating optimal analytical approaches for ancestrally diverse samples. Why is it so challenging to analyze these samples? And what are some of the tools that you are creating to overcome these challenges?
Karoline: The main challenge in the field is that when you’re trying to find new genetic variants there is this fear that the differences, the subtle differences, even in ancestry, could result in findings that are actually false positives. So practically, if we look at a very simple case, you might have two different ancestry groups in your study, and some variants differ in frequency between them. And now, if we study depression, and depression is also more common in one group, any variant that happens to be more common in that group looks like it leads to depression. But actually, it’s just slight ancestry differences. And this problem is known as population stratification. That’s been the one major cause for anxiety.
There are some standard methods that have been around for ages, and that most people would use, such as accounting for ancestry covariates, for example. And they work much better than expected a lot of the time. So I think the problem has been that the fear is bigger than it has to be sometimes. But there are newer methods, for example, mixed models that can account for ancestry but also include relatives, which is really great. They are a lot more computationally demanding, so they take a lot more time to run, and they need more expertise. If these methods were more easily available, that would be fantastic. And sometimes it’s just a really simple thing like data cleaning – people are often unsure about how to do data cleaning when you have different ancestry groups, because some of your standard ways of doing that wouldn’t apply. But yes, there are lots of methods for lots of little questions that are being developed by us and others.
Part of the challenge is that there isn’t really an incentive to create great tools and make them user friendly or maintain them. As scientists, we don’t get grants for that. You get maybe a grant to develop this method, but then when you put it out there, it will be stuck. It will never be updated. You don’t have time to help anybody use it. And it’s become so hard that eventually nobody will. There’s an incentives problem in the field. And Genomics England is actually trying to address this to some extent. My role there is to support a tooling initiative where the group is trying to get an overview of all of the existing methods and then try to find ways to make them more accessible. And that would also help us get a better idea of what it is that’s missing. For many questions, there are applications, they just might not be very well known. And so researchers might be unsure how they can address a certain challenge.
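To make the population stratification example from this answer concrete, here is a minimal Python sketch. It is not taken from Karoline's work or from any published analysis. Every name and number in it is an assumption made up for illustration: two hypothetical ancestry groups, a variant allele frequency of 0.2 versus 0.6, and a depression prevalence of 10% versus 30%. Case status is drawn independently of genotype, so the variant has no real effect. Analysing each group separately is used here as a crude stand-in for adjusting for ancestry covariates.

```python
import random

# Hypothetical parameters, chosen only for illustration:
# group -> (frequency of the variant allele, prevalence of depression)
GROUPS = {"group_1": (0.2, 0.10), "group_2": (0.6, 0.30)}


def simulate(n_per_group=20000, seed=0):
    """Return (group, allele_count, is_case) tuples.

    Case status is drawn independently of genotype, so the variant has
    no causal effect on disease in this simulation.
    """
    rng = random.Random(seed)
    rows = []
    for group, (allele_freq, prevalence) in GROUPS.items():
        for _ in range(n_per_group):
            # Two independent draws give 0, 1 or 2 copies of the allele.
            first_copy = int(rng.random() < allele_freq)
            second_copy = int(rng.random() < allele_freq)
            is_case = rng.random() < prevalence
            rows.append((group, first_copy + second_copy, is_case))
    return rows


def case_control_difference(rows):
    """Mean allele count in cases minus mean allele count in controls."""
    cases = [count for _, count, is_case in rows if is_case]
    controls = [count for _, count, is_case in rows if not is_case]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


rows = simulate()
# Pooled analysis: ancestry is ignored, so the variant looks disease-linked.
print("pooled:", round(case_control_difference(rows), 3))
# Per-group analysis: the apparent association should largely disappear.
for group in GROUPS:
    subset = [row for row in rows if row[0] == group]
    print(group, round(case_control_difference(subset), 3))
```

Under these assumed parameters, the pooled difference should come out clearly positive (about 0.25 in expectation), while the difference within each group should sit close to zero. That gap is the false positive Karoline describes: the variant only tracks ancestry, and ancestry tracks prevalence.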
FLG: Well, that’d be great for future research. Do you have any sort of hopes or wishes for future developments? What will your research focus on in the future?
Karoline: I think one key hope is that we will all just become a lot more reflective, self-reflective, as a community of researchers, self-critical, and necessarily more diverse to begin with. I think the field is becoming, or has already become, extremely collaborative. But that was still very often limited to the US, the UK, maybe Australia, once in a while, European countries… But that’s changing. So H3 Africa is one example. But truly global initiatives with actual partnerships, rather than Western researchers dominating this. That is my personal aim. And I have some fantastic collaborations, but I think more widely, that’s a development in the field.
What we need to do urgently, as part of this, is capacity building within countries like the UK. In order to be a more diverse group of researchers, we need to make sure from nursery onwards that people are equally supported, and that all of the super talented people from all of the different ethnic groups and backgrounds have the opportunity to do research, if that’s what they want to do. So that’s the big problem right now. But of course, there’s a global aspect to that as well. We also need to do capacity building to make sure all the talented people, for example those I’m working with in Pakistan, who are really amazing, get access to good training. And we will all benefit from better research, better collaborations.
FLG: Yes, like you’ve mentioned, it’s not just having an impact on those certain populations, it’s actually impacting everyone. That’s actually all we’ve got time for today. I’ve certainly learned so much from our chat. So I would just like to say thank you again for taking the time to answer these questions with us.
Karoline: Thank you so much. It was really great and very interesting. Fantastic questions. Thank you.
*Karoline will be speaking about her work on the genetics of major depression in diverse populations at The Festival of Genomics & Biodata 2023. To see the full agenda, head to our website.