Akl Fahed, MD, MPH is a physician, scientist, and innovator. He works at Mass General Hospital, the Broad Institute of MIT and Harvard, and Harvard Medical School, and his research focuses on using genomics and data science to improve the understanding, prevention and treatment of coronary artery disease. Akl talks about polygenic risk scores and the democratisation of genomic data.
Please note the transcript has been edited for brevity and clarity.
FLG: Hello, everyone. Welcome to the latest ‘A Spotlight On’ interview. Today, I’m joined by Akl Fahed and we’re going to be talking about the genomic drivers of heart attacks. Akl, if you could please introduce yourself and tell everyone a little about what you do.
Akl Fahed: Thanks for having me and it’s great chatting with you today. I’m a cardiologist and a scientist in genetics. I spend some part of my time taking care of patients who have had a heart attack. I’m based in Boston, at Mass General Hospital and Harvard Medical School. I spend the bulk of my time really trying to understand why those patients had a heart attack and how we could prevent it in the next patient that comes over. To do that, I try to leverage all the data at my disposal, starting from what behaviours they have had in their life. What did we know about them before they had a heart attack? And then we have their DNA information that starts from the time they are born. We try to put the whole picture together, to see if we could prevent the heart attack in the next patient that comes over.
FLG: That’s fascinating and so important. For a bit of background and context, how many people do heart attacks affect? How important is it for us to stop this? And, more broadly, why are people having heart attacks?
Akl Fahed: Extremely important questions. The answer is, unfortunately, heart attack remains the number one cause of death in the world. Despite massive efforts to understand and prevent heart attacks, with things like preventing and reducing smoking, it still remains the number one cause of death all over the world. When we say it is the number one cause of death, what that means is if you look globally at all people who die every year, one-third of the deaths are due to a group of conditions called cardiovascular disease. That includes heart attack plus other cardiac conditions. But in that group, the most common driver is heart attack, whether it’s a heart attack that leads someone to die suddenly, or a heart attack that leads to failure of the heart muscle and eventually to death. Then, there are related conditions, such as having a stroke, which is essentially a heart attack of the brain. But one-third of deaths in the world are due to cardiovascular disease.
Now, in certain countries where you have more data, you can start breaking it down. We can understand which ones are heart attacks that lead to immediate death, versus another heart attack that happens in someone who has already had a heart attack. The United States is one of those countries with that level of statistics. The most recent statistics say that one person has a heart attack every 40 seconds. If you think about it, from the time we started our conversation, there are two or three people who have had a heart attack. Unfortunately, this continues to be a major problem all over the world.
FLG: There are very complicated factors behind why people have heart attacks. Can you define some of the genomic drivers of heart attacks?
Akl Fahed: I think the fascinating thing about heart attacks is, we have a pretty good understanding of a lot of the mechanisms that lead to heart attacks. We’ve done a lot to prevent heart attacks and reduce the prevalence of them. But there’s still a lot to learn. Starting with the non-genomic drivers, those are the common things that most people know about. Things such as your cholesterol, your blood glucose, your blood pressure, smoking, bad diet, not exercising. I would say there’s still a lot of room to improve them.
Unfortunately, what we’ve learned is, knowing about them doesn’t mean that everyone is following them. Even in some of the best places in the world, you’ll see that a lot of people that should have lower cholesterol, don’t. There’s a lot of room for massive public health efforts to try to make changes to people’s lives and behaviour, so that we can reduce the burden of cardiovascular disease, because we know those approaches actually work.
Now, sometimes you look at people and they have had all those risk factors controlled, yet they have a heart attack. And you start wondering why. Those are the observations that led a lot of people to say there must be something here, if all those risk factors are controlled and they’re having a heart attack. Then, there are other people who are more protected, even though they do smoke, for example. We all know the anecdote, “My grandfather smoked until he was 90 and never had a heart attack and it was good genes.” So, indeed, there’s a genetic factor. What we’ve learned over the years is that there are multiple genomic drivers of heart attack risk.
When we talk about risk, we also talk about protection because it’s a spectrum. We like to classify those genomic drivers into three big groups. The first is called monogenic drivers, and this is the most well understood mechanism of genetic risk for heart attack. That means at a single point of someone’s DNA, there is a defect. That defect leads to a gene that does not behave as it is supposed to behave. In the case of a heart attack, the most common gene is the LDL receptor. There are other genes, but this is the one that is most commonly affected. The end result is that the LDL cholesterol does not go into the liver, but stays in the circulation, and then deposits in the arteries. This causes them to get blocked and then causes a heart attack. This genetic cause of heart attack is extremely rare. If you look at the level of the population, it affects less than half a per cent.
Then the question is, many more people are having heart attacks – why? What we’ve learned is that there is another mechanism that we call the polygenic mechanism of risk. It is a mechanism that is well known in a lot of complex diseases, such as heart attack risk, diabetes and obesity. Instead of having one single point in your DNA that has a very large effect, you actually have many, many points across your DNA, there could be millions of them, and each one of them might increase or decrease your risk by just a little bit. It could be increasing it by half a percent, and another one is reducing it by one percent. For single individuals, if you sum up all of those variations across your genome together, you end up with a score that we call the polygenic score. With that score, if you look at the population level, you can start identifying individuals with high scores at increased risk and individuals with low scores at reduced risk. Most people are going to be in the middle, as they have average risk.
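As a rough illustration of the summing described above, here is a minimal Python sketch of a polygenic score as a weighted sum. The variant names, the effect weights and the `polygenic_score` helper are all invented for illustration; real scores are built from very large numbers of variants with weights estimated from large genetic studies.

```python
# Hypothetical per-variant effect weights (positive = raises risk, negative = lowers it).
# These names and numbers are invented for illustration only.
EFFECT_WEIGHTS = {
    "variant_a": 0.05,
    "variant_b": -0.10,
    "variant_c": 0.02,
}


def polygenic_score(dosages, weights=EFFECT_WEIGHTS):
    """Sum weight * number of copies (0, 1 or 2) of each variant a person carries.

    Variants missing from `dosages` are treated as zero copies.
    """
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


# One hypothetical person: two copies of variant_a, none of variant_b, one of variant_c.
person = {"variant_a": 2, "variant_b": 0, "variant_c": 1}
print(f"Polygenic score: {polygenic_score(person):.2f}")
```

A single score means little on its own; it becomes informative when compared with the distribution of scores across a population, which is how the high, average and low groups are identified.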
One of the landmark studies that came out of this from Boston showed that a large segment of the population, somewhere between eight and twenty per cent depending on where you draw the cut off, has at least double the risk compared to everyone else. Sometimes, it is triple the risk if you look at the eight per cent cut off. That is a lot of people. That is the same increase in risk that you get from a single point defect in the DNA of the LDL receptor. We’re seeing a lot of people have a high risk due to the polygenic mechanism.
The third mechanism is more recently understood. It’s different from the DNA that you’re born with; it is an epigenetic mechanism. These are changes that are happening to your DNA as you grow older, and it relates to the interaction of the environment with your DNA. We call them somatic mutations. These are also single point changes in your DNA, and the end result is a mechanism called CHIP, or clonal haematopoiesis of indeterminate potential. This is when a group of haematopoietic, or blood, stem cells gets expanded in an abnormal way. It is the same mechanism that leads to some cancers, but in this case, it is an expansion of those cells without blood cancer. People who have CHIP due to those variations have double the risk of heart attack. It has become understood that this is related to increased inflammation in their bodies. Now, we can measure those variations and we can quantify that risk. These are not common; those mutations are present in half a per cent to one per cent of the population. But collectively, we’re thinking of three genomic mechanisms where people can have a risk of heart attack – the monogenic, the polygenic and somatic changes in the DNA.
FLG: Risk is such a complicated concept to communicate, and doctors often have to have that conversation with patients. I think single genes, especially the BRCA genes, have become fairly well known in the public sphere. But these polygenic scores have a lot more nuance. If we’re talking about public benefit and patient benefit, how can we communicate this risk effectively?
Akl Fahed: This is a very important question and it’s one of the biggest challenges of using polygenic scores in clinical practice. For decades, we’ve known how to communicate risk from a monogenic cause from a single defect. It has been black or white, you have it or you don’t have it, and it’s very easy to understand. If you have it, you have increased risk, if you don’t have it, you don’t have increased risk. We have understood that and we have an entire infrastructure of genetic counselling and clinical geneticists that know how to report those, both from a laboratory perspective and from a patient communication perspective.
With polygenic scores, that entire infrastructure needs to be built and understood. Going from the epidemiology, statistics and research that my group does, to really moving towards making it an actual test that the lab can run. We also need to understand how we communicate it to patients. And the communication itself is not straightforward. Because, unlike the monogenic causes, you need to communicate this to pretty much everyone, because everyone has a polygenic score. And when you communicate it, you need to be able to have that individual understand the risk and act on it appropriately. There’s a lot of complexity.
So, how do you communicate a threefold increased risk versus a twofold increased risk to someone? And what does twofold mean, when you talk about relative versus absolute risk? It brings a level of complexity. How do you integrate the polygenic score with all the other risk factors that you know about? What if someone has a single mutation and a high polygenic score? What if someone is a smoker and has a high polygenic score or a non-smoker with a low polygenic score? It all adds multiple layers of complexity.
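To make the relative-versus-absolute distinction concrete, here is a minimal Python sketch. The baseline risks are invented, and multiplying a baseline by a relative risk is a deliberate simplification rather than a clinical risk model; the hypothetical `absolute_risk` helper only shows why “twofold” can mean very different things for different people.

```python
def absolute_risk(baseline_risk, relative_risk):
    """Absolute risk for someone whose risk is `relative_risk` times the baseline.

    Simplification: a plain multiplication, capped at 1.0 so it stays a probability.
    """
    return min(baseline_risk * relative_risk, 1.0)


# Invented baseline risks, for illustration only.
for baseline in (0.02, 0.10):
    doubled = absolute_risk(baseline, 2.0)
    increase = doubled - baseline
    print(
        f"baseline {baseline:.0%}: twofold risk means {doubled:.0%}, "
        f"an absolute increase of {increase:.0%}"
    )
```

The same twofold relative risk adds far fewer percentage points for someone with a low baseline than for someone with a high one, which is part of what makes a single headline number hard to communicate.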
There have been massive developments in that space trying to get to clinical implementation and trying to answer some of those questions. Some of that work comes from our group here in Boston, but also from many others. We’ve done some work on looking at the interplay of a single variation, the monogenic cause, and the polygenic score, and how they come together. We have a little bit of a framework of understanding about how to report risk if you have both.
Another thing that we are working on is how does it interplay with your clinical and lifestyle risk factors. How do you actually bring them all together in a single number to the patient? The third thing is how do you build a report that you can offer to patients? What should that look like? How would people understand that? We’ve gone down to the level of doing focus group discussions, designing reports, working with designers, working with genetic counsellors and testing how people actually behave when you return results to them.
Overall, there’s way more work that needs to be done, but I would say the early results are encouraging. What we are seeing in the studies that have been done is that people interpret it in a positive way and it does serve as a motivating factor. We always worry about the risk of correct information – you give people the correct information, but maybe they interpret it incorrectly. You get a high genetic risk result and you say, “Well I have a high genetic risk so, it’s not worth it, I’m gonna go back to smoking”, but that doesn’t happen frequently. Anxiety is not usually high when you return results. Overall, I would say the early data of returning results is encouraging.
One thing that needs to happen is a more prospective follow-up to see what happens when you return that result to many people. We need larger sample sizes. We have a study right now with 100 people that is being prepared for publication and there have been a few other prospective studies. But there are massive efforts in the UK and in the US to actually determine that at scale and understand how that actually affects outcomes down the line. I think that’s really where the field needs to move, including clinical trials that actually enrol people based on their high genomic risk information, and then try to test certain interventions prospectively.
FLG: And that comes with a lot of empowering the patient. You’ve talked about some of the methods there with genetic counsellors to help empower them and support them. But that extra involvement often puts more responsibility on patients as well. What are the key challenges you think we’ll see as we shift our systems to these patient-led ones, with these sometimes complex diagnoses? What is your dream patient-led system?
Akl Fahed: I’m glad you’re using the term patient-led system, because surely, that’s what I believe in. And that’s my personal opinion, I know many people won’t agree with it. I think health systems could have done better in prevention in many ways. You look at studies where entire themes are gaps in care and management. And you can’t help but wonder if there are that many gaps… Even for things we know absolutely work and are lifesaving, such as lowering someone’s cholesterol when it’s high. Or major gaps – look at studies where 30 to 50% of people who qualify are not getting the medication. And that’s not related to genetics. It is pure, known clinical interventions that we know work and save lives, and we’re not doing enough of them. It makes you wonder could this be better if it’s in the hands of patients?
I’m personally a big believer in the consumerisation of preventive care. I believe in making efforts patient-led, empowering patients to take ownership of their prevention, educating patients and making it easy to access that information. I think genomics should follow the same path. This should be in the hands of patients, to enable them and empower them to make the changes with appropriate guidance. I think with technology, with software, with algorithms, this is not hard.
I think the other model, we know how it performs. We’ve had enough chance to test it over the decades. I think we need to start building comparison groups using patients as consumers of their own health and care and seeing how that actually performs compared to current standards where it’s mostly doctor driven. My hypothesis is that we would be better off if we put it in the hands of people.
FLG: It sounds like large datasets are going to be essential for us to gain that population-wide foundation. What are the obstacles in gaining that reliable data and analysing it?
Akl Fahed: Thanks for bringing that up. The increase in access to and availability of data has been one of the best things that has happened in medicine over the past decade. This increased availability of large datasets to as many researchers as possible would not have been possible were it not for large efforts, such as the UK Biobank. I think the UK Biobank really revolutionised how we think about data, both genomic and non-genomic, and it has improved our understanding of disease in remarkable ways. I think other countries like the United States are following suit with a lot of efforts, such as the All of Us project. But there are many national biobanks, hospitals and multiple countries in Europe creating datasets. I think those datasets are really changing the way we think about medicine and at a much more rapid pace. Basically, whatever used to take 10 years to discover, now we can discover it in a year if you make the data available. There’s a simple formula – just make the data more available to more people, and there’s always going to be a collective benefit. That requires a mentality change, it requires innovation, it requires people to really believe in that vision. Unfortunately, not everyone does.
So, there are unique models in the UK, United States and multiple other countries in Europe. But an issue remains – if you look at the world population, representation in those datasets is very skewed towards individuals of European ancestry. This is partly because it is countries in Europe and the United States that are leading that effort. I think this is a major issue, because you end up with a representation in those datasets that is largely disproportionate to individuals all around the world. And that’s very important in genetics, because genetic ancestry matters in a lot of those discoveries we make. There’s always a risk, if you take those discoveries and implement them at scale, that you’re going to worsen healthcare disparities by implementing models and scores and information discovered from individuals of European ancestry. I think that’s a major issue.
What we need is, same as we have the UK Biobank, we need to have similar models all over the world and different countries with data access. In Africa, in Asia, in the Arab world, Middle East and the Levant. Arabs represent 5% of the world’s population yet we have nearly no publicly available genetic datasets at scale to look at – very, very little in the order of thousands. And that’s hard when you look at hundreds of millions of Arabs around the world.
It’s the same thing with individuals of African ancestry. In my mind, this is the biggest challenge. But I’m hopeful that those models, such as the UK Biobank, US based models, are starting to really prove to the world that this is the way to go. The next steps will be to improve and enhance data availability and data collection from all over the world.
FLG: And on that note about diversity of data and to return to heart attacks, we know that men and women respond differently, present differently. How feasible do you think it will be to implement integrated solutions equally to a range of groups?
Akl Fahed: In my mind, there are three groups. There are sex based differences, so men and women, there’s genetic ancestry differences, that’s very specific to genetics, but there’s also environmental differences related to geographic locations, where the population is coming from. If you study African-Americans in the US and you study people living in Africa, it’s going to be very different. The ultimate solution for all of that is representation in the training dataset. So that means, in the original discovery datasets where we build all those models, I think that should be the number one effort. That should be what everyone is striving towards – just more data availability, that is representative. Now, I would say that takes time. You have to see what else you can do to improve and to help them in that direction. The way I think about it is, it’s like a true north. And then to get there, you need to make small incremental efforts. I think there’s different efforts that could be done.
Obviously, the biggest effort should be done on recruitment and improving diversity. But along the way, there are a lot of things that you can do in the way you treat the data, the way you publish the data, the way you execute on those models. If you’re developing a model, you always need to look at sex-based differences in my mind, and make sure you’re not using a dataset with 80% men and 20% women and then ending up creating a generalised model that’s biased and going ahead and implementing it. When you implement, you need to also implement in diverse groups. If you don’t, you need to recognise that as a major caveat and say, “This only applies in that setting, and I need to build and implement it in another setting to be able to test that”. There are incremental steps to move in that direction.
Some are going to be on implementation, some are going to be on how you analyse the data. And then some are going to be, which we’re seeing a lot with polygenic scores, on improving the methods. A lot of the statistical methods can try to improve the performance of a lot of the polygenic scores that we use, and in certain ancestry groups they do that very successfully, to the extent that certain scores perform equally to scores in Europeans. That’s not solely based on more datasets, but also on improved computational methods to get there. So, again, incremental measures to get to that final goal.
FLG: And you’ve just mentioned computational methods as well. Machine learning is often touted as the solution to data overwhelm, how realistic and attainable is it in your field? And what are the limitations we might see?
Akl Fahed: I think it’s very realistic. Let’s talk about the hope of machine learning and what it is allowing us to achieve. From a hope perspective, the same as an increased availability of genetic data, there’s an increased availability of all data, and an improvement in our understanding of how we digest and work with this data and apply machine learning models. I would say that opportunity is being leveraged to improve our understanding of disease in many ways.
I would say some of the most obvious ways in my field of cardiovascular disease are really around understanding the phenotype of the disease itself, better. Currently, when you think of coronary disease, or heart attack risk, most of the datasets are labelled as a binary phenotype as yes, no, someone had a heart attack or did not have a heart attack. But as a cardiologist who treats heart attacks every day, I can tell you that no two patients are alike. Every single patient is different to the next one that’s going to come over. We don’t really capture that in our genetic studies. One of my main efforts is really thinking about how do I capture that variation in the phenotype. And machine learning is a very powerful technique that allows us to do that.
A lot of the efforts that are happening in cardiovascular disease are using machine learning to understand raw imaging data from patients. You take someone’s image of their heart, the cardiac MRI, or you take the MRI angiogram, which is the test that you do to diagnose heart attack, where you see every single artery in a three-dimensional way, and you use that data as input to try to understand the phenotype better. Mixing that with genetic data is really where you start getting a better understanding of the mechanisms of a disease and specific populations that might be at risk and might need to be treated differently. I think this is where the promise of merging genomics with machine learning on imaging data to understand biology better is happening. There have been multiple exciting papers in that space, I would say mostly from cardiac MRI, cardiac echo data and cardiac ultrasound data, that really shed light on new discoveries of biological pathways that we didn’t appreciate before. So I think that’s how machine learning offers a major promise.
Obviously, there are multiple other applications that are not related to risk prediction. These include automated readings of imaging studies and improving flow. There are machine learning applications to actually improve how we care for patients immediately. In my field, when we evaluate the arteries of someone, often we have to put wires inside their own arteries and do detailed measurements. Now, there are machine learning models that are predicting those measures without even having to put in the wire. That reduces the procedural time, allows us to do it on more people and reduces the risk of the procedure itself. These machine learning applications do not predict disease, but actually help us when we take care of patients, doing it in a more efficient way, in a safer way, on more people, and getting more data. I think that it’s not just a lot of promise, I would say there are actual applications that are happening with machine learning. I don’t think machine learning is the future, it’s actually the present.
Back to the limitations, it’s not all a rosy picture. I think there are limitations we have to consider. When we think about machine learning models, and how we’re using them to predict or build models in the future, you always have to keep in mind that the models are only going to be as good as the data that you give them. It goes back to our earlier conversation on data quality and data diversity. If you have biases in the way you practise, and then you’re building a model based on that, the model is going to predict the bias. It’s about trying to pick up on those biases. Otherwise you could risk worsening those biases by building models that understand them and act on them. Those are things such as race in the United States, and we know we do have discrepancies in care based on race. If you’re implementing models that are learning from those discrepancies, then we are actually going to propagate them. That’s a major risk that people need to be cognisant of as we use those models. We need to be very critical of what they are doing, how they are doing it and what applications we’re using them for.
FLG: Absolutely. And something that I think is threaded throughout your interview today. So I’m intrigued to pick your brain. What are your thoughts on the democratisation of genomic data?
Akl Fahed: It might not be a very popular opinion, but I am all for it. I do think democratisation needs to happen in two ways. One is the democratisation of access to genomic data for scientists, because that will really increase the pace of discovery. And we’ve got to be fast. One thing that the COVID pandemic taught us is that we just don’t have time. We’re getting better and better over the years, we’re faster at drug discovery than we were decades ago. But we’ve got to be even faster to beat disease because the world keeps throwing bad things at us. I really believe in democratising access to data and making it easily available to scientists. Using the collective brain power of scientists all over the world is critical for us to beat disease.
The other way of thinking about democratisation of data is to patients. Again, I’m a big proponent of consumerisation of large parts of healthcare. Preventive care is one of them. My hypothesis is that if we put it in the hands of patients, and we educate them, we empower them, we will be better off than we are today.
I think the future for genomic data is going to be a genome-first approach or preventive genomics approach where everyone has their DNA information. You have tools that can educate you and guide you on how to use that as you go on with your life. And because it’s available data that freely stratifies people at risk, why not use it early on? I think we still have multiple milestones to get there, but that would be my personal vision.
FLG: Thank you so much! I’ve learned so much on everything from polygenic scores to machine learning and the democratisation of genomic data. Thank you for taking the time to shine a spotlight with us today.
Akl Fahed: Thank you so much, Poppy. I really enjoyed our conversation and I appreciate the time to chat with you and your audience. Thank you.