Mahsa Shabani is an Assistant Professor in Privacy Law at the Faculty of Law and Criminology, Ghent University in Belgium. Her research focus is on health data privacy, data sharing and access platforms. She also works on biomedical and genomics research ethics, law and policy, including the governance of biobanks and data protection issues. Her work has been extensively published in scientific journals with a broad readership.
Please note the transcript has been edited for brevity and clarity.
FLG: Hello, everyone, and hello, Mahsa, thank you so much for joining me today as we shine a spotlight on some of the social and ethical issues surrounding genomics. Today we’re going to be talking about data sharing and all the opportunities and challenges surrounding that. Before we start, Mahsa, if you could just introduce yourself and tell us a little bit about what you do.
Mahsa: Thanks a lot for the invitation! My name is Mahsa Shabani, and I am an Assistant Professor in Privacy Law, with a focus on health privacy, at Ghent University in Belgium in the Faculty of Law and Criminology.
FLG: Today we’re going to be talking about data sharing, but I think we should start off with the basics. How can genomic data be shared?
Mahsa: Maybe I can also start by saying why data sharing is important in genomics in the first place. There are two main reasons for such data sharing in genomics, and these show importance for both the clinical setting and also for the research setting. On the one hand, sharing genomic data would allow researchers across the world access to large-scale datasets so that they would not be limited only to their own data or data that is coming from their, for example, close colleagues. In that sense, this will improve the statistical power of their databases, and this, of course, is something that is definitely very valuable for genomics research. Secondly, as you may know, in genomics when it comes to clinical purposes there still are a lot of unknowns in terms of the interpretation of the results. So, it is possible for different clinicians to look at the data and see some variants or some mutations but not be able to interpret whether or not these are, for example, pathogenic or benign, and so on. In order to do that, it’s necessary to have access to some similar data from other patients, from other clinics and so on. It really raises the chance to find a match and to say that, for example, these two specific cases are really similar to each other, and it can also help the diagnosis for specific patients. They are the two really important implications for data sharing, both in the clinical setting and also in the research setting.
FLG: How does this important genomic data then get shared?
Mahsa: In order to share data, as I mentioned, in a more traditional setting, it used to be the case that you would just contact your close colleagues or your network and ask if they are dealing with a similar situation or scenario. In that way, it was pretty closed data sharing – it was really between some trusted colleagues and so on. But now, what is growing, and it’s been growing in the last decade, is the development of some infrastructure to allow data sharing via different online platforms. This would allow for researchers or clinicians across the world to look at these datasets and catalogues and say, ‘Well I’m interested to have access to this specific dataset or to learn more about these specific patients’, and therefore then request access for that database. What we are having now is called data sharing by different platforms, by different online platforms, which, of course, are following different data governance or data access models. But in principle, their purpose is pretty much the same. They really want to promote data sharing among researchers and clinicians across the world and improve their accessibility to those datasets.
FLG: What are some of the challenges of data sharing?
Mahsa: I think when we’re looking at data sharing in the context of genomics, there are certain challenges and issues that are associated with this type of data sharing. I can start with the very first one, which is related to sharing sensitive data from individuals, and in this case, from patients. What the problem is, in this context, is that when you are aiming to share sensitive health data from individuals, it can reveal some very sensitive health-related information, not only about the individuals themselves, but it can also reveal some familial-related information from these individuals. In that sense, I think the very first reaction is that when you are planning to develop such infrastructure to allow genomic data sharing, then you need to make sure that you are respecting all the relevant ethical guidelines, and also legal rules that are in place. Also, when we are speaking about the different ethical and legal related issues, the very first thing that comes to mind is the issues related to privacy of these individuals, and also issues related to data protection. So, I would say that this is maybe the very first concern or challenge that has been really present when we are speaking about data sharing in genomics.
FLG: The pandemic has really accelerated data sharing. Do you think we are on the right path? If not, what needs to be done?
Mahsa: I think that what we have witnessed in the last year and in the course of the pandemic was that we really need to have access to large-scale health data from across the world. We cannot really limit ourselves to just one university hospital or just one research institute. I think we (policymakers in the context of health, researchers, clinicians) felt this need to make sure that we have a good system in place, which would allow responsible data sharing. That is not to say that, now that we have seen this need, we no longer need to care about privacy-related issues, data protection concerns or other ethical aspects of sharing data. It was more about finding a balance to allow responsible data sharing, while at the same time respecting the rights of the individuals whose data you’re aiming to share. So, I think that it’s maybe now more of how to share data rather than should we share data.
I think we already have an answer for that – we have to share data. But it’s important to find that balance, and I think for that question we are still searching for the right answer because it’s not very easy. I think it’s shown that it’s not very easy to find that balance. I’m hearing mostly from biomedical researchers in this field that they are sometimes facing different challenges, different limitations in sharing data because trying to be compliant with regulations or different guidelines seems to be difficult for them. So, in a sense we still have work to do to find what is the best infrastructure or what is the best governance model that we can deploy to allow sharing data in a way that you’re not going to infringe on privacy rights or ethical principles in this context.
FLG: Data storage and access are key components of data sharing. What technologies or models are currently being adopted? What are the challenges?
Mahsa: I think that in response to the need to find the right infrastructure for sharing sensitive data, including genomic and genetic data, there have been a lot of efforts in recent years to find more technology-based solutions to the challenges that are arising from data protection and privacy and so on. In recent years, we have seen some European projects and also some private initiatives that make an effort to use, for example, blockchain-based technologies or different approaches to data sharing, which often use a more decentralised approach to sharing data as a way to answer some of the questions or some of the challenges associated with data sharing. I think one of the unique things about data sharing is that you need to really pull data from different sources and sometimes, it’s not very easy to harmonise different infrastructures from different hubs that want to contribute data. Sometimes they are also following different regulatory frameworks.
So, let’s say that you’re speaking about sharing genomic data that is coming from different sources from different countries. Then you need to make sure that they have used the same consent model, for example, when they have collected this data or the approvals that they have in place. You need to bring all this together; you can’t just bring all this data in the same database. So that’s why this is really making it very challenging to come up with a single infrastructure to say that this is perfect enough to pull data from different sources on this platform and allow data sharing. That’s why in recent years, there has been a bit more growing attention to using decentralised models for data sharing and allowing different hubs and different data contributors to keep control of their data while still allowing discoverability and accessibility of these databases.
FLG: How can we ensure that there is greater harmonisation?
Mahsa: I think that there are different layers to this harmonisation. The first priority is related to the technical aspects of the data sharing. So, let’s say that you have data coming from different sources. You can imagine that maybe they have followed different data curation and data standardisation. This speaks to the quality of the data and the structure of the data which is coming from different databases and from different sources. This means that you really need to make sure that these existing databases, from a technical perspective, can also ensure the interoperability of these datasets. I think that’s something that many researchers and many projects are now busy with and are trying to really make this happen. I know that it’s in progress. But of course, this is an aspect that is very important.
Then, there’s another aspect or layer to this harmonisation, which is coming from the regulatory framework. So, as I mentioned already, depending on where the data is being collected, different regulations and also regulatory frameworks can be in place. This means that, for example, in a specific jurisdiction, a different consent model or different ethics approvals might have been obtained for collecting this data. So, it means that any secondary use of this health or genomic data needs to be in line with the original or initial approvals and consent forms. This is also something else that is now being addressed with different initiatives. One example is the Global Alliance for Genomics and Health (GA4GH), an international initiative that aims to help develop different policies so that different cohorts and different sources in different countries can use these policies and hopefully eventually move in a direction where they harmonise their different data sharing related tools and mechanisms, including consent forms and so on.
FLG: More and more people are getting their genome sequenced, particularly outside of a clinical context. How can we ensure that individuals understand how their data is being shared?
Mahsa: That’s actually a very good question. At the moment, I think that we all agree that data sharing is not really just limited to a very traditional research setting. It is not really just something that researchers or clinicians need to worry about. But more and more we see a role also for the patients and the individuals themselves to take part and actually to be part of this data sharing ecosystem. I think in that sense, we already know that some citizens are more curious and more interested in being actively involved in the process of scientific research. For a while, we have had different discussions about the topic of citizen science, where some individuals are actively involved in the process of data sharing. I think that it may be special for this group because they are very interested in this topic, so they may take time to educate themselves about what the risks are and what the benefits are. It’s not really fair to say that every individual can educate themselves at that level about the risks and benefits of data sharing. So, I think that we need to find something in between to make sure that individuals who have access to their own genomic data via these direct-to-consumer genetic testing services and so on are aware of the risks and benefits of data sharing.
I think that there is an immediate benefit in doing that for research and the advancement of research. Because if you have more citizens and more individuals aware of the benefits and potential risks, it is more likely that you will be able to also enhance trust among these individuals about why their data is valuable for research and why it should be used for different research purposes, so you have them on board when you are speaking about data sharing for research. I think this is very important and many researchers from social science, from different bioethics related research groups are busy with different methods to foster this public engagement, to be able to engage individuals in these discussions about data sharing. They are also aiming to develop different educational tools and different videos and so on to make individuals also part of this discussion. So, I think that this goes beyond just saying, ‘Well, you have given your consent so it means that you were fully aware of what will happen with your data’. I think that’s really not the case and many studies have shown that individuals may not even read the consent forms. They don’t read the privacy policies and as such, you cannot really rely on those tools to say that individuals are fully informed about the potential risks and benefits.
FLG: What are the options for people who don’t want to share their data or want to be selective?
Mahsa: I think that at this moment, if you look at the very traditional consent forms, you have to, in principle, give an option for individuals to opt out from either the whole research or data sharing or from specific purposes of sharing data. If you look at it from a strictly legal perspective, individuals have a choice to say whether or not they want their data to be shared for research purposes. But in practice, it is becoming more and more difficult to actively involve individuals in this discussion. Because let’s say if you deposit your genomic data in a database somewhere in the world, then there is a possibility for researchers to have access to these databases for different reasons and for research purposes. Sometimes it’s really difficult to go back to the individuals and ask them if they are happy with this specific type of data sharing or not every time. So that’s the reason that in recent years, researchers have tried to come up with some alternative approaches to the consent forms, including the dynamic consent model. The idea there is that by using different online platforms and digital tools, you can keep individuals in the loop and go back to them often and say, ‘We are using your data for this research – do you agree with that or not?’. So, really moving from the idea of traditional, paper-based, one-off consent in research.
FLG: Who owns genetic data? Who should own genetic data?
Mahsa: The concept of data ownership is a bit of a contested concept if you look at it from a purely legal perspective. I think you see more often in the ethical or sociology literature that people use the term data ownership in a more liberal way in the context of health data. But if you look at it from the legal perspective, it can have some important implications. For example, if you claim that this is my data, then what’s next? Should you be able to also sell your data, or what other specific rights can you exercise in this context and so on? So that’s why we’re a bit hesitant to use the terminology of data ownership in this context. But I think if we look at the different types of health and genomic data, in a more private setting, such as if you ask for it – whether you pay for it like direct-to-consumer testing – and you get your results, then I think it is easier to say that you can exercise different components of the ownership rights on your genomic data. You can get access to your raw data and your test results and so on, and this should already be included in the contract signed between you and the company. But when it comes to, for example, data which has been collected from you in the hospital, and depending on the type of insurance that you have, sometimes it’s a question of whether or not the hospitals can say, ‘Well, we have made all these efforts to generate this data, so we should also exercise some data ownership rights’. It’s quite a broad topic and I could talk about it for hours. But I think that it is safe to say that it is really a contested topic.
FLG: You co-authored a paper on DNA data marketplaces – would you be able to discuss this and what the challenges associated with it are?
Mahsa: This is a recent model of data sharing that we have seen being used with some initiatives and some start-ups in previous years. The idea is to introduce a more fair model of data sharing and with that, I mean that if individuals are sharing their data and if there are some interested parties, like pharmaceutical companies, and they are interested in getting access to this data because they can use it for developing different products and other commercial purposes, then these individuals should, first of all, have a say in how to share their data, but also they should be able to benefit from this data sharing. Because it’s not fair if the companies are making profit out of using this data for developing their products but nothing returns to the individuals. So that is the reason some initiatives and start-ups have used this model in previous years, and we mentioned a few of them in that paper. But we try to discuss what the associated ethical and legal challenges are with such models.
One of the main issues is that this can, actually, somehow challenge the traditional idea of participation in research. So, if you say that this is still something that’s considered to be a research activity, then we know that in the context of biomedical research, paying individuals for their participation is often considered to be a bit questionable, because we don’t want to somehow influence the consent to participate. In that sense, if you start to say, ‘We will pay individuals to donate their data’, we came to the conclusion that in this context, consent cannot be considered as a valid consent the way that we know it from the traditional research setting. Another issue that is also important in this context, is that it’s not very clear what the relevant privacy and data protection risks are in this context. So, it’s possible that maybe in the future, something happens with these databases and then it’s very important to know that individuals (and also those for whom this type of data has implications) are fully aware of the associated risks. So, leaving everything to the individuals to decide, about all the risks and all the benefits and so on, seems a bit concerning sometimes, especially in the context of genomic data. It can be considered very sensitive and that’s why we discussed in the paper how we should really proceed if we want to keep this model of data sharing.
FLG: Relationships between researchers and commercial companies are becoming more common, yet there is often a lack of public trust toward commercial companies – what will need to be done to build this trust?
Mahsa: I think that in this context, the very first important thing is to be very transparent about the whole arrangement of data sharing between publicly funded and also commercial entities. But in that sense, one thing that people active in this field have learnt is that individuals, and also citizens, don’t like to be surprised about these potential or existing collaborations. So, in that sense, it’s very important to be transparent about all these processes. I think that there are different ways for enhancing this transparency in this context. You can include this information in flyers or make some videos or anything that helps make sure that individuals are actually being informed about such collaborations. But also, the other thing that’s also important is that, as I mentioned, sometimes there is a question that if commercial companies are going to benefit from my data, then what is in return for me or for society? So, in that sense, that’s also something that is important to make sure that there are good benefit sharing models in place, and if there are such collaborations, then you need to really show how this can in return benefit society, and also eventually individuals in general. You can’t always expect that there is some individual benefit, but I think that, in general, something for society is something that is important.
FLG: Genealogy data can be used for unintended reasons like in law enforcement purposes. What are some of the ethical issues associated with this? Are there examples of where this approach has been misused?
Mahsa: I think that maybe this is also something that I just mentioned and it’s the element of surprise. So, I think that this was something that individuals and users of these databases were not really fully aware of, that there was a possibility for such use also by law enforcement. Also, something that is concerning for individuals is whether or not such access by law enforcement bodies or any third-party entities is regulated or not. So, if you start to feel that this can be used anywhere, it’s really uncharted territory and you can do whatever you want with these databases, then it doesn’t really give a good feeling to the individuals. In that sense, I think it was probably one of the main ethical and legal concerns to be very clear and transparent about how these databases can be used later on, because obviously use by law enforcement was not the first purpose that would have come to the minds of the users. Because these are the databases that are often used for genealogy purposes, it’s hardly something that you would do because you think it would be used to solve crime cases. So, in that sense, again I think what matters is information transparency and, of course, at the end of the day, making sure that this is also in line with the regulatory frameworks.
FLG: What do you think the future of data sharing will look like? What is the data sharing utopia?
Mahsa: As I mentioned in the context of COVID-19, I think we now know that we should share data. The main question is – how? Many people are on board with data sharing now, so this is something positive. What is less clear is what the best infrastructure is to promote and facilitate data sharing while, at the same time, maintaining public trust in such data sharing. Looking at what is going on at this moment, I think that on the technical side, researchers are busy with developing different privacy preserving technologies, including more decentralised platforms, like federated networks and so on, to facilitate such data sharing. In the regulatory and more ethics related aspects, there have been extra challenges because of data protection regulation in the EU (for example, the GDPR). I think that researchers are struggling to find the right path to make sure that their data sharing is compliant with the GDPR, so it’s very much in progress. But I hope that all of the initiatives that are going on at the moment, including the European Health Data Space, which is aiming to facilitate such secondary use of data and data sharing, can also help to provide more guidance and more clarity about the regulatory and ethics related aspects.
FLG: Thank you so much for joining me today, Mahsa, it has been very insightful, and I expect that data sharing will rapidly evolve in the coming years. Thanks.
Mahsa: You’re welcome. Thank you!