Published 20 years ago, the first drafts of the human genome opened the doors to a new generation of genetics. Hundreds of thousands of individuals have since been sequenced across the globe, and we have a growing set of tools to study genomic variation. However, some aspects of the research ecosystem have hardly changed, and that remains a concern. To fulfil the long-delayed promise of the Human Genome Project, we must recommit to data sharing and diversity.
The explosion of data following the Human Genome Project led many governments, funding agencies and private research companies to develop their own custom-built databases. Unfortunately, these often present barriers to data access.
Twenty years on, there is still no universal policy requiring research groups to share their human genome data, and no universal format or database in which to share it. Requiring researchers to have a concrete data-sharing plan from the outset of a project could help to shift attitudes. An NIH-wide policy set for January 2023 does just that. It aims to ensure data sharing is aligned both with ethical and privacy considerations and with the FAIR principles, meaning that data must be Findable, Accessible, Interoperable and Reusable. Positive peer pressure could also help researchers see data sharing as their duty.
For genomics to truly revolutionise medicine, it needs to be combined with phenotypic data. Physical characteristics, medical history and other identifiable traits can be linked to variants in the genome. Detailed phenotypic data is also required to check the quality of data and make reproducible findings. However, collecting this data increases the privacy risk for participants. While patients are now rightly being given more control of their data, scientists also need to be vetted to ensure that participants are protected and give true consent.
Overall, data repositories must be made more accessible and easier to contribute to. As has been highlighted repeatedly throughout this pandemic, data sharing can provide massive benefits to science and all of society. It’s time to solidify that foundation and improve sharing practices, but always with equity and respect!
Genome databases also largely over-represent DNA from people of European descent who live in high-income nations. In fact, less than 2% of the human genomes sequenced so far have been those of African people. Truly global databases are required to properly represent humanity’s vast genetic diversity. The fact that this has not been achieved in two decades is a reminder of science’s history of mistreatment and neglect, particularly of African and Indigenous populations.
Last year, analyses of whole-genome sequences of just 426 people across 50 ethnolinguistic groups in Africa revealed more than 3 million variants that were previously unknown. By a rough estimate, capturing the full scope of Africa’s genetic variation would require sequencing about 3 million individuals. Participants would be selected to cover a broad range of ethnolinguistic, regional and other groups. This project has been proposed by the African Society of Human Genetics and could be completed in a decade with the right investment and resources. The global impact of such a project could be ground-breaking.
At this milestone anniversary, the genomics community needs to recommit to open data sharing. In future years, better attempts must be made to capture the full scope of variation to improve health care, equity and medical research globally.
Image Credit: Pixabay