Diversity in Genomics: Assessing the Mosaic of Human Life

Although our DNA differs at only around 0.1% of base pairs, human life is complex, unique and varied. Despite this variation, genomics research has typically focused on homogeneous groups, erroneously assuming a one-size-fits-all approach.

Over the years, many scientists have tried to address the lack of diversity in genomics research. Although a number of initiatives have been set up to serve this purpose, much remains to be done, and past efforts have not always been carried out well. In this feature, we explore the importance of diversity in science, the initiatives aiming to solve this problem and the challenges associated with it.

The mosaic of human life

Every individual has their own unique experiences and perspectives, and it is plain to see that all humans are different. In spite of this, science has often forgotten that one homogeneous group does not necessarily reflect humanity as a whole.

Human beings share around 99.9% of our DNA with each other, a higher proportion than most other species, but the remaining 0.1% is much more important than it may seem. Within these sections of our genome is variation that explains much of the difference between each individual human. Crucially, however, it also underpins our health. Genetic variation can contribute to our susceptibility to certain diseases and our response to medications; this concept forms the basis of what we call ‘precision medicine’. Yet historically, studies have focussed heavily on European populations. In fact, a 2021 review posited that over 86% of genome-wide association studies (GWASs) have used volunteers of European descent. This number has grown since 2016 and is clearly out of sync with the demographics of the global population.
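To put the 0.1% figure in perspective, the short calculation below is a rough sketch only. It assumes a round genome length of three billion base pairs, a figure that is not taken from this article, and the function name is purely illustrative.

```python
# Rough arithmetic only: the genome length is an assumed round figure,
# not a number taken from this article.
GENOME_LENGTH_BP = 3_000_000_000  # assumed approximate human genome size
DIFFERENCE_RATE = 0.001           # the 0.1% figure quoted in the text


def expected_differences(genome_length_bp: int, difference_rate: float) -> int:
    """Return the approximate number of base pairs that differ."""
    return round(genome_length_bp * difference_rate)


differing_sites = expected_differences(GENOME_LENGTH_BP, DIFFERENCE_RATE)
print(f"Roughly {differing_sites:,} base pairs differ between two people")
```

Under that assumption, a difference of 0.1% still amounts to millions of positions in the genome, which is why such a small percentage can matter so much for health.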

The overrepresentation of individuals of European ancestry in genetics research is problematic: it limits our understanding of genetic diversity, exacerbates health disparities and undermines the generalisability of research findings. To address these issues and promote equitable healthcare for all individuals, it is essential to prioritise diversity and inclusion in research efforts.

Genomics England Diverse Data Initiative

One initiative aiming to tackle the lack of diversity in genomic data is Genomics England’s Diverse Data project. Launched in 2021, the initiative aims to collect samples from individuals who are typically underrepresented in research, with the goal of ensuring that all patients, regardless of background, can benefit equally from genomics research.

The initiative has specific focus areas, namely in research on sickle cell disease and maternal healthcare. Crucially, the initiative also focuses on addressing why there is a lack of diversity in this field, including a historic mistrust of science in some marginalised communities. This is a particularly important topic, as simply obtaining more diverse samples is not enough to overcome the fundamental reasons underpinning the lack of data – much more needs to be done to appropriately engage with underrepresented communities.

Recently, members of the Diverse Data team assessed the impact of diversity in the 100,000 Genomes Project Cancer Cohort. The results revealed that there were fewer actionable insights prioritised in individuals from non-European backgrounds, suggesting a greater need for diversity within the Genomics England pipeline, which is now being addressed.

Why is there a lack of diversity in genomic data?

The ongoing lack of diversity in genomic data can be attributed to several interconnected factors. The prevalence of European samples within datasets is rooted in colonialism, racism and the disproportionate allocation of resources. Moreover, the methods used to collect genomic data often favour convenience and cost-effectiveness over diversity: researchers may recruit participants from academic institutions or healthcare facilities that predominantly serve European populations. Additionally, studies may rely on existing datasets or biobanks that are already biased towards certain populations, perpetuating the lack of diversity in subsequent research.

Perhaps more important than all of the reasons above is a historical lack of trust in science among some marginalised communities, who may have been subjected to exploitative research practices without their informed consent. A well-known example is the Tuskegee Syphilis Study, in which African American men were left untreated for syphilis without their knowledge. These violations of trust, coupled with biased recruitment and discriminatory practices within science and healthcare, have had an ongoing impact on marginalised communities. This means that it is crucial not only to improve these practices, but also to strive for better engagement with these communities to ensure open and honest dialogue.

All of Us

Another large-scale initiative aiming to address the lack of diversity in research is the All of Us program, a US initiative with the goal of collecting genetic data from one million individuals. Ultimately, the aim of the project is to move healthcare away from a one-size-fits-all approach and ensure that everyone can benefit equally from precision healthcare. Diversity is a key priority of the program, and there is specific focus on ensuring the participation of Native American individuals. Part of the latter goal is to ensure appropriate and sensitive communication with these communities, and researchers have received input from Tribal leaders in order to achieve this.

But the All of Us program strives to achieve diversity not only in ethnicity, but also in sexuality and gender identity, socioeconomic status, education level and more. All of these factors play a huge role in our lives, and the researchers involved in the program are keen to explore how both genetic and environmental factors influence our health and well-being in a diverse setting.

Recent All of Us result causes controversy

In February 2024, a series of papers detailing results from the All of Us program was published. Over 250,000 genomes have already made their way into the database, and these articles detailed the initial methods and findings before the anticipated collection of one million genomes by 2026.

In one of these papers, published in Nature, the authors aimed to showcase the diversity of the dataset – 77% of participants are from historically underrepresented backgrounds. The article detailed millions of newfound genetic variants in the human population and addressed their connection to health.

However, the way in which some of the data was presented sparked debate. The authors had created a UMAP based on participants’ self-reported race and ethnicity, and the resulting figure seemed to imply that humans can be shoehorned into distinct ethnic groups based on their genes.

Figure 1: UMAP describing genetic diversity within human populations. The figure has caused controversy due to the implication that ethnicity can be separated in this way. Adapted from The All of Us Research Program Genomics Investigators, 2024.

Experts were quick to call for the paper’s retraction, citing concerns that the figure perpetuates the idea of genetic essentialism and could be weaponised in a racist manner. The authors have no current plans to revise the figure, but have noted that there are considerations to be made when presenting this kind of data going forward.

The Human Pangenome Reference

The final key initiative we will cover in this feature is the Human Pangenome Reference. This is an ambitious international effort, aiming to create a comprehensive map of genetic variation across diverse human populations. Traditional reference genomes, such as the one created in the Human Genome Project, provide a single representation of human DNA based on a small number of individuals. However, these reference genomes fail to capture the full spectrum of genetic diversity present in the global human population.

The concept of the pangenome expands upon the idea of a single reference genome by incorporating a broader range of genetic variations observed across multiple individuals and populations. The human pangenome encompasses not only the core set of genes shared among individuals but also a more diverse repertoire of genetic variations.
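The distinction between a shared core and a wider repertoire of variation can be illustrated with a toy sketch. The individuals and variant labels below are invented for illustration, and a real pangenome reference is a far richer data structure than a handful of sets; this only shows the idea of separating what everyone shares from what varies.

```python
# Hypothetical toy data: the individuals and variant labels are invented
# purely to illustrate "shared core" versus "variable" content.
variants_by_individual = {
    "person_a": {"v1", "v2", "v3"},
    "person_b": {"v1", "v2", "v4"},
    "person_c": {"v1", "v3", "v5"},
}


def summarise(variant_sets):
    """Split all observed variants into a shared core and a variable remainder."""
    sets = list(variant_sets.values())
    shared = set.intersection(*sets)   # present in every individual
    everything = set.union(*sets)      # present in at least one individual
    return shared, everything - shared


shared_core, variable = summarise(variants_by_individual)
print("Shared by everyone:", sorted(shared_core))
print("Seen in only some individuals:", sorted(variable))
```

A single-individual reference would record only one of these sets, so anything outside it would go unseen. Adding more individuals enlarges the variable portion that the reference can represent.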

The first version of the Human Pangenome was published in early 2023, consisting of DNA from 47 individuals. It is hoped that the reference will grow to contain data from over 350 individuals by the end of 2024.

What does the future hold?

The above initiatives are not the only efforts to increase diversity in research, with many programs now making the issue a key concern in their plans. That said, in this article we have focussed on three large-scale initiatives led by researchers from Western nations; in fact, a number of countries and regions are championing their own projects, such as Genomics Thailand and H3Africa. To explore initiatives such as these, check out our World of Genomics series.

The fact that the most well-known diversity initiatives are still based primarily in Western nations highlights an ongoing problem around resource distribution. Many countries still do not have the equipment, staff or training opportunities to carry out large-scale genomics projects, and funding has historically been concentrated in developed Western regions. Furthermore, the lack of appropriate engagement with minority communities and developing nations must be addressed to ensure that individuals will want to participate in studies, and that they do so with ethical and informed consent.

Ultimately, establishing meaningful partnerships with diverse communities will be essential to build trust and ensure the inclusion of underrepresented populations in genomic research. Engaging community leaders, stakeholders and advocacy groups should help foster collaboration, promote cultural sensitivity and empower individuals to participate in research initiatives. In addition, we must make efforts to ensure that the necessary tools make their way to other countries, so that scientists have the power to carry out their own research. Some sequencing companies have already begun to establish partnerships overseas, but it is clear that there is still much to be done.

