Researchers have assembled 64 haplotypes from 32 diverse human genomes to serve as a new reference for genetic variation and predisposition to human diseases.
Advances in long-read sequencing, alongside genome-wide mapping technologies, have enabled researchers to fully resolve and assemble both haplotypes of a human genome. Compared with short-read approaches, these assemblies generally improve variant discovery, with the largest gains in the detection of structural variants (SVs). For example, Illumina short-read approaches identify only 5,000-10,000 SVs per genome, whereas long-read approaches can detect more than 20,000. Combined with new technologies such as single-cell template strand sequencing (Strand-seq), long-read sequencing has also enabled the unambiguous confirmation of both heterozygous and homozygous inverted configurations.
The Human Genome Structural Variation Consortium (HGSVC) recently created a method for phased genome assembly. The approach combines long-read PacBio whole-genome sequencing (WGS) and Strand-seq data to produce fully phased diploid genome assemblies (without dependency on parent–child trio data). This approach enables a more complete sequence-resolved representation of variation within the human genome.
In this landmark study, published in Science, researchers presented a resource of phased genome assemblies comprising 70 haplotypes (64 from unrelated individuals and 6 from children) drawn from a diverse panel of human genomes. The reference dataset represents 25 different human populations from across the globe. The team focussed specifically on the discovery of novel SVs, supported by extensive orthogonal validation.
They identified 107,590 SVs, of which 68% were not discovered by short-read sequencing. They also characterised 130 of the most active mobile elements and found that 63% of all SVs arise by homology-mediated mechanisms.
The resulting panel has significantly improved SV discovery and has provided fundamental new insights into the structure and variation of the human genome. The team hope it will serve as the basis for constructing new population-specific references.
E. Albert Reece, MD, PhD, MBA, Executive Vice President for Medical Affairs and Dean of the School of Medicine, University of Maryland, stated:
“The landmark new research demonstrates a giant step forward in our understanding of the underpinnings of genetically-driven health conditions.
This advance will hopefully fuel future studies aimed at understanding the impact of human genome variation on human diseases.”