A recent article published in Nature describes a breakthrough in our knowledge of the human genome: the first gapless, telomere-to-telomere assembly of the human X chromosome.
Since the first human genome sequence was released in 2001, our understanding of the genome and its impact on our health has advanced rapidly. After two decades of improvement, the current human reference genome, GRCh38, is the most accurate and complete vertebrate genome to date. Nonetheless, no human chromosome has ever been completed end to end (telomere to telomere), and many gaps remain unresolved. Now, researchers at the National Human Genome Research Institute (NHGRI) have reconstructed the human X chromosome from telomere to telomere, with no gaps, for the first time and at an extraordinary level of accuracy.
The fact that the human genome remains unfinished 20 years after its initial release emphasises both the difficulty of assembling it and the limitations of earlier technology. Short-read sequencing has struggled most with repetitive regions of the genome, where it produces many short reads that are almost identical to one another. Because such a read matches several places in the repeat equally well, it cannot be mapped back to a single position in the reference.
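The mapping problem described above can be illustrated with a toy example. The sketch below is not taken from the study: the sequences, the exact-match search and the names (find_all, REFERENCE, short_read, long_read) are all made up for illustration. It shows a short read from inside a repeat matching several positions, while a longer read that reaches into the unique sequence on both sides matches only one.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Hypothetical toy reference: unique flank + four copies of a repeat + unique flank.
LEFT_FLANK = "ACGTTGCA"
REPEAT_UNIT = "GATTACA"
RIGHT_FLANK = "TTGGCCAA"
REFERENCE = LEFT_FLANK + REPEAT_UNIT * 4 + RIGHT_FLANK

# A short read that lies entirely inside the repeat.
short_read = REPEAT_UNIT

# A long read that spans the whole repeat and is anchored in both unique flanks.
long_read = LEFT_FLANK[-4:] + REPEAT_UNIT * 4 + RIGHT_FLANK[:4]

short_hits = find_all(REFERENCE, short_read)  # one hit per repeat copy
long_hits = find_all(REFERENCE, long_read)    # a single, unambiguous hit

print("short read placements:", short_hits)
print("long read placements:", long_hits)
```

Real aligners tolerate sequencing errors and score inexact matches, so this exact-match search is only a simplification of the idea.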
Today, thanks to rapid improvements in long-read sequencing technology, the researchers at NHGRI have been able to reconstruct these hard-to-sequence regions of the genome.
Lead authors on this study, Karen Miga of the University of California, Santa Cruz, and Adam Phillippy of NHGRI, published a study in 2018 that demonstrated the potential of nanopore technology for assembling a complete human genome. They subsequently co-founded the Telomere-to-Telomere (T2T) consortium with the goal of producing a complete human genome sequence.
This recent project, which used the CHM13 cell line, is a first step towards that goal. The researchers combined nanopore sequencing with Illumina and PacBio data, as well as optical maps from BioNano Genomics, to fill in the gaps in the current reference.
The de novo assembly of the X chromosome was broken in three places, at the centromere and at two nearly identical segmental duplications, so the rest of the assembly was finished manually. The two segmental duplications were resolved with ultra-long reads that spanned each repeat and were uniquely anchored on either side. The X chromosome centromere, a highly repetitive region of ~3.1 million base pairs, was reconstructed by identifying variants within the centromere and using them as markers to align and connect long reads until they spanned the entire centromere.
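The marker-based approach to the centromere can be sketched in a few lines. This is a deliberately simplified illustration, not the consortium's actual pipeline: each read is reduced to the ordered list of marker variants it contains, and the marker names ("L", "v1" ... "v6", "R"), the functions overlap and chain_reads, and the greedy strategy are all assumptions made for the example.

```python
def overlap(a: list[str], b: list[str], min_shared: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if fewer than `min_shared` markers are shared.
    """
    for n in range(min(len(a), len(b)), min_shared - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def chain_reads(reads: list[list[str]], start_marker: str, end_marker: str,
                min_shared: int = 2) -> list[str]:
    """Greedily stitch reads into one marker path from start_marker to end_marker."""
    remaining = [list(r) for r in reads]
    # Begin with the read anchored in the unique sequence on the left.
    first = next(r for r in remaining if r[0] == start_marker)
    remaining.remove(first)
    path = list(first)
    while path[-1] != end_marker:
        # Pick the unused read that shares the most markers with the path end.
        best = max(remaining, key=lambda r: overlap(path, r, min_shared), default=None)
        shared = overlap(path, best, min_shared) if best is not None else 0
        if shared == 0:
            raise ValueError("no read overlaps the current end of the path")
        path.extend(best[shared:])
        remaining.remove(best)
    return path


# Hypothetical reads: "L" and "R" stand for unique flanks, "v1".."v6" for
# marker variants that distinguish otherwise near-identical repeat copies.
reads = [
    ["v3", "v4", "v5"],
    ["L", "v1", "v2", "v3"],
    ["v4", "v5", "v6", "R"],
    ["v2", "v3", "v4"],
]

chained = chain_reads(reads, start_marker="L", end_marker="R")
print(chained)
```

Requiring at least two shared markers before joining two reads is an arbitrary choice here; in practice the threshold would depend on read accuracy and on how densely the markers are spaced.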
The team then completed two rounds of iterative polishing with Oxford Nanopore data, followed by PacBio and then Illumina data. This refined the unique regions of the sequence and brought the assembly to a high level of accuracy.
As well as producing long reads, nanopore technology is sensitive to methylation and can therefore detect epigenetic modifications. The X chromosome has many epigenomic features, including a distinctive methylation profile that results from X inactivation. By mapping methylation patterns across the X chromosome, the team were able to confirm previous observations and to identify other interesting patterns of methylation.
Although assembling the remaining chromosomes will be a challenge, this study provides evidence that a complete, gapless reconstruction of the human genome is within reach. Filling the remaining gaps in the genome will be essential to ensure that all genomic variants are identified, and will improve our ability to understand human health and disease. The T2T consortium is continuing to work on all of the CHM13 chromosomes, in the hope that the finished assembly can serve as the basis for a new, gapless reference genome.
Image credit: Freepix