Published in draft form 20 years ago this month, the Human Genome Project's sequence was a ground-breaking piece of research. However, it was far from complete. A recent paper explores the advances in sequencing technology that have brought scientists closer than ever to an end-to-end map of the human genome.
The Human Genome
Made up of a complex mix of genes and regulatory sequences, the human genome is often compared to a landscape. However, in many places this terrain is more like a desert highway, vast and repetitive.
For example, a chromosome’s centromere is composed of thousands of near-identical alpha-satellite sequences. These 171-base-pair units must be correctly organised to ensure chromosomal stability and proper cell division. And yet, 20 years after the first drafts from the Human Genome Project were released, these and other DNA features remain stubborn gaps in our chromosomal map.
The most recent draft of the human genome, released in 2013, is still missing 5-10% of the sequence. The missing portion includes challenging regions such as centromeres and the genes that encode ribosomal RNA, which are present as long stretches of repeated gene copies. Moreover, the genome is sprinkled with segmental duplications. These stretches of DNA are the product of ancient chromosomal rearrangements and are difficult to map.
A Complete Picture
However, researchers at the Telomere-to-Telomere (T2T) consortium are set to fill these gaps. Their aim is to produce an end-to-end map of every chromosome, stretching from one telomere to the other. They stress that the genomics community needs to sequence many genomes to gradually understand the variation in these under-studied regions.
Many of the barriers to genome assembly can be overcome by new long-read technologies. Biotech company Pacific Biosciences uses an imaging system to directly read thousands of DNA strands in parallel. Alternatively, Oxford Nanopore Technologies threads DNA strands through tiny nanopores, reading hundreds of thousands of bases by measuring the small changes in electrical current that occur as the DNA passes through.
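To see why read length matters in repetitive regions, consider a toy model. The sketch below is illustrative only: the sequences, the 6-base repeat unit (standing in for a 171-base-pair alpha-satellite unit) and the function name find_placements are all made up for this example, and exact string matching is a simplification of what real alignment software does. It shows that a short read lying entirely inside a repeat array fits in many places, while a long read that reaches the unique sequence on either side fits in only one.

```python
def find_placements(read: str, reference: str) -> list[int]:
    """Return every start position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping matches
    return positions


# Hypothetical toy reference: unique flank + tandem repeat array + unique flank.
LEFT_FLANK = "ACGTTGCA"
REPEAT_UNIT = "GATTCC"  # stand-in for a 171-base-pair alpha-satellite unit
RIGHT_FLANK = "TTAGGCAT"
N_COPIES = 10
reference = LEFT_FLANK + REPEAT_UNIT * N_COPIES + RIGHT_FLANK

# A short read that falls entirely inside the repeat array.
short_read = REPEAT_UNIT * 2

# A long read that spans the whole array and part of each unique flank.
long_read = reference[4:-4]

short_hits = find_placements(short_read, reference)
long_hits = find_placements(long_read, reference)

print(f"short read fits at {len(short_hits)} positions: {short_hits}")
print(f"long read fits at {len(long_hits)} position: {long_hits}")
```

In this toy case the short read has nine equally good placements, so an assembler cannot tell which copy it came from. The long read is anchored by the unique sequence at its ends and has a single placement.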
Early studies reported covering ~90% of the human genome with 99.8% accuracy using nanopore data alone. Such reports made it clear that the T2T goal was within reach. In fact, in late 2020 the consortium published the first two complex assemblies, for chromosomes X and 8.
Now, new technologies such as circular consensus sequencing are also being used. This process converts individual DNA strands into closed loops that can be read over and over. As a result, researchers can eliminate random errors to produce highly accurate results.
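The idea that repeated reads of the same molecule cancel out random errors can be sketched as a simple majority vote. This is a deliberately simplified illustration, not the method used by any sequencing platform: it assumes the passes are already aligned and of equal length and that errors are substitutions only, whereas real consensus calling must also handle insertions and deletions. The sequences and the function name consensus are invented for the example.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Majority vote at each position across repeated reads of one molecule."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this sketch assumes aligned passes of equal length")
    # zip(*passes) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Hypothetical true sequence and five noisy passes around the same loop.
truth = "GATTCCGATTCA"
passes = [
    "GATTCCGATTCA",  # no errors
    "GATTGCGATTCA",  # random error at position 4
    "GATTCCGATACA",  # random error at position 9
    "CATTCCGATTCA",  # random error at position 0
    "GATTCCGATTCA",  # no errors
]

result = consensus(passes)
print(result == truth)
```

Because each random error lands at a different position, it is outvoted by the other passes at that position. An error that recurred in most passes would survive the vote, which is why this approach removes random errors rather than systematic ones.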
While the sequencing of chromosomes X and 8 took a year or more each to complete, the researchers were able to sequence all the remaining chromosomes in a two-month span with these combined approaches. They are also working on ribosomal RNA genes.
While most genomics efforts currently focus on known genes, more comprehensive analyses are likely to become standard. As researchers begin routinely exploring the clinical impact of previously unmappable regions, the hope is that medical and research practices will become increasingly effective.