As part of the Vertebrate Genomes Project, researchers have described the lessons learned from assembling near error-free genomes of 16 vertebrate species.
The Vertebrate Genomes Project
With chromosome-level reference genomes, researchers can perform functional, comparative and population genomics more effectively within and across species. Sanger sequencing led to the first high-quality genome assemblies of human and other model species, such as mouse and zebrafish, but this approach was costly and required a lot of manual effort. The emergence of shotgun approaches and now next-generation sequencing has enabled more affordable and scalable genome sequencing. Nonetheless, assemblies built from short reads contain gaps and other errors.
To address these problems, the G10K consortium launched the Vertebrate Genomes Project. The aim of the project is to produce high-quality reference genomes for each of the 71,657 extant named vertebrate species. The hope is that these genomes can be used to address fundamental questions in biology, disease and biodiversity conservation.
16 vertebrate genomes
In their flagship paper published in Nature, the team presented methods and principles for sequencing and assembling high-quality reference genomes. They first evaluated multiple genome sequencing and assembly approaches extensively in one species – the Anna’s hummingbird (Calypte anna). They found that no single sequencing technology had all the necessary components, so they combined many tools into one pipeline. This method was then deployed across 16 species representing six major vertebrate classes.
The team found major sources of assembly error in the previous reference genomes. The new method was able to correct these errors, add missing sequences and also enable biological discoveries. For example, they found that the endangered kākāpō has been able to survive in extremely small populations because deleterious, disease-causing mutations have been purged.
Excitingly, different organisations are already using this novel pipeline. Genomes that once took years to generate are now being produced within weeks or months. In addition, researchers are already using the new data to study genes that render bats immune to SARS-CoV-2.
Erich D. Jarvis, Chair of the Vertebrate Genomes Project, stated:
“The first high-quality genomes that we sequenced taught us so much about the technology and the biology that we decided to publish in these initial papers.
The next step is to sequence all 1,000 vertebrate genera, and then all 10,000 vertebrate families, and eventually every single vertebrate species.”
Image credit: By kjpargeter – freepik