Researchers at the Human Genome Sequencing Center at Baylor College of Medicine have identified genetic variant discrepancies between two human reference genomes.
Reference genomes
Since the Human Genome Project was completed in 2003, technological improvements have enabled the development of updated reference genomes. Earlier this month, the Telomere-to-Telomere Consortium released a preprint article claiming to have addressed the remaining 8% of the genome. The human reference genome is important as it acts as a standard for comparisons in basic research and clinical settings. The GRCh38 (hg38) human reference genome was released over seven years ago. Despite this, the older GRCh37 (hg19) reference is still used by most researchers and clinical laboratories. Until now, no study had quantified the impact of using different reference assemblies for identifying variants associated with rare and common diseases.
Comparison of genomes
In a recent study, published in the American Journal of Human Genetics, researchers explored the differences in sequencing readouts between the two references for labs that are still using the older reference. They specifically analysed exome sequencing samples from 1,572 participants with Mendelian diseases and their family members.
By calling variants on both references, the team found that 1.5% of single-nucleotide variants and 2.0% of indels were discordant. Moreover, they identified 206 genes with discordant variants. These included eight genes implicated in Mendelian diseases and 53 associated with common disease phenotypes. They found that 73% of the discordant variants were clustered within sections of the genome called DISCordant Reference Patches (DISCREPs). These sections of the genome have known assembly problems and were enriched for genomic elements, including segmental duplications.
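To make the idea of a discordance rate concrete, here is a minimal Python sketch. It is not the study's pipeline, which this article does not describe. It assumes both call sets have already been expressed in one common coordinate system (for example after a liftover step), and it treats a variant as discordant when it appears in only one of the two call sets. The names `Variant`, `is_snv` and `discordance`, and the toy variants, are hypothetical and used for illustration only.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int   # assumed to be in one shared coordinate system
    ref: str
    alt: str


def is_snv(v: Variant) -> bool:
    # A single-nucleotide variant swaps exactly one base for another.
    return len(v.ref) == 1 and len(v.alt) == 1


def discordance(calls_a: Set[Variant], calls_b: Set[Variant]) -> Dict[str, float]:
    """Fraction of SNVs and indels found in only one of the two call sets.

    Assumption: "discordant" means present in exactly one call set;
    the denominator is every variant seen in either set.
    """
    union = calls_a | calls_b
    only_one = calls_a ^ calls_b  # symmetric difference
    rates = {}
    for label, keep in (("snv", is_snv), ("indel", lambda v: not is_snv(v))):
        total = sum(1 for v in union if keep(v))
        differing = sum(1 for v in only_one if keep(v))
        rates[label] = differing / total if total else 0.0
    return rates


# Toy call sets (made-up variants, not data from the study).
hg19_calls = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr2", 50, "AT", "A"),
}
hg38_calls = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 50, "AT", "A"),
    Variant("chr2", 80, "G", "GA"),
}

print(discordance(hg19_calls, hg38_calls))
```

In this toy example one of the two SNVs and one of the two indels appear in only one call set, so both rates come out at 0.5. A real comparison would also need to handle variants that cannot be lifted over at all and differences in how indels are represented.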
The team hope to bring this issue to the attention of labs working on the 206 genes.
Dr. He Li, co-first author of the study, said:
“For variant interpretation in the 206 genes enriched for discordant variants, reference assembly differences should be accounted for in the analysis, especially when lifting over variant coordinates from one reference to the other.”
Transitioning from using the hg19 reference to the hg38 reference can take a significant amount of time and resources. This large-scale study aims to ease the burden on labs considering the transition. The paper also quantifies the benefits and limitations of the new reference and validates its utility in a lab setting.
Image credit: By Jian Fan – canva