
Capture the dark genome: From repeat-expansions to CRISPR unintended mutations – Webinar Summary

Tackling Disease-Related Repeat Expansion Analysis – Marzia Rossato

What is the Dark Genome?

The dark genome refers to parts of the genome, and genetic features, that are missed by short-read sequencing because their sequence content makes them very difficult or impossible to sequence. These regions have a big impact on disease: over 6,000 genes are missed by short-read sequencing, yet they are linked to 300 diseases.

Today’s method of choice tends to be long-read sequencing. However, long-read analyses are limited in their accuracy and are not cost-effective. To make long-read sequencing clinically useful, the data need to be high-coverage and highly specific to a region.

Long-read sequencing can highlight dark genomic regions that remain obscure to technologies based on short reads. Enrichment of long DNA fragments is crucial to exploit the potential of long-read sequencing to tackle these regions in the clinic. The Xdrop technology is a novel approach that enriches long DNA fragments using droplet partitioning, with very low input DNA and without knowing the exact target sequence.

Xdrop technology uses indirect sequence capture to detect and amplify a region a few kb away, allowing areas of the dark genome to be sequenced. Moreover, Xdrop technology is flexible enough that it can be used for both long and short reads.

How does it work?

High molecular weight DNA is encapsulated in droplets together with primers, at a low density of only one DNA molecule per droplet. The primers target the amplification sequence, which lies a few kb away from the region of interest, and allow a droplet to be stained if that sequence is present. The droplets are sorted by colour, and the DNA from the sorted droplets is amplified in droplets (dMDA) before being sequenced.
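To make the idea of indirect capture concrete, here is a minimal sketch in Python. It is a toy model and not Samplix software: the coordinates, the fragment list, and the function names are all hypothetical. Each droplet holds one long fragment, a droplet counts as positive if its fragment contains the detection site, and the region of interest comes along only because it sits on the same long molecule.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp on a hypothetical reference


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def sort_positive_droplets(fragments: List[Interval], detection: Interval) -> List[Interval]:
    """Keep the fragments (one per droplet) that carry the detection site."""
    return [frag for frag in fragments if contains(frag, detection)]


# Hypothetical layout: the detection site sits about 3 kb from the region of interest.
detection_site = (50_000, 50_150)
region_of_interest = (53_000, 54_000)

fragments = [
    (30_000, 60_000),  # carries both the detection site and the region of interest
    (49_000, 52_000),  # carries the detection site only
    (52_500, 80_000),  # carries the region of interest only, so it is never sorted
    (0, 20_000),       # carries neither
]

positive = sort_positive_droplets(fragments, detection_site)
captured = [frag for frag in positive if contains(frag, region_of_interest)]

print(f"{len(positive)} positive droplets, {len(captured)} also cover the region of interest")
```

In this toy model, longer input fragments make it more likely that a positive droplet also covers the region of interest, which is why the workflow starts from high molecular weight DNA.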

Can the Xdrop technology be used to analyse disease-related repeat-expansion?

Over 30 neurological disorders are known to be caused by repeat expansions. Tandem repeats vary from 1 kb to hundreds of kb. The number of repeats determines the phenotype, including the severity and age of onset, so an accurate repeat number provides crucial information to patients.

PCR-based methods and Southern blots, the methods commonly used for repeat analysis, cannot provide an accurate repeat count or detect interruptions in the repeats, but long-read sequencing can.
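As an illustration of what a long read makes possible, the Python sketch below counts the motif copies in a single read that spans the whole repeat and lists any interrupting units. It is a simplified assumption of how such a count could be done, not the pipeline used by the speakers: the function name `count_repeats`, the example read, and the rule that an interruption is exactly one motif-length unit are all hypothetical, and real reads would also need error handling.

```python
import re
from typing import List, Tuple


def count_repeats(read: str, motif: str = "CGG") -> Tuple[int, List[Tuple[int, str]]]:
    """Return (motif copies, interruptions) for the longest repeat tract in `read`.

    A tract is a run of motif copies in which single motif-length units that
    differ from the motif are tolerated, as long as motif copies follow them.
    Interruptions are reported as (unit index within the tract, unit sequence).
    """
    k = len(motif)
    unit = re.escape(motif)
    tract_re = re.compile(f"(?:{unit})+(?:[ACGT]{{{k}}}(?:{unit})+)*")
    tracts = [m.group(0) for m in tract_re.finditer(read.upper())]
    if not tracts:
        return 0, []
    tract = max(tracts, key=len)
    units = [tract[i:i + k] for i in range(0, len(tract), k)]
    copies = sum(u == motif for u in units)
    interruptions = [(i, u) for i, u in enumerate(units) if u != motif]
    return copies, interruptions


# Hypothetical read: flank, 10 CGG copies, one AGG interruption, 9 CGG copies, flank.
read = "AAAA" + "CGG" * 10 + "AGG" + "CGG" * 9 + "TTTT"
copies, interruptions = count_repeats(read)
print(copies, interruptions)
```

A sizing method that only measures fragment length would report this allele as roughly 20 units long without revealing the interrupting unit, which is the information a read-level count adds.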

Using the Xdrop workflow, the team at the University of Verona investigated the genes involved in Fragile X and Myotonic Dystrophy. In Fragile X, the FMR1 triplet repeats can reach up to 3 kb, whilst in Myotonic Dystrophy the DMPK repeats can reach 10 kb in length.

As before, a primer was designed for an amplification sequence a few kb from the repeats in each case. After the procedure, the repeat region was enriched up to 500 times, with enrichment extending up to 40 kb from the target, and the number of repeats in the FMR1 gene was determined to be 35.
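The webinar did not spell out how the enrichment figure was calculated. One common way to express fold enrichment, assumed here purely for illustration, is the observed fraction of sequenced bases that land on the target divided by the fraction expected from unenriched whole-genome sequencing. The function name and the genome size of 3.1 Gb are assumptions.

```python
def fold_enrichment(on_target_bases: float, total_bases: float,
                    target_len: float, genome_len: float = 3.1e9) -> float:
    """Observed on-target fraction divided by the fraction expected by chance."""
    observed = on_target_bases / total_bases
    expected = target_len / genome_len
    return observed / expected


# Hypothetical run: a 40 kb window receives 20 Mb of a 3.1 Gb sequencing run.
print(fold_enrichment(on_target_bases=20e6, total_bases=3.1e9, target_len=40_000))
```

Under this definition, a 500-fold enrichment of a 40 kb window means that about 0.65% of all sequenced bases fall in that window, compared with about 0.0013% without enrichment.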

In conclusion, the Xdrop technology allows the enrichment of long DNA fragments by up to 500 times, up to 40 kb from the target. The enriched DNA can be sequenced with long or short reads; with long-read sequencing, it can be applied to accurately determine the length of repeat expansions in repeat-related diseases.

Validation of CRISPR in a 100 kb region surrounding the editing site – Peter Mouritzen

Xdrop technology has been shown to sequence regions of the dark genome and to categorise repeat expansions. Another use has been demonstrated using induced pluripotent stem cells (iPSCs) and CRISPR editing.

CRISPR has been used as a gene-editing method for single bases, gene knockouts, and gene knock-ins. There has been much focus on the off-target effects of CRISPR and the gene disruption they can cause, but these are relatively rare. However, unintended on-target edits are also a concern: they can alter the genotype status of your target, and they happen at an unknown frequency. They are likely to be missed using current sequencing methods.

The iPSC Case Study: How can you detect on-target edits?

The normal method for validating correct gene editing involves a detection sequence (~150 bp) in your region of interest, which results in 100x enrichment. However, this does not detect unintended on-target editing. By using indirect sequence capture and placing the detection sequence some distance away from the gene-editing site, these changes can be detected.

To test the Xdrop’s ability to detect on-target edits, 5 human cell lines that had been created with CRISPR as an Alzheimer’s Disease model were investigated. These cell lines had one or two SNPs in the ApoE gene which have been implicated in increased AD risk. A detection sequence was designed to map some distance from the gene-editing site to pull out long fragments. As the Xdrop workflow is PCR-free, all fragments can be covered, even GC-rich or other difficult-to-amplify regions.

By following the Xdrop workflow, a 100kb region around the detection site was sequenced with good coverage. This showed that some of the reads around the ApoE gene were out of phase by a few kb – detailed analysis showed that a 3.4kb vector fragment used in originally creating the cell lines had been inserted. Xdrop showed 2 out of the 5 cell lines had this insertion which wasn’t picked up in the initial validation.

So how did unintended editing happen?

The vector fragment was introduced early during CRISPR editing and was not detected because the original validation used a 227 bp amplicon spanning the site of the 2 SNPs. With a 3.4 kb insertion inside this amplicon, the allele becomes a very long template that is not amplified by the PCR and therefore goes undetected. This masks the unintended edit.
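The arithmetic behind this allele drop-out can be sketched in a few lines of Python. The 227 bp amplicon and the 3.4 kb insertion come from the case study; the 1,000 bp cut-off for what a short-amplicon PCR will still amplify is a hypothetical value chosen only to illustrate the effect, as are the allele labels.

```python
AMPLICON_BP = 227    # validation amplicon over the SNP site (from the case study)
INSERTION_BP = 3400  # vector fragment inserted on one allele (from the case study)


def visible_alleles(alleles: dict, max_product_bp: int = 1000) -> dict:
    """Return the alleles whose PCR product is short enough to be amplified.

    `max_product_bp` is a hypothetical limit, not a measured value.
    """
    return {name: size for name, size in alleles.items() if size <= max_product_bp}


alleles = {
    "allele_without_insertion": AMPLICON_BP,
    "allele_with_insertion": AMPLICON_BP + INSERTION_BP,  # 3,627 bp
}

print(visible_alleles(alleles))
```

Only the 227 bp product is seen, so the validation appears clean even though the second allele carries the insertion.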

In summary, 2 out of the 5 iPSC cell lines showed unintended on-target CRISPR editing, with SNPs on one of two alleles, and rearrangement on one of two alleles caused by vector DNA integration. Standard validation methods failed to detect this, but the Xdrop indirect sequence capture enabled the detection. For more information check out the paper here: Blondal et al. 2020 bioRxiv (preprint).

In conclusion, the Xdrop has been shown to be clinically relevant for triplet repeat diseases, investigating the dark genome, and validating CRISPR edits, whilst using a novel approach of droplets and indirect sequence capture.

Samplix Grant – Cristina Gamba

The final takeaway from the webinar was the announcement of the Samplix Grant. There are 3 grants worth 5,000 euros each; the prize is a Samplix Xdrop and Sequencing Services pilot project. Send your samples to Samplix and they will be sequenced for you.

Dates and more information can be found at

Requirements – 750 words max. Provide an overview with a clear aim of the project, as well as a description of the region of interest and associated challenges. The project encompasses the full Xdrop Enrichment and Sequencing workflow i.e. from target enrichment to sequencing on a long or short read platform.