Complex brain disorders are highly heritable and largely polygenic. They arise from the combined effect of many risk variants, most of which lie in non-coding regions near regulatory elements. A recent review in Nature discusses the use of computational genetics to model genetic risk in a cell-type-specific and patient-specific manner.
Neuropsychiatric and neurodevelopmental disorders are highly heritable and often arise from a complex polygenic risk architecture. The majority of disease-associated gene loci exist in non-coding regions of the genome. Candidate risk loci within these non-coding regions are often regulatory elements, such as enhancers and promoters. These can influence the transcription of specific genes.
However, although regulatory elements are known drivers of disease-related symptoms, their functional characterisation remains largely incomplete. Computational genetics strategies can be used to conduct large-scale validation and unbiased identification of disease-associated risk loci in a cell-type-specific and genotype-dependent manner.
Advances in Computational Genetics
Human induced pluripotent stem cells (hiPSCs) can be used as a unique platform to model cell-type-specific risk for psychiatric disorders. When applied to these models, genomic approaches can uncover the biological relevance of genetic variants and predict their phenotypic influence.
For example, large-scale genome-wide association studies (GWAS) take advantage of linkage disequilibrium (the non-random coinheritance of genetic variants) to identify thousands of single nucleotide polymorphisms (SNPs) associated with psychiatric disorders. These data then require further computational prioritisation and the functional validation of SNPs en masse.
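To make linkage disequilibrium concrete, the sketch below computes r², a standard pairwise measure of it, for two biallelic SNPs. It assumes phased haplotypes coded as 0/1 pairs; the function name `ld_r_squared` and the toy data are illustrative and do not come from the review.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp_a, allele_at_snp_b) pairs,
    each allele coded 0 or 1 (an assumed, simplified input format).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # r^2 is undefined for a monomorphic SNP; 0.0 is a convention chosen here
    return d * d / denom


# Toy example: the two SNPs are always inherited together.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
print(ld_r_squared(linked))
```

A value near 1 means that one SNP almost perfectly predicts the other, which is why a GWAS signal often tags a block of correlated variants rather than a single causal one.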
Progress in single-cell RNA sequencing technologies has allowed the mapping of genetic risk loci to specific brain cell types. Such analyses have uncovered that genetic risk for Alzheimer’s disease and Multiple Sclerosis is enriched in genes expressed by microglia. In contrast, genetic risk for Schizophrenia and Autism Spectrum Disorder was found to be shared mainly between interneurons and pyramidal neurons. This cell-type-specific genetic risk indicates a key biological role for these cell types in the aetiology of these disorders, which can aid diagnosis and the identification of novel therapeutic targets.
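One simple way to ask whether risk genes are enriched among genes expressed by a cell type is a one-sided hypergeometric test. The sketch below is a deliberately simplified stand-in, not the method used in the studies summarised here; the function name, arguments and numbers are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_cell_type, n_risk, n_overlap):
    """P(overlap >= n_overlap) under random sampling without replacement.

    n_background: all genes considered
    n_cell_type:  genes marked as expressed by the cell type
    n_risk:       disease risk genes (a subset of the background)
    n_overlap:    risk genes that are also cell-type genes
    """
    total = comb(n_background, n_risk)
    upper = min(n_cell_type, n_risk)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_cell_type, k) * comb(n_background - n_cell_type, n_risk - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, 500 cell-type genes, 100 risk genes, 12 shared.
print(hypergeom_enrichment_p(20000, 500, 100, 12))
```

A small p-value suggests that the overlap is larger than expected by chance. Published analyses typically also account for gene length, linkage disequilibrium and expression specificity, which this sketch ignores.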
Computational Genetics: Parallel Screening Techniques
The use of human-derived cell populations provides a rich and heterogeneous background for genetic screens. It also provides the unique ability to model susceptibility for psychiatric disorders in a donor-dependent and cell-type-specific manner.
The CRISPR-Cas9 toolbox can be used to induce point mutations, or to activate or repress transcription, at specific sites in the genome. This allows the manipulation of gene expression without completely knocking a gene in or out, better reproducing the influence that a disease-associated SNP may have on transcription. CRISPR editing can link GWAS-associated variants to genes and to phenotypes. However, this is only feasible for a small number of top predicted SNPs.
In comparison, massively parallel reporter assays (MPRAs) can be used to rapidly test nucleotide variation within regulatory regions. This technology can also identify and characterise shifts in transcriptional activity associated with these variations, which may indicate disease causation. These high-throughput assays enable the en masse screening of millions of nucleotide variants within thousands of sequences. The findings of these assays can then be validated using CRISPR-Cas9 guided allelic replacement.
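As a rough illustration of how a shift in transcriptional activity might be quantified, the sketch below scores each allele by the log2 ratio of its RNA reads to its DNA (plasmid) reads and takes the difference between alleles. The function names, the pseudocount and the counts are assumptions made for illustration, not details from the review.

```python
from math import log2


def mpra_activity(rna_count, dna_count, pseudocount=1.0):
    """log2(RNA/DNA) for one tested sequence; the pseudocount avoids division by zero."""
    return log2((rna_count + pseudocount) / (dna_count + pseudocount))


def allelic_skew(ref_counts, alt_counts, pseudocount=1.0):
    """Difference in activity (alt minus ref); each argument is an (rna, dna) pair."""
    ref = mpra_activity(ref_counts[0], ref_counts[1], pseudocount)
    alt = mpra_activity(alt_counts[0], alt_counts[1], pseudocount)
    return alt - ref


# Hypothetical read counts for the two alleles of one SNP.
print(allelic_skew(ref_counts=(300, 100), alt_counts=(700, 100)))
```

A positive value means that the alternative allele drives more reporter expression than the reference allele. Real analyses would additionally normalise for sequencing depth, aggregate over barcodes and replicates, and test for statistical significance.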
Challenges of Validating Non-Coding Regions
Whilst there have been rapid developments in computational methods used to predict genetic contributions to disease, it remains difficult to link loci, SNP variants and gene expression to phenotypic variability. Non-coding regions are difficult to screen as they give rise to an exhaustive number of potential variants. Also, single-nucleotide mutations often do not produce a detectable phenotype.
Additionally, regulatory elements can affect gene expression in different ways, depending on the individual’s genetic makeup. Thus, candidate genes must be validated in an appropriate genetic context.
Limitations of Computational Genetics
Unfortunately, one major caveat of MPRA data is that they fail to reproduce the structural context of the endogenous location. The 3D structure of chromatin, transcription-associated domains and other regulatory sites can all impact the activity of regulatory elements. However, by combining MPRA data with other datasets, such as Hi-C (a chromatin conformation capture technique) or RNA-seq data, we can attempt to identify regulatory elements in their endogenous context.
A further limitation is that most MPRAs contain DNA fragments between 145 and 170 base pairs in length. This range may not encompass the boundaries of all regulatory elements. However, recent methodological improvements have been made to address this constraint.
Conclusion
The sheer number of regulatory regions in the genome hinders the functional characterisation and validation of all predicted enhancer-gene connections. However, the recent successful application of massively parallel techniques, such as CRISPR-based screens and MPRAs, has expanded the realm of possibility for mapping and predicting the human ‘regulome’.
These massively parallel sequencing techniques have already been applied in cancer cell lines, neural cells and neural stem cells. Their application to hiPSC-based neuronal models could in future provide a cell-type-specific catalogue of the human regulatory architecture that underlies psychiatric disorders.