Numbers
The 1000 Genomes Project: The 1000 Genomes Project (1KGP) launched in January 2008 with the aim of generating the most detailed catalogue of common human genetic variation using whole-genome sequencing. The international project finished in 2015 and reconstructed the genomes of 2,504 individuals from 26 populations.
The 100,000 Genomes Project: The 100,000 Genomes Project (100KGP) is a UK Government project that launched in 2013 and finished in 2018. The project involved sequencing whole genomes of NHS patients, focussing on rare diseases, some common cancer types and infectious diseases.
A
Aneuploidy: Aneuploidy refers to the presence of an abnormal number of chromosomes in a cell. For example, a typical human cell contains 46 chromosomes whereas the cells of patients with Down syndrome contain 47 chromosomes.
Autosomes: An autosome refers to any of the numbered chromosomes, as opposed to the sex chromosomes. In humans, there are 22 pairs of autosomes and one pair of sex chromosomes (X and Y).
B
Base pair (bp): A base pair is a unit of double-stranded nucleic acids. It consists of two nucleobases bound to each other by hydrogen bonds.
Base editing: Base editing is a genome editing technique that directly introduces precise point mutations into DNA or RNA in living cells.
BRCA genes: BRCA is an abbreviation for BReast CAncer. The BRCA1 and BRCA2 genes are tumour suppressor genes involved in DNA repair. Mutations affecting these genes lead to a higher risk of developing breast and/or ovarian cancer.
C
Candidate gene: A candidate gene is a gene located within a chromosomal region that is suspected of being involved in a given trait or function.
CAR T-cell therapy: Chimeric Antigen Receptor (CAR) T-cell therapy is a form of immunotherapy in which a patient’s T cells are engineered to express, on their surface, a CAR specific for a tumour antigen. Following expansion, these cells are re-infused into the patient to kill tumour cells that carry the specific antigen.
CFTR gene: The cystic fibrosis transmembrane conductance regulator (CFTR) gene encodes an ATP-binding cassette (ABC) transporter-class ion channel. This channel conducts chloride ions across the membranes of epithelial cells that produce mucus, sweat, saliva, tears and digestive enzymes. Mutations of the CFTR gene result in cystic fibrosis, which leads to thickened mucus in the lungs and frequent respiratory infections.
Chimera: A chimera is an organism that contains cells or tissue with genetically distinct compositions.
ChIP: Chromatin Immunoprecipitation (ChIP) is a type of immunoprecipitation experimental technique. It investigates the interaction between proteins and DNA in the cell.
ChIP-seq: Chromatin Immunoprecipitation (ChIP)-Sequencing is a method used to analyse protein interactions with DNA. It combines ChIP with massively parallel sequencing to identify binding sites of DNA-associated proteins.
Chromatin: Chromatin is a macromolecular complex of DNA and proteins that forms chromosomes within the nucleus. Histones are the major proteins in chromatin, enabling the DNA to be packaged into a compact form.
Contig: A contig is a group of cloned pieces of DNA representing overlapping regions of a particular chromosome.
CRISPR-Cas9: CRISPR stands for clustered regularly interspaced short palindromic repeats and Cas9 stands for CRISPR-associated protein 9. CRISPR-Cas9 is a gene-editing technology that was adapted from a naturally occurring adaptive immune system in bacteria. This Nobel Prize-winning technology has generated a lot of excitement due to its ability to edit the genome more accurately and efficiently.
D
Direct-to-consumer testing: Direct-to-consumer (DTC) testing is a method of enabling consumers to access genetic tests without the involvement of a healthcare provider. These tests are often presented with the aim of empowering customers.
E
EGFR gene: The Epidermal Growth Factor Receptor (EGFR) gene encodes a transmembrane receptor tyrosine kinase that is expressed on the surface of cells. It is most commonly found on skin cells but can be found elsewhere in the body. Mutations in the EGFR gene can result in various cancers, including lung cancer.
Epigenetics: Epigenetics is the study of heritable phenotype changes that do not involve direct alterations to the DNA sequence. Epigenetic alterations, such as DNA methylation and histone acetylation, can alter gene expression.
Epistasis: Epistasis is a phenomenon whereby the effects of one gene are modified by one or several other genes.
eQTL: Expression quantitative trait loci (eQTL) are genomic regions that explain a fraction of the genetic variation in expression levels of mRNAs.
F
Fluorescence In Situ Hybridization (FISH): FISH is a laboratory cytogenetic technique that uses fluorescent probes to detect specific DNA sequences. The technique is based on the complementary nature of DNA or DNA/RNA double strands.
G
Gene drives: A gene drive is a natural process, and also a genetic engineering technology, that propagates a particular suite of genes throughout a population.
Gene therapy: Gene therapy is an experimental technique that involves introducing genetic material into cells to compensate for abnormal genes or to make a beneficial protein.
Genetic/genomic counselling: Genetic counsellors work directly with patients and their families to provide genetic information to support them, allowing them to make informed choices.
Genetic engineering: Genetic engineering is the direct manipulation of an organism’s DNA using biotechnology.
Genome-wide Association Studies (GWAS): GWAS is a study design used in genetic research to detect associations between genetic regions and particular traits. This method is hypothesis-free.
GenOMICC Study: The GenOMICC study is an open-source research study that aims to engage clinicians and scientists across the world to help understand the genetic factors that impact outcomes in COVID-19 illness.
H
Haplotype: A haplotype is a combination of alleles that are inherited together in an organism from a single parent.
HapMap: The International HapMap Project, launched in 2002, aimed to develop a haplotype map of the human genome in order to describe common patterns of human genetic variation.
Histone: A histone is a protein that associates with DNA in the nucleus to provide structural support to chromosomes. Long DNA molecules wrap around complexes of histone protein, giving the chromosome a more compact shape.
Horizontal gene transfer: Horizontal gene transfer is the exchange of genetic material between two different organisms. This typically occurs amongst prokaryotes to exchange beneficial functionalities.
The Human Cell Atlas: The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles.
The Human Genome Project: The Human Genome Project was a thirteen-year international research effort to determine the entire DNA sequence of the human genome.
I
Imputation: Imputation is a statistical process used to replace data that are missing from a dataset. Researchers use imputation to improve the accuracy of their datasets. In genetics, genotype imputation is often used to predict genotypes that are not directly assayed in a sample.
In Situ Sequencing: In Situ Sequencing is a new method that allows mRNA to be sequenced directly in a section of fixed tissue or cell sample.
J
K
Karyotype: A karyotype is an individual’s collection of chromosomes. It describes the number and appearance of chromosomes, often visualised under a light microscope.
Knockout: A genetic knockout (KO) is a genetic technique in which one or more of an organism’s genes are made inoperative. Gene KO models are widely used to study the function of genes.
L
Linkage disequilibrium (LD): Linkage disequilibrium refers to the non-random association of alleles at different loci in a given population. LD is influenced by many factors, including selection, rate of genetic recombination, mutation rate, genetic drift, mating, population structure and genetic linkage.
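As an illustration of how LD between two loci is commonly quantified, the sketch below computes the coefficient D (observed haplotype frequency minus the frequency expected under independence) and the squared correlation r² from haplotype counts. It assumes two biallelic loci with alleles A/a and B/b and phased haplotype counts; the function name and the example counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    Assumes phased haplotypes; the four arguments are the counts of
    haplotypes carrying each combination of alleles.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")

    p_AB = n_AB / total            # observed frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B

    # D is the deviation from the frequency expected if A and B were independent.
    D = p_AB - p_A * p_B

    # r^2 scales D^2 by the allele frequencies; undefined if a locus is monomorphic.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = D * D / denom if denom > 0 else float("nan")
    return D, r_squared


# Hypothetical counts: alleles A and B always travel together (complete LD).
print(linkage_disequilibrium(50, 0, 0, 50))
# Hypothetical counts: all four haplotypes equally common (no LD).
print(linkage_disequilibrium(25, 25, 25, 25))
```

A D of zero means the alleles are associated no more often than chance predicts, while r² ranges from 0 to 1.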
Liquid biopsy: Liquid biopsy is a revolutionary technique that involves sampling and analysis of non-solid biological tissue, most commonly blood.
Long-read sequencing: Long-read, or third-generation, sequencing is a DNA sequencing technique that involves reading sequences of between 10,000 and 100,000 bp in a single read. Prominent companies include Oxford Nanopore and PacBio.
M
Messenger RNA (mRNA): Messenger RNA is a single-stranded molecule of RNA that is complementary to one of the DNA strands of a gene. mRNA plays a key role in protein synthesis.
Metagenomics: Metagenomics is the study of genetic material that is obtained directly from environmental samples.
Microbiome: The microbiome consists of all the genetic material from the microbes (bacteria, fungi, protozoa and viruses) that live on and within the body.
Multi-omics: Multi-omics is a biological analysis approach that combines datasets from multiple ‘omes’. Such omes include the genome, proteome, transcriptome, epigenome, metabolome and microbiome. This integrated data approach enables us to understand the interrelation and combined influence of these omic levels on health and disease.
N
Next-generation sequencing: Next-generation sequencing, massively parallel sequencing and deep sequencing are terms that describe the DNA sequencing technologies that have revolutionised genomic research. These technologies are characterised by being highly scalable, allowing the entire genome to be sequenced at once.
Nanopore sequencing: Nanopore sequencing is a third-generation sequencing approach that involves direct, real-time analysis of long, single molecules of DNA or RNA. It works by monitoring the changes in electrical current as nucleic acids are passed through a protein nanopore.
O
Ontology: Gene ontology (GO) is a major bioinformatic initiative aimed at unifying the representation of gene and gene product attributes across all species. It involves maintaining a controlled vocabulary and annotating genes and gene products with terms from it.
P
p53 gene: The p53 gene, TP53, is a tumour suppressor gene that regulates the cell cycle. It is often referred to as the guardian of the genome and is the most frequently mutated gene in human cancer.
Penetrance: Penetrance is the measure of the proportion of individuals in a population with a particular variant of a gene that also express the related trait.
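Because penetrance is a proportion, it can be estimated directly from counts. The sketch below is a minimal illustration; the function name and the example numbers are hypothetical and are not taken from any study.

```python
def penetrance(carriers_with_trait, total_carriers):
    """Estimate penetrance as the fraction of variant carriers who show the trait."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= carriers_with_trait <= total_carriers:
        raise ValueError("carriers_with_trait must be between 0 and total_carriers")
    return carriers_with_trait / total_carriers


# Hypothetical example: 80 of 100 carriers of a variant express the trait.
print(penetrance(80, 100))
```

A value of 1.0 corresponds to complete penetrance, and anything lower to incomplete (reduced) penetrance.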
Pharmacogenomics: Pharmacogenomics (PGx) is the study of how an individual’s genes can affect their response to drugs. For example, variations in the VKORC1 gene affect people’s response to warfarin treatment.
Pleiotropy: Pleiotropy is a phenomenon by which one gene influences two or more seemingly unrelated phenotypic traits.
Polygenic Risk Score (PRS): A polygenic risk score is an estimate of an individual’s genetic liability to a trait or disease. It is calculated based on their genotype profile and relevant GWAS data.
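In its simplest form, the calculation is a weighted sum: each variant's effect-allele dosage (0, 1 or 2 copies) is multiplied by its GWAS effect size, and the products are added up. The sketch below assumes this basic additive model only; real pipelines typically add steps such as variant selection and adjustment for LD, which are omitted here. The variant identifiers and weights are made up for illustration.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of effect-allele dosages weighted by per-variant GWAS effect sizes.

    dosages:      dict mapping variant ID -> number of effect alleles (0, 1 or 2)
    effect_sizes: dict mapping variant ID -> effect size (e.g. a log odds ratio)
    Variants missing from the individual's genotype profile are skipped.
    """
    return sum(
        weight * dosages[variant]
        for variant, weight in effect_sizes.items()
        if variant in dosages
    )


# Hypothetical GWAS weights and one individual's genotype profile.
weights = {"variant_1": 0.2, "variant_2": -0.1, "variant_3": 0.05}
genotype = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
print(polygenic_risk_score(genotype, weights))
```

The raw score is usually only meaningful relative to the distribution of scores in a reference population.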
Polymerase chain reaction (PCR): Polymerase chain reaction is a widely used method that amplifies small segments of DNA – making millions of copies of a specific DNA region.
Prophylactic: Prophylactic therapy or treatment is used to defend or protect from the spread or occurrence of disease or infection.
Q
R
Rare disease: A rare disease is any disease that affects a small percentage of the population. There is no single, widely accepted definition for rare diseases.
Reference genome: A reference genome is a digital nucleic acid sequence database that was assembled by scientists as a representative example of the set of genes in an idealised individual of a species. The human reference genome is derived from a number of people. The current human reference genome build is GRCh38.
S
Sanger sequencing: Sanger sequencing is a method of DNA sequencing based on the process of selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication.
Shotgun sequencing: Shotgun sequencing is a method for sequencing random strands of DNA. The process involves randomly breaking up DNA into small fragments that are individually sequenced and then reassembled by regions of overlap.
Short-read sequencing: Short-read (or next-generation) sequencing involves sequencing DNA fragments of 50-500 bp in length. The dominant player in this field is Illumina, whose platforms use bridge amplification.
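The reassembly step can be sketched with a toy greedy algorithm that repeatedly merges the two fragments sharing the longest suffix-to-prefix overlap. This is a simplified illustration under strong assumptions (error-free reads, a single strand, no repeats); it is not how production assemblers work, and the function names and example fragments are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge fragments by their largest overlaps until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        # Find the ordered pair (i, j) with the longest overlap.
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining fragments cannot be joined
        # Join the pair, keeping the overlapping bases only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments covering a short made-up sequence.
reads = ["TACGTTAG", "ATGCGT", "GCGTACG"]
print(greedy_assemble(reads))
```

Fragments that share no sufficiently long overlap are returned as separate sequences rather than being forced together.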
Single cell sequencing: Single cell sequencing harnesses next-generation sequencing technologies to examine the sequence information from individual cells. This approach can identify complex and rare cell populations, uncover regulatory relationships between genes and track trajectories of distinct cell lineages.
Single Nucleotide Polymorphisms (SNPs): A single-nucleotide polymorphism is a DNA sequence variation that involves a substitution of a single nucleotide at a specific position in the genome. SNPs are the most common type of genetic variant among individuals.
Spatial transcriptomics: Spatial transcriptomics is an in situ capturing technique that enables scientists to measure all the gene activity within a tissue sample and map where activity is occurring.
Synthetic biology: Synthetic biology is an interdisciplinary field of science that combines biology and engineering to design and construct new biological parts, devices and systems, or to redesign existing ones.
T
Topologically associating domain (TAD): Topologically associating domains are fundamental building blocks of 3D nuclear organisation. The boundaries of TADs regulate gene expression.
Telomeres: Telomeres are non-coding, repetitive sequences located at the ends of chromosomes. They protect the genome from nucleolytic degradation and are related to the ageing process and cancer.
T2T Consortium: The Telomere-to-Telomere consortium is an open, community-based effort to generate the first complete assembly of the human genome.
Transgenic: Transgenic, or genetically modified, organisms contain an exogenous or modified gene (transgene) from another species that has been introduced via artificial means.
U
UK Biobank: Launched in 2006, the UK Biobank is a large long-term biobank study which is exploring the respective contributions of genetic predisposition and environmental exposures to the development of disease. The database is regularly augmented with additional data.
V
Variant calling: Variant calling is the process by which genetic variants are identified from sequence data. A Variant Call Format (VCF) file is the usual output of this process.
Variants of uncertain (or unknown) significance: A variant of uncertain (or unknown) significance (VUS) is a genetic variant identified through genetic testing whose significance to the function or health of an organism is not currently known. VUS is one of the classification categories in the ACMG/AMP guidelines for interpreting sequence variants.
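To give a sense of what this output looks like, the sketch below reads the eight fixed, tab-separated columns of VCF data lines (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and skips header lines beginning with `#`. It is a minimal illustration that ignores per-sample genotype columns and does not decode the INFO field; the function names and the example records are made up, and dedicated libraries are normally used for real files.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),                 # 1-based position on the chromosome
        "id": var_id,
        "ref": ref,                      # reference allele
        "alt": alt.split(","),           # one or more alternate alleles
        "qual": None if qual == "." else float(qual),  # "." means missing
        "filter": filt,
        "info": info,                    # left as raw text in this sketch
    }


def read_variants(lines):
    """Return parsed records for every non-header, non-empty line."""
    return [
        parse_vcf_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30",
    "chr2\t67890\t.\tG\tT,C\t.\tPASS\tDP=12",
]
for variant in read_variants(example):
    print(variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
```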
Vector: A vector is any vehicle, often a virus or a plasmid, that is used to carry a desired DNA sequence into the cells of a recipient.
W
Whole exome sequencing: Whole exome sequencing (WES) is a genomic sequencing technique for investigating all of the protein-coding regions of the genome.
Whole genome sequencing: Whole genome sequencing (WGS) is a comprehensive sequencing method for analysing the entire genome.
X
X-inactivation: X-inactivation, also called lyonisation, is the process by which one of the copies of the X chromosome is transcriptionally silenced in female mammals to compensate for chromosome dosage.