Numbers
The 1000 Genomes Project: The 1000 Genomes Project (1KGP) launched in January 2008 with the aim of generating the most detailed catalogue of common human genetic variation using whole-genome sequencing. The international project finished in 2015 and reconstructed the genomes of 2,504 individuals from 26 populations.
The 100,000 Genomes Project: The 100,000 Genomes Project (100KGP) is a UK Government project that launched in 2013 and finished in 2018. The project involved sequencing whole genomes of NHS patients, focussing on rare diseases, some common cancer types and infectious diseases.
A
Amniocentesis: A prenatal diagnostic procedure in which a small amount of amniotic fluid is extracted from the amniotic sac surrounding the developing fetus to obtain information about genetic conditions, chromosomal abnormalities and fetal well-being.
Antimicrobial resistance: The ability of microorganisms, such as bacteria, viruses and fungi, to resist the effects of antimicrobial drugs, making infections more difficult to treat.
Aneuploidy: Aneuploidy refers to the presence of an abnormal number of chromosomes in a cell. For example, a typical human cell contains 46 chromosomes whereas the cells of people with Down syndrome contain 47 chromosomes.
Autosomes: An autosome refers to any of the numbered chromosomes, as opposed to the sex chromosomes. In humans, there are 22 pairs of autosomes and one pair of sex chromosomes (X and Y).
B
Base pair (bp): A base pair is a unit of double-stranded nucleic acids. It consists of two nucleobases bound to each other by hydrogen bonds.
Base editing: Base editing is a genome editing technique that introduces precise point mutations directly into DNA or RNA in living cells.
Biodata: Information or data relating to biological organisms or systems, including genomic, transcriptomic, proteomic, phenotypic, and environmental data.
Bioinformatics: A field of science that combines biology, computer science and statistics to analyse and interpret biological data, particularly genomic and molecular data.
Biomarker: A measurable biological characteristic or molecule that indicates the presence, progression or response to a disease or medical condition.
BRCA genes: BRCA is an abbreviation for BReast CAncer. The BRCA1 and BRCA2 genes are tumour suppressor genes involved in DNA repair. Mutations affecting these genes lead to a higher risk of developing breast and/or ovarian cancer.
C
Candidate gene: A candidate gene is a gene located within a chromosomal region that is suspected of being involved in a given trait or function.
CAR T-cell therapy: Chimeric Antigen Receptor (CAR) T-cell therapy is a form of immunotherapy in which a patient’s T cells are altered to produce, on their surface, a CAR specific for a tumour antigen. Following expansion, these cells are re-infused into the patient to kill tumour cells carrying that antigen.
Circular DNA: DNA molecules that form a closed loop instead of a linear structure, commonly found in bacteria and some viruses.
CFTR gene: The cystic fibrosis transmembrane conductance regulator (CFTR) gene encodes an ATP-binding cassette (ABC) transporter-class ion channel. This channel conducts chloride ions across the membranes of epithelial cells that produce mucus, sweat, saliva, tears and digestive enzymes. Mutations of the CFTR gene result in cystic fibrosis, which leads to thickened mucus in the lungs and frequent respiratory infections.
Chimera: A chimera is an organism that contains cells or tissue with genetically distinct compositions.
ChIP: Chromatin Immunoprecipitation (ChIP) is a type of immunoprecipitation experimental technique. It investigates the interaction between proteins and DNA in the cell.
ChIP-seq: Chromatin Immunoprecipitation (ChIP)-Sequencing is a method used to analyse protein interactions with DNA. It combines ChIP with massively parallel sequencing to identify binding sites of DNA-associated proteins.
Chromatin: Chromatin is a macromolecular complex of DNA and proteins that forms chromosomes within the nucleus. Histones are the major proteins in chromatin, enabling the DNA to be packaged into a compact form.
Contig: A contig is a group of cloned pieces of DNA representing overlapping regions of a particular chromosome.
CRISPR-Cas9: CRISPR stands for clustered regularly interspaced short palindromic repeats and Cas9 stands for CRISPR-associated protein 9. CRISPR-Cas9 is a gene-editing technology that was adapted from a naturally occurring adaptive immune system in bacteria. This Nobel Prize-winning technology has generated a lot of excitement due to its ability to edit the genome more accurately and efficiently.
D
Direct-to-consumer testing: Direct-to-consumer (DTC) testing is a method of enabling consumers to access genetic tests without the involvement of a healthcare provider. These tests are often presented with the aim of empowering customers.
DNA: Deoxyribonucleic acid, a double-stranded molecule that carries the genetic instructions used in the development and functioning of living organisms.
DNA methylation: The addition of a methyl group to DNA, which can regulate gene expression and play a role in various biological processes and diseases.
Dominant negative mutation: A genetic mutation that produces a dysfunctional protein which interferes with the normal functioning of the protein produced by a non-mutated allele.
E
EGFR gene: The Epidermal Growth Factor Receptor (EGFR) gene encodes a transmembrane receptor tyrosine kinase that is expressed on the surface of cells. It is most commonly found on skin cells but can be found elsewhere in the body. Mutations in the EGFR gene can result in various cancers, including lung cancer.
Epigenetics: Epigenetics is the study of heritable phenotype changes that do not involve direct alterations to the DNA sequence. Epigenetic alterations, such as DNA methylation and histone acetylation, can alter gene expression.
Epigenome: The chemical modifications to the DNA and its associated proteins that regulate gene expression, without altering the underlying DNA sequence.
Epistasis: Epistasis is a phenomenon whereby the effects of one gene are modified by that of one or several other genes.
eQTL: Expression quantitative trait loci (eQTL) are genomic regions that explain a fraction of the genetic variation in expression levels of mRNAs.
Extracellular vesicles: Small membrane-bound structures released by cells into the extracellular space, containing various molecules such as proteins, nucleic acids, and lipids, which can play important roles in cell-to-cell communication.
F
Fluorescence In Situ Hybridization (FISH): FISH is a laboratory cytogenetic technique that uses fluorescent probes to detect specific DNA sequences. The technique is based on the complementary nature of DNA or DNA/RNA double strands.
G
Gene drives: A gene drive is a natural process, also harnessed as a genetic engineering technology, that propagates a particular suite of genes throughout a population.
Gene therapy: Gene therapy is an experimental technique that involves introducing genetic material into cells to compensate for abnormal genes or to make a beneficial protein.
Gene splicing: A post-transcriptional modification that involves the exclusion of certain sections of RNA. This process is important in maintaining proteome diversity.
Genetic/genomic counselling: Genetic counsellors work directly with patients and their families to provide genetic information to support them, allowing them to make informed choices.
Genetic engineering: Genetic engineering involves the direct manipulation of an organism’s DNA using biotechnology to modify an organism.
Genome-wide Association Studies (GWAS): GWAS is a study design used in genetic research to detect associations between genetic regions and particular traits. This method is hypothesis-free.
Genomic imprinting: An epigenetic process where certain genes behave differently depending on whether they are inherited from the mother or the father, affecting how they are expressed and function in the body.
GenOMICC Study: The GenOMICC study is an open-source research study that aims to engage clinicians and scientists across the world to help understand the genetic factors that impact outcomes in COVID-19 illness.
H
Haplotype: A haplotype is a combination of alleles that are inherited together in an organism from a single parent.
HapMap: The International HapMap Project, launched in 2002, aimed to develop a haplotype map of the human genome in order to describe common patterns of human genetic variation.
Histone: A histone is a protein that associates with DNA in the nucleus to provide structural support to chromosomes. Long DNA molecules wrap around complexes of histone protein, giving the chromosome a more compact shape.
Horizontal gene transfer: Horizontal gene transfer is the exchange of genetic material between two different organisms. This typically occurs amongst prokaryotes to exchange beneficial functionalities.
The Human Cell Atlas: The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles.
The Human Genome Project: The Human Genome Project was a thirteen-year international research effort to determine the entire DNA sequence of the human genome.
I
Imputation: Imputation is a statistical process used to replace data that are missing from a dataset. Researchers perform imputation in order to improve the accuracy of their datasets. In genetics, genotype imputation is often used to predict genotypes that are not directly assayed in a sample.
In Situ Sequencing: In Situ Sequencing is a new method that allows mRNA to be sequenced directly in a section of fixed tissue or cell sample.
J
K
Karyotype: A karyotype is an individual’s collection of chromosomes. It describes the number and appearance of chromosomes, often visualised under a light microscope.
Knockout: A genetic knockout (KO) is a genetic technique in which an organism’s genes can be made inoperative. Gene KO models are widely used to study the function of genes.
L
Linkage disequilibrium (LD): Linkage disequilibrium refers to the non-random association of alleles at different loci in a given population. LD is influenced by many factors, including selection, rate of genetic recombination, mutation rate, genetic drift, mating, population structure and genetic linkage.
Liquid biopsy: Liquid biopsy is a revolutionary technique that involves sampling and analysis of non-solid biological tissue, most commonly blood.
Long-read sequencing: Long-read, or third-generation, sequencing is a DNA sequencing technique that involves reading sequences of between 10,000 and 100,000 bp in a single read. Prominent companies include Oxford Nanopore and PacBio.
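As a rough illustration of the linkage disequilibrium entry above: LD between two biallelic loci is commonly summarised by the coefficient D and the squared correlation r². The Python sketch below computes both from a haplotype frequency and two allele frequencies. The function name and the example frequencies are hypothetical and chosen only for illustration.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    # D measures how far the observed haplotype frequency departs
    # from the value expected if the two loci were independent.
    d = p_ab - p_a * p_b
    # r^2 rescales D by the allele frequencies so that it lies in [0, 1].
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies, for illustration only.
d, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.5)
```

When the two loci are independent, the haplotype frequency equals the product of the allele frequencies, so both D and r² are zero.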
M
Machine learning: A subfield of artificial intelligence that focuses on the development of algorithms and models that enable computers to learn and make predictions or decisions without being explicitly programmed.
Messenger RNA (mRNA): Messenger RNA is a single-stranded molecule of RNA that is complementary to one of the DNA strands of a gene. mRNA plays a key role in protein synthesis.
Metagenomics: Metagenomics is the study of genetic material that is obtained directly from environmental samples.
Metabolomics: The study of small molecules, known as metabolites, within cells, tissues or organisms, and their roles in metabolism, health and disease.
Metastasis: The spread of cancer cells from the primary site to other parts of the body, forming secondary tumours.
Microbiome: The microbiome consists of all the genetic material from the microbes (bacteria, fungi, protozoa and viruses) that live on and within the body.
Microglia: Specialized immune cells in the central nervous system that play a crucial role in the brain’s immune response and the maintenance of neural health.
microRNA: Small non-coding RNA molecules that play a role in gene regulation by binding to messenger RNA (mRNA) and affecting its stability and translation, influencing various biological processes.
Multi-omics: Multi-omics is a biological analysis approach in which the datasets are multiple ‘omes’. Such omes include genome, proteome, transcriptome, epigenome, metabolome and microbiome. This integrated data approach enables us to understand the interrelation and combined influence of these omic levels on health and disease.
N
Next-generation sequencing: Next-generation sequencing, massively parallel sequencing and deep sequencing are related terms that describe the DNA sequencing technologies that have revolutionised genomic research. These technologies are characterised by being highly scalable, allowing the entire genome to be sequenced at once.
Nanopore sequencing: Nanopore sequencing is a third-generation sequencing approach that involves direct, real-time analysis of long, single molecules of DNA or RNA. It works by monitoring the changes in electrical current as nucleic acids are passed through a protein nanopore.
Neurodegenerative disease: A group of disorders characterised by the progressive degeneration and dysfunction of neurons in the central nervous system, leading to cognitive and motor impairments, for example, Alzheimer’s disease.
O
Ontology: Gene ontology (GO) is a major bioinformatic initiative aimed at unifying the presentation of gene and gene product attributes across all species. It involves controlling vocabulary and annotating gene and gene product attributes.
Organoids: Three-dimensional structures grown in the lab that mimic the structure and function of specific organs or tissues, allowing for the study of diseases and drug testing.
P
p53 gene: The p53 gene, TP53, is a tumour suppressor gene that regulates the cell cycle. It is often referred to as the guardian of the genome and is the most frequently mutated gene in human cancer.
Pangenome: The complete set of genes for a given species, including core genes present in all individuals and accessory genes present in only some individuals or populations.
Penetrance: Penetrance is the measure of the proportion of individuals in a population with a particular variant of a gene that also express the related trait.
Pharmacogenomics: Pharmacogenomics (PGx) is the study of how an individual’s genes can affect their response to drugs. For example, variations in the VKORC1 gene affect people’s response to warfarin treatment.
Pleiotropy: Pleiotropy is a phenomenon by which one gene influences two or more seemingly unrelated phenotypic traits.
Polygenic Risk Score (PRS): A polygenic risk score is an estimate of an individual’s genetic liability to a trait or disease. It is calculated based on their genotype profile and relevant GWAS data.
Polymerase chain reaction (PCR): Polymerase chain reaction is a widely used method that amplifies small segments of DNA – making millions of copies of a specific DNA region.
Post-transcriptional modification: The chemical modifications that occur to RNA molecules after they have been transcribed from DNA, influencing RNA stability, localisation and function.
Precision medicine: An approach to healthcare that tailors medical treatments and interventions to individual patients based on their unique characteristics, such as genetics, environment and lifestyle.
Prime editing: A precise genome editing technique that combines CRISPR-Cas9 technology with a modified reverse transcriptase to directly rewrite specific DNA sequences, allowing for targeted changes to be made in the genome.
Prophylactic: Prophylactic therapy or treatment is used to defend or protect from the spread or occurrence of disease or infection.
Proteomics: The study of the structure, function, and interactions of proteins within a biological system, often focusing on large-scale analysis of protein expression, modification and localisation.
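To make the polygenic risk score entry above more concrete: in its simplest additive form, a PRS is a weighted sum of an individual’s risk-allele counts, with the weights taken from GWAS effect sizes. The Python sketch below assumes that form. The function name, variant identifiers, effect sizes and genotype are all made up, and real pipelines add steps such as variant selection and quality control that are not shown.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum (effect size x risk-allele count) over variants present in both inputs.

    dosages:      variant ID -> number of risk alleles carried (0, 1 or 2)
    effect_sizes: variant ID -> per-allele effect size from a GWAS
    """
    shared = sorted(dosages.keys() & effect_sizes.keys())
    return sum(effect_sizes[v] * dosages[v] for v in shared)


# Hypothetical effect sizes and genotype, for illustration only.
effect_sizes = {"rsA": 0.10, "rsB": -0.05, "rsC": 0.20}
dosages = {"rsA": 2, "rsB": 1, "rsC": 0}
score = polygenic_risk_score(dosages, effect_sizes)
```

Variants missing from either input are skipped rather than treated as zero, which is a design choice of this sketch.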
Q
R
R: A programming language commonly used for statistical computing and graphics, widely used in data analysis and visualisation in fields such as bioinformatics and data science.
RNA: Ribonucleic acid, a single-stranded molecule that plays a crucial role in the coding, decoding, regulation, and expression of genes. RNA is produced during the process known as transcription, and is further translated into protein.
Rare disease: A rare disease is any disease that affects a small percentage of the population. There is no single, widely accepted definition for rare diseases.
Reference genome: A reference genome is a digital nucleic acid sequence database that was assembled by scientists as a representative example of an individual organism’s set of genes. The human reference genome is derived from a number of people. The current reference genome build is GRCh38.
S
Sanger sequencing: Sanger sequencing is a method of DNA sequencing based on the process of selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication.
Shotgun sequencing: Shotgun sequencing is a method for sequencing random strands of DNA. The process involves randomly breaking up DNA into small fragments that are individually sequenced and then reassembled by regions of overlap.
Short-read sequencing: Short-read (or next-generation) sequencing involves sequencing DNA fragments of 50-500 bp in length. The dominant player in this field is Illumina, whose platforms use bridge amplification.
Single cell sequencing: Single cell sequencing harnesses next-generation sequencing technologies to examine the sequence information from individual cells. This approach can identify complex and rare cell populations, uncover regulatory relationships between genes and track trajectories of distinct cell lineages.
Single Nucleotide Polymorphisms (SNPs): A single-nucleotide polymorphism is a DNA sequence variation that involves a substitution of a single nucleotide at a specific position in the genome. They are the most common type of genetic variant among individuals.
Somatic mosaicism: The presence of different genetic variations or mutations within the cells of an individual’s body.
Spatial transcriptomics: Spatial transcriptomics is an in situ capturing technique that enables scientists to measure all the gene activity within a tissue sample and map where activity is occurring.
Spermatogenesis: The process of sperm cell development, involving the production, maturation and differentiation of spermatozoa in the testes.
Stem cells: Undifferentiated cells that have the potential to develop into different cell types in the body, serving as a repair and regeneration system for tissues and organs.
Synthetic biology: Synthetic biology is an interdisciplinary field of science that combines biology and engineering to design and construct new or redesign biological parts, devices and systems.
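The shotgun sequencing entry above describes fragments being reassembled by regions of overlap. The toy Python sketch below shows that idea for a single pair of reads: it finds the longest suffix of one read that matches a prefix of the other and joins them there. The function names, the `min_overlap` parameter and the reads are hypothetical. Real assemblers work on overlap graphs or de Bruijn graphs and tolerate sequencing errors, which this sketch does not.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep all of `left`, then append the part of `right` beyond the overlap.
    return left + right[size:]


# Made-up reads sharing the four-base overlap "GTAC".
merged = merge_reads("ATGCGTAC", "GTACCTGA")
```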
T
Topologically associating domain (TAD): Topologically associating domains are fundamental building blocks of 3D nuclear organisation. The boundaries of TADs regulate gene expression.
Telomeres: Telomeres are non-coding, repetitive sequences located at the ends of chromosomes. They protect the genome from nucleolytic degradation and are related to the ageing process and cancer.
T2T Consortium: The Telomere-to-Telomere consortium is an open, community-based effort to generate the first complete assembly of the human genome.
Transcriptome: The complete set of RNA molecules produced by a cell, tissue or organism at a particular time. Transcriptomic studies can provide insights into gene expression and regulation.
Transgenic: Transgenic, or genetically modified, organisms contain an exogenous or modified gene (transgene) from another species that has been introduced via artificial means.
Trinucleotide repeat: A mutation where a sequence of three nucleotides is repeated multiple times. These repeats can occur in various regions of the genome and, when expanded beyond a certain threshold, are associated with certain genetic disorders, such as Huntington’s disease.
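As a small illustration of the trinucleotide repeat entry, the Python sketch below counts the longest uninterrupted run of a three-base unit in a sequence. It uses CAG, the unit expanded in Huntington’s disease, as the default. The function name and the sequence are made up, and the sketch says nothing about clinical thresholds or how repeats are sized in practice.

```python
import re


def longest_repeat_run(sequence, unit="CAG"):
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    # Find every stretch made of one or more back-to-back copies of the unit.
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Made-up sequence containing four consecutive CAG units.
count = longest_repeat_run("TTCAGCAGCAGCAGGA")
```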
U
UK Biobank: Launched in 2006, the UK Biobank is a large long-term biobank study which is exploring the respective contributions of genetic predisposition and environmental exposures. The database is regularly augmented with additional data.
V
Variant calling: Variant calling is the process by which genetic variants are identified from sequence data. A Variant Call Format (VCF) file is the usual output of this process.
Variants of uncertain (or unknown) significance: A variant of uncertain (or unknown) significance (VUS) is a genetic variant identified through genetic testing whose significance to the function or health of an organism is not currently known. The term is one of the classification categories in the ACMG/AMP guidelines for interpreting sequence variants.
Vector: A vector is any vehicle, often a virus or a plasmid, that is used to carry a desired DNA sequence into the cells of a recipient.
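To illustrate the VCF output mentioned in the variant calling entry above: each data line in a VCF file is tab-separated and begins with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The Python sketch below splits one such line into a dictionary. The function name and the example record are made up, and a dedicated VCF parser would normally be preferable for real files, which also carry header lines and optional per-sample columns.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Split one tab-separated VCF data line into its eight fixed fields."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # positions are 1-based integers
    record["ALT"] = record["ALT"].split(",")  # one or more alternate alleles
    return record


# Made-up record, for illustration only.
line = "chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=30"
record = parse_vcf_line(line)
```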
W
Whole exome sequencing: Whole exome sequencing (WES) is a genomic sequencing technique for investigating all of the protein-coding regions of the genome.
Whole genome sequencing: Whole genome sequencing (WGS) is a comprehensive sequencing method for analysing the entire genome.
Wild type: The ‘standard’ or reference form of a gene, organism or characteristic, typically used to compare and understand variations or mutations.
X
X-inactivation: X-inactivation, also called lyonisation, is the process by which one of the copies of the X chromosome is transcriptionally silenced in female mammals to compensate for chromosome dosage.