Multi-omics combines different “omics” layers – the genome, epigenome, transcriptome and proteome. Studying each layer in isolation can only colour in part of the picture. By bringing these different layers of biological insight together, we can begin to paint a more complete picture of human biology and disease.
What -omics technologies are available?
A multi-omics approach involves combining different “omics”: genomics, epigenomics, transcriptomics, and proteomics. Studying these “omics” simultaneously can provide a more accurate, holistic, and representative understanding of the complex molecular mechanisms that underpin biology.
Fundamentally, genomics investigates the structure, function, mapping, evolution and editing of information encoded in our (and other species’) genomes. That includes single nucleotide variants (SNVs), insertions, deletions, copy number variations (CNVs), duplications, inversions… the list goes on.
In the past decade, genomics has allowed us to predict, diagnose and treat diseases in a more unbiased and precise way than we ever could before. And in research, genomics has revealed the genes or mutations involved in thousands of different phenotypes, biological processes and diseases. This has allowed us to identify new biomarkers, new drug targets and so much more.
Epigenomics investigates modifications of DNA or DNA-associated proteins, such as DNA methylation, chromatin interactions and histone modifications. Epigenetic regulation can determine cell fate and function, and the epigenome can change in response to the environment. What’s more, these modifications can be passed on as cells divide.
These changes can act as markers for cancer, metabolic syndromes, cardiovascular disease and more. They can be tissue-specific, cell-specific and even more specific than that – down to sub-cellular compartments – and changes can occur during both healthy and disease states.
Transcriptomics involves investigating RNA transcripts that are produced by the genome and how these transcripts are altered in response to regulatory processes. It’s the bridge between genotype and phenotype – the link between the genes and the proteins. Sandwiched nicely in the middle, it can tell us a lot about our biology.
Often, the most useful insight cannot be obtained by studying genes alone – much more can be learned by looking at proteins too. The proteome is highly dynamic: proteins are modified in response to internal and external cues, and the cell produces different proteins as circumstances change. This is why a proteomic analysis can be described as a ‘snapshot’ of the protein environment at a given time.
Proteomics has evolved over the past decades. This is mostly due to the accumulation of protein and DNA databases, with algorithms for searching through all the information generated, and improvements in technologies, such as mass spectrometry. Today, proteomics is essential for early disease diagnosis and monitoring. It also plays a crucial role in identifying target molecules for drug discovery and is used to understand complex gene functions.
Single-cell and spatial -omics
Single-cell analysis has allowed researchers to study the inner workings of a cell at a never-before-seen resolution and reveal the full complexity of cellular diversity. With spatial omics, researchers can now map the whole genome, epigenome, transcriptome, proteome – and many other “omes” – of hundreds of thousands of cells while preserving morphological and spatial context.
Single-cell techniques started with transcriptomics, but in subsequent years other omics have been added into the mix. In particular, single-cell proteomics has seen the most recent developments in technology and application.
Projects like the Human Cell Atlas have utilised current advances in single-cell analysis to reveal a previously unrecognised heterogeneity of cell types and defined new cell states that are associated with diseases from cancer to liver disease, Alzheimer’s and heart disease.
However, one crucial step in the single-cell analysis workflow is dissociation – breaking down tissues to prepare them for analysis. This breakdown of tissue means the spatial context is lost, and cellular features may change due to stress, cell death or cell aggregation.
This is where spatial omics comes in. Now researchers can see neighbouring cells, noncellular structures, which signals cells may have been exposed to and more. Spatial context also provides more information, allowing researchers to define things such as cellular phenotype, cell state and function. This is why spatial multi-omics was named one of the seven technologies to watch in a 2022 Nature article.
Developments in the spatial omics field have been rapid, skyrocketing past expectations. The development of this technology means that spatial context is preserved, and researchers can now profile cells and tissue in their morphological context and understand the influence of their local environment and surrounding cells.
The multi-omics approach
Multi-omics brings multiple “omics” together to build a clearer, more comprehensive picture of biological processes and disease pathology, and to identify more robust drug targets and biomarkers.
Genomics and transcriptomics
Genomics and transcriptomics can be integrated to prioritise different variants, analyse the function of genes, uncover mechanisms of disease, power drug target identification and fuel biomarker discovery.
Epigenomics and transcriptomics
Epigenomics and transcriptomics can tie gene regulation to gene expression, revealing patterns in the data and helping to decipher complex pathways and disease mechanisms. By studying both the epigenome and the transcriptome, researchers can derive new insights into biological processes and disease pathology.
Genomics, epigenomics and transcriptomics
Combining sequencing data from genomics, epigenomics and transcriptomics can help researchers understand the mechanisms controlling specific phenotypes, uncover new regulatory elements, and identify candidate genes, biomarkers and therapeutic targets.
Genomics and proteomics
The combination of genomics and proteomics can be very effective as it allows the genotype to be linked directly to the phenotype. This approach can elucidate and characterise biological processes, help untangle disease-driving mechanisms and inform the development of therapeutics.
Transcriptomics and proteomics
The combination of transcriptomics and proteomics is powerful as it can tie new discoveries back to known markers and clinical outcomes. This gives insights into how gene expression affects protein function and phenotype.
Data integration and bioinformatics
Data integration is the process of combining different omics datasets, allowing researchers to stack the multiple layers of biological insight together to get the whole picture. Integration is at the core of the multi-omics approach. However, this stage is often cited as the most challenging.
The optimal data integration strategy depends on several factors. The first is the biological question being addressed, which can be broadly split into three categories: disease subtyping, disease insights and biomarker prediction. The second is the data itself: data type, quality, size and resolution all influence how the data should be analysed, interrogated and integrated. The third is the experiment – the organism, and even the tissue type, can affect which tool or package to use.
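At its simplest, integration starts by matching measurements from different omics layers on a shared sample identifier. The sketch below is illustrative only – the sample IDs, feature names and values are made up – and shows one common pattern, sometimes called early or feature-level integration, where layers are joined into a single table before downstream analysis:

```python
import pandas as pd

# Hypothetical toy datasets: rows are samples, columns are omics features.
rna = pd.DataFrame({
    "sample": ["S1", "S2", "S3"],
    "geneA_expr": [5.1, 2.3, 7.8],
    "geneB_expr": [1.0, 4.2, 3.3],
})
prot = pd.DataFrame({
    "sample": ["S2", "S3", "S4"],
    "protA_abund": [0.8, 1.9, 2.5],
})

# Join the layers on the shared sample ID, keeping only samples
# measured in both assays.
merged = rna.merge(prot, on="sample", how="inner")
print(merged.shape)  # (2, 4): samples S2 and S3, three features plus the ID
```

Real multi-omics integration is far harder than this – batch effects, missing samples and very different feature scales all need handling – but the join on a common sample key is usually where it begins.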
Machine learning and artificial intelligence
Machine learning (ML) and artificial intelligence (AI) approaches are becoming increasingly popular. However, ML or AI should not be considered a magic bullet – as with any technique, each has their own limitations and challenges.
Moreover, a lot of these approaches are not even that novel – in fact, the buzzword status of these terms means they are often applied to relatively basic, long-established models such as random forests, first described in the mid-1990s. That said, there is a lot of genuine innovation in the ML/AI space – and some background knowledge may help you identify what is truly fresh and ground-breaking.
ML and AI considerations
Data shift occurs when there is a mismatch between the data an AI or ML model was trained and tested on and the data it encounters in the “real world”. Essentially, the model performs well on its own training and testing data but fails to generalise, because that data is not representative of the data it later encounters.
Even if a training process can produce a model that performs well on the test data, that model can still be flawed. This is because with ML models, the training process can produce many different models that all work on your test data, but these models differ in small, seemingly unimportant ways.
These differences can be attributed to many things, such as the way training data is selected or represented, the number of training runs, and so on.
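Data shift can be illustrated with a small synthetic experiment. In this sketch (toy data, made-up distributions), a classifier is trained on one population, then evaluated both on data drawn from the same distribution and on data whose feature values have shifted – mimicking, say, a batch effect in a new cohort:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, mean_shift=0.0):
    # Two classes separated along one feature; `mean_shift` moves the
    # whole population, mimicking a systematic shift in new data.
    x0 = rng.normal(loc=0.0 + mean_shift, scale=1.0, size=(n, 1))
    x1 = rng.normal(loc=3.0 + mean_shift, scale=1.0, size=(n, 1))
    X = np.vstack([x0, x1])
    y = np.array([0] * n + [1] * n)
    return X, y

X_train, y_train = make_data(200)
model = LogisticRegression().fit(X_train, y_train)

X_test, y_test = make_data(200)                     # same distribution
X_shift, y_shift = make_data(200, mean_shift=2.0)   # shifted "real world" data

acc_test = model.score(X_test, y_test)
acc_shift = model.score(X_shift, y_shift)
print(acc_test, acc_shift)  # accuracy drops markedly on the shifted data
```

The model itself has not changed – only the incoming data has – yet its accuracy falls, which is exactly the failure mode data shift describes.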
Overfitting vs underfitting
Overfitting is when a statistical or ML model fits too closely to its training data – and as a result, when the model is tested against unseen data, it cannot perform accurately.
Underfitting is the opposite of overfitting. One way to avoid overfitting is to halt training before the model starts memorising noise in the training data – known as “early stopping” – or to reduce the complexity of the model. However, stopping too early, or simplifying too aggressively, may cause the model to miss important patterns, leading to underfitting. An underfit model, like an overfit one, is unable to generalise to new “real world” data.
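The trade-off can be seen by varying model complexity on noisy data. In this illustrative sketch (synthetic data, with model depth standing in for complexity), a very shallow decision tree underfits, while an unlimited-depth tree fits the training set perfectly but scores worse on held-out data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Noisy toy data: the label depends on one feature plus random noise,
# so fitting the training set perfectly means memorising that noise.
X = rng.normal(size=(300, 5))
y = (X[:, 0] + rng.normal(scale=1.0, size=300) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, None):  # underfit, moderate, overfit (unlimited depth)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 2), round(tree.score(X_te, y_te), 2))
```

The unlimited-depth tree reaches 100% training accuracy, yet its test accuracy is lower – the gap between the two scores is the signature of overfitting.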
Data leakage is a major problem in ML when developing predictive models. The goal of a predictive ML model is to make accurate predictions on new, unseen data. When the training data includes information from the data the model is later tested on, the model has effectively already seen the answers, and its predictions appear much better than they really are.
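A deliberately extreme sketch makes the effect concrete. Here the labels are pure noise, so no model can genuinely learn anything – yet if the test samples leak into the training set, a 1-nearest-neighbour classifier simply looks up the answers it has memorised (all data below is synthetic):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200)  # random labels: nothing to learn

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Honest evaluation: test samples were never seen during training.
honest = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print(honest.score(X_te, y_te))  # ~0.5, chance level, as expected

# Leaky evaluation: the test samples were (accidentally) included in training.
X_leak = np.vstack([X_tr, X_te])
y_leak = np.concatenate([y_tr, y_te])
leaky = KNeighborsClassifier(n_neighbors=1).fit(X_leak, y_leak)
print(leaky.score(X_te, y_te))  # 1.0: each test point's nearest neighbour is itself
```

Real leakage is usually subtler – for example, normalising or selecting features using the full dataset before splitting – but the consequence is the same: an optimistic score that will not hold up on genuinely new data.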
Black box models
Some ML and AI models are referred to as “black box models,” where users and researchers know the inputs and the outputs, but do not know how the model actually works. However, if we can’t interpret the model, how can we falsify, test, and reproduce the results?
Interpretable models, or explainable models, instead make clear how the model works. Often these models are also open-source, and all the code is made easily accessible and freely available.
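A linear model is a simple example of an interpretable model: its fitted coefficients can be read off directly, rather than inferred from the outside. In this sketch (synthetic data, hypothetical feature names), only the first feature actually drives the label, and the coefficients reveal exactly that:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy data: only the first of three features determines the label;
# the other two are uninformative noise.
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Unlike a black box, the fitted weights can be inspected directly:
# the informative feature gets a much larger coefficient than the noise.
for name, coef in zip(["feature_0", "feature_1", "feature_2"], model.coef_[0]):
    print(name, round(coef, 2))
```

Inspecting the weights lets a researcher check that the model relies on biologically plausible features – the kind of scrutiny a black box does not allow.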