We are seeing an increased interest in the examination of multiple omic layers, to paint in-depth molecular pictures that can provide insights into the way our “-omes” interact in disease. However, this comes with a unique set of data challenges, including the need for thoughtful study design, data integration, analytics and interpretation.
At the Festival of Genomics and Biodata 2022, we hosted a multi-omics workshop with a host of experts from the space. Below, we share some of the key insights learned from this session.
Opportunities and challenges in different types of multi-omics
Multi-omic analysis can be used in a very targeted manner, to really dig into the details of a biological question for greater understanding. Integration of multiple -omic datasets can reveal interactions that would have otherwise been lost. Equally, the use of multi-omics in more exploratory analysis can provide much greater coverage and can more easily reveal areas for targeted analysis.
Multi-omic data can be integrated in two main ways: horizontally and vertically.
Vertical integration is what you might traditionally think of when it comes to multi-omics. It involves combining different forms of data (transcriptomic, proteomic, genomic) that are acquired from the same split sample.
Horizontal integration is where the same type of -omic data is combined but from different labs, or platforms, or even biological systems.
These two types of integration, while different, often raise the same issues: discrete -omic data types can differ in coverage, in noise and variation, and in resolution.
Due to the processes by which different data types arise, one dataset (e.g. transcriptomics) may have far more nodes than another (e.g. protein-protein interaction data). This doesn’t necessarily mean that one dataset is more important than the other, and the datasets must be normalised accordingly before they can be integrated.
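As a toy illustration of this point (the data, scales and feature counts below are invented), one simple approach is to standardise each layer separately before concatenating them for vertical integration, so the layer with more features or larger values does not dominate:

```python
import numpy as np

# Hypothetical example: two omic layers measured on the same 10 samples,
# with very different scales and feature counts.
rng = np.random.default_rng(0)
transcriptomics = rng.lognormal(mean=5, sigma=2, size=(10, 2000))  # 10 samples x 2000 genes
proteomics = rng.normal(loc=20, scale=4, size=(10, 300))           # 10 samples x 300 proteins

def zscore(layer):
    """Standardise each feature to mean 0, s.d. 1 so no layer dominates."""
    return (layer - layer.mean(axis=0)) / layer.std(axis=0)

# Vertical integration: concatenate the normalised layers along the feature axis.
integrated = np.concatenate([zscore(transcriptomics), zscore(proteomics)], axis=1)
print(integrated.shape)  # (10, 2300)
```

Z-scoring is only one of many normalisation strategies, but it makes the core idea concrete: each layer is put on a comparable scale before the data are combined.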
Another decision is when to integrate the data, known as early versus late integration. Is the data combined at an early stage, perhaps before any specific biological context is applied, or at a late stage, such as combining known pathways into networks?
The Cancer Genome Atlas was cited as a good example of integrating known pathways into larger networks.
Examples of tools for multi-omic analysis
Tamás Korcsmáros, Group Leader at the Earlham Institute, shared with the workshop some useful tools for multi-omic data analysis.
The first was Sherlock, whose features include storing all datasets in redundant, organised cloud storage; converting datasets to common, optimised file formats; and executing analytical queries directly on top of the data files, among others.
Additional examples of multi-layered network resources mentioned were:
- SignaLink (http://signalink.org/)
- ARN (http://autophagyregulation.org/)
- SalmoNet 2.0 (http://salmonet.org/)
SignaLink is an integrated resource to analyse signalling pathway cross-talks, transcription factors, miRNAs and regulatory enzymes.
ARN is an integrated resource to analyse the regulatory network of autophagy proteins.
SalmoNet 2.0 is an integrated network resource containing regulatory, metabolic and protein-protein interactions in Salmonella.
Multi-omic data visualisation tips
An interesting topic was simply how best to visualise the multi-omic networks that you are creating. Dezso Modos, Research Scientist at the Earlham Institute, shared his advice on how to make quality, refined and accessible network visualisations of your data.
The first question to ask in these situations is: is a network visualisation necessary at all? While you could describe all the genes/proteins of interest in a table, a network allows you to “zoom in” and use unsupervised clustering to show relevant biological functions and the relatedness between those genes/proteins.
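As a minimal sketch of that clustering idea (the interactions below are illustrative, not curated), unsupervised community detection can group a network’s genes/proteins into related modules, for example with NetworkX:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical toy interaction network; the gene names are illustrative only.
G = nx.Graph()
G.add_edges_from([
    ("TP53", "MDM2"), ("TP53", "ATM"), ("MDM2", "ATM"),    # one tight module
    ("EGFR", "GRB2"), ("GRB2", "SOS1"), ("EGFR", "SOS1"),  # another tight module
    ("ATM", "EGFR"),                                       # a single cross-talk edge
])

# Unsupervised clustering: maximise modularity to find densely connected modules.
communities = list(greedy_modularity_communities(G))
for i, members in enumerate(communities):
    print(f"module {i}: {sorted(members)}")
```

Colouring or grouping nodes by the modules found this way is often more informative than a flat table, because related genes/proteins end up visually adjacent.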
An important and easily overlooked factor when visualising data with colour is to check your palette against common red/green colour-blindness. Changing the colour of different nodes and edges in a network, rather than its layout, also makes it much easier to compare different subsets of the data.
It was also pointed out that shapes can convey data, and by using a variety of shapes like trapezoids, triangles, and rectangles, one can easily visualise key information at a glance. Another bit of good practice to follow is to always increase the values (font size, node size, edge width) of data of interest, rather than reduce the size of other data.
Managing data and data longevity
The volume of data is increasing at a tremendous rate. As Gemma Holliday, Bioinformatician at the Medicines Discovery Catapult put it, “we are drowning”. Gemma also highlighted how little we know about the data we are collecting. In November 2021, there were >214 million proposed proteins, with over two thirds of these being “predicted”.
If we stopped generating data today and tried to curate it all by hand, with reference to unstructured data from texts and papers, it would take over a decade. Extracting information from these unstructured free texts is crucial for building knowledge on top of the vast amounts of structured data. Methods like natural language processing and AI, which can convert synonymous notations into a single form (e.g., a skeletal formula into a chemical formula), are vital here. Equally, a lot of data is generated but never published: negative results, unpublished results, PhD/masters theses, non-electronic papers and so on. These may be overlooked when building biological knowledge around data.
Multi-omic data analysis is becoming more and more powerful as time goes on, and this workshop highlighted key areas for thought as the space continues to grow. Multi-omics opens up a ‘third eye’ for answering biological questions and will help researchers come closer to understanding biology and disease in a multi-layered, contextual way. These insights will help ensure that as little as possible of the huge potential multi-omics has to offer goes to waste.
Image Credit: Canva