A guide to cancer genomics

Original post by Liam Little in February 2023. Updated by Ashleigh Davey in September 2023.

According to the British Journal of Cancer, one in two of us will develop cancer in our lifetime. Despite this devastating statistic, there is hope on the horizon. Genomics has transformed our understanding of cancer, providing researchers with increasingly complex information on tumour heterogeneity and enabling clinicians to better monitor the success of certain treatments.

Rapid development and innovation have widened access to next-generation sequencing platforms among cancer researchers and enabled genome analysis on a scale that was unimaginable just a few years ago. Could we be standing on the verge of the next revolution in cancer genomics?

Report: Cancer Genomics

The basics of cancer biology

Cancer is a disease of the genome. Environmental factors can certainly influence the growth and spread of cancer, but the changes that first lead to this devastating disease originate inside the cell. Once believed to be a single disease, cancer is now known to be a group of related diseases characterised by cells dividing uncontrollably and spreading into surrounding tissues.


The transformation of healthy cells into cancerous cells (otherwise known as oncogenesis) is a complex, multi-step process. This process begins with deleterious changes within a cell’s genome, which can have environmental, chemical or even viral origins. The development of next-generation sequencing technologies in recent years has shed light on these cancer-causing genetic changes, which fall broadly into three groups: mutations, gene amplifications and chromosomal rearrangements. By altering the expression or structure of critical genes, these changes promote the excessive growth of cancer cells, eventually allowing them to spread and infiltrate other tissues.

Several hundred critical genes have now been identified which, when mutated, play a direct role in cancer development. Termed “driver genes”, most are associated with the regulation of cell growth, and mutations in cell cycle regulators and tumour suppressor genes are frequently observed. It is estimated that 1 to 10 driver mutations are required for oncogenesis, although this number has been shown to vary depending on cancer type.

In contrast, somatic mutations caused by the increased genetic instability of cancer cells, or mutations present in cells before oncogenesis, are known as “passenger mutations”. These do not play a direct role in tumour formation or cancer development.  

The hallmarks of cancer

Driver mutations typically affect the protein sequence arising from a gene, often with functional consequences. Defects in essential proteins result in cancer cells developing several key features that assist in their survival and immune system evasion. Collectively known as the “Hallmarks of Cancer”, these qualities were first documented in 2000 and have subsequently been expanded upon as next-generation sequencing revolutionised our understanding of cancer genomics (see Figure 1).

Figure 1: The 14 hallmarks of cancer. Sourced from Hanahan D, 2022. 

Several of these malignant traits involve critical genomic modifications that allow cancerous cells to remain in the cell cycle pathway. Due to the error-prone nature of DNA replication, the cell cycle contains essential checkpoints which evaluate the genomic integrity of the cell and prevent genetic errors from being copied into the next generation of cells. Should a defective cell be recognised, the checkpoints can trigger a variety of signalling pathways which prevent progression through the cell cycle.

Normally, excessive DNA damage triggers repair pathways or an induced cell death mechanism (apoptosis) to either fix or destroy abnormal cells. Cancerous cells employ a variety of methods to avoid this fate. These include developing mutations which enable sustained proliferative signalling, replicative immortality, and resistance to cell death mechanisms.

Tumour development

Following the initial genetic changes that trigger oncogenesis, the subsequent uncontrolled growth of cancerous cells results in the development of a tumour in the primary site. Defined as any mass formed from the abnormal proliferation of cells, tumours may be either benign or malignant, depending upon their ability to invade surrounding tissue or spread to secondary sites within the body. Notably, only malignant tumours are considered cancerous.

The development of tumours is a long, complex process which can take many years after the initial driver mutations occur. It is estimated that human tumours are only detectable once they number 10 to 100 billion cells, and research suggests that reaching this size can take 10 years in breast and bowel cancers.
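To put these figures in perspective, the short Python sketch below works out how many rounds of division separate a single cell from a detectable tumour, and the average doubling time that a 10-year timescale would imply. It is a back-of-the-envelope illustration only: it assumes a single founder cell, that every cell divides at each step, and that no cells die, none of which holds in a real tumour. The function names are made up for this example.

```python
import math


def doublings_to_reach(cell_count: float) -> float:
    """Population doublings needed for one founder cell to reach cell_count cells."""
    return math.log2(cell_count)


def implied_doubling_time_days(cell_count: float, years: float) -> float:
    """Average doubling time (in days) if cell_count cells arise over the given years."""
    return years * 365.25 / doublings_to_reach(cell_count)


# Detection thresholds quoted in the text: 10 to 100 billion cells.
for cells in (1e10, 1e11):
    n = doublings_to_reach(cells)
    t = implied_doubling_time_days(cells, years=10)
    print(f"{cells:.0e} cells: {n:.1f} doublings, roughly {t:.0f} days per doubling")
```

Under these simplified assumptions, a tumour passes through somewhere between 33 and 37 doublings before it can be detected, which helps explain why so much of its genetic history is already written by the time of diagnosis.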

Tumour classification

Tumours are classified depending on the cell type from which they arise. The five main categories are carcinoma, sarcoma, leukaemia, lymphoma and myeloma (classified together), and central nervous system cancers.

Approximately 90% of human cancers fall under the carcinoma category, consisting of malignancies that arise in epithelial cells. Tumours can also be further classified depending on their tissue or organ of origin; for example, erythroid leukaemias arise from precursors of erythrocytes.

The tumour microenvironment

Tumours rely on the local environment surrounding them for their continued survival. The tumour microenvironment (TME) is composed of a diverse range of cell types – including tumour cells, immune cells, and endothelial cells – which are held together by components of the extracellular matrix. Tumour cells communicate with the TME using signalling molecules (such as cytokines and growth factors) to manipulate the activity of non-tumour cells in their favour.

The TME is now known to be a critical factor in tumour progression and cancer pathogenesis. At the early tumour initiation stage, cancer cells are detected by cells of the innate immune system, which infiltrate the primary tumour site. However, as tumour growth progresses, cancer cells modulate the activity of surrounding cells (such as macrophages and fibroblasts) to evade immune detection and promote tumour progression. Once the tumour is fully established, the TME also plays a role in the invasion and spread of cancerous cells into the bloodstream and secondary tissues.

Cancer metastasis

The spread of cancerous cells to a secondary site within the body (metastasis) is responsible for over 90% of cancer deaths. Despite the importance of metastasis for patient prognosis, there are still many unanswered questions as to what drives cancer cell migration and how it can be prevented.

Normal cells will migrate through the body until they contact another cell, get stuck, and create a uniform array of cells. On the other hand, tumour cells exhibit a reduced expression of cell surface adhesion molecules, meaning that when they contact other cells, they don’t get stuck. Instead, tumour cells continue to migrate over and around other cells, and (in culture) will grow in a disorderly and often multi-layered pattern. This lack of adhesion molecules plays an important role in the proliferation, invasion, and metastasis of cancer.

An overview of cancer biology

Cancer genomics sequencing options

Over the last decade, the rapid development of next-generation sequencing (NGS) approaches has vastly altered the cancer genomics landscape, providing researchers with the ability to assess multiple genes simultaneously. In this short time, NGS has enabled a deeper understanding of the complexities of tumour development and metastasis, leading to new discoveries, therapies and improved outcomes for people diagnosed with cancer.

As the technology continues to develop, researchers are faced with an expanding list of sequencing options that each play a distinct and essential role in cancer research.

Short-read sequencing

Short-read sequencing is the cornerstone of genomics research, owing to the wide variety of platforms and potential applications of the technology. As its name suggests, short-read sequencing requires the nucleic acid to be cut into short segments, which are then amplified, sequenced to produce short sequences known as reads, and aligned to a reference genome. As the cost of sequencing has fallen, short-read sequencing has become commonplace in clinical and research settings, with whole genome sequencing (WGS), whole exome sequencing (WES) and gene panel testing becoming essential tools in the cancer genomics realm.
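The fragment-and-align idea can be sketched in a few lines of Python. This is a toy illustration rather than a description of any real pipeline: the reference sequence is made up, the amplification step is skipped, reads are error-free, and alignment is a naive exact-match search. Production aligners tolerate mismatches and rely on indexed data structures to cope with billions of reads.

```python
import random

# Made-up reference sequence, used purely for illustration.
REFERENCE = "ATGGCGTACCTGAAGTTCGACCTAGGCATTCAGGATCCGTTAACGCTAGCTTGACCA"


def make_reads(genome, read_length, n_reads, seed=0):
    """Mimic fragmentation: sample fixed-length reads from random positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_length] for s in starts]


def align_exact(read, reference):
    """Return every 0-based position where the read matches the reference exactly."""
    hits = []
    pos = reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)
    return hits


def depth_per_base(reads, reference):
    """Count how many aligned reads cover each reference position.

    A read that matches in several places is counted at every match.
    """
    depth = [0] * len(reference)
    for read in reads:
        for start in align_exact(read, reference):
            for i in range(start, start + len(read)):
                depth[i] += 1
    return depth


reads = make_reads(REFERENCE, read_length=12, n_reads=40)
depth = depth_per_base(reads, REFERENCE)
print("mean depth:", sum(depth) / len(depth))
print("uncovered positions:", depth.count(0))
```

Because the reads are sampled at random, some positions end up covered many times while others may be missed entirely, which is one reason real experiments sequence to a high average depth.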

As the leaders of the short-read sequencing market, Illumina have made a splash in 2023 with the delivery of the eagerly awaited NovaSeq X Plus sequencer. Announced in September 2022, the NovaSeq X series provides a marked improvement to the NovaSeq 6000 whilst also being compatible with Illumina’s Complete Long Reads products.

As the sequencing market is continuously expanding, we present here the key players within the nucleic acid sequencing field (see Table 1). For a more in-depth assessment of current sequencing technologies, we refer you to the Front Line Genomics Sequencing Buyer’s Guide (5th Edition).

Table 1: Selected short-read sequencing technologies.

The relatively high accuracy of short-read sequencing enables researchers to identify small genetic variations that may have a role in cancer progression and treatment response. However, there are inherent limitations in sequencing shorter stretches of DNA. Since the strands must be fragmented and amplified in NGS, there is a high potential to introduce bias into the samples.

Short-read sequencing can also fail to generate sufficient overlap between DNA fragments to produce a full genome for a sample, meaning that sequencing a highly complex and repetitive genome (like that of human cancers) can be challenging. Importantly, larger genetic alterations such as inversions, translocations and large indels may also be missed with this technique.

For these reasons, researchers may instead turn to long-read sequencing to answer certain research questions.

Long-read sequencing

In comparison to traditional short-read sequencing, long-read sequencing allows for the analysis of much longer (>10,000 bp) reads. By sequencing single molecules, it avoids the amplification bias of short-read sequencing, and the longer reads overlap more extensively, which improves assembly.

Compared with short-read NGS, long-read sequencing allows for better overall resolution of highly repetitive genomic sequences, allowing the assembly of large and complex genomes. It makes the task of assembling a complete picture of the 3-billion-base-pair human genome much simpler, with less ambiguity and error. Long-read sequencing also allows other omics layers to be brought into the picture, such as the detection of epigenetic modifications or RNA sequencing.
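The advantage in repetitive regions is easy to demonstrate with a toy example. In the Python sketch below, a made-up reference contains the same 24-base repeat twice. A short read that falls entirely inside the repeat matches in many places, whereas a longer read that reaches into the unique flanking sequence matches in exactly one. The sequences, read lengths, and the `find_all` helper are invented for illustration, and exact matching stands in for real alignment.

```python
def find_all(read, reference):
    """Return every 0-based position where the read matches the reference exactly."""
    hits = []
    pos = reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)
    return hits


# Made-up reference: unique flank + repeat + unique spacer + same repeat + unique flank.
REPEAT = "CAG" * 8
REFERENCE = "TTGACCGTA" + REPEAT + "GGATCCTAA" + REPEAT + "ACGTTGCAT"

short_read = REPEAT[:12]        # 12 bases, lies entirely inside the repeat
long_read = REFERENCE[5:38]     # 33 bases, spans the first repeat and both of its flanks

short_hits = find_all(short_read, REFERENCE)
long_hits = find_all(long_read, REFERENCE)

print("short read candidate positions:", len(short_hits))
print("long read candidate positions:", len(long_hits))
```

The short read cannot be placed with confidence because every candidate position looks identical, while the long read is anchored by the unique sequence on either side of the repeat.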

Pacific Biosciences (PacBio) developed single molecule real-time (SMRT) sequencing, a long-read method based on a single DNA polymerase attached to a zero-mode waveguide (a nanostructure for fluorescence detection). In October 2022, PacBio unveiled its revolutionary new long-read sequencing system, Revio, which builds on SMRT technology to deliver 15 times more HiFi data and human genomes at scale for less than $1,000.

Nanopore-type sequencers are based on detecting changes in electrical current across a membrane when a DNA or RNA molecule passes through a protein nanopore. This allows direct sequencing of the molecules. Oxford Nanopore Technologies (ONT) is the main player in the commercial nanopore sequencer space, with several platforms available depending upon the research need.

Long-read vs Short-read Sequencing

Whole-genome sequencing

Both short-read and long-read technologies can be applied to whole genome sequencing (WGS), a vital tool in understanding the complexities and variations in cancer genomes. By analysing patient genomes base-by-base, WGS allows the detection of pathogenic mutations in all regions of the genome – including protein-coding genes and non-coding DNA. When combined with transcriptome analysis, WGS can also give researchers a comprehensive view of cancer as it progresses in response to therapy. 

Although clinical analysis is commonly limited to just the coding regions of the genome (a process known as whole exome sequencing (WES)), WGS analysis of non-coding regions is becoming increasingly popular as information about mutations in these regions grows. Non-coding regions often contain essential regulators of gene activity, including promoters, enhancers and sequences that control splicing.

Numerous large-scale efforts have attempted to use genomics to characterise cancer. Projects like The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), the Catalogue of Somatic Mutations in Cancer (COSMIC), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the 100,000 Genomes Project have greatly improved our understanding of this disease and continue to provide invaluable data for ongoing research and resource development in onco-genomics.

Single-cell and spatial genomics

In recent years, single-cell and spatial sequencing have emerged, promising researchers the ability to create 3D cellular atlases of entire tissues and analyse hundreds of patient samples. Though costs remain relatively high compared to other sequencing technologies, they are increasingly utilised in oncology for their ability to detect heterogeneity among individual cells, distinguish between small numbers of cells, and delineate cell maps.

Single-cell sequencing offers granularity and resolution at the single-cell level to determine different cell populations, types, and states, a level of detail that is lost in bulk sequencing. Pooling this information to infer the spatial relationships between cells in tissues holds significant promise for the future of cancer genomics.
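A small numerical example shows why this detail disappears in bulk data. The Python sketch below uses invented expression values for a single gene in six cells, two of which belong to a hypothetical drug-resistant subpopulation. The population labels are supplied by hand here; in a real single-cell experiment they would have to be inferred, typically by clustering the expression profiles.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical expression of one gene (arbitrary units) in six individual cells.
cells = [
    ("sensitive", 2.0),
    ("sensitive", 3.0),
    ("sensitive", 1.0),
    ("sensitive", 2.0),
    ("resistant", 48.0),
    ("resistant", 52.0),
]


def bulk_signal(cells):
    """What a bulk assay sees: one averaged value across all cells."""
    return mean(value for _, value in cells)


def per_population(cells):
    """What single-cell data can reveal: the average within each population."""
    groups = defaultdict(list)
    for label, value in cells:
        groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}


print("bulk:", bulk_signal(cells))
print("per population:", per_population(cells))
```

The bulk average sits between the two groups and describes neither of them, whereas the per-population view exposes a small subpopulation with very different behaviour.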

The utility of single-cell sequencing in cancer genomics research is evident when considering carcinoma as an example. Carcinoma studies have largely investigated somatic oncogenic mutations, targeting functional characteristics and biochemical activity. Multiple targeted therapies have subsequently been created to treat multiple tumours, though the problems of relapse and drug resistance remain. Following the evolutionary path of a carcinoma, genetically complex groups of individual carcinoma cells may develop and interact dynamically with each other. Studying this intra-tumour heterogeneity could lead to the development of novel treatment methods.

A Guide to Single-Cell and Spatial Analysis

RNA Sequencing

It cannot be denied that NGS has revolutionised our way of deciphering the genome. In addition to the information DNA-based assays can provide, RNA sequencing has rapidly become a staple technique in cancer diagnosis, drug development and disease prognosis. From bulk RNA to single-cell sequencing, analysis of RNA species within cells provides vital information about the gene activity and regulation which underlies cancer pathogenesis. 

Many populations of coding and non-coding RNA (the “transcriptome”) within cancer cells can be analysed to provide useful information about cancer aetiology. For example, mRNA quantification is commonly used to generate insights into gene function, with a recent study illustrating that bulk mRNA expression is an accurate predictor of cancer prognosis. However, only around 2% of the genome consists of protein-coding regions. Non-coding RNA species (including microRNA (miRNA), circular RNA (circRNA) and long non-coding RNA (lncRNA)) are diverse regulators of many cancer-related pathways, including angiogenesis, apoptosis and metastasis.

As NGS technologies advance, the market of RNA sequencing platforms available has rapidly expanded to include the big names of sequencing technology – Illumina, PacBio and Oxford Nanopore being a few notable examples. Like DNA sequencing, both short-read and long-read methods can be applied to RNA populations, although the required library preparation steps differ depending on the platform selected.

How to do RNA Sequencing

Sequencing options for cancer genomics

Immunotherapy and precision oncology

In a healthy individual, the immune system responds to “foreign” cells (such as cancer) by attacking and eliminating them. Unfortunately, cancer cells have their own strategies for evading this immune response, leading to further proliferation and potential metastasis. The traditional course of action is to treat the disease using surgery, chemotherapy, or radiotherapy (or some combination of these). However, many patients simply do not respond to these established therapies.

Immunotherapy – a type of biological therapy – is a treatment strategy focused on harnessing the power of the patient’s immune system to attack cancer and stunt its development. It shows great promise as a bespoke therapy for cancers that do not respond to traditional treatments and could improve quality of life for many patients. There are several immunotherapy treatments available for patients (see Figure 2).

Figure 2: The various immunotherapy approaches for cancer treatment. Sourced from Mishra et al, 2022.

Checkpoint inhibitors

Immune checkpoints are proteins that restrain T cell activation and prevent hyperactivation of the immune system. Immune checkpoint inhibitors are drugs that block these proteins, releasing the brake so that T cells can attack cancer cells. The best-known examples are antibodies that block the cytotoxic T lymphocyte antigen 4 (CTLA4) and programmed cell death 1 (PD-1) proteins.

These drugs are used to treat melanoma, renal cell carcinomas, colorectal cancers, non-small cell lung cancer, head and neck cancer, cervical cancer, endometrial cancer, bladder cancer and breast cancer – with more cancer types on the horizon.

CAR T-cell therapy

Chimeric antigen receptor (CAR) T-cell therapy – a type of T-cell transfer therapy – is a specialised immunotherapy in which changes are made to the genes of a patient’s T-cells to increase their efficiency in recognising and destroying cancer.

Once these tweaks have been made in the lab, the T-cells are grown in batches and put back into the body via an intravenous drip. CAR T-cell therapy is currently used to treat children with some forms of leukaemia and adults with lymphoma.

Cancer vaccines

There are two types of cancer vaccines: prophylactic and therapeutic. Prophylactic vaccines are more similar to a traditional vaccine and are used to prevent infection by an oncogenic virus. One common example is the human papillomavirus vaccine against cervical cancer.

Therapeutic vaccines harness tumour-associated antigens to help the immune system eliminate cancer cells. Non-cancerous cells are protected from this attack as they either do not display these antigens or do not possess the antigens in high enough numbers to be targeted.

Cancer genomics in precision oncology

In the past, cancer was defined in terms of its tissue of origin: if a cancer originated in the lung, it was lung cancer. With the dawn of tumour sequencing, we now have an insight into the many different subsets of cells within a cancer and how these are defined based on their patterns of genetic alterations.

In the clinic, this has seen us move from treatment determined by the location of the tumour in the body, to considering the molecular patterns present in cancers and treating them accordingly. In clinical trials, we have seen a similar shift towards small, focused patient populations to test treatments, and drugs that are matched to specific mutations in a patient’s cancer. Ultimately, this has led to better responses to treatment.

NGS and genomic data

NGS data has proven instrumental in developing targeted therapies (drugs that directly attack cancer by altering expression of crucial oncogenes) and immunotherapies in precision oncology. NGS methods and bioinformatics platforms have generated oceans of cancer genomics data which has been used to target aggressive cancers that do not respond, or respond poorly, to conventional treatment options.

NGS was used to obtain massive amounts of genomic data from cancer patients with acute myeloid leukaemia, an effort which later expanded to solid tumours and now forms The Cancer Genome Atlas (TCGA). NGS profiles from a host of tumours can assist in the creation of targeted therapies by identifying mutations in signalling pathways and blocking them with existing or novel drugs.

Cancer immunotherapy and precision oncology

Drug discovery and development

Drug discovery is a time-consuming and costly process, particularly given the high number of trials that ultimately “fail” or have negative outcomes. A high percentage of negative trial outcomes is to be expected in early-phase (I or II) trials, given they are mostly used as a proof-of-concept. However, the estimated 50% negative outcome rate in phase III trials represents a significant burden of cost in the drug development pipeline – and is a key target of genomics research.

Facilitating drug development with genomics

There are various ways that genomic information can help accelerate and improve drug development. Conceptual approaches in genetics and genomics help with target identification, prioritisation, and tractability, as well as predicting outcomes of pharmacological perturbations. Population genomics initiatives can also aid in target identification. Bulk and single-cell gene expression data is useful to understand the biological relevance of drug targets. Genome-wide CRISPR editing can screen for loss of function or activation of genes – a valuable tool for prioritising drug targets.

Genome sequencing and genotyping

Genome-wide association studies (GWAS) use high-density genotyping of common variants and linkage analysis. Exome sequencing captures the coding region of the human genome (about 1.5% of the entire genome). Whole genome sequencing achieves good coverage (around 85%) of the whole genome.
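The practical consequence of targeting only the exome can be estimated with simple arithmetic, since mean sequencing depth is roughly the number of reads multiplied by the read length and divided by the size of the target. The Python sketch below takes the genome size and exome fraction quoted in this guide; the 150 bp read length and 30x depth are illustrative assumptions, and the calculation ignores off-target reads, duplicates, and uneven coverage.

```python
GENOME_SIZE_BP = 3_000_000_000   # approximate human genome size
EXOME_FRACTION = 0.015           # about 1.5% of the genome is captured by exome sequencing

exome_bp = round(GENOME_SIZE_BP * EXOME_FRACTION)


def reads_needed(target_bp, mean_depth, read_length_bp):
    """Reads required so that reads * read_length / target size reaches mean_depth."""
    total_bases = target_bp * mean_depth
    return -(-total_bases // read_length_bp)  # integer ceiling division


# Assumed values for illustration only: 150 bp reads at 30x mean depth.
wgs_reads = reads_needed(GENOME_SIZE_BP, mean_depth=30, read_length_bp=150)
wes_reads = reads_needed(exome_bp, mean_depth=30, read_length_bp=150)

print(f"exome target: {exome_bp:,} bp")
print(f"WGS reads at 30x: {wgs_reads:,}")
print(f"WES reads at 30x: {wes_reads:,}")
```

At equal depth the exome needs a small fraction of the reads required for a whole genome, which is why exome sequencing is often run at much higher depth for a comparable sequencing budget.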

Exome sequencing and WGS are useful in identifying specific rare disease-associated variants that may be causal in cancer. Technical specifications of each technology may determine their success in translating variant discovery into actionable targets.


Transcriptomics

Transcriptional profiling of cells and tissues is a common technique in drug discovery, with high relevance to cancer drug and therapeutic development. Its use in supporting drug development includes mapping responses to compounds, interrogating tissues and cells for expression of target variants, and identifying causal variants of clinical phenotypes. It can also be used as a source of biomarkers to stratify patients for clinical trials.

Transcriptomics offers insights into the mechanisms of action and off-target effects of drugs. RNA sequencing is not constrained by cell types or numbers, meaning accurate physiological models can be selected. This flexibility comes from protocols that span low-input, bulk, and single-cell interrogation, as well as spatial transcriptomics.

CRISPR-based technologies

CRISPR-based genome editing facilitates the creation of targeted genetic perturbations at scale and can be used to screen for a phenotype of interest. RNA-programmable genome targeting by CRISPR/Cas9 has been used to inhibit or activate transcription, edit nucleotides, and modify epigenetic states.

Screening for disease-relevant or drug mechanism-of-action targets is limited by the suitability and scalability of available model systems. CRISPR screens have, nonetheless, driven target prioritisation for various disease models and clarified targets, enhancers, and resistance genes for existing drugs.

Cancer drug discovery and development

Further reading

Report: Cancer Genomics

An overview of cancer biology

Sequencing options for cancer genomics

Cancer immunotherapy and precision oncology

Cancer drug discovery and development