We’ve come a long way since the dawn of Sanger sequencing in 1977. The rapid development of next-generation sequencing (NGS) approaches over the last decade has vastly altered the cancer genomics landscape, providing researchers with the ability to assess multiple genes simultaneously.
Whole-genome sequencing (WGS) enables researchers to analyse the entire genome, base by base. This high-resolution view offers a wealth of data on genetic variation and its potential role in diseases such as cancer.
Supported by NGS technologies, WGS can provide base-pair-level information about the mutations present in cancer cells and enables the discovery of cancer-associated variants (single nucleotide variants, copy number variations, small insertions and deletions, and structural variants). When combined with transcriptome analysis, WGS can also give researchers a more comprehensive view of how a cancer changes as it progresses and responds to therapy.
Numerous large-scale efforts have attempted to use genomics to characterise cancer. Projects like The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), the Catalogue Of Somatic Mutations In Cancer (COSMIC), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the 100,000 Genomes Project have greatly improved our understanding of this disease and continue to provide invaluable data for ongoing research and resource development in oncogenomics.
As the cost of NGS technologies continues to decrease, the hope is that WGS will in turn become more feasible, and more accessible, in the treatment of cancer.
Most NGS, including most WGS performed to date, relies on short-read technologies. These break DNA into small fragments, typically a few hundred base pairs or fewer, that are then usually amplified and sequenced to produce “reads”. Short-read sequencing can be categorised as either single-molecule based (sequencing individual DNA molecules) or ensemble based (sequencing multiple identical copies of a DNA molecule that have usually been amplified together, for example on beads or in clusters on a flow cell).
Illumina currently dominate the short-read sequencing market, but there are of course other options available that are applicable to human cancer genomics research (see Table 1).
The relatively high per-base accuracy of short-read sequencing enables researchers to identify small genetic variations that may have a role in cancer progression and treatment response. However, sequencing short stretches of DNA has inherent limitations. Because DNA must be fragmented and, in most workflows, amplified by PCR, there is considerable potential to introduce bias into the data, such as uneven coverage of GC-rich or GC-poor regions.
Short reads can also fail to overlap sufficiently to reconstruct a complete genome for a sample: a read shorter than a repeated sequence cannot be placed unambiguously within it. This makes sequencing highly repetitive and complex genomes, such as the human genome and the heavily rearranged genomes of many cancers, challenging.
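To make the repeat problem concrete, here is a toy Python sketch (not a real aligner or assembler) that tiles error-free reads across a synthetic genome containing two copies of a 500 bp repeat. It reports the fraction of reads whose exact sequence occurs more than once in the genome and so cannot be placed uniquely. The function `ambiguous_read_fraction`, the genome and the read lengths are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def ambiguous_read_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose exact read sequence occurs
    more than once in the genome, i.e. reads that cannot be placed uniquely."""
    starts = range(len(genome) - read_len + 1)
    # Count every read (substring of length read_len) across all start positions.
    counts = Counter(genome[i:i + read_len] for i in starts)
    ambiguous = sum(1 for i in starts if counts[genome[i:i + read_len]] > 1)
    return ambiguous / len(starts)


# Hypothetical toy genome: random unique sequence with two copies of a 500 bp repeat.
rng = random.Random(0)


def random_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


repeat = random_dna(500)
genome = random_dna(2000) + repeat + random_dna(2000) + repeat + random_dna(2000)

for read_len in (100, 300, 1000):
    frac = ambiguous_read_fraction(genome, read_len)
    print(f"read length {read_len:>5} bp: {frac:.1%} of reads ambiguous")
```

Under these assumptions the ambiguous fraction shrinks as reads get longer and should reach zero once reads are longer than the repeat, because every such read then includes some unique flanking sequence.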
Importantly, larger genetic alterations such as inversions, translocations and large indels may also be missed with this technique. Because short reads rarely span more than one heterozygous site, they carry little phasing information, making it difficult to determine which allele (parental haplotype) carries a given mutation. For these reasons, researchers may instead turn to long-read sequencing to answer certain research questions.
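Read-based phasing depends on observing two or more heterozygous sites on the same DNA molecule. The simplified Python sketch below only records which neighbouring sites are covered by a common read; real phasing tools also compare the alleles seen on each read. The site positions, read lengths and the `linked_site_pairs` helper are hypothetical, chosen to contrast 150 bp short reads with 10,000 bp long reads.

```python
def linked_site_pairs(het_sites, reads):
    """Return pairs of neighbouring heterozygous sites that are covered by
    the same read, i.e. whose alleles were observed on one DNA molecule.

    het_sites: reference positions of heterozygous variants.
    reads: (start, end) half-open intervals of aligned reads.
    """
    sites = sorted(het_sites)
    linked = set()
    for start, end in reads:
        covered = [p for p in sites if start <= p < end]
        # Alleles at consecutive sites on one read come from the same
        # haplotype, so their relative phase is known.
        linked.update(zip(covered, covered[1:]))
    return linked


# Hypothetical heterozygous sites a few kilobases apart in a 12 kb region.
het_sites = [1_000, 4_000, 9_000]
region_len = 12_000

short_reads = [(s, s + 150) for s in range(0, region_len - 150 + 1, 50)]
long_reads = [(s, s + 10_000) for s in range(0, region_len - 10_000 + 1, 1_000)]

print(sorted(linked_site_pairs(het_sites, short_reads)))  # []
print(sorted(linked_site_pairs(het_sites, long_reads)))   # [(1000, 4000), (4000, 9000)]
```

In this toy case no 150 bp read spans two sites, so nothing can be phased, whereas the long reads connect all three sites into a single phased block.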
In comparison with traditional short-read sequencing (SRS), long-read sequencing (LRS) produces much longer reads, often exceeding 10,000 bp. By sequencing single native molecules, LRS can avoid the amplification bias of SRS, and its longer reads share longer overlaps with one another, making assembly more reliable.
Compared with short-read sequencing, long-read sequencing better resolves highly repetitive genomic sequences, enabling the assembly of large and complex genomes. It makes assembling a complete picture of the roughly 3-billion-base-pair human genome much simpler, with less ambiguity and fewer errors. LRS can also bring other omics layers into the picture, such as the direct detection of epigenetic modifications (for example, DNA methylation) and the sequencing of full-length RNA transcripts.
So-called “true” LRS directly sequences single molecules of DNA in real time, often without the need for amplification. “Synthetic” approaches, on the other hand, combine specialised library preparation with conventional SRS to reconstruct long reads computationally from short-read data.
Pacific Biosciences (PacBio) developed single-molecule real-time (SMRT) sequencing, a long-read method in which a single DNA polymerase is immobilised at the bottom of a zero-mode waveguide (a nanostructure that enables fluorescence detection of individual nucleotide incorporations). In October 2022, PacBio unveiled Revio, a long-read sequencing system that builds on SMRT technology and, according to the company, delivers up to 15 times more HiFi (high-fidelity) data than its previous systems, enabling human genomes to be sequenced at scale for less than $1,000.
Nanopore sequencers detect changes in electrical current across a membrane as a DNA or RNA molecule passes through a protein nanopore, allowing the molecule to be sequenced directly. Oxford Nanopore Technologies (ONT) is the main commercial supplier of nanopore sequencers, offering several platforms to suit different research needs, from the portable MinION to the high-throughput PromethION.
Single-cell and spatial genomics
In recent years, single-cell and spatial sequencing have emerged, promising researchers the ability to create 3D cellular atlases of entire tissues and to analyse hundreds of patient samples. Although their costs remain relatively high compared with other sequencing technologies, they are increasingly used in oncology for their ability to detect heterogeneity among individual cells, resolve small or rare cell populations, and delineate cell maps.
Single-cell sequencing resolves individual cells, allowing researchers to distinguish cell populations, types, and states, a level of detail that is lost in bulk sequencing, which averages signals across many cells. Integrating this information with spatial data to infer the relationships between cells in tissues holds significant promise for the future of cancer genomics.
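The loss of detail in bulk measurements can be illustrated with a deliberately simple Python sketch. It assumes made-up expression values for one gene in two hypothetical tumours of 100 cells each and treats a bulk measurement as the mean across cells; real bulk and single-cell data are far noisier, so this is an illustration of averaging rather than a model of either assay.

```python
from statistics import mean

# Hypothetical expression values (arbitrary units) for one gene in 100 cells.
# Tumour A: a rare subclone of 5 cells expresses the gene strongly.
tumour_a = [0.0] * 95 + [40.0] * 5
# Tumour B: every cell expresses the gene at a low level.
tumour_b = [2.0] * 100


def fraction_expressing(cells, threshold=0.0):
    """Fraction of cells with expression above the threshold."""
    return sum(1 for x in cells if x > threshold) / len(cells)


for name, cells in (("A", tumour_a), ("B", tumour_b)):
    # mean() stands in for a bulk measurement; the per-cell fraction
    # can only be computed when individual cells are measured.
    print(name, "bulk:", mean(cells), "expressing:", fraction_expressing(cells))
# A bulk: 2.0 expressing: 0.05
# B bulk: 2.0 expressing: 1.0
```

Both tumours give the same averaged signal, yet only the per-cell view reveals that tumour A contains a small, strongly expressing subpopulation.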
The utility of single-cell sequencing in cancer genomics research is evident when considering carcinoma as an example. Carcinoma studies have largely investigated somatic oncogenic mutations, characterising their function and biochemical activity. Multiple targeted therapies have subsequently been developed to treat a range of tumours, though relapse and drug resistance remain problems. As a carcinoma evolves, genetically distinct subpopulations of carcinoma cells may emerge and interact dynamically with one another. Studying this intratumour heterogeneity could lead to the development of novel treatments.
It is also worth noting that a cell’s state and behaviour can be influenced by both genetic and environmental factors. Tumour progression is shaped by underlying genetic mutations and by the tumour microenvironment (TME). Quantifying the contributions of these factors requires technologies that can measure genomic sequences alongside phenotypic features while preserving their spatial location. Emerging high-resolution methods provide a view of tumour heterogeneity that incorporates the influence of the TME and diverse cell types of the tumour. Spatial analysis can be applied to analyse primary tumours, patient-derived xenografts, and in vitro systems to understand the hierarchical structure and environmental influences governing tumour ecosystems.