Original post by Liam Little in February 2023. Updated by Ashleigh Davey in September 2023.
Over the last decade, the rapid development of next-generation sequencing (NGS) approaches has vastly altered the cancer genomics landscape, providing researchers with the ability to assess multiple genes simultaneously. In this short time, NGS has enabled a deeper understanding of the complexities of tumour development and metastasis, leading to new discoveries, therapies and improved outcomes for people diagnosed with cancer.
As the technology continues to develop, researchers are faced with an expanding list of sequencing options that each play a distinct and essential role in cancer research.
Short-read sequencing is the cornerstone of genomics research, owing to the wide variety of platforms and potential applications of the technology. As its name suggests, short-read sequencing requires the nucleic acid to be cut into short fragments, which are then amplified and sequenced to produce short sequences (known as reads) that are aligned to a reference genome. As the cost of sequencing has fallen, short-read sequencing has become commonplace in clinical and research settings, with whole genome sequencing (WGS), whole exome sequencing (WES) and gene panel testing becoming essential tools in cancer genomics.
As the leaders of the short-read sequencing market, Illumina have made a splash in 2023 with the delivery of the eagerly awaited NovaSeq X Plus sequencer. Announced in September 2022, the NovaSeq X series offers a marked improvement over the NovaSeq 6000 whilst also being compatible with Illumina’s Complete Long Reads products. In addition, the newly released DRAGEN 4.2 secondary analysis software boasts significant improvements in the detection of germline variants and small copy number variants – a useful update for cancer variant detection.
Last year, Ultima Genomics unveiled their new UG 100 sequencer, bringing us one step closer to the ground-breaking “$100 genome”. Although instrument specifications are yet to be released, the initial data from their white paper looks promising, claiming a low cost of sequencing (~$1 per Gb) and high accuracy (>99.8% for SNPs). This year, Ultima also announced their partnership with Genome Insight in a collaboration that aims to bring affordable whole genome sequencing to cancer patients.
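As a rough sanity check on the “$100 genome” figure, the arithmetic can be sketched as below. The genome size (about 3.1 Gb) and the 30x coverage depth are assumptions made for illustration, not values taken from Ultima's white paper, and the function name is hypothetical.

```python
def genome_cost_usd(cost_per_gb, genome_size_gb=3.1, coverage=30):
    """Estimate the raw sequencing cost for one genome.

    Assumptions (not from the source): a ~3.1 Gb human genome and
    30x mean coverage. Library preparation, labour and analysis
    costs are excluded.
    """
    total_gb = genome_size_gb * coverage  # total bases sequenced, in Gb
    return total_gb * cost_per_gb


if __name__ == "__main__":
    # ~$1 per Gb is the figure quoted from the white paper.
    print(f"Estimated cost: ${genome_cost_usd(1.0):.0f}")
```

Under these assumptions, ~$1 per Gb works out at roughly $93 of raw sequencing per genome, which is consistent with the “$100 genome” framing but leaves out every cost other than the sequencing run itself.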
As the sequencing market is continuously expanding, we present here the key players within the nucleic acid sequencing field (see Table 1). For a more in-depth assessment of current sequencing technologies, we refer you to the Front Line Genomics Sequencing Buyer’s Guide (5th Edition).
The relatively high accuracy of short-read sequencing enables researchers to identify small genetic variations that may have a role in cancer progression and treatment response. However, there are inherent limitations in sequencing shorter stretches of DNA. Because the DNA must be fragmented and amplified during library preparation, there is a high potential to introduce bias into the samples.
Short-read sequencing can also fail to generate sufficient overlap between DNA fragments to assemble a complete genome for a sample, meaning that sequencing a highly complex and repetitive genome (like that of human cancers) can be challenging.
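The coverage problem can be made concrete with the Lander-Waterman model, a standard idealisation that assumes reads land uniformly at random across the genome. The sketch below is illustrative only: the read count, read length and genome size are assumed values, and real cancer genomes break the uniformity assumption precisely because of repeats and amplification bias.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Mean depth of coverage: c = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_uncovered(coverage):
    """Expected fraction of bases covered by zero reads under the
    Lander-Waterman (Poisson) approximation: exp(-c)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Illustrative values only: 600 million 150 bp reads, 3.1 Gb genome.
    c = mean_coverage(600_000_000, 150, 3_100_000_000)
    print(f"mean coverage: {c:.1f}x")
    print(f"expected uncovered fraction: {fraction_uncovered(c):.2e}")
```

Even when the model predicts that almost every base is covered, repetitive regions can remain unresolved, because a read shorter than the repeat cannot be placed unambiguously.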
Importantly, larger genetic alterations such as inversions, translocations and large indels may also be missed with this technique. As short reads rarely span more than one variant site, they lack phasing information, and it is often not possible to determine which allele carries a given mutation. For these reasons, researchers may instead turn to long-read sequencing to answer certain research questions.
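The phasing limitation largely comes down to read span: two variants can only be assigned directly to the same allele if a single read covers both positions. The toy sketch below makes that point with hypothetical coordinates and read lengths; it deliberately ignores paired-end reads and statistical phasing, both of which can partly recover phase from short-read data.

```python
def read_spans_both(read_start, read_length, pos_a, pos_b):
    """Return True if a single read covers both variant positions.

    Coordinates are 0-based; the read covers the half-open interval
    [read_start, read_start + read_length).
    """
    read_end = read_start + read_length
    return all(read_start <= p < read_end for p in (pos_a, pos_b))


if __name__ == "__main__":
    # Two hypothetical heterozygous variants 5 kb apart.
    pos_a, pos_b = 10_000, 15_000
    print(read_spans_both(9_900, 150, pos_a, pos_b))     # 150 bp short read
    print(read_spans_both(9_000, 12_000, pos_a, pos_b))  # 12 kb long read
```

Here the 150 bp read covers only the first variant, whereas the 12 kb read covers both and could therefore link them to one allele.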
In comparison to traditional short-read sequencing, long-read sequencing allows for the analysis of much longer (>10,000 bp) reads. Because single molecules are sequenced, often without amplification, this approach avoids the amplification bias of short-read sequencing, and the longer reads overlap more extensively, which improves assembly.
Compared with short-read NGS, long-read sequencing allows for better overall resolution of highly repetitive genomic sequences, allowing the assembly of large and complex genomes. It makes the task of assembling a complete picture of the 3-billion-base-pair human genome much simpler, with less ambiguity and error. Long-read sequencing can also be extended to other omics applications, such as the detection of epigenetic modifications or RNA sequencing.
So-called “true” long-read sequencing directly sequences single molecules of DNA in real time, often without the need for amplification. On the other hand, “synthetic” approaches use modified sample preparation and conventional short-read sequencing to reconstruct long reads from short-read data.
Pacific Biosciences (PacBio) developed single-molecule real-time (SMRT) sequencing, a long-read method based on a single DNA polymerase attached to a zero-mode waveguide (a nanostructure for fluorescence detection). In October 2022, PacBio unveiled its new long-read sequencing system, Revio, which builds on SMRT technology to deliver 15 times more HiFi data and human genomes at scale for less than $1,000.
Nanopore-type sequencers are based on detecting changes in electrical current across a membrane when a DNA or RNA molecule passes through a protein nanopore. This allows direct sequencing of the molecules. Oxford Nanopore Technologies (ONT) is the main player in the commercial nanopore sequencer space, with several platforms available depending upon the research need.
Combining the sequencing power of long-read technologies with the accuracy of short-read sequencing can bring the advantages of both methods to cancer genomics research. Using a tandem approach, short-read data can be used to correct errors originating in long reads, with a number of bioinformatics tools being developed to support this hybrid approach. Alternatively, the selective application of long-read sequencing in certain clinical settings may be more beneficial than using a widespread short-read approach.
Both short-read and long-read technologies can be applied to whole genome sequencing (WGS), a vital tool in understanding the complexities and variations in cancer genomes. By analysing patient genomes base-by-base, WGS allows the detection of pathogenic mutations in all regions of the genome – including protein-coding genes and non-coding DNA. When combined with transcriptome analysis, WGS can also give researchers a comprehensive view of cancer as it progresses in response to therapy.
Although clinical analysis is commonly limited to just the coding regions of the genome, a process known as whole exome sequencing (WES), WGS analysis of non-coding regions is becoming increasingly popular as information about mutations in these regions grows. These areas of the genome often contain essential regulators of gene activity, including promoters, enhancers and elements that control splicing. Although there is some evidence of pathogenic driver mutations in promoters and untranslated regions, understanding how non-coding variants affect gene function and cancer development is still an ongoing area of cancer research.
Numerous large-scale efforts have attempted to use genomics to characterise cancer. Projects like The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), the Catalogue of Somatic Mutations in Cancer (COSMIC), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the 100,000 Genomes Project have greatly improved our understanding of this disease and continue to provide invaluable data for ongoing research and resource development in oncogenomics.
As the cost of NGS technologies continues to decrease, the hope is that WGS will in turn become more feasible, and more accessible, in the treatment of cancer.
Single-cell and spatial genomics
In recent years, single-cell and spatial sequencing have emerged, promising researchers the ability to create 3D cellular atlases of entire tissues and analyse hundreds of patient samples. Though costs remain relatively high compared to other sequencing technologies, these methods are increasingly utilised in oncology for their ability to detect heterogeneity among individual cells, distinguish between small numbers of cells, and delineate cell maps.
Single-cell sequencing provides resolution at the level of the individual cell, making it possible to determine different cell populations, types, and states, a level of detail that is lost in bulk sequencing. Pooling this information together to infer the spatial relationships between cells in tissues has significant promise for the future of cancer genomics.
The utility of single-cell sequencing in cancer genomics research is evident when considering carcinoma as an example. Carcinoma studies have largely investigated somatic oncogenic mutations, targeting functional characteristics and biochemical activity. Multiple targeted therapies have subsequently been created to treat a range of tumours, though the problem of treating relapse and drug resistance remains. As a carcinoma evolves, genetically complex groups of individual carcinoma cells may develop and interact with each other in a dynamic manner. Studying this intra-tumour heterogeneity could lead to the development of novel treatment methods.
It is also worth noting that a cell’s state and behaviour can be influenced by genetic and environmental factors. Tumour progression is influenced by underlying genetic mutations and by the tumour microenvironment (TME). Quantifying the contributions of these factors requires technologies that accurately measure the spatial location of genomic sequences alongside phenotypic features. Emerging high-resolution methods provide a view of tumour heterogeneity that incorporates the influence of the TME and the diverse cell types of the tumour. Spatial analysis can be applied to primary tumours, patient-derived xenografts, and in vitro systems to understand the hierarchical structure and environmental influences governing tumour ecosystems.
It cannot be denied that NGS has revolutionised our way of deciphering the genome. In addition to the information DNA-based assays can provide, RNA sequencing has rapidly become a staple technique in cancer diagnosis, drug development and disease prognosis. From bulk RNA to single-cell sequencing, analysis of RNA species within cells provides vital information about the gene activity and regulation which underlies cancer pathogenesis.
Many populations of coding and non-coding RNA (the “transcriptome”) within cancer cells can be analysed to provide useful information about cancer aetiology. For example, mRNA quantification is commonly used to generate insights into gene function, with a recent study illustrating that bulk mRNA expression is an accurate predictor of cancer prognosis. However, only about 2% of the genome consists of protein-coding regions. Non-coding RNA species (including microRNA (miRNA), circular RNA (circRNA) and long non-coding RNA (lncRNA)) are diverse regulators of many cancer-related pathways, including angiogenesis, apoptosis and metastasis.
As NGS technologies advance, the range of RNA sequencing platforms has rapidly expanded and now includes offerings from the big names of sequencing technology – Illumina, PacBio and Oxford Nanopore being a few notable examples. Like DNA sequencing, both short-read and long-read methods can be applied to RNA populations, although the required library preparation steps differ depending on the platform selected.