The Latest Developments in Sequencing Technologies

This feature was put together using content from Chapter 3 and Chapter 9 of the 2024 Sequencing Buyer’s Guide. For an in-depth look at all aspects of sequencing, from sample prep to analysis, please download the full guide.

In the following feature, we will review some of the most exciting developments in sequencing from 2022 and 2023. This includes an overview of advancements in the sequencing market, innovations in multi-omics, proteomics and solid-state nanopore sequencing and a vision for where developments in sequencing may go in the near future.

Advancements in Sequencing

Sequencing is a rapidly developing field. The first generation, Sanger sequencing, dates from the 1970s, and the last decade and a half has seen the release of two more generations.

Second generation or Next Generation Sequencing (NGS) took off in the late 2000s and early 2010s and presented a methodology to sequence an entire genome using short-read technology. It is the main sequencing methodology used today, and Illumina are the major providers, accounting for the majority of the world’s sequencing data.

A few years later the third generation or long-read sequencing methodologies were launched, producing technologies that sequence significantly longer DNA fragments and hence can cover pesky repetitive DNA regions and offer the ability to perfectly sequence a genome.

While these technologies were first developed over a decade ago, the current state of affairs in 2nd generation (NGS) and 3rd generation (long-read) sequencing is by no means at the peak. For one, we’ve seen a gradual improvement in sequencing accuracy over recent years, and it is at its highest point today.

Long-read methods have historically lacked accuracy, with some methods having an error rate over 10%, but 2022 and 2023 have seen significant improvements, bringing them to an accuracy comparable to that of standard short-read methods (see here and here). Sequencing bases at Q30 (equivalent to one error in 1,000 bases) has been a standard expectation in short-read sequencing, and the major long-read platform providers, PacBio and Oxford Nanopore, are claiming a Q30 standard and a Q28 standard respectively.

However, some short-read sequencers have very recently gone beyond the Q30 standard and are racing to Q40 and beyond. Q40 is the equivalent of one error in 10,000 bases, an order of magnitude better than Q30, and is currently routinely achieved by Element Biosciences’ AVITI and PacBio’s Onso. In fact, this jump in quality caused initial disturbances in pipelines such as 10x Genomics’ Cell Ranger, which was aborting runs due to the quality parameters being outside the standard guidelines (which was fixed very quickly with one line of code).

Higher accuracy might make NGS more valuable for rare variant detection in cancer and improve performance in low-pass/shallow whole genome sequencing. The shift to Q40 will necessitate some computational changes but higher accuracy on the whole represents progress. The T2T, Human Pangenome Reference and Genome in a Bottle consortia have their sights on a loftier goal in a project nicknamed the Q100 project. This project aims to create a genome benchmark with near complete accuracy (1 error in 10 billion bases).
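The Q values quoted above follow the standard Phred convention, in which a quality score Q corresponds to a per-base error probability of 10^(-Q/10). The short Python sketch below makes the conversion explicit for the scores mentioned in this section. The function names are illustrative only and do not come from any particular sequencing toolkit.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Per-base error probability implied by a Phred quality score Q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred quality score implied by a per-base error probability p."""
    return -10 * math.log10(p)


def bases_per_error(q: float) -> float:
    """Average number of bases sequenced per expected error at score Q."""
    return 10 ** (q / 10)


# Scores mentioned in the text: Q28, Q30, Q40 and the Q100 benchmark goal.
for q in (28, 30, 40, 100):
    print(f"Q{q}: about one error in {bases_per_error(q):,.0f} bases")
```

Because the scale is logarithmic, each step of 10 in Q is a tenfold reduction in error rate, which is why the move from Q30 to Q40 is described as an order of magnitude.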

We will now briefly overview the developments in the short-read and long-read markets.

Short-read market overview

2022 saw major short-read releases such as the Illumina NovaSeqX and PacBio’s Onso. 2023 might be seen as the year of delivering promises, digesting 2022’s releases and incremental small gains.

Illumina continues to dominate the short-read market. In the last decade it was estimated that over 90% of the world’s sequencing data was generated on Illumina machines, and one estimate in 2022 shows it still to be over 90%. The recent release of the 25B flow cell for the NovaSeqX creates a new kind of scale: 26 billion reads per run at a cost of $0.64 per million reads. One strength for Illumina is the breadth of offerings, from the highest throughput instruments (summarized above) to low-cost instruments such as the iSeq.

Challengers who have risen over the last year and a half (e.g. Element Biosciences and Ultima Genomics) are carving out a small space in the market and beginning to sell sequencers. Ultima Genomics are still operating in relative stealth after their ground-breaking announcement of the $100 genome but are delivering units selectively. Element are now on the second generation of their chemistry (Avidity Cloudbreak), have achieved the $200 genome and had passed 100 commercial orders on their instrument as of September 2023.

At the end of 2022, Pacific Biosciences (PacBio), who have traditionally operated in long-read technologies (see next), announced the Onso, their first short-read sequencer. It uses sequencing by binding (similar to AVITI) in which fluorescently tagged nucleotides are not directly incorporated into the newly synthesized strands on the flow-cell. The first Onso systems were shipped in Q3 2023, and feedback will soon be available.

Long-read market overview

2023 has seen PacBio and Oxford Nanopore Technologies both improve their market share, reflecting the increasing interest in long-read technologies. Both have new models under development. PacBio only recently released the Revio but are working on an ultra-high throughput production-scale sequencer and a lower throughput benchtop model. Oxford Nanopore has two new products under development, the MinION Mk1D and the SmidgION. The latter would be the smallest sequencing device so far and is designed to be used with a smartphone anywhere.

Furthermore, short-read focused companies have produced long-read kits to bring the benefits of longer reads to their platforms. Illumina launched Complete Long Reads for NovaSeq in March 2023. This kit tagments long single-molecule fragments and can generate contiguous long-read sequences around 5-7 kb in length, with some reads greater than 10 kb. Element Biosciences also have LoopSeq™ for the AVITI™, released late in 2022, which barcodes longer sequences (up to 5 kb) before sequencing. This expands the use of AVITI to applications where longer reads are valuable, such as understanding microbial diversity, viral genomes and the immune repertoire. MGI’s MGIEasy stLFR kit does much the same as LoopSeq and allows long-read information to be gained from their short-read platforms through barcoding.

With the successful crossover from Illumina, PacBio, MGI and Element into both short-read and long-read, a balanced and flexible sequencing offering from companies could be the future of the sequencing market.

The future of sequencing

For the first time, sequencing costs are being driven to general affordability. Innovation and new sequencers will be coming. In fact, you can track the development of new sequencers here at Shawn Baker’s blog. A cursory look shows that several companies are working on their own spin on Illumina’s SBS chemistry and several more are working to produce their version of nanopore single molecule sequencing. Furthermore, you can see the NGS necropolis, which highlights how precarious the development of sequencers can be for companies.

The market right now has a varied but limited portfolio of both short-read and long-read sequencers. In late 2022 and early 2023, a few records were broken. Both Illumina and Element claimed the capacity to produce a $200 genome on their new instruments, and Ultima Genomics and MGI went one step further and claimed that their cost-saving sequencers could produce a genome for as little as $100. This progress massively outstrips Moore’s Law (see Figure 1), and while the cost of a genome across the world still sits at around $500 to $600, it clearly won’t be long before this value is below $200.

Figure 1. The cost of sequencing a full human genome from 2001 to 2022. Image Credit: Our World in Data
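To give a rough sense of what "outstripping Moore’s Law" means, the Python sketch below compares a Moore’s Law style decline with the decline implied by two endpoint costs. It rests on assumptions that are not taken from this feature: Moore’s Law is modelled as a halving of cost every two years, and the 2001 starting cost is set to a round $100 million purely for illustration. The $550 end value is the midpoint of the $500 to $600 range quoted above.

```python
import math


def moores_law_cost(start_cost: float, years: float, halving_period: float = 2.0) -> float:
    """Cost after `years` if cost halved every `halving_period` years."""
    return start_cost / 2 ** (years / halving_period)


def implied_halving_time(start_cost: float, end_cost: float, years: float) -> float:
    """Halving time (in years) implied by an exponential fall from start_cost to end_cost."""
    return years * math.log(2) / math.log(start_cost / end_cost)


# Period covered by Figure 1.
START_YEAR, END_YEAR = 2001, 2022
# ASSUMPTION: illustrative round figure for 2001, not stated in this feature.
ASSUMED_START_COST = 100_000_000
# Midpoint of the $500 to $600 range quoted in the text.
CURRENT_COST = 550

years = END_YEAR - START_YEAR
projected = moores_law_cost(ASSUMED_START_COST, years)
halving = implied_halving_time(ASSUMED_START_COST, CURRENT_COST, years)

print(f"Moore's Law projection for {END_YEAR}: ${projected:,.0f}")
print(f"Halving time implied by the two endpoints: {halving:.2f} years")
```

Under these assumptions the projected Moore’s Law cost is far above the quoted current cost, and the implied halving time is well under two years. Changing the assumed starting cost shifts the numbers but not the overall picture.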

Looking ahead more broadly, a blog post from Nava Whiteford from this year details a few interesting directions that the future of sequencing could take to improve the experience of everyday scientists.

  • The first direction was for sequencers to become boring like qPCR machines: reliable and easily available. Considering how important sequencing is to single-cell, spatial and liquid biopsy applications, there would be several quality-of-life improvements if having a brand-new sequencer wasn’t a big deal, namely access to numerous vendors for consumables and easy access to the sequencing instruments (perhaps even second-hand).
  • The second was for a development in automation beyond microfluidics and pipetting robots. Instead, sequencing could adopt automation workflows that operate sample-to-answer. Here perhaps, platforms could integrate the automated preparation, meaning samples could be introduced into the sequencer to produce a whole genome without any hands-on preparation.
  • Third, and following on from the second, is an increased use of sequencing for diagnostics. Illumina reports that clinical applications are ~50% of their market. Harnessing the human genome is arguably the future of healthcare, and in recognition of this, research published this year from the 100,000 Genomes cancer program has shown that NGS yielded more comprehensive information than cancer panels across 33 types of tumour1. Additionally, the UK Biobank recently released data for half a million genomes for worldwide use, in recognition of the clinical value of these data.

Multi-omics technology

If you want to understand a disease, or define a cell type, is sequencing DNA or RNA enough? The answer, quite often, is no2,3. This is why multi-omics sequencing is becoming an increasingly valued methodology for scientists4-6. Furthermore, these methods are being developed at high resolution, with exciting single-cell and spatial methods released in the last 12-18 months7-9.

Multi-omics involves collecting multiple ‘omics’ measurements from a single sample, or even a single cell. This multi-modal data is then integrated using sophisticated computational methods. Relationships between DNA, RNA, proteins and more (see Figure 2) can then be explored, and we can construct multimodal profiles of diseases. These are more robust than their mono-omic counterparts and can be used to help untangle the heterogeneity of disease progression and treatment response.

Figure 2. A selection of the multi-omics approaches that are currently available to researchers. Image credit: Roychowdhury, et al. 10

There are several exciting areas of development in multi-omics, including:

  • Integration of single-cell data: Single-cell omics technologies are rapidly advancing, and researchers are now able to generate multi-omics data from individual cells. This has the potential to provide unprecedented insights into cellular heterogeneity, cell-to-cell communication and disease mechanisms.
  • Multi-omics data visualization: As the amount of multi-omics data being generated continues to increase, there is a growing need for effective data visualization tools. New visualization methods, such as interactive network-based visualization platforms, are now being developed to help researchers gain insights from complex multi-omics data sets.
  • Multi-omics biomarker discovery: Integrating data from multiple omics technologies can help identify biomarkers that are more accurate and reliable than those identified using a single technology. These biomarkers can be used for disease diagnosis, prognosis and treatment.
  • Deep learning approaches: Deep learning approaches, such as deep neural networks and convolutional neural networks, are now being applied to multi-omics data sets to identify complex patterns and relationships between different omics data types. These methods have the potential to reveal new insights into disease mechanisms and identify novel therapeutic targets.

The principal challenge for multi-omics sequencing comes from the inherent difficulty in integrating omic data, which exist at different data scales, noise ratios and levels of ‘completeness’ (amounts of missing data). Computational methods present a solution to this problem and deploy either algorithm-based or machine learning models to effectively match the omics data within a sample/cell11. The latest of these methods can perform sophisticated mosaic integration, linking omics data from the same sample and from different samples alike12,13.
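To illustrate why differing scales and missing values matter, the Python sketch below shows a deliberately simple baseline, sometimes called early integration. Each feature is z-scored within its own modality so that RNA counts and protein abundances become comparable, missing values are set to the feature mean, and the modalities are concatenated per cell. This is not the mosaic integration approach of the methods cited above. The function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def zscore_feature(values):
    """Z-score one feature across cells; missing values (None) become 0.0,
    which is the feature mean after scaling."""
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    sigma = pstdev(observed) or 1.0  # constant features would otherwise divide by zero
    return [0.0 if v is None else (v - mu) / sigma for v in values]


def integrate(modalities):
    """Concatenate z-scored features from several modalities.

    `modalities` maps a modality name to a list of per-cell feature rows,
    with cells in the same order in every modality.
    Returns one combined feature row per cell.
    """
    n_cells = len(next(iter(modalities.values())))
    combined = [[] for _ in range(n_cells)]
    for rows in modalities.values():
        feature_columns = zip(*rows)  # transpose: one tuple per feature
        scaled_columns = [zscore_feature(column) for column in feature_columns]
        for cell_index, scaled_row in enumerate(zip(*scaled_columns)):
            combined[cell_index].extend(scaled_row)
    return combined


# Hypothetical toy data: three cells, two RNA features and one protein feature.
rna = [[10, 200], [12, None], [30, 180]]
protein = [[0.5], [0.7], [0.9]]

profiles = integrate({"rna": rna, "protein": protein})
for cell_index, profile in enumerate(profiles):
    print(cell_index, [round(value, 2) for value in profile])
```

Mean imputation and plain concatenation ignore the differing noise levels between modalities, which is part of what the more sophisticated integration methods are designed to handle.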

Please refer to the Front Line Genomics Multi-Omics Playbook for an in-depth overview of the latest multi-omics methods, an array of applications and the most exciting integration methods available.

Single-molecule proteomic sequencing

An up-and-coming topic within sequencing is the rapid development of proteomic sequencing capacity. Detecting and identifying proteins has been possible for decades using fluorescent antibodies (immunoassays) or mass spectrometry-based methods. It is only very recently that methods have emerged that allow for high-throughput proteomics.

2023 has seen some impressive examples of large-scale proteomics work. For example, one study in six human cell lines identified 1 million peptides across 17,717 human proteins (using the Thermo Fisher Scientific Orbitrap, see below) and built a catalogue of peptides for future work14. Other examples have revealed unique proteomic patterns from thousands of proteins in blood plasma for lung cancer using Seer’s proteomics platform15 and across cancer types using the Olink Proteomics Explore technology16.

We will now briefly overview some examples of commercial proteomic sequencing platforms and kits that are available to the reader.

Quantum-Si’s Platinum single-molecule protein sequencing platform works on digested individual peptides that are immobilized in wells. Fluorescently labelled N-terminal amino acid (NAA) recognizers then bind the individual peptides, and the resulting unique fluorescence signal is recorded onto a chip. By sequentially cleaving the NAA, the next amino acid is exposed, and the sequence is recorded. This allows unbiased single-molecule resolution and shows post-translational modifications, as well as detecting low abundance proteins in complex mixtures.

SomaLogic’s SomaScan kit uses Slow Off-rate Modified Aptamers (SOMAmers) instead of antibodies, with the aptamers providing greater specificity. From a 55 µL sample, their platform can measure 11,000 proteins using the extensive SOMAmer library, and the platform has a high throughput of 1,000 samples a day.

Olink Proteomics has an Explore kit based on the Proximity Extension Assay (PEA) technology that can sequence over 5,400 proteins from 2 µL of sample. Matched antibodies carrying DNA tags bind to proteins in the sample and form a dual bond. Only these bound proteins will then have their DNA tags hybridized. This DNA is amplified and can then be read by NGS.

Seer’s Proteograph technology harnesses engineered nanoparticles which consist of a magnetic core and a surface which binds to proteins within a biofluid. A single nanoparticle can bind to a broad range of proteins and hence a panel of diverse nanoparticles can profile a dynamic range of proteins via mass spectrometry. Output-wise, this results in quantifying thousands of proteins in hours rather than days or weeks.

Thermo Fisher Scientific’s Orbitrap Astral Mass Spectrometer is setting new standards for mass spectrometers, capable of analysing one sample in 8 minutes and identifying over 8,000 protein groups in one run. With more time, over 15,000 proteins can be detected, and this technology works for single cells too.

Bruker’s timsTOF Ultra uses trapped ion mobility spectrometry (TIMS) and quadrupole time-of-flight (TOF) technology to produce 4D-Proteomics™. This mass spectrometer can identify >5,000 protein groups and >55,000 peptides at single-cell sensitivity with very high confidence (<1% FDR).

Nanopore proteomic sequencing

Much interest has also been expressed in repurposing nanopore technology for proteomic sequencing17-19, among other purposes20. Development has been difficult for several reasons. Typically, nanopores are too big and lack the sensitivity required to discriminate between amino acids21. Theoretically a better nanopore could help, but it is challenging to build high-accuracy protein nanopore sequencers. Furthermore, peptides translocate too quickly through the nanopore for individual amino acids to be resolved17.

The ideal approach would be one in which proteins are unfolded, linearly translocated through the nanopore amino acid by amino acid, and the individual amino acids are recognized by the specific current signatures they produce.

In October 2023, a multi-pass, single-molecule nanopore sequencing paper was published on bioRxiv, in which long protein strands were sequenced with single-amino-acid sensitivity22. The approach uses a ClpX motor to ratchet proteins through the nanopore while reading individual amino acids, with each region being read multiple times by causing the ClpX to slip and eventually dissociate from the nanopore (see Figure 3). This approach seems highly promising and is closer to the ideal of drawing proteins through the nanopore, but it does have its drawbacks (see here).

Figure 3. Nanopore protein reading using a ClpX unfoldase. Schematic of cis-based unfoldase approach on the MinION platform. Image Credit: Motone, et al. 22

Solid-state nanopore sequencing

As hinted in the proteomics section above, nanopores should ideally have dimensions comparable to the analyte of interest, so that changes in ionic current amplitude can be measured above the noise level when the analyte passes through20. Biological nanopores (formed by protein subunits or DNA scaffolds) have precise dimensions (1-10 nm), enabling the recognition of certain biomolecules. However, being biological, they have a limited shelf life and limited reuse potential, and they are difficult to engineer23.

Solid-state nanopores address many of these concerns since they are crafted from inorganic/plastic membranes (e.g., Si3N4). Pores are artificially created in single-atom-thick sheets of material, and the pores can have diameters up to hundreds of nanometres, allowing large biomolecules and complexes to pass through. These nanopores can be constructed with several methods, such as electron milling24 and laser-based optical etching25.

Solid-state nanopores have other benefits. For example, they could dramatically increase signal-to-noise ratios, with one study finding an increase on the order of 160-fold26 (see Figure 4). With recent advances in solid-state nanopores and protein nanopore engineering, it is now possible to build artificial systems that recapitulate biological pores in vitro. The hope is for solid-state nanopores to become a powerful single-molecule detection platform that is agnostic to the nature of the sequenced molecule27.

Figure 4. Biological vs. Solid-state nanopores signal-noise ratio. Image Credit: Fragasso, et al. 26
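The idea of measuring a current change "above the noise level" can be made concrete with a small Python sketch. It assumes a common convention in which the signal-to-noise ratio is the depth of the current blockade divided by the root-mean-square noise of the open-pore baseline, and it flags samples that fall more than a chosen number of noise units below the baseline. The current values are invented toy numbers, not measurements from the cited study, and the threshold of five noise units is an arbitrary illustrative choice.

```python
import math


def mean_current(samples):
    """Arithmetic mean of a list of current samples."""
    return sum(samples) / len(samples)


def rms_noise(baseline):
    """Root-mean-square fluctuation of the open-pore current around its mean."""
    mu = mean_current(baseline)
    return math.sqrt(sum((x - mu) ** 2 for x in baseline) / len(baseline))


def signal_to_noise(baseline, blockade):
    """Blockade depth divided by baseline RMS noise (assumed SNR convention)."""
    depth = abs(mean_current(baseline) - mean_current(blockade))
    return depth / rms_noise(baseline)


def detect_blockade(trace, open_pore, noise, k=5.0):
    """Indices of samples more than k noise units below the open-pore current."""
    threshold = open_pore - k * noise
    return [i for i, current in enumerate(trace) if current < threshold]


# Hypothetical toy currents in nA: open pore, analyte in the pore, open pore again.
baseline = [1.00, 1.02, 0.98, 1.01, 0.99]
blockade = [0.60, 0.62, 0.58]
trace = baseline + blockade + baseline

snr = signal_to_noise(baseline, blockade)
events = detect_blockade(trace, mean_current(baseline), rms_noise(baseline))
print(f"SNR: {snr:.1f}")
print(f"Samples flagged as blockade: {events}")
```

A pore that is much larger than the analyte produces a shallower blockade for the same noise, so fewer samples clear the threshold. This is the reasoning behind matching pore dimensions to the molecule of interest.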

There are currently no viable commercial solid-state nanopore sensors, mostly due to the challenges in making the manufacturing process cost-effective28. But the promise of these nanopores is obvious and Oxford Nanopore Technologies recently acquired Northern Nanopore Instruments, who specialise in innovative solid-state nanopore fabrication technology. This reflects their understanding that solid-state nanopores are perhaps the future of nanopore-based sequencing. Furthermore, solid-state nanopores are an important part of the toolkit to achieve Oxford Nanopore Technologies’ mission – to enable the analysis of anything, by anyone, anywhere.

Thank you for reading this feature. If you would like more like this, please download the full Sequencing Buyer’s Guide.

References

1.         Sosinsky, A. et al. Insights for precision oncology from the integration of genomic and clinical data of 13,880 tumors from the 100,000 Genomes Cancer Programme. Nature Medicine (2024).

2.         Babu, M. & Snyder, M. Multi-Omics Profiling for Health. Molecular & Cellular Proteomics 22, 100561 (2023).

3.         Battle, A. et al. Impact of regulatory variation from RNA to protein. Science 347, 664-667 (2015).

4.         Vandereyken, K., Sifrim, A., Thienpont, B. & Voet, T. Methods and applications for single-cell and spatial multi-omics. Nature Reviews Genetics, 1-22 (2023).

5.         Baysoy, A., Bai, Z., Satija, R. & Fan, R. The technological landscape and applications of single-cell multi-omics. Nature Reviews Molecular Cell Biology, 1-19 (2023).

6.         Li, X. Harnessing the potential of spatial multiomics: a timely opportunity. Signal Transduction and Targeted Therapy 8, 234 (2023).

7.         Zhang, D. et al. Spatial epigenome–transcriptome co-profiling of mammalian tissues. Nature 616, 113-122 (2023).

8.         Liu, Y. et al. High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue. Cell 183, 1665-1681.e18 (2020).

9.         Liu, Y. et al. High-plex protein and whole transcriptome co-mapping at cellular resolution with spatial CITE-seq. Nature Biotechnology (2023).

10.       Roychowdhury, R. et al. Multi-Omics Pipeline and Omics-Integration Approach to Decipher Plant’s Abiotic Stress Tolerance Responses. Genes 14, 1281 (2023).

11.       Argelaguet, R., Cuomo, A.S.E., Stegle, O. & Marioni, J.C. Computational principles and challenges in single-cell data integration. Nature Biotechnology 39, 1202-1215 (2021).

12.       Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nature Biotechnology (2023).

13.       Ghazanfar, S., Guibentif, C. & Marioni, J.C. Stabilized mosaic single-cell data integration using unshared features. Nature Biotechnology (2023).

14.       Sinitcyn, P. et al. Global detection of human variants and isoforms by deep proteome sequencing. Nature Biotechnology 41, 1776-1786 (2023).

15.       Donovan, M.K.R. et al. Functionally distinct BMP1 isoforms show an opposite pattern of abundance in plasma from non-small cell lung cancer subjects and controls. PLOS ONE 18, e0282821 (2023).

16.       Álvez, M.B. et al. Next generation pan-cancer blood proteome profiling using proximity extension assay. Nature Communications 14, 4308 (2023).

17.       Brinkerhoff, H., Kang, A.S.W., Liu, J., Aksimentiev, A. & Dekker, C. Multiple rereads of single proteins at single–amino acid resolution using nanopores. Science 374, 1509-1513 (2021).

18.       Yan, S. et al. Single molecule ratcheting motion of peptides in a Mycobacterium smegmatis Porin A (MspA) nanopore. Nano Letters 21, 6703-6710 (2021).

19.       Chen, Z. et al. Controlled movement of ssDNA conjugated peptide through Mycobacterium smegmatis porin A (MspA) nanopore by a helicase motor for peptide sequencing application. Chemical Science 12, 15750-15756 (2021).

20.       Ying, Y.-L. et al. Nanopore-based technologies beyond DNA sequencing. Nature Nanotechnology 17, 1136-1146 (2022).

21.       Timp, W. & Timp, G. Beyond mass spectrometry, the next step in proteomics. Science Advances 6, eaax8978 (2020).

22.       Motone, K. et al. Multi-pass, single-molecule nanopore reading of long protein strands with single-amino acid sensitivity. bioRxiv, 2023.10.19.563182 (2023).

23.       Xue, L. et al. Solid-state nanopore sensors. Nature Reviews Materials 5, 931-951 (2020).

24.       Storm, A.J., Chen, J.H., Ling, X.S., Zandbergen, H.W. & Dekker, C. Fabrication of solid-state nanopores with single-nanometre precision. Nature Materials 2, 537-540 (2003).

25.       Gilboa, T., Zrehen, A., Girsault, A. & Meller, A. Optically-Monitored Nanopore Fabrication Using a Focused Laser Beam. Scientific Reports 8, 9765 (2018).

26.       Fragasso, A., Schmid, S. & Dekker, C. Comparing Current Noise in Biological and Solid-State Nanopores. ACS Nano 14, 1338-1349 (2020).

27.       Lindsay, S. The promises and challenges of solid-state sequencing. Nature Nanotechnology 11, 109-111 (2016).

28.       Liu, H., Zhou, Q., Wang, W., Fang, F. & Zhang, J. Solid-State Nanopore Array: Manufacturing and Applications. Small 19, 2205680 (2023).