It has been over 30 years since the first generation of DNA sequencing technology was developed in 1977. Since then, sequencing platforms have made considerable progress and every transformation has led to a huge shift towards furthering genome research, clinical disease research and drug development.
NGS became available at the beginning of the 21st century. Perhaps the biggest advance that NGS offered was the ability to produce a huge amount of data, alongside its ability to provide a highly efficient, rapid, low-cost and accurate approach to DNA sequencing, beyond the reach of traditional Sanger methods.
Today, there are several DNA sequencing companies that market either second- or third-generation NGS technologies. The sequencing industry has grown tremendously since the Human Genome Project’s rough draft was completed and the challenge of completing the $1,000 whole genome was proposed. DNA sequencing is a booming business because it has become an integral process in many areas of clinical diagnostics. It also underpins the emerging field of liquid biopsy.
*This page was updated on 03/11/2022 to include some of the recent big announcements made by Ultima Genomics, PacBio and Illumina.
What is Massively Parallel Sequencing?
The terminology used to describe different sequencing platforms can sometimes be confusing, probably due to the fact that NGS has evolved so rapidly over the last few decades. First-generation NGS platforms are easy to define because they are based on Sanger sequencing. The uncertainty arises when discussing second-generation NGS, which is based on pyrosequencing or pH change detection, as it can also be described as first-generation massively parallel sequencing (MPS). Furthermore, third-generation NGS is sometimes called second-generation MPS.
Some believe that using MPS to categorize sequencing technologies is more appropriate than NGS because the generations differentiate between short-read and long-read sequencing. Therefore, MPS is clearly separate from the classical Sanger sequencing. Moreover, as sequencing technologies evolve, it could be argued that they are no longer ‘a next generation of sequencing’, but instead, that they are literally sequencing in parallel – hence MPS.
Nevertheless, NGS is still the most common terminology that is used to refer to sequencing platforms. And so that is what will be used throughout this guide.
The DNA sequencing landscape
The global market for DNA sequencing is predicted to grow from $15.7 billion in 2021 to $37.7 billion by 2026. The rising prevalence of viral diseases, such as COVID-19, and the increasing cases of cancer globally are likely to drive genomic research and propel the industry at an even more rapid rate. Constant collaborations and partnerships between major market players have enabled the continuous launch of novel innovative technologies and have increased the demand for NGS platforms.
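The growth implied by these two market figures can be expressed as a compound annual growth rate. The snippet below is a minimal sketch of that arithmetic, assuming steady year-on-year growth between 2021 and 2026; the function name is illustrative and does not come from the market report.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that links two values."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market figures quoted above, in billions of US dollars.
cagr = compound_annual_growth_rate(15.7, 37.7, years=2026 - 2021)
print(f"Implied growth rate: {cagr:.1%} per year")
```

Under that assumption, the forecast corresponds to growth of roughly 19% per year.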
NGS accounted for 58.6% of the DNA sequencing market in 2019. Advances in these technologies, along with the declining cost of sequencing, have made NGS faster, more affordable and accurate. Also, NGS is gaining popularity as a routine clinical diagnostic test for the COVID-19 infection. In 2020, Illumina received authorization for its COVIDSeq test for the detection of SARS-CoV-2 infection, and Helix OpCo LLC gained approval for its COVID-19 NGS test.
In 2019, oncology dominated the market for DNA sequencing and accounted for a revenue share of 24.4%. This is because NGS technologies are showing huge potential in clinical research and the development of cancer diagnostics. Also, they are becoming invaluable to precision medicine, with the first liquid biopsy companion diagnostic being approved in 2020 to guide cancer treatment.
Pie chart to show the proportions of global DNA sequencing market share, by end use, in 2019. Image credit: Grand View Research, 2020
North America accounted for the largest share in the DNA sequencing market, with 44.3% in 2019. This is likely due to the availability of technologically advanced healthcare infrastructure and the presence of several US government initiatives that support research in drug development and cancer treatments. The Asia Pacific market for DNA sequencing is projected to grow rapidly in the coming years, and strategic initiatives are being undertaken by international sequencing companies to expand their global presence. In 2020, GenapSys raised $75 million to finance the worldwide expansion of their products to help the control of COVID-19.
The major companies that operate in the worldwide DNA sequencing market are Agilent Technologies, Illumina, QIAGEN, Perkin Elmer, Thermo Fisher Scientific, Roche, Macrogen, Bio-Rad Laboratories, Oxford Nanopore Technologies and Myriad Genetics. These organisations are making efforts to address the rising consumer demand whilst making significant investments in expanding their portfolios. In 2020, giant strides were taken – MGI announced the commercial availability of its sequencing platforms in the US, CARTANA launched an expanded range of in situ sequencing kits for single-cell gene expression mapping and Bio-Rad Laboratories announced the launch of its novel approach to RNA sequencing library preparation (SEQuoia Complete Strand Library Prep Kit). In 2022 alone, we’ve seen some huge announcements. Arguably, the biggest of these was Ultima Genomics revealing the $100 genome just 8 years after Illumina’s HiSeq X Ten Sequencer breakthrough in 2014. Illumina has now countered with the launch of NovaSeq X Series, which promises to generate more than 20,000 whole genomes per year.
Despite increasing competition, the marketplace for DNA sequencing machines is still dominated by Illumina. The organisation has driven the cost of sequencing a human genome down from several billion dollars two decades ago to less than $1,000 now. Thermo Fisher Scientific is believed to own most of the remaining market not captured by Illumina. The company has a more diverse portfolio and launched its latest NGS system, the Ion GeneStudio S5 Series, in 2018.
Considerations for DNA sequencing
This guide focusses on the various DNA sequencing companies that exist today and discusses which NGS machines they market. Currently, second-generation NGS technologies are the most commonly used approach because they remain the fastest and the cheapest form of gene sequencing. However, advances in third-generation NGS machines have meant that they continue to grow in popularity, particularly as their prices drop.
There are various practical uses for DNA sequencing, including research, drug development, biomarker analysis and therapeutic decision making. Each application is better suited for different types of sequencing. For example, when studying complex organisms with little reference data, long-read sequencing on a third-generation NGS platform is probably the most appropriate option.
Furthermore, before samples can be analysed, they must be treated and prepared. Sample preparation processes differ depending on the type of sequencing being performed because each technology has unique considerations.
Second-generation NGS technologies, of the kind developed by Illumina and others, can be grouped into two major categories – sequencing by hybridization or sequencing by synthesis. Sequencing by hybridization is an approach whereby a collection of overlapping oligonucleotide sequences is assembled together to determine the DNA sequence. Sequencing by synthesis technology uses a polymerase or ligase enzyme to incorporate nucleotides with a fluorescent tag, which are then identified to determine the DNA sequence.
Diagram to show the principle of sequencing by synthesis: a) flow cell; b) incorporation of nucleotides results in fluorescence emission; c) zoomed in flow cell showing different nucleotides associated with their specific colour. Image credit: G. Untergasser, 2019
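The sequencing-by-hybridization idea described above, assembling overlapping oligonucleotides into one sequence, can be illustrated with a toy reconstruction. This is only a sketch under strong assumptions that are not from this guide: every probe is error-free, all probes have the same length, and no overlap is repeated, so the path through the probes is unambiguous. The function name is hypothetical.

```python
def reconstruct_from_probes(probes):
    """Rebuild a sequence from overlapping k-mer probes (toy model).

    Assumes error-free probes of one shared length k, and that each
    (k-1)-base prefix occurs only once, so there is a single path.
    """
    probes = list(probes)
    if not probes:
        raise ValueError("no probes given")
    k = len(probes[0])
    if k < 2 or any(len(p) != k for p in probes):
        raise ValueError("probes must share one length of at least 2")

    # Index each probe by its first k-1 bases.
    by_prefix = {}
    for probe in probes:
        if probe[:-1] in by_prefix:
            raise ValueError("ambiguous overlap: repeated prefix")
        by_prefix[probe[:-1]] = probe

    # The starting probe is the one whose prefix is no other probe's suffix.
    suffixes = {probe[1:] for probe in probes}
    starts = [p for p in probes if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("could not find a unique starting probe")

    current = starts[0]
    sequence = current
    # Extend by one base per overlapping probe.
    for _ in range(len(probes) - 1):
        nxt = by_prefix.get(current[1:])
        if nxt is None:
            break
        sequence += nxt[-1]
        current = nxt
    return sequence


probes = ["GCGT", "ATGG", "TGCA", "GGCG", "CGTG", "TGGC", "GTGC"]
print(reconstruct_from_probes(probes))
```

Real genomes contain repeats that break the uniqueness assumption, which is one reason practical assemblers use far more elaborate graph methods.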
All second-generation NGS technologies are dependent on amplification before sequence analysis. This amplification step is needed to generate a large enough number of copies of each DNA template so that there is sufficient signal strength for each base addition.
Three different techniques are commonly used to amplify DNA fragments before sequencing: emulsion PCR, bridge amplification and DNA nanoball generation. Second-generation NGS technologies can be categorized based on what DNA amplification techniques they utilize.
|DNA Amplification||NGS Technology|
|Emulsion PCR||Ion Torrent, GenapSys|
|Bridge amplification||Illumina|
|DNA nanoball generation||BGI Group (Beijing Genomics Institute)|
Advantages of second-generation NGS:
- High sequence accuracy
- Relatively cheap
- Able to sequence fragmented DNA
Disadvantages of second-generation NGS:
- Only capable of producing short sequencing reads (reads are between 200-300 bases long)
- Not able to resolve structural variants or distinguish highly homologous genomic regions
- Not suitable for analysis of sequences that contain large numbers of repetitive sequence elements, transcript isoforms or methylation signatures
The Ion Torrent technologies are sold by Thermo Fisher Scientific and several versions of the platform are now available. The platforms, which utilize emulsion PCR for DNA amplification, were first released in 2010. Currently, there are two Ion Torrent machines on the market – the Ion GeneStudio S5 System and the Ion Torrent Genexus System.
Ion GeneStudio S5 System
This machine is designed for scalable and targeted NGS to support small and large projects. It offers a wide range of throughput options and supports read lengths of up to 600 base pairs. The system supports cost-effective experiments across a variety of applications, including cancer research, inherited disease and infectious diseases. It has a flexible chip format, allowing up to 260 million reads and a total output of up to 50 gigabases.
Ion Torrent Genexus System
This machine provides an automated specimen-to-report workflow. With only five minutes of hands-on time, the system is the first fully integrated NGS platform that delivers results in a single day.
Ion Torrent sequencing chemistry
The Ion Torrent platform is essentially a chip that consists of a multi-well plate. It uses 5-micron beads, each containing around 50,000 amplified DNA fragments. When a nucleotide is added to one of the fragments, a flow of hydrogen ions is released – hence an ion torrent. The flow of hydrogen ions lowers the pH in the well, allowing the detection of nucleotide incorporation.
Diagram showing the release of hydrogen ions by the addition of deoxyribonucleotides, which are detected by an ion sensor. Each incorporation leads to a corresponding number of released hydrogens and intensity of signal. Image credit: D. Tack, 2011
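Because the signal scales with the number of bases incorporated in each nucleotide flow, base calling can be pictured as rounding each flow's signal to a whole number of bases. The sketch below illustrates that idea only. The cyclic flow order `TACG`, the idealised signal values and both function names are assumptions made for illustration, not Thermo Fisher's actual flow order or base-calling algorithm.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order, for illustration only


def simulate_flowgram(read, flow_order=FLOW_ORDER):
    """Return idealised per-flow signals for the strand being synthesised."""
    if any(base not in flow_order for base in read):
        raise ValueError("read contains a base that is never flowed")
    signals = []
    pos = 0
    flow = 0
    while pos < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer is incorporated in a single flow, giving a larger signal.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append(float(run))
        flow += 1
    return signals


def call_bases(signals, flow_order=FLOW_ORDER):
    """Round each flow signal to a base count and emit that many bases."""
    called = []
    for index, signal in enumerate(signals):
        count = max(0, int(round(signal)))
        called.append(flow_order[index % len(flow_order)] * count)
    return "".join(called)


noisy = [2.1, 0.9, 0.1, 2.8, 0.0, -0.1, 1.2]
print(call_bases(noisy))
```

The rounding step also hints at why long homopolymers are difficult: telling a signal of 10 from a signal of 11 needs far more precision than telling 1 from 2.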
A limitation of Ion Torrent is its handling of homopolymers (repeats of the same nucleotide). If there are over ten of the same bases in a row on the template DNA strand, the platform can no longer make accurate calls. Also, the approach is dependent on emulsion PCR to amplify the DNA templates, which can be lengthy and may introduce biases. However, the Ion Chef System can be used to automate the emulsion PCR process, removing the need for lengthy and multi-step preparation procedures with water and oil emulsions.
The highest throughput chip for the machine can generate 50 gigabases of sequence per run, meaning it is not capable of sequencing large regions of DNA or whole genomes. However, there is the potential to increase the sequence output over time by simply creating chips containing more wells. Also, a major advantage of the Ion Torrent system is that no camera or light scanner is needed because nucleotide incorporation is directly converted into voltage, which is recorded directly and greatly speeds up the process.
The GenapSys Sequencing Platform is a small desktop-based machine that is designed to be fast and cost-effective. It is marketed as an alternative NGS system for personal use, due to its affordability and control over the entire sequencing process, allowing the user to perform runs on their own schedule.
The portable sequencer has been used to help track many of the coronavirus strains and mutations by the Chinese government’s Center for Disease Control and Prevention. The sequencer weighs less than 4.5 kilograms and runs on electrical sequencing chips, which can be easily changed according to the scale of analysis required. The standard G3 chip can perform 13 million reads in 24 hours, with an accuracy of over 80%.
Recently, the company received a $70 million funding round. Hesaam Esfandyarpour, the company’s founder, commented: “This latest round of funding is a testament to the strength and momentum of GenapSys’ technology. The world is just beginning to unlock the immense potential of genomic sequencing, and this capital will help fuel GenapSys’ next stage of growth.”
Furthermore, GenapSys has now partnered with Twist Bioscience to combine target enrichment and library preparation tools with the sequencing platform. It is thought that these linked technologies will further support and drive innovation in COVID tracking, drug discovery and cancer diagnostics.
GenapSys sequencing chemistry
The GenapSys system uses complementary metal oxide semiconductor (CMOS) sequencing chips, which have millions of individual sensors on their surface. Each sensor consists of electrodes in close proximity to each other, which detect minute electrical changes caused by nucleotide incorporation. The magnitude of the differential electrical changes correlates with the number of incorporated nucleotides, which is then plotted onto a scatter graph.
Although the platform is only useful for sequencing small gene panels, the 16M chip supports many applications, including targeted sequencing, small genome sequencing, gene editing validation, various types of RNA sequencing and targeted single cell assay sequencing.
Illumina’s first platform was purchased from Solexa and was called the Genome Analyzer. It was made commercially available in 2007. The machine could sequence 6 million amplified fragments per lane, but was only able to generate around 30 base reads per amplified fragment. Illumina rapidly increased this to over 100 base pairs. The number of amplified fragments on the flow cell was also enhanced so that, in the end, the Genome Analyzer was capable of an 80-gigabase output.
HiSeq was Illumina’s second available NGS machine and was released in 2010. It ran two flow cells – while one carried out the chemistry of base addition, the other was scanned to determine which base had been incorporated at each amplified cluster. This was followed by the HiSeq X Ten, which further increased the number of fragments that could be analysed by using patterned flow cells with dimples, as opposed to randomly amplified clusters. Today, the NextSeq and NovaSeq machines are sold by Illumina, along with a variety of benchtop sequencers, such as the iSeq 100 and the MiniSeq.
The NextSeq 500 was launched in 2014 and uses a two-dye sequencing technology, as opposed to its predecessors that used four-dye techniques. Only red and green images are taken, enabling a significant reduction in cycle and data processing times. The instrument is capable of generating up to 400 million reads in a run of around 30 hours. The NextSeq 1000 and 2000 machines were released in 2020 and are designed to simplify workflows by offering onboard informatics and cloud-based technology. The P3 flow cell has extended the reach of the NextSeq 2000 instrument, by offering 1.1 billion reads in a single sequencing run.
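With only red and green images, four bases have to be encoded by the presence or absence of signal in two channels. The sketch below shows one way to decode such data. The colour-to-base mapping (both channels for A, red only for C, green only for T, no signal for G) is the convention commonly described for two-channel chemistry and is an assumption here, since this guide does not specify it. The threshold value and function name are likewise illustrative.

```python
def call_two_channel(red, green, threshold=0.5):
    """Call one base per cycle from red and green intensities.

    Assumed mapping: red+green -> A, red only -> C,
    green only -> T, neither -> G.
    """
    if len(red) != len(green):
        raise ValueError("red and green must cover the same cycles")
    table = {
        (True, True): "A",
        (True, False): "C",
        (False, True): "T",
        (False, False): "G",
    }
    return "".join(
        table[(r > threshold, g > threshold)] for r, g in zip(red, green)
    )


# Intensities for four cycles at a single cluster.
print(call_two_channel([0.9, 0.8, 0.1, 0.0], [0.9, 0.1, 0.7, 0.1]))
```

Taking two images per cycle instead of four is what shortens cycle and processing times, at the cost of calling one base from the absence of signal.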
The NovaSeq 6000 was released in 2017. It is capable of running three different chips and can generate 100 gigabases of sequence output for just $375 – this price is for the sequencing only and doesn’t include DNA isolation, library preparation, sequencing analysis or data storage. Essentially, the machine has the capacity to sequence up to 48 whole human genomes per run, which could take up to 44 hours. Other key applications include single cell profiling, transcriptome sequencing and metagenomic profiling.
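To relate an output figure in gigabases to whole genomes, it helps to convert between sequence output, coverage depth and cost. The following is a rough sketch, assuming a human genome of about 3.1 gigabases and using the sequencing-only price quoted above; the constant and function names are illustrative.

```python
HUMAN_GENOME_BASES = 3.1e9  # approximate human genome size (assumption)


def bases_required(coverage, genome_bases=HUMAN_GENOME_BASES):
    """Total bases needed to reach a given mean coverage depth."""
    return coverage * genome_bases


def mean_coverage(total_bases, genome_bases=HUMAN_GENOME_BASES):
    """Mean coverage depth obtained from a given sequence output."""
    return total_bases / genome_bases


def sequencing_cost(total_bases, usd_per_gigabase):
    """Sequencing-only cost, excluding library prep, analysis and storage."""
    return total_bases / 1e9 * usd_per_gigabase


usd_per_gb = 375 / 100  # $375 per 100 gigabases, as quoted above
needed = bases_required(30)
cost = sequencing_cost(needed, usd_per_gb)
print(f"30x human genome: {needed / 1e9:.0f} Gb, about ${cost:.0f} for sequencing only")
```

On these assumptions, 100 gigabases is enough for roughly one human genome at 30x coverage.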
HiSeq X Series
The HiSeq X Ten Sequencer was released in 2014 and is capable of generating up to 16 Tb of sequence output in a single run. The system can sequence a human genome at 30x coverage or greater for less than $1000 and can deliver over 18,000 human genomes per year. It can produce up to 52 billion reads per flow cell and the maximum run duration is 48 hours. The system promises to allow WGS beyond the human species and can also be used in whole exome sequencing, transcriptome sequencing, single-cell analysis and multi-omics.
Illumina sequencing chemistry
Illumina sequencing is based on a technique called bridge amplification, whereby DNA molecules with attached adapters are amplified on a glass flow cell, which contains nanowells to space out the fragments and avoid overcrowding. Each nanowell contains oligonucleotides that anchor the adapters for a phase called cluster generation. Fluorescent blockers are used so that the DNA polymerase can only add one nucleotide at a time to the fragment. After each round of synthesis, the wavelength of the fluorescent tag is recorded at each nanowell, before washing the non-incorporated molecules away. This process is repeated until the full DNA molecule is sequenced.
A diagram explaining the steps of Illumina sequencing technology. Image credit: BiteSizeBio, 2012
Illumina technologies are dependent on DNA amplification and errors during these procedures can cause problems, such as barcode skipping, although this can usually be avoided by using double barcode strategies. The main limitation of second-generation NGS technologies in general is that they produce only short reads, which reduces their ability to examine complex genomes that contain a significant proportion of repetitive sequences and to characterize transcripts. Illumina’s NextSeq machines can only produce reads of up to 150 base pairs, and even if all of the add-on extras are applied, the NovaSeq 6000 is limited to 250 base pairs.
Nevertheless, mate-pair sequencing can be used to increase the power of short-read sequencing. Illumina’s Nextera Mate Pair Library Prep kit biotinylates the ends of large DNA fragments, which are then circularized to bring the two end pieces together. The circular DNA is then fragmented and sequenced. Combining data from mate-pair sequencing with short-read sequencing provides more information, maximises coverage across the genome and allows larger inserts to be read across greater distances.
BGI Group, formerly known as the Beijing Genomics Institute, is a Chinese company that was formed in 1999 to participate in the Human Genome Project. Now, the organisation has evolved to work on animal cloning, health testing and contract research. BGI purchased Complete Genomics in 2012 and its products are sold by a subsidiary (MGI).
The DNBSEQ-T7 was launched in 2019 and was designed to support an array of large-scale sequencing applications for health projects and clinical research. Along with the One Million Genomes Total Solution software and hardware, it has been reported that the DNBSEQ-T7 can sequence up to 800,000 samples per year. The hardware solution includes an automated library preparation system, meaning the sequencing machine can operate for 24 hours with no need for manual intervention, allowing the completion of 60 whole human genomes per day. Its commercialisation is expected to reduce the cost of personal whole-genome sequencing to under $500, in turn changing the sequencing landscape.
BGI sequencing chemistry
The sequencing protocol utilized by BGI is called combinatorial probe-anchor synthesis (cPAS). This consists of rolling circle replication with the Phi 29 DNA polymerase, which synthesizes a long, single-stranded DNA that self assembles into a nanoball (around 300 nanometres across). Fluorescent probes are incorporated, and the nanoballs are attached to a silicon wafer flow cell where they selectively bind to the positively charged material in a highly ordered pattern. The emission of fluorescence is then imaged and measured to record the base position.
Again, common to all short-read sequencing, the main disadvantage of the BGI platforms is that no long lengths of DNA sequences can be obtained using these methods. Nevertheless, an important advantage of cPAS-based sequencing is that the high accuracy of Phi 29 DNA polymerase ensures accurate amplification of the circular template. Also, because the DNA nanoballs remain in place on the flow cell, they do not produce optical duplicates and do not interfere with neighbouring DNA.
Cool-MPS is a sequencing set sold by MGI, which is a novel antibody-based product that enables base calling to be achieved by the specific binding of fluorescently labelled antibodies, instead of deoxynucleosides being incorporated with fluorescent labels. This allows BGI systems to be optimized with fewer amplified molecules, because each antibody has multiple fluorescent groups attached. Also, more nanoballs can be tightly packed into patterned flow cells for even greater sequence output.
Third-generation NGS is a class of DNA sequencing methods that were first described around 2009 and are still under active development. These technologies are capable of producing substantially longer reads than second-generation sequencing, with wide implications for genome research. Particularly useful applications of third-generation NGS include the study of epigenetic markers, transcriptomics and metagenomics.
These machines sequence single DNA molecules and do not amplify templates before sequencing. Instead, methodologies have been developed to detect individual DNA molecules directly, with enough signal strength to read each base without amplification.
Advantages of third-generation NGS:
- Possible to start with considerably longer DNA fragments
- Lack of amplification leads to easier library preparation and portable technologies
- Epigenetic markers are stable and so methylation signatures and histone modifications are preserved
- Generates very long sequence reads
Disadvantages of third-generation NGS:
- Signals obtained from individual fragments can be weak
- Overall lower accuracy
There are two principal companies that develop third-generation NGS technologies – Pacific Biosciences and Oxford Nanopore Technologies. Each takes a fundamentally different approach to sequencing.
Pacific Biosciences’ first instrument was called the PacBio RS and was released in 2010. Eight years later, Illumina agreed to purchase the company and the deal was expected to close the following year. However, the deal was abandoned at the beginning of 2020 after the Federal Trade Commission objected to it on anti-competitiveness grounds. Since then, Pacific Biosciences has announced a complete reshuffle of its leadership team and has acquired Circulomics, a firm that makes kits for extracting high molecular weight DNA. Christian Henry, the CEO of Pacific Biosciences, commented on the recent acquisition: “One of our core strategies is to improve the front end of our sequencing workflows. The Nanobind technology that Circulomics has created is already proven in the market and will accelerate our efforts to make sample extraction and library preparation easier for our customers. By adding the team to PacBio we will be able to deeply integrate their technology into our workflows which will improve our entire long-read sequencing workflow.”
As of October 2022, PacBio has also entered the short-read sequencing market with the release of its Onso platform. This system uses sequencing by binding (SBB) technology and claims to enable up to 500 million reads, 150 Gb output per run, and all in under 48 hours. However, our main focus here will be on the long-read offerings from PacBio.
The PacBio Sequel System was originally released in 2015. It was able to generate very long sequence reads, thousands of base pairs long. It was also capable of distinguishing cytosine from 5-methyl cytosine and could be used to directly sequence RNA without the need for converting it into complementary DNA. The latest evolution of the Sequel System is called the Sequel IIe, which was launched in 2020. This instrument features advanced in-house data processing and cloud enablement to deliver the results rapidly, with reduced compute and storage costs. The recent release of HiFi Sequencing and Software v10.1 enables whole genome sequencing-based applications using the Sequel II with above 99.9% accuracy. If the current highest output chips, which contain 8 million zero-mode waveguides, are combined with single molecule, real-time (SMRT) sequencing, around 120 gigabases of sequence can be generated.
Newly launched in 2022, the PacBio Revio system promises long-read sequencing at scale. The platform can produce up to 90 Gb of output in 24 hours. It is based around PacBio’s “HiFi” reads, which are tens of kilobases long and allow researchers to resolve large variants and map difficult regions of the genome. The Revio system offers 90% of bases ≥Q30 and a median read accuracy ≥Q30. It can also cover all types of variant calling, including SNVs, indels, and SVs.
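The Q30 figures above use the Phred quality scale, in which a score Q corresponds to an error probability of 10^(-Q/10). As a small sketch of that conversion (the function names are illustrative):

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score into a per-base error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert a per-base error probability into a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    accuracy = 1 - phred_to_error_probability(q)
    print(f"Q{q}: accuracy {accuracy:.4%}")
```

On this scale, Q30 means one expected error in 1,000 bases, or 99.9% accuracy.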
Pacific Biosciences sequencing chemistry
SMRT sequencing is the core technology that powers Pacific Biosciences platforms. The SMRT Cell contains millions of tiny wells called zero-mode waveguides. Single DNA molecules are immobilised at the bottom of these wells whilst DNA polymerase incorporates fluorescently labelled nucleotides. To detect the addition of each base, the light emitted at the top of the zero-mode waveguide is recorded and analysed. This methodology allows DNA fragments to be read multiple times, because synthesized oligonucleotide adapters are attached to the ends of DNA fragments, shaping them into circular ‘SMRTbells’. These individual circular molecules enable the polymerase to go around the DNA multiple times, resulting in much higher sequencing accuracy. This technology can generate very long sequence reads and much longer DNA fragments can be used.
A diagram to show the process of SMRT sequencing. Image credit: PacBio
A problem that Pacific Biosciences faces is that the platform only sequences individual molecules, and the signal strength of base incorporations is quite low. Although the zero-mode waveguides measure some signal, the DNA polymerase adds bases so quickly that the ability to detect growing sequences is challenging and the overall sequencing accuracy is only around 85%. However, these sequencing errors are random, meaning that if the same fragment is sequenced multiple times, a higher overall accuracy can be achieved. SMRT sequencing technology enables an overall sequencing accuracy of 99%, although this results in a slightly lower output.
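The benefit of reading the same fragment several times can be approximated with a simple majority-vote model. This is a deliberately simplified sketch, not PacBio's consensus algorithm. It assumes each pass calls a base correctly with the same probability, that passes are independent, and that the consensus is right only when a strict majority of passes are right. That last assumption is conservative, because wrong calls would rarely all agree on the same wrong base.

```python
from math import comb


def majority_vote_accuracy(per_pass_accuracy, passes):
    """Probability that a strict majority of independent passes is correct."""
    if not 0 <= per_pass_accuracy <= 1:
        raise ValueError("accuracy must be between 0 and 1")
    if passes < 1:
        raise ValueError("at least one pass is required")
    p = per_pass_accuracy
    needed = passes // 2 + 1  # smallest strict majority
    return sum(
        comb(passes, k) * p ** k * (1 - p) ** (passes - k)
        for k in range(needed, passes + 1)
    )


# Start from the single-pass accuracy of around 85% mentioned above.
for passes in (1, 3, 5, 9, 15):
    print(passes, f"{majority_vote_accuracy(0.85, passes):.4f}")
```

Odd numbers of passes are used so that ties cannot occur. The trade-off described above shows up directly: every extra pass over a fragment spends polymerase read length that could otherwise have gone towards new sequence.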
Oxford Nanopore Technologies
Oxford Nanopore Technologies is a UK-based company that was founded in 2005. Their first product, called MinION, was introduced in 2014. It was a small, highly portable handheld device that offered easy sample preparation and the capability of directly sequencing both DNA and RNA.
The MinION is a portable protein nanopore sequencing USB device that has been commercially available since 2015, when its error rate was reported to be around 30%. The company has driven continuous updates and in 2020, the R10.3 nanopore was released. This new design has a longer barrel and dual reader head, leading to an accuracy of up to 99.9% with a unique molecular identifier method. This shows that dramatic increases in sequence accuracy have been made since the first MinION chips. Now, the platform is capable of the rapid identification of viral pathogens, monitoring of antibiotic resistance and the analysis of structural variants in cancer, among other applications.
Oxford Nanopore sequencing chemistry
Oxford Nanopore Technologies developed a sequencing technology that determines the sequence of DNA molecules as they are threaded through a small nanopore. The platforms work by passing an ionic current through nanopores and measuring the changes in that current as nucleotides pass through the small pore. The nanopores can be formed by proteins that puncture membranes or fabricated in solid material. An adapted phi29 motor protein is used to thread the DNA into the nanopore. As the electrical current changes across the nanopore, it is possible to determine the sequence of nucleotides being passed through it.
Currently, nanopore sequencing cannot resolve single bases in a DNA strand because six nucleotides are threaded through a nanopore at once. This makes it difficult to determine the sequence because each base is part of six changes in electrical current. Strategies to address this include modifying the pores to slow and restrict strand translocation, or sequencing the same fragment multiple times. One solution may be to use a small solid graphene nanopore that only allows a single nucleotide through at any one time. This would enable nanopore sequencing to achieve an even greater accuracy and higher throughput.
Major advantages of Oxford Nanopore’s platforms are the small size, portability and low cost. The MinION is just slightly larger than the average USB drive and can be plugged directly into a computer. The machine costs around $1,000 and provides several gigabases of sequence in less than 48 hours. It is probably the most cost-effective approach to mapping highly complex genomes, identifying transcript isoforms, directly sequencing RNA and characterizing DNA methylation. Also, no complicated library preparation is required.
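The point that each base contributes to six current measurements can be made concrete by listing the six-nucleotide windows that contain a given base as the strand moves through the pore one position at a time. This is an illustrative sketch only. The window length of six follows the text above, and the function name is hypothetical.

```python
def windows_covering(sequence, position, k=6):
    """Return every k-base window of `sequence` that contains `position`."""
    if not 0 <= position < len(sequence):
        raise IndexError("position is outside the sequence")
    first = max(0, position - k + 1)
    last = min(position, len(sequence) - k)
    return [sequence[start:start + k] for start in range(first, last + 1)]


strand = "ACGTACGTACGT"
for window in windows_covering(strand, position=6):
    print(window)
```

A base in the middle of the strand appears in six overlapping windows, so its identity has to be inferred jointly with its neighbours rather than from a single measurement.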
Applications of NGS technologies
NGS is not yet like PCR, where it is viable for an individual researcher to have their own machine. Instead, it is more likely that an institution makes the necessary investments and owns the platform. Before this purchase, it is important to ask: What do researchers want from the sequencing? The answer may be one of many applications. This may require the laboratory to purchase both second- and third-generation NGS platforms, on top of the appropriate library preparation, storage capabilities and skilled bioinformatic specialists.
To avoid hugely comprehensive and expensive NGS ecosystems, combining platforms may be most beneficial, which is what the Ion Torrent Genexus system does. If the clinical laboratory is large or part of a bigger institution, a full NGS ecosystem could be a viable option. If so, a combination of Illumina’s MiSeq and NextSeq 2000 would enable the rapid sequencing of a smaller number of samples.
There are a number of different NGS solutions for RNA sequencing, each depending on what is required from the output data. For determining transcript abundance, Illumina’s NovaSeq 6000 may be most effective due to its low cost and high throughput. If the goal is to sequence the entire transcript and isoforms, SMRT sequencing on a Pacific Biosciences platform could be best due to the combination of long reads and good accuracy. For more information about the RNA sequencing procedure, read this article: How to do RNA sequencing
There are a variety of epigenetic modifications that occur within complex genomes, but the methylation of cytosine to form 5-methyl cytosine is probably the most widely studied. This could be explored by carrying out whole genome sequencing on Illumina’s NovaSeq 6000, followed by treating the sample with bisulfite and carrying out sequencing again. This would reveal all the methylations across the genome, but it would cost twice as much as carrying out whole genome sequencing alone. Instead, the Ion Torrent system could be used to focus solely on a subset of cytosines, with methylated DNA selected for by affinity enrichment or enzymatic compartmentalization using methylation-sensitive enzymes.
Microbial genomes are much smaller than human genomes. Therefore, it would be possible to sequence hundreds of microbial genomes on a single lane of Illumina’s NovaSeq 6000 run. However, microbiome sequencing, which is predicted to become more and more useful in determining human health, involves a complex mixture of micro-organisms. These tests will require high sequence throughput and sufficient accuracy to differentiate between closely related species.
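Comparing untreated and bisulfite-treated sequence works because bisulfite converts unmethylated cytosine so that it reads as T, while 5-methyl cytosine is protected and still reads as C. The sketch below shows that comparison for one aligned read. It assumes the read is already aligned to the same strand of the reference with no insertions or deletions, which real pipelines cannot take for granted, and the function name is illustrative.

```python
def call_methylation(reference, bisulfite_read):
    """Map each reference cytosine position to True if it appears methylated.

    Assumes an ungapped, same-strand alignment of equal length.
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must be the same length")
    calls = {}
    pairs = zip(reference.upper(), bisulfite_read.upper())
    for index, (ref_base, read_base) in enumerate(pairs):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[index] = True   # protected from conversion: methylated
        elif read_base == "T":
            calls[index] = False  # converted by bisulfite: unmethylated
        # Any other base is treated as an error and left uncalled.
    return calls


print(call_methylation("ACGTCCGA", "ACGTTTGA"))
```

In practice, many reads cover each cytosine, so methylation is usually reported as the fraction of reads that retain a C at that position.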
Whole genome sequencing
This is the most comprehensive way to analyze a genome as it reveals single nucleotide polymorphisms, indels and alterations in copy number. A disadvantage of whole genome sequencing is the cost, so consideration needs to be taken about whether it is actually necessary to analyze such a large region. Second-generation NGS technologies present challenges for whole genome sequencing due to their short-read abilities, so third-generation NGS platforms are much more beneficial for exploring the structure of complex genomes. The ideal solution may be to combine both short-read technology for output and cost, and long-read technology for structure and the analysis of complex genomic regions.
Whole exome sequencing
Usually, whole genome sequencing produces more data than is needed and requires huge amounts of storage. Often, the next viable alternative is to focus on the much more manageable exome instead. Therefore, only second-generation NGS should be considered, as the pieces of DNA that are captured are only exon-sized. The Illumina NovaSeq 6000 may be a good option.
Open Resources for NGS
Image credit: Argonaut Manufacturing Services