The order of nucleotides within a sequence of DNA is important. It underpins protein function and, where mutations occur, disease manifestation. Determining the order of nucleotides within biological samples is an integral part of a range of research applications. Over the past 50 years, a number of techniques and technologies have been developed to facilitate this feat. Throughout this period, there have been tremendous changes and developments, enabling us to read the genetic code of life more precisely and to gain more insight into it. In this blog, we will go through the history of DNA sequencing technologies, highlighting some of the key discoveries and researchers along the way.
First Generation DNA sequencing technologies
Working from crystallographic data produced by Rosalind Franklin and Maurice Wilkins, James Watson and Francis Crick famously solved the 3D structure of DNA in 1953. However, the ability to read the genetic sequence did not follow for some time. Initial progress was slow because the available techniques were mainly borrowed from analytical chemistry and could only measure nucleotide composition, not order. Nonetheless, by combining these techniques with selective ribonuclease treatments, Robert Holley and colleagues produced the first whole nucleic acid sequence in 1965, that of alanine tRNA from Saccharomyces cerevisiae.
Meanwhile, Fred Sanger and colleagues developed a related technique based on the detection of radiolabelled partial-digestion fragments. The famous Sanger sequencing method originated in the late 1970s, when Sanger developed a gel-based method that combined a DNA polymerase with a mixture of standard nucleotides (dNTPs) and chain-terminating nucleotides (ddNTPs). Mixing dNTPs with ddNTPs leads to random early termination of strand extension by the polymerase. Four reactions are run in parallel, each containing a different chain-terminating nucleotide. Separating the resulting fragments by size using gel electrophoresis enables the sequence to be read off base by base. At the time, this technique was revolutionary. It enabled sequencing of 500-1,000 bp fragments.
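The read-out logic of the four-lane gel can be illustrated with a toy model. The Python sketch below is only an idealisation: it assumes every possible terminated fragment is produced and perfectly resolved by length, and the function names `sanger_lanes` and `read_gel` are made up for this example rather than taken from any library.

```python
def sanger_lanes(synthesised: str) -> dict[str, list[int]]:
    """Return, for each ddNTP reaction, the fragment lengths it can produce.

    A fragment of length n appears in the ddA lane only if base n of the
    newly synthesised strand is an A (and likewise for C, G and T).
    Idealised assumption: every possible termination event occurs.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesised, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence off the gel, from shortest to longest fragment."""
    # Map each fragment length to the lane (base) in which it was seen.
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


lanes = sanger_lanes("GATTACA")
print(lanes["A"])       # [2, 5, 7]
print(read_gel(lanes))  # GATTACA
```

Each lane on its own says only where one base occurs. It is the ordering of fragments by length across all four lanes that recovers the full sequence.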
An earlier technique developed by Sanger and Alan Coulson, the plus and minus method, led to the first sequence of a DNA genome, that of bacteriophage φX174, in 1977. In the same year, Allan Maxam and Walter Gilbert published their chemical cleavage technique, which became the first widely adopted method for DNA sequencing.
From the 1980s onwards, Sanger’s original method was progressively automated, culminating in capillary electrophoresis. Large slab gels were replaced with acrylic-fibre capillaries and the results could be viewed on an electropherogram. This technology was essential to the completion of the Human Genome Project in 2003. Nonetheless, even after the Human Genome Project, the cost of capillary electrophoresis remained too high to enable large-scale sequencing projects. By the mid-2000s, driven largely by grants from the NHGRI, several efforts were under way to bring the cost of sequencing down. Labs across the world were testing out new methods and techniques for higher-throughput sequencing.
Second Generation DNA sequencing technologies
Alongside developments in large-scale dideoxy-sequencing efforts, another technique emerged that set the stage for the first wave of next generation sequencing (NGS) technologies. Pyrosequencing exploited the fact that a pyrophosphate is released each time the polymerase incorporates a nucleotide, and that an enzyme cascade can convert this pyrophosphate into light. Sequences were inferred by measuring the luminescence produced as each nucleotide in turn was washed through the system. Additionally, in this approach, the template DNA was fixed to a solid phase. Later improvements attached the DNA to beads.
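To make the inference step concrete, here is a minimal Python sketch of how a series of light measurements could be turned back into a sequence. It assumes a fixed, repeating flow order (`TACG` here is an illustrative choice) and that the light signal is roughly proportional to the number of identical bases incorporated in one flow. The function name `decode_flowgram` and the example intensities are hypothetical.

```python
def decode_flowgram(flow_order: str, intensities: list[float]) -> str:
    """Convert per-flow light intensities into a base sequence.

    Assumption: a signal near 1.0 means one base was incorporated, near 2.0
    means a run of two identical bases, and near 0.0 means none.
    """
    bases = []
    for flow_index, signal in enumerate(intensities):
        # Nucleotides are washed through in a fixed, repeating order.
        nucleotide = flow_order[flow_index % len(flow_order)]
        # Round the signal to the nearest whole number of incorporations.
        count = max(0, round(signal))
        bases.append(nucleotide * count)
    return "".join(bases)


signals = [1.02, 0.05, 0.97, 2.10, 0.10, 1.90, 0.00, 1.00]
print(decode_flowgram("TACG", signals))  # TCGGAAG
```

The rounding step also hints at a known weakness of this approach: for long runs of the same base, the difference between, say, seven and eight incorporations becomes hard to distinguish from noise.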
Pyrosequencing was licensed to 454 Life Sciences (later purchased by Roche), where it evolved into the first major successful commercial NGS technology. Libraries of DNA molecules were attached to beads and then amplified by water-in-oil emulsion PCR. The beads were then loaded into the wells of a picolitre-scale plate, and pyrosequencing proceeded as smaller bead-linked enzymes and dNTPs were washed over the plate. This parallelisation increased the yield of sequencing efforts by orders of magnitude.
Following the success of 454, a number of parallel sequencing techniques emerged. Among the most important was the Solexa method of sequencing (Solexa was later acquired by Illumina). In this approach, adapter-ligated DNA molecules are passed over a lawn of complementary oligonucleotides bound to a flow cell. A process known as bridge amplification then forms dense clusters of amplified fragments. Because each cluster contains many copies of the same fragment, a fluorescent signal can be detected each time a single dNTP is added as sequencing-by-synthesis proceeds. Over time, the number of clusters that can be read in a single run has grown. Illumina instruments went on to become the most widely used massively parallel sequencing technology.
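The cycle-by-cycle read-out of a single cluster can be sketched in a few lines of Python. This is a deliberately simplified toy, not how production instruments call bases: it assumes one fluorescence channel per base, calls the brightest channel in each cycle, and emits `N` when the signal is ambiguous. The name `call_bases` and the `min_purity` threshold are assumptions made for illustration.

```python
CHANNELS = "ACGT"  # assumed: one fluorescence channel per base


def call_bases(cycles: list[tuple[float, float, float, float]],
               min_purity: float = 0.6) -> str:
    """Call one base per sequencing cycle for a single cluster.

    Each cycle is a tuple of intensities in A, C, G, T order. The brightest
    channel is called if it carries at least `min_purity` of the total signal.
    """
    read = []
    for intensities in cycles:
        total = sum(intensities)
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        purity = intensities[brightest] / total if total > 0 else 0.0
        read.append(CHANNELS[brightest] if purity >= min_purity else "N")
    return "".join(read)


cluster = [
    (0.90, 0.05, 0.03, 0.02),  # clear A
    (0.10, 0.10, 0.70, 0.10),  # clear G
    (0.30, 0.30, 0.20, 0.20),  # ambiguous, so no confident call
    (0.00, 0.00, 0.10, 0.90),  # clear T
]
print(call_bases(cluster))  # AGNT
```

Because only one base is added per cycle, read length is set by the number of cycles run, while throughput is set by how many clusters are imaged in parallel.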
Other technologies emerged during this time, including Ion Torrent, which measures the pH change during polymerisation, and SOLiD, which uses sequencing-by-ligation (i.e., catalysed by a DNA ligase) rather than sequencing-by-synthesis (i.e., catalysed by a polymerase). These technologies also became part of the NGS landscape. NGS platforms are the dominant type of sequencing technology used today. They enable large-scale sequencing at relatively low cost. They are, however, limited in read length: NGS platforms typically produce reads of ~50-500 bp.
Third Generation DNA sequencing technologies
One of the most widely used third-generation technologies is the single molecule real time (SMRT) platform from Pacific Biosciences (PacBio). This technique uses miniaturised wells, known as zero-mode waveguides, in which a single polymerase incorporates labelled nucleotides and light emission is measured in real time. PacBio machines are also capable of producing very long reads, up to and exceeding 10 kb in length. These reads are particularly useful for de novo genome assemblies.
The most anticipated area of third-generation DNA sequencing development is nanopore sequencing. Its potential was first established well before second-generation sequencing had emerged, when researchers demonstrated that single-stranded RNA or DNA could be driven across a lipid bilayer through large α-haemolysin ion channels by electrophoresis. Nanopore sequencing enables direct, real-time analysis of long DNA or RNA fragments. It works by monitoring changes in an electrical current as nucleic acids pass through a protein nanopore.
Oxford Nanopore Technologies, the first company to offer nanopore sequencers, has generated a lot of excitement around its nanopore platforms, including GridION and MinION. The MinION in particular is a small USB device, about the size of a mobile phone, that has been used out in the field, including during the Ebola virus epidemic. The fast run times and compact nature of the MinION also present the opportunity to decentralise sequencing. With further refinements, nanopore sequencers could revolutionise not just the composition of the data produced, but also where, when and by whom it can be produced.
DNA sequencing is fundamental to measuring major properties of various life forms. Over the past 50 years, researchers from around the world have invested a lot of time and effort into developing and improving the technologies that underpin DNA sequencing. Alongside the evolving DNA sequencing landscape, our research teams have also evolved to accommodate the increasingly demanding technological intricacies of these techniques. Researchers have moved from the lab to the computer and have gone from pouring gels to running code. DNA sequencing has a rich history that has paved the way to new insights and lessons learned. Understanding and learning from these previous generations will help inform the progress of the next.
Image credit: By macrovector – www.freepik.com