Nucleic acid sequencing has become an integral part of modern biomedical research. The advances in sequencing technology, from its invention to modern-day equivalents, have been extraordinary. 2022 marks the 50th anniversary of the sequencing of the first complete gene. It’s impressive to see how far science has come – so let’s take a look.
In 1965, Robert Holley sequenced the first tRNA (for alanine), for which he was awarded the Nobel Prize in 1968. Holley’s team determined the tRNA’s structure by using two ribonucleases to split the molecule into pieces. Each enzyme cut the molecule at specific nucleotides, and the team then had to ‘puzzle out’ the full sequence by hand from the resulting pieces.
This was taken one step further when, in 1972, Walter Fiers became the first to sequence a complete gene (the gene encoding the coat protein of the bacteriophage MS2). Because MS2 is an RNA virus, the gene was sequenced as RNA: he used RNases to digest the viral RNA and isolate oligonucleotides, which he then separated via electrophoresis and chromatography.
The first major breakthrough in sequencing technology was made by Frederick Sanger in 1977, when he and his colleagues introduced the “dideoxy” chain-termination method for sequencing DNA molecules, also known as “Sanger sequencing”. It earned him his second Nobel Prize. This method was later used by Sanger and colleagues to sequence human mitochondrial DNA (16,569 base pairs) and the genome of bacteriophage λ (48,502 base pairs). Sanger sequencing was the most widely used form of sequencing for over 30 years.
Sanger sequencing requires a single-stranded DNA template, a DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and modified dideoxynucleotide triphosphates (ddNTPs), which terminate DNA strand elongation. The DNA sample is divided into four separate sequencing reactions, and only one of the four dideoxynucleotides (A, T, G or C) is added to each. The dideoxynucleotides are added at low concentrations, so they are incorporated only rarely and at random positions. As a result, each reaction produces strands of many different lengths, each carrying a radioactive label and ending in that reaction’s dideoxynucleotide. By separating the strands from all four reactions by length on a gel and ‘lining them up’, one is able to see where each nucleotide occurs.
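To make the ‘lining up’ step concrete, here is a minimal Python sketch of the idea. It is an idealised toy, not a model of real chemistry: it assumes every possible termination length is present in each reaction, and the function names (`sanger_reactions`, `read_gel`) and the example strand are made up for illustration.

```python
def sanger_reactions(synthesised: str) -> dict[str, list[int]]:
    """Return, for each of the four ddNTP reactions, the fragment lengths
    that reaction can produce.

    A strand terminates at position i in the 'A' reaction only if the
    base at position i is A, and likewise for the other three reactions.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesised, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Sort all fragments by length and read off which lane each is in."""
    lane_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(lane_by_length[length] for length in sorted(lane_by_length))


strand = "GATTACA"  # hypothetical newly synthesised strand
lanes = sanger_reactions(strand)
recovered = read_gel(lanes)
print(lanes)
print(recovered)
```

Each fragment length appears in exactly one lane, so reading the lanes from shortest to longest fragment spells out the sequence. In a real experiment the fragments arise at random, which is why many copies of the template are needed.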
However, Sanger sequencing was originally a manual process and extremely time consuming. Given its potential, a great deal of work went into automating it. In 1987, Leroy Hood and Michael Hunkapiller succeeded in automating the Sanger sequencing process. They brought two major improvements to the method. Firstly, DNA fragments were labelled with fluorescent dyes instead of radioactive molecules. Secondly, data acquisition and analysis were carried out by computer. The resulting instrument was named the ABI 370 and can now be found in certain science museums!
In 1996, Mostafa Ronaghi, Mathias Uhlén and Pål Nyrén introduced a new DNA sequencing technique called pyrosequencing. This technology measures the luminescence generated when pyrophosphate is released as nucleotides are incorporated during DNA synthesis (sequencing-by-synthesis technology). It was later implemented in an automated, high-throughput system, the 454 system, which was the first next generation sequencing platform to come to market.
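The read-out of pyrosequencing can be sketched in a few lines of Python. In this technique the four nucleotides are offered one at a time in a repeating cycle, and the light signal grows with the number of nucleotides incorporated in that step. The sketch below is a simplified illustration only: the flow order `TACG`, the function names and the example strand are assumptions, and the signals are idealised whole numbers with no noise.

```python
def flowgram(synthesised: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Return (nucleotide offered, number incorporated) for each flow.

    The number incorporated stands in for the light intensity; a run of
    identical bases is incorporated in a single flow.
    """
    if set(synthesised) - set(flow_order):
        raise ValueError("strand contains a base that is never offered")
    signals = []
    position = 0
    flow = 0
    while position < len(synthesised):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while position < len(synthesised) and synthesised[position] == base:
            run += 1
            position += 1
        signals.append((base, run))
        flow += 1
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Rebuild the sequence from the per-flow signals."""
    return "".join(base * count for base, count in signals)


signals = flowgram("GATTACA")  # hypothetical strand being synthesised
print(signals)
print(decode(signals))
```

Flows with a signal of zero mean the offered nucleotide did not match the next base. A signal of two, as for the `TT` in the example, marks a run of two identical bases.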
Other notable platforms based on different technologies include the SOLiD system, which introduced “sequencing-by-ligation” in 2007, and the Ion Torrent by Life Technologies from 2011, which uses a “sequencing-by-synthesis” technology that detects the hydrogen ions released when new DNA is synthesised.
Oxford Nanopore Technologies’ systems for RNA and DNA sequencing range from the benchtop GridION to the portable, handheld MinION and the smaller Flongle adapter. The GridION was first introduced in 2012 and identifies the nucleotide sequence from the changes in electrical current that occur when DNA strands pass through biological nanopores.
Next generation sequencing
Next generation sequencing (NGS) includes various technologies that perform sequencing and gather data from many reactions running simultaneously. It’s also referred to as massively parallel sequencing. Even though there are many NGS platforms available, all of them follow three general steps:
– Sample/library preparation: A library is prepared by fragmenting the DNA sample and ligating adapter molecules to the fragments.
– Amplification and sequencing: The library is converted into single stranded molecules. Amplification creates clusters of DNA molecules. Each cluster acts as an individual reaction where a sequencing run is performed.
– Data output and analysis: At the end of the reaction, each NGS run provides a large amount of raw data.
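The three steps above can be mimicked with a toy Python sketch. It is heavily simplified and rests on stated assumptions: the adapter sequences are placeholders, fragments are cut at a fixed length rather than at random, amplification is skipped, and every molecule is read perfectly. All names are hypothetical.

```python
ADAPTER_LEFT = "AAAA"   # placeholder adapter, not a real adapter sequence
ADAPTER_RIGHT = "CCCC"  # placeholder adapter, not a real adapter sequence


def prepare_library(dna: str, fragment_length: int) -> list[str]:
    """Step 1: fragment the sample and attach an adapter to each end."""
    fragments = [
        dna[start:start + fragment_length]
        for start in range(0, len(dna), fragment_length)
    ]
    return [ADAPTER_LEFT + fragment + ADAPTER_RIGHT for fragment in fragments]


def sequence_library(library: list[str]) -> list[str]:
    """Steps 2 and 3: treat each molecule as its own reaction and return
    one raw read per molecule, with the adapters trimmed off."""
    return [
        molecule[len(ADAPTER_LEFT):len(molecule) - len(ADAPTER_RIGHT)]
        for molecule in library
    ]


sample = "GATTACAGATTACA"  # hypothetical DNA sample
library = prepare_library(sample, fragment_length=5)
reads = sequence_library(library)
print(library)
print(reads)
```

Here the reads happen to come back in order, so joining them restores the sample. Real reads are unordered and overlapping, which is why the data analysis step involves aligning or assembling them.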
The Human Genome Project and 100,000 Genomes Project
The Human Genome Project was an international research effort to determine the DNA sequence of the entire human genome. It took 13 years and was completed in 2003, with an estimated cost of over $300 million. Today, a whole human genome can be sequenced in one day for under $1000.
The 100,000 Genomes Project was first announced by UK Prime Minister David Cameron in December 2012, resulting in the creation of Genomics England. In December 2018, the full 100,000 genomes milestone was reached. That is six years from announcement to completion, just under half the 13 years that sequencing a single genome took the Human Genome Project.
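The scale of the change is easier to appreciate as arithmetic. The short Python calculation below uses only the figures quoted in this section. Since these are given as “over” and “under” values, the results are rough bounds rather than precise measurements, and leap days are ignored.

```python
# Figures quoted in the text above
hgp_cost_usd = 300_000_000   # "over $300 million"
today_cost_usd = 1_000       # "under $1000"
hgp_years = 13
today_days = 1

# Cost: how many times cheaper is a genome today?
cost_fold = hgp_cost_usd // today_cost_usd

# Time: 13 years expressed in days, compared with one day
hgp_days = hgp_years * 365   # leap days ignored
time_fold = hgp_days // today_days

# 100,000 Genomes Project: December 2012 to December 2018
project_years = 2018 - 2012
fraction_of_hgp = project_years / hgp_years

print(f"Cost reduction: at least {cost_fold:,}-fold")
print(f"Time reduction: about {time_fold:,}-fold")
print(f"100,000 genomes took {project_years} years, "
      f"{fraction_of_hgp:.0%} of the time needed for the first genome")
```

On these figures, one genome is at least 300,000 times cheaper and several thousand times faster to sequence than it was in 2003.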
It’s hard to imagine how much further sequencing can go. Are we reaching a plateau of optimisation, where the rate of efficiency gain will slowly drop off? Time will tell.