Sample preparation is the process of getting nucleic acids (DNA or RNA) ready for Next Generation Sequencing (NGS). It involves a few main steps:
- Nucleic acid extraction
- Library preparation
- Purification and quality control
Why is sample preparation important? Increasingly, NGS is being asked to handle more challenging samples: samples from diverse origins, of lower quality or available only in small amounts. Before these samples can be analyzed, they must be treated and prepared. Careful preparation helps to prevent contamination, improve accuracy and minimize the risk of bias. Sample preparation is no longer just the ‘warm up’ for NGS – if any of the processes are done poorly, sequencing will not produce reliable results.
Sample preparation varies depending on the type of material being sampled and the purpose of the experiment. Different types of genetic material (DNA or RNA) have slightly different sample preparation processes, and the different applications of NGS add another dimension. Therefore, no single preparation protocol is optimal in every case, and a number of questions need to be asked before the experiment to determine the best methods. It would be impossible to cover every possible route in one guide, but we have compiled a wealth of information about some of the most important aspects of sample preparation for NGS.
|What are the Different Types of Sequencing?|The different types of sequencing are explained to give the rest of the guide more context. Each sequencing technique requires different sample preparation protocols, some of which are covered in more detail in other chapters.|
|Step-by-Step Guide of Sample Preparation|The basic sample preparation steps are covered; these are generalized to apply to many types of sequencing. The challenges that sample preparation faces are also expanded upon.|
|How to Extract Nucleic Acids|The different sample types that nucleic acids can be isolated from are described and the steps for isolating nucleic acids are explained.|
|What are NGS Libraries?|The typical DNA library preparation protocol is explained, including fragmentation, attachment of adapters and library quantification.|
|How to Generate an RNA Sequencing Library|RNA library preparation for different types of RNA is explained. The basic steps for conducting single cell RNA sequencing are also covered.|
|An Overview of Targeted Sequencing|Targeted sequencing is introduced and some of the key methods of target enrichment are covered.|
|Resources for Sample Preparation|Further resources to gain additional information about sample preparation are provided, including reports, webinars and social media accounts to follow.|
Sample preparation is an exciting space right now – there is a huge variety of industry providers competing with each other, driving the quality of technology up and the price of sample preparation down. Nevertheless, sample preparation protocols can seem laborious, complicated and multistep. This guide has been designed to untangle the confusion surrounding the numerous processes and provide an insight into the basic steps needed to perform great sample preparation for NGS!
And to read about some of these topics in greater depth, download the Sample Preparation Guide for MPS. It was written by David I Smith and covers a variety of sample preparation topics, from the isolation of nucleic acids to the generation of several different types of NGS libraries.
What are the Different Types of Sequencing?
Sample preparation processes differ depending on the type of sequencing being performed, because each technology has unique considerations. The number of applications for sequencing data is constantly growing, which in turn drives the need for more diverse sample preparation protocols.
Here are some common types of sequencing:
Whole genome sequencing
The process of determining the entire DNA sequence of an organism’s genome in a single experiment. Nearly any biological sample containing a full copy of the DNA, such as saliva, epithelial cells or bone marrow, can provide the genetic material needed for whole genome sequencing.
Whole exome sequencing
A technique for sequencing all of the protein-coding regions in the genome, known as the exome. The goal is to identify genetic variants that alter protein sequences and may be responsible for disease. The exons need to be selected before they can be sequenced; the methods used to do this are called target-enrichment strategies.
Targeted sequencing
Targeted sequencing allows specific regions of the genome to be analyzed in depth more rapidly and cost-effectively than whole genome sequencing. This method typically requires less sample input than other sequencing types. The two main methods are hybridization capture and amplicon sequencing (explained in the ‘An Overview of Targeted Sequencing’ chapter below).
RNA sequencing
RNA sequencing reveals the presence and quantity of RNA in a sample at a given time. This allows analysis of the transcriptome (the set of all coding and non-coding RNA transcripts), so that post-transcriptional modifications, gene fusions, mutations and changes in gene expression over time can be explored. During library preparation, RNA is reverse transcribed into complementary DNA (cDNA), because DNA is more stable and can be amplified using DNA polymerase (explained in the ‘How to Generate an RNA Sequencing Library’ chapter below).
Methylation sequencing
A tool to understand genome-wide methylation at single-nucleotide resolution. DNA methylation is a process in which methyl groups are added to the DNA molecule, changing the activity of a DNA segment without changing its sequence. Bisulfite treatment of DNA before sequencing reveals methylation status: unmethylated cytosines are converted to uracil (read as thymine after PCR), whereas methylated cytosines are protected and still read as cytosine. The method can be used to reveal epigenetic modifications and rare methylation events.
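A toy Python model of how bisulfite data are read, assuming complete conversion, a single strand and a read that is already aligned to the reference without gaps. Unmethylated cytosines become thymine in the read, so any reference C that still reads as C is called methylated. The methylated positions in the example are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as it reads after bisulfite treatment and PCR.

    Unmethylated C -> T; a C at a position in `methylated_positions` stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """Map each reference C position to True (methylated) or False (unmethylated)."""
    return {
        i: read_base == "C"
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C"
    }


reference = "ACGTCGACCG"
methylated = {1, 8}  # hypothetical methylated cytosines (0-based positions)
read = bisulfite_convert(reference, methylated)
print(read)                               # ACGTTGATCG
print(call_methylation(reference, read))  # {1: True, 4: False, 7: False, 8: True}
```

Real bisulfite pipelines also have to align C-to-T converted reads against both strands, which this sketch leaves out.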
For more information about sequencing technologies, download the Sequencing Buyer’s Guide report. It provides an in-depth insight into different NGS technologies and how to achieve sequencing in the least expensive way!
Step-by-Step Guide of Sample Preparation
Sample preparation comprises the steps needed to transform mixtures of nucleic acids from biological samples into libraries ready to be sequenced by NGS technologies. If the protocols are not followed correctly, the success of sequencing will be compromised. Each step of the preparation is fundamental and has different considerations depending on the type of sample and NGS platform. It is therefore important to plan the most efficient protocol before starting the experiment to ensure the highest quality results.
The general steps for sample preparation are as follows:
Step #1: Extract the genetic material
This is the first step in every sample preparation protocol. Nucleic acids (DNA or RNA) are extracted from a variety of biological samples, such as blood, cultured cells, tissue sections or urine (explained in the ‘How to Extract Nucleic Acids’ chapter below).
Step #2: Library preparation
A series of steps is needed to generate a library; the ultimate goal is to convert the extracted nucleic acids into an appropriate format for the chosen sequencing technology. This is done by fragmenting the targeted sequences to a desired length and then attaching specific adapter sequences to the ends of these fragments. The adapters may also include a barcode, which identifies the sample each fragment came from and so permits multiplexing (pooling several samples in one sequencing run). The fragmentation can be done by physical or enzymatic methods (explained in the ‘What are NGS Libraries?’ chapter below).
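To make barcode-based multiplexing concrete, here is a minimal Python sketch of demultiplexing. It assumes, purely for illustration, that a 6-base barcode sits at the start of each read (on many platforms the barcode is instead read in a separate index read). The barcode sequences and sample names are hypothetical, and one mismatch is tolerated.

```python
# Hypothetical 6-base barcodes; real barcode sets are designed to differ at many positions.
SAMPLE_BARCODES = {
    "ACGTAC": "sample_A",
    "TGCATG": "sample_B",
}


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, barcodes=SAMPLE_BARCODES, max_mismatches=1):
    """Return (sample name, read with the barcode trimmed off).

    A read is assigned only if exactly one barcode matches within `max_mismatches`.
    """
    bc_len = len(next(iter(barcodes)))
    observed = read[:bc_len]
    hits = [name for bc, name in barcodes.items() if hamming(observed, bc) <= max_mismatches]
    sample = hits[0] if len(hits) == 1 else "undetermined"
    return sample, read[bc_len:]


print(assign_sample("ACGTACGGGTTT"))  # ('sample_A', 'GGGTTT')
print(assign_sample("ACGTAAGGG"))     # ('sample_A', 'GGG'), one mismatch tolerated
print(assign_sample("GGGGGGAAA"))     # ('undetermined', 'AAA')
```

Tolerating a mismatch only works safely when every pair of barcodes differs at several positions, which is why barcode sets are designed with that spacing in mind.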
Step #3: Amplification
This step is optional, but it is usually needed, depending on the NGS application and the amount of sample. For samples with small amounts of starting material, amplification becomes essential to obtain enough coverage for reliable sequencing. Polymerase chain reaction (PCR) is a common method to increase the amount of DNA. For more information about the PCR methods that have enabled the detection of nucleic acids in small sample sizes, check out the Next-Generation PCR ONLINE webinar series on-demand.
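Each PCR cycle can at most double the number of copies, so the yield after n cycles is roughly N0 × (1 + E)^n, where N0 is the starting copy number and E is the per-cycle efficiency between 0 and 1. The sketch below applies this idealized formula; real reactions lose efficiency as they approach a plateau, so treat the numbers as upper bounds.

```python
import math


def pcr_yield(start_copies, cycles, efficiency=1.0):
    """Idealized copy number after `cycles` of PCR at a constant per-cycle efficiency (0-1)."""
    return start_copies * (1.0 + efficiency) ** cycles


def cycles_needed(start_copies, target_copies, efficiency=1.0):
    """Smallest whole number of cycles that reaches `target_copies` under the same model."""
    return math.ceil(math.log(target_copies / start_copies) / math.log(1.0 + efficiency))


print(pcr_yield(1_000, 10))              # 1024000.0 at perfect doubling
print(round(pcr_yield(1_000, 10, 0.9)))  # 613107 at 90% efficiency
print(cycles_needed(1_000, 1_000_000))   # 10
```

The gap between perfect and 90% efficiency after only ten cycles shows why small differences in amplification efficiency between fragments can turn into noticeable bias.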
Step #4: Purification and quality control
This step is usually necessary to remove any unwanted material that could hinder sequencing. Some NGS platforms have narrow fragment size requirements, so discarding fragments that are too large or too small can improve sequencing efficiency. The optimal library size is determined by the sequencing application. This ‘clean up’ is typically done with magnetic beads or on agarose gels. Quality control is the final process before sequencing: confirming the quality and quantity of DNA improves confidence in the sequencing data. Downstream experiments are time-consuming and expensive, so tight quality control is needed to make sure that all samples are fit for their application.
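Conceptually, size selection keeps fragments inside a length window. The sketch below applies a hypothetical 200-500 bp window to a list of fragment lengths; real bead- or gel-based selection has soft rather than sharp cut-offs, so this is only an idealization.

```python
def size_select(fragment_lengths, min_bp, max_bp):
    """Keep fragments whose length falls inside [min_bp, max_bp]; also return the fraction retained."""
    kept = [length for length in fragment_lengths if min_bp <= length <= max_bp]
    fraction = len(kept) / len(fragment_lengths) if fragment_lengths else 0.0
    return kept, fraction


# Hypothetical fragment lengths (bp), e.g. from a fragment analyzer trace
lengths = [80, 150, 300, 450, 900]
print(size_select(lengths, 200, 500))  # ([300, 450], 0.4)
```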
Common challenges in sample preparation
Challenge 1: Many samples are extracted from a limited number of cells, or even a single cell. These do not provide enough genetic material on their own and so need to undergo PCR. However, amplification is prone to introducing bias. PCR duplicates are multiple copies of exactly the same original DNA fragment, and too many of them can lead to uneven sequencing coverage.
Solution 1: It is practically impossible to eliminate all sources of bias, but it is important to know where bias occurs and take all practical steps to minimize it. A high PCR duplication rate indicates that the library preparation needs some modification; it is probably necessary to improve the complexity of the NGS library. Many programs can identify and remove PCR duplicates, and among the most commonly used are Picard MarkDuplicates and SAMtools. Also, specific PCR enzymes have been shown to minimize amplification bias. Ultimately, the goal in library preparation is to maximize the complexity of a sample while minimizing bias due to amplification.
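Duplicate-marking tools such as Picard MarkDuplicates group reads that start at the same position on the same strand. The simplified Python sketch below keys single-end reads on (chromosome, 5' position, strand), keeps the first read of each group and reports the duplication rate. It is not Picard's algorithm (which, among other things, handles read pairs and keeps the best-quality read), and the `Read` record is a hypothetical structure.

```python
from collections import namedtuple

# Hypothetical minimal alignment record; `pos` is assumed to already be the read's 5' end.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand"])


def mark_duplicates(reads):
    """Return names of reads whose (chrom, 5' position, strand) was already seen."""
    seen = set()
    duplicates = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key in seen:
            duplicates.append(read.name)
        else:
            seen.add(key)
    return duplicates


def duplication_rate(reads):
    """Fraction of reads flagged as duplicates."""
    return len(mark_duplicates(reads)) / len(reads) if reads else 0.0


reads = [
    Read("r1", "chr1", 100, "+"),
    Read("r2", "chr1", 100, "+"),  # same start and strand as r1: duplicate
    Read("r3", "chr1", 100, "-"),  # opposite strand: kept
    Read("r4", "chr2", 100, "+"),
]
print(mark_duplicates(reads))   # ['r2']
print(duplication_rate(reads))  # 0.25
```

Position alone cannot tell a PCR copy from a second molecule that happens to start at the same place, which is one reason high-depth experiments sometimes add molecular identifiers.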
Challenge 2: Inefficient library construction is another problem faced during sample preparation. It is reflected by a low percentage of fragments with the correct adapters. The consequences are less sequencing data and more chimeric fragments. Chimeric reads are derived from portions of the genome that are not next to each other and are a source of errors during sequencing.
Solution 2: Efficient A-tailing of PCR products has been reported to prevent chimera formation; the procedure is universal and can be applied to a number of different library construction techniques. Additionally, strand-split artifact reads (SSARs) have been suggested to reduce the number of chimeric artifacts in a sample, and chimera detection programs can be used to filter the raw sequences to achieve an overall chimera rate of just 1%.
Challenge 3: Sample contamination is an inherent risk because separate libraries are usually prepared in parallel. The most probable primary source of contamination is pre-amplification, a method that increases the amount of nucleic acid before PCR.
Solution 3: Contamination risk can be reduced by minimizing human handling of the samples. Also, one room or area should be dedicated to pre-PCR work. This room could be further divided into areas: one for PCR mixture preparation and another for adding the extracted nucleic acids to the mixture.
Challenge 4: The high costs of library preparation are mostly due to lab equipment, the need for trained personnel and reagents.
Solution 4: The introduction of tagmentation, which fragments DNA and attaches adapters in a single reaction, has significantly reduced costs, because less hands-on time is required per sample. As automation becomes more widespread, the accuracy and efficiency of sample preparation are likely to increase, although the initial cost of the instruments and their maintenance may still be high.
How to Extract Nucleic Acids
The very first step of sample preparation is the isolation of nucleic acids. This involves a series of steps to obtain pure DNA or RNA. As this is the starting point for a number of downstream applications, high-quality nucleic acids are crucial for the success of sequencing later on.
The first question that should be asked is – what source are the nucleic acids being extracted from?
The best sample type to isolate nucleic acids from is probably a homogeneous population of cells from an in vitro culture (a group of uniform cells grown outside of the organism). White blood cells isolated from a blood sample, for example, would be relatively homogeneous. However, some clinical samples are not so homogeneous and provide only very limited amounts of nucleic acids; isolating enough material from a fine needle biopsy of a small tumor, for example, would most likely prove difficult.
The quality of extracted nucleic acids depends on the quality of the starting sample. Fresh starting material is always recommended, but this is often not possible. So, samples should be stored appropriately, which usually involves freezing or cooling at specific temperatures.
The next question is: what are the nucleic acids going to be used for? This comes with a range of further considerations, which depend in particular on the type of sequencing machine.
Typical steps for extracting nucleic acids
- Cell disruption – The first step in nucleic acid isolation is to break apart the cell wall or cell membrane to release the genetic material. This can be achieved by physical, chemical or enzymatic methods. Specific disruption methods are usually chosen based on the properties of the sample, and a range of tools and approaches are often combined to achieve cell disruption.
- Physical methods use force and involve some sort of grinding or crushing, for example with a mortar and pestle under liquid nitrogen. They are usually used on more structured materials, such as tissues.
- Chemical methods disrupt cellular membranes with a variety of agents that denature proteins – detergents and chaotropes are commonly used.
- Enzymatic methods are usually used in combination with chemical and physical procedures. Typical enzymes include lysozyme, proteinase K and lipase.
Table of nucleic acid extraction techniques – the main characteristics of chemical and mechanical methods are described. Image credit: Harrison, 2003
- Removal of membrane lipids and proteins – Cellular debris may need to be removed before purification to reduce the carryover of unwanted materials, such as proteins and lipids, as they may interfere with reactions later on. This is usually done by centrifugation, filtration or bead-based methods.
- Nucleic acid purification – Once the debris has been removed, the nucleic acids can then be purified by one of many different methods – silica, ion exchange, cellulose or precipitation-based methods. Sample purity may vary depending on the aim of the analysis, the extraction method used, the starting material and the lab technician. Wash buffers usually contain alcohol and are used to eliminate contaminants from the sample.
- Nucleic acid concentration – The quantity of nucleic acids is also important, and getting it wrong can cause problems downstream. If the nucleic acid concentration in a sample is insufficient, unwanted products may be amplified during PCR or short reads may be generated during sequencing. If the concentration is too high, background during sequencing may increase. Spectrophotometric analysis and gel electrophoresis can be used to determine the quality and quantity of extracted nucleic acids.
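Spectrophotometric quantification usually relies on absorbance at 260 nm. A commonly used rule of thumb is that an A260 of 1.0 (1 cm path) corresponds to about 50 ng/µL of double-stranded DNA, 33 ng/µL of single-stranded DNA and 40 ng/µL of RNA, and that A260/A280 ratios of about 1.8 (DNA) and 2.0 (RNA) suggest a relatively pure sample. The sketch below applies these conventional factors; the ±0.1 purity tolerance is an illustrative assumption, not a standard.

```python
# Conventional A260 conversion factors (ng/µL per absorbance unit, 1 cm path length).
A260_FACTORS = {"dsDNA": 50.0, "ssDNA": 33.0, "RNA": 40.0}
# Commonly quoted A260/A280 ratios for relatively pure samples.
EXPECTED_260_280 = {"dsDNA": 1.8, "RNA": 2.0}


def concentration_ng_per_ul(a260, nucleic_acid="dsDNA", dilution_factor=1.0):
    """Estimate concentration from absorbance at 260 nm, correcting for any dilution."""
    return a260 * A260_FACTORS[nucleic_acid] * dilution_factor


def looks_pure(a260, a280, nucleic_acid="dsDNA", tolerance=0.1):
    """True if A260/A280 is within `tolerance` of the expected ratio (tolerance is illustrative)."""
    return abs(a260 / a280 - EXPECTED_260_280[nucleic_acid]) <= tolerance


print(concentration_ng_per_ul(0.5, "dsDNA", dilution_factor=10))  # 250.0
print(looks_pure(0.5, 0.27))  # True (ratio is about 1.85)
```

Absorbance cannot distinguish DNA from RNA or free nucleotides, so dye-based or gel-based checks are often used alongside it.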
Choosing an appropriate DNA extraction method
The choice of isolation method depends on the aim of the study, the type of analysis and the type of nucleic acids. It is also important to consider the sample type. For example, for mucus samples such as nasal discharges or sputum, the viscosity of the material would need to be decreased. This could be done using a mucus-dissolving (mucolytic) agent such as N-acetylcysteine. The spleen and liver are transcriptionally active organs and so have a very high RNA content. If such samples were intended for DNA analysis, they would have to be treated with ribonuclease (RNase) before purification to break the RNA down.
There are a number of other points that need considering when choosing a DNA extraction method:
- Ease-of-use – Some DNA isolation protocols need multiple washes and several spins in a centrifuge. The time spent carrying out these processes can add up and may not be worth it for the purification of a small amount of DNA.
- Time and throughput – Some extraction methods are difficult to automate, whereas some have already been automated. Decreased technician handling time not only saves time, but also reduces the opportunity for contamination with foreign materials and decreases some biases.
- Cost – Each extraction method has a different cost. Some techniques may have a higher initial cost, yet save money and prevent waste in the long-term, such as by eliminating the constant need for chemicals and plastic tubes.
What are NGS Libraries?
The preparation of a sequencing library is necessary before NGS analysis – a sequencing library is essentially a pool of DNA fragments with adapters attached. Numerous kits for making sequencing libraries are available commercially from a variety of vendors. Competition has steadily driven prices down and quality up.
Typical Method of DNA library preparation for NGS
Fragmentation
This is the breaking of DNA strands into pieces, which can be done by physical, chemical or enzymatic methods. A typical physical method is acoustic shearing, in which short-wavelength, high-frequency acoustic energy is focused on the DNA sample to disrupt the molecules. Restriction endonucleases are usually used in enzymatic methods.
Attachment of adapters
Adapters are short, chemically synthesized oligonucleotides that can be attached to the ends of DNA fragments. They are designed to interact with a specific sequencing platform, and they can carry barcodes that identify which sample each fragment originally came from. The DNA fragment ends are blunted and the 5’ ends are phosphorylated using a mixture of enzymes. Taq polymerase then adds a single adenine to the 3’ ends (A-tailing), and DNA ligase attaches adapters carrying a complementary thymine overhang. The optimal ratio is about 10 adapters to one fragment; too many adapters cause adapter dimers to form. If adapter dimers and fragments of the wrong size are not removed, they will significantly lower sequencing efficiency and quality. They can be removed by magnetic bead-based clean-up, or the products can be purified on an agarose gel.
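The adapter-to-fragment ratio is a molar ratio, so the mass of fragmented DNA has to be converted to moles first. A minimal sketch, assuming the common approximation of 660 g/mol per base pair of double-stranded DNA and a hypothetical 15 µM adapter stock:

```python
# Average molar mass of one base pair of double-stranded DNA (a common approximation).
AVG_BP_MASS_G_PER_MOL = 660.0


def dsdna_pmol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) of a given length (bp) to picomoles."""
    # pmol = ng * 1e-9 g / (bp * 660 g/mol) * 1e12 pmol/mol = ng * 1e3 / (bp * 660)
    return mass_ng * 1e3 / (length_bp * AVG_BP_MASS_G_PER_MOL)


def adapter_volume_ul(insert_ng, insert_length_bp, adapter_stock_um, ratio=10.0):
    """Volume of adapter stock (µL) giving `ratio` adapters per insert molecule."""
    adapter_pmol = ratio * dsdna_pmol(insert_ng, insert_length_bp)
    return adapter_pmol / adapter_stock_um  # 1 µM = 1 pmol/µL


print(round(dsdna_pmol(100, 300), 3))                                # 0.505 pmol of inserts
print(round(adapter_volume_ul(100, 300, adapter_stock_um=15.0), 3))  # 0.337 µL of adapter
```

Because shorter fragments mean more molecules per nanogram, the same mass of a more finely sheared sample needs proportionally more adapter.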
Library quantification
This is an important step because it determines how many molecules are ready to be sequenced in a sample, which is key to obtaining high quality NGS data. A variety of techniques can be used to determine the amount of nucleic acid in an NGS library, such as UV absorption, intercalating dyes, hydrolysis probes and droplet digital PCR.
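Sequencers are usually loaded at a defined molar concentration, so a mass-based measurement (ng/µL) is typically converted to nM using the average fragment length. The sketch below uses the same 660 g/mol per base pair approximation and a C1V1 = C2V2 dilution; the 4 nM target is only an example, as loading concentrations depend on the platform.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate molar mass per base pair of dsDNA


def library_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a library concentration from ng/µL to nM using its mean fragment length."""
    # ng/µL = 1e-3 g/L; divide by molar mass (g/mol) and scale mol/L to nmol/L
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def dilution_volume_ul(stock_nm, target_nm, final_volume_ul):
    """C1 * V1 = C2 * V2: volume of stock needed to make `final_volume_ul` at `target_nm`."""
    if target_nm > stock_nm:
        raise ValueError("library is too dilute to reach the target concentration")
    return target_nm * final_volume_ul / stock_nm


stock = library_nm(2.0, 400)                          # 2 ng/µL library, 400 bp mean size
print(round(stock, 2))                                # 7.58 nM
print(round(dilution_volume_ul(stock, 4.0, 20.0), 2))  # 10.56 µL of library in 20 µL total
```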
Library complexity
It is important to obtain the highest possible complexity in an NGS library, because this reduces bias. Library complexity refers to the number of unique DNA fragments present; in other words, the library should reflect the starting material as closely as possible. Reductions in complexity usually result from PCR amplification during library preparation, which increases the number of duplicate reads. Shorter fragments also align less specifically and so further decrease the effective complexity of a sample. As NGS technology evolves, sample requirements are likely to become less strict and starting materials to require less amplification, which would improve library complexity.
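One way to put a number on library complexity is the Lander-Waterman model used by tools such as Picard EstimateLibraryComplexity, which relates the number of unique reads observed (C) to the total reads (N) and the number of unique molecules in the library (X) by C/X = 1 - exp(-N/X). The sketch below solves this equation for X by bisection; it assumes every molecule is equally likely to be sequenced, which real libraries only approximate.

```python
import math


def expected_unique(library_size, total_reads):
    """Lander-Waterman expectation of distinct molecules seen after `total_reads` reads."""
    return library_size * (1.0 - math.exp(-total_reads / library_size))


def estimate_library_size(total_reads, unique_reads, rel_tol=1e-9):
    """Solve C = X * (1 - exp(-N / X)) for X by bisection."""
    if not 0 < unique_reads < total_reads:
        raise ValueError("need 0 < unique_reads < total_reads (at least one duplicate)")
    low = high = float(unique_reads)  # the library can never hold fewer molecules than were seen
    while expected_unique(high, total_reads) < unique_reads:
        high *= 2.0  # expand until the root is bracketed
    while high - low > rel_tol * high:
        mid = (low + high) / 2.0
        if expected_unique(mid, total_reads) < unique_reads:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


total = 1_000_000
unique = round(expected_unique(1_000_000, total))  # simulate a library of one million molecules
print(unique)                                       # 632121
print(round(estimate_library_size(total, unique)))  # close to 1000000
```

A library size estimate far above the number of reads sequenced suggests more sequencing would still yield new molecules; an estimate close to it suggests the library is nearly exhausted.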
Tagmentation is an alternative method for library preparation. It uses an engineered transposase enzyme to fragment the DNA and add specific adapters to both ends of the fragments at the same time. It therefore improves on traditional preparation procedures by combining DNA fragmentation, end-repair and adapter ligation into a single step. However, this method is much more sensitive to the amount of DNA input than other fragmentation methods, because the ratio of transposase to DNA determines the size of the resulting fragments.
There are different ways to prepare a library, depending on the sequencing platform and the planned analysis. The Sample Preparation Guide for MPS provides information about generating libraries for different types of sequencing platforms, including both short-read and long-read sequencing.
How to Generate an RNA Sequencing Library
RNA sequencing allows you to look at alternatively spliced transcripts, post-transcriptional modifications, gene fusions, mutations and changes in gene expression over time. The cost of RNA sequencing is continually falling, enabling more precise and thorough investigations of molecular biology. It has been used clinically to help determine optimal treatment strategies based on molecular alterations detected in cancers. It is thought that using RNA sequencing alongside DNA sequencing could become a powerful tool to help cancer patients, although this has not yet been implemented.
When setting up an RNA sequencing experiment, one of the first considerations should be which aspect of transcription is being investigated. Is gene expression being explored? Are the transcripts being characterized of high or low abundance? Is the aim to find out which strand each transcript was derived from? Answering these questions should help to determine what type of RNA sequencing library should be generated.
For example, if the objective of an RNA sequencing experiment is the discovery of complex transcriptional events, then the library should capture the entire transcriptome – coding, non-coding, antisense and intergenic RNAs. But if the aim is to study only the coding messenger RNA (mRNA) transcripts, the processes for library preparation will differ.
RNA library preparation
RNA library preparation tends to be specific to the sequencing platform. In all RNA sequencing experiments, RNA is isolated and converted into cDNA so that it can be read by the NGS platform; DNA is also more stable than RNA and can be amplified using DNA polymerases. Once the cDNA library has been constructed, the molecules are fragmented and amplified where appropriate, and adapters are added to each end of the fragments. A selection strategy may then be used to enrich the library for the type of RNA of interest.
rRNA is the most abundant component of total RNA isolated from human cells and tissues, making up as much as 90% of an RNA sample. It must be removed from total RNA before sequencing to allow efficient gene detection. There are two main approaches: the selection of polyadenylated (polyA) RNA using oligo(dT) probes, and depletion of rRNA through hybridization capture followed by magnetic bead separation. PolyA selection is used for most transcriptome studies because it requires only a low sequencing depth. Targeted depletion of rRNA is particularly useful when studying transcripts that lack a polyA tail, such as non-coding RNAs, or partially degraded transcripts.
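Because rRNA can make up most of total RNA, the share of reads left for other transcripts depends heavily on how much rRNA is removed. A back-of-the-envelope sketch, assuming reads are drawn in proportion to the RNA remaining after removal and that removal leaves other RNAs untouched (both simplifications); the read count and removal efficiency are hypothetical:

```python
def informative_reads(total_reads, rrna_fraction, removal_efficiency=0.0):
    """Expected non-rRNA reads after removing a fraction of the rRNA before sequencing."""
    remaining_rrna = rrna_fraction * (1.0 - removal_efficiency)
    other_rna = 1.0 - rrna_fraction
    # Reads are shared out in proportion to what is left in the library
    return total_reads * other_rna / (other_rna + remaining_rrna)


print(round(informative_reads(20_000_000, 0.9)))        # 2000000 with no rRNA removal
print(round(informative_reads(20_000_000, 0.9, 0.99)))  # 18348624 with 99% removal
```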
The accuracy with which particular RNA species are detected depends largely on how the library is constructed. Each stage of RNA sequencing library preparation can be adjusted to enhance the detection of certain transcripts, while limiting the ability to detect others. For example, one protocol modification to consider is the timing of RNA fragmentation: fragmenting RNA before cDNA synthesis reduces strand-specific bias and provides a more accurate estimate of transcript abundance. Other possible improvements include the use of unique molecular identifiers (UMIs) to detect PCR duplicates and better analysis of degraded RNA, such as that obtained from formalin-fixed paraffin-embedded (FFPE) blocks.
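UMIs are short random sequences attached to each molecule before amplification, so reads that share an alignment position but carry different UMIs can be recognized as separate original molecules. The minimal sketch below contrasts position-only deduplication with UMI-aware counting. The read tuples are hypothetical, and dedicated tools additionally correct sequencing errors within UMIs, which this sketch does not.

```python
def count_molecules(reads):
    """Return (molecules by position only, molecules by position plus UMI).

    Each read is a (chrom, 5' position, strand, umi) tuple.
    """
    by_position = {(chrom, pos, strand) for chrom, pos, strand, _umi in reads}
    by_position_and_umi = set(reads)
    return len(by_position), len(by_position_and_umi)


reads = [
    ("chr1", 100, "+", "AACG"),
    ("chr1", 100, "+", "AACG"),  # PCR copy of the same molecule
    ("chr1", 100, "+", "TTGC"),  # same position, different UMI: a distinct molecule
    ("chr1", 200, "-", "AACG"),
]
print(count_molecules(reads))  # (2, 3)
```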
Single cell RNA sequencing
Bulk RNA sequencing of large numbers of cells does not allow detailed assessment of a single cell, or of the individual nuclei that package the genome. Single cell RNA sequencing is a relatively new field: the first single cell RNA sequencing study was published in 2009. Since then, there has been growing interest in conducting similar studies, and a number of vendors now produce kits for single cell RNA sequencing, such as Illumina, ThermoFisher, Cellecta and New England Biolabs.
Single cell RNA sequencing is being used more frequently because most transcripts are present in multiple copies in every cell, and single cell RNA sequencing costs much less than whole genome sequencing. Assessments of transcriptional differences between individual cells have been used to identify rare cell populations that may have remained undetected in pooled analyses. For example, malignant cells can now be detected within a tumor mass, and cells that are each unique, like T-lymphocytes that express highly diverse T-cell receptors, can be examined individually.
The single cell RNA library preparation procedure consists of isolating single cells and disrupting them to capture as many RNA molecules as possible. Primers are often used to enhance the capture of a specific RNA species, which is then converted into cDNA by a reverse transcriptase. The extremely small amount of cDNA needs to be amplified by PCR, which may introduce bias.
Currently, most of the costs associated with RNA sequencing are linked to cDNA preparation, but these are likely to follow sequencing prices downward as RNA sequencing becomes more popular. Reduced costs will likely drive the trend towards examining larger numbers of individual cells in each study.
To find out more about the applications of single cell analysis and the implementation of the latest technologies, check out the Biology at High Resolution report. It reviews in depth the advances of single cell methods in biomedical research and proteomics, and discusses the challenges faced in single cell data analysis and the lessons learned from its implementation.
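Many single cell protocols tag each cDNA molecule with a cell barcode and a UMI, so that reads can be grouped by cell and collapsed to molecules. The sketch below assumes a read layout of a 16-base cell barcode followed by a 12-base UMI (similar to some commercial chemistries, but treat both lengths as assumptions) and gene labels that have already been assigned by aligning the second read. It counts distinct UMIs per cell and gene; the barcodes and genes in the example are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 16  # assumed cell-barcode length
UMI_LEN = 12      # assumed UMI length


def parse_read1(seq):
    """Split read 1 into (cell barcode, UMI) under the assumed layout."""
    return seq[:BARCODE_LEN], seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]


def count_matrix(records):
    """records: iterable of (read1 sequence, gene). Returns {(cell, gene): distinct UMIs}."""
    umis = defaultdict(set)
    for read1, gene in records:
        cell, umi = parse_read1(read1)
        umis[(cell, gene)].add(umi)
    return {key: len(molecules) for key, molecules in umis.items()}


cell_a, cell_b = "A" * 16, "T" * 16  # hypothetical cell barcodes
records = [
    (cell_a + "C" * 12, "GAPDH"),
    (cell_a + "C" * 12, "GAPDH"),  # same UMI: counted once
    (cell_a + "G" * 12, "GAPDH"),
    (cell_b + "C" * 12, "ACTB"),
]
print(count_matrix(records))
```

Counting UMIs rather than reads is what limits the impact of the PCR bias mentioned above on the final expression values.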
An Overview of Targeted Sequencing
Targeted sequencing is a broad category that includes any technique focused on specific genes, from whole exome sequencing to small gene panels. Targeted gene sequencing produces a smaller and more manageable dataset, making analysis easier. Key genes of interest can be sequenced to a high depth, enabling the identification of rare variants. This usually makes targeted sequencing a cost-effective approach for studying disease-related genes.
Targeted gene panels have become popular in mainstream clinical care due to their relative affordability and focused application. They have been developed for studying many aspects of cancer, such as monitoring somatic changes and exploring the landscape of genetic aberrations to identify novel therapies or repurpose existing ones. These days, targeted gene panels are also being produced for liquid biopsies. These are non-invasive tests that can reflect a tumor's mutations in near real time and hold promise for monitoring cancer initiation or relapse.
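The depth advantage of targeted sequencing follows from simple arithmetic: mean depth is roughly (reads × read length) / target size. The sketch below compares the same hypothetical sequencing output spread over a whole human genome (roughly 3.1 billion bases) and over a hypothetical 500 kb panel with an assumed 70% of reads landing on target.

```python
def mean_depth(num_reads, read_length_bp, target_size_bp, on_target_fraction=1.0):
    """Average sequencing depth: bases landing on the target divided by target size."""
    return num_reads * on_target_fraction * read_length_bp / target_size_bp


reads, read_len = 10_000_000, 150  # hypothetical sequencing output
print(round(mean_depth(reads, read_len, 3_100_000_000), 2))  # 0.48x across a whole genome
print(round(mean_depth(reads, read_len, 500_000, 0.7)))      # 2100x across a 500 kb panel
```

Mean depth hides unevenness: some targeted regions will be covered far less than the average, which is why coverage uniformity is usually reported alongside it.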
For further reading, the Liquid Biopsy report provides additional information about transformative technologies in the field.
What is target enrichment?
Generating libraries for targeted DNA analysis requires an extra step – target enrichment. It can be achieved through a variety of techniques, depending on several factors such as cost, ease-of-use and reproducibility.
Some key methods of enrichment are:
Hybridization capture-based target enrichment – DNA is fragmented and prepared for sequencing as normal, by adding adapters and barcodes. The DNA is then hybridized to biotinylated single-stranded probes, and the probe-bound fragments are recovered using streptavidin magnetic beads. This enrichment method has many advantages, including its scalability, the retention of start-stop codons and the ability to detect duplicates. However, it requires multiple amplification steps before sequencing, creating a lengthy and complex workflow that incurs high costs.
Amplicon-based target enrichment – Primers are used to amplify specific fragments of interest, enabling several regions to be targeted simultaneously while needing only a limited amount of DNA input. Multiple products can be generated in one reaction using multiplex PCR. PCR-based enrichment methods may not be ideal for targeting very large genomic regions because of the cost of primers and reagents, on top of the large DNA input amounts required.
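The two enrichment strategies can be illustrated with a toy Python sketch. `tile_probes` lays out capture probes across a target region (the 120-base probe length and tiling density are assumed example values), and `in_silico_amplicon` predicts the product of a primer pair by exact matching on one strand, ignoring mismatches and melting temperature. Both functions are hypothetical simplifications, not design tools.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(target_start, target_end, probe_len=120, tiling=1.0):
    """Start positions of capture probes tiled across a half-open target [start, end)."""
    step = max(1, int(probe_len / tiling))  # 2x tiling halves the spacing between probes
    span = target_end - target_start
    if span <= probe_len:
        return [target_start - (probe_len - span) // 2]  # one probe centred on a short target
    n_probes = math.ceil((span - probe_len) / step) + 1
    starts = [target_start + i * step for i in range(n_probes)]
    starts[-1] = target_end - probe_len  # last probe ends exactly at the target end
    return starts


def in_silico_amplicon(template, forward_primer, reverse_primer):
    """Return the predicted PCR product on `template`, or None if either primer site is missing."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)  # where the reverse primer anneals, on this strand
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


print(tile_probes(1000, 1400))                   # [1000, 1120, 1240, 1280]
print(len(tile_probes(1000, 1400, tiling=2.0)))  # 6 probes at 2x tiling
template = "TTTACGTAGGCCATTGACCGTTAGCAAAA"       # hypothetical target sequence
print(in_silico_amplicon(template, "ACGTAG", "GCTAACG"))  # ACGTAGGCCATTGACCGTTAGC
```

In a multiplex amplicon panel, the same check would be run for every primer pair against every target, which hints at why primer design gets harder as panels grow.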
Currently, the target enrichment methods that are typically used can be complex and lengthy – sometimes workflows can take multiple days to complete. Novel target enrichment methods are continually being developed to increase the efficiency of library preparation for targeted sequencing.
Molecular inversion probes (MIPs) – These allow adjustable targeting of specific regions of the genome using a single-stranded DNA probe with two arms complementary to the sequences flanking the target, joined by a linker loop. This loop ensures that the two arms bind close to one another, reducing the chance of off-target binding. A polymerase then fills the gap between the arms by copying the target region, and a ligase closes the circle; the circularized probes can then be amplified by PCR. There are only four steps to the MIP process, making the workflow much simpler than other target enrichment methods. It is also easy to automate and is readily scalable.
Diagram of an MIP – the single strand of DNA is joined by a linker loop (orange) between the probes (light blue). Image credit: T. Au Yeung, 2010
Linked Target Capture (LTC) has also been designed to reduce hybridization time to less than eight hours, by replacing lengthy existing methods with a combined ‘target-capture-PCR’ workflow.
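The gap-filling step of an MIP can be modelled as finding where the two probe arms bind and copying the sequence between them. In the toy sketch below, the arms are written in the same orientation as the template and matched exactly, which ignores real antiparallel hybridization; the function name, sequences and maximum gap length are hypothetical.

```python
def mip_gap_fill(template, left_arm, right_arm, max_gap=200):
    """Return the sequence a polymerase would copy between the two arms, or None.

    Simplification: both arms are given in the template's orientation and must match exactly.
    """
    left = template.find(left_arm)
    if left == -1:
        return None  # first arm cannot bind
    gap_start = left + len(left_arm)
    right = template.find(right_arm, gap_start)
    if right == -1 or right - gap_start > max_gap:
        return None  # second arm missing, or too far away for gap filling
    return template[gap_start:right]


template = "GGGATCCAAGTTTCCCGGGAAATTT"  # hypothetical target region
print(mip_gap_fill(template, "ATCCA", "CCCGG"))             # AGTTT
print(mip_gap_fill(template, "ATCCA", "CCCGG", max_gap=3))  # None: arms too far apart
```

Requiring both arms to bind within a short distance is the property that gives MIPs their specificity in this model.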
Resources for Sample Preparation
Here are some further reading resources relating to sample preparation for NGS:
Advances in NGS technologies that will drive greater sequence output and higher sequence accuracy are inevitable. The opportunities these will provide for biomedical research are hugely exciting, and at the start of all these processes is sample preparation: no matter which sequencing platform is used or what the sequencing is for, sample preparation remains the foundation of every successful experiment.
Don’t forget to follow Front Line Genomics for more information about how genomics is being used to benefit patients.
Image credit: Nucleus Biotech