5 recommendations for improving rigour and reproducibility in single-cell genomics

An opinion article by Professor Gregory Gibson of the Georgia Institute of Technology, director of the Center for Integrative Genomics, offers suggestions for improving rigour and reproducibility in single-cell genomic studies.

Single-cell genomics is a powerful tool for understanding the properties of tissues and organs in health and disease, but that power cannot be realised without the parallel emergence of powerful pipelines and algorithms for analysing large datasets. Gibson’s central thesis, published in PLOS Genetics, is that training in the underlying statistical foundations of single-cell genomics is not widely available and that this, combined with a culture that favours under-reporting of the effect of analytical decisions, is contributing to over-confidence in the technique.

He notes that there is good reason to be confident that the majority of findings in the literature are correct, but says “I am concerned about a publishing culture that often glosses over the uncertainty that is necessarily intrinsic to [the] analysis of what are among the most complex and voluminous datasets ever produced.”

He offers perspectives on five areas where robustness can be improved.

1. Reproducibility

The single-cell field has failed to establish expectations for reproducibility that are commonplace in other areas of genomics. In the absence of independently obtained datasets, two approaches to enhance reproducibility should be encouraged by journals.

First, investigators should endeavour to have key findings confirmed by an independent team of analysts provided with the same dataset and bioinformatic objective.

Second, the principles of cross-validation should be adopted as a routine component of bioinformatics. Such approaches have quickly become standard in the field of polygenic risk assessment and could be implemented readily in single-cell genomics.
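As a rough illustration of the cross-validation idea, the sketch below splits a set of cells into k non-overlapping folds so that a finding derived on some folds can be checked on the held-out one. The function names and the choice to split by cell index are assumptions made for this example; the opinion article does not prescribe a specific scheme, and splitting by donor may be more appropriate in some designs.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Split indices 0..n_items-1 into k shuffled, non-overlapping folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list keeps fold sizes within one of each other.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_items, k, seed=0):
    """Yield (train, test) index lists, holding out each fold exactly once."""
    folds = k_fold_indices(n_items, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test
```

In use, an analyst might identify marker genes or clusters on each training set and report how often the same result is recovered in the corresponding held-out cells.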

2. Clustering

One of the key steps in single-cell analysis is the first post-quality-control step of assigning cells to clusters; this is acknowledged as a major source of irreproducibility. To improve reproducibility and build consensus on what evidence is required to identify a set of cells as a distinct cluster, three standards should be adopted by the community.

First, transparent reporting (for example, in the Methods section of papers) of the criteria used to define clusters and the pipeline used.

Second, reporting the robustness of cluster assignments in the form of a reproducibility metric.

Third, only including those cells that repeatedly cluster together as the core cells of each cluster for downstream analysis.
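One way to picture the second and third standards is sketched below. It assumes the clustering has already been repeated several times (for instance with different seeds or subsamples) and scores each cell by how often it lands with the majority of its original cluster. The function name, the majority-matching rule and the 0.8 threshold are all assumptions for illustration, not a metric proposed in the article.

```python
from collections import Counter


def core_cells(reference, reruns, threshold=0.8):
    """Return per-cell stability scores and the indices of 'core' cells.

    reference: cluster label per cell from the primary analysis.
    reruns: list of label lists from repeated clusterings of the same cells;
            rerun labels do not need to match the reference labels.
    """
    n = len(reference)
    hits = [0] * n
    for labels in reruns:
        # For each reference cluster, find the rerun label most of its cells received.
        majority = {}
        for ref in set(reference):
            counts = Counter(labels[i] for i in range(n) if reference[i] == ref)
            majority[ref] = counts.most_common(1)[0][0]
        # A cell counts as reproduced if it followed its cluster's majority.
        for i in range(n):
            if labels[i] == majority[reference[i]]:
                hits[i] += 1
    stability = [h / len(reruns) for h in hits]
    core = [i for i, s in enumerate(stability) if s >= threshold]
    return stability, core
```

A limitation of this simple rule is that two reference clusters merging into one rerun cluster would both still score as stable, so a real analysis would likely pair it with a check on the number of clusters recovered.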

3. Evaluation of significance

It is widely appreciated that hypothesis testing in single-cell data is fraught with inconsistency, and is often linked to inflated test statistics.

Confidence in p-values can be misplaced in the context of single-cell genomic data. A shift to more consideration of variance components would be beneficial. To this end, applications such as variancePartition and Principal Variance Component Analysis that are available for bulk RNAseq data will need to be further developed for single-cell datasets.
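To make the notion of a variance component concrete, the sketch below estimates what share of a gene's expression variance is attributable to a grouping factor such as donor or batch, using a simple method-of-moments estimate from a balanced one-way layout. This is a textbook calculation chosen for illustration; it is not the variancePartition or Principal Variance Component Analysis implementation, and the function name is hypothetical.

```python
from statistics import mean, variance


def between_group_fraction(groups):
    """Estimate the share of variance attributable to the grouping factor.

    groups: dict mapping a group name (e.g. a donor or batch) to a list of
    expression values for one gene. Assumes a balanced design: every group
    has the same number of values, and at least two.
    """
    sizes = {len(values) for values in groups.values()}
    if len(groups) < 2 or len(sizes) != 1 or min(sizes) < 2:
        raise ValueError("need >= 2 balanced groups with >= 2 values each")
    n = sizes.pop()
    # Mean squares from a one-way analysis of variance.
    ms_between = n * variance([mean(values) for values in groups.values()])
    ms_within = mean([variance(values) for values in groups.values()])
    # Method-of-moments estimate, truncated at zero if it comes out negative.
    var_between = max(0.0, (ms_between - ms_within) / n)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0
```

A fraction near 1 would suggest that most of the variation tracks the grouping factor rather than cell-to-cell differences within groups, which is the kind of information a p-value alone does not convey.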

4. Covariate adjustment

Most single-cell genomic datasets derive from a few individual donors, so it is common practice for investigators to pool cells from multiple individuals without considering the random effect of the donor.

Existing single-cell analysis pipelines were not designed to handle random effects, so advances in statistical methodology are needed. Such tools will help optimise experimental design and encourage attention to the impact of random individual sample variation. They will also help investigators decide on sequencing depth, cell number and sample size.
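The effect of ignoring the donor can be illustrated with a small sketch. It compares a Welch t statistic computed on pooled cells with one computed on per-donor averages (often called pseudobulk aggregation). Pseudobulk is used here only as a simple stand-in; it is not the mixed-model methodology the article calls for, and the function names and toy numbers are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def pooled_cells(cells_by_donor):
    """Flatten all cells, ignoring which donor they came from."""
    return [x for values in cells_by_donor.values() for x in values]


def pseudobulk(cells_by_donor):
    """Collapse each donor to a single average expression value."""
    return [mean(values) for values in cells_by_donor.values()]


# Toy data (hypothetical): donors differ far more than cells within a donor.
cases = {"d1": [5.0, 5.1, 4.9, 5.0], "d2": [3.0, 3.1, 2.9, 3.0]}
controls = {"d3": [4.0, 4.1, 3.9, 4.0], "d4": [2.0, 2.1, 1.9, 2.0]}

t_cells = welch_t(pooled_cells(cases), pooled_cells(controls))
t_donors = welch_t(pseudobulk(cases), pseudobulk(controls))
```

In this toy setting the cell-level statistic is larger than the donor-level one because pooling treats eight correlated cells as eight independent observations, whereas there are only two donors per group.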

5. Normalisation

The critical role that normalisation plays in downstream inference is underappreciated in single-cell genomics, yet detection of differential expression is strongly dependent on the mode of normalisation. For this reason, it is good practice for analysts to pursue multiple strategies of normalisation.
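A small sketch can show why the choice matters. It normalises the same two cells in two ways, by total counts and by the median of non-zero counts, and compares the apparent change in one gene. Both schemes and the toy counts are assumptions picked for illustration; they are not the strategies named in the opinion article.

```python
from statistics import median


def scale_by_total(counts):
    """Divide each count by the cell's total count."""
    total = sum(counts)
    return [c / total for c in counts]


def scale_by_median(counts):
    """Divide each count by the median of the cell's non-zero counts."""
    size = median(c for c in counts if c > 0)
    return [c / size for c in counts]


# Toy cells (hypothetical): identical except for one very highly expressed gene.
cell_a = [10, 10, 10, 10]
cell_b = [10, 10, 10, 970]

# Apparent change in the first gene (cell_b relative to cell_a) under each scheme.
ratio_total = scale_by_total(cell_b)[0] / scale_by_total(cell_a)[0]
ratio_median = scale_by_median(cell_b)[0] / scale_by_median(cell_a)[0]
```

Under total-count scaling the first gene appears strongly reduced in the second cell, while under median scaling it appears unchanged, even though its raw count is identical in both cells.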

Reviewers and editors should be encouraged to promote analytical diversity. Such diversity should be regarded as a sign of careful and thoughtful bioinformatics, and journals should not require the use of a single standard pipeline.

Conclusions

Overall, Gibson argues that publications need to report analytical decisions more fully. Without this, single-cell profiling is likely to suffer a reproducibility crisis. He hopes that implementing some of his recommendations will encourage more collaboration between biologists and statisticians, and that it will “nudge the field toward more acceptance of the ambiguity in single-cell genomic interpretation as a consequence of the complexity of the datasets”.

Written by Charlotte Harrison, Science Writer