A new study introduces PRECAST, a data integration method for multiple spatial transcriptomics datasets. Spatially resolved transcriptomics is a set of emerging technologies that profile gene expression in tissues while recording the physical location of each measurement. However, current data integration methods tend to focus on single-cell RNA-seq datasets and do not consider spatial information.
PRECAST aims to fill this gap by unifying spatial factor analysis with spatial clustering and embedding alignment in a single model, while requiring only partially shared cell/domain clusters across datasets. The study shows that PRECAST is computationally scalable and applicable to spatial transcriptomics datasets from different platforms, and the method has been tested on simulated data and on four real datasets. The results show improved cell/domain detection and clearer visualization, and the estimated aligned embeddings and cell/domain labels facilitate many downstream analyses.
The challenges of integrating spatial data
Spatial transcriptomics is a rapidly growing field that uses new technologies to study gene expression profiles while retaining information on physical location within the tissue. These technologies include in situ hybridization (ISH) methods such as MERFISH, seqFISH, and seqFISH+, as well as in situ capturing technologies like ST, HDST, Slide-seq, and 10x Genomics Visium. Spatial transcriptomics allows researchers to explore the interactions between cells and their local environment, identify genes with spatial variations, and track changes in gene expression over time.
As in single-cell RNA-sequencing (scRNA-seq), spatial transcriptomics studies require the identification of cell/domain clusters, here using both spatial and expression information. However, when analysing multiple spatial datasets from different slides or conditions, it is important to remove unwanted variations caused by batch effects or biological differences.
Many current data integration methods for scRNA-seq do not consider spatial information, and most existing methods for spatial data integration are limited to a low-dimensional space built from the principal components of conventional dimension reduction. The researchers developed PRECAST as a rigorous method for integrating multiple spatial datasets: it estimates shared embeddings of biological effects, aligns those embeddings, and clusters the aligned embeddings to obtain spatially smooth cell/domain clusters across datasets.
How PRECAST works
PRECAST is a unified and principled probabilistic model that simultaneously estimates embeddings for cellular biological effects, performs spatial clustering, and aligns the estimated embeddings across multiple tissue sections. This is a key advantage over existing data integration methods, which often perform these tasks sequentially. Furthermore, PRECAST uses a conditional autoregressive (CAR) model to account for the local microenvironments of neighbouring spots: an intrinsic CAR component promotes spatial smoothness in the observed expressions of spatially resolved transcriptomics (SRT) data, a feature not found in other methods.
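To make the neighbourhood idea concrete, here is a minimal Python sketch of the conditional mean in a generic intrinsic CAR model, where each spot's value is expected to equal the average of its neighbours' values. It is not PRECAST's implementation: the function names `neighbor_lists` and `car_conditional_mean`, the distance-based neighbourhood, and the toy 3x3 grid are all assumptions made for illustration.

```python
import numpy as np

def neighbor_lists(coords, radius=1.0):
    """For each spot, return the indices of other spots within `radius`.

    Hypothetical helper: a plain distance threshold stands in for
    whatever neighbourhood definition a real platform would use.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A spot is never its own neighbour.
    adj = (dist <= radius + 1e-9) & ~np.eye(len(coords), dtype=bool)
    return [np.flatnonzero(row) for row in adj]

def car_conditional_mean(values, neighbors):
    """Intrinsic-CAR conditional mean: the average over each spot's neighbours.

    Spots without neighbours keep a conditional mean of zero.
    """
    values = np.asarray(values, dtype=float)
    out = np.zeros_like(values)
    for i, nb in enumerate(neighbors):
        if len(nb) > 0:
            out[i] = values[nb].mean(axis=0)
    return out

# Toy example: a 3x3 grid of spots with one latent value per spot.
coords = [(r, c) for r in range(3) for c in range(3)]
values = np.arange(9.0)
neighbors = neighbor_lists(coords, radius=1.0)
cond_mean = car_conditional_mean(values, neighbors)
print(cond_mean[4])  # centre spot, neighbours 1, 3, 5, 7 -> 4.0
print(cond_mean[0])  # corner spot, neighbours 1, 3 -> 2.0
```

In a full CAR model the conditional variance also shrinks as the number of neighbours grows, so well-connected spots are pulled more strongly towards their local average; the sketch shows only the mean.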
In terms of applications, PRECAST can be used for visualization, trajectory analysis, spatial variation analysis (SVA), and differential expression (DE) analysis for combined tissue slices. Additionally, the method can further remove batch effects across multiple tissue slides based on housekeeping genes, making expression data comparable across cell/domain clusters. This is particularly useful when analysing SRT datasets from multiple tissue slides, where biological variations between cell/domain types are often confounded by factors related to the data generation process.
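The general idea behind housekeeping-gene correction can be sketched as follows. This is a generic illustration of regressing out unwanted factors estimated from genes assumed to carry no biological signal. It is not PRECAST's actual procedure, and the function name `remove_unwanted_variation`, the simulated data, and the choice of a single factor are assumptions.

```python
import numpy as np

def remove_unwanted_variation(expr, housekeeping_idx, n_factors=1):
    """Regress out factors estimated from housekeeping genes.

    expr: (spots x genes) matrix of normalized expression.
    housekeeping_idx: column indices of genes assumed to carry no
        biological signal, so their variation is treated as unwanted.
    """
    expr = np.asarray(expr, dtype=float)
    gene_means = expr.mean(axis=0)
    centered = expr - gene_means
    # Leading left singular vectors of the housekeeping block act as
    # per-spot estimates of the unwanted (batch-related) factors.
    u, _, _ = np.linalg.svd(centered[:, housekeeping_idx], full_matrices=False)
    factors = u[:, :n_factors]        # orthonormal columns
    loadings = factors.T @ centered   # least-squares fit for every gene
    return centered - factors @ loadings + gene_means

def group_gap(x, labels):
    """Mean absolute difference in per-gene means between two groups."""
    return np.abs(x[labels == 1].mean(axis=0) - x[labels == 0].mean(axis=0)).mean()

# Simulated toy data (assumption): two slides and two clusters, balanced.
rng = np.random.default_rng(0)
n_spots, n_genes, n_hk = 40, 30, 10
batch = np.repeat([0.0, 1.0], n_spots // 2)   # slide membership
cluster = np.arange(n_spots) % 2              # biological cluster
expr = rng.normal(scale=0.1, size=(n_spots, n_genes))
expr += 5.0 * batch[:, None]                  # batch shift on all genes
expr[:, n_hk:] += 2.0 * cluster[:, None]      # signal on non-housekeeping genes

corrected = remove_unwanted_variation(expr, np.arange(n_hk))
print(group_gap(expr[:, n_hk:], batch), group_gap(corrected[:, n_hk:], batch))
print(group_gap(corrected[:, n_hk:], cluster))
```

In this toy setting the gap between slides shrinks sharply after correction while the gap between clusters is largely preserved. That holds here because clusters are balanced across slides; when batch and biology are strongly confounded, a simple regression like this can also remove real signal.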
Overall, PRECAST is a potentially powerful new tool for data integration in the field of SRT, providing a unified and principled approach for aligning shared embeddings of biological effects while accounting for complex batch effects and/or biological effects between slides.