A new study, published in Nature Biotechnology, describes an artificial intelligence (AI) algorithm capable of identifying single diseased cells. The ability of this deep learning method to differentiate between closely related cell types shows its potential for use in personalised medicine.
Single-cell reference atlases
Large single-cell reference atlases, such as the Human Cell Atlas, map healthy cell types across the human body. They use single-cell and spatial genomics as well as computational techniques to discover which genes are switched on in an individual cell. Currently, researchers can map a query dataset onto an atlas to understand the influence of ageing and disease on cells.
However, current single-cell reference atlases are not perfect. The query dataset and the atlas data have typically been generated in different laboratories with different experimental techniques, which often introduces systematic measurement differences (batch effects). Data integration methods can remove these effects, but legal restrictions on data sharing can impede their use. In addition, working with reference atlases requires a high level of computational resources and expertise.
The team behind the current study developed a deep learning strategy to overcome these challenges. Their method is able to map query datasets onto a reference more effectively than current methods.
Transfer learning preserves privacy
The researchers developed an AI algorithm called single-cell architectural surgery (scArches). scArches uses transfer learning to allow efficient mapping of query datasets onto references without the need to share raw data, which can be legally challenging.
“Instead of sharing raw data between clinics or research centres, the algorithm uses transfer learning to compare new datasets from single-cell genomics with existing references and thus preserves privacy and anonymity. This also makes annotating and interpreting of new data sets very easy and democratises the usage of single-cell reference atlases dramatically,” first author Mohammad Lotfollahi said.
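The transfer-learning idea described above can be illustrated with a small toy sketch. This is not the scArches network or its API: it swaps in a simple linear model (PCA) for the deep learning model, and every name and number in it is hypothetical. It only shows the general pattern of keeping the pretrained reference weights frozen and training a small set of query-specific parameters, so that the query site needs the reference model but not the reference's raw data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "reference atlas": 500 cells x 20 genes lying near a 3-dimensional subspace.
n_genes, n_latent = 20, 3
loadings = rng.normal(size=(n_latent, n_genes))
reference = (rng.normal(size=(500, n_latent)) @ loadings
             + 0.1 * rng.normal(size=(500, n_genes)))

# "Pretrained" reference model (assumption: PCA stands in for a deep model).
# Only ref_mean and components are shared with the query site.
ref_mean = reference.mean(axis=0)
_, _, vt = np.linalg.svd(reference - ref_mean, full_matrices=False)
components = vt[:n_latent]            # frozen weights, shape (n_latent, n_genes)
frozen_copy = components.copy()       # kept only to show the weights never change


def residuals(data, offset):
    """Part of each query cell that the frozen reference model cannot explain."""
    centred = data - offset - ref_mean
    return centred - centred @ components.T @ components


def reconstruction_error(data, offset):
    return float(np.mean(np.sum(residuals(data, offset) ** 2, axis=1)))


# Query dataset from another "lab": same biology plus a constant batch shift.
batch_shift = rng.normal(scale=2.0, size=n_genes)
query = (rng.normal(size=(200, n_latent)) @ loadings
         + 0.1 * rng.normal(size=(200, n_genes))
         + batch_shift)

# Fine-tune ONLY the query-specific offset by gradient descent.
# The gradient of the mean squared residual w.r.t. the offset is -2 * mean(residual).
offset = np.zeros(n_genes)
error_before = reconstruction_error(query, offset)
learning_rate = 0.1
for _ in range(200):
    gradient = -2.0 * residuals(query, offset).mean(axis=0)
    offset -= learning_rate * gradient
error_after = reconstruction_error(query, offset)

print(f"reconstruction error before: {error_before:.2f}, after: {error_after:.2f}")
```

In this sketch the fine-tuning loop never touches the `reference` array, only the shared model parameters, which mirrors the privacy argument made in the quote.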
The efficiency of scArches
To measure the efficiency of the algorithm, the team evaluated its performance against de novo integration methods. De novo data integration is not subject to these constraints, whereas scArches is designed to operate without sharing raw data and with limited computational resources. Despite this, the researchers found that scArches performed at a similar level to the de novo integration methods. It was also faster than the de novo strategies, mapping one million query cells in less than one hour.
In addition, the researchers found that scArches was able to differentiate between cell states that were transcriptionally similar. For example, natural killer and natural killer T cells are highly alike. Despite this, scArches was able to separate the two cell types into distinct clusters.
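Once query cells sit in the same latent space as the reference, one common way to check whether two similar cell types remain separable is to transfer labels from the nearest reference cells. The article does not say that the study used this exact procedure; the following is a minimal sketch under that assumption, with synthetic two-dimensional coordinates standing in for a learned latent space and all names hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent coordinates for two similar reference cell types.
ref_latent = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),   # "NK"
    rng.normal(loc=[3.0, 0.0], scale=0.5, size=(200, 2)),   # "NKT"
])
ref_labels = np.array(["NK"] * 200 + ["NKT"] * 200)

# Hypothetical query cells mapped into the same latent space.
query_latent = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[3.0, 0.0], scale=0.5, size=(50, 2)),
])
query_truth = np.array(["NK"] * 50 + ["NKT"] * 50)


def knn_label_transfer(ref_points, labels, query_points, k=15):
    """Give each query cell the majority label of its k nearest reference cells."""
    diffs = query_points[:, None, :] - ref_points[None, :, :]
    squared_dists = np.sum(diffs ** 2, axis=2)          # shape (n_query, n_ref)
    nearest = np.argsort(squared_dists, axis=1)[:, :k]  # indices of k nearest
    predicted = []
    for neighbours in nearest:
        names, counts = np.unique(labels[neighbours], return_counts=True)
        predicted.append(names[np.argmax(counts)])
    return np.array(predicted)


predicted = knn_label_transfer(ref_latent, ref_labels, query_latent)
accuracy = float(np.mean(predicted == query_truth))
print(f"label-transfer accuracy on the toy data: {accuracy:.2f}")
```

How well this works depends entirely on how well the latent space separates the two populations; here the clusters are placed far apart on purpose.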
Identification of COVID-19 cells
To explore whether scArches could be used to study COVID-19, the team applied scArches to several lung bronchial samples. The researchers used single-cell transcriptomics to compare lung cells of patients with COVID-19 to healthy lung cells. Despite biological variations between the patients, scArches was able to successfully separate diseased cells from healthy cells. These results suggest scArches could be applied to further investigate the effect of COVID-19 on human cells.
The AI algorithm developed in this study demonstrates the ability of transfer learning models to separate different cell states without sharing raw data and with limited computational resources. These types of models will allow for the wider use of single-cell reference atlases in the future.
Additionally, the team showed that scArches was able to identify specific diseased cells in patients with COVID-19. This finding demonstrates a potential role for scArches in disease diagnosis and personalised patient treatment. The researchers hope that, in the future, scArches can be applied to large disease reference datasets. This will allow for the assessment of similarities between diseases at the single-cell level.
Senior author Fabian Theis said:
“Our vision is that in the future we will use cell references as easily as we nowadays do for genome references. In other words, if you want to bake a cake, you usually do not want to try coming up with your own recipe — instead you just look one up in a cookbook. With scArches, we formalise and simplify this lookup process.”