The advent of single-cell genomics has led to the rapid generation of single-cell atlases. For example, the Human Cell Atlas is the world’s largest single-cell reference atlas, and it is still growing. These atlases contain huge amounts of data, including millions of cells across tissues, organs, and developmental stages. However, using them as tools can be challenging: single-cell datasets may contain measurement errors (batch effects), the global availability of computational resources is limited, and the sharing of raw data is often legally restricted.
Now, a group of researchers has developed single-cell architectural surgery (scArches), a deep learning strategy for mapping query datasets on top of a reference. The new tool enables efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. The strategy could help researchers understand the influences of aging, environment, and disease on a cell and, ultimately, diagnose and treat patients better.
This work is published in Nature Biotechnology in the paper, “Mapping single-cell data to reference atlases by transfer learning.”
“Instead of sharing raw data between clinics or research centers, the algorithm uses transfer learning to compare new datasets from single-cell genomics with existing references and thus preserves privacy and anonymity,” noted Mohammad Lotfollahi, the lead scientist behind the algorithm and a third-year PhD candidate in computational biology at the School of Life Sciences at the Technical University of Munich (TUM). “This also makes annotating and interpreting of new data sets very easy and democratizes the usage of single-cell reference atlases dramatically.” Lotfollahi is a student in the lab of Fabian Theis, PhD, director of the Institute of Computational Biology in Neuherberg, Germany.
Using examples from mouse brain, pancreas, immune, and whole-organism atlases, the authors showed that scArches “preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration.”
In addition, the researchers applied scArches to study COVID-19 in lung bronchial samples. They compared the cells of COVID-19 patients to healthy references using single-cell transcriptomics. The algorithm was able to separate diseased cells from the references and thus enabled the user to pinpoint the cells in need of treatment, for both mild and severe COVID-19 cases. Biological variation between patients did not affect the quality of the mapping process.
“Our vision is that in the future we will use cell references as easily as we nowadays do for genome references,” asserted Theis. In other words, “if you want to bake a cake, you usually do not want to try coming up with your own recipe—instead you just look one up in a cookbook. With scArches, we formalize and simplify this lookup process.”