New Tool Unifies Single-Cell Data
Published: 25 Feb. 2024    Source: Wellcome Trust Sanger Institute
A new methodology that allows for the categorisation and organisation of single-cell data has been launched. It can be used to create a harmonised dataset for the study of human health and disease. Researchers at the Wellcome Sanger Institute, the University of Cambridge, EMBL's European Bioinformatics Institute (EMBL-EBI), and collaborators developed the tool, known as CellHint. CellHint uses machine learning to unify data produced across the world, allowing it to be accessed by the wider research community, potentially driving new discoveries.
 
Single-cell genomics makes it possible to study every cell in the context of the human body at high resolution. A current challenge in assembling the diverse datasets produced by single-cell research is that there is no unified system for naming and organising the data. To address this, researchers from the Wellcome Sanger Institute and collaborators developed CellHint, which can unify the cell type annotations produced by independent laboratories. CellHint then places the data into a defined graph that shows the relationships between cell subtypes, giving a full picture of all the cells identified across different datasets. The researchers also applied CellHint to 38 datasets spanning 12 tissues, providing a deeply curated cross-tissue database of around 3.7 million cells. Each cell was annotated, meaning it was labelled with particular information. They also showed how CellHint can be used to create various models for automatic cell annotation across human tissues.
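
To make the idea of a harmonised label graph concrete, here is a toy Python sketch. It is not CellHint's method, which uses machine learning on the underlying data. The sketch assumes that pairs of matching labels are already known and simply groups them. The laboratory names, cell labels, and function names are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical matches between labels used by different laboratories.
# Each label is a (dataset, cell_type_name) pair; none of this is real data.
MATCHES = [
    (("lab_A", "CD4 T cell"), ("lab_B", "T helper cell")),
    (("lab_A", "B cell"), ("lab_B", "B lymphocyte")),
    (("lab_B", "T helper cell"), ("lab_C", "CD4+ T")),
]


def build_label_graph(matches):
    """Build an undirected graph whose nodes are dataset-specific labels."""
    graph = defaultdict(set)
    for a, b in matches:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def harmonised_groups(graph):
    """Return connected components: labels that appear to name the same cell type."""
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects every label reachable from `start`.
        queue = deque([start])
        seen.add(start)
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    for group in harmonised_groups(build_label_graph(MATCHES)):
        print(" = ".join(f"{dataset}:{name}" for dataset, name in group))
```

In this toy, the three T cell labels from different laboratories end up in one group and the two B cell labels in another. A real tool would have to infer such matches from the data rather than be given them.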
 
CellHint stands out from other tools because it makes full use of the often inconsistent but valuable cell annotation information from individual studies to achieve biologically driven data integration. Researchers are excited that, with CellHint, cells from independent laboratories can be re-annotated, and that the resulting information can be used to place each cell in contexts beyond the original study. They hope that this tool will greatly facilitate the reuse of molecular and cellular data across laboratories, potentially driving new discoveries in biology.