Bioinformatics: Researchers Develop a New Machine Learning Approach
Published: 01 Apr. 2024    Source: Helmholtz Centre for Infection Research
Researchers from the Würzburg Helmholtz Institute for RNA-based Infection Research and the Helmholtz AI Cooperative applied data integration and artificial intelligence (AI) to develop a machine learning approach that can predict the efficacy of CRISPR technologies more accurately than before. The genome, or DNA, of an organism contains the blueprint for proteins and orchestrates the production of new cells. To combat pathogens, cure genetic diseases or achieve other positive effects, researchers use molecular biological CRISPR technologies to specifically alter or silence genes and inhibit protein production.
 
One of these molecular biological tools is CRISPRi (from "CRISPR interference"). CRISPRi blocks genes and gene expression without modifying the DNA sequence. As with the CRISPR-Cas system, also known as "gene scissors," this tool involves a ribonucleic acid (RNA) that serves as a guide RNA to direct a nuclease (Cas). In contrast to gene scissors, however, the CRISPRi nuclease only binds to the DNA without cutting it. As a result of this binding, the corresponding gene is not transcribed and thus remains silent.

CRISPRi screens are a highly sensitive tool that can be used to investigate the effects of reduced gene expression. In their study, published today in the journal Genome Biology, the scientists used data from multiple genome-wide CRISPRi essentiality screens to train a machine learning approach. Supported by additional AI tools ("Explainable AI"), the team established comprehensible design rules for future CRISPRi experiments. The scientists were particularly surprised to find that the guide RNA itself is not the primary factor in determining CRISPRi depletion in essentiality screens. "Certain gene-specific characteristics related to gene expression appear to have a greater impact than previously assumed," explains Yu. The study also reveals that integrating data from multiple data sets significantly improves the predictive accuracy and enables a more reliable assessment of the efficiency of guide RNAs.
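
To make the distinction between gene-level and guide-level effects concrete, the following is a minimal Python sketch, not the study's actual model. It pools made-up depletion values from two hypothetical screens and subtracts each gene's average depletion from the average of each guide targeting it, so that what remains reflects the guide rather than the gene. The record layout, the names (`RECORDS`, `gene_adjusted_guide_scores`, `screen_A`, `geneX`, `g1`) and all numbers are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (screen, gene, guide, log2 fold change).
# More negative values mean stronger depletion. All values are made up.
RECORDS = [
    ("screen_A", "geneX", "g1", -4.0),
    ("screen_A", "geneX", "g2", -2.0),
    ("screen_A", "geneY", "g3", -1.0),
    ("screen_A", "geneY", "g4", -0.5),
    ("screen_B", "geneX", "g1", -5.0),
    ("screen_B", "geneX", "g2", -3.0),
    ("screen_B", "geneY", "g3", -1.5),
    ("screen_B", "geneY", "g4", -0.5),
]


def gene_adjusted_guide_scores(records):
    """Return each guide's mean depletion minus the mean depletion of its gene.

    Measurements from all screens are pooled before averaging. This simple
    centering is an illustrative stand-in for a model that separates gene
    effects from guide effects; it is not the method used in the study.
    """
    by_gene = defaultdict(list)
    by_guide = defaultdict(list)
    for _screen, gene, guide, lfc in records:
        by_gene[gene].append(lfc)
        by_guide[(gene, guide)].append(lfc)

    gene_mean = {gene: mean(values) for gene, values in by_gene.items()}
    return {
        guide: mean(values) - gene_mean[gene]
        for (gene, guide), values in by_guide.items()
    }


scores = gene_adjusted_guide_scores(RECORDS)
for guide, score in sorted(scores.items()):
    print(guide, score)
```

In this toy data, every guide targeting geneX is depleted more strongly than any guide targeting geneY, so raw depletion mostly reflects the gene. After centering, a negative score marks a guide that is depleted more than is typical for its gene, and averaging over both screens makes that estimate less dependent on any single experiment.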
 
The characteristics of targeted genes have a significant impact on guide RNA depletion in genome-wide screens. Combining data from multiple CRISPRi screens significantly improves the accuracy of prediction models and enables more reliable estimates of guide RNA efficiency. The study provides valuable insights for designing more effective CRISPRi experiments by predicting guide RNA efficiency, enabling precise gene-silencing strategies.