Comparative Analysis of Alternative Splicing Events in Foliar Transcriptomes of Potato Plants Inoculated with Phytophthora Infestans

Alternative splicing (AS) is a common process during gene expression of plants in coping various biotic or abiotic stresses. The work reports identification and analysis of AS events in foliar samples of two potato lines, including a wild type line and a pathogen resistant transgenic line (+RB), inoculated with Phytophthora infestans . After combining all RNA-seq data collected from 36 samples, a total of 10,246 AS events were identified, including 1,563 exon skipping, 1,368 alternative donor sites, 3,091 alternative acceptor sites, 884 intron retention, and 3,340 complex events, which consisted of more than one basic event. These AS events were generated from 45,874 isoform transcripts expressed from 13,704 genes. It was estimated 30.2% of genes undergoing AS in this analysis. Furthermore, we identified 406 specific AS events, which were generated from 281 genes, and 766 differentially expressed transcripts (DETs) in the sample collected 24 hours after inoculation of P. infestans in +RB lines. These DETs were expressed from 763 genes, and among them, 338 genes were alternatively spliced. These results indicate that both AS and differential gene expression may contribute to the resistance against P. infestans in +RB line of potato plants.

Potato (Solanum tuberosum L.) is one of the most important food crops. A number of genes with pre-mRNA subjecting to AS in potato have been reported. For example, under cold stress, an invertase gene generated two transcripts, with one of them having an exon skipping event (Anne-Sophie et al., 1996); BRANCHED1a (BRC1a) gene, which encodes a TCP transcription factor that controls lateral shoot outgrowth, generated two transcripts, with one of them retaining an intron (Nicolas et al., 2015); one of two StPPCK2 (phosphoenolpyruvate carboxylase kinase, PPCK) genes produced an isoform transcript with an intron retained (Marsh et al., 2003); and amongst 8 potato metacaspases (SotubMCs) genes, SotubMC2, SotubMC4, SotubMC6 and SotubMC7 genes produced multiple alternative spliced variants of different lengths (Dubey et al., 2019). The first draft potato genome represented approximately 86% of the 844-megabase genome with 39,031 protein coding genes identified from a homozygous doubled-monoploid potato clone (Potato Genome Sequencing Consortium, 2011). The genome sequence super scaffolds were further assembled into pseudomolecules corresponding to 12 potato chromosomes, representing 674 Mb of the 723 Mb genome assembly having 37,482 protein coding genes (Sharma et al., 2013). RNA sequencing (RNA-seq) is using next-generation sequencing (NGS) technology to sequence and quantify RNA transcripts of a transcriptome (Chu and Corey, 2012). Thus, the available genome sequences and RNA-seq data provide an unprecedented opportunity for genome-wide identification of AS events in potato plants.
Phytophthora infestans is an oomycete, a fungus-like microorganism that causes the serious potato disease known as late blight or potato blight. Gene RB cloned from Solanum bulbocastanum confers broad spectrum resistance to potato late blight (Song et al., 2003;Van Der Vossen et al., 2003). Using an RNA-seq approach, Gao and Bradeen (2016) collected the foliar transcriptomes in response to P. infestans inoculation of two lines of potato plants, including a wild type line and a transgenic line (+RB) which contains the disease resistance (RB) gene against P. infestans.Their analysis focused on the identification and analysis of differentially expressed genes (DEGs). Because AS in plants is a common process in development and growth regulation, and in coping various biotic and abiotic stresses, including plant defense against pathogens (Gassmann, 2008), we used RNA-seq data generated by Gao and Bradeen (2016) to examine AS events and differential isoform expressions in foliar transcriptomes of the two contrasting lines of potato plants inoculated with P. infestans. This work is part of our efforts aiming at integrating multiple data resources to generate a catalog of genes subjecting to AS in potato plants.

Genome and RNA-seq data collection
The genome sequence and annotation files (version 4.03) were downloaded from the Phytozome database (https://phytozome-next.jgi.doe.gov/) (Sharma et al., 2013). RNA-seq data from the NCBI Sequence Read Archive (accession number SRP073120) were downloaded (https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/) using SRA Toolkit. The dataset consisting of 36 transcriptomes from foliage samples were generated by Gao and Bradeen (2016). The foliar transcriptomes were collected from wild type (WT) and transgenic line (+RB) plants with each inoculated with either P. infestans or water and sampled at 0 (pre-inoculation), 6 and 24 hours-post inoculation with three bio-replicates. The details of the plant materials, RNA-preparation and sequencing were described by Gao and Bradeen (2016).

Transcripts functional annotation
The transcript sequences were retrieved using the gtf_to_fasta tool in the TopHat package (Kim et al., 2013). The annotation information was extracted from the annotation file for previously annotated transcripts (version 4.03), and new transcripts were functionally annotated, using BLASTX search against UniProt-Swiss-Prot database. The GTF file, transcript sequence file, and additional data can be downloaded at: http://proteomics.ysu.edu/publication/data/Potato/GAO2016/.

Mapping RNA-seq data to the genome
A total of 36 RNA-seq data were collected from NCBI SRA database which were generated from foliar samples by Gao and Bradeen (2016). The data varied from 7 to 27 million paired reads. These sequence data were mapped to the genome with a mapping rate varying from 29.4 to 77.5% and 1.9 to 5.6% reads mapped to two or more genomic loci, with a total of 547.8 million reads (53.1%) mapped to the reference genome. Among them 3.8% were mapped to multiple loci.

Identification of AS events and comparative analysis between two lines with or without pathogen treatments
AS events were identified from each of 36 RNA-seq data. Because both the numbers of raw reads and the mapping rates varied substantially among treatments, comparing the numbers of AS events would not provide a meaningful outcome. Intuitively, we expect that the number of AS events would have a strong positive correlation with the number of mapped reads, as more reads mapped to the genome would generate more transcript isoforms and AS events. The correlation analysis of mapped reads and the AS events shows a strong positive linear correlation with a correlation coefficient (R) of 0.97 regardless of the treatments ( Figure 1A). That is, 94% (R 2 ) of the differences among the numbers of AS events in different RNA-seq data can be explained by the numbers of mapped reads. Furthermore, we used cuffmerge tool to merge the mapping gtf files of the three replicates in each treatment for comparing the profile of AS events at different sampling times after inoculation. A trend similar to the individual mapping sample data was found, that is, the AS events in the combined data among treatments were highly positively correlated with the number of mapped reads (R 2 = 0.95) ( Figure 1B). This was also true when the data collected at different times were further combined, the numbers of mapped reads could explain 98% of the differences in the numbers of AS events in the four treatments (R 2 = 0.98). The categories of AS events in each treatment at different sampling times, merged data for all time points, and all merged data were listed in Table 1. The types of AS events were further classified into four basic types including intron retention (IR), alternative acceptor site (AA), alternative donor site (AD), and exon skipping (ES) (Sablok et al., 2011). Other events or complex events refer to two isoforms having two or more basic AS events. When individual sample data were analyzed, the average proportions were 26.0% AA, 23.8% IR, 15.7% ES, and 13.6% AD, respectively, though there were some variations among different samples. However, when we merged data of sample replicates, the average proportions were 29.9% AA, 18.8% ES, 15.9% AD, and 9.1% IR. The proportion of IR was greatly decreased in comparing with the individual RNA-seq data (Table 1). Similar trends were clearly Computational Molecular Biology 2023, Vol.13, No.1, 1-8 http://bioscipublisher.com/index.php/cmb observed when further combining data from different sampling times in each treatment (Table 1). We speculate that this is caused by cuffmerge, as this tool filters out a number of lowly expressed "transfrags" that are "probably artifacts." After the combination of all RNA-seq data generated in this project, a total of 10,246 AS events were identified, including 1,563 ES, 1,368 AD, 3091 AA, 884 IR, and 3,340 complex events (Table 1). These AS events were generated from 13,704 genes with 45,874 isoform transcripts. Since a total of 45,305 genes were identified using RNA-seq mapping, it was estimated 30.2% of genes undergoing AS in potato in this project. As the number of AS events is greatly dependent on the number of mapped reads, it cannot be used to directly detect the effects of treatments of +RB gene and P. infestans inoculation. Thus, we compared the profiles of AS events among treatments to examine the specific changes in AS events of treatment effects. The dynamic changes of AS events were clearly demonstrated in the data sampled at 0 h (pre-inoculation), 6 h and 24 h after inoculation in four different treatments (Figure 2). We carried more analysis on the transcriptomes collected at 24 h post-inoculation and identified 406 specific AS events only occurred in +RB line inoculated P. infestans (Figure 2). These AS events involved 899 transcripts generated from 281 genes including 32 newly identified genes in this work. The functions of these genes are diverse and some of them may be involved in the regulation of potato resistance against this pathogen.

Identification of differentially expressed transcript isoforms (DETs)
Detailed analysis of differentially expressed genes (DEGs) among the treatments were reported by Gao and Bradeen (2016). As there were 30.2% genes alternatively spliced in this study, identifying DETs, in addition to identifying DEGs, was expected to provide a more accurate understanding of the responses to the inoculation of P. infestans in the two contrasting potato lines. The numbers of DETs identified between two lines and treatment (water-vs P. infestans-inoculation) comparisons at three different sampling time points, using an FDR threshold of 0.05 (Table 2).  At 0 h (pre-inoculation), we would expect there were no DETs in comparison of water and P. infestans treatments in either line of potato plants. However, there were 325 DETs identified in +RB line between water treatment and P. infestans treatment. In addition, there were 537 DETs identified in comparison of two lines used for P. infestans inoculation even pre-inoculation, in contrast, there was no DET identified in these two lines used for water treatments. Such a large discrepancy between these two comparisons was likely caused by sampling errors or unknown factors, although in theory there should be no difference as these samples were collected prior to inoculation of P. infestans (Table 2).
Interestingly, at 6 h post-inoculation of P. infestans DETs were not detected in the +RB line inoculated with P. infestans in comparing with water treatment, however, 973 DETs were detected at 24 h after inoculation in +RB line, while only 182 DETs were detected in wild-type plants for the same treatments (Table 2). Thus, we focused on analyzing the DETs of samples at 24 h and identified 766 DETs specific in the +RB line inoculated with P. infestans in comparing water treatment by removing DETs commonly found in other treatments at 24 h samplings ( Figure 3). We compared these 766 DETs with those 325 DETs identified at 0 h in +RB line, 732 of them were only found in the 24 h treatments. Further we compared these 766 DETs with 537 DETs identified between two lines pre-inoculation of P. infestans (0 h), 700 DETs were only found in the +RB lines at 24 h post-inoculation. Thus, we expected these DETs most likely represented the true +RB line responses to P. infestans inoculations. Among these 766 DETs, 347 of them were down-regulated and 419 of them were up-regulated, 505 of them had an adjusted p-value < 0.01. These DETs were generated from 763 genes, and interestingly, 338 of them were alternatively spliced. The functional annotation and their expression values of DETs can be found in the supplementary files available on our website.

Discussion
Plant gene expression is a highly regulated process including both differential gene expressions and alternative splicing that results in differential transcript expression. Gao and Bradeen (2016) collected the foliar transcriptomes in response to P. infestans inoculation of two lines of potato plants and analyzed DEGs. In this work, we used their data and identified genes subjecting to AS and their associated isoform transcripts in these two contrasting lines of potato plants. Particularly, we identified 406 specific AS events that were generated from 281 genes that occurred in +RB line sampled at 24 h post inoculation with P. infestans. Furthermore, among 776 DETs, about half of them were generated from genes subjecting to AS. The RB gene isolated in Solanum bulbocastanum, a wild potato species, encodes a polypeptide of 970 aa and belongs to the CC-NBS-LRR class of plant resistance genes (Song et al., 2003;Van Der Vossen et al., 2003). Plant NBS-LRR proteins detect pathogen-associated proteins which are most often the effector molecules of pathogens responsible for virulence (DeYoung and Innes, 2006). Cao et al. (2021) used the transcriptome dataset generated by Gao et al. (2013) from potato tubers inoculated with P. infestans and found 2,857 lncRNAs which included a total of 2,491 lncRNAs generated from 173 alternatively spliced genes, with 133 differentially expressed lncRNAs from the resistance lines (Cao et al., 2021). These results suggest that both AS and differential gene expression may contribute to the resistance against P. infestans in +RB line of potato plants. These DETs certainly warrant further detailed experimental investigation regarding the roles they may play in the resistance against P. infestans.
A large number of AS events were identified in potato plants in this analysis and the AS rate was estimated to be about 30.2%. Previously, we reported that about~65% of AS rate in tomato plants by integrating multiple sources of mRNA sequence sources including more than 20 published RNA-seq projects (Clark et al., 2019). We would expect a similar AS rate in potato plants when more RNA-seq data are used in the AS identification process. We need to point out that the isoform sequences are derived from assembling fragments of RNA-seq data which may not be accurate, thus these sequences need to be confirmed using specific primers to obtain full-length mRNAs. Our current work is expected to serve as a starting point for future incorporation of more RNA-seq data for identifying more AS genes and their associated isoform sequences in potato plants.

Authors' contributions
XJM designed the experiments. JAL collected and processed RNA-seq data for genome mapping and alternative splicing identification. Both JAL and XJM analyzed the data and prepared the manuscript.