FunSecKB2: a fungal protein subcellular location knowledgebase  

John Meinken1,4, David K. Asch2,4, Kofi A. Neizer-Ashun3, Guang-Hwa Chang3, Chester R. Cooper Jr.2,4, Xiang Jia Min2,4
1. Department of Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555, USA
2. Department of Biological Sciences, Youngstown State University, Youngstown, OH 44555, USA
3. Department of Mathematics & Statistics, Youngstown State University, Youngstown, OH 44555, USA
4. Center for Applied Chemical Biology, Youngstown State University, Youngstown, OH 44555, USA
Computational Molecular Biology, 2014, Vol. 4, No. 7   doi: 10.5376/cmb.2014.04.0007
Received: 05 Aug., 2014    Accepted: 21 Sep., 2014    Published: 22 Oct., 2014
© 2014 BioPublisher Publishing Platform
This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:

Meinken et al., 2014, FunSecKB2: a fungal protein subcellular location knowledgebase, Computational Molecular Biology, Vol.4, No.6, 1-17 (doi: 10.5376/cmb.2014.04.0007)

Abstract

FunSecKB2 is an improved and updated version of the fungal secretome and subcellular proteome (i.e., protein subcellular location) knowledgebase. The fungal protein sequence data were retrieved from UniProtKB and consist of nearly 2 million entries, with 167 species having a complete proteome. The assignments of protein subcellular locations were based on curated information and on predictions from seven computational tools: SignalP, WoLF PSORT, Phobius, TargetP, TMHMM, FragAnchor, and PS-Scan. Secreted proteins, i.e., secretomes, along with 15 other subcellular proteomes were predicted. Users can search the database with several types of identifiers, gene names, or keywords. A subcellular proteome from a species can be searched or downloaded. BLAST searching against the whole fungal protein data or the secretomes is available. Community annotation of subcellular locations based on experimental evidence is also supported. A primary analysis revealed that the secretome size of a fungal species is one of the determining factors of its lifestyle. The Gene Ontology and protein domain analyses of fungal secretomes revealed that they contain a large number of hydrolases, peptidases, oxidoreductases, and lyases, which may have potential applications in bio-processing of chemical wastes or biofuel production. The database provides an important and rich resource for the fungal community looking for protein subcellular location information and performing comparative subcellular proteome analysis.
 

Database URL: http://proteomics.ysu.edu/secretomes/fungi2/index.php 
Keywords
Computational prediction; Fungi; Secreted protein; Secretome; Signal peptide; Subcellular location; Subcellular proteome

Fungi play important roles in nature and in our daily life. In nature, fungal species serve as decomposers of biomass, which is critical for carbon and nutrient cycling. In our daily life, edible mushrooms are well-known examples of fungi. Saccharomyces cerevisiae, known as baker’s yeast, is widely used in winemaking, baking, and brewing. Some fungi are also known as producers of drugs, such as antibiotics. Fungal species are also important pathogens of insects, animals, humans, and plants.

Fungi belong to one of the four kingdoms of eukaryotic organisms. Fungal cells contain multiple subcellular compartments that carry out different cellular activities. For example, a mitochondrion, which is a membrane-enclosed structure, mainly provides cellular energy, and a nucleus stores genetic material and is the site for controlling gene transcription. In this work, we define a secretome as all proteins secreted outside the plasma membrane in a species. These proteins include cell wall proteins, extracellular matrix proteins, and secreted soluble proteins that may serve as hormones, signal molecules, or enzymes. However, the proteins in the secretory pathway machinery were not included, which is slightly different from the original definition of a secretome (Tjalsma et al., 2000; Lum and Min, 2011a). Secreted proteins in biotrophic fungi are identified as the main effectors responsible for pathogenic or symbiotic interactions between plants and fungi (Girard et al., 2013). Saprophytic fungi secrete a large number of families of hydrolytic enzymes, such as glycoside hydrolases, for breaking down complex biomaterials like lignin and cellulose (Martinez et al., 2004; Martinez et al., 2009; Murphy et al., 2011). Recently, along with the complete genome sequencing of many fungi, the identification and analysis of fungal secretomes has been an important subject of research, using both computational and experimental approaches (Bouws et al., 2008).
For example, secretomes have been reported for the following fungi: Aspergillus niger (Tsang et al., 2009; Braaksma et al., 2010), Aspergillus fumigatus (Powers-Fletcher et al., 2011), Candida albicans (Lee et al., 2003; Ene et al., 2012), Doratomyces stemonitis C8 (Peterson et al., 2011), Fusarium graminearum (Paper et al., 2007; Brown et al., 2012), Irpex lacteus (Salvachúa et al., 2013), Magnaporthe oryzae (Jung et al., 2012), Mycosphaerella graminicola (Morais et al., 2012), Paracoccidioides (a complex of several phylogenetic species) (Weber et al., 2012), Penicillium echinulatum (Ribeiro et al., 2012), Phanerochaete chrysosporium (Wymelenberg et al., 2005), Sclerotinia sclerotiorum (Yajima and Kav, 2006), Trichoderma harzianum (Do Vale et al., 2012), and Ustilago maydis (Mueller et al., 2008).
Two fungal-specific secretome databases, the Fungal Secretome Database (FSD, http://fsd.snu.ac.kr/) and the Fungal Secretome Knowledgebase (FunSecKB, http://proteomics.ysu.edu/secretomes/fungi.php), have been constructed for the community to search fungal secretome related information (Choi et al., 2010; Lum and Min, 2011). FSD was constructed using a three-layer hierarchical identification rule based on 9 different programs (Choi et al., 2010). We developed FunSecKB using 6 different tools for predicting secreted proteins from the RefSeq data set of fungi (Lum and Min, 2011). However, since the release of FunSecKB, the available fungal protein data have increased tremendously. In this work, we describe FunSecKB2, a fungal protein subcellular location knowledgebase, also known as the Fungal Secretome and Subcellular Proteome Knowledgebase (Version 2), which is an expanded, updated, and improved version of FunSecKB. FunSecKB2 is constructed with a refined protocol that includes curated subcellular information and predicted information on secretomes and on the subcellular proteomes of 15 other subcellular locations. This improved fungal protein knowledgebase is expected to serve as a central portal for providing information on fungal protein subcellular locations to users in the fungal research and industrial community who are interested in exploiting fungi for the global development of the bioeconomy (Lange et al., 2012).
1 Data Collection and Database Implementation
1.1 Data collection
The protein sequences for all fungi were retrieved from the UniProtKB/Swiss-Prot dataset and the UniProtKB/TrEMBL dataset (release 2013_08) (http://www.uniprot.org/downloads). The UniProtKB/Swiss-Prot dataset contains manually annotated non-redundant protein sequences with information extracted from literature of experimental results and curator-evaluated computational analysis (The UniProt Consortium, 2014). The UniProtKB/TrEMBL dataset contains protein sequences associated with computationally generated annotation and large-scale functional characterization. The dataset consisted of a total of 1,976,832 fungal proteins, with 30,859 and 1,945,973 entries retrieved from the UniProtKB/Swiss-Prot dataset and the UniProtKB/TrEMBL dataset, respectively.
1.2 Methods for protein subcellular location assignment
The fungal protein sequences were processed using the following programs for signal peptide and subcellular location prediction: SignalP (versions 3.0 and 4.0, http://www.cbs.dtu.dk/services/SignalP/) (Bendtsen et al., 2004b; Petersen et al., 2011), Phobius (http://phobius.binf.ku.dk/) (Käll et al., 2007), WoLF PSORT (http://wolfpsort.org/) (Horton et al., 2007), and TargetP (http://www.cbs.dtu.dk/services/TargetP/) (Emanuelsson et al., 2007). These predictors were previously evaluated favorably and are widely used by the fungal secretome research community (Min, 2010). TMHMM (http://www.cbs.dtu.dk/services/TMHMM) was used to identify proteins having transmembrane domains (Krogh et al., 2001), and Scan-Prosite (called PS-Scan in its standalone version) (http://www.expasy.org/tools/scanprosite/) was used to scan for the endoplasmic reticulum (ER) targeting sequence (Prosite: PS00014) (de Castro et al., 2006; Sigrist et al., 2010). For predicting membrane proteins using TMHMM, entries having transmembrane domains not located within the N-terminus (the first 70 amino acids) were treated as real membrane proteins. Protein sequences predicted to have a signal peptide by SignalP (version 3) were further processed using the FragAnchor webserver to identify glycosylphosphatidylinositol (GPI) anchors (http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html) (Poisson et al., 2007). With the exception of FragAnchor, we used standalone tools installed on a local Linux system for data processing. The commands for running these tools can often be found in the “readme” page of each downloaded package and were summarized by Lum and Min (2013).
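The TMHMM filtering rule described above can be illustrated with a minimal Python sketch. The function name and the list of (start, end) helix coordinates are hypothetical stand-ins for parsed TMHMM output, and the sketch assumes that a helix counts as N-terminal when it ends at or before residue 70; the text does not specify how a helix spanning that boundary is handled.

```python
# Hypothetical helper: (start, end) tuples stand in for parsed TMHMM helix predictions.
N_TERMINAL_WINDOW = 70  # "the first 70 amino acids", as stated in the text


def is_membrane_protein(tm_helices, window=N_TERMINAL_WINDOW):
    """Return True if at least one predicted helix lies outside the N-terminal window.

    Assumption: a helix is treated as N-terminal (a possible signal peptide)
    when its end position (1-based, inclusive) is <= window.
    """
    return any(end > window for _start, end in tm_helices)
```

Under this assumption, a protein whose only predicted helix covers residues 5 to 27 is not called a membrane protein, whereas one with an additional helix at residues 120 to 142 is.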
The categories of fungal protein subcellular locations include: secreted proteins, mitochondrial (membrane or non-membrane), ER (membrane or lumen), cytosol (cytoplasm), cytoskeleton, Golgi apparatus (membrane or lumen), nuclear (membrane or non-membrane), vacuolar (membrane or non-membrane), lysosome, peroxisome, plasma membrane, and other membrane proteins. For assigning a protein subcellular location, the UniProtKB annotation and our curated subcellular information were considered before any prediction information. For proteins without annotated subcellular information, the subcellular location assignments are based on prediction. Our recent accuracy evaluation of the computational tools revealed that the highest prediction accuracy (92.1% in sensitivity and 98.9% in specificity) for fungal secretomes was achieved by combining SignalP, WoLF PSORT, and Phobius for signal peptide prediction, with TMHMM for eliminating membrane proteins and PS-Scan for removing ER targeting proteins (Min, 2010). Thus, the secretome was limited to manually curated secreted proteins and proteins predicted by all three programs to have a signal peptide at their N-terminus but not to have a transmembrane domain or an ER targeting signal. In this work, SignalP4 is used in place of SignalP3 because SignalP4 improves the prediction accuracy (Petersen et al., 2011; Melhem et al., 2013). However, the information generated by SignalP3 was also included, as it predicts signal peptide cleavage sites more accurately than SignalP4 (Petersen et al., 2011). The detailed methods for assigning a protein subcellular location are described below.
Secreted protein
Secreted proteins are further divided into curated secreted, highly likely secreted, likely secreted, and weakly likely secreted proteins. Curated secreted proteins include proteins annotated as “secreted”, “extracellular”, or “cell wall” in the subcellular location field of “reviewed” UniProtKB/Swiss-Prot entries, as well as secreted proteins manually collected from recent literature by our curators. Three predictors, SignalP4, Phobius, and WoLF PSORT, are used for secretory signal peptide or subcellular location prediction. The highly likely secreted, likely secreted, and weakly likely secreted proteins are those predicted to be secreted or to contain a secretory signal peptide by three, two, or one of the three predictors, respectively, and that have neither a transmembrane domain nor an ER retention signal.
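The tiering rule for secreted proteins can be sketched as a small Python function. The function name and the boolean inputs are hypothetical and stand for per-protein results already parsed from the individual tools. Letting a curated annotation override the prediction-based filters is an assumption based on the statement that curated information is considered before prediction information.

```python
def classify_secreted(curated, signalp4, phobius, wolfpsort, has_tm, has_er_signal):
    """Assign a secreted-protein tier, or None if the protein is not called secreted.

    All arguments are booleans. Assumption: a curated annotation takes
    precedence over the transmembrane and ER-signal filters.
    """
    if curated:
        return "curated secreted"
    if has_tm or has_er_signal:
        return None  # membrane or ER proteins are excluded from the secretome
    votes = sum([signalp4, phobius, wolfpsort])  # number of predictors in agreement
    tiers = {
        3: "highly likely secreted",
        2: "likely secreted",
        1: "weakly likely secreted",
    }
    return tiers.get(votes)
```

For example, a protein supported by SignalP4 and Phobius but not WoLF PSORT, with no transmembrane domain or ER signal, would fall into the "likely secreted" tier.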
ER proteins
ER proteins were predicted by WoLF PSORT and PS-Scan. Proteins predicted to contain a signal peptide by SignalP 4.0 and an ER target signal (Prosite: PS00014) by PS-Scan were treated as luminal ER proteins. Further, if they contain one or more transmembrane domains, they are classified as ER membrane proteins.
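The PS-Scan branch of this rule can be sketched in Python as follows. The regular expression is our rendering of the PROSITE PS00014 pattern, which we understand to be [KRHQSA]-[DENQ]-E-L anchored at the C-terminus; it should be checked against the current PROSITE entry before use. The function name and inputs are hypothetical, and the WoLF PSORT route to ER assignment is not covered here.

```python
import re

# Assumed regex form of PROSITE PS00014 ([KRHQSA]-[DENQ]-E-L>); verify against PROSITE.
ER_TARGET = re.compile(r"[KRHQSA][DENQ]EL$")


def classify_er(sequence, has_signal_peptide, n_tm_domains):
    """Classify a protein as 'ER lumen', 'ER membrane', or None.

    sequence: amino acid sequence in one-letter code (upper case).
    has_signal_peptide: SignalP 4.0 call, supplied by the caller.
    n_tm_domains: number of transmembrane domains, e.g. from TMHMM.
    """
    if not (has_signal_peptide and ER_TARGET.search(sequence)):
        return None
    return "ER membrane" if n_tm_domains >= 1 else "ER lumen"
```

A sequence ending in KDEL or HDEL with a predicted signal peptide is assigned to the ER lumen, and the same sequence with one or more transmembrane domains is assigned to the ER membrane.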
GPI-anchored proteins
Signal peptide-containing proteins that were predicted to have a GPI anchor by FragAnchor were further classified as GPI-anchored proteins. Proteins predicted to have both a signal peptide and a GPI anchor may attach to the outer leaflet of the plasma membrane or be secreted and become components of the cell wall.
Proteins in other subcellular locations
Proteins in other subcellular locations, including mitochondria, cytosol (cytoplasm), cytoskeleton, Golgi apparatus, lysosome, nucleus, peroxisome, plasma membrane, and vacuole, were predicted by WoLF PSORT. For proteins predicted to be located in mitochondria, the Golgi apparatus, the nucleus, or the vacuole, a protein containing one or more transmembrane domains is further classified as a membrane protein of that specific subcellular location.
1.3 Database implementation
The data were stored in a relational database using MySQL hosted on a Linux server. The user interface and modules to access the data were implemented using PHP. The BLAST utility and community annotation submission can be accessed from links on the main user interface at http://proteomics.ysu.edu/secretomes/fungi2/index.php. The Supplementary Tables and other data described in the work can be downloaded at http://proteomics.ysu.edu/publication/data/FunSecKB2/.
2 Results
2.1 Evaluation of prediction accuracies of protein subcellular locations
The prediction methods described above were based on our previous evaluation of computational tools (Min, 2010; Meinken and Min, 2012; Melhem et al., 2013). To further estimate the prediction accuracies of our methods for each subcellular location in this dataset, we retrieved 14884 proteins having an annotated, unique subcellular location from the UniProtKB/Swiss-Prot set. Proteins having multiple subcellular locations or labeled as “fragment” were excluded. The prediction accuracies were measured as the sensitivity, the specificity, and the Matthews correlation coefficient (MCC) based on formulas used previously (Min, 2010). The accuracy results are shown in Table 1. The prediction accuracies for plasma membrane and lysosome were not included, as the numbers of positive proteins were too few (<20). Compared with methods using a single tool, our method (i.e., using a combination of SignalP 4.0, WoLF PSORT, and Phobius for secretory signal peptide prediction, PS-Scan for removing ER proteins, and TMHMM for removing membrane proteins) significantly improved the prediction accuracy for secretomes (Min, 2010; Meinken and Min, 2012). For prediction of secretome size in a given species, the predicted set of highly likely secreted proteins would provide a relatively accurate estimate, as this method has the highest specificity (>0.99) and, interestingly, the number of false negatives is close to the number of false positives in the dataset used for evaluation. Including the predicted likely secreted protein set in a secretome only slightly decreased the MCC value, as only a small number of entries belong to this category. However, the predicted set of weakly likely secreted proteins needs to be treated with caution, as the increase in the number of false positives far exceeded the decrease in the number of false negatives (Table 1).
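For readers who wish to reproduce such an evaluation, the three measures can be computed from a confusion matrix as in the following Python sketch. It uses the standard definitions of sensitivity, specificity, and MCC, which we assume match the formulas of Min (2010); the function name is hypothetical and the counts in the usage note are illustrative, not values from Table 1.

```python
import math


def accuracy_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    Standard definitions are used; a zero denominator yields 0.0 by convention.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc
```

With illustrative counts of 90 true positives, 20 false positives, 980 true negatives, and 10 false negatives, the sensitivity is 0.90, the specificity is 0.98, and the MCC is about 0.84.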

 

Table 1 Evaluation of prediction accuracies of fungal protein subcellular locations


We also compared the accuracy of mitochondrial protein prediction by WoLF PSORT and TargetP. The MCC values were 0.67 for WoLF PSORT and 0.56 for TargetP. Using both tools increased the mitochondrial protein prediction specificity from 0.93 with WoLF PSORT only to >0.98 when both were used. However, using both tools did not improve the MCC value because of the decrease in prediction sensitivity. Thus, we selected WoLF PSORT for assigning mitochondrial proteins. Nevertheless, users should be aware that if both WoLF PSORT and TargetP predict that a protein is mitochondrial, the prediction is more reliable than a prediction from only one of them.
The prediction accuracies for other subcellular locations vary significantly. Prediction of nuclear proteins had 0.85 in sensitivity, 0.71 in specificity, and 0.53 in MCC. The accuracies for other subcellular locations including the ER, Golgi apparatus, vacuole, peroxisome, cytoplasm, and cytoskeleton were very low in MCC (<0.4) (Table 1). However, it should be noted that the low accuracies were caused by very low sensitivities, and in fact, the specificities were relatively high (>0.98). Thus, there are a good number of proteins located in these subcellular locations that cannot be predicted. However, if a protein is predicted to be located in such a location, the prediction is most likely correct. Nonetheless, the accuracies for predicting these subcellular locations of fungal proteins need to be improved.
2.2 Overview of subcellular proteome distribution in different species
The database contains predicted subcellular location information for proteins from 16554 fungal species or varieties (strains), 189 of which each have at least 1000 protein entries. The species names, some of which may have more than one strain or variety, can be found on the user interface, which facilitates species-specific searching or downloading. Species having <1000 protein entries can also be searched with a species name provided by the user. The distributions of subcellular proteomes in different fungal species are summarized in Table 2 and Supplementary Table 1. Table 2 includes the following subcellular locations: secreted proteins (4 subcategories), mitochondrial membrane and mitochondrial non-membrane, cytoplasm (cytosol), cytoskeleton, nuclear membrane and nuclear non-membrane, plasma membrane, and glycosylphosphatidylinositol (GPI) anchored proteins. The category of secreted proteins includes the following subcategories: curated secreted, highly likely secreted, likely secreted, and weakly likely secreted proteins. Information on other subcellular protein locations, including endoplasmic reticulum (membrane or lumen), Golgi apparatus (membrane or lumen), lysosome, peroxisome, vacuole (membrane or non-membrane), other membrane, and other curated locations, can be found in Supplementary Table 1.

 

Table 2 Summary of some major subcellular locations of proteins in different fungal species. Data on other subcellular locations of fungal proteins are in Supplementary Table 1


The variability of genome sizes, and thus of proteome sizes, is large among fungal species. However, it should be noted that in the database, as shown in Table 2, the total number of proteins for a given species is not necessarily the proteome size, but rather a collection of all proteins available from the species. For example, the reference proteome of Saccharomyces cerevisiae as compiled by UniProtKB consists of only 6,621 proteins, whereas there are a total of 79,093 proteins in our database under the name Saccharomyces cerevisiae, which evidently include proteins obtained from multiple strains. The subcellular distributions of fungal proteins were estimated based on the pooled data for each of the phyla Ascomycota, Basidiomycota, and Microsporidia. Interestingly, we found that the nucleus represents the largest compartment for protein destination: 39.2% of proteins in Ascomycota, 39.2% in Basidiomycota, and 57.4% in Microsporidia were predicted to be located in the nucleus. Mitochondria represent another large compartment for protein targeting: 19.5% in Ascomycota, 21.1% in Basidiomycota, and 16.7% in Microsporidia were predicted to be located in mitochondria. Approximately 18-21% of proteins are located in the cytosol or cytoplasm. The proportions of secretomes vary from 0.3% to 10.5% with an average of 4.6% in Ascomycota, from 1.9% to 7.4% with an average of 4.4% in Basidiomycota, and from 0.5% to 1.7% with an average of 0.9% in Microsporidia. However, here the secretome is limited to curated secreted proteins and highly likely secreted proteins, so the number represents a lower bound of the secretome size of a species. If the proteins predicted as likely secreted and weakly likely secreted are included, the secretome size will increase significantly, but so will the number of false positives, i.e., non-secreted proteins in the set.
2.3 Relationship of lifestyle and secretome size in different fungi
Similar to our previous analysis in the FunSecKB work (Lum and Min, 2011), the secretome size (Y) of a species was highly correlated with its proteome size (X) (r = 0.87), with a regression of Y = 0.081X - 271 (Figure 1). However, species having different lifestyles showed differences in secretome size and in the proportion of secreted proteins. Lowe and Howlett (2012) examined the relationship between lifestyle and secretome size and found that fungi with a biphasic lifestyle have a large proportion of secreted proteins, whereas animal pathogens have fewer genes than saprophytes or plant-interacting fungi and a lower proportion of predicted secreted proteins. In the work of Lowe and Howlett (2012), the secretome prediction used only SignalP, and thus the secretome size may be overestimated. Using the data we collected in this work, we examined the relationship between fungal lifestyles and their secretome sizes (Figure 1, Supplementary Table 2). As the data for each species in the database contain redundant or duplicated protein entries, we only used the proteins in datasets of reference or complete proteomes compiled by UniProt (http://www.uniprot.org/taxonomy/complete-proteomes). We collected species having a complete proteome and a lifestyle in the category of animal and/or human pathogen, plant pathogen, or saprophyte. Some of them may be classified into more than one category, and these entries are annotated (see Supplementary Table 2). In general agreement with what Lowe and Howlett (2012) reported, human and animal pathogens, including entomopathogens and some nematode-killing fungal parasites, have a relatively smaller proteome size: the majority of them have <12000 protein sequences, and some of them are Microsporidian parasites with a genome encoding a total of 2000 - 4000 proteins, less than 1% of which are secreted (Figure 1). The proportion of secreted proteins varied from 0.3 to 7.9% with an average of 2.8% in human/animal pathogens.
On the other hand, plant pathogens and saprophytes have much more variable proteome sizes, from ~4000 to 18000, and a relatively higher, though variable, proportion of secreted proteins: from 1.3 to 7.1% with an average of 4.2% in saprophytes and from 1.7 to 10.5% with an average of 6.3% in plant pathogens. Clearly, these results show that secretome size is one of the important determining factors in controlling fungal lifestyles. However, as species having a similar secretome size may have different lifestyles, the composition of each secretome may play a more critical role in determining the lifestyle of each species.
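As a worked illustration of the regression reported in this section (Y = 0.081X - 271), the following Python sketch computes the secretome size expected for a given proteome size. The function names are hypothetical, and the example proteome size is arbitrary rather than taken from the dataset.

```python
SLOPE = 0.081       # regression slope reported in the text
INTERCEPT = -271.0  # regression intercept reported in the text


def expected_secretome_size(proteome_size, slope=SLOPE, intercept=INTERCEPT):
    """Secretome size predicted by the linear fit Y = slope * X + intercept."""
    return slope * proteome_size + intercept


def zero_crossing(slope=SLOPE, intercept=INTERCEPT):
    """Proteome size at which the fitted line predicts a secretome size of zero."""
    return -intercept / slope
```

For a proteome of 10,000 proteins the fit gives about 539 secreted proteins, or roughly 5.4% of the proteome. The fitted line crosses zero near 3,346 proteins, so it should probably not be extrapolated to very small proteomes such as those of Microsporidia.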

 

Figure 1 Relationship between proteome size and secretome size in fungal species having different lifestyles


2.4 Functional analysis of fungal secreted proteins
To provide an overview of the functionalities of all fungal secreted proteins, we carried out Gene Ontology (GO) analysis. The secreted protein set, including only curated and predicted highly likely secreted proteins, was used to search the UniProtKB/Swiss-Prot dataset using BLASTP with a cutoff E-value of 1e-10. GO information was retrieved from UniProt ID mapping data (http://www.uniprot.org/downloads) and analyzed using GO SlimViewer with generic GO terms (McCarthy et al., 2006). The GO biological process and molecular function classifications of the secretomes are summarized in Table 3. Molecular function classification revealed that fungal secreted proteins consist of a large number of hydrolases (~33.7%), proteins having ion binding activity (21.1%), peptidases (15.7%), oxidoreductases (14%), and proteins with some other enzymatic activities. Fungal secreted proteins are involved in more than 60 different biological processes. The main biological processes include catabolic process (24.6%), carbohydrate (22.0%) or lipid (4.0%) metabolic process, cell wall organization or biogenesis (6.4%), response to stress, and small molecule and nitrogen metabolic processes. It should be noted that the GO classification is only an estimate of the distribution of each category, as ~54% of the predicted secreted proteins do not have GO annotation information.
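The E-value cutoff step can be sketched in Python as follows. The sketch assumes the BLASTP results were saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`), in which the E-value and bit score are the 11th and 12th columns; the output format actually used in this work is not stated. Keeping only the best-scoring hit per query is also an illustrative choice, and the function name is hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-10):
    """Return {query: (subject, evalue, bitscore)} for hits passing the cutoff.

    tabular_lines: iterable of tab-separated BLAST lines (12 standard columns).
    Hits with an E-value above max_evalue are discarded; among the remaining
    hits, the one with the highest bit score is kept for each query.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The subject accessions retained this way could then be looked up in the UniProt ID mapping data to retrieve GO terms.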

 

Table 3 Gene Ontology (GO) classification of fungal secreted proteins


We further categorized the functions of predicted secreted fungal proteins using the rpsBLAST tool to search the Pfam database with a cutoff E-value of 1e-10. Among a total of 93430 predicted secreted proteins, 43953 protein sequences have a Pfam match, and a total of 880 protein families were detected. A summary of the Pfam analysis with the 33 most highly encoded secreted protein families in fungi is shown in Table 4. A complete list can be downloaded (http://proteomics.ysu.edu/publication/data/). The top 10 highly encoded secreted protein families in fungi were eukaryotic aspartyl protease, carboxylesterase family, FAD binding domain containing family, subtilase family, glycosyl hydrolase family 61, glycosyl hydrolases family 28, glycosyl hydrolases family 18, GMC oxidoreductase, serine carboxypeptidase, and glycosyl hydrolase family 3. The proteases identified here, such as aspartyl protease, subtilase, and other peptidase families, are likely to be required for synergistic degradation of the proteins present in the various growth media or substrate materials in the environment (Druzhinina et al., 2012; Girard et al., 2013). The GO analysis and the functional domain analysis are consistent in showing that these proteins are mainly involved in biodegrading complex biomolecules, including carbohydrates, proteins, lipids, and other molecules.

 

Table 4 Highly encoded secreted protein families in fungi


3 Discussion
We constructed the fungal protein subcellular location database and named it the Fungal Secretome and Subcellular Proteome Knowledgebase (FunSecKB2). Compared with FunSecKB (Lum and Min, 2011), the number of total protein entries increased from 478,073 in FunSecKB to 1,976,832 in FunSecKB2, and the number of fungal species, including different varieties and strains, having a complete proteome increased from 52 in FunSecKB to 210 in FunSecKB2. The subcellular locations in FunSecKB2 were also expanded to include not only secretomes but also other subcellular locations, including mitochondria, cytosol (cytoplasm), cytoskeleton, Golgi apparatus, lysosome, nucleus, peroxisome, plasma membrane, and vacuole. In addition, we further classified the secretomes into curated, highly likely secreted, likely secreted, and weakly likely secreted protein subsets. This refined classification of secreted proteins and other subcellular locations should greatly enhance comparative analysis of subcellular proteomes in different species. However, because the protein sequence data were obtained from UniProtKB and some duplicated entries are present, proteome-wide analysis of a given species should use the non-redundant reference or complete proteome dataset, which can be downloaded from UniProt (http://www.uniprot.org/taxonomy/complete-proteomes). It should also be noted that if no specific strain or sub-genotype is specified for a species in the list, the entries for that species include all available proteins from the species.
We also provided the BLAST tool to allow users to search all fungal protein data or the predicted fungal secreted protein data with their own protein sequences. This utility facilitates identifying protein homologs along with their potential subcellular locations. Alternatively, for any anonymous protein sequence, users can predict protein subcellular locations using the tools we have used in this work. Other available tools for prediction of secretomes and other protein subcellular locations were summarized by Meinken and Min (2012) and Caccia et al. (2013). Recently, Cortázar et al. (2014) implemented a webserver, named SECRETOOL, which integrates several tools for predicting fungal secretomes. As some of the tools implemented in the server are the same as those we used, we expect the server to generate fairly reliable results for fungal secretome prediction; thus, it is particularly useful for newly generated proteomes (Cortázar et al., 2013; Lum and Min, 2011). In addition, another available database, the fungal secretome database (FSD), which was constructed using a slightly different suite of tools, may provide extra subcellular location information for these fungal proteins (Choi et al., 2010).
Fungal species have a secretome adapted to their environment, and the selection pressure exerted by environmental constraints has led to species with varying complexity in their secretome compositions (Girard et al., 2013; Alfaro et al., 2014). Depending on the lifestyle, saprotrophic species mainly have degrading hydrolases in their secretomes; biotrophic species have both degrading hydrolases and compatibility effectors; mycorrhizal species have degrading hydrolases, compatibility effectors, and exchange effectors; and necrotrophic species have degrading hydrolases and killing effectors (Girard et al., 2013; Alfaro et al., 2014). The basal secretome generally contains two pools of proteins: a large proportion represented by the polysaccharide degrading enzymes, i.e., hydrolases acting on glycosyl bonds, and a minor part including the proteases, lipases, and oxidoreductases, etc. (see Table 3). In this work, the secretome identification was limited to classical secreted proteins, i.e., signal peptide containing proteins, and curated proteins, which may include both classical and leaderless secreted proteins (LSPs). SecretomeP is a tool implemented for predicting LSPs in bacteria and mammals (http://www.cbs.dtu.dk/services/SecretomeP/) (Bendtsen et al., 2004a). Because the tool has not been trained with fungal data and its prediction accuracy could not be evaluated, we did not include it in our data processing. We would like to ask the fungal research community to submit fungal protein subcellular locations, particularly for LSPs, with experimental evidence traceable from the literature to the database. Genome-wide computational prediction of a secretome for a species provides the first step for experimental validation and characterization of secreted proteins under various changing environments or culture conditions (Alfaro et al., 2014).
Along with our published plant secretome and subcellular proteome knowledgebase (PlantSecKB) (Lum et al., 2014), we expect that FunSecKB2 will serve the community as a useful resource for genome-wide comparative analysis and for further exploring the potential applications of fungal secreted proteins in biofuel production, environmental remediation, and prevention and treatment of plant and human fungal pathogens.
Authors' contributions
JM implemented the database; DA collected the lifestyle data; KA and GZ participated in method development; XJM and CC conceived the study and designed the data processing procedure. XJM, JM, DA and CC analyzed the data and prepared the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We thank Gengkon Lum and Dr. Feng Yu for their assistance in maintaining the server and Jessica Orr and Stephanie Frazier for manually curating secreted proteins.
Funding
The work is funded by Youngstown State University (YSU) Research Council. The work is also supported by a YSU Research Professorship award and the College of Science, Technology, Engineering, and Mathematics Dean’s reassigned time for research to XJM. JM was supported with a graduate research assistantship by the Center for Applied Chemical Biology at YSU.

References
Alfaro M., Oguiza J.A., Ramírez L. et al. 2014, Comparative analysis of secretomes in basidiomycete fungi, J Proteomics, 102C: 28-43
http://dx.doi.org/10.1016/j.jprot.2014.03.001

Bendtsen J.D., Jensen L.J., Blom N. et al. 2004a, Feature based prediction of non-classical and leaderless protein secretion, Protein Eng Des Sel, 17: 349-356
http://dx.doi.org/10.1093/protein/gzh037

Bendtsen J.D., Nielsen H., von Heijne, G. et al. 2004b, Improved prediction of signal peptides: SignalP 3.0, J Mol Biol, 340: 783-795
http://dx.doi.org/10.1016/j.jmb.2004.05.028

Bouws H., Wattenberg A., and Zorn H., 2008, Fungal secretomes-nature's toolbox for white biotechnology, Appl. Microbiol. Biotechnol., 80: 381-388
http://dx.doi.org/10.1007/s00253-008-1572-5

Braaksma M., Martens-Uzunova E.S., et al. 2010, An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data, BMC Genomics, 11: 584
http://dx.doi.org/10.1186/1471-2164-11-584

Brown N.A., Antoniw J., and Hammond-Kosack K.E., 2012, The predicted secretome of the plant pathogenic fungus Fusarium graminearum: a refined comparative analysis, PLoS One, 7: e33731
http://dx.doi.org/10.1371/journal.pone.0033731

Caccia D., Dugo M., Callari M., et al. 2013, Bioinformatics tools for secretome analysis, Biochim. Biophys. Acta, 1834: 2442-2453
http://dx.doi.org/10.1016/j.bbapap.2013.01.039

Choi J., Park J., Kim D., et al. 2010, Fungal secretome database: integrated platform for annotation of fungal secretomes, BMC Genomics, 11: 105
http://dx.doi.org/10.1186/1471-2164-11-105

Cortázar A.R., Aransay A.M., Alfaro M., et al. 2014, SECRETOOL: integrated secretome analysis tool for fungi, Amino Acids, 46: 471-473
http://dx.doi.org/10.1007/s00726-013-1649-z

De Castro E., Sigrist C.J., Gattiker A., et al. 2006, ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins, Nucleic Acids Res., 34: W362-365

Do Vale L.H., Gómez-Mendoza D.P., Kim M.S., et al. 2012, Secretome analysis of the fungus Trichoderma harzianum grown on cellulose, Proteomics, 12: 2716-2728
http://dx.doi.org/10.1002/pmic.201200063

Druzhinina I.S., Shelest E., and Kubicek C.P., 2012, Novel traits of Trichoderma predicted through the analysis of its secretome, FEMS Microbiol Lett., 337: 1-9
http://dx.doi.org/10.1111/j.1574-6968.2012.02665.x

Emanuelsson O., Brunak S., von Heijne G., et al. 2007, Locating proteins in the cell using TargetP, SignalP and related tools, Nat. Protoc., 2: 953-971
http://dx.doi.org/10.1038/nprot.2007.131

Ene I.V., Heilmann C.J., Sorgo A.G., et al. 2012, Carbon source-induced reprogramming of the cell wall proteome and secretome modulates the adherence and drug resistance of the fungal pathogen Candida albicans, Proteomics, 12: 3164-3179
http://dx.doi.org/10.1002/pmic.201200228

Girard V., Dieryckx C., Job C. et al. 2013, Secretomes: the fungal strike force, Proteomics, 13: 597-608
http://dx.doi.org/10.1002/pmic.201200282

Horton P., Park K.-J., Obayashi T., et al. 2007, WoLF PSORT: protein localization predictor, Nucleic Acids Res., 35: W585-587
http://dx.doi.org/10.1093/nar/gkm259

Jung Y.H., Jeong S.H., Kim S.H., et al. 2012, Secretome analysis of Magnaporthe oryzae using in vitro systems, Proteomics, 12: 878-900
http://dx.doi.org/10.1002/pmic.201100142

Käll L., Krogh A., and Sonnhammer E.L.L., 2007, Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server, Nucleic Acids Res., 35: W429-432
http://dx.doi.org/10.1093/nar/gkm256

Krogh A., Larsson B., von Heijne G., et al. 2001, Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes, J. Mol. Biol., 305: 567-580
http://dx.doi.org/10.1006/jmbi.2000.4315

Lange L., Bech L., Busk P.K., et al. 2012, The importance of fungi and of mycology for a global development of the bioeconomy, IMA Fungus, 3: 87-92
http://dx.doi.org/10.5598/imafungus.2012.03.01.09

Lee S.A., Wormsley S., Kamoun S., et al. 2003, An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms, Yeast, 20: 595-610
http://dx.doi.org/10.1002/yea.988

Lowe R.G., and Howlett B.J., 2012, Indifferent, affectionate, or deceitful: lifestyles and secretomes of fungi, PLoS pathogens, 8: e1002515
http://dx.doi.org/10.1371/journal.ppat.1002515

Lum G., and Min X.J., 2011, FunSecKB: the fungal secretome knowledgebase, Database (Oxford), 2011, doi: 10.1093/database/bar001
http://dx.doi.org/10.1093/database/bar001

Lum G., and Min X.J., 2013, Bioinformatic protocols and the knowledge-base for secretomes in fungi, In: Gupta V.K., Tuohy M.G., Ayyachamy M., Turner K.M. and O’Donovan A. (eds), Laboratory Protocols in Fungal Biology: Current Methods in Fungal Biology, Springer, pp 545-557
http://dx.doi.org/10.1007/978-1-4614-2356-0_54

Lum G., Meinken J., Orr J., et al. 2014, PlantSecKB: the plant secretome and subcellular proteome knowledgebase, Comput. Mole. Biol., 4: 1-17

Martinez D., Challacombe J., Morgenstern I., et al. 2009, Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion, Proc Natl Acad Sci U S A, 106: 1954-1959
http://dx.doi.org/10.1073/pnas.0809575106

Martinez D., Larrondo L.F., Putnam N., et al. 2004, Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78, Nat Biotechnol, 22: 695-700
http://dx.doi.org/10.1038/nbt967
http://dx.doi.org/10.1038/nbt0704-899b
http://dx.doi.org/10.1038/nbt0704-899a

McCarthy F.M., Wang N., Magee G.B., et al. 2006, AgBase: a functional genomics resource for agriculture, BMC Genomics, 7: 229
http://dx.doi.org/10.1186/1471-2164-7-229

Meinken J., and Min X.J., 2012, Computational prediction of protein subcellular locations in eukaryotes: an experience report, Comput. Mole. Biol., 2: 1-7

Melhem H., Min X.J., and Butler G., 2013, The impact of SignalP 4.0 on the prediction of secreted proteins, IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2013): The 10th annual IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, Singapore, pp. 16-22 (doi: 10.1109/CIBCB.2013.6595383)

Min X.J., 2010, Evaluation of computational methods for secreted protein prediction in different eukaryotes, J. Proteomics Bioinform., 3: 143-147

Morais do Amaral A., Antoniw J., Rudd J.J., et al. 2012, Defining the predicted protein secretome of the fungal wheat leaf pathogen Mycosphaerella graminicola, PLoS One, 7: e49904
http://dx.doi.org/10.1371/journal.pone.0049904

Mueller O., Kahmann R., Aguilar G., et al. 2008, The secretome of the maize pathogen Ustilago maydis, Fungal Genet. Biol., 45(Suppl. 1): S63-S70
http://dx.doi.org/10.1016/j.fgb.2008.03.012

Murphy C., Powlowski J., Wu M., et al. 2011, Curation of characterized glycoside hydrolases of fungal origin, Database (Oxford), 2011, doi: 10.1093/database/bar020
http://dx.doi.org/10.1093/database/bar020

Paper J.M., Scott-Craig J.S., Adhikari N.D., et al. 2007, Comparative proteomics of extracellular proteins in vitro and in planta from the pathogenic fungus Fusarium graminearum, Proteomics, 7: 3171-3183
http://dx.doi.org/10.1002/pmic.200700184

Petersen T.N., Brunak S., von Heijne G., et al. 2011, SignalP 4.0: discriminating signal peptides from transmembrane regions, Nature Methods, 8: 785-786
http://dx.doi.org/10.1038/nmeth.1701

Peterson R., Grinyer J., and Nevalainen H., 2011, Secretome of the coprophilous fungus Doratomyces stemonitis C8, isolated from koala feces, Appl. Environ. Microbiol., 77: 3793-3801
http://dx.doi.org/10.1128/AEM.00252-11

Poisson G., Chauve C., Chen X., et al. 2007, FragAnchor a large scale all Eukaryota predictor of Glycosylphosphatidylinositol-anchor in protein sequences by qualitative scoring, Genomics Proteomics Bioinform., 5: 121-130
http://dx.doi.org/10.1016/S1672-0229(07)60022-9

Powers-Fletcher M.V., Jambunathan K., Brewer J.L., et al. 2011, Impact of the lectin chaperone calnexin on the stress response, virulence and proteolytic secretome of the fungal pathogen Aspergillus fumigatus, PLoS One, 6: e28865
http://dx.doi.org/10.1371/journal.pone.0028865

Ribeiro D.A., Cota J., Alvarez T.M., et al. 2012, The Penicillium echinulatum secretome on sugar cane bagasse, PLoS One, 7: e50571
http://dx.doi.org/10.1371/journal.pone.0050571

Salvachúa D., Martínez A.T., Tien M., et al. 2013, Differential proteomic analysis of the secretome of Irpex lacteus and other white-rot fungi during wheat straw pretreatment, Biotechnol. Biofuels, 6: 115
http://dx.doi.org/10.1186/1754-6834-6-115

Sigrist C.J.A., Cerutti L., de Castro E., et al. 2010, PROSITE, a protein domain database for functional characterization and annotation, Nucleic Acids Res., 38: D161-166
http://dx.doi.org/10.1093/nar/gkp885

The UniProt Consortium, 2014, Activities at the Universal Protein Resource (UniProt), Nucleic Acids Res., 42: D191-198
http://dx.doi.org/10.1093/nar/gkt1140

Tjalsma H., Bolhuis A., Jongbloed J.D., et al. 2000, Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome, Microbiol. Mol. Biol. Rev., 64: 515-547
http://dx.doi.org/10.1128/MMBR.64.3.515-547.2000

Tsang A., Butler G., Powlowski J., et al. 2009, Analytical and computational approaches to define the Aspergillus niger secretome, Fungal Genet. Biol., 46: S153-160
http://dx.doi.org/10.1016/j.fgb.2008.07.014

Weber S.S., Parente A.F.A., Borges C.L., et al. 2012, Analysis of the secretomes of Paracoccidioides mycelia and yeast cells, PLoS One, 7: e52470
http://dx.doi.org/10.1371/journal.pone.0052470

Wymelenberg A.V., Sabat G., Martinez D., et al. 2005, The Phanerochaete chrysosporium secretome: database predictions and initial mass spectrometry peptide identifications in cellulose-grown medium, J. Biotechnol., 118: 17-34
http://dx.doi.org/10.1016/j.jbiotec.2005.03.010

Yajima W., and Kav N.N., 2006, The proteome of the phytopathogenic fungus Sclerotinia sclerotiorum, Proteomics, 6: 5995-6007
http://dx.doi.org/10.1002/pmic.200600424
