Linking codon usage bias to functional genomics in pigs

Khobondo J. O.; Ngeno K.; Kahi A. K.

Animal Breeding and Genomics Group, Department of Animal Sciences, Egerton University, P.O. Box 536-20115, Egerton, Kenya

Genomics and Applied Biology, 2015, Vol. 6, No. 6, doi: 10.5376/gab.2015.06.0006
Received: 15 Jul., 2015 Accepted: 16 Aug., 2015 Published: 15 Sep., 2015

This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Preferred citation for this article:

Khobondo J. O., Ngeno K. and Kahi A.K., 2015, Linking codon usage bias to functional genomics in pigs, Genomics and Applied Biology, Vol.6, No.6, 1-7 (doi: 10.5376/gab.2015.06.0006)

Abstract

The recent completion of a high-quality draft genome of Sus scrofa has enabled detailed investigation of a variety of genomic features. There have been attempts to link genotypic variation to phenotypic variation in different animals, including pigs, using single nucleotide polymorphisms and copy number variation, and to decipher codon usage bias. The functional significance of codon usage bias has not, however, been ascertained in any animal; this study therefore linked codon usage bias to gene ontology enrichment. The genome CDS sequences were downloaded from Ensembl v68 (Sus scrofa build 10.2) using BioMart. A total of 21,550 CDS longer than 50 amino acids (150 bp) were used to derive the genomic codon adaptation index (gCAI; a proxy for codon usage bias) with an in-house Perl script. The 5% of genes with the lowest and the 5% with the highest codon usage bias were extracted. BiNGO v2.44 within Cytoscape v2.8.3 was used to identify enriched gene ontology terms, using human gene annotation as the background, and the results were validated with a Perl script. Gene ontology terms related to immune response and sensory perception were enriched among genes with low codon usage bias, whereas genes with high codon usage bias were overrepresented in gene ontology terms involved in cellular housekeeping functions. Codon usage bias is therefore linked to gene function in the pig genome.

Keywords

Codon usage; Gene enrichment; Functional genomics; Pigs

1 Introduction
Pigs have been important in agriculture and human welfare for thousands of years. The recent completion of a high-quality draft genome of Sus scrofa (Groenen et al., 2012) enables detailed investigation of a variety of genomic features. For example, there have been attempts to link genotypic variation to phenotypic variation in different animals, including pigs. Advances in molecular genetics, from protein markers to single nucleotide polymorphisms (SNPs) and copy number variation (CNV), have shown that such variants can have drastic effects on phenotype (Freeman et al., 2006); however, these types of variation are unlikely to explain on their own the large phenotypic diversity found at the inter- and intraspecific levels (Paudel et al., 2013). Structural variations (SVs) such as CNVs have been shown to play a prominent role in the phenotypic evolution, adaptation and domestication of pigs (Paudel et al., 2013). Beyond these variants, the advent of next-generation sequencing has allowed a comprehensive screen of variation in codon usage bias (CUB). The degeneracy of the genetic code, which allows most amino acids to be encoded by more than one codon (so-called synonymous codons; Wright, 1990), has been studied in pigs (Khobondo et al., 2015). Substantial interspecific and intragenomic variation in codon usage has also been documented (Jia et al., 2009). Several biological factors have been reported to influence synonymous codon usage in various organisms, including tRNA abundance (Kanaya et al., 2001); strand-specific mutational bias and replicational, transcriptional and translational selection (Hershberg and Petrov, 2008); protein secondary structure, mRNA structure and GC composition (Knight et al., 2001); genomic composition factors (Khobondo et al., 2015); and environmental factors (Basak and Ghosh, 2005).
These factors have led to two main hypotheses on the evolution of codon bias: mutational bias, and natural selection for translational accuracy and efficiency (Sharp et al., 2005). The mutational bias hypothesis predicts that genes in GC-rich regions of the genome preferentially use G- and C-ending codons, whereas those in AT-rich regions use A- and T-ending codons (Zhang et al., 2009), as observed in mammals. Khobondo et al. (2015) confirmed the existence of codon usage bias in the porcine genome, which might suggest weak selection of preferred codons for translational accuracy. Codon usage bias was subtly influenced by nucleotide composition factors (GC, GC3 and CDS length), among others. In that study, the genomic codon adaptation index (gCAI), a proxy for codon usage bias, was negatively correlated with GC content and GC3s. This finding contradicted other reports (Hershberg and Petrov, 2010) and was attributed to differences in genome isochore structure, to the spatial and temporal variability of gene expression in mammals, or to differences in the methods used to calculate codon adaptation index (CAI) variants. gCAI was also negatively correlated with gene length in pigs, consistent with reports in organisms such as yeast, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Populus tremula and Silene latifolia (Qiu et al., 2011). This correlation suggests that metabolic systems favour the expression of genes that are less costly (Hahn and Kern, 2005). Despite this evidence on CUB, it is not known how codon usage bias relates to gene function. Therefore, this study related pig CUB (the 5% of genes with the highest and the 5% with the lowest bias) to gene ontology terms and functional genomics.
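The composition indices mentioned above (GC, GC3s and CDS length) can be computed per CDS along the lines of the sketch below. It is not the authors' Perl script: it assumes an upper-case, in-frame CDS and follows the common convention of computing GC3s over synonymous codons only (excluding Met, Trp and stop codons). The function names and the toy sequence are hypothetical.

```python
# Minimal sketch: nucleotide composition indices for one CDS.
# Assumes an upper-case, in-frame CDS (A/C/G/T only). GC3s skips Met (ATG),
# Trp (TGG) and stop codons, which have no synonymous alternatives.

NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def codons(cds: str) -> list[str]:
    """Split an in-frame CDS into complete codons (a trailing partial codon is dropped)."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def gc_content(cds: str) -> float:
    """Fraction of G or C over the whole CDS."""
    return sum(base in "GC" for base in cds) / len(cds)


def gc3s(cds: str) -> float:
    """Fraction of G or C at the third position of synonymous codons."""
    third = [c[2] for c in codons(cds) if c not in NON_SYNONYMOUS]
    return sum(base in "GC" for base in third) / len(third) if third else float("nan")


if __name__ == "__main__":
    toy = "ATGGCCGCTGAAGAGTTTTAA"  # hypothetical toy CDS, not a pig gene
    print(len(toy), round(gc_content(toy), 3), round(gc3s(toy), 3))  # 21 0.429 0.4
```

Applied to every retained CDS, these three values give the per-gene covariates against which gCAI can then be correlated.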

2 Materials and Methods
2.1 Sequence data
The genome sequence used for analysis was downloaded from Ensembl v68 (Sus scrofa build 10.2) using BioMart. A total of 23,269 coding sequences (CDS) were extracted from the reference genome, which was derived from a female Duroc pig. Only the 21,550 CDS longer than 50 amino acids (150 bp) were retained for analysis. Gene ontology (GO) terms were also downloaded from the Ensembl genome browser.
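The length filter can be sketched as follows, assuming the BioMart CDS export is plain FASTA. The in-frame check (length divisible by 3) is an added assumption not stated in the text, and whether the stop codon counts towards the 150 bp is also not specified; all names here are hypothetical.

```python
# Minimal sketch of the CDS length filter (>50 amino acids, i.e. >150 bp).
from collections.abc import Iterable, Iterator

MIN_AA = 50  # CDS must encode more than 50 amino acids (150 bp)


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, upper-case sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def keep_cds(seq: str) -> bool:
    """Length filter; the multiple-of-3 (in-frame) check is an assumption."""
    return len(seq) > 3 * MIN_AA and len(seq) % 3 == 0


if __name__ == "__main__":
    demo = [">toy_short", "ATGAAATAA", ">toy_long", "ATG" + "GCT" * 60 + "TAA"]
    kept = [h for h, s in read_fasta(demo) if keep_cds(s)]
    print(kept)  # ['toy_long']
```

An open file handle can be passed to `read_fasta` in place of the demo list, since both are iterables of lines.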

2.2 Codon indices: Genomic Codon Adaptation Index (gCAI)
The genomic codon adaptation index (gCAI) used in this study was computed previously (Khobondo et al., 2015) with an in-house Perl script, as the geometric mean of the relative synonymous codon usage (RSCU) values of a gene's codons divided by the highest geometric mean of RSCU attainable for the same amino acid (AA) sequence.
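Written out, the definition above corresponds to the following, assuming the conventional RSCU formula. The symbols $X_{ij}$, $n_i$, $L$ and $\mathrm{RSCU}^{\max}_k$ are introduced here for illustration and do not appear in the source.

```latex
\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j'=1}^{n_i} X_{ij'}}
\qquad
\mathrm{gCAI} = \frac{\left(\prod_{k=1}^{L} \mathrm{RSCU}_{k}\right)^{1/L}}
                     {\left(\prod_{k=1}^{L} \mathrm{RSCU}^{\max}_{k}\right)^{1/L}}
             = \left(\prod_{k=1}^{L} \frac{\mathrm{RSCU}_{k}}{\mathrm{RSCU}^{\max}_{k}}\right)^{1/L}
```

Here $X_{ij}$ is the count of codon $j$ for amino acid $i$, $n_i$ is the number of synonymous codons for amino acid $i$, $L$ is the number of codons in the gene (assumed to exclude Met, Trp and stop codons), $\mathrm{RSCU}_k$ is the RSCU of the $k$-th codon and $\mathrm{RSCU}^{\max}_k$ is the largest RSCU among the synonymous codons of the amino acid it encodes. The second form shows that gCAI lies in $(0, 1]$ and equals 1 only when every codon is the most-used synonym.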

The gCAI value is therefore a proxy for codon bias: values are normalized using codon frequencies at equilibrium, so no assumption about expression bias is made (Khobondo et al., 2015).
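The gCAI calculation described in this section can be sketched in Python. Two choices are assumptions, not taken from the source: RSCU values are computed from codon counts pooled over all CDS (the source refers to normalization by equilibrium codon frequencies, which may differ), and Met, Trp and stop codons are excluded. The function names are hypothetical.

```python
# Minimal sketch of gCAI: geometric mean of RSCU / RSCU_max over a gene's codons.
# Assumption: RSCU comes from codon counts pooled over all CDS.
import itertools
import math
from collections import Counter

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}
SYNONYMS: dict[str, list[str]] = {}
for _codon, _aa in CODE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def codon_counts(seqs) -> Counter:
    """Count in-frame codons over a collection of CDS."""
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return counts


def rscu_table(counts) -> dict[str, float]:
    """RSCU = observed codon count / mean count of its synonymous family."""
    rscu: dict[str, float] = {}
    for aa, family in SYNONYMS.items():
        if aa == "*" or len(family) == 1:
            continue  # stops, Met and Trp carry no synonymous choice
        mean = sum(counts[c] for c in family) / len(family)
        for c in family:
            rscu[c] = counts[c] / mean if mean > 0 else 0.0
    return rscu


def gcai(seq: str, rscu: dict[str, float]) -> float:
    """Geometric mean of RSCU / RSCU_max over the gene's informative codons."""
    log_sum, n = 0.0, 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if rscu.get(codon, 0.0) == 0.0:
            continue  # skip excluded, ambiguous or never-observed codons
        best = max(rscu[c] for c in SYNONYMS[CODE[codon]])
        log_sum += math.log(rscu[codon] / best)
        n += 1
    return math.exp(log_sum / n) if n else float("nan")


if __name__ == "__main__":
    genes = {"geneA": "ATGGCCGCCGCCTAA", "geneB": "ATGGCTGCAGCGTAA"}  # hypothetical
    table = rscu_table(codon_counts(genes.values()))
    for name, seq in genes.items():
        print(name, round(gcai(seq, table), 3))  # geneA 1.0, then geneB 0.333
```

Working in log space avoids underflow when multiplying hundreds of RSCU ratios for long genes.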

2.3 Analysis tools
An in-house Perl script was used to derive codon indices as described by Khobondo et al. (2015). The 5% of genes with the highest and the 5% with the lowest gCAI were extracted and grouped into two categories (high and low bias). Because not all pig genes have associated gene names, unnamed genes were searched with BLAST against human RefSeq mRNAs and human reference protein sequences (blastn and blastp, respectively), and the best human hit was assigned as the gene name. Human orthologs of porcine genes were used to perform gene ontology (GO) analysis. BiNGO v2.44 (Maere et al., 2005), a plugin of Cytoscape v2.8.3 (Shannon et al., 2003), was used to identify enriched GO terms using human gene annotation as the background. A hypergeometric test was used to assess the significance of the enriched terms, and the Benjamini-Hochberg correction was applied for multiple comparisons. Over-represented GO terms from BiNGO were validated with a Perl script that compared the GO terms of each selected gene set (highest or lowest bias) with all GO terms downloaded from the Ensembl genome browser. Statistical significance was computed using a chi-square test. To limit false-positive enrichment, a P-value threshold of 0.0001 was used to declare significance in the GO analysis.
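The selection and testing steps described in this section can be sketched with the Python standard library. This is neither BiNGO's implementation nor the authors' Perl validation script: the helper names are hypothetical, and the 2×2 layout for the chi-square test (selected genes versus background, with and without a GO term) is an assumption, since the source does not give the contingency table.

```python
# Minimal sketch of the enrichment statistics (standard library only).
import math

P_THRESHOLD = 1e-4  # significance cut-off stated in the text


def split_tails(scores: dict[str, float], frac: float = 0.05) -> tuple[list[str], list[str]]:
    """Return (lowest, highest) `frac` of genes ranked by gCAI."""
    ranked = sorted(scores, key=scores.get)
    k = max(1, int(len(ranked) * frac))
    return ranked[:k], ranked[-k:]


def hypergeom_sf(x: int, N: int, K: int, n: int) -> float:
    """P(X >= x): at least x term-annotated genes in a sample of n drawn from
    N background genes, K of which carry the GO term."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(x, upper + 1))
    return hits / math.comb(N, n)  # exact integer arithmetic, then one division


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p-value (1 df, no continuity correction) for
    the table [[a, b], [c, d]]; all margins must be non-zero."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(stat / 2))  # survival function of chi2 with 1 df


if __name__ == "__main__":
    # Hypothetical counts: 40 of 1,000 sampled genes carry a term that
    # annotates 200 of 20,000 background genes (about 10 expected).
    p = hypergeom_sf(40, N=20_000, K=200, n=1_000)
    print(p < P_THRESHOLD, benjamini_hochberg([p, 0.02, 0.5]))
```

The chi-square p-value uses the identity that a 1-df chi-square variable is a squared standard normal, so its upper tail equals `erfc(sqrt(stat / 2))`.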

3 Results
3.1 High codon usage bias and Gene Ontology terms
Gene ontology analysis of the 5% high and low CUB genes using BiNGO, validated by the in-house Perl script, found 28 and 71 GO terms significantly enriched in high and low CUB genes, respectively. The significant GO terms covered all three gene ontology domains: cellular component, biological process and molecular function. Notable cellular-component terms included cell surface, plasma membrane, nucleolus, nucleoplasm and nucleus. The over-representation of ribosome (translation) and actin binding (structural support of the cell) terms was expected among highly biased genes, as was that of heme binding (oxygen supply to body cells), DNA repair and transmembrane transport. Ubiquitination, phosphorylation, protein kinase binding and protein kinase activity were also highly enriched among high CUB genes (Table 1).

Table 1 The overrepresented GO terms for the 5% highly codon biased genes of the pig genome

3.2 Low codon usage bias and gene ontology
For the low CUB genes, GO terms related to immune response were over-represented (Table 2). For example, biological processes such as defense response, defense response to bacteria and viruses, negative regulation of inflammatory cytokines and apoptotic process were significant. Antigen processing and presentation (acquired immunity) and lysosome (innate immunity) terms were also enriched among low CUB genes. Notable, too, was the over-representation of many GO terms for organ development (liver, lung, skin, skeletal muscle and thymus) and of terms for the negative regulation of potentially detrimental processes, e.g. the response to ultraviolet light and blood coagulation. Olfactory GO terms such as detection of chemical stimulus involved in sensory perception of smell were also enriched (Table 2).

Table 2 The overrepresented GO terms for the 5% least codon biased genes of the pig genome

4 Discussion
GO terms are associated with codon usage bias
GO terms related to gene regulation, such as phosphorylation, were significantly enriched among highly biased genes. Enriched GO terms were also associated with major processes such as transcription of genetic material for gene expression, transmembrane transport of transcripts from the nucleus to the cytosol, and ribosome manufacture to facilitate translation. Other major terms, such as actin binding, play a central role in the basic metabolism of many eukaryotic cells. Actin forms part of the cytoskeleton that shapes cells and acts in cell division, motility, contraction, adhesion, phagocytosis, protein sorting, DNA repair and signal transduction (Uribe and Jay, 2009). Many studies have established the presence of actin in both the nucleus and the cytoplasm and have shown that its functions in both compartments are diverse. Possible roles for nuclear actin include contributing to the organization of chromatin remodeling complexes, RNA processing and regulation of DNase I function (Olave et al., 2002). In addition, actin plays a direct role in transcription by RNA polymerases I, II and III (Percipalle and Visa, 2006). Ubiquitination (ubiquitin-dependent protein catabolic process, ubiquitin-protein ligase activity) is a reversible post-translational modification of cellular proteins and plays central roles in the regulation of various cellular processes, such as protein degradation, protein trafficking, cell-cycle regulation, DNA repair, apoptosis and signal transduction (Kimura and Tanaka, 2010). Such functions are tightly regulated and may favour high codon usage bias, as observed in this study. For normal cell functioning, protein-protein interactions and enzyme activities are regulated by tyrosine phosphorylation, and phosphorylation also plays a central role in gene activation. It is therefore not surprising that these molecular functions and biological processes are enriched among high CUB genes, consistent with their role in cellular homeostasis.
Taken together, most GO terms enriched among high CUB genes are important for the day-to-day physiological functions of the cell and may be termed 'housekeeping'.

The functional enrichment results of this study mirror the findings of Paudel et al. (2013) on CNVs. The enrichment of immune-related genes among low CUB genes is notable. Genes such as those encoding interferons (IFN), involved in the response to viruses, and cytochrome P450 (CYP) enzymes usually evolve fast because they allow the organism to respond to rapid changes in the environment. For example, members of the IFN gene families, involved in defense against viral infections (Table 2), and CYP genes, which are responsible for detoxification and drug metabolism, were found to be highly copy number variable (Paudel et al., 2013), which is consistent with the present results. This could be because these genes are less conserved and need to evolve fast to adapt to ever-changing antigenic determinants, pathogen evolution and the immune-evasion mechanisms exploited by pathogens. The low CUB results reported here concur with Paudel et al. (2013), who found that these types of genes are often copy number variable in pigs. The observed over-representation of olfactory receptor (OR) genes among low CUB genes was expected. Sus scrofa has the largest repertoire of functional OR genes among mammals whose genomes had been sequenced (Nguyen et al., 2012), which is likely related to the pig's strong dependence on its sense of smell for foraging, especially in the wild and in extensive production systems. High flexibility of codon usage may therefore support efficient foraging on diverse feeds. There are about 1,301 porcine OR genes, nearly a third of which are copy number variable in pigs. Such large gene families are less conserved, which might explain the low CUB reported in this study. These findings suggest that the wide variety of environmental conditions faced by pigs around the world may have resulted in low CUB, allowing flexibility, and in high CNV.
For immune and olfactory receptor genes, low CUB may thus reflect reduced conservation and the need to evolve fast in response to ever-changing antigenic determinants and to artificially created environments, respectively.

Fas ligand-mediated apoptosis (Griffith et al., 1995), the principal mechanism by which most effector T and B lymphocytes die after an infection has been cleared, may explain the low CUB of apoptosis-related genes observed in this study. Low CUB might allow these genes to respond to the changing status of infection clearance, regulating uncontrolled lymphocyte activation that could lead to self-destruction, limiting autoreactivity and promoting immune tolerance.

Conclusion
Functional analyses revealed that genes with high codon usage bias are enriched for housekeeping functions, whereas genes with low codon usage bias are enriched for immune response, sensory perception and response to stimulus.

Authors’ contributions
Khobondo J.O. conceived and designed the experiments, developed the Perl script and wrote the manuscript; Kahi A.K. discussed and improved the manuscript; Ngeno K. analysed the data and improved the manuscript. All authors read and approved the final manuscript.

References
Basak S., and Ghosh T.C., 2005, On the origin of genomic adaptation at high temperature for prokaryotic organisms, Biochemical and Biophysical Research Communications, 330: 629-632
http://dx.doi.org/10.1016/j.bbrc.2005.02.134

Freeman J.L., Perry G.H., Feuk L., Redon R., McCarroll S.A., Altshuler D.M., Aburatani H., Jones K.W., Tyler-Smith C., Hurles M.E., Carter N.P., Scherer S.W., and Lee C., 2006, Copy number variation: new insights in genome diversity, Genome Research, 16: 949-961
http://dx.doi.org/10.1101/gr.3677206

Groenen M.A., Archibald A.L., Uenishi H., Tuggle C.K., Takeuchi Y., Rothschild M.F., and Fairley S., 2012, Analyses of pig genomes provide insight into porcine demography and evolution, Nature, 491(7424): 393-398
http://dx.doi.org/10.1038/nature11622

Griffith T.S., Brunner T., Fletcher S.M., Green D.R., and Ferguson T.A., 1995, Fas ligand-induced apoptosis as a mechanism of immune privilege, Science, 270(5239): 1189-1192
http://dx.doi.org/10.1126/science.270.5239.1189

Hahn M.W., and Kern A.D., 2005, Comparative Genomics of Centrality and Essentiality in Three Eukaryotic Protein-Interaction Networks, Molecular Biology and Evolution, 22: 803-806
http://dx.doi.org/10.1093/molbev/msi072

Hershberg R., and Petrov D.A., 2008, Selection on Codon Bias, Annual Review of Genetics, 42: 287-299
http://dx.doi.org/10.1146/annurev.genet.42.110807.091442

Hershberg R., and Petrov D.A., 2010, Evidence That Mutation Is Universally Biased towards AT in Bacteria, PLoS Genetics, 6: e1001115
http://dx.doi.org/10.1371/journal.pgen.1001115

Jia R., Cheng A., Wang M., Xin H., Guo Y., Zhu D., Qi X., Zhao L., Ge H., and Chen X., 2009, Analysis of synonymous codon usage in the UL24 gene of duck enteritis virus, Virus Genes, 38: 96-103
http://dx.doi.org/10.1007/s11262-008-0295-0

Kanaya S., Yamada Y., Kinouchi M., Kudo Y., and Ikemura T., 2001, Codon Usage and tRNA Genes in Eukaryotes: Correlation of Codon Usage Diversity with Translation Efficiency and with CG-Dinucleotide Usage as Assessed by Multivariate Analysis, Journal of Molecular Evolution, 53: 290-298
http://dx.doi.org/10.1007/s002390010219

Khobondo J.O., Okeno T.O., and Kahi A.K., 2015, Genomic composition factors affect codon usage in porcine genome, African Journal of Biotechnology, 14(4): 341-349
http://dx.doi.org/10.5897/AJB2014.14110

Kimura Y., and Tanaka K., 2010, Regulatory mechanisms involved in the control of ubiquitin homeostasis, Journal of Biochemistry, 147: 793-798
http://dx.doi.org/10.1093/jb/mvq044

Knight R., Freeland S., and Landweber L., 2001, A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes, Genome Biology, 2: research0010.1 - research0010.13

Maere S., Heymans K., and Kuiper M., 2005, BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks, Bioinformatics, 21: 3448-3449
http://dx.doi.org/10.1093/bioinformatics/bti551

Nguyen D., Lee K., Choi H., Choi M., Le M., Song N., Kim J-H., Seo H., Oh J-W., Lee K., Kim T-H., and Park C., 2012, The complete swine olfactory subgenome: expansion of the olfactory gene repertoire in the pig genome, BMC Genomics, 13: 584
http://dx.doi.org/10.1186/1471-2164-13-584

Olave I., Wang W., Xue Y., Kuo A., and Crabtree G.R., 2002, Identification of a polymorphic, neuron-specific chromatin remodeling complex, Genes and Development, 16: 2509-2517
http://dx.doi.org/10.1101/gad.992102

Paudel Y., Madsen O., Megens H.J., Frantz L.A., Bosse M., Bastiaansen J.W., and Groenen M.A., 2013, Evolutionary dynamics of copy number variation in pig genomes in the context of adaptation and domestication, BMC Genomics, 14(1): 449
http://dx.doi.org/10.1186/1471-2164-14-449

Percipalle P., and Visa N., 2006, Molecular functions of nuclear actin in transcription, The Journal of Cell Biology, 172: 967-971
http://dx.doi.org/10.1083/jcb.200512083

Qiu S., Bergero R., Zeng K., and Charlesworth D., 2011, Patterns of Codon Usage Bias in Silene latifolia, Molecular Biology and Evolution, 28: 771-780
http://dx.doi.org/10.1093/molbev/msq251

Shannon P., Markiel A., Ozier O., Baliga N.S., Wang J.T., Ramage D., Amin N., Schwikowski B., and Ideker T., 2003, Cytoscape: a software environment for integrated models of biomolecular interaction networks, Genome Research, 13: 2498-2504
http://dx.doi.org/10.1101/gr.1239303

Sharp P.M., Bailes E., Grocock R.J., Peden J.F., and Sockett R.E., 2005, Variation in the strength of selected codon usage bias among bacteria, Nucleic Acids Research, 33: 1141-1153
http://dx.doi.org/10.1093/nar/gki242

Uribe R., and Jay D., 2009, A review of actin binding proteins: new perspectives, Molecular Biology Reports, 36: 121-125
http://dx.doi.org/10.1007/s11033-007-9159-2

Wright F., 1990, The ‘effective number of codons’ used in a gene, Gene, 87: 23-29
http://dx.doi.org/10.1016/0378-1119(90)90491-9

Zhang Q., Zhao S., Chen H., Liu X., Zhang L., and Li F., 2009, Analysis of the codon use frequency of AMPK family genes from different species, Molecular Biology Reports, 36: 513-519
http://dx.doi.org/10.1007/s11033-007-9208-x
