Visualization Map Analysis of Literature on Genomics Research Based on Bibliometrics of CiteSpace

Chunhe Zhao; Weixi Cui; Junsheng Meng; Xudong Zhang; Xiaoyu Zhao; Lei Wang; Zhanjun Dong

Research Article


Hebei General Hospital, Hebei, 050051, China

Genomics and Applied Biology, 2017, Vol. 8, No. 5 doi: 10.5376/gab.2017.08.0005
Received: 13 Nov., 2017 Accepted: 19 Dec., 2017 Published: 29 Dec., 2017

This article was first published in Chinese in Genomics and Applied Biology (2017, 36: 1248-1256). It is translated and published here in English with authorization, under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Preferred citation for this article:

Zhao C.H., Cui W.X., Meng J.S., Zhang X.D., Zhao X.Y., Wang L., and Dong Z.J., 2017, Visualization map analysis of literature on genomics research based on bibliometrics of CiteSpace, Genomics and Applied Biology, 8(5): 30-48 (doi: 10.5376/gab.2017.08.0005)

Abstract

In this paper, we compare domestic (Chinese) and foreign research in the field of genomics from a bibliometric perspective, aiming to provide a reference for further study. We retrieved genomics-related literature published from 1985 to 2016 from the Web of Science and CNKI, and then used CiteSpace 4.0.R5 SE to analyze the number of published articles, the collaboration networks among authors, research institutes and countries, the co-occurrence of keywords, and the co-citation of references. The number of published journal articles fluctuates but shows an overall rising trend. The USA holds the largest share of world publications in genomics, followed by the UK and China. A sizeable core group of institutions has formed in the USA, with good cooperative relationships among its universities. The Chinese Academy of Sciences is the main research institution in China. Cooperation among international institutions is more frequent and closer than cooperation among Chinese institutions. The researchers with the most articles in Chinese-language and foreign-language journals are Liu J.Q. and Zhang Y., respectively. The main co-occurring and burst keywords include pharmacogenomics, functional genomics, comparative genomics, and proteomics. The references with the highest centrality, the highest citation frequency and the highest burst strength are (Yang, 2007), (Lander, 2001) and (Altschul, 1997), respectively. Inter-institutional cooperation should be strengthened to push forward the development of genomics research in China.

Keywords

Genomics; CiteSpace; Bibliometrics; Visualization analysis

Background

Genomics is a discipline that studies the genomes of organisms in order to solve practical problems. The term was first put forward by Thomas Roderick in 1986. It covers genome mapping, sequencing, gene tagging and functional analysis of whole genomes (Li and Yin, 2000). Genomics provides information on genomes and supports the systematic use of the related data to address significant questions in biology, industry and medicine. We retrieved genomics-related literature published since 1985 from CNKI and the Web of Science, and then used the visualization software CiteSpace 4.0.R5 SE to analyze the authors, the institutions, the collaboration among institutes and among countries, keyword bursts and co-occurrence, and the co-citation of references. We aim to identify research hot spots and frontiers and to provide a reference for further study of genomics.

CiteSpace is a citation visualization and analysis tool written in Java that focuses on uncovering the latent knowledge contained in scientific literature. Using pathfinder network scaling and co-citation analysis, CiteSpace measures the literature of a specific domain and identifies the key documents that play, or may play, an important role in the evolution of that knowledge domain or scientific field. Such documents mark intellectual turning points or critical paths that may shape the development of the domain. By displaying the knowledge structure and its changes in visualized network maps of citing articles and their references or keywords, CiteSpace reflects the dynamic evolution of research hot spots in a given field. So far, CiteSpace has been widely used in mapping and file management, management science and engineering, education, public administration, sociology, sport science, basic medicine, theoretical economics, philosophy, business administration, biology, applied economics, history of science and technology, psychology and other fields, where it serves as a valuable research tool (Chen, 2004; Chen, 2006; Chen et al., 2010; Chen, 2012; Chen and Leydesdorff, 2013; Chen et al., 2014).

1 Results and Analysis

1.1 Publication amount analysis

Based on the records retrieved from the Web of Science and CNKI, we counted the genomics-related articles published each year from 1985 to 2016 (Table 1).

Table 1 The number of published periodical articles (piece)

The number of genomics-related articles published abroad increases year by year (Figure 1). It reaches almost 4000 in 2015, which indicates that more and more researchers and research institutes abroad attach great importance to genomic studies and have produced abundant results. By comparison, the number of articles published in China has fluctuated upward since 1997. There is also a sudden increase from 2013 to 2015, which suggests that China has increased its investment in genomics-related fields in recent years and has made great progress.

Figure 1 The number of published periodical articles (piece)

1.2 Visualized collaboration map analysis

1.2.1 Co-authorship network analysis

Table 2 lists the authors who published more than 10 Chinese-language articles in the field of genomics from 1985 to 2016. Jiaqiang Liu ranks first with 29 papers. The co-authorship network map (Figure 2) contains 168 authors and 78 collaboration links, forming co-authorship groups represented by Jiaqiang Liu and Miqu Wang, Wei Zhang and Honghao Zhou, Huanming Yang and Jun Yu, Dongsheng Zhou and Ruifu Yang, Hongtao Song and Ying Hou, Yu Li and Tianyu Wang, Shilin Chen, Dacheng Hao and Peigen Xiao, Jiaqi Wang, Shengguo Zhao and Kailang Liu, Wenli Ma, Jiebing Ke and Wenling Zheng, Chaoying He and Shouyi Chen, Song Wu and Zhiming Cai, Guanghong Cui and De Qiu, Jieshou Li and Yousheng Li, Jun Yue and Honghai Wang, Xingguo Ye and Lipu Du, Ling Wang and Jie Fu, and Zhu Chen, among others.

Table 2 The main authors of the Chinese-language articles

Figure 2 The visualized co-authorship networks of Chinese-language articles

Table 3 lists the authors who published more than 50 foreign-language articles in the field of genomics from 1985 to 2016. Zhang Y. ranks first with 149 papers. The co-authorship network map (Figure 3) contains 190 authors and 130 collaboration links, which indicates that cooperation among international researchers is stronger and closer than that among researchers in China. The largest co-authorship networks, which also have the highest article output, are formed by Zhang Y., Wang J., Liu L., Li H., Wang L., Wang Y., Lee S.H., Varshney R.K., Li J., Wang W., Zhang J., Li X., Liu Y., Li Y., Wang X., Wang Q., Katze M.G., and Kumar S., among others.

Table 3 The main authors of foreign-language articles

Figure 3 The visualized co-authorship networks of foreign-language articles
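
The author and link counts reported for Figures 2 and 3 were produced by CiteSpace. As a rough illustration of how a co-authorship network of this kind can be tallied, the following minimal Python sketch counts one node per author and one link per pair of authors who share a paper. The function name `coauthor_network` and the sample author lists are hypothetical and are not taken from the retrieved data.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(papers):
    """Tally authors (nodes) and co-author pairs (links) from author lists.

    papers: iterable of lists of author names, one list per paper.
    Returns (node_counts, link_counts), where node_counts maps an author to
    the number of papers and link_counts maps a sorted author pair to the
    number of papers the two authors share.
    """
    node_counts = Counter()
    link_counts = Counter()
    for authors in papers:
        unique = sorted(set(authors))  # ignore duplicate names within one paper
        node_counts.update(unique)
        link_counts.update(combinations(unique, 2))
    return node_counts, link_counts


# Hypothetical records, for illustration only.
papers = [
    ["Author A", "Author B"],
    ["Author A", "Author B", "Author C"],
    ["Author D"],
]
nodes, links = coauthor_network(papers)
print(len(nodes), "authors,", len(links), "collaboration links")
```

In this toy input, the single-author paper adds a node but no link, which is why a network can contain more authors than links, as in Figure 2.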

1.2.2 Co-institution network analysis

We analyzed the institutions of the first authors of the Chinese-language articles and list the top 10 research institutes in Table 4. The Institute of Crop Science of the Chinese Academy of Agricultural Sciences published the most articles in China. The institutional collaboration network map for Chinese-language articles (Figure 4) contains 153 institutes and 9 collaboration links (1 Year Per Slice, Select top 10 from each slice), compared with 106 institutes and 59 links for foreign-language articles (Figure 5; 1 Year Per Slice, Select top 10 from each slice). This indicates that institutional cooperation abroad is much stronger and that joint projects are far more numerous than in China. Six of the top 10 foreign institutes are American (Table 5) and all six are universities, which shows that a sizeable core group of institutions has formed in the USA. The Chinese Academy of Sciences, the main research institution for genomics in China, ranks second on this list.

Table 4 The top 10 domestic institutions in the number of published articles

Figure 4 The visualized co-institution networks of Chinese-language articles

Figure 5 The visualized co-institution networks of foreign-language articles

Table 5 The top 10 foreign institutions in the number of published articles
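
One way to put the node and link counts above on a common scale is network density, the fraction of possible pairs that are actually linked, 2E / (N(N - 1)) for an undirected network with N nodes and E links. Density is not reported in the original analysis. The sketch below simply applies the formula to the counts given in the text, and the helper name `network_density` is an assumption made for illustration.

```python
def network_density(nodes, links):
    """Density of an undirected network: actual links / possible links."""
    if nodes < 2:
        return 0.0
    return 2 * links / (nodes * (nodes - 1))


# (nodes, links) as reported in the text for Figures 2 to 5.
networks = {
    "Chinese-language co-authorship": (168, 78),
    "foreign-language co-authorship": (190, 130),
    "Chinese-language co-institution": (153, 9),
    "foreign-language co-institution": (106, 59),
}

densities = {name: network_density(n, e) for name, (n, e) in networks.items()}
for name, d in densities.items():
    print(f"{name}: {d:.5f}")
```

With these counts, the foreign-language co-institution network comes out more than ten times denser than the Chinese-language one, which is consistent with the comparison drawn above. Both networks were built with the same selection setting, so the comparison is at least like for like.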

1.2.3 Analysis on networks of co-operation among countries

The network of cooperation among countries (Figure 6) was generated by setting Time Slice Length=1, Node Type=Country, Selection Criteria: Top 50 per slice. The international cooperation network is rather complex: the map shows 83 countries connected by 606 collaboration links. Table 6 lists the top 10 countries by number of published articles. The USA ranks first with 17276 papers in the field of genomics, which indicates that its research is at the forefront of the world. England ranks second with 2915 articles. China ranks third with 2662 articles, which places it among the leading countries and indicates that its investment in genomics-related fields is relatively large, although its output is still far behind that of the USA.

Figure 6 The visualized networks of co-operation among countries

Table 6 The top 10 countries in the number of published articles

1.3 Visualized map analysis of co-keywords

High-frequency keywords usually indicate the hot topics of a research field. We therefore analyzed the keyword co-occurrence networks of the genomics-related articles in order to explore the research hot spots and their evolution.

We analyzed the keyword co-occurrence network of the Chinese-language articles by setting Time Slice Length=1, Node Type=Keywords, Selection Criteria: Top 80 per slice, and obtained a network of 700 nodes and 2664 links (Figure 7). Table 7 lists the top 20 keywords. Cluster analysis generated 18 relatively distinct clusters (the words in red in Figure 7 are the cluster labels). The modularity (Q value) of 0.5709, above 0.3, and the mean silhouette (S value) of 0.5112, above 0.5, show that the cluster structure is significant and the clusters are reasonable. These two values also served as the basis for choosing the Selection Criteria. The main burst keywords in these articles are listed in Figure 8. Burst keywords are nodes whose frequency increases or decreases abruptly, and they are usually regarded as turning points of a research hot spot.

Figure 7 The visualized networks of co-keywords in Chinese-language articles

Table 7 The top 20 keywords with highest occurrence frequency in Chinese-language articles

Figure 8 The main burst keywords in Chinese-language articles

We analyzed the keyword co-occurrence network of the English-language articles by setting Time Slice Length=1, Node Type=Keywords, Selection Criteria: Top 20 per slice, and obtained a network of 138 nodes and 466 links (Figure 9). Table 8 lists the top 20 keywords. Cluster analysis generated 10 relatively distinct clusters. The modularity (Q value) of 0.5452, above 0.3, and the mean silhouette (S value) of 0.7669, above 0.7, show that the cluster structure is significant and that the clusters are highly consistent and reasonable. The main burst keywords in these articles are listed in Figure 10.

Figure 9 The visualized networks of co-keywords in foreign-language articles

Table 8 The top 20 keywords with highest occurrence frequency in foreign-language articles

Figure 10 The main burst keywords in foreign-language articles
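
The Q value cited for the keyword clusters above measures how much denser the links inside clusters are than would be expected by chance. A minimal sketch of the standard Newman modularity for an unweighted, undirected network follows. CiteSpace's own implementation is not described in the source, so the function name `modularity` and the toy network are assumptions for illustration only.

```python
def modularity(edges, community_of):
    """Newman modularity Q = sum over clusters c of [L_c / m - (d_c / (2m))^2].

    edges: list of (u, v) pairs of an undirected, unweighted network.
    community_of: dict mapping each node to its cluster label.
    L_c is the number of links inside cluster c, d_c the summed degree of
    its nodes, and m the total number of links.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}  # L_c
    degree = {}    # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
        for c in degree
    )


# Toy keyword network: two triangles joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, clusters)
print(round(q, 4))
```

For this toy network Q equals 5/14, about 0.357, so even two tight triangles joined by one link only just clear the 0.3 threshold used in the text.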

From 1985 to the beginning of the 21st century, the burst keywords included pharmaceutical company, biotechnology, combinatorial chemistry, human genome, and life sciences (Figure 8; Figure 10). New technologies such as genomics, combinatorial chemistry and the high-throughput drug screening that followed transformed biotechnological research and the pharmaceutical industry. In the same period, the Human Genome Project was launched in 1990 by the National Human Genome Research Institute of the USA, aiming to decode the 3 billion base pairs that make up the human genome. In 1998, the Human Genome Center of the Institute of Genetics and Developmental Biology was established at the Chinese Academy of Sciences, and in the same year the National Human Genome Center, Beijing and the National Human Genome Center at Shanghai were established. The following year, the Human Genome Center was registered internationally and went on to complete the sequencing of a region of about 30 Mb on the short arm of human chromosome 3, which accounts for 1% of the whole human genome. China thus became the sixth country to participate in the Human Genome Project, after the USA, the UK, France, Germany and Japan (Wu, 2009). During this period, DNA chip technology emerged, making it possible to test and analyze large amounts of genetic information quickly and efficiently.

The post-genome era began when the draft map of the human genome was completed in 2000. Functional genomics became the focus of genomic research, and new technologies such as two-dimensional gel electrophoresis and DNA chips were developed and applied. Functional genomics concentrates on identifying and analyzing the genic and non-genic sequences of the whole genome and their functions, in an attempt to compile an encyclopedia that helps interpret the language of DNA. Around 2010, burst keywords such as metagenomics and comparative genomics appeared. Metagenomics takes the microbial DNA extracted from environmental samples as its study object and constructs metagenomic libraries, which are screened for new bioactive substances and used to obtain information about microbial genetic diversity and molecular ecology in the environment (Huang et al., 2009). Comparative genomics compares the whole genomes of different species in order to understand the functional and developmental relationships among genomes.

In addition, the post-genome era has made proteomics more prominent. Proteomics is the comprehensive study of the properties of proteins, and it provides a theoretical basis and solutions for clarifying and tackling many disease mechanisms at the protein level. These developments have also brought sweeping changes to medicine. Keywords such as pharmacogenomics and individualized treatment are mentioned extensively in the literature. In clinical treatment, different patients often show different therapeutic effects and side effects in response to the same drug. Pharmacogenomics, grounded in genetic theory, studies the relationship between genes and their variants and drug response. By genotyping patients, an individualized treatment plan can be drawn up according to genotype, so as to improve efficacy and reduce adverse drug reactions. In recent years, progress in the treatment of non-small cell lung cancer has depended largely on the research and application of pharmacogenomics. The direct motivation of the Human Genome Project was to address the genetic basis of human diseases, including cancer, and to achieve early prevention and treatment so as to reduce the risk of disease.

Traditional Chinese medicine genomics, which takes the source species of Chinese medicinal materials as its study objects, has also developed rapidly in recent years. With techniques such as gene chips and bioinformatics, the targets of traditional Chinese medicines and their mechanisms of action can be explored in order to determine the active fractions of traditional Chinese medicines, identify Chinese herbs, distinguish genuine herbs, screen new drugs and shorten the development cycle, opening up a new path for the modernization of traditional Chinese medicine (Xing et al., 2007) (Figure 7). In 2014, the concept of big data showed a burst. In the same year, Google moved into genomics and joined the Global Alliance for Genomics and Health in order to pool resources in this field and build big-data-based databases that help interpret the results of complex genetic tests.

1.4 Visualized map analysis of co-citation

A visualized co-citation map was generated with CiteSpace to study the key references in the field of genomics. At present, CiteSpace cannot perform co-citation analysis on records from CNKI, so this study analyzes only the English-language literature. Because of the large volume of cited references in the English-language literature, the analysis is divided into three time periods: 1985-2009, 2010-2014, and the most recent two years (2015 and 2016). According to the number of documents and the Q value of the cluster analysis, the Selection Criteria were set to Top 10, Top 30 and Top 100, respectively (Figure 11; Figure 12; Figure 13). The Q values are 0.7171, 0.5164 and 0.6984, respectively, which indicates that the cluster structure is significant. The main clusters and top terms of the reference co-citation network in the three periods are listed below (Table 9; Table 10; Table 11).

Figure 11 The visualized network of reference co-citation (1985-2009)

Figure 12 The visualized network of reference co-citation (2010-2014)

Figure 13 The visualized network of reference co-citation (2015-2016)

Table 9 The main clusters and top terms in the reference co-citation network (1985-2009)

Table 10 The main clusters and top terms in the reference co-citation network (2010-2014)

Table 11 The main clusters and top terms in the reference co-citation network (2015-2016)

Table 12 lists the 10 references with the highest centrality. A reference with high centrality occupies an important structural position in the network, that is, it plays an important role in connecting other nodes or several different clusters. These documents can be regarded as landmarks in the field of genomics (Chen et al., 2014).

Table 12 The top 10 literature with highest centrality

Note: a refers to the cluster in the reference co-citation network (1985-2009); b refers to the cluster in the reference co-citation network (2010-2014); c refers to the cluster in the reference co-citation network (2015-2016)
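
The centrality values in Table 12 can be read as betweenness centrality, the share of shortest paths between other nodes that pass through a given node. This reading is an assumption based on how CiteSpace is generally described; the source does not name the measure. The following minimal sketch computes normalized betweenness for a small unweighted, undirected network with Brandes' algorithm. The function name `betweenness_centrality` and the toy network are illustrative only.

```python
from collections import deque


def betweenness_centrality(adj):
    """Normalized betweenness for an undirected, unweighted graph.

    adj: dict mapping each node to an iterable of its neighbours.
    Uses Brandes' algorithm: one breadth-first search per source node,
    followed by back-propagation of path dependencies.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:               # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted twice, so divide by (n - 1)(n - 2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: bc[v] * scale for v in bc}


# Toy co-citation network: two triangles bridged by the link C-D.
adj = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
centrality = betweenness_centrality(adj)
print(centrality["C"], centrality["A"])
```

In the toy network the two bridging nodes C and D carry every shortest path between the triangles, so they receive the highest scores while the remaining nodes score zero. This is the sense in which a high-centrality reference connects different clusters.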

The reference “Yang ZH-2007” ranks first with a centrality of 0.38. Its title is “PAML 4: Phylogenetic analysis by maximum likelihood”. PAML is a software package for phylogenetic analysis of DNA or protein sequences using maximum likelihood methods. It was developed by Ziheng Yang and is provided free of charge for academic use. Yang, the author of the paper, is a well-known scientist of Chinese origin, a Fellow of the Royal Society and a professor of statistical genetics at the University of London. Running PAML with a multi-core parallel algorithm markedly speeds up the analysis of DNA and protein sequence data sets.

The reference “Dunham I-2012” ranks second with a centrality of 0.37. Its title is “An integrated encyclopedia of DNA elements in the human genome”. The integrated encyclopedia of DNA elements systematically maps regions of transcription, transcription factor association, chromatin structure and histone modification. These data allow biochemical functions to be assigned to 80% of the genome. Many of the candidate regulatory elements are found to be associated with other regulatory elements and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence with sequence variants linked to human disease, which helps to interpret this variation.

The reference “Cong L-2013” ranks third with a centrality of 0.36. Its title is “Multiplex Genome Engineering Using CRISPR/Cas Systems”. The CRISPR/Cas9 gene editing technology was listed by Science as one of the ten major scientific advances of 2013. In this paper, two different type II CRISPR (clustered regularly interspaced short palindromic repeats)/Cas (CRISPR-associated protein) systems were engineered, and it was demonstrated that Cas9 nucleases can be directed by short RNAs to induce precise cleavage at endogenous genomic loci in human and mouse cells. Cas9 can also be converted into a nicking enzyme that promotes homology-directed repair with minimal mutagenic activity. Multiple guide sequences can be encoded into a single CRISPR array, enabling simultaneous editing of several sites within the mammalian genome. The work shows that RNA-guided nuclease technology is convenient, programmable and widely applicable.

The most recently published high-centrality reference, from 2014, also concerns the CRISPR/Cas9 system and is entitled “Genetic Screens in Human Cells Using the CRISPR-Cas9 System”. In the CRISPR/Cas9 system, the enzyme Cas9 cuts DNA at the target site. The target is determined as follows: an RNA molecule called CRISPR RNA (crRNA) uses part of its sequence to bind to another RNA molecule, the tracrRNA, by base pairing. The resulting chimeric RNA (tracrRNA/crRNA) then pairs with the target DNA site through another portion of the crRNA sequence. In this way, the chimeric RNA guides Cas9 to the target site, where it cuts. In practice, tracrRNA and crRNA can be used as two separate guide RNAs (gRNAs) or fused into a single guide RNA (sgRNA), which guides the Cas9 enzyme to bind and cut the target DNA sequence. Cas9 together with the sgRNA is called the Cas9-sgRNA system. To construct genetically modified animal models with CRISPR/Cas9, a Cas9/sgRNA software package was compiled to assist in the rapid design and screening of highly active and specific sgRNAs, the construction of sgRNA expression libraries and the large-scale construction of genetically modified animal models.

Another high-centrality reference published in 2014 is “The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)”. Genome annotation belongs to functional genomics. RAST is a rapid annotation tool based on subsystems technology, designed for complete or nearly complete bacterial and archaeal genomes. The accuracy, consistency and completeness of RAST rest on two resources: the manually curated subsystem library and the FIGfams library of protein families. RAST can be used to predict ORFs, rRNA and tRNA genes and the corresponding gene functions, and this information can be used to build metabolic networks.
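
The sgRNA design step mentioned above for the CRISPR/Cas9 system can be illustrated with a minimal Python sketch that scans a DNA sequence for candidate target sites. It assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3' of a 20-nucleotide target. The PAM requirement, the forward-strand-only scan, the function name `find_sgrna_targets` and the sample sequence are assumptions that do not come from the source, and the sketch performs no activity or specificity scoring.

```python
import re


def find_sgrna_targets(seq, spacer_len=20):
    """Return (start, spacer, pam) for every forward-strand site of the form
    <spacer_len nucleotides><NGG>. Overlapping sites are all reported."""
    seq = seq.upper()
    # A lookahead keeps the match zero-width so overlapping sites are found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical sequence: 2 nt of flank, a 20-nt target, a TGG PAM, 2 nt of flank.
sequence = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
targets = find_sgrna_targets(sequence)
for start, spacer, pam in targets:
    print(start, spacer, pam)
```

A practical design tool would also scan the reverse strand and rank candidates by predicted activity and off-target risk, which is the role the text assigns to the Cas9/sgRNA software package.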

By integrating the co-cited references of the three periods, we list the 10 most cited references in Table 13. Highly cited references are generally important documents that play a foundational role. The most cited is “Lander ES-2001”, with 851 citations. Its title is “Initial sequencing and analysis of the human genome”. This paper reports the results of an international collaboration to produce a freely available draft sequence of the human genome and presents an initial analysis of the sequence data.

The second most cited reference is “Altschul SF-1997”, with 797 citations. Its title is “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs”. Gapped BLAST stands for Gapped Basic Local Alignment Search Tool, and PSI-BLAST for Position-Specific Iterated BLAST. BLAST is a search tool widely used to retrieve, from protein or DNA databases, sequences similar to a query sequence. The improved BLAST allows gaps in alignments (gapped BLAST) and runs up to three times as fast as the original. PSI-BLAST introduces a position-specific scoring matrix into BLAST and searches the database with this matrix, refining the results over repeated iterations. A single iteration of PSI-BLAST is comparable to a gapped BLAST search, but PSI-BLAST is more sensitive to weak but biologically relevant sequence similarities. It has thus been used to discover new and interesting members of the BRCT protein superfamily.

The third most cited reference is “Li H-2009”, with 659 citations. Its title is “The Sequence Alignment/Map format and SAMtools”. It is also a high-centrality reference. SAM is a text format for storing and reading sequence alignment data. It supports both short and long reads (up to 128 Mbp) produced by different sequencing platforms.

In addition, the most recently published high-centrality reference, from 2011, is “Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods”. The comparative analysis of molecular sequence data plays a vital role in reconstructing the evolutionary history of species and in inferring the impact of natural selection on the evolution of genes and species. This document describes MEGA5, then the latest version of the MEGA (Molecular Evolutionary Genetics Analysis) software, which introduces maximum likelihood methods to infer evolutionary trees, select the best-fit substitution model (nucleotide or amino acid), infer ancestral sequences and states (with probabilities), and estimate evolutionary rates. In computer simulation analyses, the maximum likelihood methods in MEGA5 compare favorably with other software in inferring phylogenetic trees and substitution parameters. This version supports Windows, Mac OS X and Linux and is available free of charge at http://www.megasoftware.net.

Table 13 The top 10 most cited references (1985-2016)
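
To make the description of the SAM text format above more concrete, the following minimal sketch parses one SAM alignment line into its 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The class name `SamRecord`, the function name `parse_sam_line` and the sample line are hypothetical; header lines and optional tags are deliberately ignored. For real data, an established library is the safer choice.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities


def parse_sam_line(line):
    """Parse the 11 mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual)


# Hypothetical alignment line, for illustration only.
line = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(line)
is_reverse = bool(record.flag & 0x10)  # flag bit 0x10: read on the reverse strand
print(record.rname, record.pos, record.cigar, is_reverse)
```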

The references with the strongest citation bursts in 1985-2009 and 2010-2014 are shown below (Figure 14; Figure 15). There is no citation burst in 2015 or 2016. A citation burst is a sudden increase in the citation frequency of a reference at a point in time or over a period, so it has two dimensions: burst strength and burst period. The reference with the highest burst strength is “Altschul SF-1997”, with a value of 74.7129. Its title is “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs”. It is also a highly cited reference. The burst indicates that this paper received great attention in genome research, especially from 2003 to 2005, when it was a research hot spot, and that it has played an important role in the field. The reference with the most recent burst is “Finn RD-2010”, whose burst period was 2011-2012, marking it as a research frontier of recent years. Its title is “The Pfam protein families database”. Pfam is a large collection of protein families that is widely used in proteomics research. It is based on hidden Markov models and provides multiple sequence alignments. The paper introduces Pfam 24.0, the latest version at the time, which uses the latest version of the hidden Markov model package, HMMER3. HMMER3 runs 100 times faster than HMMER2, and its sensitivity is greatly improved by the use of the forward algorithm. Pfam 24.0 contains 11912 protein families. Pfam websites: http://pfam.sanger.ac.uk/ (UK), http://pfam.janelia.org/ (USA), http://pfam.sbc.su.se/ (Sweden).

Figure 14 The references with strongest citation bursts (1985-2009)

Figure 15 The references with strongest citation bursts (2010-2014)

2 Discussion

The bibliometric and visualization analysis of the field of genomics from 1985 to 2016 shows that China has published a considerable number of papers in genomics, but that cooperation among researchers and among research institutions is comparatively limited. Therefore, while strengthening genomics research and encouraging domestic research institutions to invest more in this discipline, we should also encourage exchanges and cooperation among research institutions across the regions of China, as well as cross-regional and transnational cooperation in genomics between China and other countries or regions.

The Human Genome Project was proposed in 1985 and officially launched in 1990. With the completion of the draft of the human genome, the post-genome era began. The analysis of keyword co-occurrence, burst keywords and co-citation likewise shows that the focus of genomics research has shifted to post-genomics. Functional genomics has become the focus of research, with proteomics at its core. Proteomics research can give a comprehensive understanding of the occurrence and development of disease at the protein level, and it can provide a theoretical basis and solutions for the early diagnosis of disease, the discovery of disease biomarkers and the search for drug target molecules. It has great application prospects in disease prevention and individualized treatment. Pharmacogenomics is also a keyword with a relatively high co-occurrence frequency in genomics, and it has brought new ideas to traditional pharmaceutical research. Because different patients react differently to the same drug, genotype testing is used to provide individualized medication guidance that improves efficacy and ensures the safety of drug use.

Researchers in China should keep a firm grasp of the frontiers and hot spots of genomics research and carry out in-depth research in the fields where they hold an advantage. Researchers in medicine should also engage more actively with genomics research and apply its results in clinical practice, making individualized treatment a reality.

3 Materials and Methods

3.1 Domestic database retrieval

This study retrieved genomics-related studies from the Chinese Journal Full-text Database (CNKI), which served as the data source for genomics research in China. The subject term was "Genomics" and the time span was limited to 1985-01-01 through 2016-04-30. The data sources were the China Academic Journal Network Publishing Database, the China Doctoral Dissertations Full-text Database, the China Master’s Theses Full-text Database, the Characteristic Journal database and the China Monographic Series Full-text Database. In total, 10164 articles were retrieved. After conference notices, agency introductions, interviews, forum essays and popular-science publicity items were excluded, 8512 articles remained and were used as the sample of genomics research in China.

3.2 Foreign database retrieval

This study retrieved the relevant studies on genomics from the Web of Science™ Core Collection as the foreign data source for this field. The data sources were the Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (A&HCI), Conference Proceedings Citation Index - Science (CPCI-S), Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH) and Emerging Sources Citation Index (ESCI). The subject term was "Genomics" and the time span was limited to 1985 through 2016. The document type was ARTICLE or REVIEW. A total of 40519 articles were retrieved and used as the international research sample in the field of genomics.

3.3 Analysis methods

This research uses CiteSpace 4.0.R5 SE, an information visualization software package developed by Chaomei Chen, a professor at the School of Computing and Information Science, Drexel University, USA. We analyzed the collaboration networks among authors, research institutes and countries, the co-occurrence of keywords and the co-citation of the retrieved literature. Visualization maps are used to explore the research frontiers and hot spots in the field of genomics in China and abroad, and to contrast and analyze the status of domestic and foreign genomics research.

Authors’ contributions

ZCH and CWX carried out the study design and the research. They were responsible for conceiving the paper, database retrieval, data analysis, map making, reading and sorting documents, and writing and revising the first draft. MJS, ZXD, ZXY, and WL participated in data retrieval, data sorting and analysis. DZJ directed the research design, data analysis, and the writing and revision of the paper. All authors have read and approved the final manuscript.

Acknowledgments

This study was funded by the Pharmacy Department of Hebei General Hospital. We thank Meng Junsheng, Zhang Xudong, Zhao Xiaoyu, Lei Wang, Cui Weixi and Dong Zhanjun for participating in the research design, data retrieval and analysis, and for guidance on writing and revising the paper.

References

Chen C.M., 2004, Searching for intellectual turning points: Progressive knowledge domain visualization, Proceedings of the National Academy of Sciences, 101(Suppl. 1): 5303-5310

https://doi.org/10.1073/pnas.0307513100

PMid:14724295 PMCid:PMC387312

Chen C.M., 2006, CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature, Journal of the American Society for Information Science and Technology, 57(3): 359-377

https://doi.org/10.1002/asi.20317

Chen C.M., 2012, Predictive effects of structural variation on citation counts, Journal of the American Society for Information Science and Technology, 63(3): 431-449

https://doi.org/10.1002/asi.21694

Chen C.M., and Leydesdorff L., 2013, Patterns of connections and movements in dual-map overlays: A new method of publication portfolio analysis, Journal of the Association for Information Science and Technology, 65(2): 334-351

https://doi.org/10.1002/asi.22968

Chen C.M., Ibekwe-SanJuan F., and Hou J.H., 2010, The structure and dynamics of co-citation clusters: A multiple-perspective cocitation analysis, Journal of the American Society for Information Science and Technology, 61(7): 1386-1409

https://doi.org/10.1002/asi.21309

Chen Y., Chen C.M., Hu Z.G., and Wang X., eds., 2014, Principles and applications of analyzing a citation space, Science Press, Beijing, China, pp.12, 134-144

Huang X.L., Huang S.J., Guo L.Q., and Lin J.F., 2009, Advances of metagenomics, Microbiology China, 36(7): 1058-1066

Xing Z.W., Wang Z., Gao S.H., and Wang Y.Y., 2007, Microarray and research of Chinese medications: pharmacogenomics of Chinese medications, China Journal of Chinese Materia Medica, 32(4): 289-292
