Computational Molecular Biology, 2025, Vol. 15, No. 6
Received: 20 Nov., 2025 Accepted: 12 Dec., 2025 Published: 25 Dec., 2025
Horizontal gene transfer (HGT) plays a crucial role in microbial evolution and the diversification of soil ecosystem functions. This study employed computational methods to detect and analyze HGT events in agricultural soil microbiomes, aiming to reveal how gene transfer shapes microbial community dynamics and ecological functions. Metagenomic data were generated from soil samples collected from conventional and organic farming systems, then assembled, binned, and annotated to identify potential donor and recipient species. Combining sequence-based comparative genomics, composition-based analysis, and machine learning models, common mobile genetic elements and gene families associated with environmental adaptation and antibiotic resistance were detected. The results indicate that HGT significantly promotes microbial resilience and nutrient cycling in soil ecosystems and is influenced by environmental parameters such as pH and nutrient availability. This study provides a methodological framework for computational HGT detection and offers new insights into microbial evolution, soil health, and sustainable agricultural management. Future research integrating multi-omics data and standardized benchmarks will improve the accuracy and ecological interpretability of HGT studies.
1 Introduction
Any discussion of how microorganisms acquire new capabilities must address horizontal gene transfer (HGT). Unlike vertical inheritance from parent to offspring, HGT moves genetic material between contemporaneous cells. Such exchanges occur in many environments and can confer traits that the recipient previously lacked, such as markedly increased antibiotic tolerance (Arnold et al., 2021). Transfer events are rarely observed directly; most are detectable only as subtle signatures in the genome. Once a transferred gene is established, it alters the genomic structure of the recipient and may change how the recipient behaves in its environment. Studies of microbial adaptation therefore need to take HGT into account.
In a structurally complex environment such as soil, HGT is regarded as an important force sustaining the adaptability of microbial communities. Gene exchange is especially likely in hotspots such as the rhizosphere and the mycorrhizosphere (Nielsen and Van Elsas, 2019). Not every exchange is beneficial, but together they make soil microorganisms more adaptable by continually reshaping metabolic functions, resistance traits and ecological relationships, which in turn influences key processes such as nutrient cycling. External factors such as manure input or pollution can also enhance or suppress transfer activity, so the frequency of HGT in soil is not fixed.
This study uses bioinformatic pipelines such as MetaCHIP to identify HGT events in soil microbial communities without reference genomes. MetaCHIP combines best-match and phylogenetic approaches and can capture both recent and earlier gene transfers, which makes it suitable for studying gene exchange at the community scale. With these methods, we aim to gain a more detailed understanding of the role of HGT in soil microbial ecology and its evolutionary significance, and to provide practical guidance on managing gene flow in environmental management.
2 Ecological Context of HGT in Soil Microbiomes
2.1 Diversity and complexity of microbial life in soil ecosystems
Before considering how genes move through soil, it is worth noting that soil itself is an extremely crowded and complex microbial habitat. Diverse bacteria coexist, sometimes competing and sometimes cooperating, so community structure changes constantly. Such high diversity cannot be explained by vertical evolutionary accumulation alone; HGT also contributes to the stability and adaptability of the community (Coyte et al., 2022a; Zhu et al., 2024). In the rhizosphere and hyphosphere in particular, microbial interactions are frequent and opportunities for gene exchange are correspondingly greater, so these zones are often regarded as places where community complexity is amplified. Not all soil compartments are equally active, but overall the diversity of soil microorganisms is closely tied to this gene flow.
2.2 Environmental conditions influencing gene exchange (e.g., pH, moisture, nutrients)
The occurrence of HGT differs among soil environments. pH, moisture, nutrient availability and the presence of pollutants can all influence the frequency and mode of gene exchange. After manure application, for instance, exogenous microorganisms and mobile genetic elements (MGEs) enter the soil within a short period and temporarily increase HGT activity. In soils containing antibiotics or heavy metals, these stressors instead select for microorganisms carrying resistance genes, which makes genes located on MGEs more likely to spread (Sobecky and Coombs, 2009). These environmental factors do not act independently; they jointly influence the rate of gene flow and the interaction patterns among taxonomic groups, and are ultimately reflected in community stability.
2.3 Role of HGT in microbial adaptation and functional gene dissemination
Whether a soil microorganism can establish itself in a new environment often depends not only on its own capabilities but also on whether it can acquire suitable genes from other microorganisms. HGT provides such a shortcut, spreading genes related to antibiotic tolerance, metabolism and stress resilience through the community (Maheshwari et al., 2017). A given gene may offer little benefit to some taxa yet be key for others to enter new niches or recover from stress. Community-level effects are not always unidirectional: in some cases HGT makes the whole community more stable, but this effect depends on environmental conditions and on the properties of the genes themselves (Coyte et al., 2022b). Understanding how these genes flow therefore helps to explain the adaptability of soil systems and is directly relevant to applications such as improving soil health, promoting sustainable agriculture and environmental remediation.
3 Computational Methods for HGT Detection
3.1 Sequence-based approaches: comparative genomics and phylogenetic incongruence
Many HGT analyses start from the sequence, and the underlying idea is simple: the evolutionary history of a gene is compared with the phylogeny of the species that carries it. If the two disagree, the gene is likely to have been acquired from another lineage. Comparative genomics and phylogenetic incongruence methods both build on this principle. Tools such as HGTphyloDetect take large numbers of sequences as input, reconstruct gene trees, infer possible donor-recipient relationships, and can even capture subtle transfers between closely related species (Yuan et al., 2023). Such methods have limits, however. They are computationally demanding, and when the reference database is incomplete or biased, inferences are easily distorted, which is common in practice.
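The incongruence idea can be illustrated with a minimal sketch that counts the clades on which a gene tree and a species tree disagree (a rooted, Robinson-Foulds-style distance). This is not how HGTphyloDetect works internally; the nested-tuple tree encoding, the function names and the toy taxa A to E are assumptions made only for illustration.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested tuples.

    Leaves are strings; every internal node contributes the set of leaves below it.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def rooted_rf_distance(tree_a, tree_b):
    """Count clades present in exactly one of the two trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must be built on the same set of taxa")
    return len(clades_a ^ clades_b)


# Hypothetical example: the gene tree groups A with D, unlike the species tree.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "D"), "C"), ("B", "E"))

print(rooted_rf_distance(species_tree, species_tree))  # 0: fully congruent
print(rooted_rf_distance(species_tree, gene_tree))     # 6: strongly incongruent
```

A distance of zero means the gene tree is congruent with the species tree; a large distance marks the gene as a candidate for closer inspection, although incongruence can also arise from gene loss, incomplete lineage sorting or tree reconstruction error.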
3.2 Composition-based techniques: k-mer profiles, GC skew, codon usage bias
A second class of methods does not rely on phylogenetic information but examines the compositional signature of the gene sequence itself. Features such as k-mer composition, GC skew and codon usage bias often reveal whether a gene resembles the rest of its host genome. These methods flag possible horizontal transfer by looking for sequence patterns that deviate from the host genome (Ravenhall et al., 2015). Some do not even require codon boundaries to be delineated in advance and can identify foreign genes from nucleotide distribution patterns alone (Tsirigos and Rigoutsos, 2005). Composition-based methods are usually fast and suit large-scale genomic data. However, older transfers whose composition has been assimilated to that of the host during evolution are often difficult to detect.
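As a rough illustration of the compositional approach, the sketch below compares the GC content and the k-mer frequency profile of a gene with those of its host genome. The sequences, the choice of k = 2 and the Manhattan distance are illustrative assumptions, not the procedure of any specific published tool.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_profile(seq, k=2):
    """Relative frequency of every overlapping k-mer in the sequence."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Manhattan distance between two k-mer profiles (0 = identical, 2 = disjoint)."""
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))


# Toy data (assumed): an AT-rich host, a native-looking gene and a GC-rich gene.
host_genome = "ATATTAATATTTAAATATAT" * 10
native_gene = "ATATTTAATATA"
foreign_gene = "GCGCCGGCGCGC"

host_profile = kmer_profile(host_genome)
native_d = profile_distance(kmer_profile(native_gene), host_profile)
foreign_d = profile_distance(kmer_profile(foreign_gene), host_profile)
print(gc_content(host_genome), gc_content(foreign_gene))
print(native_d, foreign_d)
```

In real genomes the contrast is far less extreme than in this toy case, so a gene is normally flagged only when its distance falls in the tail of the distribution computed over all genes of the same genome.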
3.3 Machine learning and network models
As data volumes grow, researchers increasingly apply machine learning to HGT analysis. The idea is to combine genomic, environmental and functional information and let a model learn which genes are likely to have been transferred, from which donors and to which recipients. Recently, methods that combine knowledge graphs with graph neural networks have performed well in predicting HGT associated with the spread of resistance (Islam et al., 2025), and represent a new direction for this type of research. Older pipelines remain practical as well. HGTector, for example, identifies genes that do not resemble the host's own by examining the distribution of BLAST hits, and is relatively robust to confounders such as gene loss or variation in evolutionary rate (Zhu et al., 2014). These tools are valuable in two respects: they scale to large datasets, and they make patterns of gene flow within microbial communities easier to interpret.
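The hit-distribution idea can be sketched as follows. The example is only loosely inspired by HGTector and does not reproduce its implementation: the two taxonomic groups ("close" and "distal"), the normalization by the self-hit bit score and the fixed cutoffs `close_max` and `distal_min` are all assumptions chosen for illustration.

```python
def group_weights(hits, self_bitscore):
    """Sum bit scores, normalised by the self-hit score, for each taxonomic group.

    hits: list of (group, bitscore) pairs, where group is "close" or "distal".
    """
    weights = {"close": 0.0, "distal": 0.0}
    for group, bitscore in hits:
        if group in weights:
            weights[group] += bitscore / self_bitscore
    return weights


def is_hgt_candidate(hits, self_bitscore, close_max=0.5, distal_min=1.0):
    """Flag a gene with weak hits among close relatives but strong distal hits.

    The cutoffs are hypothetical; in practice they should be derived from the
    distribution of weights across all genes of the genome.
    """
    w = group_weights(hits, self_bitscore)
    return w["close"] < close_max and w["distal"] >= distal_min


# Assumed BLAST summaries for two genes (self-hit bit score = 1000 in both cases).
native_hits = [("close", 900), ("close", 850), ("distal", 200)]
foreign_hits = [("close", 100), ("distal", 800), ("distal", 750)]

print(is_hgt_candidate(native_hits, 1000))
print(is_hgt_candidate(foreign_hits, 1000))
```

Because the decision depends on the overall pattern of hits rather than on a single best hit or a full gene tree, a scheme of this kind is less affected by the absence of individual homologs in the database.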
4 Data Generation and Preprocessing for Soil HGT Studies
4.1 Metagenomic sampling strategies for soil microbiomes
Soil metagenomic sampling starts from the recognition that soil is highly heterogeneous. Microorganisms at different depths and locations may differ completely, so samples are often taken by layer and by zone. The rhizosphere, where microbial interactions are particularly frequent, is almost always included (Brito, 2021). Some teams also sample repeatedly over time, for example during the period after manure application, because manure may introduce new plasmids and mobile genetic elements and cause a short-lived peak in HGT (Macedo et al., 2022). These designs may seem complex, but they share a single aim: to capture as many recent gene exchanges in the soil as possible, together with the traces left by earlier ones.
4.2 Assembly and binning challenges in complex metagenomic datasets
Assembly and binning are almost unavoidable challenges when working with soil metagenomes. Soil microorganisms are numerous and diverse. Combined with similar sequences among closely related species, subtle strain-level differences and uneven sequencing coverage, this makes it difficult to reconstruct complete genomes and assign them to the correct taxonomic units (Song et al., 2019). These uncertainties directly affect HGT detection, especially for recent transfers in which few sequence differences have accumulated and which are easily masked by assembly errors or fragmented contigs. To reduce these problems, tools such as MetaCHIP combine best-match and phylogenetic information, enabling potential HGTs to be identified at the community scale without reference genomes (Figure 1).
Figure 1 Example output for the flanking regions of identified HGTs (Adopted from Song et al., 2019)
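The best-match step mentioned above can be sketched in a few lines. This is a simplified illustration rather than MetaCHIP's actual algorithm: the gene identifiers, the group labels, the tuple layout of the hits and the single `min_identity` cutoff are assumptions, and the phylogenetic validation that follows in the real pipeline is omitted.

```python
def best_match_candidates(gene_group, hits, min_identity=90.0):
    """Return genes whose best non-self match belongs to a different taxonomic group.

    gene_group: dict mapping gene id -> group label of the bin it comes from.
    hits: list of (query_gene, subject_gene, percent_identity) tuples.
    """
    best = {}
    for query, subject, identity in hits:
        if query == subject:
            continue  # ignore self-hits
        if query not in best or identity > best[query][1]:
            best[query] = (subject, identity)

    candidates = []
    for query, (subject, identity) in sorted(best.items()):
        if gene_group[query] != gene_group[subject] and identity >= min_identity:
            candidates.append((query, subject, identity))
    return candidates


# Hypothetical all-against-all hits among genes from bins in two groups.
gene_group = {"a1": "G1", "a2": "G1", "b1": "G1", "c1": "G2"}
hits = [
    ("a1", "b1", 98.0), ("a1", "c1", 80.0),
    ("a2", "c1", 99.0), ("a2", "b1", 85.0),
    ("b1", "a1", 98.0),
    ("c1", "a2", 99.0),
]

print(best_match_candidates(gene_group, hits))
```

Here genes a2 and c1 are each other's best match across the group boundary and are reported as a candidate pair, while a1 and b1 match best within their own group and are not.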
4.3 Functional and taxonomic annotation pipelines supporting HGT discovery
Once assembly is complete, whether transferred genes can actually be singled out depends largely on the subsequent functional and taxonomic annotation. Sequences are usually compared against curated databases to determine their functions and the microbial taxa to which they likely belong. This helps to identify fragments carrying resistance genes or mobile elements and links gene transfer to its role in the ecosystem. Automated pipelines such as nf-core/hgtseq process sequencing data from different sources in a uniform way, making downstream HGT analyses easier to compare (Carpanzano et al., 2022). The clearer the annotation, the easier it is to understand how genes move within the community and how they affect function.
5 Case Study: Computational Detection of HGT in an Agricultural Soil Microbiome
5.1 Study background: sampling agricultural soils under conventional and organic practices
How and where soil is collected largely determines what an HGT analysis can subsequently reveal. The samples used in this study came from farmland under different management practices, including conventional and organic plots as well as plots fertilized with manure or irrigated with sewage. Such practices often change the local microbial composition and therefore affect the frequency and mode of HGT (Ren et al., 2022). Antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs) received particular attention because they are most closely associated with ecological and health risks. The sampling strategy was designed not merely to compare the farming systems themselves but to capture, as far as possible, how HGT patterns change under different agricultural disturbances.
5.2 Implementation of HGT detection pipeline using metagenomic assemblies and alignment tools
HGT detection was not applied directly to the raw soil data. The metagenomic data were first assembled and binned to reconstruct as many genomes as possible. Only once this genomic framework was in place were comparative genomics and phylogenetic incongruence methods used to determine which genes might be foreign (Wijaya et al., 2025). Ideas from statistical models and machine learning were also incorporated, for example to account for differences in gene length and for alignment biases caused by closely related species (Figure 2) (Sevillya et al., 2020). These methods do not replace one another; they were combined in a single pipeline so that HGT signals in complex agricultural soils would not be masked by noisy data.
Figure 2 Performance comparison between χ2 and Chernoff approach for SI-based HGT detection under simulated data (Adopted from Sevillya et al., 2020)
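The synteny index (SI) named in Figure 2 can be illustrated with a short sketch. It assumes a common definition, namely the fraction of genes shared between the k-neighborhoods of the same gene in two genomes; the linear gene orders, the window size k = 2 and the function names are hypothetical, and the statistical tests compared in the figure are not reproduced here.

```python
def neighbourhood(genome, gene, k):
    """Set of up to k genes on each side of `gene` in a linear gene order."""
    i = genome.index(gene)
    return set(genome[max(0, i - k):i] + genome[i + 1:i + 1 + k])


def synteny_index(genome_a, genome_b, gene, k=2):
    """Fraction of the 2k neighbouring genes shared between the two genomes."""
    shared = neighbourhood(genome_a, gene, k) & neighbourhood(genome_b, gene, k)
    return len(shared) / (2 * k)


# Toy gene orders (assumed): gene "c" sits in a different context in genome_b.
genome_a = ["a", "b", "c", "d", "e", "f", "g"]
genome_b = ["a", "b", "d", "e", "f", "g", "c"]

print(synteny_index(genome_a, genome_b, "e"))  # 0.75: context largely conserved
print(synteny_index(genome_a, genome_b, "c"))  # 0.0: context not conserved
```

A gene whose SI is much lower than that of the rest of the genome sits in an unexpected genomic context, which is one signal used to propose it as an HGT candidate. Genes near the ends of a linear contig have truncated neighborhoods, so their SI values should be interpreted with care.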
5.3 Major findings: prevalent mobile gene families, microbial donors/recipients, and environmental influences
A striking result was that manure-treated soil contained significantly more mobile gene families related to resistance and metabolism, especially shortly after application, when gene exchange appeared to be more frequent. The mobile genes did not come from a single source. Multiple possible donors and recipients were identified, including the common soil genera Bacillus and Nocardia as well as the genus Comamonas of fecal origin, indicating that genes flow among organisms from different ecological sources. Besides manure, factors such as sewage irrigation and mining pollution can also alter the frequency of HGT and the way resistance genes spread. Overall, agricultural management measures affect the gene transfer network more directly than might be expected.
6 Implications of HGT in Soil Health and Microbial Ecology
6.1 Spread of antibiotic resistance and environmental resilience through HGT
In soil, antibiotic resistance often does not evolve independently within a given bacterial population but spreads rapidly among bacteria through HGT. Many resistance genes are carried on mobile genetic elements, and once they spread they give a survival advantage to bacteria that were originally sensitive. Such changes sometimes increase the stability of the whole community in the face of antibiotics or pollutants, but they may also cause some donor populations to lose part of their original competitive advantage. Even so, HGT is regarded as an important way for soil bacteria to cope with natural and anthropogenic pressures, allowing the microbial community to remain adaptable in a changing environment.
6.2 Influence on microbial community structure and ecosystem functionality
At the community level, the effects of HGT are often broader than expected. Many genes related to degradation, resistance or tolerance move within the community and make some bacteria more competitive in specific niches such as the rhizosphere. Gene flow makes it easier for microorganisms to colonize new environments and adapt to different resource conditions, and it also changes community composition. These changes in turn affect nutrient cycling, soil fertility and plant health. In addition, when genes flow among different groups, cooperative and competitive relationships among microorganisms are reshaped, shifting the balance of the whole interaction network.
6.3 Implications for sustainable agriculture and microbial resource management
In agricultural production, HGT is closely tied to soil health and the spread of resistance genes. Practices such as manure application or sewage irrigation replenish soil nutrients but may also cause resistance genes or other functional genes to spread more rapidly in the soil, thereby altering how microbial communities function. Keeping this risk within a reasonable range requires understanding how genes flow among microorganisms. Knowledge of these mechanisms can help in designing more robust soil management measures and in using microorganisms more effectively to enhance crop stress resistance and improve soil fertility. Integrating the understanding of HGT into agricultural practice is therefore an important step towards a more resilient agricultural ecosystem.
7 Challenges and Future Prospects
7.1 Limitations in computational sensitivity and specificity for HGT detection
In computational HGT detection, the most common difficulty is not setting up the pipeline but balancing sensitivity and specificity. When transfer occurs between closely related species, the sequence and phylogenetic differences are so small that they are almost undetectable. The resulting signals are often ambiguous, so algorithms may either miss true transfers or mistake ordinary variation for foreign genes. Some improvements exist, such as heuristics based on synteny information or adaptive criteria that incorporate gene length and relatedness, but in an environment such as soil, where communities are extremely complex and change rapidly, specificity remains hard to guarantee. More powerful and finer-grained computational models will therefore be needed to identify the various types of HGT events accurately.
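The trade-off can be made concrete by scoring a set of predictions against a set of known transfers, for example from a simulated community. The sketch below uses the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP); the gene identifiers and the two sets are invented for illustration.

```python
def sensitivity_specificity(all_genes, true_hgt, predicted_hgt):
    """Compute (sensitivity, specificity) for a set of predicted HGT genes."""
    all_genes = set(all_genes)
    true_hgt = set(true_hgt) & all_genes
    predicted_hgt = set(predicted_hgt) & all_genes

    tp = len(true_hgt & predicted_hgt)              # transfers correctly detected
    fn = len(true_hgt - predicted_hgt)              # transfers that were missed
    fp = len(predicted_hgt - true_hgt)              # native genes wrongly flagged
    tn = len(all_genes - true_hgt - predicted_hgt)  # native genes correctly ignored

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical benchmark: 10 genes, 4 truly transferred, 3 predicted.
genes = [f"g{i}" for i in range(1, 11)]
truth = {"g1", "g2", "g3", "g4"}
predicted = {"g1", "g2", "g5"}

sens, spec = sensitivity_specificity(genes, truth, predicted)
print(sens, spec)
```

Relaxing the detection cutoffs of a tool typically raises sensitivity at the cost of specificity, so both values should be reported together when methods are compared.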
7.2 Integration of multi-omics data to improve accuracy and interpretation
To make HGT detection more reliable, a growing number of studies integrate data from several omics layers. When metagenomic, transcriptomic, proteomic and even metabolomic information is combined, transfer events that are hard to interpret from sequence alone can often be given clearer ecological meaning (De Sousa et al., 2023). Some new frameworks, such as models built on knowledge graphs, can also incorporate the complex relationships among genes, mobile elements and environmental variables, which improves predictive performance. Such integration helps to trace the transmission routes of resistance genes and to identify other functional transfers related to soil health, offering much broader explanatory power than single-omics approaches.
7.3 Need for standardization and curated benchmark datasets in HGT research
A long-standing difficulty in HGT research is the lack of recognized, reliable benchmark datasets. Without clear reference standards, different analytical tools cannot be compared on an equal footing, and even a repeated experiment may not yield consistent conclusions. Reference datasets that cover multiple types of microbial communities and different gene transfer scenarios are therefore particularly important: they can be used to validate methods and to guide the improvement of future tools. An open, easy-to-use toolbox that combines phylogenetic inference with high-throughput computing would also greatly lower the barrier to entering this field. Only when this groundwork is complete can the role of HGT in soil microbial communities, and its long-term significance for agricultural and environmental sustainability, be understood in depth.
8 Conclusion
Computational analysis has become an almost indispensable step in studying HGT in soil. A pipeline such as MetaCHIP infers which genes may have been acquired from elsewhere by combining phylogenetic relationships with best-match information, without requiring reference genomes. This approach captures recent gene exchanges and also reveals traces of earlier ones, providing a clearer picture of the role of HGT in resistance, metabolism and environmental adaptation. In practice, however, the analysis is rarely as smooth as the workflow suggests: incomplete metagenomic assemblies and highly similar strains often interfere with it, and this is largely unavoidable.
The significance of HGT detection is not limited to identifying transfer events; it is also a key to understanding how soil microbial communities maintain diversity and cope with stress. The spread of antibiotic resistance and metabolic capacity can often be explained through these gene transfer pathways, and these functions are closely linked to soil conditions, plant health and ecosystem functioning. The gene flow maps produced by computational tools also make interaction patterns among bacterial groups easier to understand, providing a more practical basis for ecological models.
Acknowledgments
We thank the anonymous reviewers for their insightful comments and suggestions that greatly improved the manuscript.
Conflict of Interest Disclosure
The authors affirm that this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.
References
Arnold B., Huang I., and Hanage W., 2021, Horizontal gene transfer and adaptive evolution in bacteria, Nature Reviews Microbiology, 20(4): 206-218.
https://doi.org/10.1038/s41579-021-00650-4
Brito I., 2021, Examining horizontal gene transfer in microbial communities, Nature Reviews Microbiology, 19(7): 442-453.
https://doi.org/10.1038/s41579-021-00534-7
Carpanzano S., Santorsola M., nf-core community, and Lescai F., 2022, hgtseq: a standard pipeline to study horizontal gene transfer, International Journal of Molecular Sciences, 23(23): 14512.
https://doi.org/10.3390/ijms232314512
Coyte K., Stevenson C., Knight C., Harrison E., Hall J., and Brockhurst M., 2022a, Horizontal gene transfer and ecological interactions jointly control microbiome stability, PLOS Biology, 20(11): e3001847.
https://doi.org/10.1371/journal.pbio.3001847
Coyte K., Stevenson C., Knight C., Harrison E., Hall J., and Brockhurst M., 2022b, Horizontal gene transfer increases microbiome stability, bioRxiv, 25: 481914.
https://doi.org/10.1101/2022.02.25.481914
De Sousa J., Lourenço M., and Gordo I., 2023, Horizontal gene transfer among host-associated microbes, Cell Host & Microbe, 31(4): 513-527.
https://doi.org/10.1016/j.chom.2023.03.017
Hong Y.D., and Huang H.Y., 2024, The role of soil microbiota in rice cultivation and its implications for agricultural sustainability, Molecular Soil Biology, 15(2): 87-98.
https://doi.org/10.5376/msb.2024.15.0010
Islam M., Summers A., and Arpinar I., 2025, Knowledge graph-based framework for detecting horizontal gene transfer events driving antimicrobial resistance, bioRxiv, 9: 658534.
https://doi.org/10.1101/2025.06.09.658534
Macedo G., Olesen A., Maccario L., Leal H., Maas P., Heederik D., Mevius D., Sørensen S., and Schmitt H., 2022, Horizontal gene transfer of an IncP1 plasmid to soil bacterial community introduced by Escherichia coli through manure amendment in soil microcosms, Environmental Science & Technology, 56(16): 11398-11408.
https://doi.org/10.1021/acs.est.2c02686
Maheshwari M., Abulreesh H., Khan M., Ahmad I., and Pichtel J., 2017, Horizontal gene transfer in soil and the rhizosphere: impact on ecological fitness of bacteria, In: Agriculturally Important Microbes for Sustainable Agriculture: Volume I: Plant-Soil-Microbe Nexus, Singapore: Springer Singapore, pp.111-130.
https://doi.org/10.1007/978-981-10-5589-8_6
Nielsen K., and Van Elsas J., 2019, Horizontal gene transfer and microevolution in soil, In: Modern Soil Microbiology, Third Edition, CRC press, pp.105-123.
https://doi.org/10.1201/9780429059186-7
Ravenhall M., Skunca N., Lassalle F., and Dessimoz C., 2015, Inferring horizontal gene transfer, PLoS Computational Biology, 11(5): e1004095.
https://doi.org/10.1371/journal.pcbi.1004095
Ren Z., Zhao Y., Han S., and Li X., 2022, Regulatory strategies for inhibiting horizontal gene transfer of ARGs in paddy and dryland soil through computer-based methods, Science of the Total Environment, 856: 159096.
https://doi.org/10.1016/j.scitotenv.2022.159096
Sevillya G., Adato O., and Snir S., 2020, Detecting horizontal gene transfer: a probabilistic approach, BMC Genomics, 21(Suppl 1): 106.
https://doi.org/10.1186/s12864-019-6395-5
Sobecky P., and Coombs J., 2009, Horizontal gene transfer in metal and radionuclide contaminated soils, Horizontal Gene Transfer: Genomes in Flux, 2009: 455-472.
https://doi.org/10.1007/978-1-60327-853-9_26
Song W., Wemheuer B., Zhang S., Steensen K., and Thomas T., 2019, MetaCHIP: community-level horizontal gene transfer identification through the combination of best-match and phylogenetic approaches, Microbiome, 7(1): 36.
https://doi.org/10.1186/s40168-019-0649-y
Tsirigos A., and Rigoutsos I., 2005, A new computational method for the detection of horizontal gene transfer events, Nucleic Acids Research, 33(3): 922-933.
https://doi.org/10.1093/nar/gki187
Wijaya A., Anžel A., Richard H., and Hattab G., 2025, Current state and future prospects of horizontal gene transfer detection, NAR Genomics and Bioinformatics, 7(1): lqaf005.
https://doi.org/10.1093/nargab/lqaf005
Yuan L., Lu H., Li F., Nielsen J., and Kerkhoven E., 2023, HGTphyloDetect: facilitating the identification and phylogenetic analysis of horizontal gene transfer, Briefings in Bioinformatics, 24(2): bbad035.
https://doi.org/10.1093/bib/bbad035
Zhu Q., Kosoy M., and Dittmar K., 2014, HGTector: an automated method facilitating genome-wide discovery of putative horizontal gene transfers, BMC Genomics, 15(1): 717.
https://doi.org/10.1186/1471-2164-15-717
Zhu S., Hong J., and Wang T., 2024, Horizontal gene transfer is predicted to overcome the diversity limit of competing microbial species, Nature Communications, 15(1): 800.
https://doi.org/10.1038/s41467-024-45154-w

Other articles by authors
. Jun Wang
. Qikun Huang