Hongpeng Wang, Minghua Li
Computational Molecular Biology, 2025, Vol. 15, No. 4 doi: 10.5376/cmb.2025.15.0016
Received: 01 May, 2025 Accepted: 10 Jun., 2025 Published: 01 Jul., 2025
Wang H.P., and Li M.H., 2025, Large language models for biological knowledge extraction, Computational Molecular Biology, 15(4): 160-170 (doi: 10.5376/cmb.2025.15.0016)
The rapid growth of the biomedical literature has created severe information overload for researchers, making automated knowledge extraction tools a necessity. Large language models (LLMs) have demonstrated strong performance in text understanding and generation in recent years, offering a new approach to biological knowledge extraction. This review surveys the application of LLMs to tasks such as named entity recognition, relation extraction, and event extraction, and discusses recent advances in subfields including genomics, proteomics, and pharmacology. It analyzes the advantages of LLMs over traditional methods in contextual understanding and semantic representation, as well as the performance gains achievable through domain adaptation, fine-tuning, and prompt engineering. A case study on extracting gene-disease associations with the BioGPT model illustrates the application workflow and effectiveness of LLMs, while challenges related to data quality, model hallucination, and privacy protection are examined. Future directions, including the integration of LLMs with knowledge graphs, multimodal data integration, and knowledge verification, are discussed alongside related ethical considerations. These advances are expected to provide a new paradigm for future biomedical research.
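To make the BioGPT case study concrete, the sketch below shows one way a prompt-based gene-disease association extraction step could look using the publicly released BioGPT checkpoint from Hugging Face Transformers. The model ID, example sentence, prompt wording, and decoding parameters are illustrative assumptions, not the exact setup used in the case study.

# Minimal sketch of prompt-based gene-disease relation extraction with BioGPT.
# Assumptions: the "microsoft/biogpt" checkpoint, the example sentence, and the
# prompt/decoding settings are illustrative, not the authors' configuration.
from transformers import BioGptTokenizer, BioGptForCausalLM

tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

# Example biomedical sentence mentioning a gene and a disease.
sentence = ("Mutations in the BRCA1 gene are associated with an increased "
            "risk of hereditary breast and ovarian cancer.")
# Prompt the causal language model to continue with the association it infers.
prompt = f"{sentence} The gene-disease association described here is"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, num_beams=4,
                         early_stopping=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))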
Keywords: Large language model; Biomedical text mining; Knowledge extraction; Knowledge graph; Prompt engineering