Author: Haimei Wang (corresponding author)
Computational Molecular Biology, 2025, Vol. 15, No. 3, doi: 10.5376/cmb.2025.15.0014
Received: 02 Apr. 2025; Accepted: 13 May 2025; Published: 04 Jun. 2025
Wang H.M., 2025, Pretrained language models for biological sequence understanding, Computational Molecular Biology, 15(3): 141-150 (doi: 10.5376/cmb.2025.15.0014)
Abstract: Pre-trained language models (PLMs) are becoming transformative tools in the life sciences, capable of autonomously learning rich representations from massive amounts of biological sequence data. Through self-supervised training they capture complex patterns and long-range dependencies in DNA, RNA, and protein sequences, effectively compensating for the limitations of traditional bioinformatics methods. This paper reviews the progress of PLMs in biological sequence understanding, covering model principles and their applications in protein function prediction, gene expression regulation, and structural modeling. It focuses on a case study in which the ESM-2 model predicts the effects of mutations on protein stability, and compares this approach with traditional methods. Finally, the paper analyzes challenges such as data sparsity, model interpretability, and computational cost, and outlines prospects for the deep integration of artificial intelligence and molecular biology. These advances indicate that pre-trained models are driving a paradigm shift in biological sequence research.
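To make the ESM-2 case study concrete, the sketch below illustrates one common way such mutation-effect scores are computed: the masked-marginal protocol, which masks the mutated position and compares the model's log-probabilities for the mutant and wild-type residues. This is a minimal illustration assuming the open-source fair-esm package; the sequence and the mutation (V5A) are hypothetical placeholders, not data from the paper, and the paper's own protocol may differ.

import torch
import esm

# Load a small ESM-2 checkpoint to keep the demo light; larger checkpoints
# such as esm2_t33_650M_UR50D generally score mutations more accurately.
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Hypothetical example sequence and a single point mutation (V5A).
sequence = "MKTAVQLLLVAGG"
position = 4          # 0-based index into the sequence
wt, mut = "V", "A"
assert sequence[position] == wt

# Mask the mutated position and score both residues from the model's
# masked-token distribution (the "masked-marginal" protocol).
_, _, tokens = batch_converter([("protein", sequence)])
tokens[0, position + 1] = alphabet.mask_idx  # +1 skips the BOS token

with torch.no_grad():
    logits = model(tokens)["logits"]
log_probs = torch.log_softmax(logits[0, position + 1], dim=-1)

# A negative log-likelihood ratio suggests the mutant residue is less
# favored than wild type, which correlates with destabilization.
score = (log_probs[alphabet.get_idx(mut)] - log_probs[alphabet.get_idx(wt)]).item()
print(f"{wt}{position + 1}{mut} log-likelihood ratio: {score:.3f}")

In this zero-shot setup no stability labels are needed; the score can then be benchmarked against traditional predictors of mutation-induced stability change, as the case study in the paper does.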
Keywords: Pre-trained language model; Biological sequence; Protein function prediction; Gene regulation; Protein structure prediction