Author: Haimei Wang (corresponding author)
Computational Molecular Biology, 2025, Vol. 15, No. 3, doi: 10.5376/cmb.2025.15.0014
Received: 02 Apr. 2025; Accepted: 13 May 2025; Published: 04 Jun. 2025
Wang H.M., 2025, Pretrained language models for biological sequence understanding, Computational Molecular Biology, 15(3): 141-150 (doi: 10.5376/cmb.2025.15.0014)
Abstract: Pre-trained language models (PLMs) are becoming transformative tools in the life sciences, capable of autonomously learning rich representations from massive amounts of biological sequence data. Through self-supervised training they capture complex patterns and long-range dependencies in DNA, RNA, and protein sequences, effectively compensating for the limitations of traditional bioinformatics methods. This paper reviews the progress of PLMs in biological sequence understanding, covering model principles and their applications in protein function prediction, gene expression regulation, and structural modeling. It focuses on a case study in which the ESM-2 model predicts the effects of mutations on protein stability, and compares this approach with traditional methods. Finally, the paper analyzes challenges such as data sparsity, model interpretability, and computational cost, and outlines prospects for the deep integration of artificial intelligence and molecular biology. These advances indicate that pre-trained models are driving a paradigm shift in biological sequence research.
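To make the ESM-2 case study concrete, the sketch below illustrates one common way such mutation-effect scores are computed: the masked-marginal protocol, which masks the mutated position and compares the model's log-probabilities for the mutant and wild-type residues. This is a minimal illustration assuming the open-source fair-esm package; the sequence and the mutation (V5A) are hypothetical placeholders, not data from the paper, and the paper's own protocol may differ.

import torch
import esm

# Load a small ESM-2 checkpoint to keep the demo light; larger checkpoints
# such as esm2_t33_650M_UR50D generally score mutations more accurately.
model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Hypothetical example sequence and a single point mutation (V5A).
sequence = "MKTAVQLLLVAGG"
position = 4          # 0-based index into the sequence
wt, mut = "V", "A"
assert sequence[position] == wt

# Mask the mutated position and score both residues from the model's
# masked-token distribution (the "masked-marginal" protocol).
_, _, tokens = batch_converter([("protein", sequence)])
tokens[0, position + 1] = alphabet.mask_idx  # +1 skips the BOS token

with torch.no_grad():
    logits = model(tokens)["logits"]
log_probs = torch.log_softmax(logits[0, position + 1], dim=-1)

# A negative log-likelihood ratio suggests the mutant residue is less
# favored than wild type, which correlates with destabilization.
score = (log_probs[alphabet.get_idx(mut)] - log_probs[alphabet.get_idx(wt)]).item()
print(f"{wt}{position + 1}{mut} log-likelihood ratio: {score:.3f}")

In this zero-shot setup no stability labels are needed; the score can then be benchmarked against traditional predictors of mutation-induced stability change, as the case study in the paper does.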
Keywords: Pre-trained language model; Biological sequence; Protein function prediction; Gene regulation; Protein structure prediction