The Role and Challenges of Genome-wide Association Studies in Revealing Crop Genetic Diversity

Danyan Ding

Review and Progress

Institute of Life Sciences, Zhejiang A&F University, Zhuji, 311800, China

Bioscience Methods, 2024, Vol. 15, No. 1 doi: 10.5376/bm.2024.15.0002
Received: 01 Dec., 2023 Accepted: 04 Jan., 2024 Published: 19 Jan., 2024

This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Preferred citation for this article:

Ding D.Y., 2024, The role and challenges of genome-wide association studies in revealing crop genetic diversity, Bioscience Methods, 15(1): 8-19 (doi: 10.5376/bm.2024.15.0002)

Abstract

Genome-wide association studies (GWAS) have achieved remarkable results in the study of crop genetic diversity, providing a powerful tool for crop improvement by identifying genetic markers and genes related to key agronomic traits. However, GWAS faces challenges such as the complexity of population structure, the difficulty of detecting rare and small-effect variants, and the complexity of result interpretation. This study aims to combine GWAS results with new technologies such as CRISPR/Cas9 gene editing. Integrating multi-omics data (such as transcriptomics and proteomics) with GWAS will improve the ability to dissect traits, deepen the understanding of the complex mechanisms of trait formation, and accelerate crop trait improvement. This study also emphasizes the importance of protecting and rationally utilizing crop genetic resources, hoping that GWAS will exert greater potential in crop genetic research and improvement in the future, with a view to contributing to the sustainable development of agriculture.

Keywords

Genome-wide association studies (GWAS); Crop improvement; Genetic diversity; Gene editing; Multi-omics integration

Genome-wide association studies (GWAS) are powerful genetic tools that allow scientists to identify genetic markers across the entire genome that are associated with specific traits. The approach rests on a basic assumption: the frequencies of specific genetic variants (alleles) differ between groups of individuals that differ in trait expression. By comparing genomic data from hundreds to thousands of individuals, GWAS can reveal which genetic variants are associated with disease, physiological traits, or agricultural traits such as yield and disease resistance. The importance of this approach lies in its ability to reveal the genetic basis of complex traits, that is, traits influenced by multiple genes as well as environmental factors.

Crop genetic diversity refers to the genetic variation within crop populations, including genetic differences between species, varieties and cultivars. This diversity is the result of biological evolution and the basis of agricultural production. Genetic diversity enables crops to adapt to environmental changes, resist pests and diseases, and improve the stability and sustainability of agricultural systems (Abdelraheem et al., 2021). In crop improvement, genetic diversity can be used to develop new varieties with high yield, high quality, stress resistance and other characteristics to meet the growing demand for food and cope with the challenges posed by climate change.

The application of GWAS in crop genetic diversity research is motivated by the urgent need for crop trait improvement. With a growing population and limited resources, increasing crop yields, improving crop quality, and enhancing crop resistance to adversity have become global challenges. Although traditional breeding methods have achieved great success over the past few centuries, the development of genetics and molecular biology has led researchers to seek more precise and efficient methods to explore the genetic potential of crops. GWAS provides a solution that not only rapidly identifies genes associated with key traits across a broad range of genetic backgrounds, but also helps reveal the genetic mechanisms underlying these traits. This is of great significance for guiding molecular breeding and achieving precise improvement of crop traits (Peng et al., 2022).

Specifically, the application of GWAS allows researchers to discover novel, advantageous genetic variants in a broad range of crop populations that may have been overlooked in traditional breeding. For example, in major food crops such as rice and wheat, the application of GWAS has successfully identified multiple key genes or gene regions related to yield, disease resistance, and stress tolerance. These findings not only enrich our understanding of crop genetic diversity, but also provide new strategies for molecular-assisted selection and genetic improvement of crops. In addition, GWAS can also help to discover valuable genetic resources in wild species and local varieties, which are crucial for enhancing crop adaptability and sustainable production.

Although GWAS has shown great potential in revealing crop genetic diversity, it also faces many challenges during its application, including data complexity, selection of analysis methods, interpretation of results, and effective use of genetic information. Therefore, future research needs to innovate and improve at multiple levels to fully leverage the role of GWAS in crop genetic diversity research and contribute to global agricultural production and crop improvement.

1 Overview of GWAS Technology

1.1 Basic principles and methods of GWAS

Genome-wide association studies (GWAS) are a method used to find genes or genomic regions that are associated with specific traits. The basic principle rests on a hypothesis: if a genetic variant (usually a single nucleotide polymorphism, SNP) is closely related to a specific trait, then individuals carrying this variant should show a certain degree of similarity in that trait. GWAS identifies genetic variation associated with a trait by comparing the frequency of genetic markers in individuals or populations with different trait expressions.
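The idea of comparing marker frequencies between groups with different trait expression can be illustrated with a small sketch. It assumes a biallelic SNP coded as alternate-allele counts (0, 1, 2) in diploid individuals and two phenotype groups; the function names and the toy data are hypothetical, and the 2x2 allelic chi-square test is only one simple way to make the comparison.

```python
import math


def allele_counts(genotypes):
    """Return (alt, ref) allele counts for diploid genotypes coded 0/1/2."""
    alt = sum(genotypes)
    ref = 2 * len(genotypes) - alt
    return alt, ref


def allele_frequency_test(group_a, group_b):
    """Allelic 2x2 chi-square test (1 degree of freedom) between two groups."""
    a, b = allele_counts(group_a)
    c, d = allele_counts(group_b)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("SNP is monomorphic or a group is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 d.f., P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (invented for illustration): 20 high-trait and 20 low-trait lines.
high = [2] * 10 + [1] * 10   # alternate allele is frequent
low = [0] * 10 + [1] * 10    # alternate allele is infrequent
chi2, p_value = allele_frequency_test(high, low)
```

A real analysis would not stop at such a raw comparison, because it ignores population structure and relatedness among the sampled lines.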

The GWAS method usually includes several key steps. The first is to collect a sufficiently large sample population with clear phenotypic differences in the traits of interest. A genome-wide scan is then conducted on these samples to record information on thousands of genetic markers (mainly SNPs). Finally, statistical methods are used to analyze the association between these genetic markers and traits to identify markers that are significantly associated with traits (Hasan et al., 2021).

During the analysis process, the statistical model used in GWAS can control the potential confounding effects of population structure and genetic background, thereby improving the accuracy of the association signal. This step is critical because population structure (i.e., differences in genetic background) can lead to false-positive results. Once SNPs that are significantly associated with a trait are identified, researchers can further explore the genes near those SNPs to identify specific genes or genomic regions that may have an impact on trait expression.

A major advantage of GWAS is that it does not rely on prior genetic knowledge and enables unbiased exploration across the entire genome. This means that GWAS can reveal previously unrecognized genes and genetic mechanisms that influence complex traits. However, the identified genetic markers usually require further experimental and functional studies to verify their actual impact on traits, including genetic engineering, gene editing, and phenotypic characterization (Peng et al., 2022).

By integrating genetic and phenotypic data from large samples, GWAS provides a powerful method for understanding the genetic basis of complex traits. Despite challenges such as large data volumes, complex analyses, and difficult interpretation of results, GWAS has delivered notable results in multiple fields, especially in human disease genetics, agriculture, and plant breeding.

1.2 GWAS data types and acquisition methods

The data required to conduct a genome-wide association study (GWAS) fall into two main categories: genetic data and phenotypic data. Genetic data describe an individual's genomic information, usually in the form of single nucleotide polymorphisms (SNPs), while phenotypic data describe an individual's performance on specific traits, such as height, yield, or disease resistance. Accurate collection and high-quality processing of these two types of data are critical to the success of GWAS.

Genetic data are usually acquired through high-throughput genotyping technology. The process involves extracting DNA from each study subject and then analyzing it using gene chips (SNP arrays) or next-generation sequencing (NGS) technology. Gene chips are a cost-effective method that can detect a large number of known SNP sites simultaneously, up to millions on high-density arrays (Abdelraheem et al., 2021). Next-generation sequencing allows researchers not only to detect known SNP sites but also to discover new genetic variants, although this method is more expensive. The acquired genetic data are then processed and quality controlled through bioinformatics methods to ensure data accuracy and usability.
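As an illustration of the quality-control step, the following minimal sketch filters SNPs by call rate and minor allele frequency (MAF). The thresholds (MAF of at least 0.05, call rate of at least 0.9), the function names, and the toy genotype table are assumptions chosen for the example and are not taken from the studies cited here; appropriate cut-offs depend on the panel and platform.

```python
def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype (None = missing)."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded 0/1/2, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def filter_snps(snp_table, min_maf=0.05, min_call_rate=0.9):
    """Keep SNPs that pass both (assumed) quality thresholds."""
    kept = {}
    for snp_id, genotypes in snp_table.items():
        passes_call_rate = call_rate(genotypes) >= min_call_rate
        passes_maf = minor_allele_frequency(genotypes) >= min_maf
        if passes_call_rate and passes_maf:
            kept[snp_id] = genotypes
    return kept


# Hypothetical genotype table: SNP id -> genotype of each of 10 individuals.
snp_table = {
    "snp_common": [0, 1, 2, 1, 0, 2, 1, 1, 0, 2],
    "snp_monomorphic": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "snp_missing": [0, None, None, 1, 2, None, 1, 0, 2, 1],
}
kept = filter_snps(snp_table)
```

In this toy table only the first SNP survives: the second carries no variation and the third has too many missing calls.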

The collection of phenotypic data involves the precise measurement and recording of individual traits. This process requires the use of standardized methods to evaluate and record the performance of each individual on the studied traits while controlling environmental variables. For agricultural crops, phenotypic data can include traits such as yield, maturity, and disease resistance; while in human genetics research, it may include disease status, biochemical indicators, or other health indicators. The quality of phenotypic data directly affects the accuracy of GWAS analysis, so data reliability must be ensured through precise measurements and sufficient replication.

During the data collection process, the representativeness and diversity of the sample also need to be taken into consideration. Selecting samples with sufficient numbers and genetic background diversity can help enhance the discovery power of GWAS, which is especially important when looking for rare variants or genes with small effects. In addition, collecting detailed environmental and lifestyle data may also be critical for some studies, as these factors may interact with genetic factors to influence trait performance.

The collection and processing of genetic and phenotypic data required to conduct GWAS is a complex but critical process. High-quality data acquisition methods, including advanced sequencing technology, precise phenotypic measurements, and meticulous data processing and analysis, are the foundation for ensuring the success of GWAS and realizing its application potential in genetic research.

1.3 Statistical methods and computational tools for GWAS analysis

In genome-wide association studies (GWAS), a range of statistical methods and computational tools are used to analyze the association between genetic and phenotypic data, aiming to identify genetic variants associated with specific traits. These statistical methods mainly include association analysis, correction for population structure and kinship, and multivariate analysis.

Association analysis is one of the core statistical methods in GWAS. It identifies potential genetic factors by testing the association between genetic markers (such as SNPs) and specific traits. The most commonly used method is single-locus association analysis, in which each SNP is tested individually for statistical association with trait performance. This is usually done through regression models: linear regression is used for continuous traits, and logistic regression is used for categorical traits (such as disease states) (Peng et al., 2022).
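A single-locus test for a continuous trait can be sketched as a simple linear regression of the phenotype on the genotype. The sketch below assumes an additive coding (0, 1, 2 copies of the alternate allele); the function name and data are hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples and is used here to keep the example dependency-free.

```python
import math


def single_snp_test(genotypes, phenotypes):
    """Regress phenotype on genotype; return (effect, t statistic, approx. p)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic; no association can be tested")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                      # estimated additive effect per allele
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2
              for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    if se == 0:
        t_stat = math.copysign(math.inf, beta) if beta != 0 else 0.0
    else:
        t_stat = beta / se
    # Two-sided p-value under a normal approximation (assumption, see lead-in).
    p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
    return beta, t_stat, p_approx


# Toy data (invented): the trait rises by about one unit per allele copy.
trait = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
linked_snp = [0, 0, 1, 1, 2, 2]
unlinked_snp = [0, 1, 2, 2, 1, 0]
beta_linked, t_linked, p_linked = single_snp_test(linked_snp, trait)
beta_unlinked, t_unlinked, p_unlinked = single_snp_test(unlinked_snp, trait)
```

In a genome-wide scan this test is repeated for every marker, which is why the multiple-testing and population-structure issues discussed elsewhere in this review arise.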

Because population structure and relatedness may lead to false positive associations, GWAS analysis also includes methods to correct for these potential confounding factors. Population structure refers to the genetic background differences present in a sample set, while kinship refers to the genetic relatedness between samples. These factors, if not controlled, may result in erroneous associations of genetic markers with traits. The effects of population structure can be identified and corrected by using methods such as principal component analysis (PCA), while mixed linear models (MLM) can improve the accuracy of GWAS by taking into account both population structure and kinship. Multivariate analysis allows multiple traits or multiple genetic markers to be considered simultaneously to explore interactions and joint effects between them. This approach can help reveal the genetic basis of complex traits, especially when traits are biologically interconnected.

To handle the complex data and statistical analysis of GWAS, a variety of computational tools and software packages have been developed. PLINK is one of the most widely used GWAS data analysis tools. It provides a series of functions, including data management, basic statistical analysis, association analysis, and control of population structure. GCTA (Genome-wide Complex Trait Analysis) is another popular tool used to estimate the contribution of genetic variation to trait variance and to perform population structure correction. In addition, software such as ADMIXTURE, EIGENSTRAT and STRUCTURE can be used to analyze population structure, and tools such as FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) are used for mixed linear model analysis (Ceballos et al., 2015).

The choice of statistical methods and computational tools for GWAS depends on the specific needs of the study, including the type of trait, the genetic background of the sample, and the goals of the study. Correct application of these methods and tools can effectively identify genetic variants associated with traits, providing strong support for understanding the genetic basis of traits.

2 The Role of GWAS in the Study of Crop Genetic Diversity

2.1 GWAS reveals genetic basis of crop traits

Genome-wide association studies (GWAS) play an important role in the study of crop genetic diversity, providing an efficient and powerful method to reveal the genetic basis of crop traits. Through GWAS, scientists can identify genetic variations across the entire crop genome that are related to important agronomic traits, such as yield, stress resistance (including drought and salt-alkali resistance), disease resistance, and quality characteristics. This process not only deepens our understanding of crop genetic diversity, but also provides molecular tools for crop improvement, promoting the development of precision breeding technology.

The application of GWAS allows scientists to discover new and beneficial genetic variations in a wide range of crop populations, including traditional varieties, landraces and wild relatives. These genetic resources are valuable assets for crop improvement, and they can be used to develop new varieties that are adapted to different environmental conditions and have high yield and quality traits. For example, in rice and wheat, multiple key genes or gene regions related to yield and disease resistance have been successfully identified through GWAS. These findings not only enhance crop genetic diversity but also improve crop productivity and sustainability (Ceballos et al., 2015).

In addition, GWAS also provides a new perspective on understanding the genetic mechanisms of crop traits. By analyzing the association between traits and genetic variation, researchers can uncover the gene networks and regulatory pathways that control complex traits, thereby gaining a deeper understanding of the genetic and molecular basis of traits. This is particularly important for the study of crop adversity stress responses because it involves the interaction of multiple genes and environmental factors (Abdelraheem et al., 2021). Through GWAS, researchers can identify key genetic factors that affect crop phenotypes under specific environmental conditions, providing guidance for environmental adaptability and stress-resistant breeding of crops.

Although GWAS has shown great potential in studying crop genetic diversity, its application also faces challenges, including the need for a large number of samples to enhance statistical power, the complexity introduced by population structure and diverse genetic backgrounds, and the need to identify, among massive amounts of genetic variation, the factors that actually influence traits. However, with the advancement of high-throughput sequencing technology, improvements in data analysis methods, and the development of bioinformatics tools, the application of GWAS in the study of crop genetic diversity will become more extensive and in-depth. In the future, GWAS is expected to further promote crop molecular breeding and to support precise improvement of crop traits and sustainable development of agricultural production.

2.2 Application of GWAS in the study of crop genetic diversity

GWAS have made significant progress in the study of genetic diversity in many crops, especially in revealing genetic loci associated with important agronomic traits. Several successful case studies are introduced below, demonstrating the application results of GWAS in crop genetic diversity research.

In maize (Zea mays L.), GWAS has achieved a major breakthrough and successfully identified many genetic loci and potential genes related to complex traits. These traits include responses to abiotic and biotic stresses, and their discovery holds promise for enhancing fitness and yield through effective breeding strategies. In addition, GWAS-based research has explored how multi-omics methods, including genomics, transcriptomics, proteomics, metabolomics, epigenomics and phenomics, can deepen the understanding of complex traits in maize, thereby improving environmental stress tolerance and promoting maize production (Bhat et al., 2021).

Haplotype-based models are an important method for GWAS that accurately capture allelic diversity by integrating high-density marker data, improving the ability to discover epistatic interactions and minimizing the need for multiple testing. This method has been developed and applied in major crops such as wheat, rice and soybeans. Compared with traditional single-site models, haplotype-based models are more efficient and reliable in identifying haplotypes associated with selected traits.

In China, for example, the National Medium-Term Gene Bank at the Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences (OCRI-CAAS) preserves more than 8,000 sesame germplasm accessions. Similarly, the Beijing National Long-term Gene Bank preserves approximately 4,500 sesame accessions (Figure 1). Based on these large collections, a strategy to build a core collection of sesame began in the early 2000s using morphological descriptors and later molecular tools. Ultimately, OCRI established a sesame core germplasm bank containing 705 different accessions, including 405 local varieties, 95 varieties from China, and 205 accessions from 28 other countries. The entire panel was sequenced on the Illumina HiSeq 2000 platform (http://www.ncgr.ac.cn/SesameHapMap), and a total of 5,407,981 SNPs were detected in the genome, with an average of 2 SNPs every 50 bp (Figure 1) (Muez et al., 2021). To explore the genetic basis of economically important agronomic traits and identify possible causative genes, these GWAS panels need to be updated with more materials reflecting different agroecological contexts around the world.

Figure 1 Process of key steps in Sesame GWAS implementation (Muez et al., 2021)

In another example, Nouraei et al. (2024) used the 90K SNP array to conduct a genome-wide association analysis and revealed the genetic determinants of key traits related to wheat drought tolerance, namely plant height, root length, and root and shoot dry weight. Using a mixed linear model (MLM) approach to analyze 125 wheat accessions grown under well-watered and drought-stress conditions, the study identified 53 SNPs that were significantly associated with the stress sensitivity index (SSI) and stress tolerance index (STI) of the target traits. Notably, chromosomes 2A and 3B carried 10 and 9 relevant markers, respectively. On 17 chromosomes, 44 unique candidate genes were identified, mainly located in the distal ends of chromosomes 1A, 1B, 1D, 2A, 3A, 3B, 4A, 6A, 6B, 7A, 7B and 7D. These genes are involved in multiple functions related to plant growth, development, and stress response, providing a rich resource for future research. Clustering patterns emerged: in particular, 7 genes related to plant height SSI and 4 genes related to plant height STI and shoot dry weight clustered in specific regions of the 2AS and 3BL chromosome arms. Furthermore, shared genes encoding polygalacturonase, auxin-related protein peptide deacylases, and receptor-like kinases highlighted the interconnectedness between plant height and shoot dry weight (Figure 2).

Figure 2 Genetic diversity of 125 wheat varieties (Nouraei et al., 2024)

Note: A: Population structure estimated by STRUCTURE, with the best subpopulation (K=6), each color represents a subpopulation; B: Phylogenetic tree, each branch represents a germplasm, and the length of the branch represents the genetic distance; C: Correlation (kinship) heat map, the blue in the middle represents the degree of correlation; D: Three-dimensional principal component analysis (PCA) diagram, illustrating the germplasm distribution based on the first three principal components (PC).
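The SSI and STI used as GWAS phenotypes in the wheat example can be made concrete with a short sketch. It assumes the commonly used definitions, SSI = (1 - Ys/Yp) / (1 - mean(Ys)/mean(Yp)) and STI = (Yp x Ys) / mean(Yp)^2, where Yp and Ys are the trait values of an accession under well-watered and stress conditions. Whether Nouraei et al. (2024) used exactly these formulas is an assumption here, and the function name and numbers are invented for illustration.

```python
def stress_indices(trait_control, trait_stress):
    """Return (SSI list, STI list) for accessions measured in two treatments.

    Assumes the commonly used index definitions described in the text;
    the cited study may have used a different variant.
    """
    n = len(trait_control)
    mean_control = sum(trait_control) / n
    mean_stress = sum(trait_stress) / n
    # Overall stress intensity across the whole panel.
    stress_intensity = 1 - mean_stress / mean_control
    if stress_intensity == 0:
        raise ValueError("no average stress effect; SSI is undefined")
    ssi = [(1 - ys / yp) / stress_intensity
           for yp, ys in zip(trait_control, trait_stress)]
    sti = [(yp * ys) / mean_control ** 2
           for yp, ys in zip(trait_control, trait_stress)]
    return ssi, sti


# Invented example: two accessions, trait measured with and without stress.
control = [10.0, 8.0]
stressed = [5.0, 6.0]
ssi, sti = stress_indices(control, stressed)
```

Under these definitions an SSI above 1 marks an accession that loses more than the panel average under stress, while a higher STI marks an accession that performs well in both treatments.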

2.3 GWAS are useful for the identification and utilization of genetic variation in wild relatives of crops

Genome-wide association studies (GWAS) are a key technology for exploring and utilizing beneficial genetic variations in crop wild relatives (CWRs), and they provide a powerful platform for modern breeding. CWRs possess rich genetic diversity and unique adaptive characteristics, which are crucial for improving crop stress resistance, yield and nutritional quality. With the development of high-throughput sequencing technology and the widespread availability of reference genomes, researchers are now able to mine the genetic diversity hidden in CWRs and apply it to crop improvement.

In a study on tomato (Solanum spp.), researchers conducted a super-pangenome analysis of wild and cultivated tomatoes to reveal the genetic diversity and structural variation among them. The study found that the abundance and recent amplification of transposable elements (TEs) in wild tomato species may have enhanced their genomic diversity and environmental adaptability compared with cultivated tomatoes. These findings provide insights into the genome evolution of Solanaceae plants and open the possibility of utilizing genetic resources from wild tomato species (Sahito et al., 2024) (Figure 3).

Figure 3 SV-based GWAS identifies additional correlation signals of tomato fruit flavor (Sahito et al., 2024)

Note: a: Modeling of pan-genome and core genome sizes when incorporating additional genomes into the clustering and composition of the tomato super-pangenome; b: Number of different types of structural variations within each genome compared with the S. galapagense reference genome; c: Distribution of structural variations in 12 tomato genomes across the 12 chromosomes; d: Wild-specific genome fragment on chromosome 1. The 8 kb sequence is present in the genomes of all 9 wild tomatoes but absent from domesticated tomato; the 8 kb wild-specific region containing both genes is boxed in red; e: Dot plot showing alignment of chromosome 3 between 12 tomato genomes and the S. galapagense genome

Further research highlights the use of wild relatives in modern breeding, particularly in meeting the challenges posed by future food demands and climate change. By exploring rare and valuable wild species conserved in gene banks, botanical gardens, national parks, conservation hotspots, and inventories, researchers can discover CWRs carrying agronomically important traits (Soodeh et al., 2022). In addition, by adopting modern experimental breeding methods (such as de novo domestication, genome editing and speed breeding) and computational methods (such as machine learning), the utilization of CWRs in breeding programs can be accelerated to improve crop adaptability and yield.

These research examples demonstrate that beneficial genetic variants can be successfully identified and exploited from CWRs using GWAS and other genomic tools, which is of great significance for enhancing crop stress resistance, increasing yield, and improving nutritional quality. Through these methods, researchers can expand the sources of genetic variation in breeding programs and select those traits and genes that contribute to crop improvement, thereby increasing the sustainability of agriculture and global food security.

3 Challenges and Limitations of GWAS

3.1 Challenges of GWAS in crop genetic research

GWAS have become a powerful tool in genetics research, especially in uncovering associations between genetic variations and complex traits. Although GWAS has made significant progress in multiple fields, it still faces a series of challenges and limitations.

Population structure refers to the natural genetic clustering that exists in a sample, which can lead to false positive association results. Differences in genetic background may also mask true gene-trait associations, especially when comparing across populations. To overcome this challenge, researchers need to use statistical methods that correct for the effects of population structure, such as principal component analysis or mixed linear models, but this increases the complexity and computational burden of the analysis.

GWAS is mainly used to identify the association between frequently occurring genetic variants and traits, but its ability to detect rare variants is limited. Rare variants may have important effects on traits, but their low frequency in the population makes them difficult to detect through GWAS. This limits the ability of GWAS to reveal the full picture of genetic diversity.

In GWAS, thousands of genetic markers are tested simultaneously for association with a specific trait, which requires correction for multiple comparisons to avoid false-positive results. Although correction for multiple testing (such as Bonferroni correction) can reduce the false positive rate, it also increases the risk of missing true associations (false negative results). Balancing this trade-off is an important consideration in GWAS design (Bardak et al., 2021).
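The trade-off described above can be seen in a small sketch that applies the Bonferroni correction (per-test threshold of alpha divided by the number of tests) alongside the Benjamini-Hochberg false discovery rate procedure. The latter is not discussed in this review and is included only as a commonly used, less conservative point of comparison; the function names and p-values are invented.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests whose p-value passes the Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_significant(p_values, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Invented p-values for five markers.
p_values = [0.0001, 0.004, 0.019, 0.03, 0.2]
bonferroni_hits = bonferroni_significant(p_values)
bh_hits = benjamini_hochberg_significant(p_values)
```

With these five values Bonferroni keeps two markers and the Benjamini-Hochberg procedure keeps four, illustrating how a stricter correction lowers false positives at the cost of missed associations.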

Even if GWAS successfully identifies genetic markers associated with a trait, translating these statistical associations into biological meaning remains a challenge. Identification of candidate genes near associated markers and their functions requires further experimental validation. In addition, the expression of traits is often controlled by multiple genes and affected by environmental factors, which makes interpreting GWAS results more complex.

With the advancement of sequencing technology, the amount of genetic data generated has increased dramatically, which has placed higher demands on data storage, processing, and analysis. Processing large-scale data sets requires expensive computing resources and specialized data analysis skills, which may limit the capabilities of some research institutions or individual researchers.

Despite these challenges and limitations, GWAS remain an indispensable tool in modern genetics and genomics research. Through continuous technological innovation and methodological improvements, as well as interdisciplinary collaboration, it is expected to overcome these obstacles in the future and more effectively utilize the potential of GWAS in genetic research.

3.2 Detection of rare variants and small effect variants in GWAS

Detecting rare variants and variants of small effect presents specific challenges in genome-wide association studies (GWAS). These challenges arise primarily from the low minor allele frequencies (MAFs) of these variants and the subtlety of their effects on traits. Because rare variants occur at low frequencies in the population, traditional single-variant analysis methods are often underpowered at typical next-generation sequencing (NGS) sample sizes. Additionally, as sample size increases, the multiple testing burden of single-variant analysis increases because more unique rare variant sites are detected. Therefore, obtaining adequate power for single-variant analysis of rare variants often requires extremely large sample sizes, which are often not practical or economically feasible.
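A quick calculation shows why low MAF limits power. The sketch below assumes alleles are sampled independently from a diploid population (so n individuals carry 2n alleles), which is a simplification: in highly inbred crop panels each line effectively contributes one allele. The panel size of 125 and the MAF of 0.1% echo figures mentioned in this review, but the calculation itself is illustrative only.

```python
def expected_minor_copies(n_individuals, maf):
    """Expected number of minor-allele copies among 2n sampled alleles."""
    return 2 * n_individuals * maf


def prob_at_least_one_copy(n_individuals, maf):
    """Probability that the minor allele is observed at least once."""
    return 1 - (1 - maf) ** (2 * n_individuals)


# Illustrative values: a panel of 125 accessions and a variant with MAF 0.1%.
expected = expected_minor_copies(125, 0.001)
prob_seen = prob_at_least_one_copy(125, 0.001)
```

Under these assumptions the variant is expected to appear in a quarter of one copy on average, so in most panels of this size it would not be observed at all, let alone tested with adequate power.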

Analysis of rare variants typically uses "aggregate" testing, whereby identified variants are tested collectively on the basis of their overlap with predefined genomic regions. This approach requires a clear definition of the set of variants to be analyzed, typically by defining genomic regions, such as genes, into which overlapping rare variants are grouped, and is particularly suitable for large-scale hypothesis-free scans (e.g., whole-exome sequencing or whole-genome sequencing) (Chen et al., 2022).
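One simple form of aggregate testing is a burden-style score: the rare minor alleles that fall within a region are summed per individual, and the resulting score is then related to the phenotype. The sketch below assumes genotypes are already coded as minor-allele counts and uses a plain Pearson correlation as the association measure; the MAF threshold, function names and toy data are hypothetical, and real burden tests add weighting and covariate adjustment.

```python
import math


def burden_scores(genotype_matrix, maf_threshold=0.01):
    """Sum minor-allele counts over rare variants in one region.

    genotype_matrix[i][j] is the minor-allele count (0/1/2) of individual i
    at variant j; variants with 0 < MAF <= maf_threshold are aggregated.
    """
    n_ind = len(genotype_matrix)
    n_var = len(genotype_matrix[0])
    rare_columns = []
    for j in range(n_var):
        maf = sum(row[j] for row in genotype_matrix) / (2 * n_ind)
        if 0 < maf <= maf_threshold:
            rare_columns.append(j)
    return [sum(row[j] for j in rare_columns) for row in genotype_matrix]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance; correlation is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Invented region with one common variant and two rare variants.
region = [
    [1, 1, 0], [2, 0, 1], [0, 0, 0], [1, 0, 0], [2, 0, 0],
    [0, 0, 0], [1, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0],
]
phenotype = [9.0, 8.5, 5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3, 4.7]
# A loose threshold is used because the toy panel has only 10 individuals.
scores = burden_scores(region, maf_threshold=0.1)
r = pearson_r(scores, phenotype)
```

Neither rare variant would be testable on its own with a single carrier each, but pooled into one score they show a clear relationship with the trait in this toy example.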

Although many aggregate rare-variant analysis methods have been developed, they mainly fall into two broad categories: burden tests and variance-component (or "kernel") tests. For example, the set-based Sequence Kernel Association Test (SKAT) and its variants are widely used in aggregate rare-variant analysis. To address the challenge of rare variants, researchers are also working to combine publicly available whole-genome sequencing (WGS) datasets to create a single reference panel that increases the representation of low-frequency and rare haplotypes. For example, the Haplotype Reference Consortium has combined low-read-depth WGS data from 20 studies of mainly European ancestry, which improves the accuracy of genotype imputation, especially for low-frequency variants down to 0.1% MAF, and supports imputation on existing servers (Bomba et al., 2017).

In addition to imputation, custom genotyping arrays are another strategy used to investigate low-frequency and rare variants in association studies. These arrays are often designed for specific diseases and aim to enrich standard haplotype-tagging SNP panels with variants identified through sequencing and fine-mapping efforts.

3.3 Complexity of interpreting GWAS results

The complexity of interpreting genome-wide association study (GWAS) results involves multiple levels, the most critical of which is the confirmation and functional validation of candidate genes. Although GWAS can reveal genetic variations associated with specific traits, translating statistical associations into biological mechanisms remains a major challenge. This is mainly because most of the variants discovered by GWAS are located outside coding regions, and the mechanism by which such non-coding variants affect traits is not as intuitive as that of coding variants.

Confirming candidate genes involves complex bioinformatics analysis. Statistical fine mapping, transcriptome-wide association studies (TWAS) and other methods are needed to single out the candidate genes that are truly associated with traits from the long list of genetic markers identified by GWAS. However, even after candidate genes are identified, functional validation remains challenging. Functional validation in the laboratory requires laborious molecular biology experiments, such as gene expression analysis, transcription factor binding assays, reporter gene assays, in vivo models, genome editing, and chromatin interaction studies, to confirm how these candidate genes or variants affect trait performance at the molecular level (Alsheikh et al., 2022).

A recent systematic review examined the state of experimental validation of non-coding GWAS variants. After screening and evaluating 1,454 articles, it confirmed that 309 non-coding GWAS variants had been experimentally validated. These variants regulate 252 genes and involve 130 human disease traits, acting mainly through cis-regulatory elements (70%), promoters (22%), and non-coding RNAs (8%). The review highlights the complexity of experimentally validating GWAS findings and the multifaceted approach required when prioritizing variants and nominating target genes (Alsheikh et al., 2022).

These findings highlight that the translation from GWAS results to functional understanding is a multistep process that requires the integration of multiple bioinformatics and experimental methods. Despite some progress, validating thousands of GWAS associations remains a huge challenge for the field. In addition, with the development of functional genomics and gene editing technology, we have reason to expect that GWAS results will be more effectively parsed in the future and more knowledge about the genetic basis of complex traits will be revealed.

4 Future Development Directions of GWAS

4.1 The potential of integrating multi-omics data with GWAS

Integrating multi-omics data (such as transcriptomics and proteomics) with GWAS is opening up new capabilities for crop trait analysis. The potential of this integrative approach lies in its ability to provide a more comprehensive view of how crop traits are co-regulated by genes, transcripts, proteins and metabolites. For example, by combining GWAS and transcriptome data, researchers can more accurately identify key genes associated with complex traits in crops, such as the DROUGHT1 (DROT1) gene in rice, which is associated with drought resistance, and the MADS26 gene in corn, which affects seed germination (Ahmad et al., 2017).

Metabolome-based GWAS (mGWAS), which combines genomic, transcriptomic and metabolomic data, has been widely used in crops including rice, corn, wheat, barley and tomato, revealing metabolic pathways and genetic variations associated with complex traits. This provides a new perspective for metabolomics-related breeding. For example, by combining genomic, transcriptomic, and metabolomic data, researchers revealed how tomato fruit metabolite content changed during breeding (Zhang et al., 2022) and how selection for genes associated with larger fruit altered the metabolic profile.

Additionally, studies combining genomic, transcriptomic, and microbiome data have begun to demonstrate how the microbiome plays a role in crop growth and response to environmental stresses, including drought and disease resistance. For example, by combining transcriptome analysis and microbiome data, researchers identified soil microbes that influence nitrogen metabolism in ultra-high-yielding rice.

These studies demonstrate the great potential of integrating multi-omics data with GWAS to improve the ability to analyze crop traits. By gaining a more comprehensive perspective from different biomolecular levels, researchers can better understand the genetic and molecular basis of crop traits, thereby providing new strategies and targets for crop improvement and breeding.

4.2 Precision breeding strategy based on GWAS

Precision breeding strategies based on genome-wide association studies (GWAS) and marker-assisted selection (MAS) are changing the field of crop improvement, providing a powerful platform for identifying genetic markers associated with important traits and accelerating the selection and breeding of crop varieties.

In cotton improvement, the application of genetic diversity analysis, quantitative trait locus (QTL) mapping and marker-assisted selection demonstrates the potential of this approach. By analyzing the genomes of cotton species, researchers can generate large numbers of high-throughput DNA markers and identify QTLs associated with valuable traits, which provides a basis for MAS-based breeding projects. DNA markers identified through QTL mapping and GWAS have greatly accelerated the breeding process, shifting selection from the phenotype to the DNA or gene level. This shift not only increases the efficiency and precision of crop improvement programs, but also reduces the cost and time of developing new varieties and hybrids (Fakhriddin et al., 2021).
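The move from phenotypic selection to selection at the DNA level can be illustrated with a minimal Python sketch. The marker names, favourable alleles and line names are hypothetical, and ranking by a simple count of favourable alleles is an assumption made for illustration; real MAS programs typically weight markers by estimated effect.

```python
def favourable_allele_count(genotype, favourable):
    """Count favourable alleles carried by one line.

    genotype:   dict mapping marker name -> diploid call such as "AG"
    favourable: dict mapping marker name -> favourable allele such as "A"
    Markers missing from the genotype contribute zero.
    """
    return sum(genotype.get(marker, "").count(allele)
               for marker, allele in favourable.items())


def rank_candidates(genotypes, favourable, top_n):
    """Return the names of the top_n lines, ordered by favourable-allele
    count (descending) and then by name for reproducible tie-breaking."""
    scored = [(favourable_allele_count(calls, favourable), name)
              for name, calls in genotypes.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_n]]


# Hypothetical markers linked to a trait QTL, with their favourable alleles.
favourable_alleles = {"M1": "A", "M2": "T"}

# Hypothetical seedling genotypes scored before the trait can be phenotyped.
seedlings = {
    "line1": {"M1": "AA", "M2": "TT"},
    "line2": {"M1": "AG", "M2": "CT"},
    "line3": {"M1": "GG", "M2": "CC"},
}

print(rank_candidates(seedlings, favourable_alleles, top_n=2))
# ['line1', 'line2']
```

Because the ranking uses only genotype calls, it can be applied at the seedling stage, before the trait itself is expressed.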

Molecular markers, including polymerase chain reaction (PCR)-based markers and hybridization-based DNA markers, have been used successfully in breeding and genetic studies of multiple crops. Their use in crop breeding programs not only increases the productivity and accuracy of traditional breeding methods, but also makes selection possible at any stage of plant growth and development. In particular, DNA marker technology enables breeders to select for complex quantitative traits with unprecedented speed and precision (Hasan et al., 2021).

With the development of next-generation sequencing (NGS) technology, high-throughput and rapid data generation for the genome, transcriptome, proteome, and metabolome has become feasible. Integrating multiple omics approaches can elucidate gene functions and networks under physiological and environmental stress conditions, which is critical for improving crop yield and enhancing biotic and abiotic stress tolerance. Integrated multi-omics approaches with robust technologies have been used to identify and decode key components of stress response, senescence, and yield in different economically important crops (Yang et al., 2021).

Precision breeding strategies and marker-assisted selection based on GWAS have broad application prospects in crop improvement. These methods can not only accelerate the breeding process and improve the efficiency of genetic improvement of crops, but also provide a solid scientific basis for the continuous improvement of crop stress resistance, yield and quality through in-depth understanding of the genetic and molecular basis of crops. As technology advances, we expect these methods to play a greater role in improving crops and ensuring food security around the world.

4.3 Combining new technologies with GWAS to accelerate the improvement of crop traits

Combining CRISPR/Cas9 gene editing with GWAS results provides a fast and accurate path for improving crop traits. Loci associated with important agronomic traits that are identified through GWAS can be edited directly with CRISPR/Cas9, thus accelerating crop improvement. This integrated approach can not only improve crop yield, stress resistance and quality, but also shorten the breeding cycle.

The introduction of CRISPR/Cas9 technology makes editing specific genes simple and precise. By designing specific sgRNA, the CRISPR/Cas9 system can accurately identify and cut the target site on the genome, thereby achieving the knockout, knock-in or modification of specific genes. Compared with traditional breeding techniques, this method can directly improve crop traits at the molecular level, greatly shortening the breeding cycle and reducing unnecessary genetic background changes.
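The sgRNA design step can be illustrated with a minimal Python sketch that lists candidate target sites in a DNA sequence. It assumes the commonly used SpCas9 enzyme, which requires an NGG protospacer-adjacent motif (PAM) immediately 3' of a 20-nt spacer. The function names are illustrative, and the sketch does not score off-target risk, GC content or editing efficiency, which real design tools take into account.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sgrna_sites(seq, spacer_len=20):
    """List candidate SpCas9 target sites on both strands.

    A site is a spacer of spacer_len bases followed directly by an NGG PAM.
    The lookahead lets overlapping sites be reported. 'start' is the 0-based
    offset on the strand that was scanned (for '-' this is the offset on the
    reverse complement, not on the input sequence).
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    sites = []
    for strand, scanned in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(scanned):
            sites.append({
                "strand": strand,
                "start": match.start(),
                "spacer": match.group(1),
                "pam": match.group(2),
            })
    return sites


# Toy sequence (not a real gene): 20 A's followed by a TGG PAM.
toy_locus = "A" * 20 + "TGG"
for site in find_sgrna_sites(toy_locus):
    print(site["strand"], site["start"], site["spacer"], site["pam"])
```

In practice the input would be the sequence around a GWAS-identified locus, and the candidate list would then be filtered for specificity against the full genome.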

For example, research on applying CRISPR/Cas9 technology to improve crop quality has made progress in multiple areas, including the modulation of appearance, taste, nutritional content and other preferred traits. This method has been used to improve traits of nearly 20 crop varieties, including yield improvement, biotic and abiotic stress management, etc. (Abdelraheem et al., 2021). Many of the findings are considered proofs of concept, describing how the CRISPR/Cas9 system can be applied by knocking out specific important genes reported in GWAS.

CRISPR/Cas9 can also be used to edit cis-regulatory elements (CREs) in crops to improve many important agricultural traits. As CRISPR/Cas9 genome editing technology continues to improve, research that uses GWAS findings to increase the agronomic value of crops is expected to expand further.

Through these studies, CRISPR/Cas9 technology has demonstrated its great potential in accelerating crop trait improvement and realizing precision breeding strategies. In the future, with more research progress on the combined application of CRISPR/Cas9 and GWAS, it is expected that more breakthroughs will be achieved in crop improvement, improving crop yield, quality and stress resistance to meet the growing global food demand.

5 Conclusion

GWAS have made remarkable achievements in revealing crop genetic diversity and promoting crop improvement. Through GWAS, scientists can identify genetic markers and genes related to important agronomic traits on a genome-wide scale, which provides a powerful tool for understanding the genetic basis of crop traits and implementing precision breeding. Especially in the study of key traits such as crop stress resistance, yield improvement, and quality improvement, GWAS has demonstrated its unique value and potential.

However, GWAS also faces a series of challenges in practice, including the complexity of population structure, the difficulty of detecting rare variants and small-effect variants, the complexity of result interpretation, and the analysis and management of big data. To fully realize the potential of GWAS in the study of crop genetic diversity, we must continue to promote technological and methodological innovation, including improving the accuracy of statistical analysis methods, developing analysis platforms that can effectively integrate multi-omics data, and exploring more efficient approaches to candidate gene identification and functional validation.

In the future, GWAS has broad application prospects, especially when combined with emerging genome editing technologies such as CRISPR/Cas9, which can achieve more efficient improvement of crop traits. Key genes or genetic variations identified through GWAS can become direct targets of gene editing technologies such as CRISPR/Cas9, thereby accelerating the crop improvement process (Berhe et al., 2021). In addition, integrating multi-omics data such as transcriptomics and proteomics with GWAS results can not only improve the accuracy of trait analysis, but also provide a deeper understanding of the complex molecular mechanisms of trait formation.

While promoting the application of GWAS, we should also pay attention to its impact on the utilization of crop genetic resources. Genetic diversity is a valuable resource for crop improvement. The protection and rational utilization of crop genetic resources, especially those in wild relatives of crops, is of great significance for maintaining crop genetic diversity, promoting crop adaptation to environmental changes, and sustainable development.

GWAS has become an indispensable tool for crop genetic research and crop improvement, and its contribution to understanding crop genetic diversity and promoting crop trait improvement cannot be ignored. In the future, through technological innovation and interdisciplinary cooperation, GWAS is expected to solve the challenges faced, further unleash its potential in crop genetic research and crop improvement, and make greater contributions to the sustainable development of global agriculture.

References

Abdelraheem A., Thyssen G.N., Fang D.D., Jenkins J.N., McCarty J.C., and Wedegaertner T., 2021, GWAS reveals consistent QTL for drought and salt tolerance in a MAGIC population of 550 lines derived from intermating of 11 upland cotton (Gossypium hirsutum) parents, Mol. Gen. Genomics, 296: 119-129.

https://doi.org/10.1007/s00438-020-01733-2

Ahmad F., Akram A., Farman K., Abbas T., Bibi A., and Khalid S., 2017, Molecular markers and marker assisted plant breeding: current status and their applications in agricultural development, J. Environ. Agric. Sci., 11: 35-50.

Alsheikh A.J., Wollenhaupt S., King E.A., Reeb J., Ghosh S., Stolzenburg L.R., Tamim S., Lazar J., Davis J.W., and Jacob H.J., 2022, The landscape of GWAS validation; systematic review identifying 309 validated non-coding variants across 130 human diseases, BMC Med. Genomics, 15(1): 74.

https://doi.org/10.1186/s12920-022-01216-w

Bardak A., Çelik S., Erdoğan O., Ekinci R., and Dumlupinar Z., 2021, Association mapping of Verticillium wilt disease in a worldwide collection of cotton (Gossypium hirsutum L.), Plants, 10: 306.

https://doi.org/10.3390/plants10020306

Berhe M., Dossa K., and You J., 2021, Genome-wide association study and its applications in the non-model crop Sesamum indicum, BMC Plant Biology, 21(1): 283.

https://doi.org/10.1186/s12870-021-03046-x

Bhat J.A., Yu D., Bohra A., and Varshney R.K., 2021, Features and applications of haplotypes in crop breeding, Communications Biology, 4(1): 1266.

https://doi.org/10.1038/s42003-021-02782-y

Bomba L., Walter K., and Soranzo N., 2017, The impact of rare and low-frequency genetic variants in common disease, Genome Biol., 18: 77.

https://doi.org/10.1186/s13059-017-1212-4

Ceballos H., Kawuki R.S., Gracen V.E., Yencho G.C., and Hershey C.H., 2015, Conventional breeding, marker-assisted selection, genomic selection and inbreeding in clonally propagated crops: a case study for cassava, Theor. Appl. Genet., 128: 1647-1667.

https://doi.org/10.1007/s00122-015-2555-4

Chen W.N., Coombes B.J., and Larson N.B., 2022, Recent advances and challenges of rare variant association analysis in the biobank sequencing era, Front. Genet., 13: 1014947.

https://doi.org/10.3389/fgene.2022.1014947

Fakhriddin N.K., Ozod S.T., Dilrabo K.E., Bunyod M.G., Barno B.O., Mukhlisa K.K., Feruza U.R., Kuvandik K.K., Doston S.E., Mukhammad T.K., Madina D.K., Naim N.K., Roza S.A., Sukumar S., John Z.Y., and Ibrokhim Y.A., 2021, Genetic diversity, QTL mapping, and marker-assisted selection technology in cotton (Gossypium spp.), Front. Plant Sci., 12: 779386.

https://doi.org/10.3389/fpls.2021.779386

Hasan N., Choudhary S., and Naaz N., 2021, Recent advancements in molecular marker-assisted selection and applications in plant breeding programmes, J. Genet. Eng. Biotechnol., 19: 128.

https://doi.org/10.1186/s43141-021-00231-1

Muez B., Komivi D., Jun Y., Pape A.M., Idrissa N.D., Diaga D., Zhang X.R., and Wang L.H., 2021, Genome-wide association study and its applications in the non-model crop Sesamum indicum, BMC Plant Biology, 21(1): 283.

https://doi.org/10.1186/s12870-021-03046-x

Nouraei S., Mia M.S., Liu H., 2024, Genome-wide association study of drought tolerance in wheat (Triticum aestivum L.) identifies SNP markers and candidate genes, Mol. Genet. Genomics, 299: 22.

https://doi.org/10.1007/s00438-024-02104-x

Peng Z., Zhao C., Li S., Guo Y., Xu H., Hu G., 2022, Integration of genomics, transcriptomics and metabolomics identifies candidate loci underlying fruit weight in loquat, Hortic. Res., 9: uhac037.

https://doi.org/10.1093/hr/uhac037

Sahito J.H., Zhang H., Gishkori Z.G.N., et al., 2024, Advancements and prospects of genome-wide association studies (GWAS) in maize, International Journal of Molecular Sciences, 25(3): 1918.

https://doi.org/10.3390/ijms25031918

Soodeh T., Jaco Z., William J.W., Jacob M., David E., Jacqueline B., 2022, Application of crop wild relatives in modern breeding: an overview of resources, experimental and computational methodologies, Front. Plant Sci., 13: 1008904.

https://doi.org/10.3389/fpls.2022.1008904

Yang D.D., Mumtaz A.S., Huang L.Y., Walid B.A., Zhang J., Wu Y., Li J., Muzafar H.S., and Wang F.Y., 2021, Front. Plant Sci., 12: 39.

Zhang R., Zhang C.P., Yu C.Y., Dong J.G., Hu J.H., 2022, Integration of multi-omics technologies for crop improvement: status and prospects, Front. Bioinform., 2: 1027457.

https://doi.org/10.3389/fbinf.2022.1027457
