A Structure-Driven Statistical Framework for Rare Variant Association Analysis: Unifying Burden Tests and SKAT with a Decision Pipeline for Method Selection

Xuanjun Fang

Research Article

Hainan Provincial Key Laboratory of Crop Molecular Breeding, Hainan Institute of Tropical Agricultural Resources (HITAR), Sanya, 572025, Hainan, China

Computational Molecular Biology, 2026, Vol. 16, No. 4
Received: 16 May 2026; Accepted: 20 Jun. 2026; Published: 05 Jul. 2026

This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

With the advent of high-throughput sequencing technologies, research in complex trait genetics is expanding from the GWAS paradigm focused primarily on common variants to full-spectrum analyses that also encompass rare variants. Compared with single-marker tests, rare variant analysis faces substantial challenges in statistical power and model specification because of low allele frequencies, strong effect heterogeneity, and contamination by non-causal variants. To address these issues, set-based aggregation tests have gradually become the dominant analytical strategy, with burden tests and the Sequence Kernel Association Test (SKAT) constituting two core methodological tracks. In this study, we develop a unified statistical framework to systematically compare burden tests and the SKAT family of methods in terms of their modeling assumptions, conditions of applicability, and performance boundaries. We show that the differences between these two classes of methods fundamentally arise from different characterizations of effect structure within a gene region: burden tests, through linear aggregation, are more suitable for scenarios dominated by concordant effect directions, whereas SKAT, through variance-component modeling, is better suited to heterogeneous effect directions and sparse causal architectures. On this basis, we further propose a decision pipeline for method selection based on genetic architecture, reframing method selection as a problem of causal architecture identification and thereby enabling a shift from heuristic choice to model-driven decision-making. Through simulation and empirical analyses, we show that different methods have clearly defined optimal regions within the space of effect-direction concordance and causal proportion, whereas SKAT-O and multi-kernel methods exhibit strong robustness when the underlying structure is unknown. 
Drawing on case studies from both human and crop genetics, this study further demonstrates the potential of rare variant analysis for elucidating functional mechanisms and supporting applied research. Overall, this study integrates rare variant association analysis into a structure-oriented statistical inference framework, providing a unified and operational theoretical foundation for method selection, result interpretation, and cross-study application in complex trait research.

Keywords

Rare variants; Burden test; SKAT; SKAT-O; Aggregation tests; Causal architecture; Decision pipeline; Complex trait genetics
