Electronic Theses and Dissertations
Date
2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Public Health
Committee Chair
Yu Jiang
Committee Member
Hongmei Zhang
Committee Member
Meredith Ray
Committee Member
Syed Hasan Arshad
Abstract
Binary outcomes with rare events (event rate below 5%) present significant analytical challenges, particularly in high-dimensional settings such as epigenome-wide association studies (EWAS) involving DNA methylation (DNAm) data. Traditional logistic regression is often inadequate under these conditions, suffering from bias, high variance, and diminished power when the event rate is low and the number of predictors far exceeds the sample size (p >> n). To address these limitations, we developed a series of novel methods that enhance sensitivity and accuracy in identifying biologically meaningful biomarkers while accounting for both event rarity and high dimensionality. First, we introduce two screening approaches. The Rare-Screening method incorporates bootstrap resampling with empirical Bayes adjustments to stabilize inference, while the Firth-ttScreening method applies a Firth-corrected logistic regression within a cross-validation framework. We evaluated both methods in simulation studies and applied them to the Isle of Wight (IOW) male birth cohort to study the association between DNAm at birth and asthma acquisition during adolescence. The results show that the developed methods have much higher sensitivity than the benchmark methods. Rare-Screening identified 579 CpG sites and Firth-ttScreening identified 34 CpG sites from over 450,000 CpGs, with 25 CpGs identified by both methods and retained as candidate biomarkers. Second, we introduce a novel penalty, FMCP, in the joint model to address the challenges posed by high dimensionality and rare events. FMCP integrates a log-F penalty for bias reduction in rare events with the Minimax Concave Penalty (MCP) for sparsity. Simulations show that FMCP consistently achieved superior sensitivity and accuracy compared with conventional MCP and LASSO methods. Of the 25 screened CpG sites, FMCP selected 10, whereas the traditional MCP method failed to work at the low event rate.
Finally, we propose a Bayesian hierarchical model that integrates prior biological knowledge through a log-F(1,1) prior for covariate correction and a spike-and-slab prior for variable selection. A Beta hierarchical structure enables adaptive weighting of prior inclusion probabilities. For the IOW dataset, the model identified 3 CpG sites as potential biomarkers for asthma transition. Overall, the methodological innovations in the current study provide a robust framework for analyzing rare binary outcomes in high-dimensional biological data, advancing the discovery of epigenetic biomarkers.
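As an illustration of the Firth correction mentioned in the abstract, the following is a minimal Python sketch of Firth-penalized logistic regression fitted by Newton-type iterations on the modified score. It is not the dissertation's implementation: the function name firth_logistic, its parameters, and the toy data are hypothetical, and the cross-validation layer of Firth-ttScreening is not shown.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Fit a logistic regression with Firth's bias-reducing penalty.

    X must already contain an intercept column if one is wanted.
    Illustrative sketch only; no step-halving or other safeguards.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))    # fitted probabilities
        w = p * (1.0 - p)                        # logistic variance weights
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: the usual score plus the term h * (0.5 - p)
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy example (assumed data): intercept-only model, 2 events in 100 subjects
X = np.ones((100, 1))
y = np.zeros(100)
y[:2] = 1.0
beta_hat = firth_logistic(X, y)
p_hat = 1.0 / (1.0 + np.exp(-beta_hat[0]))
print(p_hat)
```

In the intercept-only case with k events among n observations, the Firth estimate of the event probability is (k + 0.5) / (n + 1), so it stays finite and away from zero even when events are very rare, whereas the ordinary maximum likelihood estimate is k / n.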
Library Comment
Dissertation or thesis originally submitted to ProQuest.
Notes
Embargoed until 02-18-2026
Recommended Citation
Abrar, Mohammad Nahian Ferdous, "Screening and Variable Selection for Analyzing Binary Outcome with Rare Events and High-Dimensional Predictors" (2025). Electronic Theses and Dissertations. 3871.
https://digitalcommons.memphis.edu/etd/3871
Comments
Data is provided by the student.