Variable selection in semi-parametric models
Abstract
We propose Bayesian variable selection methods for semi-parametric models in the framework of partially linear Gaussian and probit regressions. Reproducing kernels are used to evaluate the possibly non-linear joint effect of a set of variables. Indicator variables are introduced into the reproducing kernels to control the inclusion or exclusion of each variable. Different scenarios based on the posterior probabilities of including a variable are proposed for selecting important variables. Simulations are used to demonstrate and evaluate the methods. The simulations show that the proposed methods can efficiently select the correct variables whether the effects are linear or non-linear of unknown form. The proposed methods are applied to two real data sets to identify cytosine-phosphate-guanine (CpG) methylation sites associated with maternal smoking and CpG sites associated with cotinine levels, adjusted for creatinine levels. The selected methylation sites have the potential to advance our understanding of the mechanism underlying the impact of smoking exposure on health outcomes, and consequently to benefit medical research in disease intervention.
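To make the idea of indicator variables inside a reproducing kernel concrete, here is a minimal sketch. It assumes a Gaussian kernel, which the abstract does not specify. The function name indicator_gaussian_kernel, the bandwidth rho, and the example values are hypothetical and are not taken from the paper.

```python
import math

def indicator_gaussian_kernel(x, y, delta, rho=1.0):
    """Gaussian kernel in which delta[k] in {0, 1} switches variable k on or off.

    Assumed form (illustrative only): exp(-sum_k delta[k] * (x[k] - y[k])**2 / rho).
    """
    if not (len(x) == len(y) == len(delta)):
        raise ValueError("x, y and delta must have the same length")
    # Only variables with delta[k] = 1 contribute to the squared distance.
    sq_dist = sum(d * (a - b) ** 2 for a, b, d in zip(x, y, delta))
    return math.exp(-sq_dist / rho)

# Two hypothetical observations on three variables.
x = [0.2, 1.5, -0.7]
y = [0.9, 1.5, 3.0]

k_all = indicator_gaussian_kernel(x, y, delta=[1, 1, 1])         # all variables included
k_drop_third = indicator_gaussian_kernel(x, y, delta=[1, 1, 0])  # third variable excluded
```

In a Bayesian treatment of this kind, the indicators would be sampled along with the other parameters, and the posterior frequency with which an indicator equals 1 would serve as that variable's inclusion probability. The sketch covers only the kernel evaluation, not the sampling.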
Publication Title
Statistical Methods in Medical Research
Recommended Citation
Zhang, H., Maity, A., Arshad, H., Holloway, J., Karmaus, W., Lawson, A., Lee, D., & MacNab, Y. (2016). Variable selection in semi-parametric models. Statistical Methods in Medical Research, 25(4), 1736-1752. https://doi.org/10.1177/0962280213499679