Unified variable selection in semi-parametric models

Abstract

We propose a Bayesian variable selection method for semi-parametric models, with applications to genetic and epigenetic data (e.g., single nucleotide polymorphisms and DNA methylation, respectively). Each variable is individually standardized to reduce heterogeneity and to facilitate simultaneous selection of categorical (single nucleotide polymorphisms) and continuous (DNA methylation) variables. A Gaussian reproducing kernel is applied to the transformed data to evaluate the joint effect of the variables, which may include complex interactions between, e.g., single nucleotide polymorphisms and DNA methylation. Indicator variables are introduced into the model for the purpose of variable selection. The method is demonstrated and evaluated using simulations under different scenarios. We apply the method to identify informative DNA methylation sites and single nucleotide polymorphisms in a set of genes based on their joint effect on allergic sensitization. The selected single nucleotide polymorphisms and methylation sites have the potential to serve as early markers for allergy prediction, and consequently to benefit medical and clinical research aimed at preventing allergy before its manifestation.
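To make the kernel construction concrete, the following is a minimal sketch of how per-variable standardization and a Gaussian kernel with 0/1 inclusion indicators could be combined. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the kernel's exact form, so the expression exp(-sum_k delta_k (z_ik - z_jk)^2 / rho), the bandwidth name `rho`, the indicator vector `delta`, and the function names `standardize` and `gaussian_kernel` are all hypothetical. The Bayesian sampling of the indicators is not shown.

```python
import numpy as np

def standardize(X):
    """Center each column (variable) and scale it to unit sample variance."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant columns are centered but left unscaled
    return (X - X.mean(axis=0)) / sd

def gaussian_kernel(Z, delta, rho=1.0):
    """Gaussian kernel restricted to variables with delta_k = 1.

    Assumed form (not taken from the source):
        K[i, j] = exp(-sum_k delta_k * (Z[i, k] - Z[j, k])**2 / rho)
    """
    Z = np.asarray(Z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    diff = Z[:, None, :] - Z[None, :, :]    # pairwise differences, shape (n, n, p)
    d2 = (delta * diff ** 2).sum(axis=2)    # squared distance over included variables
    return np.exp(-d2 / rho)

# Toy data: 3 SNPs coded 0/1/2 and 2 methylation levels in [0, 1] (simulated).
rng = np.random.default_rng(0)
snp = rng.integers(0, 3, size=(20, 3))
meth = rng.uniform(0.0, 1.0, size=(20, 2))
Z = standardize(np.hstack([snp, meth]))

delta = [1, 0, 1, 1, 0]  # hypothetical inclusion indicators, one per variable
K = gaussian_kernel(Z, delta, rho=2.0)
```

In this sketch, a variable whose indicator is 0 contributes nothing to the pairwise distance, so it has no influence on the kernel matrix and therefore none on the estimated joint effect. This is one way an indicator can switch a variable in or out of a kernel-based model.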

Publication Title

Statistical Methods in Medical Research
