Electronic Theses and Dissertations


Jiajing Wang



Document Type


Degree Name

Doctor of Philosophy


Mathematical Sciences

Committee Chair

George Ebenezer

Committee Member

Hongmei Zhang

Committee Member

Meredith Ray

Committee Member

Dale Bowman

Committee Member

Shengtong Han


Epigenetics is one possible mechanism explaining disease heritability without changing the DNA sequence. DNA methylation (DNAm), as an epigenetic factor with a memory of environmental exposure, may potentially explain the existing missing heritability. Learning the patterns of DNAm transmission at different CpG sites and studying the interconnection among the transmitted CpG sites is beneficial to disease prediction and prevention. In my dissertation, I focus on two perspectives to study the transmission of DNA methylation: clustering and network.

The first project describes a nested Bayesian clustering method to identify DNA methylation (DNAm) sites (CpG sites) at which DNAm is transmitted from one generation to the next, and to study heterogeneity among CpG sites with DNAm transmitted. To this end, beta regression is employed to infer the transmission status and, for CpG sites with DNAm transmitted, to cluster transmission patterns at a population level. The transmission status and patterns are inferred under a Bayesian framework. Simulations with different scenarios are used to demonstrate and evaluate the applicability of the method. We demonstrate the approach using a triad (mother, father, and offspring) data set with DNA methylation assessed at 4063 CpG sites to detect transmitted CpGs and their DNAm transmission patterns.

The second project presents a comprehensive comparison of three existing Gaussian graphical models for epigenetic network construction based on the precision matrix. The three methods, the projection method, the horseshoe method, and the HRS (hit-and-run sampler) method, are assessed in different scenarios, and six statistics (sensitivity, specificity, MCC, F1-score, KL-divergence, and quadratic loss) are used to compare performance across the approaches. The simulation study suggests that both the projection method and the horseshoe method performed well in edge set identification in the low-dimensional setting but incurred a higher loss in precision matrix estimation, whereas the HRS method consistently performed well in both graphical structure identification and precision matrix estimation in the high-dimensional setting. The three methods are further applied to 1043 CpGs that are maternal-transmission dominated to identify potential network structures.
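As an illustration of the edge set identification statistics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, F1-score, and MCC could be computed for an estimated undirected graph against a reference graph. It is not taken from the dissertation: the function name `edge_metrics`, the edge-list representation, and the toy graphs are assumptions made for this example, and KL-divergence and quadratic loss (which compare precision matrices rather than edge sets) are not covered.

```python
import math
from itertools import combinations


def edge_metrics(true_edges, est_edges, n_nodes):
    """Compare an estimated undirected edge set with a reference edge set.

    Hypothetical helper for illustration; nodes are assumed to be labelled
    0 .. n_nodes - 1 and self-loops are assumed to be absent.
    """
    # Normalise each edge to a sorted tuple so (i, j) and (j, i) match.
    true_set = {tuple(sorted(e)) for e in true_edges}
    est_set = {tuple(sorted(e)) for e in est_edges}
    all_pairs = set(combinations(range(n_nodes), 2))

    tp = len(true_set & est_set)              # edges correctly detected
    fp = len(est_set - true_set)              # edges wrongly detected
    fn = len(true_set - est_set)              # edges missed
    tn = len(all_pairs - true_set - est_set)  # non-edges correctly left out

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Toy example (made-up graphs): 4 nodes, 3 reference edges, 3 estimated edges.
truth = [(0, 1), (1, 2), (2, 3)]
estimate = [(0, 1), (1, 2), (0, 3)]
metrics = edge_metrics(truth, estimate, n_nodes=4)
```

In this toy case two of the three reference edges are recovered and one spurious edge is reported, so sensitivity, specificity, and F1-score all equal 2/3 and MCC equals 1/3.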


Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest


Open Access