Electronic Theses and Dissertations
Date
2021
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Mathematical Sciences
Committee Chair
George Ebenezer
Committee Member
Hongmei Zhang
Committee Member
Meredith Ray
Committee Member
Dale Bowman
Committee Member
Shengtong Han
Abstract
Epigenetics is one possible mechanism for explaining disease heritability without changes to the DNA sequence. DNA methylation (DNAm), an epigenetic factor that retains a memory of environmental exposure, may potentially explain the existing missing heritability. Learning the patterns of DNAm transmission at different CpG sites and studying the interconnection among the transmitted CpG sites is beneficial to disease prediction and prevention. In my dissertation, I will focus on two perspectives to study the transmission of DNA methylation: clustering and network.
The first project describes a nested Bayesian clustering method to identify CpG sites at which DNAm is transmitted from one generation to the next, and to study heterogeneity among the CpG sites with transmitted DNAm. To this end, beta regression is employed to infer the transmission status and, for CpG sites with DNAm transmitted, to cluster transmission patterns at a population level. The transmission status and patterns are inferred under a Bayesian framework. Simulations under different scenarios are used to demonstrate and evaluate the applicability of the method. We demonstrate the approach using a triad (mother, father, and offspring) data set with DNA methylation assessed at 4063 CpG sites to detect transmitted CpGs and their DNAm transmission patterns.
The second project presents a comprehensive comparison of three existing Gaussian graphical models for epigenetic network construction based on the precision matrix. The three methods, the projection method, the horseshoe method, and the HRS (hit-and-run sampler) method, are assessed in different scenarios, and six statistics (sensitivity, specificity, MCC, F1-score, KL-divergence, and quadratic loss) are used to compare performance across the approaches. The simulation study suggests that both the projection method and the horseshoe method performed well in edge set identification in the low-dimensional setting but incurred a higher loss in precision matrix estimation, whereas the HRS method always performed well in both graphical structure identification and precision matrix estimation in the high-dimensional setting. The three methods are further applied to 1043 CpGs that are maternal-transmission dominated to identify potential network structures.
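As a rough illustration of the six comparison statistics named in the abstract, the following Python sketch computes them for a true precision matrix and an estimate. The function names (`edge_metrics`, `precision_losses`), the zero threshold `tol`, and the specific forms of the KL-divergence and quadratic loss are assumptions chosen from common usage in the Gaussian graphical model literature. They are not taken from the dissertation, which may define these quantities differently.

```python
import numpy as np


def edge_metrics(omega_true, omega_hat, tol=1e-8):
    """Edge-recovery statistics from the off-diagonal support of two precision matrices.

    An edge (i, j) is taken to be present when |omega[i, j]| > tol (assumed rule).
    """
    p = omega_true.shape[0]
    iu = np.triu_indices(p, k=1)  # each undirected edge counted once
    true_edge = np.abs(omega_true[iu]) > tol
    est_edge = np.abs(omega_hat[iu]) > tol

    tp = int(np.sum(true_edge & est_edge))
    tn = int(np.sum(~true_edge & ~est_edge))
    fp = int(np.sum(~true_edge & est_edge))
    fn = int(np.sum(true_edge & ~est_edge))

    nan = float("nan")
    mcc_denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "mcc": (tp * tn - fp * fn) / mcc_denom if mcc_denom else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan,
    }


def precision_losses(omega_true, omega_hat):
    """Estimation losses for a precision matrix (assumed definitions).

    KL divergence: 0.5 * (tr(Sigma Omega_hat) - log det(Sigma Omega_hat) - p)
    Quadratic loss: tr((Sigma Omega_hat - I)^2)
    where Sigma is the inverse of the true precision matrix.
    """
    p = omega_true.shape[0]
    sigma_true = np.linalg.inv(omega_true)
    m = sigma_true @ omega_hat
    _, logdet = np.linalg.slogdet(m)
    kl = 0.5 * (np.trace(m) - logdet - p)
    d = m - np.eye(p)
    quadratic = np.trace(d @ d)
    return {"kl": float(kl), "quadratic_loss": float(quadratic)}


# Toy example: a 3-node chain graph and an estimate that misses one edge.
omega_true = np.array([[2.0, -1.0, 0.0],
                       [-1.0, 2.0, -1.0],
                       [0.0, -1.0, 2.0]])
omega_hat = np.array([[2.0, -1.0, 0.0],
                      [-1.0, 2.0, 0.0],
                      [0.0, 0.0, 2.0]])

edges = edge_metrics(omega_true, omega_hat)
losses = precision_losses(omega_true, omega_hat)
```

In this toy case the estimate recovers one of the two true edges and adds no false ones, so sensitivity is 0.5 and specificity is 1.0. Both losses are zero only when the estimate equals the true precision matrix.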
Library Comment
Dissertation or thesis originally submitted to ProQuest
Notes
Open Access
Recommended Citation
Wang, Jiajing, "Statistical methods for patterns and interconnections between variables with applications in epigenetics data" (2021). Electronic Theses and Dissertations. 2826.
https://digitalcommons.memphis.edu/etd/2826
Comments
Data is provided by the student.