Electronic Theses and Dissertations

Author

Luhang Han

Date

2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Mathematical Sciences

Committee Chair

Hongmei Zhang

Committee Member

E. Olusegun George

Committee Member

Hui Zhang

Committee Member

Ching-Chi Yang

Abstract

High-dimensional data are widely studied in many fields, such as epigenetics and the biological sciences. However, developing and applying clustering methods to high-dimensional data is challenging. In this dissertation, we propose two clustering methods: one for one-dimensional DNA methylation (DNAm) data and one for two-dimensional stochastic optical reconstruction microscopy (STORM) image data. Changes in DNAm are known to be associated with different ages, and the genes to which CpG sites map could be linked to disease incidence. Thus, learning the features of DNAm at individual CpG sites will benefit subsequent epigenetic epidemiological studies that detect markers of health outcomes or exposures. However, no methods are currently available to identify dynamic and stable CpG sites both effectively and efficiently. We therefore develop a Bayesian two-stage clustering method to (1) determine whether DNAm at a CpG site is stable over time and (2) assign each unstable CpG site to a cluster based on the temporal trend of DNAm at that site. We use simulations to demonstrate and evaluate the proposed method, and then apply it to DNAm M-values at 2,000 CpG sites measured at two time points in 325 subjects from the Isle of Wight birth cohort.

STORM is a single-molecule localization microscopy (SMLM) technique. SMLM images make it possible to observe the physical locations of individual proteins and to study protein interactions by analyzing the resulting point patterns with spatial point process theory. Popular existing methods for this type of problem are largely designed for single-image analysis and have various limitations, such as a heavy simulation burden and false positives. We therefore propose new methods to overcome these limitations. We develop standardized statistics that characterize the location pattern (clustered, dispersed, or random) of a single species in 2D images, so that patterns can be compared across treatment conditions or study groups. We use simulations to demonstrate and assess the method, and then apply it to 2D STORM image data generated by the Department of Immunology, St. Jude Children’s Research Hospital. The data set consists of the x and y coordinates of mitochondria and lysosomes in 10 cells under two conditions.
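To illustrate what a standardized point-pattern statistic does, the sketch below implements the classical Clark-Evans nearest-neighbour test. This is a swapped-in textbook technique, not the statistic developed in this dissertation, which the abstract does not specify. It assumes a single species observed in a window of known area, ignores edge effects, and uses brute-force nearest-neighbour search; the function names are hypothetical.

```python
import math
import random


def nearest_neighbor_distances(points):
    """Brute-force O(n^2) distance from each point to its nearest neighbour."""
    dists = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        dists.append(best)
    return dists


def clark_evans_z(points, area):
    """Clark-Evans ratio R and standardized z (no edge correction).

    Assumption: `area` is the area of the observation window, so the
    intensity under complete spatial randomness is n / area.
    R < 1 suggests clustering; R > 1 suggests dispersion.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    density = n / area
    observed = sum(nearest_neighbor_distances(points)) / n
    expected = 1.0 / (2.0 * math.sqrt(density))   # mean NN distance under CSR
    se = 0.26136 / math.sqrt(n * density)        # its standard error under CSR
    return observed / expected, (observed - expected) / se


def classify(z, critical=1.96):
    """Map the standardized statistic to a pattern label (two-sided, ~5% level)."""
    if z < -critical:
        return "clustered"
    if z > critical:
        return "dispersed"
    return "random"


if __name__ == "__main__":
    # A regular 10 x 10 grid in a 10 x 10 window is strongly dispersed.
    grid = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
    print(clark_evans_z(grid, area=100.0), classify(clark_evans_z(grid, 100.0)[1]))

    # 100 points packed into a 1 x 1 corner of the same window are clustered.
    rng = random.Random(0)
    clump = [(rng.random(), rng.random()) for _ in range(100)]
    print(classify(clark_evans_z(clump, area=100.0)[1]))
```

Because z is standardized by its expected value and standard error under complete spatial randomness, values computed on different cells are on a common scale. Comparing such values across groups is one way to make cross-condition comparisons, though the dissertation's own approach may differ.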

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest.

Notes

Open Access