Master of Science
Partitioning data into clusters is a widely used method of gaining insight into the similarities and differences of groups. Among the most popular approaches are the K-means and K-prototype methods; however, they fail to consider potential joint effects and interactions of the variables. The Vector in Partition (VIP) algorithm fills this gap with a distance measure designed to partition genetic and epigenetic data, specifically gene expression (GE), DNA methylation (CPG), and single nucleotide polymorphisms (SNP). This work extends the VIP method by incorporating the K-means and K-prototype frameworks into its distance measure so that covariate data can also inform the clustering. The extension adds another layer to the method, allowing the complex joint effects of genetic/epigenetic data and other health-related data to dictate clustering. On simulated data, the extended method achieved high accuracy, sensitivity, and specificity of cluster assignments across varying criteria and outperformed the original VIP method.
Dissertation or thesis originally submitted to ProQuest
Handwerker, Joseph, "Vector in Partition extension: Analysis of clustering when genetics distance is weighted by covariates." (2023). Electronic Theses and Dissertations. 3106.