Electronic Theses and Dissertations
Date
2025
Document Type
Thesis
Degree Name
Master of Science
Department
Public Health
Committee Chair
Meredith Ray
Committee Member
Ching-Chi Yang
Committee Member
Hongmei Zhang
Committee Member
Yu Jiang
Abstract
Clustering analysis is a fundamental technique in machine learning, with K-means and its variants being widely used for their interpretability and efficiency. The Vector in Partition (VIP) algorithm extends K-means by incorporating a multi-dimensional distance measure for nested genetic data structures, but it inherits the challenge of selecting the optimal number of clusters (k). This thesis proposes integrating the simplified silhouette score (SSI) into VIP’s optimization options to improve k selection. Through simulation studies comparing SSI with the currently implemented methods (Elbow, Slope, and Minimum AIC and BIC), we demonstrate that SSI consistently performs on par with or better than existing methods, particularly in datasets with distinct clusters. Across the tested settings, SSI appears resilient to changes in the number of subjects, although its performance declines slightly as the number of genes grows. While performance declines in non-distinct datasets, SSI remains a useful heuristic for assessing clustering solutions, with minimal additional computational cost.
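To illustrate the idea behind choosing k with a simplified silhouette score, the sketch below uses the commonly cited definition: for each point, a is the distance to its own cluster centroid, b is the distance to the nearest other centroid, and the score is (b - a) / max(a, b), averaged over all points. This is not the thesis's implementation. Euclidean distance stands in for VIP's multi-dimensional distance measure, the candidate labelings are written by hand rather than produced by VIP, and the names `centroid`, `simplified_silhouette`, `points`, and `candidates` are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def simplified_silhouette(points, labels):
    """Mean simplified silhouette using point-to-centroid distances.

    Assumption: Euclidean distance (VIP's own distance measure is not modeled).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centers = {lab: centroid(members) for lab, members in clusters.items()}

    total = 0.0
    for p, lab in zip(points, labels):
        a = math.dist(p, centers[lab])  # distance to own centroid
        b = min(math.dist(p, c) for other, c in centers.items() if other != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Toy data: two visibly separated groups (illustrative only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Hand-written candidate labelings for k = 2 and k = 3 (hypothetical).
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}

scores = {k: simplified_silhouette(points, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)  # k with the highest mean score
```

Because each point is compared only against cluster centroids rather than against every other point, the cost per point grows with the number of clusters rather than the number of points, which is consistent with the abstract's remark about minimal additional computational cost.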
Library Comment
Dissertation or thesis originally submitted to ProQuest.
Notes
Open Access
Recommended Citation
Pirrotta, Luke Xavier, "Integrating the simplified silhouette score into the Vector in Partition (VIP) algorithm for cluster validation" (2025). Electronic Theses and Dissertations. 3870.
https://digitalcommons.memphis.edu/etd/3870
Comments
Data is provided by the student.