Date

2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Public Health

Committee Chair

Meredith Ray

Abstract

Cluster analysis is a popular, widely used unsupervised machine learning technique for grouping individual subjects based on the similarity of their traits. In epidemiological and biomedical research, it is often useful to cluster individuals into groups in order to study group patterns with respect to outcomes of interest. Popular clustering methods, including the k-means framework, are well suited to clustering individuals based on continuous, nominal, and mixed-type variables. Gower’s similarity is an additional option for clustering based on mixed-type data. Clustering data with an inherently nested structure poses unique challenges for classic clustering methodologies, as these methods cannot cluster individuals based on the vector variables often present in high-dimensional data. One example is the problem of clustering individuals based on genetic/epigenetic data, which encompasses single nucleotide polymorphism (SNP) information, deoxyribonucleic acid methylation (DNAm) levels, and expression levels across multiple genes. At the person level, this data comes together to create a set of multidimensional variables. An appropriate clustering strategy, called Vectors in Partitioning, has been developed by the research team at the University of Memphis School of Public Health’s Epidemiology, Biostatistics, and Environmental Health division. This novel clustering strategy calculates distance measures at the gene level, considering multiple input variables of mixed type nested within each gene, which are then summed and compared at the person level. Data containing grouped variables that together measure latent constructs poses a similar challenge to classic clustering methods. For example, epidemiological data is often structured such that various latent constructs are measured as the combination of multiple variables, representing factors that are likely to work in combination to predict health or social outcomes of interest. The aim of this dissertation is to develop two novel clustering methods that account for this type of grouped data structure. To our knowledge, no clustering methods currently exist that account for grouped variable data structure. Our proposed methods are novel non-parametric approaches that will allow for assessment of the influence of data with grouped variable structure on various health outcomes.

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest.

Notes

Embargoed until 3/27/2026

Recommended Citation

Plaxco, Allison, "Extensions of Vectors in Partitioning: Analysis of Clustering for Grouped Data" (2024). Electronic Theses and Dissertations Archive. 3459.
https://digitalcommons.memphis.edu/etd/3459
