Electronic Theses and Dissertations
Date
2025
Document Type
Thesis
Degree Name
Master of Science
Department
Public Health
Committee Chair
Yu Jiang
Committee Member
Hongmei Zhang
Committee Member
Yongmei Wang
Abstract
Single-cell RNA sequencing (scRNA-seq) is developing rapidly and is widely adopted in biomedical research. Clustering scRNA-seq data is challenging because of batch effects, frequent dropout events, and the high dimensionality of gene expression. In this study, we use simulation studies to evaluate five clustering methods: Zero-Inflated Negative Binomial Mixed Model (ZINBMM), Seurat, Single-cell Clustering via Contrastive Trajectory Regularization (scCCTR), Single-cell Masked Autoencoder (scMAE), and Deep Embedding for Single-cell Clustering (DESC). The simulation framework varies batch effects, sample sizes, and cluster structures. Clustering performance is assessed using the adjusted Rand index (ARI) and normalized mutual information (NMI). The results show that ZINBMM consistently outperforms the other methods across a wide range of simulation settings, and Seurat ranks second in most scenarios. These findings provide practical recommendations for scRNA-seq data analysis, particularly for studies that integrate data across batches or platforms.
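For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how ARI and NMI can be computed from a vector of true cell labels and a vector of predicted cluster labels. It is not taken from the thesis. The function names and toy labels are hypothetical, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the abstract does not state which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same cells."""
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the labelings trivially agree
    # Count pairs of cells placed together in both labelings, and in each one.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = together_truth * together_pred / total_pairs
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


def normalized_mutual_information(truth, pred):
    """NMI normalized by the arithmetic mean of the entropies (assumed variant)."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    entropy_t = -sum(c / n * log(c / n) for c in count_t.values())
    entropy_p = -sum(c / n * log(c / n) for c in count_p.values())
    mutual_info = sum(
        c / n * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    mean_entropy = (entropy_t + entropy_p) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings consist of a single cluster
    return mutual_info / mean_entropy


# Toy example (hypothetical labels): same partition, cluster names swapped.
true_labels = [0, 0, 1, 1]
swapped = [1, 1, 0, 0]
ari_swapped = adjusted_rand_index(true_labels, swapped)
nmi_swapped = normalized_mutual_information(true_labels, swapped)
```

Both metrics depend only on how cells are grouped, not on the cluster names, so a clustering that recovers the true partition under different label names still scores 1. ARI is also corrected for chance, which means it can be negative when agreement is worse than expected at random.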
Library Comment
Dissertation or thesis originally submitted to ProQuest.
Notes
Embargoed until 08-06-2027
Recommended Citation
Zhang, Shiyuan, "COMPARISON OF CLUSTERING APPROACHES WITH APPLICATION TO SINGLE CELL RNA-SEQ DATA ACCOUNTING FOR BATCH EFFECTS" (2025). Electronic Theses and Dissertations. 3860.
https://digitalcommons.memphis.edu/etd/3860
Comments
Data is provided by the student.