Electronic Theses and Dissertations

Author

Shiyuan Zhang

Date

2025

Document Type

Thesis

Degree Name

Master of Science

Department

Public Health

Committee Chair

Yu Jiang

Committee Member

Hongmei Zhang

Committee Member

Yongmei Wang

Abstract

Single-cell RNA sequencing (scRNA-seq) is developing rapidly and is widely adopted in biomedical research. Clustering scRNA-seq data is challenging because of batch effects, high dropout rates, and the high dimensionality of gene expression. In this study, we evaluate five clustering methods using simulation studies: Zero-Inflated Negative Binomial Mixed Model (ZINBMM), Seurat, Single-cell Clustering via Contrastive Trajectory Regularization (scCCTR), Single-cell Masked Autoencoder (scMAE), and Deep Embedding for Single-cell Clustering (DESC). The simulation framework varies batch effects, sample sizes, and cluster structures. Clustering performance is assessed using the adjusted Rand index (ARI) and normalized mutual information (NMI). The results show that ZINBMM consistently outperforms the other methods across a wide range of simulation settings, and Seurat ranks second in most scenarios. These findings provide practical recommendations for scRNA-seq data analysis, particularly for studies involving data integration across batches or platforms.
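
For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal illustrative sketch of how ARI and NMI can be computed from a vector of true cluster labels and a vector of predicted labels. It is not taken from the thesis. The function names are hypothetical, and the NMI normalization (arithmetic mean of the two label entropies) is an assumption, since the abstract does not state which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index from pair counts in the contingency table."""
    if len(truth) != len(pred):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: trivially identical partitions
    # Pairs of cells that share a cluster in both, in truth only, in pred only.
    both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = in_truth * in_pred / total_pairs  # agreement expected by chance
    maximum = (in_truth + in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (both - expected) / (maximum - expected)


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(truth, pred):
    """NMI, normalized by the arithmetic mean of the entropies (assumed)."""
    if len(truth) != len(pred):
        raise ValueError("label vectors must have the same length")
    n = len(truth)
    n_truth, n_pred = Counter(truth), Counter(pred)
    mutual_info = sum(
        (c / n) * log(c * n / (n_truth[t] * n_pred[p]))
        for (t, p), c in Counter(zip(truth, pred)).items()
    )
    denom = (_entropy(truth) + _entropy(pred)) / 2
    return 1.0 if denom == 0 else mutual_info / denom


if __name__ == "__main__":
    true_labels = [0, 0, 1, 1]
    relabeled = [1, 1, 0, 0]  # same partition, different cluster names
    print(adjusted_rand_index(true_labels, relabeled))
    print(normalized_mutual_information(true_labels, relabeled))
```

Both metrics depend only on how cells are grouped, not on the names given to the clusters, so a clustering that recovers the true partition scores 1 even when its labels are permuted.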

Comments

Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to ProQuest.

Notes

Embargoed until 08-06-2027

Available for download on Friday, August 06, 2027
