Multi-hierarchy documents clustering based on LSA space dimensionality character
Abstract
The statistical characteristics of the dimensions of the latent semantic analysis (LSA) space were studied in order to perform automatic document clustering at different concept levels. It is concluded that the dimensions corresponding to larger singular values describe what semantic elements have in common, while the dimensions corresponding to smaller singular values describe how they differ. There is thus a latent relation between the dimensions of the LSA space and concept granularity in natural language. Accordingly, different LSA-space dimensions are used to cluster documents at each concept granularity, and the experimental results agree well with this idea. In addition, in the LSA-based document clustering algorithm, higher clustering precision is obtained when the row vectors of the document self-indexing matrix, rather than the low-dimensional document vectors, are used as the objects to be clustered.
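The abstract does not spell out the algorithm, so the following is only a minimal NumPy sketch of the general pipeline it describes, under stated assumptions. A terms x documents matrix A is factored by SVD as A = U Σ V^T, and only the k dimensions with the largest singular values are kept; k serves as the granularity knob. Plain k-means is a swapped-in clustering method, because the abstract does not name one. The "document self-indexing matrix" is not defined here either, so the sketch *assumes*, hypothetically, that it is the reduced document-document matrix A_k^T A_k = V_k Σ_k^2 V_k^T. Under that assumption, its n-dimensional rows are clustered in place of the k-dimensional document vectors V_k Σ_k. All function names and the farthest-first initialisation are illustrative choices, not the authors' method.

```python
import numpy as np


def lsa_document_vectors(term_doc, k):
    """Low-dimensional LSA document vectors V_k S_k (one row per document).

    term_doc is a terms x documents matrix A with thin SVD A = U S V^T.
    numpy returns singular values in descending order, so the first k
    entries are the k largest.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:k].T * s[:k]


def self_indexing_rows(term_doc, k):
    """HYPOTHETICAL stand-in for the 'document self-indexing matrix'.

    Assumed here to be A_k^T A_k = V_k S_k^2 V_k^T (documents x documents).
    Each row describes one document by its reduced-space similarity to
    every document. The paper's actual definition may differ.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    vk = vt[:k]
    return (vk.T * s[:k] ** 2) @ vk


def kmeans(points, n_clusters, n_iter=100):
    """Plain k-means with a deterministic farthest-first initialisation."""
    points = np.asarray(points, dtype=float)
    centroids = [points[0]]
    for _ in range(1, n_clusters):
        # Squared distance from each point to its nearest chosen centroid.
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(dist)])
    centroids = np.array(centroids)
    labels = None
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(n_clusters):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels


def cluster_documents(term_doc, k, n_clusters, use_self_index=True):
    """Cluster documents using the k LSA dimensions with the largest singular values.

    In this sketch a smaller k keeps only the "commonness" dimensions
    (coarser concepts); a larger k also admits the "discrepancy" dimensions.
    """
    if use_self_index:
        rows = self_indexing_rows(term_doc, k)
    else:
        rows = lsa_document_vectors(term_doc, k)
    return kmeans(rows, n_clusters)


if __name__ == "__main__":
    # Toy term counts: docs 0-1 share terms 0-2, docs 2-3 share terms 3-5.
    A = np.array([[2, 1, 0, 0],
                  [1, 2, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 1]], dtype=float)
    for k in (1, 2, 3):
        print(k, cluster_documents(A, k, n_clusters=2))
```

Varying k while holding the number of clusters fixed is how this sketch mimics clustering "under different concept levels". Switching `use_self_index` compares the two kinds of objects the abstract contrasts. The sketch makes no claim that the assumed A_k^T A_k stand-in reproduces the precision gain reported in the paper.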
Publication Title
Qinghua Daxue Xuebao/Journal of Tsinghua University
Recommended Citation
Liu, Y., Qi, H., Hu, X., Cai, Z., & Dai, J. (2005). Multi-hierarchy documents clustering based on LSA space dimensionality character. Qinghua Daxue Xuebao/Journal of Tsinghua University, 45 (SUPPL.), 1783-1786. Retrieved from https://digitalcommons.memphis.edu/facpubs/8244