Generalizing latent semantic analysis


Latent Semantic Analysis (LSA) is a vector-space technique for representing word meaning. Traditionally, LSA consists of two steps: the formation of a word-by-document matrix, followed by singular value decomposition of that matrix. However, defining the matrix along the dimensions of words and documents is somewhat arbitrary. This paper reconceptualizes LSA in more general terms by characterizing the matrix as a feature-by-context matrix rather than a word-by-document matrix. Examples of generalized LSA using n-grams and local context are presented and compared with traditional LSA on paraphrase comparison tasks. © 2009 IEEE.
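The two-step pipeline the abstract describes (build a word-by-document count matrix, then take its truncated SVD) can be sketched as follows. This is a minimal illustration with a toy corpus, not the paper's implementation; the vocabulary, counts, and dimensionality k are all invented for the example.

```python
import numpy as np

# Step 1: a toy word-by-document count matrix
# (rows = words, columns = documents; values are illustrative).
# Documents 0 and 1 share vocabulary; document 2 is unrelated.
A = np.array([
    [2, 1, 0],  # "semantic"
    [1, 2, 0],  # "analysis"
    [0, 1, 0],  # "vector"
    [0, 0, 3],  # "banana"
    [0, 0, 2],  # "fruit"
], dtype=float)

# Step 2: singular value decomposition, truncated to k dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dim vector per document

def cosine(u, v):
    """Cosine similarity between two document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_01 = cosine(doc_vectors[0], doc_vectors[1])  # related documents
sim_02 = cosine(doc_vectors[0], doc_vectors[2])  # unrelated documents
```

In this sketch, `sim_01` comes out much higher than `sim_02`, since documents 0 and 1 share latent dimensions while document 2 does not. The generalization the paper proposes amounts to replacing the rows and columns of `A` with arbitrary features (e.g. n-grams) and contexts (e.g. local windows) while keeping the SVD step unchanged.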

Publication Title

ICSC 2009 - 2009 IEEE International Conference on Semantic Computing