Faculty Publications

Similarity measures based on Latent Dirichlet Allocation

Vasile Rus, University of Memphis
Nobal Niraula, University of Memphis
Rajendra Banjade, University of Memphis

Abstract

We present the results of our investigation of word- and sentence-level semantic similarity measures based on two fully automated approaches to deriving meaning from large corpora: Latent Dirichlet Allocation, a probabilistic approach, and Latent Semantic Analysis, an algebraic approach. The focus is on similarity measures based on Latent Dirichlet Allocation, because of its novelty, while the Latent Semantic Analysis measures are used for comparison. We explore two types of measures based on Latent Dirichlet Allocation: measures based on distances between probability distributions, which can be applied directly to larger texts such as sentences, and a word-to-word similarity measure that is then extended to work at the sentence level. We present results on paraphrase identification data from the Microsoft Research Paraphrase corpus. © 2013 Springer-Verlag.
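
As an illustration of the first type of measure, the sketch below turns a distance between two topic distributions into a similarity score. The abstract does not say which distances the paper uses, so the choice of the Hellinger distance is an assumption made here for illustration only, and the three-topic vectors are made-up values rather than output of a trained Latent Dirichlet Allocation model.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Assumption: the abstract does not name a specific distance; Hellinger is
    used here only as one common, bounded choice (range 0 to 1).
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def similarity(p, q):
    """Map the distance to a similarity in [0, 1]; 1 means identical."""
    return 1.0 - hellinger(p, q)


# Hypothetical per-sentence topic distributions over three topics.
# In practice these would be inferred by a trained topic model.
sentence_a = [0.70, 0.20, 0.10]
sentence_b = [0.60, 0.30, 0.10]
sentence_c = [0.05, 0.05, 0.90]

sim_ab = similarity(sentence_a, sentence_b)
sim_ac = similarity(sentence_a, sentence_c)
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

In a paraphrase identification setting, a score like this could be compared against a threshold tuned on training data; that thresholding step is likewise an assumption and not a detail stated in the abstract.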

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Recommended Citation

Rus, V., Niraula, N., & Banjade, R. (2013). Similarity measures based on Latent Dirichlet Allocation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7816 LNCS (PART 1), 459-470. https://doi.org/10.1007/978-3-642-37247-6_37
