Faculty Publications

Latent semantic analysis models on Wikipedia and TASA

Dan Ştefǎnescu, University of Memphis
Rajendra Banjade, University of Memphis
Vasile Rus, University of Memphis

Abstract

This paper introduces a collection of freely available Latent Semantic Analysis (LSA) models built on the entire English Wikipedia and the TASA corpus. The models differ not only in their source, Wikipedia versus TASA, but also in the linguistic items they focus on: all words, content words, nouns-verbs, and main concepts. Generating such models from large datasets (e.g., Wikipedia), which can provide broad coverage of the vocabulary actually in use, is computationally challenging, which is why large LSA models are rarely available. Our experiments show that, for the task of word-to-word similarity, the scores assigned by these models are strongly correlated with human judgment, outperform many other frequently used measures, and are comparable to the state of the art.

Publication Title

Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

Recommended Citation

Ştefǎnescu, D., Banjade, R., & Rus, V. (2014). Latent semantic analysis models on Wikipedia and TASA. Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, 1417-1422. Retrieved from https://digitalcommons.memphis.edu/facpubs/2925

