Lemon and tea are not similar: Measuring word-to-word similarity by combining different methods

Abstract

Substantial amount of work has been done on measuring word-to-word relatedness which is also commonly referred as similarity. Though relatedness and similarity are closely related, they are not the same as illustrated by the words lemon and tea which are related but not similar. The relatedness takes into account a broader ranLemge of relations while similarity only considers subsumption relations to assess how two objects are similar. We present in this paper a method for measuring the semantic similarity of words as a combination of various techniques including knowledge-based and corpus-based methods that capture different aspects of similarity. Our corpus based method exploits state-of-the-art word representations. We performed experiments with a recently published significantly large dataset called Simlex-999 and achieved a significantly better correlation (ρ = 0.642, P < 0.001) with human judgment compared to the individual performance.

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Recommended Citation

Banjade, R., Maharjan, N., Niraula, N., Rus, V., & Gautam, D. (2015). Lemon and tea are not similar: Measuring word-to-word similarity by combining different methods. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9041, 335-346. https://doi.org/10.1007/978-3-319-18111-0_25

Faculty Publications

Lemon and tea are not similar: Measuring word-to-word similarity by combining different methods

Abstract

Publication Title

Recommended Citation

Search

Browse

Author Corner

Libraries

Faculty Publications

Lemon and tea are not similar: Measuring word-to-word similarity by combining different methods

Authors

Abstract

Publication Title

Recommended Citation

Share

Search

Browse

Author Corner

Libraries