Electronic Theses and Dissertations
Identifier
356
Date
2011
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Major
Computer Science
Committee Chair
Vasile Rus
Committee Member
Arthur C. Graesser
Committee Member
King-Ip (David) Lin
Committee Member
Vinhthuy T. Phan
Abstract
This dissertation investigates and proposes ways to quantify and measure semantic similarity between texts. The general approach is to rely on linguistic information at various levels, including lexical, lexico-semantic, and syntactic. The approach starts by mapping texts onto structured representations that include lexical, lexico-semantic, and syntactic information. The representation is then used as input to methods designed to measure the semantic similarity between texts based on the available linguistic information.While world knowledge is needed to properly assess semantic similarity of texts, in our approach world knowledge is not used, which is a weakness of it.We limit ourselves to answering the question of how successfully one can measure the semantic similarity of texts using just linguistic information.The lexical information in the original texts is retained by using the words in the corresponding representations of the texts. Syntactic information is encoded using dependency relations trees, which represent explicitly the syntactic relations between words. Word-level semantic information is relatively encoded through the use of semantic similarity measures like WordNet Similarity or explicitly encoded using vectorial representations such as Latent Semantic Analysis (LSA). Several methods are being studied to compare the representations, ranging from simple lexical overlap, to more complex methods such as comparing semantic representations in vector spaces as well as syntactic structures. Furthermore, a few powerful kernel models are proposed to use in combination with Support Vector Machine (SVM) classifiers for the case in which the semantic similarity problem is modeled as a classification task.
Library Comment
Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & dissertation (ETD) Repository.
Recommended Citation
Lintean, Mihai Cosmin, "Measuring Semantic Similarity: Representations and Methods" (2011). Electronic Theses and Dissertations. 273.
https://digitalcommons.memphis.edu/etd/273
Comments
Data is provided by the student.