A Sentence similarity method based on chunking and information content


This paper introduces a method for assessing the semantic similarity between sentences, which relies on the assumption that the meaning of a sentence is captured by its syntactic constituents and the dependencies between them. We obtain both the constituents and their dependencies from a syntactic parser. Our algorithm considers that two sentences have the same meaning if it can find a good mapping between their chunks and also if the chunk dependencies in one text are preserved in the other. Moreover, the algorithm takes into account that every chunk has a different importance with respect to the overall meaning of a sentence, which is computed based on the information content of the words in the chunk. The experiments conducted on a well-known paraphrase data set show that the performance of our method is comparable to state of the art. © 2014 Springer-Verlag Berlin Heidelberg.

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)