Assessing Entailer with a corpus of natural language from an intelligent tutoring system


In this study, we compared Entailer, a computational tool that evaluates the degree to which one text is entailed by another, with several other text relatedness metrics (LSA, lemma overlap, and MED). Our corpus was a subset of 100 self-explanations of sentences drawn from a recent experiment on interactions between students and iSTART, an Intelligent Tutoring System that helps students apply metacognitive strategies to enhance deep comprehension. Experts in discourse processing hand-coded the sentence pairs on four categories of text relatedness: entailment, implicature, elaboration, and paraphrase. A series of regression analyses revealed that Entailer was the best measure for approximating these hand-coded values. Entailer explained approximately 50% of the variance for entailment, 38% of the variance for elaboration, and 23% of the variance for paraphrase. LSA contributed marginally to the entailment model. Neither lemma overlap nor MED contributed to any of the models, although a modified version of MED did correlate significantly with both the entailment and paraphrase hand-coded evaluations. This study is an important step towards developing a set of indices designed to better assess natural language input by students in Intelligent Tutoring Systems. Copyright © 2007, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
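To make the two simplest baselines concrete, the following is a minimal sketch of a word-overlap score and a word-level edit distance for a source sentence and a self-explanation. It is an illustration only, not the study's implementation: it assumes MED refers to a minimal edit distance computed over words, it substitutes lowercased tokens for true lemmas, and the function names `tokens`, `token_overlap`, and `word_edit_distance` are hypothetical.

```python
import re


def tokens(text):
    """Lowercase and split into word tokens (a crude stand-in for lemmatization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def token_overlap(source, target):
    """Fraction of the target's distinct tokens that also occur in the source."""
    src, tgt = set(tokens(source)), set(tokens(target))
    return len(src & tgt) / len(tgt) if tgt else 0.0


def word_edit_distance(source, target):
    """Levenshtein distance over word tokens (insert, delete, substitute cost 1)."""
    a, b = tokens(source), tokens(target)
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Scores of this kind depend only on surface form, so a self-explanation that restates the source in different words can score low on both even when it is a faithful paraphrase.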

Publication Title

Proceedings of the Twentieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2007
