Towards a structured representation of generic concepts and relations in large text corpora
Abstract
Extracting structured information from text corpora involves identifying entities and the relationships between them as expressed in unstructured text. We propose a novel iterative pattern induction method that extracts relation tuples by exploiting the lexical and shallow syntactic patterns of a sentence. Starting from a single pattern, we illustrate how the method discovers additional patterns and tuples on its own as the amount of data increases. We apply frequency- and correlation-based filtering and ranking of relation tuples to ensure the correctness of the system. An experimental evaluation against other state-of-the-art open extraction systems such as ReVerb, TextRunner, and WOE shows the effectiveness of the proposed system.
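The iterative idea described in the abstract can be illustrated with a minimal bootstrapping sketch. This is not the authors' implementation: the function names (`extract`, `induce`, `bootstrap`), the purely lexical "X <middle words> Y" pattern form, the `min_count` threshold, and the toy sentences are all assumptions made for illustration. Only a simple frequency filter on candidate patterns is shown; the correlation-based filtering and the shallow syntactic features mentioned in the abstract are omitted.

```python
import re
from collections import Counter


def extract(pattern, sentences):
    """Return (arg1, arg2) pairs from sentences of the form 'arg1 <pattern> arg2'."""
    regex = re.compile(r"(\w+) " + re.escape(pattern) + r" (\w+)")
    pairs = []
    for s in sentences:
        pairs.extend(regex.findall(s))
    return pairs


def induce(pairs, sentences):
    """Count the text found between known argument pairs as candidate patterns."""
    found = Counter()
    for a, b in pairs:
        regex = re.compile(r"\b" + re.escape(a) + r" (.+?) " + re.escape(b) + r"\b")
        for s in sentences:
            m = regex.search(s)
            if m:
                found[m.group(1)] += 1
    return found


def bootstrap(seed, sentences, min_count=2, rounds=3):
    """Grow a pattern set from one seed pattern (hypothetical, simplified loop)."""
    patterns = {seed}
    tuples = Counter()
    for _ in range(rounds):
        # Step 1: apply every known pattern to collect relation tuples.
        tuples = Counter()
        for p in patterns:
            for a, b in extract(p, sentences):
                tuples[(a, p, b)] += 1
        # Step 2: look for new patterns connecting the argument pairs seen so far.
        pairs = {(a, b) for a, _, b in tuples}
        candidates = induce(pairs, sentences)
        # Step 3: frequency filter; keep candidates seen at least min_count times.
        new = {p for p, c in candidates.items() if c >= min_count} - patterns
        if not new:
            break
        patterns |= new
    return patterns, tuples


# Toy corpus (invented for this example).
sentences = [
    "Paris is the capital of France",
    "Paris is located in France",
    "Berlin is the capital of Germany",
    "Berlin is located in Germany",
    "Rome is located in Italy",
]

patterns, tuples = bootstrap("is the capital of", sentences)
for arg1, rel, arg2 in sorted(tuples):
    print(arg1, "|", rel, "|", arg2)
```

In this toy run the seed pattern yields two argument pairs, those pairs expose a second pattern ("is located in"), and the second pattern in turn extracts a tuple for Rome and Italy that the seed alone could not reach. A real system would need a stricter filter than a raw count to keep noisy patterns from spreading across rounds.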
Publication Title
International Conference Recent Advances in Natural Language Processing, RANLP
Recommended Citation
Bhattarai, A., & Rus, V. (2013). Towards a structured representation of generic concepts and relations in large text corpora. International Conference Recent Advances in Natural Language Processing, RANLP, 65-73. Retrieved from https://digitalcommons.memphis.edu/facpubs/3284