WikiMorph: Learning to Decompose Words into Morphological Structures
Abstract
This paper presents WikiMorph, a tool that automatically breaks words down into morphemes and etymological compounds (morphemes from root languages) and generates a contextual definition for each component. It comes in two forms: a dataset and a deep-learning-based model. The dataset was extracted from Wiktionary and contains over 450k entries. We then used this dataset to train a GPT-2 model that generalizes the process, decomposing any word into morphemes and their definitions. We find that the model accurately generates complex breakdowns when given a high-quality initial definition.
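Training a causal language model such as GPT-2 on this task requires turning each dataset entry (a word, its definition, and its morpheme breakdown) into a single text sequence, then prompting with the word and definition at inference time. The sketch below illustrates one way to do that serialization and to parse a generated continuation back into structured morphemes. The field layout, the `" | "` separator, and all names (`Entry`, `Morpheme`, `to_training_text`, `to_prompt`, `parse_morphemes`) are hypothetical and are not taken from the WikiMorph paper; the GPT-2 fine-tuning step itself is not shown.

```python
from dataclasses import dataclass

# Hypothetical field separator; not the format used by WikiMorph.
SEP = " | "


@dataclass
class Morpheme:
    form: str   # surface form of the component, e.g. "un"
    gloss: str  # contextual definition of the component


@dataclass
class Entry:
    word: str
    definition: str  # the "initial definition" the model is conditioned on
    morphemes: list[Morpheme]


def to_prompt(word: str, definition: str) -> str:
    """Conditioning prefix: the model is asked to continue it with the breakdown."""
    return f"word: {word}{SEP}definition: {definition}{SEP}morphemes:"


def to_training_text(entry: Entry) -> str:
    """Serialize a full entry (prefix plus target breakdown) for causal LM fine-tuning."""
    parts = "; ".join(f"{m.form} = {m.gloss}" for m in entry.morphemes)
    return f"{to_prompt(entry.word, entry.definition)} {parts}"


def parse_morphemes(generated: str) -> list[Morpheme]:
    """Parse a continuation like ' un = not; happy = feeling joy' into Morpheme objects."""
    result = []
    for chunk in generated.split(";"):
        if "=" not in chunk:
            continue  # skip malformed or truncated fragments
        form, gloss = chunk.split("=", 1)
        result.append(Morpheme(form.strip(), gloss.strip()))
    return result


if __name__ == "__main__":
    entry = Entry(
        "unhappiness",
        "the state of being unhappy",
        [Morpheme("un", "not"), Morpheme("happy", "feeling joy"),
         Morpheme("ness", "state or quality")],
    )
    text = to_training_text(entry)
    prompt = to_prompt(entry.word, entry.definition)
    print(text)
    print(parse_morphemes(text[len(prompt):]))
```

Because the training text always begins with the same prefix that `to_prompt` produces, the model sees identical conditioning during fine-tuning and generation. This simple delimiter scheme would break on glosses containing `;` or `=`, so a real pipeline would need escaping or dedicated special tokens.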
Publication Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Recommended Citation
Yarbro, J., & Olney, A. (2021). WikiMorph: Learning to Decompose Words into Morphological Structures. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12749 LNAI, 406-411. https://doi.org/10.1007/978-3-030-78270-2_72