Faculty Publications

New generation model of word vector representation based on CBOW or skip-gram

Zeyu Xiong, National University of Defense Technology
Qiangqiang Shen, National University of Defense Technology
Yueshan Xiong, National University of Defense Technology
Yijie Wang, National University of Defense Technology
Weizi Li, The University of North Carolina at Chapel Hill

Abstract

Word vector representation is widely used in natural language processing tasks. Most word vectors are generated from probability models, whose bag-of-words features have two major weaknesses: they lose the ordering of the words and they ignore the semantics of the words. Recently, the neural-network language models CBOW and Skip-Gram have been developed as continuous-space language models that represent words as high-dimensional real-valued vectors. These vector representations have demonstrated promising results in various NLP tasks because of their superiority in capturing syntactic and contextual regularities in language. In this paper, we propose a new strategy based on optimization over contiguous subsets of documents and a regression method for combining vectors, and we establish two new models, CBOW-OR and SkipGram-OR, for word vector learning. Experimental results show that, for some word pairs, the cosine distance obtained by the CBOW-OR (or SkipGram-OR) model is generally larger and more reasonable than that obtained by CBOW (or Skip-Gram); that the vector spaces of Skip-Gram and SkipGram-OR keep the same structural property in Euclidean distance; and that the SkipGram-OR model maintains higher performance in retrieving related word pairs as a whole. Both CBOW-OR and SkipGram-OR are inherently parallel models and can be expected to apply to large-scale information processing.
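
The abstract compares models by the cosine and Euclidean measures between word vectors. As a minimal sketch of those two measures (not the paper's code), the Python below computes them for small vectors. The function names and the three toy vectors are hypothetical illustration values, not data from the paper, and it is an assumption that the abstract's "cosine distance" refers to the cosine similarity computed here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical 3-dimensional "word vectors", chosen only for illustration.
vec_a = [1.0, 2.0, 2.0]
vec_b = [2.0, 4.0, 4.0]   # same direction as vec_a, twice the length
vec_c = [2.0, -1.0, 0.0]  # orthogonal to vec_a

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
dist_ab = euclidean_distance(vec_a, vec_b)
```

Here vec_a and vec_b point in the same direction, so their cosine similarity is at its maximum even though their Euclidean distance is nonzero. This is one reason the two measures can tell different stories about the same vector space.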

Publication Title

Computers, Materials and Continua

Recommended Citation

Xiong, Z., Shen, Q., Xiong, Y., Wang, Y., & Li, W. (2019). New generation model of word vector representation based on CBOW or skip-gram. Computers, Materials and Continua, 60(1), 259-273. https://doi.org/10.32604/cmc.2019.05155
