Faculty Publications

Automated content analysis across six languages

Leah Cathryn Windsor, University of Memphis
James Grayson Cupit, University of Memphis
Alistair James Windsor, University of Memphis

Abstract

Corpus selection bias in international relations research presents an epistemological problem: How do we know what we know? Most social science research in the field of text analytics relies on English-language corpora, which skews our ability to understand international phenomena. To address corpus selection bias, we present results suggesting that machine translation can be used to analyze non-English sources. We applied human translation and machine translation (Google Translate) to a collection of aligned sentences from United Nations documents extracted from the Multi-UN corpus, and analyzed the results with Linguistic Inquiry and Word Count (LIWC), a "bag of words" analysis tool. Overall, the LIWC indices proved relatively stable across machine- and human-translated sentences. Although there are statistically significant differences between the original and translated documents, the effect sizes are relatively small, especially for the psychological process categories.
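To make the "bag of words" comparison concrete, the sketch below scores aligned sentences by the percentage of tokens that fall into each word category, then computes a paired effect size for human versus machine translations. It is a minimal illustration under stated assumptions, not the authors' pipeline: the category names and word lists are hypothetical stand-ins (the real LIWC dictionaries are licensed and far larger), the sentences are invented toy examples, and the paired Cohen's d_z statistic is one common choice that may differ from the statistics reported in the article.

```python
import re
from statistics import mean, stdev

# HYPOTHETICAL mini-dictionary standing in for LIWC-style categories.
# These word lists are illustrative only; they are not LIWC's dictionary.
CATEGORIES = {
    "posemo": {"peace", "support", "welcome", "agree"},
    "negemo": {"conflict", "threat", "violence", "concern"},
}

def category_rates(text):
    """Bag-of-words scoring: percentage of tokens in each category.

    Word order is ignored; only token membership in each list counts.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {cat: 100 * sum(t in words for t in tokens) / total
            for cat, words in CATEGORIES.items()}

def paired_effect_size(a, b):
    """Cohen's d_z for paired scores: mean difference / SD of differences.

    Requires at least two pairs (statistics.stdev raises otherwise).
    Returns 0.0 when every pair differs by the same amount (SD of 0).
    """
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    return mean(diffs) / sd if sd else 0.0

if __name__ == "__main__":
    # Invented aligned sentences: human vs. machine translation of the same source.
    human = ["The Council welcomes the peace agreement.",
             "Violence is a grave concern.",
             "Member States agree to support the talks."]
    machine = ["The Council welcomes the agreement of peace.",
               "The violence is a serious concern.",
               "Member States agree to support the negotiations."]
    for cat in CATEGORIES:
        h = [category_rates(s)[cat] for s in human]
        m = [category_rates(s)[cat] for s in machine]
        print(cat, round(paired_effect_size(h, m), 2))
```

Because the scores are word-frequency percentages, small wording shifts between translations (such as an extra article) change the denominator and thus the rate, even when the category words themselves match. That is one reason a statistically significant difference can coexist with a small effect size.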

Publication Title

PLoS ONE

Recommended Citation

Windsor, L., Cupit, J., & Windsor, A. (2019). Automated content analysis across six languages. PLoS ONE, 14(11). https://doi.org/10.1371/journal.pone.0224425
