Decoding Categorical Speech Perception from Evoked Brain Responses


Categorical perception (CP) is central to understanding how humans perceive speech sounds quickly, accurately, and invariantly despite widespread variability in their physical acoustic properties. In this work, we built a framework to characterize the temporal (when in time) properties of speech-evoked responses that differentiate prototypical vowels (true phonetic categories) from ambiguous ones (lacking a clear phonetic identity). We recorded event-related potentials (ERPs) in young, healthy adults as they rapidly classified speech sounds along an acoustic-phonetic continuum. Source-derived response features were submitted to support vector machine (SVM) classifiers. Whole-brain data provided the best decoding of prototypical from ambiguous speech, with an accuracy of 95.16% [area under the curve (AUC) 95.14%; F1-score 95.00%] at 120 ms. Separate analyses using left hemisphere (LH) and right hemisphere (RH) data showed that LH activity was a better predictor of speech categorization than RH activity (89.03% vs. 86.45% accuracy). Notably, CP was decoded less robustly and later in time using hemisphere-specific data than using whole-brain data. Our results are consistent with the notion that early ERPs (e.g., P2) are sensitive to CP and with the general dominance of the LH in auditory-linguistic processing.
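The decoding pipeline described above (source-derived ERP features submitted to an SVM, evaluated by accuracy, AUC, and F1-score) can be sketched as follows. This is a minimal illustration using scikit-learn and synthetic feature data; the paper's actual ERP features, preprocessing, kernel choice, and cross-validation scheme are not specified here, so all of those details are assumptions.

```python
# Hypothetical sketch of the abstract's decoding pipeline:
# source-derived ERP features -> SVM classifier -> accuracy / AUC / F1.
# The feature matrix below is synthetic stand-in data, NOT the study's ERPs.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

rng = np.random.default_rng(0)

# Synthetic "source-derived" features: one row per trial (assumed shape)
n_trials, n_features = 200, 64
X = rng.normal(size=(n_trials, n_features))
y = rng.integers(0, 2, size=n_trials)   # 0 = ambiguous, 1 = prototypical
X[y == 1] += 0.5                        # inject a weak class difference

# SVM with probability estimates so AUC can be computed (kernel assumed)
clf = SVC(kernel="rbf", probability=True, random_state=0)

# Cross-validated predictions, as is typical for neural decoding
y_pred = cross_val_predict(clf, X, y, cv=5)
y_prob = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

print(f"accuracy: {accuracy_score(y, y_pred):.2%}")
print(f"AUC:      {roc_auc_score(y, y_prob):.2%}")
print(f"F1-score: {f1_score(y, y_pred):.2%}")
```

In practice, such a classifier would be trained separately at each post-stimulus time point to localize when in time (e.g., ~120 ms) category information becomes decodable.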

Publication Title

2020 IEEE Region 10 Symposium, TENSYMP 2020