Electronic Theses and Dissertations





Document Type

Dissertation (Campus Access Only)

Degree Name

Doctor of Philosophy


Department

Electrical and Computer Engineering


Computer Engineering

Committee Chair

Bonny Banerjee

Committee Member

Eddie Jacobs

Committee Member

Madhusudhanan Balasubramanian

Committee Member

Aaron L. Robinson


This dissertation investigates computational models for sensorimotor integration and word learning in pre-linguistic development. In particular, computational models are investigated for three problems: (1) acoustic-to-articulatory mapping, or speech inversion; (2) speech motor skill acquisition and speech production; and (3) cross-situational noun learning. For the first problem, we show that the simpler general regression neural network performs on par with, if not better than, the state-of-the-art deep belief network in experiments on the MOCHA-TIMIT and MNGU0 databases. For the second problem, we propose a developmental agent with perception (audio), action (vocalization), and learning capabilities in the predictive coding framework. We show that, when exposed to an environment of linguistic sounds (the Harvard-Haskins database of regularly-timed speech) without any external reinforcement signal, the agent learns to generate speech-like sounds (acoustic babbling followed by proto-syllables and vowels) as well as the timing for motor command execution. Random goal exploration leads to the self-organization of developmental stages of vocal sequences in the agent due to the increasing complexity of vocalization. For the third problem, we investigate reinforcement learning models for early word learning, taking into account cross-situational learning and social pragmatic theory. As social cues, joint attention and prosodic cues in the caregiver's speech are considered. We show that, when a reinforcement learning model is exposed to a group of speakers, it comes to understand an initial set of vocabulary items belonging to the language used by the group. In standard experiments with the CHILDES dataset, the attentional-prosodic deep Q-network model outperforms existing word learning models.
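The general regression neural network compared above has a simple closed-form prediction rule (a Gaussian-kernel-weighted average of training targets, i.e., Nadaraya-Watson regression). A minimal sketch of that rule on toy data follows; the function name, bandwidth value, and example mapping are illustrative and not taken from the dissertation:

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """General regression neural network prediction: each output is a
    Gaussian-weighted average of the training targets, with weights
    determined by distance from the query to each training input."""
    # Squared Euclidean distances between each query and each training point
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))  # kernel weights, shape (n_query, n_train)
    return (w @ y_train) / w.sum(axis=1)

# Toy example: recover a smooth 1-D input-output mapping
X = np.linspace(0.0, 1.0, 50)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
yq = grnn_predict(X, y, np.array([[0.25]]), sigma=0.05)
```

The only free parameter is the kernel bandwidth `sigma`, which is why the model is attractive as a baseline: there is no iterative training, only a smoothing choice.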


Comments
Data is provided by the student.

Library Comment

Dissertation or thesis originally submitted to the local University of Memphis Electronic Theses & Dissertation (ETD) Repository.