Electronic Theses and Dissertations
Date
2022
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Electrical & Computer Engineering
Committee Chair
Bonny Banerjee
Committee Member
Madhusudhanan Balasubramanian
Committee Member
Deepak Venugopal
Committee Member
Aaron L Robinson
Abstract
With the proliferation of soft and hard sensors, data in multiple sensor modalities has become commonplace. In this dissertation, we propose a general-purpose agent model that operates in a closed perception-action loop. The agent actively and sequentially samples its environment, driven by sensory prediction error. It learns where and what to sample by minimizing this prediction error, without any reinforcement. This end-to-end model is evaluated on three applications: (1) generation and recognition of handwritten numerals and letters from images and videos, (2) generation and recognition of human-human interactions from videos, and (3) recognition of emotions from speech via generation. For each application, the model yields state-of-the-art accuracy on benchmark datasets while remaining efficient in both sample count and model size. To validate our model against human performance, we collect mouse-click attention tracking (mcAT) data from 382 participants trying to recognize handwritten numerals and letters (uppercase and lowercase) from images via sequential sampling. Images from benchmark datasets are presented as stimuli. The collected data consist of the sequence of sample (click) locations, the predicted class label(s) at each sampling step, and the duration of each sampling step. We show that, on average, participants observe only 12.8% of an image for recognition. When exposed to the same stimuli and experimental conditions as the participants, our agent model recognizes handwritten numerals and letters more efficiently than both the participants and a highly cited attention-based reinforcement model.
Library Comment
Dissertation or thesis originally submitted to ProQuest.
Notes
Embargoed until 4/12/2024
Recommended Citation
Baruah, Murchana, "THE PERCEPTION-ACTION LOOP IN ATTENTION-BASED PREDICTIVE AGENTS: APPLICATION TO MULTIMODAL DATA GENERATION AND RECOGNITION" (2022). Electronic Theses and Dissertations. 3394.
https://digitalcommons.memphis.edu/etd/3394
Comments
Data is provided by the student.