Online affect tracking with multimodal Kalman filters
Abstract
Arousal and valence have been widely used to represent emotions dimensionally and measure them continuously in time. In this paper, we introduce a computational framework for tracking these affective dimensions from multimodal data as an entry to the Multimodal Affect Recognition Sub-Challenge of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC 2016). We propose a linear dynamical system approach with a late fusion method that accounts for the dynamics of the affective state evolution (i.e., arousal or valence). To this end, single-modality predictions are modeled as observations in a Kalman filter formulation in order to continuously track each affective dimension. Leveraging the inter-correlations between arousal and valence, we use the predicted arousal as an additional feature to improve valence predictions. Furthermore, we propose a conditional framework to select Kalman filters of different modalities while tracking. This framework employs voicing probability and facial posture cues to detect the absence or presence of each input modality. Our multimodal fusion results on the development and the test set provide a statistically significant improvement over the baseline system from AVEC 2016. The proposed approach can be potentially extended to other multimodal tasks with inter-correlated behavioral dimensions.
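The following is a minimal sketch of the late-fusion idea the abstract describes: per-modality affect predictions are treated as noisy observations of a latent affective state (e.g., arousal), and modalities flagged as absent are skipped at each frame. The random-walk state model, function names, and all noise values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kalman_track(predictions, obs_var, proc_var=1e-3,
                 init_state=0.0, init_var=1.0):
    """Track a scalar affective dimension (hypothetical helper).

    predictions: (T, M) per-modality predictions; NaN marks a modality
                 judged absent at that frame (e.g., low voicing
                 probability, or no detected face).
    obs_var:     (M,) observation noise variance per modality.
    """
    T, M = predictions.shape
    x, P = init_state, init_var            # state mean and variance
    out = np.empty(T)
    for t in range(T):
        # Predict step under an assumed random-walk model x_t = x_{t-1} + w_t.
        P = P + proc_var
        # Conditional selection: keep only modalities present at this frame.
        mask = ~np.isnan(predictions[t])
        for z, r in zip(predictions[t][mask], obs_var[mask]):
            # Sequential scalar updates are equivalent to one joint update
            # with a diagonal observation covariance.
            K = P / (P + r)                # Kalman gain
            x = x + K * (z - x)            # measurement update
            P = (1.0 - K) * P
        out[t] = x
    return out

# Example: fuse noisy audio and video arousal predictions over 100 frames,
# with the video modality dropping out mid-sequence.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, 3 * np.pi, 100))
preds = np.stack([truth + 0.3 * rng.standard_normal(100),
                  truth + 0.5 * rng.standard_normal(100)], axis=1)
preds[40:60, 1] = np.nan
arousal = kalman_track(preds, obs_var=np.array([0.09, 0.25]))
```

In the same spirit, the paper's arousal-to-valence coupling could be approximated by appending the tracked arousal output as an extra input feature to the valence predictors before fusing them; the sketch above only covers the single-dimension tracking step.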
Publication Title
AVEC 2016 - Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, co-located with ACM Multimedia 2016
Recommended Citation
Somandepalli, K., Gupta, R., Nasir, M., Booth, B., Lee, S., & Narayanan, S. (2016). Online affect tracking with multimodal Kalman filters. AVEC 2016 - Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, co-located with ACM Multimedia 2016, 59-66. https://doi.org/10.1145/2988257.2988259