Fusing annotations with majority vote triplet embeddings

Abstract

Human annotations of behavioral constructs are of great importance to the machine learning community because of the difficulty in quantifying states that cannot be directly observed, such as dimensional emotion. Disagreements between annotators and other personal biases complicate the goal of obtaining an accurate approximation of the true behavioral construct values for use as ground truth. We present a novel majority vote triplet embedding scheme for fusing real-time, continuous annotations of a stimulus to produce a gold-standard time series. We illustrate the validity of our approach by showing that the method produces reasonable gold standards for two separate annotation tasks from a human annotation data set where the true construct labels are known a priori. We also apply our method to the RECOLA dimensional emotion data set in conjunction with state-of-the-art time warping methods to produce gold-standard labels that are representative of the annotations and more easily learned from features, as evaluated using the battery of linear predictors prescribed in the 2018 AVEC gold-standard emotion sub-challenge. In particular, we find that the proposed method leads to gold-standard labels that aid in valence prediction.
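The abstract describes fusing multiple annotators' real-time traces via majority-voted triplet comparisons and then embedding those triplets into a single gold-standard time series. The paper's exact triplet construction and embedding algorithm are not given here, so the following is only a minimal illustrative sketch under stated assumptions: each annotator votes on whether time sample i is closer to sample j than to sample k, triplets with a strict majority are kept, and a one-dimensional embedding is fit by stochastic gradient descent on a hinge triplet loss (a simplified stand-in for embedding methods such as t-STE). The function names `majority_vote_triplets` and `embed_1d` are hypothetical.

```python
import numpy as np

def majority_vote_triplets(annotations, n_triplets=2000, rng=None):
    """Sample time-index triplets (i, j, k) and keep those where a strict
    majority of annotators agree that sample i is closer to j than to k.
    `annotations` has shape (n_annotators, T)."""
    rng = np.random.default_rng(rng)
    n_annotators, T = annotations.shape
    triplets = []
    for _ in range(n_triplets):
        i, j, k = rng.choice(T, size=3, replace=False)
        # Each annotator votes: is sample i closer to j than to k?
        votes = (np.abs(annotations[:, i] - annotations[:, j])
                 < np.abs(annotations[:, i] - annotations[:, k]))
        if votes.sum() > n_annotators / 2:
            triplets.append((i, j, k))
        elif (~votes).sum() > n_annotators / 2:
            triplets.append((i, k, j))  # flip so j is always the closer sample
    return triplets

def embed_1d(triplets, T, margin=0.05, lr=0.02, epochs=150, rng=None):
    """Place each of the T time samples on the real line so that the
    majority-voted triplet orderings are respected (up to affine transform)."""
    rng = np.random.default_rng(rng)
    x = rng.normal(scale=0.1, size=T)
    for _ in range(epochs):
        for i, j, k in triplets:
            # Hinge loss: max(0, margin + |x_i - x_j| - |x_i - x_k|)
            if abs(x[i] - x[j]) + margin > abs(x[i] - x[k]):
                x[j] += lr * np.sign(x[i] - x[j])  # pull j toward i
                x[k] -= lr * np.sign(x[i] - x[k])  # push k away from i
                x[i] -= lr * (np.sign(x[i] - x[j]) - np.sign(x[i] - x[k]))
    return x
```

Because triplet comparisons only constrain relative distances, the resulting embedding recovers the fused signal up to an affine transform (including a possible sign flip); in practice one would rescale it to the annotators' range before use as a gold standard.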

Publication Title

AVEC 2018 - Proceedings of the 2018 Audio/Visual Emotion Challenge and Workshop, co-located with MM 2018
