Support vector learning for gender classification using audio and visual cues: A comparison


Computer vision systems for monitoring people and collecting valuable demographics in a social environment will play an increasingly important role in enhancing the user’s experience and can significantly improve the intelligibility of a human-computer interaction (HCI) system. For example, a robust gender classification system is expected to provide a basis for passive surveillance and access to a smart building using demographic information, or can provide valuable consumer statistics in a public place. The option of an audio cue in addition to the visual cue promises a robust solution with high accuracy and ease of use in HCI systems. This paper investigates the use of Support Vector Machines (SVMs) for gender classification. Both visual (thumbnail frontal face) and audio (features from speech data) cues were considered for designing the classifier, and the performance obtained using each cue was compared. The performance of the SVM was compared with that of two simple classifiers, namely the nearest prototype neighbor and the k-nearest neighbor, on all feature sets. The SVM outperformed the other two classifiers on all datasets. The best overall classification rates obtained using the SVM for the visual and speech data were 95.31% and 100%, respectively.
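The three-way comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic two-dimensional points standing in for face or speech features, the SVM is a linear soft-margin model trained by plain stochastic subgradient descent (the abstract does not state the kernel or training method), and the function names `make_data`, `nearest_prototype`, `knn`, and `linear_svm` are hypothetical.

```python
import random

def make_data(n, seed=0):
    """Synthetic 2-D two-class data; a stand-in for real face or speech features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice([-1, 1])
        # Class centres at (-2, -2) and (+2, +2) with unit-variance Gaussian noise.
        x = [rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)]
        data.append((x, label))
    return data

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_prototype(train):
    """Classify by the closest class mean (one prototype per class)."""
    protos = {}
    for label in (-1, 1):
        pts = [x for x, y in train if y == label]
        protos[label] = [sum(c) / len(pts) for c in zip(*pts)]
    return lambda x: min(protos, key=lambda lab: dist2(x, protos[lab]))

def knn(train, k=3):
    """Classify by majority vote among the k closest training points (k odd)."""
    def predict(x):
        nearest = sorted(train, key=lambda item: dist2(x, item[0]))[:k]
        return 1 if sum(y for _, y in nearest) > 0 else -1
    return predict

def linear_svm(train, lam=0.01, eta=0.01, epochs=50, seed=0):
    """Linear soft-margin SVM: subgradient descent on L2-regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(train[0][0])
    b = 0.0
    for _ in range(epochs):
        order = train[:]
        rng.shuffle(order)
        for x, y in order:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights (regularisation term of the subgradient).
            w = [wi * (1.0 - eta * lam) for wi in w]
            if margin < 1:
                # The point violates the margin, so step along the hinge subgradient.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def accuracy(predict, test):
    """Fraction of test points whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)

train, test = make_data(200, seed=1), make_data(100, seed=2)
classifiers = [("nearest prototype", nearest_prototype), ("k-NN", knn), ("SVM", linear_svm)]
results = {name: accuracy(fit(train), test) for name, fit in classifiers}
for name, acc in results.items():
    print(f"{name}: {acc:.2%}")
```

On such an easy synthetic problem all three classifiers should score similarly, so the sketch shows only the shape of the evaluation (same training set, same held-out test set, one accuracy per classifier) and says nothing about the rates reported in the paper.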

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)