
Leveraging inter-rater agreement for audio-visual emotion recognition

Bibliographic Details
Main Authors: Yelin Kim, Emily Mower Provost
Format: Conference Proceeding
Language: English
Description
Summary: Human expressions are often ambiguous and unclear, resulting in disagreement or confusion among different human evaluators. In this paper, we investigate how audiovisual emotion recognition systems can leverage prototypicality, the level of agreement or confusion among human evaluators. We propose the use of a weighted Support Vector Machine to explicitly model the relationship between the prototypicality of training instances and the evaluated emotion from the IEMOCAP corpus. We choose weights for prototypical and non-prototypical instances based on the maximal accuracy of each speaker. We then provide per-speaker analysis to understand specific speech characteristics associated with the information gain of emotion given prototypicality information. Our experimental results show that neutrality, one of the most challenging emotions to recognize, has the highest performance gain from prototypicality information, compared to the other emotion classes: Angry, Happy, and Sad. We also show that the proposed method improves the overall multi-class classification accuracy significantly over traditional methods that do not leverage prototypicality.
ISSN: 2156-8111
DOI: 10.1109/ACII.2015.7344624
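The summary above describes training a Support Vector Machine whose training instances are weighted by prototypicality, i.e. the level of agreement among human evaluators. The following is a minimal sketch of that idea using scikit-learn's per-sample weights; it is not the authors' implementation, and the features, labels, agreement scores, and the fixed weights w_p / w_np are illustrative assumptions (the paper tunes such weights per speaker).

    # Hypothetical sketch of a prototypicality-weighted SVM (assumed data).
    import numpy as np
    from sklearn.svm import SVC

    # X: audio-visual feature vectors; y: emotion labels
    # (0=Angry, 1=Happy, 2=Neutral, 3=Sad); agreement: fraction of
    # evaluators who agreed on each instance's label (placeholder values).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((8, 10))
    y = np.array([0, 1, 2, 3, 0, 1, 2, 3])
    agreement = np.array([1.0, 0.6, 0.4, 1.0, 0.8, 0.5, 1.0, 0.7])

    # Map agreement to per-instance weights: prototypical (full-agreement)
    # instances get weight w_p, non-prototypical ones w_np. Fixed values
    # here are assumptions; the paper selects them per speaker.
    w_p, w_np = 1.0, 0.5
    sample_weight = np.where(agreement >= 1.0, w_p, w_np)

    # Weighted SVM training via scikit-learn's sample_weight argument.
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X, y, sample_weight=sample_weight)
    print(clf.predict(X[:2]))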