Leveraging inter-rater agreement for audio-visual emotion recognition
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Human expressions are often ambiguous and unclear, resulting in disagreement or confusion among different human evaluators. In this paper, we investigate how audiovisual emotion recognition systems can leverage prototypicality, the level of agreement or confusion among human evaluators. We propose the use of a weighted Support Vector Machine to explicitly model the relationship between the prototypicality of training instances and evaluated emotion from the IEMOCAP corpus. We choose weights of prototypical and non-prototypical instances based on the maximal accuracy of each speaker. We then provide a per-speaker analysis to understand specific speech characteristics associated with the information gain of emotion given prototypicality information. Our experimental results show that neutrality, one of the most challenging emotions to recognize, has the highest performance gain from prototypicality information, compared to other emotion classes: Angry, Happy, and Sad. We also show that the proposed method improves the overall multi-class classification accuracy significantly over traditional methods that do not leverage prototypicality. |
ISSN: | 2156-8111 |
DOI: | 10.1109/ACII.2015.7344624 |
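
The summary describes a weighted Support Vector Machine in which prototypical and non-prototypical training instances receive different weights, chosen by maximal accuracy for each speaker. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes a binary linear SVM trained by batch subgradient descent, whereas the paper addresses four emotion classes and this record does not state the kernel, features, or solver used. All function names, parameters, and the candidate weight grid are hypothetical.

```python
# Hypothetical sketch: a binary linear SVM with per-instance weights,
# trained by batch subgradient descent. Not the authors' implementation.

def decision_value(w, b, x):
    """Signed score w.x + b for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_weighted_svm(X, y, sample_weight, C=1.0, lr=0.01, epochs=200):
    """Minimise 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i*(w.x_i + b)).

    X: list of feature vectors, y: labels in {-1, +1},
    sample_weight: one non-negative weight s_i per training instance.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regulariser
        grad_b = 0.0
        for xi, yi, si in zip(X, y, sample_weight):
            if yi * decision_value(w, b, xi) < 1.0:  # margin violated
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * si * yi * xj
                grad_b -= C * si * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predicted label in {-1, +1}."""
    return 1 if decision_value(w, b, x) >= 0.0 else -1


def prototypicality_weights(is_prototypical, w_proto, w_nonproto):
    """Map each instance's prototypicality flag to its training weight."""
    return [w_proto if p else w_nonproto for p in is_prototypical]


def accuracy(w, b, X, y):
    """Fraction of instances whose predicted label matches y."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


def select_weights(X_tr, y_tr, proto_tr, X_val, y_val, candidates):
    """Return (accuracy, w_proto, w_nonproto) for the best candidate pair.

    Assumed to be run once per speaker on that speaker's held-out data.
    """
    best = None
    for w_p, w_n in candidates:
        s = prototypicality_weights(proto_tr, w_p, w_n)
        w, b = train_weighted_svm(X_tr, y_tr, s)
        acc = accuracy(w, b, X_val, y_val)
        if best is None or acc > best[0]:
            best = (acc, w_p, w_n)
    return best
```

Under these assumptions, a per-speaker search would call `select_weights` with that speaker's validation data and a small grid such as `[(1.0, 1.0), (1.0, 0.5), (1.0, 0.1)]`. The pair `(1.0, 1.0)` corresponds to an unweighted baseline that ignores prototypicality.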