Loading…
KAN-AV dataset for audio-visual face and speech analysis in the wild
Human-computer interaction is becoming increasingly prevalent in daily life with the adoption of intelligent devices. These devices must be capable of interacting in diverse settings, such as environments with noise, music and differing illumination and occlusion conditions. They must also interact...
Saved in:
Published in: | Image and vision computing 2023-12, Vol.140, p.104839, Article 104839 |
---|---|
Main Authors: | , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Human-computer interaction is becoming increasingly prevalent in daily life with the adoption of intelligent devices. These devices must be capable of interacting in diverse settings, such as environments with noise, music and differing illumination and occlusion conditions. They must also interact with a variety of end users across ages and backgrounds. Therefore, the machine learning community needs in-the-wild multi-modal datasets to develop models for face and speech analysis so that they can be applicable in most real world scenarios. However, most existing audio and audio-visual databases are captured in controlled conditions with few or no age and kinship labels. In this paper, we introduce the KAN-AV dataset which contains 98 h of audio-visual data from 970 identities across ages. Two thirds of the identities have kin relations in the dataset. The dataset is manually annotated with labels for kinship, age, and gender and is intended to drive future research in face and speech analysis.
•We introduce a large-scale, in-the-wild, audio-visual face and speech dataset•It contains 970 identities across ages, with age labels, and kinship labels for ⅔•We introduce the task of kinship verification with speech•We also introduce cross-modal kinship matching (from face to voice and vice versa)•Dataset is suitable for classification tasks involving age, identity and kinship |
---|---|
ISSN: | 0262-8856 1872-8138 |
DOI: | 10.1016/j.imavis.2023.104839 |