Loading…

A multimodal approach for multi-label movie genre classification

Movie genre classification is a challenging task that has increasingly attracted the attention of researchers. The number of movie consumers interested in taking advantage of automatic movie genre classification is overgrowing, thanks to media streaming service providers’ popularization. In this pap...

Full description

Saved in:
Bibliographic Details
Published in:Multimedia tools and applications 2022-06, Vol.81 (14), p.19071-19096
Main Authors: Mangolin, Rafael B., Pereira, Rodolfo M., Britto, Alceu S., Silla, Carlos N., Feltrim, Valéria D., Bertolini, Diego, Costa, Yandre M. G.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Movie genre classification is a challenging task that has increasingly attracted the attention of researchers. The number of movie consumers interested in taking advantage of automatic movie genre classification is overgrowing, thanks to media streaming service providers’ popularization. In this paper, we addressed the multi-label classification of movie genres in a multimodal way. To this end, we created a dataset composed of trailer video clips, subtitles, synopses, and movie posters from 152,622 movie titles of the Movie Database (TMDb). Such a large dataset was carefully curated, organized, and made available as a contribution of this work. We labeled each movie of the dataset according to a set of eighteen genre labels. In the experimental evaluation performed in this paper, we computed different kinds of descriptors, such as Mel Frequency Cepstral Coefficients (MFCCs), Statistical Spectrum Descriptor (SSD), Local Binary Pattern (LBP) from spectrograms, Long-Short Term Memory (LSTM), and Convolutional Neural Networks (CNN). With these descriptors, we trained different monolithic classifiers using BinaryRelevance and ML-kNN techniques. Besides, we also explored the combination of classifiers/features using a late fusion strategy. The fusion of a LSTM trained on synopses and another LSTM trained on the movie subtitles provided our best results in F-Score (0.674) and AUC-PR (0.725) metrics. These results corroborate the existence of complementarity among classifiers trained on different sources of information in this field of application. As far as we know, this is the most comprehensive study developed in terms of diversity of multimedia sources of information to perform movie genre classification.
ISSN:1380-7501
1573-7721
DOI:10.1007/s11042-020-10086-2