Speech/music classification using visual and spectral chromagram features
Published in: | Journal of ambient intelligence and humanized computing 2020, Vol.11 (1), p.329-347 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Automatic speech/music classification is an important tool in multimedia content analysis and retrieval that efficiently categorizes input audio and stores it in relevant classes. This article proposes the use of chromagram textural and spectral features for speech and music classification. The chromagram textural feature set is based on transforming the input audio into a chromagram image representation and then extracting uniform local binary pattern textural descriptors. The chroma spectral features involve novel chroma bin features that exploit the tonality present in the music signal. The optimal feature subset is selected from the original feature set using eigenvector centrality based feature selection, which removes redundant and irrelevant features and further enhances prediction performance. The performance of the algorithm is evaluated using the S&S, GTZAN and MUSAN databases, demonstrating the advantage and suitability of both chroma spectral and visual features for the classification task. Extensive experiments performed using a support vector machine classifier show that the chromagram textural descriptors outperform other state-of-the-art approaches. In addition, good results are achieved under mismatched training and testing conditions. |
---|---|
ISSN: | 1868-5137 1868-5145 |
DOI: | 10.1007/s12652-019-01303-4 |
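
Illustrative note: the summary describes treating a chromagram as an image and extracting uniform local binary pattern (LBP) descriptors from it. The following is a minimal sketch of that idea, not the article's implementation. It assumes an 8-neighbour, radius-1, rotation-invariant uniform LBP with a 10-bin histogram; the record does not state the article's actual LBP configuration. The function name `uniform_lbp_histogram` and the toy 12-row matrix standing in for a chromagram are hypothetical.

```python
def uniform_lbp_histogram(image):
    """Return a normalized 10-bin uniform LBP histogram (8 neighbours, radius 1).

    `image` is a 2-D list of numbers, e.g. a chromagram with one row per
    pitch class and one column per time frame (an assumption of this sketch).
    """
    # Offsets of the 8 neighbours, listed in circular order around the centre.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    # Bins 0..8: uniform patterns, indexed by number of set bits.
    # Bin 9: all non-uniform patterns pooled together.
    hist = [0] * 10
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            bits = [1 if image[r + dr][c + dc] >= center else 0
                    for dr, dc in offsets]
            # A pattern is "uniform" if it has at most two circular 0/1 transitions.
            transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
            hist[sum(bits) if transitions <= 2 else 9] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy stand-in for a chromagram: 12 pitch classes by 6 frames (made-up values).
toy_chromagram = [[(p * 3 + t * 5) % 7 for t in range(6)] for p in range(12)]
features = uniform_lbp_histogram(toy_chromagram)
```

In a full pipeline, a histogram like `features` would be one part of the feature vector passed to feature selection and then to a classifier; those stages are not sketched here.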