Comparative study of singing voice detection based on deep neural networks and ensemble learning

Bibliographic Details
Published in: Human-centric computing and information sciences, 2018-11, Vol.8 (1), p.1-18, Article 34
Main Authors: You, Shingchern D.; Liu, Chien-Hung; Chen, Woei-Kae
Format: Article
Language: English
Description
Summary: This paper investigates various neural network model structures and various types of stacked ensembles for singing voice detection. The studied models include convolutional neural networks (CNN), long short-term memory (LSTM) models, convolutional LSTM models, and capsule networks. The input features to the network models are mel-frequency cepstral coefficients (MFCC), spectrograms from the short-time Fourier transform, or raw PCM samples. The simulation results show that the CNN model with spectrogram inputs yields higher detection accuracy, up to 91.8% on the Jamendo dataset. Among the studied stacked ensemble methods, the voting strategy yields performance comparable to the other methods, but at much lower computational cost. By voting with five models, the accuracy reaches 94.2% on the Jamendo dataset.
ISSN: 2192-1962
DOI: 10.1186/s13673-018-0158-1
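
The voting strategy mentioned in the summary can be illustrated with a minimal sketch. The summary does not say which voting rule the authors use, so this example assumes hard majority voting over frame-level binary labels (1 = singing voice, 0 = no singing voice). The function name `majority_vote` and the example labels are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-frame binary labels from several models by majority vote.

    predictions[m][t] is model m's label for frame t
    (1 = singing voice, 0 = no singing voice).
    Assumption: hard voting; the paper may use a different rule.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_frames = len(predictions[0])
    if any(len(p) != n_frames for p in predictions):
        raise ValueError("all models must label the same number of frames")

    n_models = len(predictions)
    voted = []
    for t in range(n_frames):
        votes_for_singing = sum(p[t] for p in predictions)
        # Strict majority; with an odd number of models there are no ties.
        voted.append(1 if 2 * votes_for_singing > n_models else 0)
    return voted


# Made-up outputs of five hypothetical models on six audio frames.
model_outputs = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
]
ensemble_labels = majority_vote(model_outputs)
print(ensemble_labels)
```

Voting needs only one pass over the already-computed model outputs and no additional training, which is consistent with the summary's remark that it costs much less computation than the other stacked ensemble methods.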