
Comparative study of singing voice detection based on deep neural networks and ensemble learning

Bibliographic Details
Published in:	Human-centric Computing and Information Sciences 2018-11, Vol. 8 (1), p. 1-18, Article 34
Main Authors:	You, Shingchern D., Liu, Chien-Hung, Chen, Woei-Kae
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Artificial neural networks; Communications Engineering; Comparative studies; Computer Science; Computer simulation; Computer Systems Organization and Communication Networks; Ensemble learning; Fourier transforms; Information Systems and Communication Service; Information Systems Applications (incl. Internet); Model accuracy; Networks; Neural networks; Short term; Singing; User Interfaces and Human Computer Interaction; Voice recognition; Voting

Description
Summary:	This paper investigates various neural network architectures and several types of stacked ensembles for singing voice detection. The studied models include convolutional neural networks (CNN), a long short-term memory (LSTM) model, a convolutional LSTM model, and a capsule network. The input features to the models are MFCCs (mel-frequency cepstral coefficients), spectrograms computed with the short-time Fourier transform, or raw PCM samples. The simulation results show that the CNN model with spectrogram inputs yields higher detection accuracy than the other studied configurations, up to 91.8% on the Jamendo dataset. Among the studied stacked ensemble methods, the voting strategy performs comparably to the other methods at a much lower computational cost. Voting over five models raises the accuracy to 94.2% on the Jamendo dataset.
ISSN:	2192-1962
DOI:	10.1186/s13673-018-0158-1
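The summary credits a five-model vote with the best Jamendo result. The sketch below illustrates one common form of that idea: a hard majority vote over per-frame binary decisions. It rests on several assumptions that this record does not confirm. It assumes each model outputs 0 or 1 per frame, with 1 meaning "singing voice present". It also assumes the vote is hard rather than soft, meaning probabilities are not averaged. The function names `majority_vote` and `frame_accuracy` are hypothetical and are not taken from the paper.

```python
import numpy as np


def majority_vote(predictions) -> np.ndarray:
    """Hard majority vote over binary frame-level decisions (illustrative sketch).

    predictions: array-like of shape (n_models, n_frames) with values in {0, 1};
    1 is assumed to mean "singing voice present" in that frame.
    Returns an int array of shape (n_frames,) holding the voted label per frame.
    """
    predictions = np.asarray(predictions)
    if predictions.ndim != 2:
        raise ValueError("expected shape (n_models, n_frames)")
    n_models = predictions.shape[0]
    # An odd number of voters (e.g. five) guarantees no ties on binary labels.
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    votes_for_singing = predictions.sum(axis=0)
    # A frame is labeled "singing" when more than half of the models say so.
    return (votes_for_singing > n_models // 2).astype(int)


def frame_accuracy(predicted, reference) -> float:
    """Fraction of frames whose predicted label matches the reference label."""
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    return float(np.mean(predicted == reference))


if __name__ == "__main__":
    # Hypothetical outputs of five models on six frames (made-up data).
    preds = np.array([
        [1, 1, 0, 0, 1, 0],
        [1, 0, 0, 1, 1, 0],
        [0, 1, 0, 0, 1, 1],
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1, 0],
    ])
    reference = [1, 1, 0, 1, 1, 0]  # hypothetical ground-truth labels
    voted = majority_vote(preds)
    print(voted.tolist(), frame_accuracy(voted, reference))
```

Hard voting needs only each model's final decision, so it adds almost no computation beyond running the models themselves. A learned stacking combiner, by contrast, needs a second model trained on the base models' outputs. This difference is consistent with the summary's observation that voting is much cheaper than the other stacked ensembles.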