Segmentation of Speech and Humming in Vocal Input
Non-verbal vocal interaction (NVVI) is an interaction method that uses sounds other than speech produced by a human, such as humming. NVVI complements traditional speech recognition systems with continuous control. Combining the two approaches (e.g. "volume up, mmm") requires a speech/NVVI segmentation of the input sound signal. This paper presents two novel methods for segmenting speech and humming. The first classifies MFCC and RMS parameters with a neural network (the MFCC method), while the second computes volume changes in the signal (the IAC method). The two methods are compared on a corpus collected from 13 speakers. The results indicate that the MFCC method outperforms the IAC method in terms of accuracy, precision, and recall.
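The MFCC method described in the abstract (MFCC and RMS features classified by a neural network) can be illustrated with a short sketch. The feature settings, window sizes, file names, and classifier layout below are assumptions chosen for illustration and are not taken from the paper; the sketch uses librosa for feature extraction and scikit-learn's multi-layer perceptron, which matches the record's "Multi-layer perceptron" subject keyword.

```python
# Illustrative sketch of frame-wise speech vs. humming classification from
# MFCC + RMS features (hypothetical parameters; not the authors' implementation).
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def frame_features(path, n_mfcc=13, frame_len=1024, hop=512):
    """Return per-frame feature vectors: n_mfcc MFCCs plus RMS energy."""
    y, sr = librosa.load(path, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame_len, hop_length=hop)
    rms = librosa.feature.rms(y=y, frame_length=frame_len, hop_length=hop)
    return np.vstack([mfcc, rms]).T          # shape: (frames, n_mfcc + 1)

# Hypothetical training data: (wav_path, label) pairs, 0 = speech, 1 = humming,
# with every frame of a file labeled by that file's class.
train_files = [("speech_01.wav", 0), ("humming_01.wav", 1)]

feats = [frame_features(path) for path, _ in train_files]
X = np.vstack(feats)
y = np.concatenate([np.full(f.shape[0], label)
                    for f, (_, label) in zip(feats, train_files)])

# A small multi-layer perceptron frame classifier (layer size is a guess).
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, y)

# Segment an unseen recording; per-frame decisions can be smoothed
# (e.g. with a median filter) to obtain contiguous speech/humming regions.
pred = clf.predict(frame_features("volume_up_mmm.wav"))
```

The IAC method, which the abstract characterizes only as computing volume changes in the signal, could be sketched analogously by thresholding changes in the per-frame RMS track, but its actual formulation is given in the paper and is not reproduced here.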
Published in: | Radioengineering, 2012-09, Vol. 21 (3), p. 923-929
---|---
Main Authors: | A. J. Sporka, O. Polacek, J. Havlik
Format: | Article
Language: | English
Subjects: | MFCC; Multi-layer perceptron; Neural network; Non-verbal vocal interaction; Segmentation; Speech
ISSN: | 1210-2512
Publisher: | Spolecnost pro radioelektronicke inzenyrstvi