Computational auditory scene recognition

In this paper, we address the problem of computational auditory scene recognition and describe methods to classify auditory scenes into predefined classes. By auditory scene recognition we mean recognition of an environment using audio information only. The auditory scenes comprised tens of everyday...

Bibliographic Details
Main Authors:	Peltonen, Vesa; Tuomi, Juha; Klapuri, Anssi; Huopaniemi, Jyri; Sorsa, Timo
Format:	Conference Proceeding
Language:	English
Subjects:	Artificial neural networks; Libraries; Mel frequency cepstral coefficient; Rail transportation; Roads; Vehicles

container_end_page	II-1944
container_start_page	II-1941
container_volume	2
creator	Peltonen, Vesa; Tuomi, Juha; Klapuri, Anssi; Huopaniemi, Jyri; Sorsa, Timo
description	In this paper, we address the problem of computational auditory scene recognition and describe methods to classify auditory scenes into predefined classes. By auditory scene recognition we mean recognition of an environment using audio information only. The auditory scenes comprised tens of everyday outside and inside environments, such as streets, restaurants, offices, family homes, and cars. Two completely different but almost equally effective classification systems were used: band-energy ratio features with 1-NN classifier and Mel-frequency cepstral coefficients with Gaussian mixture models. The best obtained recognition rate for 17 different scenes out of 26 and for an analysis duration of 30 seconds was 68.4%. For comparison, the recognition accuracy of humans was 70% for 25 different scenes and the average response time was around 20 seconds. The efficiency of different acoustic features and the effect of test sequence length were studied.
doi_str_mv	10.1109/ICASSP.2002.5745009
format	conference_proceeding
fulltext	fulltext_linktorsrc
identifier	ISSN: 1520-6149
ispartof	2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002, Vol.2, p.II-1941-II-1944
issn	1520-6149; 2379-190X
language	eng
recordid	cdi_ieee_primary_5745009
source	IEEE Xplore All Conference Series
subjects	Artificial neural networks; Libraries; Mel frequency cepstral coefficient; Rail transportation; Roads; Vehicles
title	Computational auditory scene recognition
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T21%3A35%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Computational%20auditory%20scene%20recognition&rft.btitle=2002%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech,%20and%20Signal%20Processing&rft.au=Peltonen,%20Vesa&rft.date=2002-01-01&rft.volume=2&rft.spage=II-1941&rft.epage=II-1944&rft.pages=II-1941-II-1944&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9780780374027&rft.isbn_list=0780374029&rft_id=info:doi/10.1109/ICASSP.2002.5745009&rft_dat=%3Cieee_CHZPO%3E5745009%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i173t-d5391996a18dd3fd9f7a05bee6af58525a66f4eb1b5779f0cc96e3b2485372a43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5745009&rfr_iscdi=true
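
The description field of this record names band-energy ratio features with a 1-NN classifier as one of the two systems. The sketch below illustrates that general idea only; it is not the authors' implementation. The number of bands, the equal-width band layout, the synthetic tone signals, the label names, and the helper names (band_energy_ratios, classify_1nn, tone) are all assumptions made for illustration, since the record does not give those details.

```python
import numpy as np


def band_energy_ratios(signal, n_bands=4):
    """Energy in each of n_bands equal-width frequency bands, divided by total energy.

    The equal-width layout and n_bands=4 are illustrative assumptions.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2       # power spectrum
    bands = np.array_split(spectrum, n_bands)         # contiguous frequency bands
    energies = np.array([band.sum() for band in bands])
    total = energies.sum()
    return energies / total if total > 0 else np.zeros(n_bands)


def classify_1nn(train_features, train_labels, query):
    """Return the label of the training vector closest to query (Euclidean distance)."""
    distances = np.linalg.norm(train_features - query, axis=1)
    return train_labels[int(np.argmin(distances))]


def tone(freq_hz, sample_rate=8000, duration_s=0.5):
    """Synthetic sine tone standing in for a recorded scene (hypothetical data)."""
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t)


# Two toy "scenes": one dominated by low frequencies, one by high frequencies.
train_signals = [tone(200), tone(3500)]
train_labels = ["low_hum", "high_whine"]
train_features = np.stack([band_energy_ratios(s) for s in train_signals])

# A 300 Hz tone falls in the same band as the 200 Hz training tone.
prediction = classify_1nn(train_features, train_labels, band_energy_ratios(tone(300)))
print(prediction)
```

A real system would compute such features frame by frame over recordings of tens of seconds and compare against many labelled examples per scene; the single-vector comparison here only shows the mechanics of the feature and the nearest-neighbour rule.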