
Modulation-scale analysis for content identification

For nonstationary signal classification, e.g., speech or music, features are traditionally extracted from a time-shifted, yet short data window. For many applications, these short-term features do not efficiently capture or represent longer term signal variation. Partially motivated by human auditio...

Bibliographic Details
Published in: IEEE transactions on signal processing, 2004-10, Vol. 52 (10), p. 3023-3035
Main Authors: Sukittanon, S., Atlas, L.E., Pitton, J.W.
Format: Article
Language: English
container_end_page 3035
container_issue 10
container_start_page 3023
container_title IEEE transactions on signal processing
container_volume 52
creator Sukittanon, S.
Atlas, L.E.
Pitton, J.W.
description For nonstationary signal classification, e.g., speech or music, features are traditionally extracted from a time-shifted, yet short data window. For many applications, these short-term features do not efficiently capture or represent longer term signal variation. Partially motivated by human audition, we overcome the deficiencies of short-term features by employing modulation-scale analysis for long-term feature analysis. Our analysis, which uses time-frequency theory integrated with psychoacoustic results on modulation frequency perception, not only contains short-term information about the signals, but also provides long-term information representing patterns of time variation. This paper describes these features and their normalization. We demonstrate the effectiveness of our long-term features over conventional short-term features in content-based audio identification. A simulated study using a large data set, including nearly 10 000 songs and requiring over a billion audio pairwise comparisons, shows that modulation-scale features improve content identification accuracy substantially, especially when time and frequency distortions are imposed.
doi_str_mv 10.1109/TSP.2004.833861
format article
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2004
date 2004-10-01
coden ITPRED
tpages 13
toplevel peer_reviewed
online_resources
collection IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE/IET Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
fulltext fulltext
identifier ISSN: 1053-587X
ispartof IEEE transactions on signal processing, 2004-10, Vol.52 (10), p.3023-3035
issn 1053-587X
1941-0476
language eng
recordid cdi_proquest_journals_883604317
source IEEE Xplore (Online service)
subjects Data mining
Distortion
Feature extraction
Frequency
Human
Humans
Information analysis
Modulation
Multiple signal classification
Pattern analysis
Pattern classification
Perception
Psychoacoustics
Signal analysis
Signal classification
Simulation
Speech
title Modulation-scale analysis for content identification
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T04%3A55%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modulation-scale%20analysis%20for%20content%20identification&rft.jtitle=IEEE%20transactions%20on%20signal%20processing&rft.au=Sukittanon,%20S.&rft.date=2004-10-01&rft.volume=52&rft.issue=10&rft.spage=3023&rft.epage=3035&rft.pages=3023-3035&rft.issn=1053-587X&rft.eissn=1941-0476&rft.coden=ITPRED&rft_id=info:doi/10.1109/TSP.2004.833861&rft_dat=%3Cproquest_ieee_%3E2426394211%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c349t-ba4431c90ed006588c67d1ae2e6757890a030310fe8c0317be5c92f8addab6423%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=883604317&rft_id=info:pmid/&rft_ieee_id=1337279&rfr_iscdi=true