Modulation-scale analysis for content identification
For nonstationary signal classification, e.g., speech or music, features are traditionally extracted from a time-shifted, yet short data window. For many applications, these short-term features do not efficiently capture or represent longer term signal variation. Partially motivated by human auditio...
Published in: | IEEE transactions on signal processing 2004-10, Vol.52 (10), p.3023-3035 |
---|---|
Main Authors: | Sukittanon, S. ; Atlas, L.E. ; Pitton, J.W. |
Format: | Article |
Language: | English |
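Subjects: | Data mining ; Distortion ; Feature extraction ; Frequency ; Human ; Humans ; Information analysis ; Modulation ; Multiple signal classification ; Pattern analysis ; Pattern classification ; Perception ; Psychoacoustics ; Signal analysis ; Signal classification ; Simulation ; Speech |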
cited_by | cdi_FETCH-LOGICAL-c349t-ba4431c90ed006588c67d1ae2e6757890a030310fe8c0317be5c92f8addab6423 |
---|---|
cites | cdi_FETCH-LOGICAL-c349t-ba4431c90ed006588c67d1ae2e6757890a030310fe8c0317be5c92f8addab6423 |
container_end_page | 3035 |
container_issue | 10 |
container_start_page | 3023 |
container_title | IEEE transactions on signal processing |
container_volume | 52 |
creator | Sukittanon, S. Atlas, L.E. Pitton, J.W. |
description | For nonstationary signal classification, e.g., speech or music, features are traditionally extracted from a time-shifted, yet short data window. For many applications, these short-term features do not efficiently capture or represent longer-term signal variation. Partially motivated by human audition, we overcome the deficiencies of short-term features by employing modulation-scale analysis for long-term feature analysis. Our analysis, which uses time-frequency theory integrated with psychoacoustic results on modulation frequency perception, not only contains short-term information about the signals, but also provides long-term information representing patterns of time variation. This paper describes these features and their normalization. We demonstrate the effectiveness of our long-term features over conventional short-term features in content-based audio identification. A simulated study using a large data set, including nearly 10 000 songs and requiring over a billion audio pairwise comparisons, shows that modulation-scale features improve content identification accuracy substantially, especially when time and frequency distortions are imposed. |
doi_str_mv | 10.1109/TSP.2004.833861 |
format | article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_883604317</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1337279</ieee_id><sourcerecordid>2426394211</sourcerecordid><originalsourceid>FETCH-LOGICAL-c349t-ba4431c90ed006588c67d1ae2e6757890a030310fe8c0317be5c92f8addab6423</originalsourceid><addsrcrecordid>eNp9kL1PwzAQxS0EEuVjZmCJGGBKe44df4yo4ksqAokisVmuc5FcpXGJk6H_PS5BQmJguXfD7z3dPUIuKEwpBT1bvr1OCwA-VYwpQQ_IhGpOc-BSHKYdSpaXSn4ck5MY1wCUcy0mhD-Hamhs70ObR2cbzGxrm130MatDl7nQ9tj2ma_S9LV33-QZOaptE_H8R0_J-_3dcv6YL14enua3i9wxrvt8ZTln1GnACkCUSjkhK2qxQCFLqTRYYMAo1KhcUrnC0umiVraq7Erwgp2SmzF324XPAWNvNj46bBrbYhii0UCFoKBoIq__JQvFNDCxj7z6A67D0KWXo1GKCUgHywTNRsh1IcYOa7Pt_MZ2O0PB7Ms2qWyzL9uMZSfH5ejwiPhLMyYLqdkX8sF5Yg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>883604317</pqid></control><display><type>article</type><title>Modulation-scale analysis for content identification</title><source>IEEE Xplore (Online service)</source><creator>Sukittanon, S. ; Atlas, L.E. ; Pitton, J.W.</creator><creatorcontrib>Sukittanon, S. ; Atlas, L.E. ; Pitton, J.W.</creatorcontrib><description>For nonstationary signal classification, e.g., speech or music, features are traditionally extracted from a time-shifted, yet short data window. For many applications, these short-term features do not efficiently capture or represent longer term signal variation. Partially motivated by human audition, we overcome the deficiencies of short-term features by employing modulation-scale analysis for long-term feature analysis. Our analysis, which uses time-frequency theory integrated with psychoacoustic results on modulation frequency perception, not only contains short-term information about the signals, but also provides long-term information representing patterns of time variation. This paper describes these features and their normalization. 
We demonstrate the effectiveness of our long-term features over conventional short-term features in content-based audio identification. A simulated study using a large data set, including nearly 10 000 songs and requiring over a billion audio pairwise comparisons, shows that modulation-scale features improves content identification accuracy substantially, especially when time and frequency distortions are imposed.</description><identifier>ISSN: 1053-587X</identifier><identifier>EISSN: 1941-0476</identifier><identifier>DOI: 10.1109/TSP.2004.833861</identifier><identifier>CODEN: ITPRED</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Data mining ; Distortion ; Feature extraction ; Frequency ; Human ; Humans ; Information analysis ; Modulation ; Multiple signal classification ; Pattern analysis ; Pattern classification ; Perception ; Psychoacoustics ; Signal analysis ; Signal classification ; Simulation ; Speech</subject><ispartof>IEEE transactions on signal processing, 2004-10, Vol.52 (10), p.3023-3035</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2004</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c349t-ba4431c90ed006588c67d1ae2e6757890a030310fe8c0317be5c92f8addab6423</citedby><cites>FETCH-LOGICAL-c349t-ba4431c90ed006588c67d1ae2e6757890a030310fe8c0317be5c92f8addab6423</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1337279$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids></links><search><creatorcontrib>Sukittanon, S.</creatorcontrib><creatorcontrib>Atlas, L.E.</creatorcontrib><creatorcontrib>Pitton, J.W.</creatorcontrib><title>Modulation-scale analysis for content identification</title><title>IEEE transactions on signal processing</title><addtitle>TSP</addtitle><description>For nonstationary signal classification, e.g., speech or music, features are traditionally extracted from a time-shifted, yet short data window. For many applications, these short-term features do not efficiently capture or represent longer term signal variation. Partially motivated by human audition, we overcome the deficiencies of short-term features by employing modulation-scale analysis for long-term feature analysis. Our analysis, which uses time-frequency theory integrated with psychoacoustic results on modulation frequency perception, not only contains short-term information about the signals, but also provides long-term information representing patterns of time variation. This paper describes these features and their normalization. We demonstrate the effectiveness of our long-term features over conventional short-term features in content-based audio identification. 
A simulated study using a large data set, including nearly 10 000 songs and requiring over a billion audio pairwise comparisons, shows that modulation-scale features improves content identification accuracy substantially, especially when time and frequency distortions are imposed.</description><subject>Data mining</subject><subject>Distortion</subject><subject>Feature extraction</subject><subject>Frequency</subject><subject>Human</subject><subject>Humans</subject><subject>Information analysis</subject><subject>Modulation</subject><subject>Multiple signal classification</subject><subject>Pattern analysis</subject><subject>Pattern classification</subject><subject>Perception</subject><subject>Psychoacoustics</subject><subject>Signal analysis</subject><subject>Signal classification</subject><subject>Simulation</subject><subject>Speech</subject><issn>1053-587X</issn><issn>1941-0476</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><recordid>eNp9kL1PwzAQxS0EEuVjZmCJGGBKe44df4yo4ksqAokisVmuc5FcpXGJk6H_PS5BQmJguXfD7z3dPUIuKEwpBT1bvr1OCwA-VYwpQQ_IhGpOc-BSHKYdSpaXSn4ck5MY1wCUcy0mhD-Hamhs70ObR2cbzGxrm130MatDl7nQ9tj2ma_S9LV33-QZOaptE_H8R0_J-_3dcv6YL14enua3i9wxrvt8ZTln1GnACkCUSjkhK2qxQCFLqTRYYMAo1KhcUrnC0umiVraq7Erwgp2SmzF324XPAWNvNj46bBrbYhii0UCFoKBoIq__JQvFNDCxj7z6A67D0KWXo1GKCUgHywTNRsh1IcYOa7Pt_MZ2O0PB7Ms2qWyzL9uMZSfH5ejwiPhLMyYLqdkX8sF5Yg</recordid><startdate>20041001</startdate><enddate>20041001</enddate><creator>Sukittanon, S.</creator><creator>Atlas, L.E.</creator><creator>Pitton, J.W.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20041001</creationdate><title>Modulation-scale analysis for content identification</title><author>Sukittanon, S. ; Atlas, L.E. ; Pitton, J.W.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c349t-ba4431c90ed006588c67d1ae2e6757890a030310fe8c0317be5c92f8addab6423</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Data mining</topic><topic>Distortion</topic><topic>Feature extraction</topic><topic>Frequency</topic><topic>Human</topic><topic>Humans</topic><topic>Information analysis</topic><topic>Modulation</topic><topic>Multiple signal classification</topic><topic>Pattern analysis</topic><topic>Pattern classification</topic><topic>Perception</topic><topic>Psychoacoustics</topic><topic>Signal analysis</topic><topic>Signal classification</topic><topic>Simulation</topic><topic>Speech</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sukittanon, S.</creatorcontrib><creatorcontrib>Atlas, L.E.</creatorcontrib><creatorcontrib>Pitton, J.W.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and 
Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on signal processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sukittanon, S.</au><au>Atlas, L.E.</au><au>Pitton, J.W.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Modulation-scale analysis for content identification</atitle><jtitle>IEEE transactions on signal processing</jtitle><stitle>TSP</stitle><date>2004-10-01</date><risdate>2004</risdate><volume>52</volume><issue>10</issue><spage>3023</spage><epage>3035</epage><pages>3023-3035</pages><issn>1053-587X</issn><eissn>1941-0476</eissn><coden>ITPRED</coden><abstract>For nonstationary signal classification, e.g., speech or music, features are traditionally extracted from a time-shifted, yet short data window. For many applications, these short-term features do not efficiently capture or represent longer term signal variation. Partially motivated by human audition, we overcome the deficiencies of short-term features by employing modulation-scale analysis for long-term feature analysis. Our analysis, which uses time-frequency theory integrated with psychoacoustic results on modulation frequency perception, not only contains short-term information about the signals, but also provides long-term information representing patterns of time variation. This paper describes these features and their normalization. We demonstrate the effectiveness of our long-term features over conventional short-term features in content-based audio identification. 
A simulated study using a large data set, including nearly 10 000 songs and requiring over a billion audio pairwise comparisons, shows that modulation-scale features improves content identification accuracy substantially, especially when time and frequency distortions are imposed.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TSP.2004.833861</doi><tpages>13</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1053-587X |
ispartof | IEEE transactions on signal processing, 2004-10, Vol.52 (10), p.3023-3035 |
issn | 1053-587X 1941-0476 |
language | eng |
recordid | cdi_proquest_journals_883604317 |
source | IEEE Xplore (Online service) |
subjects | Data mining Distortion Feature extraction Frequency Human Humans Information analysis Modulation Multiple signal classification Pattern analysis Pattern classification Perception Psychoacoustics Signal analysis Signal classification Simulation Speech |
title | Modulation-scale analysis for content identification |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T04%3A55%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modulation-scale%20analysis%20for%20content%20identification&rft.jtitle=IEEE%20transactions%20on%20signal%20processing&rft.au=Sukittanon,%20S.&rft.date=2004-10-01&rft.volume=52&rft.issue=10&rft.spage=3023&rft.epage=3035&rft.pages=3023-3035&rft.issn=1053-587X&rft.eissn=1941-0476&rft.coden=ITPRED&rft_id=info:doi/10.1109/TSP.2004.833861&rft_dat=%3Cproquest_ieee_%3E2426394211%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c349t-ba4431c90ed006588c67d1ae2e6757890a030310fe8c0317be5c92f8addab6423%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=883604317&rft_id=info:pmid/&rft_ieee_id=1337279&rfr_iscdi=true |