
Deep learning of chroma representation for cover song identification in compression domain


Bibliographic Details
Published in: Multidimensional Systems and Signal Processing, 2018-07, Vol. 29 (3), p. 887-902
Main Authors: Fang, Jiunn-Tsair, Chang, Yu-Ruey, Chang, Pao-Chi
Format: Article
Language: English
Abstract: Methods for identifying a cover song typically involve comparing the similarity of chroma features between the query song and another song in the data set. However, considerable time is required for pairwise comparisons. In addition, to save disk space, most songs stored in the data set are in a compressed format. Therefore, to eliminate some decoding procedures, this study extracted music information directly from the modified discrete cosine transform coefficients of advanced audio coding and then mapped these coefficients to 12-dimensional chroma features. The chroma features were segmented to preserve the melodies. Each chroma feature segment was trained and learned by a sparse autoencoder, a deep learning architecture of artificial neural networks. The deep learning procedure was to transform chroma features into an intermediate representation for dimension reduction. Experimental results from a covers80 data set showed that the mean reciprocal rank increased to 0.5 and the matching time was reduced by over 94% compared with traditional approaches.
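The mapping from spectral coefficients to 12-dimensional chroma features described in the abstract rests on the usual pitch-class folding idea: each frequency component is assigned to one of 12 pitch classes, with octaves collapsed together. A minimal illustrative sketch is below — the paper works directly on AAC MDCT coefficients, whereas this function, its name, and the simple magnitude accumulation are assumptions for illustration only:

```python
import numpy as np

def chroma_from_spectrum(mags, freqs, a4=440.0):
    """Fold spectral-bin magnitudes into a 12-dimensional chroma
    (pitch-class) vector; very low frequencies are ignored."""
    chroma = np.zeros(12)
    for m, f in zip(mags, freqs):
        if f < 27.5:  # below the lowest piano note A0
            continue
        # MIDI-style pitch number, then fold octaves into 12 classes
        pitch = 69 + 12 * np.log2(f / a4)
        chroma[int(round(pitch)) % 12] += m
    # normalise so frames are comparable regardless of loudness
    norm = np.linalg.norm(chroma)
    return chroma / norm if norm > 0 else chroma
```

Because octaves are folded, energy at 440 Hz and 880 Hz accumulates in the same pitch class (A), which is what makes chroma features robust to the octave and timbre changes typical of cover versions.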
DOI: 10.1007/s11045-017-0476-x
ISSN: 0923-6082
EISSN: 1573-0824
Source: Springer Nature
Subjects: Artificial Intelligence; Artificial neural networks; Audio data; Circuits and Systems; Datasets; Decoding; Deep learning; Discrete cosine transform; Electrical Engineering; Engineering; Identification methods; Music; Neural networks; Representations; Signal, Image and Speech Processing
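The reported mean reciprocal rank (MRR) of 0.5 averages the reciprocal of the rank at which the correct cover is retrieved for each query. A minimal sketch of the standard metric (not code from the paper):

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct cover for each query.
    MRR = mean of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)
```

For example, four queries whose correct covers appear at ranks 1, 2, 4, and 4 give (1 + 0.5 + 0.25 + 0.25) / 4 = 0.5, matching the figure reported in the abstract.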