Deep learning of chroma representation for cover song identification in compression domain
Methods for identifying a cover song typically involve comparing the similarity of chroma features between the query song and each song in the data set. However, such pairwise comparisons require considerable time. In addition, to save disk space, most songs in the data set are stored in a compressed format. Therefore, to eliminate some decoding procedures, this study extracted music information directly from the modified discrete cosine transform (MDCT) coefficients of advanced audio coding (AAC) and mapped these coefficients to 12-dimensional chroma features. The chroma features were segmented to preserve the melodies, and each segment was learned by a sparse autoencoder, a deep-learning architecture built from artificial neural networks, which transformed the chroma features into an intermediate representation of reduced dimension. Experimental results on the covers80 data set showed that the mean reciprocal rank increased to 0.5 and the matching time was reduced by over 94% compared with traditional approaches.
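Two ideas named in the abstract can be made concrete with a brief sketch: encoding a chroma segment with a single-hidden-layer sparse autoencoder, and scoring retrieval with the mean reciprocal rank (MRR). The snippet below is not from the paper; the weight shapes, the sparsity target `rho`, and the example ranks are illustrative assumptions only.

```python
import numpy as np

def encode(segment, W, b):
    """Encoder half of a single-hidden-layer sparse autoencoder (sketch).

    `segment` is a flattened chroma segment (e.g. 12 pitch classes x N frames);
    W and b are learned parameters mapping it to a shorter intermediate vector.
    """
    return 1.0 / (1.0 + np.exp(-(W @ segment + b)))  # sigmoid activation

def kl_sparsity_penalty(hidden, rho=0.05):
    """Standard KL-divergence sparsity penalty added to the training loss:
    sum_j KL(rho || rho_hat_j), where rho_hat_j is unit j's mean activation."""
    rho_hat = np.clip(hidden.mean(axis=0), 1e-8, 1 - 1e-8)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))

def mean_reciprocal_rank(ranks):
    """MRR over the 1-based rank at which each query's true cover is retrieved."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks))

# Toy usage with made-up numbers (not the paper's data):
rng = np.random.default_rng(0)
segment = rng.random(12 * 20)                 # 12 chroma bins x 20 frames, flattened
W, b = rng.standard_normal((30, 240)) * 0.01, np.zeros(30)
code = encode(segment, W, b)                  # 30-dimensional intermediate representation
print(kl_sparsity_penalty(code[None, :]))     # penalty for a single example
print(mean_reciprocal_rank([1, 2, 4, 2]))     # (1 + 1/2 + 1/4 + 1/2) / 4 = 0.5625
```

The paper reports an MRR of roughly 0.5 on covers80; the sketch only shows how such a number is computed, not how the authors obtained it.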
Published in: Multidimensional systems and signal processing, 2018-07, Vol. 29 (3), pp. 887-902
Main Authors: Fang, Jiunn-Tsair; Chang, Yu-Ruey; Chang, Pao-Chi
Format: Article
Language: English
Subjects: Artificial Intelligence; Artificial neural networks; Audio data; Circuits and Systems; Datasets; Decoding; Deep learning; Discrete cosine transform; Electrical Engineering; Engineering; Identification methods; Music; Neural networks; Representations; Signal, Image and Speech Processing
DOI: 10.1007/s11045-017-0476-x
ISSN: 0923-6082
EISSN: 1573-0824
Publisher: New York: Springer US