Deep learning of chroma representation for cover song identification in compression domain
Methods for identifying a cover song typically involve comparing the similarity of chroma features between the query song and each song in the data set. However, such pairwise comparisons require considerable time. In addition, to save disk space, most songs in the data set are stored in a compressed format. Therefore, to eliminate some decoding procedures, this study extracted music information directly from the modified discrete cosine transform (MDCT) coefficients of advanced audio coding (AAC) and mapped these coefficients to 12-dimensional chroma features. The chroma features were segmented to preserve the melodies, and each segment was learned by a sparse autoencoder, a deep-learning architecture built from artificial neural networks, which transformed the chroma features into an intermediate representation of reduced dimension. Experimental results on the covers80 data set showed that the mean reciprocal rank increased to 0.5 and the matching time was reduced by over 94% compared with traditional approaches.
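Two ideas named in the abstract can be made concrete with a brief sketch: encoding a chroma segment with a single-hidden-layer sparse autoencoder, and scoring retrieval with the mean reciprocal rank (MRR). The snippet below is not from the paper; the weight shapes, the sparsity target `rho`, and the example ranks are illustrative assumptions only.

```python
import numpy as np

def encode(segment, W, b):
    """Encoder half of a single-hidden-layer sparse autoencoder (sketch).

    `segment` is a flattened chroma segment (e.g. 12 pitch classes x N frames);
    W and b are learned parameters mapping it to a shorter intermediate vector.
    """
    return 1.0 / (1.0 + np.exp(-(W @ segment + b)))  # sigmoid activation

def kl_sparsity_penalty(hidden, rho=0.05):
    """Standard KL-divergence sparsity penalty added to the training loss:
    sum_j KL(rho || rho_hat_j), where rho_hat_j is unit j's mean activation."""
    rho_hat = np.clip(hidden.mean(axis=0), 1e-8, 1 - 1e-8)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))

def mean_reciprocal_rank(ranks):
    """MRR over the 1-based rank at which each query's true cover is retrieved."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks))

# Toy usage with made-up numbers (not the paper's data):
rng = np.random.default_rng(0)
segment = rng.random(12 * 20)                 # 12 chroma bins x 20 frames, flattened
W, b = rng.standard_normal((30, 240)) * 0.01, np.zeros(30)
code = encode(segment, W, b)                  # 30-dimensional intermediate representation
print(kl_sparsity_penalty(code[None, :]))     # penalty for a single example
print(mean_reciprocal_rank([1, 2, 4, 2]))     # (1 + 1/2 + 1/4 + 1/2) / 4 = 0.5625
```

The paper reports an MRR of roughly 0.5 on covers80; the sketch only shows how such a number is computed, not how the authors obtained it.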
Published in: Multidimensional systems and signal processing, 2018-07, Vol. 29 (3), pp. 887-902
Main Authors: Fang, Jiunn-Tsair; Chang, Yu-Ruey; Chang, Pao-Chi
Format: Article
Language: English
Subjects: Artificial Intelligence; Artificial neural networks; Audio data; Circuits and Systems; Datasets; Decoding; Deep learning; Discrete cosine transform; Electrical Engineering; Engineering; Identification methods; Music; Neural networks; Representations; Signal, Image and Speech Processing
DOI: 10.1007/s11045-017-0476-x
ISSN: 0923-6082
EISSN: 1573-0824
Publisher: New York: Springer US