A Data-Driven Analysis of Robust Automatic Piano Transcription

Algorithms for automatic piano transcription have improved dramatically in recent years due to new datasets and modeling techniques. Recent developments have focused primarily on adapting new neural network architectures, such as the Transformer and Perceiver, in order to yield more accurate systems...

Bibliographic Details
Published in: IEEE signal processing letters, 2024, Vol.31, p.681-685
Main Authors: Edwards, Drew; Dixon, Simon; Benetos, Emmanouil; Maezawa, Akira; Kusaka, Yuta
Format: Article
Language: English
cites cdi_FETCH-LOGICAL-c245t-f6c16c8c3b2af5f9d5d1b18e297491999b5d55c88c7fc9e4826e45619112de753
container_end_page 685
container_start_page 681
container_title IEEE signal processing letters
container_volume 31
creator Edwards, Drew
Dixon, Simon
Benetos, Emmanouil
Maezawa, Akira
Kusaka, Yuta
description Algorithms for automatic piano transcription have improved dramatically in recent years due to new datasets and modeling techniques. Recent developments have focused primarily on adapting new neural network architectures, such as the Transformer and Perceiver, in order to yield more accurate systems. In this letter, we study transcription systems from the perspective of their training data. By measuring their performance on out-of-distribution annotated piano data, we show how these models can severely overfit to acoustic properties of the training data. We create a new set of audio for the MAESTRO dataset, captured automatically in a professional studio recording environment via Yamaha Disklavier playback. Using various data augmentation techniques when training with the original and re-performed versions of the MAESTRO dataset, we achieve state-of-the-art note-onset accuracy of 88.4 F1-score on the MAPS dataset, without seeing any of its training data. We subsequently analyze these data augmentation techniques in a series of ablation studies to better understand their influence on the resulting models.
doi_str_mv 10.1109/LSP.2024.3363646
format article
ieee_id 10428040
identifier EISSN: 1558-2361
identifier CODEN: ISPLEM
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
toplevel peer_reviewed
online_resources
tpages 5
linktohtml https://ieeexplore.ieee.org/document/10428040
orcid 0009-0004-6111-835X
0009-0004-4921-5253
0000-0002-6098-481X
0000-0002-6820-6764
0009-0002-9827-5434
pqid 2933610874
sourcetype Aggregation Database
collection IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE/IET Electronic Library
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
fulltext fulltext
identifier ISSN: 1070-9908
ispartof IEEE signal processing letters, 2024, Vol.31, p.681-685
issn 1070-9908
1558-2361
language eng
recordid cdi_proquest_journals_2933610874
source IEEE Xplore (Online service)
subjects Ablation
Acoustic properties
Acoustics
Algorithms
Audio data
Data analysis
Data augmentation
Data models
Datasets
Neural networks
Piano transcription
Pianos
Pipelines
Recording
Training
Training data
title A Data-Driven Analysis of Robust Automatic Piano Transcription
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T21%3A37%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Data-Driven%20Analysis%20of%20Robust%20Automatic%20Piano%20Transcription&rft.jtitle=IEEE%20signal%20processing%20letters&rft.au=Edwards,%20Drew&rft.date=2024&rft.volume=31&rft.spage=681&rft.epage=685&rft.pages=681-685&rft.issn=1070-9908&rft.eissn=1558-2361&rft.coden=ISPLEM&rft_id=info:doi/10.1109/LSP.2024.3363646&rft_dat=%3Cproquest_cross%3E2933610874%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c245t-f6c16c8c3b2af5f9d5d1b18e297491999b5d55c88c7fc9e4826e45619112de753%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2933610874&rft_id=info:pmid/&rft_ieee_id=10428040&rfr_iscdi=true