Loading…

Score Images as a Modality: Enhancing Symbolic Music Understanding through Large-Scale Multimodal Pre-Training

Symbolic music understanding is a critical challenge in artificial intelligence. While traditional symbolic music representations like MIDI capture essential musical elements, they often lack the nuanced expression in music scores. Leveraging the advancements in multimodal pre-training, particularly...

Full description

Saved in:
Bibliographic Details
Published in:Sensors (Basel, Switzerland) Switzerland), 2024-08, Vol.24 (15), p.5017
Main Authors: Qin, Yang, Xie, Huiming, Ding, Shuxue, Li, Yujie, Tan, Benying, Ye, Mingchuan
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Symbolic music understanding is a critical challenge in artificial intelligence. While traditional symbolic music representations like MIDI capture essential musical elements, they often lack the nuanced expression in music scores. Leveraging the advancements in multimodal pre-training, particularly in visual-language pre-training, we propose a groundbreaking approach: the Score Images as a Modality (SIM) model. This model integrates music score images alongside MIDI data for enhanced symbolic music understanding. We also introduce novel pre-training tasks, including masked bar-attribute modeling and score-MIDI matching. These tasks enable the SIM model to capture music structures and align visual and symbolic representations effectively. Additionally, we present a meticulously curated dataset of matched score images and MIDI representations optimized for training the SIM model. Through experimental validation, we demonstrate the efficacy of our approach in advancing symbolic music understanding.
ISSN:1424-8220
1424-8220
DOI:10.3390/s24155017