DiffuseRoll: multi-track multi-attribute music generation based on diffusion model

Bibliographic Details
Published in: Multimedia Systems, 2024-02, Vol. 30 (1), Article 19
Main Authors: Wang, Hongfei, Zou, Yi, Cheng, Haonan, Ye, Long
Format: Article
Language:English
Description
Summary: Recent advances in generative models have shown remarkable progress in music generation. However, since most existing methods focus on generating monophonic or homophonic music, generating polyphonic, multi-track music with rich attributes remains a challenging task. In this paper, we propose DiffuseRoll, a novel image-based approach that uses diffusion models to generate multi-track, multi-attribute music. Specifically, we generate music piano-rolls with diffusion models and map them to MIDI files for output. To capture rich attribute information, we design a color-encoding system that encodes music note sequences into color and position information representing note pitch, velocity, tempo, and instrument. This scheme enables a seamless mapping between discrete music sequences and continuous images. We further propose the Music Mini Expert System (MusicMES) to optimize the generated music for better performance. We conduct subjective experiments using the evaluation metrics Coherence, Diversity, Harmoniousness, Structureness, Orchestration, Overall Preference, and Average. The subjective results show improvements over state-of-the-art image-based methods.
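The abstract's color-encoding idea, mapping a discrete note sequence onto a continuous image, can be sketched as follows. This is a minimal illustrative round-trip, not the paper's actual scheme: the choice of piano-roll layout (row = pitch, column = time step) and the RGB channel assignment (red = velocity, green = tempo, blue = instrument) are assumptions made for the example.

```python
# Hypothetical color-encoding sketch: a piano-roll image where a note's
# vertical position encodes pitch and its RGB color encodes velocity,
# tempo, and instrument. Channel assignments are assumptions, not the
# paper's published mapping.
import numpy as np

N_PITCHES = 128  # MIDI pitch range 0-127
N_STEPS = 64     # time steps in one piano-roll segment

def encode_notes(notes):
    """notes: list of (pitch, start, duration, velocity, tempo, instrument);
    velocity/instrument in 0-127, tempo in BPM (clipped to 255).
    Returns an RGB piano-roll of shape (N_PITCHES, N_STEPS, 3) in [0, 1]."""
    roll = np.zeros((N_PITCHES, N_STEPS, 3), dtype=np.float32)
    for pitch, start, dur, vel, tempo, instr in notes:
        r = vel / 127.0              # red channel: velocity
        g = min(tempo, 255) / 255.0  # green channel: tempo
        b = instr / 127.0            # blue channel: instrument
        roll[pitch, start:start + dur] = (r, g, b)
    return roll

def decode_notes(roll):
    """Inverse mapping: recover note tuples from the colored piano-roll
    by scanning each pitch row for runs of identically colored pixels."""
    notes = []
    for pitch in range(N_PITCHES):
        t = 0
        while t < N_STEPS:
            if roll[pitch, t].any():
                color = roll[pitch, t]
                dur = 1
                while t + dur < N_STEPS and np.allclose(roll[pitch, t + dur], color):
                    dur += 1
                notes.append((pitch, t, dur,
                              int(round(color[0] * 127)),
                              int(round(color[1] * 255)),
                              int(round(color[2] * 127))))
                t += dur
            else:
                t += 1
    return notes

# Round trip: a C4 note (pitch 60) followed by an E4 note (pitch 64)
notes = [(60, 0, 8, 100, 120, 0), (64, 8, 4, 90, 120, 25)]
assert decode_notes(encode_notes(notes)) == notes
```

Because the encoded piano-roll is an ordinary continuous-valued image, it can in principle be handled by a standard image diffusion model, which is the bridge the abstract describes between discrete MIDI sequences and diffusion-based generation.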
ISSN:0942-4962
1432-1882
DOI:10.1007/s00530-023-01220-9