Bar transformer: a hierarchical model for learning long-term structure and generating impressive pop music
Published in: | Applied intelligence (Dordrecht, Netherlands), 2023-05, Vol.53 (9), p.10130-10148 |
---|---|
Main Authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Recently many deep learning-based automatic music generation models have been proposed. How to generate long pieces of pop music with distinctive musical characteristics remains a challenging problem, as it relies heavily on musical structures. Some transformer-based models take advantage of self-attention for generating long-sequence music; however, most pay little attention to well-organized musical structures. In this article, we propose a novel note-to-bar hierarchical model named the Bar Transformer to address long-term dependency issues and generate impressive and structurally meaningful music. In particular, we propose a novel note-to-bar approach that pre-processes the notes within each individual bar to provide a strong structural constraint to increase our model’s awareness of the note-to-bar structure in music. The Bar Transformer is constructed using an encoder-decoder framework, including a two-layer encoder and an arrangement decoder. In the two-layer encoder, the bottom is a note-level encoder, which outputs embeddings by learning the relation between notes within an individual bar, and the top is a bar-level encoder, which uses these embeddings to encode each bar from the melody and chord. The decoder is an arrangement decoder used to generalize the interrelationships among the bars and simultaneously generate melodies and chords. The experimental results of the structural analysis and the aural evaluations demonstrate that our approach outperforms the Music Transformer model and other regressive models used for music generation. |
---|---|
ISSN: | 0924-669X 1573-7497 |
DOI: | 10.1007/s10489-022-04049-3 |
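
The summary describes a note-to-bar pre-processing step that groups notes by bar before a note-level encoder sees them. The record does not give the authors' data format, so the following is only a minimal sketch of that grouping idea under stated assumptions: the `Note` tuple, the `group_notes_by_bar` helper, onsets measured in beats, and a fixed 4-beat bar are all hypothetical and are not taken from the article.

```python
from collections import defaultdict
from typing import NamedTuple


class Note(NamedTuple):
    """Hypothetical note representation (not the article's format)."""
    onset: float     # onset in beats from the start of the piece
    pitch: int       # MIDI pitch number
    duration: float  # length in beats


def group_notes_by_bar(notes, beats_per_bar=4):
    """Return a list of bars; each bar lists its notes with bar-relative onsets.

    Assumes a constant meter (beats_per_bar) for the whole piece.
    """
    if not notes:
        return []
    bars = defaultdict(list)
    for note in sorted(notes, key=lambda n: n.onset):
        index = int(note.onset // beats_per_bar)
        # Store the onset relative to the start of its own bar.
        relative_onset = note.onset - index * beats_per_bar
        bars[index].append(note._replace(onset=relative_onset))
    # Keep empty bars so that bar positions stay aligned across the piece.
    return [bars.get(i, []) for i in range(max(bars) + 1)]


melody = [
    Note(0.0, 60, 1.0),
    Note(2.0, 64, 2.0),
    Note(4.0, 67, 4.0),
    Note(9.5, 72, 0.5),
]
bars = group_notes_by_bar(melody)
```

In a hierarchical model of the kind the summary outlines, each inner list would be the input to a note-level encoder, and the resulting per-bar embeddings would form the sequence passed to a bar-level encoder. How the article actually tokenizes notes and chords is not stated in this record.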