MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. T...

Full description

Saved in:
Bibliographic Details
Published in:arXiv.org 2024-12
Main Authors: Li, Yizhi, Yuan, Ruibin, Zhang, Ge, Ma, Yinghao, Chen, Xingran, Yin, Hanzhi, Xiao, Chenghao, Lin, Chenghua, Ragni, Anton, Benetos, Emmanouil, Gyenge, Norbert, Dannenberg, Roger, Liu, Ruibo, Chen, Wenhu, Xia, Gus, Shi, Yemin, Huang, Wenhao, Wang, Zili, Guo, Yike, Fu, Jie
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!