VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

Bibliographic Details
Published in: arXiv.org 2024-12
Main Authors: Shi, Jiatong; Hye-jin Shim; Tian, Jinchuan; Arora, Siddhant; Wu, Haibin; Petermann, Darius; Jia Qi Yip; Zhang, You; Tang, Yuxun; Zhang, Wangyou; Dareen Safar Alharthi; Huang, Yichen; Saito, Koichi; Han, Jionghao; Zhao, Yiwen; Donahue, Chris; Watanabe, Shinji
Format: Article
Language: English
Description
Summary: In this work, we introduce VERSA, a unified and standardized evaluation toolkit designed for various speech, audio, and music signals. The toolkit features a Pythonic interface with flexible configuration and dependency control, making it user-friendly and efficient. With a full installation, VERSA offers 63 metrics with 711 metric variations based on different configurations. These metrics encompass evaluations utilizing diverse external resources, including matching and non-matching reference audio, text transcriptions, and text captions. As a lightweight yet comprehensive toolkit, VERSA is versatile enough to support evaluation across a wide range of downstream scenarios. To demonstrate its capabilities, this work highlights example use cases for VERSA, including audio coding, speech synthesis, speech enhancement, singing synthesis, and music generation. The toolkit is available at https://github.com/shinjiwlab/versa.
ISSN: 2331-8422