Loading…
TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language
This paper introduces an open-source dataset for speech synthesis in the Tatar language. The dataset comprises approximately 70 hours of transcribed audio recordings, featuring two professional speakers (one male and one female). Notably, it is the first large-scale dataset of its kind that is publi...
Saved in:
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | This paper introduces an open-source dataset for speech synthesis in the Tatar language. The dataset comprises approximately 70 hours of transcribed audio recordings, featuring two professional speakers (one male and one female). Notably, it is the first large-scale dataset of its kind that is publicly available, aimed at promoting Tatar text-to-speech (TTS) applications in both academic and industrial contexts. The paper describes the procedures for developing the dataset, discusses the challenges faced, and outlines important future directions. To demonstrate the reliability of the dataset, baseline end-to-end TTS models were built and evaluated using the subjective mean opinion score (MOS) measure. The dataset, training recipe, and pretrained TTS models are publicly available. |
---|---|
ISSN: | 2831-6983 |
DOI: | 10.1109/ICAIIC60209.2024.10463261 |