Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

Bibliographic Details
Published in: arXiv.org 2020-10
Main Authors: Alena Butryna, Shan-Hui Cathy Chu, Isin Demirsahin, Alexander Gutkin, Linne Ha, Fei He, Martin Jansche, Cibu Johny, Anna Katanova, Oddur Kjartansson, Chenfang Li, Tatiana Merkulova, Yin May Oo, Knot Pipatsrisawat, Clara Rivera, Supheakmungkol Sarin, Pasindu de Silva, Keshan Sodimana, Richard Sproat, Theeraphol Wattanavekin, Jaka Aris Eko Wibawa
Format: Article
Language: English
Description
Summary: This paper presents an overview of a program designed to address the growing need for developing freely available speech resources for under-represented languages. At present we have released 38 datasets for building text-to-speech and automatic speech recognition applications for languages and dialects of South and Southeast Asia, Africa, Europe and South America. The paper describes the methodology used for developing such corpora and presents some of our findings that could benefit under-represented language communities.
ISSN: 2331-8422