Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

Bibliographic Details
Published in: arXiv.org 2024-11
Main Authors: Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu, Tzu-Quan Lin, Hsiu-Hsuan Wang, En-Pei Hu, Chan-Jan Hsu, Liang-Hsuan Tseng, I-Hsiang Chiu, Ulin Sanga, Xuanjun Chen, Po-chun Hsu, Shu-wen Yang, Hung-yi Lee
Format: Article
Language: English
Description
Summary: This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.
ISSN: 2331-8422