Loading…

Voice Conversion for Persons with Amyotrophic Lateral Sclerosis

Amyotrophic lateral sclerosis (ALS) results in progressive paralysis of voluntary muscles throughout the body. As speech deteriorates, individuals rely on pre-programmed messages available on commercial speech generating devices to communicate using one of the generic electronic voices on the device...

Full description

Saved in:
Bibliographic Details
Published in:IEEE journal of biomedical and health informatics 2020-10, Vol.24 (10), p.2942-2949
Main Authors: Zhao, Yunxin, Kuruvilla-Dugdale, Mili, Song, Minguang
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Amyotrophic lateral sclerosis (ALS) results in progressive paralysis of voluntary muscles throughout the body. As speech deteriorates, individuals rely on pre-programmed messages available on commercial speech generating devices to communicate using one of the generic electronic voices on the device. To replace these generic voices and restore vocal identity, our aim is to develop personalized voices for people with ALS via the approach of voice conversion. The task is challenging because very few people have large quantities of their premorbid healthy speech recorded. Therefore, we have to rely on small quantities of dysarthric speech concomitant with an individual's disease stage. Further, progressive fatigue prohibits acquisition of large speech datasets and individuals display a range of dysarthria severities resulting from breathing, voice, articulation, resonance, and prosody disturbances. As the first step to address these problems, we use healthy source speakers and propose the approach of combining a structured sparse spectral transform with multiple linear regression-based frequency warping prediction for spectral conversion, and interpolating the transformed spectral frames for speech rate modification. Our experimental data included four healthy source speakers from the ARCTIC dataset, and four target ALS speakers with mild to severe dysarthria, forming 16 speaker pairs. Subjective listening evaluations showed that on average, (i) the proposed approach improved speech intelligibility by about 80% over the target speakers' speech, (ii) the converted voice was 3 times more similar to the target speakers' speech than to the source speakers' speech, and (iii) the converted speech quality was close to the MOS scale "good" relative to the source speakers' speech being "excellent."
ISSN:2168-2194
2168-2208
DOI:10.1109/JBHI.2019.2961844