
Sequence labeling to detect stuttering events in read speech

• Data augmentation improved the performance of all applied classifiers.
• On human transcripts, without feature engineering, the BLSTM outperforms the CRF classifiers.
• Adding auxiliary features to support the CRFaux classifier allows pe...

Published in: Computer speech & language, 2020-07, Vol. 62, p. 101052, Article 101052
Main Authors: Alharbi, Sadeen; Hasan, Madina; Simons, Anthony J H; Brumfitt, Shelagh; Green, Phil
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c340t-6cd757154b985b57f8bf942a2e2f8ba20400b41ea7ba37c1ee3f0483f722ae673
cites cdi_FETCH-LOGICAL-c340t-6cd757154b985b57f8bf942a2e2f8ba20400b41ea7ba37c1ee3f0483f722ae673
container_end_page
container_issue
container_start_page 101052
container_title Computer speech & language
container_volume 62
creator Alharbi, Sadeen
Hasan, Madina
Simons, Anthony J H
Brumfitt, Shelagh
Green, Phil
description • Data augmentation improved the performance of all applied classifiers.
• On human transcripts, without feature engineering, the BLSTM outperforms the CRF classifiers.
• Adding auxiliary features to support the CRFaux classifier yields performance improvements.
• On ASR transcripts, scored against human transcription, the results of the CRFngram, CRFaux and BLSTM classifiers all degrade.
Stuttering is a speech disorder that, if treated during childhood, may be prevented from persisting into adolescence. A clinician must first determine the severity of stuttering by assessing a child during a conversational or reading task and recording each instance of disfluency, either in real time or after transcribing the recorded session and analysing the transcript. The current study evaluates the ability of two machine learning approaches, namely conditional random fields (CRF) and bidirectional long short-term memory (BLSTM), to detect stuttering events in transcriptions of stuttering speech. The two approaches are compared for their performance both on ideal hand-transcribed data and on the output of automatic speech recognition (ASR). We also study the effect of data augmentation to improve performance. A corpus of 35 speakers’ read speech (13K words) was supplemented with a corpus of 63 speakers’ spontaneous speech (11K words) and an artificially generated corpus (50K words). Experimental results show that, without feature engineering, BLSTM classifiers outperform CRF classifiers by 33.6%. However, adding features to support the CRF classifier yields performance improvements of 45% and 18% over the CRF baseline and BLSTM results, respectively. Moreover, adding more data to train the CRF and BLSTM classifiers consistently improves the results.
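The description above treats stuttering detection as sequence labeling: each transcript token receives one label. The following is a minimal sketch of that input/output shape only, not the authors' method. The label names (`WORD_REP`, `PART_REP`, `O`), the feature names, and both functions are hypothetical, since this record does not list the paper's label set or features. The rule-based labeler stands in for a trained CRF or BLSTM purely to show what a per-token prediction looks like.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical per-token features of the kind a feature-based tagger could use."""
    word = tokens[i].lower()
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>"
    return {
        "word": word,
        "prev_word": prev_word,
        "next_word": next_word,
        # True when the speaker says the same word twice in a row.
        "same_as_next": word == next_word,
        # True when this token is a strict prefix of the next one ("ra ran").
        "prefix_of_next": word != next_word and next_word.startswith(word),
    }


def rule_baseline(tokens: List[str]) -> List[str]:
    """Assign one (hypothetical) label per token with two hand-written rules.

    This is a stand-in for a learned model; it will mislabel ordinary
    text such as "a apple", where a short word happens to prefix the next.
    """
    labels = []
    for i in range(len(tokens)):
        feats = token_features(tokens, i)
        if feats["same_as_next"]:
            labels.append("WORD_REP")
        elif feats["prefix_of_next"]:
            labels.append("PART_REP")
        else:
            labels.append("O")
    return labels


if __name__ == "__main__":
    example = "the the dog ra ran home".split()
    for token, label in zip(example, rule_baseline(example)):
        print(f"{token}\t{label}")
```

A learned sequence model would replace `rule_baseline` and could consume dictionaries like those returned by `token_features`, one per token, with the gold labels coming from annotated transcripts.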
doi_str_mv 10.1016/j.csl.2019.101052
format article
publisher Elsevier Ltd
rights 2019 Elsevier Ltd
eissn 1095-8363
genre article
collection CrossRef
toplevel peer_reviewed
online_resources
oa free_for_read
fulltext fulltext
identifier ISSN: 0885-2308
ispartof Computer speech & language, 2020-07, Vol.62, p.101052, Article 101052
issn 0885-2308
1095-8363
language eng
recordid cdi_crossref_primary_10_1016_j_csl_2019_101052
source Elsevier
subjects BLSTM
CRF
Speech disorder
Stuttering event detection
title Sequence labeling to detect stuttering events in read speech
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T10%3A03%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sequence%20labeling%20to%20detect%20stuttering%20events%20in%20read%20speech&rft.jtitle=Computer%20speech%20&%20language&rft.au=Alharbi,%20Sadeen&rft.date=2020-07&rft.volume=62&rft.spage=101052&rft.pages=101052-&rft.artnum=101052&rft.issn=0885-2308&rft.eissn=1095-8363&rft_id=info:doi/10.1016/j.csl.2019.101052&rft_dat=%3Celsevier_cross%3ES0885230819302967%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c340t-6cd757154b985b57f8bf942a2e2f8ba20400b41ea7ba37c1ee3f0483f722ae673%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true