Autoregressive Predictive Coding: A Comprehensive Study

We review autoregressive predictive coding (APC), an approach to learn speech representation by predicting a future frame given the past frames. We present three different views of interpreting APC, and provide a historical account to the approach. To study the speech representation learned by APC,...

Bibliographic Details
Published in: IEEE journal of selected topics in signal processing, 2022-10, Vol. 16 (6), p. 1380-1390
Main Authors: Yang, Gene-Ping; Yeh, Sung-Lin; Chung, Yu-An; Glass, James; Tang, Hao
Format: Article
Language: English
container_end_page 1390
container_issue 6
container_start_page 1380
container_title IEEE journal of selected topics in signal processing
container_volume 16
creator Yang, Gene-Ping
Yeh, Sung-Lin
Chung, Yu-An
Glass, James
Tang, Hao
description We review autoregressive predictive coding (APC), an approach to learn speech representation by predicting a future frame given the past frames. We present three different views of interpreting APC, and provide a historical account to the approach. To study the speech representation learned by APC, we use common speech tasks, such as automatic speech recognition and speaker verification, to demonstrate the utility of the learned representation. In addition, we design a suite of fine-grained tasks, including frame classification, segment classification, fundamental frequency tracking, and duration prediction, to probe the phonetic and prosodic content of the representation. The three views of the APC objective welcome various generalizations and algorithms to learn speech representations. Probing on the suite of fine-grained tasks suggests that APC makes a wide range of high-level speech information accessible in its learned representation.
doi_str_mv 10.1109/JSTSP.2022.3203608
format article
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
eissn 1941-0484
coden IJSTGY
creationdate 2022-10-01
tpages 11
toplevel peer_reviewed
orcid https://orcid.org/0000-0001-9451-7956
https://orcid.org/0000-0002-3786-3251
https://orcid.org/0000-0002-3097-360X
https://orcid.org/0000-0002-2445-2605
identifier ISSN: 1932-4553
ispartof IEEE journal of selected topics in signal processing, 2022-10, Vol.16 (6), p.1380-1390
issn 1932-4553
1941-0484
language eng
recordid cdi_ieee_primary_9874771
source Linguistics and Language Behavior Abstracts (LLBA); IEEE Xplore (Online service)
subjects Algorithms
Automatic speech recognition
Autoregressive processes
Classification
Coding
Fundamental frequency
Linguistics
Phonetics
Predictive coding
Prosody
Representation learning
Representations
Resonant frequencies
Self-supervised learning
Speaker identification
speaker verification
Speech recognition
Voice recognition
title Autoregressive Predictive Coding: A Comprehensive Study