Autoregressive Predictive Coding: A Comprehensive Study
We review autoregressive predictive coding (APC), an approach to learn speech representation by predicting a future frame given the past frames. We present three different views of interpreting APC, and provide a historical account to the approach. To study the speech representation learned by APC,...
Published in: | IEEE journal of selected topics in signal processing 2022-10, Vol.16 (6), p.1380-1390 |
---|---|
Main Authors: | Yang, Gene-Ping; Yeh, Sung-Lin; Chung, Yu-An; Glass, James; Tang, Hao |
Format: | Article |
Language: | English |
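Subjects: | Algorithms; Automatic speech recognition; Autoregressive processes; Classification; Coding; Fundamental frequency; Linguistics; Phonetics; Predictive coding; Prosody; Representation learning; Representations; Resonant frequencies; Self-supervised learning; Speaker identification; speaker verification; Speech recognition; Voice recognition |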
cited_by | cdi_FETCH-LOGICAL-c295t-2deac580b225648c092f19784a97376022a019bd42e20aeddf9e851ba0f7e0d3 |
---|---|
cites | cdi_FETCH-LOGICAL-c295t-2deac580b225648c092f19784a97376022a019bd42e20aeddf9e851ba0f7e0d3 |
container_end_page | 1390 |
container_issue | 6 |
container_start_page | 1380 |
container_title | IEEE journal of selected topics in signal processing |
container_volume | 16 |
creator | Yang, Gene-Ping; Yeh, Sung-Lin; Chung, Yu-An; Glass, James; Tang, Hao |
description | We review autoregressive predictive coding (APC), an approach to learn speech representation by predicting a future frame given the past frames. We present three different views of interpreting APC, and provide a historical account to the approach. To study the speech representation learned by APC, we use common speech tasks, such as automatic speech recognition and speaker verification, to demonstrate the utility of the learned representation. In addition, we design a suite of fine-grained tasks, including frame classification, segment classification, fundamental frequency tracking, and duration prediction, to probe the phonetic and prosodic content of the representation. The three views of the APC objective welcome various generalizations and algorithms to learn speech representations. Probing on the suite of fine-grained tasks suggests that APC makes a wide range of high-level speech information accessible in its learned representation. |
doi_str_mv | 10.1109/JSTSP.2022.3203608 |
format | article |
coden | IJSTGY |
collection | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE All-Society Periodicals Package (ASPP) 1998-Present; IEEE/IET Electronic Library (IEL); CrossRef; Electronics & Communications Abstracts; Linguistics and Language Behavior Abstracts (LLBA); Technology Research Database; Aerospace Database; Advanced Technologies Database with Aerospace |
creationdate | 2022-10-01 |
eissn | 1941-0484 |
ieee_id | 9874771 |
linktohtml | https://ieeexplore.ieee.org/document/9874771 |
orcid | 0000-0001-9451-7956; 0000-0002-3786-3251; 0000-0002-3097-360X; 0000-0002-2445-2605 |
pqid | 2726109241 |
publisher | New York: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
toplevel | peer_reviewed; online_resources |
tpages | 11 |
fulltext | fulltext |
identifier | ISSN: 1932-4553 |
ispartof | IEEE journal of selected topics in signal processing, 2022-10, Vol.16 (6), p.1380-1390 |
issn | 1932-4553 (ISSN); 1941-0484 (EISSN) |
language | eng |
recordid | cdi_ieee_primary_9874771 |
source | Linguistics and Language Behavior Abstracts (LLBA); IEEE Xplore (Online service) |
subjects | Algorithms; Automatic speech recognition; Autoregressive processes; Classification; Coding; Fundamental frequency; Linguistics; Phonetics; Predictive coding; Prosody; Representation learning; Representations; Resonant frequencies; Self-supervised learning; Speaker identification; speaker verification; Speech recognition; Voice recognition |
title | Autoregressive Predictive Coding: A Comprehensive Study |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T22%3A00%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Autoregressive%20Predictive%20Coding:%20A%20Comprehensive%20Study&rft.jtitle=IEEE%20journal%20of%20selected%20topics%20in%20signal%20processing&rft.au=Yang,%20Gene-Ping&rft.date=2022-10-01&rft.volume=16&rft.issue=6&rft.spage=1380&rft.epage=1390&rft.pages=1380-1390&rft.issn=1932-4553&rft.eissn=1941-0484&rft.coden=IJSTGY&rft_id=info:doi/10.1109/JSTSP.2022.3203608&rft_dat=%3Cproquest_ieee_%3E2726109241%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c295t-2deac580b225648c092f19784a97376022a019bd42e20aeddf9e851ba0f7e0d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2726109241&rft_id=info:pmid/&rft_ieee_id=9874771&rfr_iscdi=true |