
A novel approach to isolated word recognition

A voice signal contains the psychological and physiological properties of the speaker as well as dialect differences, acoustical environment effects, and phase differences. For these reasons, the same word uttered by different speakers can be very different. In this paper, two theories are developed...

Bibliographic Details
Published in:	IEEE transactions on speech and audio processing 1999-11, Vol.7 (6), p.620-628
Main Authors:	Bilginer Gulmezoglu, M., Dzhafarov, V., Keskin, M., Barkana, A.
Format:	Article
Language:	English
Subjects:	Acoustic testing; Applied sciences; Cepstral analysis; Criteria; Exact sciences and technology; Information, signal and communications theory; Karhunen-Loeve transforms; Linear predictive coding; Loudspeakers; Mathematical analysis; Optimization; Projection; Psychology; Recognition; Signal processing; Speech; Speech processing; Speech recognition; Telecommunications and information theory; Time domain analysis; Training; Vectors; Vectors (mathematics); Working environment noise

container_end_page	628
container_issue	6
container_start_page	620
container_title	IEEE transactions on speech and audio processing
container_volume	7
creator	Bilginer Gulmezoglu, M.; Dzhafarov, V.; Keskin, M.; Barkana, A.
description	A voice signal contains the psychological and physiological properties of the speaker as well as dialect differences, acoustical environment effects, and phase differences. For these reasons, the same word uttered by different speakers can be very different. In this paper, two theories are developed by considering two optimization criteria applied to both the training set and the test set. The first theory is well known; it uses what is called Criterion 1 here and ends up with the average of all vectors belonging to the words in the training set. The second theory is a novel approach; it uses what is called Criterion 2 here to extract the common properties of all vectors belonging to the words in the training set. It is shown that Criterion 2 is superior to Criterion 1 on the training set. In Criterion 2, the individual differences are obtained by subtracting a reference vector from the other vectors, and the individual difference vectors are used to obtain an orthogonal vector basis by the Gram-Schmidt orthogonalization method. The common vector is obtained by subtracting from any vector of the training set its projections onto the orthogonal vectors. It is proved that this common vector is unique for any word class in the training set and independent of the chosen reference vector. This common vector is used in isolated word recognition, and it is also shown that Criterion 2 is superior to Criterion 1 for the test set. The theoretical and experimental study shows that the recognition rates increase as the number of speakers in the training set increases. This means that the common vector obtained from Criterion 2 represents the common properties of a spoken word better than the average vector obtained from Criterion 1.
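The Criterion 2 procedure summarized in the description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`orthonormal_basis`, `common_vector`), the tolerance `tol`, and the toy three-dimensional vectors are all hypothetical, and real feature vectors would come from a speech front end that the record does not specify. The sketch also assumes the difference vectors do not span the whole space, since otherwise the common vector collapses to zero.

```python
# Sketch of the common-vector construction described in the abstract.
# All names here are hypothetical; only the standard library is used.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

def scale(c, v):
    return [c * a for a in v]

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along each basis vector found so far.
            w = subtract(w, scale(dot(w, b), b))
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append(scale(1.0 / norm, w))
    return basis

def common_vector(training, ref_index=0, pick=0):
    """Subtract from training[pick] its projections onto the difference subspace."""
    ref = training[ref_index]
    # Individual differences: every other vector minus the reference vector.
    diffs = [subtract(v, ref) for i, v in enumerate(training) if i != ref_index]
    basis = orthonormal_basis(diffs)
    common = list(training[pick])
    for b in basis:
        common = subtract(common, scale(dot(training[pick], b), b))
    return common

# Toy training set for one word class (made-up numbers).
training = [[1.0, 2.0, 3.0], [2.0, 2.0, 3.0], [1.0, 3.0, 3.0]]
cv_a = common_vector(training, ref_index=0, pick=0)
cv_b = common_vector(training, ref_index=1, pick=2)
```

In this toy case both calls yield the same vector, which is consistent with the abstract's claim that the common vector does not depend on the chosen reference vector or on which training vector is projected.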
doi_str_mv	10.1109/89.799687
format	article
coden	IESPEJ
date	1999-11-01
eissn	1558-2353
publisher	New York, NY: IEEE
rights	2000 INIST-CNRS
tpages	9
linktohtml	https://ieeexplore.ieee.org/document/799687
backlink	http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=1180786 (View record in Pascal Francis)
fulltext	fulltext
identifier	ISSN: 1063-6676
ispartof	IEEE transactions on speech and audio processing, 1999-11, Vol.7 (6), p.620-628
issn	1063-6676 1558-2353
language	eng
recordid	cdi_ieee_primary_799687
source	IEEE Electronic Library (IEL) Journals
subjects	Acoustic testing; Applied sciences; Cepstral analysis; Criteria; Exact sciences and technology; Information, signal and communications theory; Karhunen-Loeve transforms; Linear predictive coding; Loudspeakers; Mathematical analysis; Optimization; Projection; Psychology; Recognition; Signal processing; Speech; Speech processing; Speech recognition; Telecommunications and information theory; Time domain analysis; Training; Vectors; Vectors (mathematics); Working environment noise
title	A novel approach to isolated word recognition
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T16%3A13%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20novel%20approach%20to%20isolated%20word%20recognition&rft.jtitle=IEEE%20transactions%20on%20speech%20and%20audio%20processing&rft.au=Bilginer%20Gulmezoglu,%20M.&rft.date=1999-11-01&rft.volume=7&rft.issue=6&rft.spage=620&rft.epage=628&rft.pages=620-628&rft.issn=1063-6676&rft.eissn=1558-2353&rft.coden=IESPEJ&rft_id=info:doi/10.1109/89.799687&rft_dat=%3Cproquest_ieee_%3E1671388414%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c405t-92aec178d27c6cd93ccf2346c0ca0824bf62504339672ec69ff4cff994f47dc43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1671388414&rft_id=info:pmid/&rft_ieee_id=799687&rfr_iscdi=true