Normalized training for HMM-based visual speech recognition
This paper presents an approach to estimating the parameters of continuous density HMMs for visual speech recognition. One of the key issues of image-based visual speech recognition is normalization of lip location and lighting conditions prior to estimating the parameters of HMMs. We presented a normalized training method in which the normalization process is integrated in the model training. This paper extends it for contrast normalization in addition to average-intensity and location normalization. The proposed method provides a theoretically well-defined algorithm based on a maximum likelihood formulation, hence the likelihood for the training data is guaranteed to increase at each iteration of the normalized training. Experiments on the M2VTS database show that the recognition performance can be significantly improved by the normalized training.
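The paper's actual algorithm is not reproduced in this record, but the key property the abstract claims — normalization parameters estimated jointly with the model under a single maximum-likelihood objective, so the training-data likelihood never decreases across iterations — can be illustrated with a hypothetical toy sketch. Here a per-sequence additive bias (a stand-in for average-intensity normalization) and a single Gaussian are fit by coordinate ascent; all names and the model itself are illustrative assumptions, not the authors' method:

```python
import math
import random

random.seed(0)

# Toy data: each "utterance" shares one underlying Gaussian source but
# carries its own additive offset (analogous to an average-intensity bias).
true_mu, true_sigma = 5.0, 1.0
offsets = [-2.0, 0.5, 3.0]
data = [[random.gauss(true_mu, true_sigma) + b for _ in range(200)]
        for b in offsets]

def log_likelihood(data, bias, mu, var):
    """Joint log-likelihood of all sequences after bias normalization."""
    ll = 0.0
    for seq, b in zip(data, bias):
        for x in seq:
            d = x - b - mu
            ll += -0.5 * (math.log(2 * math.pi * var) + d * d / var)
    return ll

# Coordinate ascent: alternately update the per-sequence bias terms and the
# Gaussian parameters. Each step maximizes the SAME joint likelihood over a
# subset of variables, so the recorded likelihood is non-decreasing.
bias = [0.0] * len(data)
mu, var = 0.0, 1.0
history = []
for _ in range(10):
    # Normalization step: optimal bias for each sequence given the model.
    bias = [sum(seq) / len(seq) - mu for seq in data]
    # Identifiability: biases and mu trade off, so force them to sum to zero;
    # the removed shift is absorbed by the following mu update.
    shift = sum(bias) / len(bias)
    bias = [b - shift for b in bias]
    # Model step: re-estimate mu and var on the normalized observations.
    flat = [x - b for seq, b in zip(data, bias) for x in seq]
    mu = sum(flat) / len(flat)
    var = sum((x - mu) ** 2 for x in flat) / len(flat)
    history.append(log_likelihood(data, bias, mu, var))

# Monotone likelihood increase, as in any block-wise ML ascent.
assert all(b >= a - 1e-9 for a, b in zip(history, history[1:]))
```

The monotonicity follows because every update is an exact maximization of the shared objective over one block of variables — the same structural argument behind the guarantee stated in the abstract, though the paper's models (continuous-density HMMs with location, intensity, and contrast normalization) are far richer than this sketch.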
Main Authors: | Nankaku, Y.; Tokuda, K.; Kitamura, T.; Kobayashi, T.
---|---
Format: | Conference Proceeding
Language: | English
Subjects: | Data mining; Hidden Markov models; Lips; Maximum likelihood estimation; Mouth; Pixel; Speech recognition; Training data; Vectors
Field | Value
---|---
container_start_page | 234
container_end_page | 237
container_volume | 3
creator | Nankaku, Y.; Tokuda, K.; Kitamura, T.; Kobayashi, T.
description | This paper presents an approach to estimating the parameters of continuous density HMMs for visual speech recognition. One of the key issues of image-based visual speech recognition is normalization of lip location and lighting conditions prior to estimating the parameters of HMMs. We presented a normalized training method in which the normalization process is integrated in the model training. This paper extends it for contrast normalization in addition to average-intensity and location normalization. The proposed method provides a theoretically well-defined algorithm based on a maximum likelihood formulation, hence the likelihood for the training data is guaranteed to increase at each iteration of the normalized training. Experiments on the M2VTS database show that the recognition performance can be significantly improved by the normalized training.
doi | 10.1109/ICIP.2000.899338
format | conference_proceeding
identifier | ISSN: 1522-4880; EISSN: 2381-8549; ISBN: 0780362977, 9780780362970
ispartof | Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101), 2000, Vol. 3, pp. 234-237
language | eng
publisher | IEEE
source | IEEE Electronic Library (IEL) Conference Proceedings
subjects | Data mining; Hidden Markov models; Lips; Maximum likelihood estimation; Mouth; Pixel; Speech recognition; Training data; Vectors
title | Normalized training for HMM-based visual speech recognition