
Normalized training for HMM-based visual speech recognition


Bibliographic Details
Main Authors: Nankaku, Y., Tokuda, K., Kitamura, T., Kobayashi, T.
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
container_end_page 237 vol.3
container_start_page 234
container_title Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101)
container_volume 3
creator Nankaku, Y.
Tokuda, K.
Kitamura, T.
Kobayashi, T.
description This paper presents an approach to estimating the parameters of continuous density HMMs for visual speech recognition. One of the key issues of image-based visual speech recognition is normalization of lip location and lighting conditions prior to estimating the parameters of HMMs. We presented a normalized training method in which the normalization process is integrated in the model training. This paper extends it for contrast normalization in addition to average-intensity and location normalization. The proposed method provides a theoretically-well-defined algorithm based on a maximum likelihood formulation, hence the likelihood for the training data is guaranteed to increase at each iteration of the normalized training. Experiments on the M2VTS database show that the recognition performance can be significantly improved by the normalized training.
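The abstract describes estimating per-utterance normalization parameters jointly with the HMM parameters under a single maximum-likelihood criterion, so each iteration of generalized EM cannot decrease the training-data likelihood. The sketch below is not the paper's algorithm: it replaces the HMM emission densities with a 2-component Gaussian mixture and models only an additive per-sequence intensity bias, but it exhibits the same structure (E-step on normalized frames, then coordinate-wise M-step over biases and model parameters) and the same monotone-likelihood guarantee.

```python
import numpy as np

def normalized_training(sequences, n_iter=20):
    """Toy ML normalized training: a per-sequence additive bias b[r] is
    estimated jointly with a 2-component Gaussian mixture (a stand-in for
    HMM emission states) by generalized EM. Returns the model, the biases,
    and the per-iteration log-likelihoods, which are non-decreasing."""
    x_all = np.concatenate(sequences)
    mu = x_all.mean() + np.array([-1.0, 1.0]) * x_all.std()
    var = np.full(2, x_all.var() + 1e-6)
    w = np.array([0.5, 0.5])
    b = np.zeros(len(sequences))          # per-sequence intensity biases
    history = []
    for _ in range(n_iter):
        # E-step: component responsibilities for each normalized frame.
        gammas, ll = [], 0.0
        for r, x in enumerate(sequences):
            y = x - b[r]
            log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                     - 0.5 * (y[:, None] - mu) ** 2 / var)
            m = log_p.max(axis=1, keepdims=True)
            log_norm = m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))
            ll += log_norm.sum()
            gammas.append(np.exp(log_p - log_norm[:, None]))
        history.append(ll)
        # M-step, part 1: re-estimate each bias given the current Gaussians
        # (the exact maximizer of the EM auxiliary function w.r.t. b[r]).
        for r, x in enumerate(sequences):
            g = gammas[r]
            b[r] = (g * (x[:, None] - mu) / var).sum() / (g / var).sum()
        # M-step, part 2: re-estimate the mixture from normalized frames.
        y_all = np.concatenate([x - b[r] for r, x in enumerate(sequences)])
        g_all = np.concatenate(gammas)
        n_k = g_all.sum(axis=0)
        w = n_k / n_k.sum()
        mu = (g_all * y_all[:, None]).sum(axis=0) / n_k
        var = (g_all * (y_all[:, None] - mu) ** 2).sum(axis=0) / n_k + 1e-6
    return mu, var, w, b, history
```

Because each M-step sub-update exactly maximizes the EM auxiliary function over its own block of parameters with the responsibilities held fixed, this is a valid generalized EM procedure, which is the source of the "likelihood guaranteed to increase at each iteration" property the abstract claims for the full HMM version.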
doi_str_mv 10.1109/ICIP.2000.899338
format conference_proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 1522-4880
ispartof Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101), 2000, Vol.3, p.234-237 vol.3
issn 1522-4880
2381-8549
language eng
recordid cdi_ieee_primary_899338
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Data mining
Hidden Markov models
Lips
Maximum likelihood estimation
Mouth
Pixel
Speech recognition
Training data
Vectors
title Normalized training for HMM-based visual speech recognition
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T15%3A23%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Normalized%20training%20for%20HMM-based%20visual%20speech%20recognition&rft.btitle=Proceedings%202000%20International%20Conference%20on%20Image%20Processing%20(Cat.%20No.00CH37101)&rft.au=Nankaku,%20Y.&rft.date=2000&rft.volume=3&rft.spage=234&rft.epage=237%20vol.3&rft.pages=234-237%20vol.3&rft.issn=1522-4880&rft.eissn=2381-8549&rft.isbn=0780362977&rft.isbn_list=9780780362970&rft_id=info:doi/10.1109/ICIP.2000.899338&rft_dat=%3Cieee_6IE%3E899338%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i104t-1e9fd208f0e675f7e332968aa813e90edd5fdf3f322b096e28e004cc5caa0b753%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=899338&rfr_iscdi=true