Normalized training for HMM-based visual speech recognition
This paper presents an approach to estimating the parameters of continuous density HMMs for visual speech recognition. One of the key issues of image-based visual speech recognition is normalization of lip location and lighting conditions prior to estimating the parameters of HMMs. We presented a normalized training method in which the normalization process is integrated in the model training. This paper extends it for contrast normalization in addition to average-intensity and location normalization. The proposed method provides a theoretically well-defined algorithm based on a maximum likelihood formulation, hence the likelihood for the training data is guaranteed to increase at each iteration of the normalized training. Experiments on the M2VTS database show that the recognition performance can be significantly improved by the normalized training.
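The paper's actual algorithm is not reproduced in this record, but the key property the abstract claims — normalization parameters estimated jointly with the model under a single maximum-likelihood objective, so the training-data likelihood never decreases across iterations — can be illustrated with a hypothetical toy sketch. Here a per-sequence additive bias (a stand-in for average-intensity normalization) and a single Gaussian are fit by coordinate ascent; all names and the model itself are illustrative assumptions, not the authors' method:

```python
import math
import random

random.seed(0)

# Toy data: each "utterance" shares one underlying Gaussian source but
# carries its own additive offset (analogous to an average-intensity bias).
true_mu, true_sigma = 5.0, 1.0
offsets = [-2.0, 0.5, 3.0]
data = [[random.gauss(true_mu, true_sigma) + b for _ in range(200)]
        for b in offsets]

def log_likelihood(data, bias, mu, var):
    """Joint log-likelihood of all sequences after bias normalization."""
    ll = 0.0
    for seq, b in zip(data, bias):
        for x in seq:
            d = x - b - mu
            ll += -0.5 * (math.log(2 * math.pi * var) + d * d / var)
    return ll

# Coordinate ascent: alternately update the per-sequence bias terms and the
# Gaussian parameters. Each step maximizes the SAME joint likelihood over a
# subset of variables, so the recorded likelihood is non-decreasing.
bias = [0.0] * len(data)
mu, var = 0.0, 1.0
history = []
for _ in range(10):
    # Normalization step: optimal bias for each sequence given the model.
    bias = [sum(seq) / len(seq) - mu for seq in data]
    # Identifiability: biases and mu trade off, so force them to sum to zero;
    # the removed shift is absorbed by the following mu update.
    shift = sum(bias) / len(bias)
    bias = [b - shift for b in bias]
    # Model step: re-estimate mu and var on the normalized observations.
    flat = [x - b for seq, b in zip(data, bias) for x in seq]
    mu = sum(flat) / len(flat)
    var = sum((x - mu) ** 2 for x in flat) / len(flat)
    history.append(log_likelihood(data, bias, mu, var))

# Monotone likelihood increase, as in any block-wise ML ascent.
assert all(b >= a - 1e-9 for a, b in zip(history, history[1:]))
```

The monotonicity follows because every update is an exact maximization of the shared objective over one block of variables — the same structural argument behind the guarantee stated in the abstract, though the paper's models (continuous-density HMMs with location, intensity, and contrast normalization) are far richer than this sketch.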
Main Authors: | Nankaku, Y.; Tokuda, K.; Kitamura, T.; Kobayashi, T.
---|---
Format: | Conference Proceeding
Language: | English
Subjects: | Data mining; Hidden Markov models; Lips; Maximum likelihood estimation; Mouth; Pixel; Speech recognition; Training data; Vectors
Field | Value
---|---
container_start_page | 234
container_end_page | 237
container_volume | 3
creator | Nankaku, Y.; Tokuda, K.; Kitamura, T.; Kobayashi, T.
description | This paper presents an approach to estimating the parameters of continuous density HMMs for visual speech recognition. One of the key issues of image-based visual speech recognition is normalization of lip location and lighting conditions prior to estimating the parameters of HMMs. We presented a normalized training method in which the normalization process is integrated in the model training. This paper extends it for contrast normalization in addition to average-intensity and location normalization. The proposed method provides a theoretically well-defined algorithm based on a maximum likelihood formulation, hence the likelihood for the training data is guaranteed to increase at each iteration of the normalized training. Experiments on the M2VTS database show that the recognition performance can be significantly improved by the normalized training.
doi | 10.1109/ICIP.2000.899338
format | conference_proceeding
identifier | ISSN: 1522-4880; EISSN: 2381-8549; ISBN: 0780362977, 9780780362970
ispartof | Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101), 2000, Vol. 3, pp. 234-237
language | eng
publisher | IEEE
source | IEEE Electronic Library (IEL) Conference Proceedings
subjects | Data mining; Hidden Markov models; Lips; Maximum likelihood estimation; Mouth; Pixel; Speech recognition; Training data; Vectors
title | Normalized training for HMM-based visual speech recognition