Language model adaptation for video lectures transcription
Video lectures are currently being digitised all over the world for their enormous value as a reference resource. Many of these lectures are accompanied by slides. The slides offer a great opportunity for improving ASR system performance. We propose a simple yet powerful extension to the linear interp...
Main Authors: | Martinez-Villaronga, Adria; del Agua, Miguel A.; Andres-Ferrer, Jesus; Juan, Alfons |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 8454 |
container_issue | |
container_start_page | 8450 |
container_title | |
container_volume | |
creator | Martinez-Villaronga, Adria; del Agua, Miguel A.; Andres-Ferrer, Jesus; Juan, Alfons |
description | Video lectures are currently being digitised all over the world for their enormous value as a reference resource. Many of these lectures are accompanied by slides. The slides offer a great opportunity for improving ASR system performance. We propose a simple yet powerful extension to the linear interpolation of language models for adapting language models with slide information. Two types of slides are considered: correct slides, and slides automatically extracted from the videos with OCR. Furthermore, we compare both time-aligned and unaligned slides. Results report an improvement of up to 3.8% absolute WER points when using correct slides. Surprisingly, when using automatic slides obtained with poor OCR quality, the ASR system still improves by up to 2.2 absolute WER points. |
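The description refers to linear interpolation of language models, the standard mixing technique the paper extends with slide information. As a rough illustration only (toy unigram models; the names `general_lm`, `slide_lm`, and `lam` are illustrative, not from the paper), the combination can be sketched as:

```python
# Sketch of linear language-model interpolation:
#   P(w) = lam * P_slide(w) + (1 - lam) * P_general(w)
# Real systems interpolate n-gram (or neural) models; unigrams keep the idea visible.

def interpolate(general_lm, slide_lm, lam):
    """Mix two word-probability dicts with weight lam on the slide model."""
    vocab = set(general_lm) | set(slide_lm)
    return {
        w: lam * slide_lm.get(w, 0.0) + (1.0 - lam) * general_lm.get(w, 0.0)
        for w in vocab
    }

# A generic lecture LM and a tiny LM estimated from the slide text.
general = {"the": 0.5, "model": 0.3, "slide": 0.2}
slides = {"model": 0.4, "slide": 0.6}

mixed = interpolate(general, slides, lam=0.3)
# The mixture is still a distribution because both components are.
assert abs(sum(mixed.values()) - 1.0) < 1e-9
```

In practice the weight `lam` would be tuned (e.g. to minimise perplexity on held-out lecture transcripts), and the slide model re-estimated per lecture or, for the time-aligned variant, per slide.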
doi_str_mv | 10.1109/ICASSP.2013.6639314 |
format | conference_proceeding |
fullrecord | 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2013-10-18, pp. 8450-8454. ISSN: 1520-6149; EISSN: 2379-190X; EISBN: 1479903566, 9781479903566. DOI: 10.1109/ICASSP.2013.6639314. |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-6149 |
ispartof | 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, p.8450-8454 |
issn | 1520-6149 2379-190X |
language | eng |
recordid | cdi_ieee_primary_6639314 |
source | IEEE Xplore All Conference Series |
subjects | Adaptation models; Computational modeling; Hidden Markov models; Interpolation; language model adaptation; Mathematical model; Optical character recognition software; video lectures; Vocabulary |
title | Language model adaptation for video lectures transcription |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T23%3A04%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Language%20model%20adaptation%20for%20video%20lectures%20transcription&rft.btitle=2013%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Martinez-Villaronga,%20Adria&rft.date=2013-10-18&rft.spage=8450&rft.epage=8454&rft.pages=8450-8454&rft.issn=1520-6149&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2013.6639314&rft.eisbn=1479903566&rft.eisbn_list=9781479903566&rft_dat=%3Cieee_CHZPO%3E6639314%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i220t-677c5dfada96447fde9c46621378943d2e4c29a653938d3d4cf3c58d9fadfa793%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6639314&rfr_iscdi=true |