
Language model adaptation for video lectures transcription

Video lectures are currently being digitised all over the world for their enormous value as a reference resource. Many of these lectures are accompanied by slides, which offer a great opportunity to improve the performance of ASR systems. We propose a simple yet powerful extension to the linear interpolation of language models for adapting language models with slide information.


Bibliographic Details
Main Authors: Martinez-Villaronga, Adria, del Agua, Miguel A., Andres-Ferrer, Jesus, Juan, Alfons
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
container_end_page 8454
container_start_page 8450
creator Martinez-Villaronga, Adria
del Agua, Miguel A.
Andres-Ferrer, Jesus
Juan, Alfons
description Video lectures are currently being digitised all over the world for their enormous value as a reference resource. Many of these lectures are accompanied by slides, which offer a great opportunity to improve the performance of ASR systems. We propose a simple yet powerful extension to the linear interpolation of language models for adapting language models with slide information. Two types of slides are considered: correct slides, and slides automatically extracted from the videos by OCR. Furthermore, we compare both time-aligned and unaligned slides. Results show an improvement of up to 3.8 absolute WER points when using correct slides. Surprisingly, even with automatic slides of poor OCR quality, the ASR system still improves by up to 2.2 absolute WER points.
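The technique named in the abstract, linear interpolation of language models, can be sketched as follows. This is a minimal toy illustration with unigram models, not the authors' implementation; the corpora, the interpolation weight `lam`, and the `vocab_floor` smoothing constant are all assumptions made for the example.

```python
from collections import Counter

def unigram_lm(tokens):
    """Maximum-likelihood unigram model estimated from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_general, p_slides, lam, vocab_floor=1e-9):
    """Linear interpolation: P(w) = lam * P_general(w) + (1 - lam) * P_slides(w).

    Words unseen by one of the models get a tiny floor probability so the
    interpolated model still covers the union of both vocabularies.
    """
    vocab = set(p_general) | set(p_slides)
    return {w: lam * p_general.get(w, vocab_floor)
               + (1 - lam) * p_slides.get(w, vocab_floor)
            for w in vocab}

# Toy corpora: a "general" background corpus and the slide text of one lecture.
general = unigram_lm("the model of the data is a model".split())
slides = unigram_lm("hidden markov model acoustic model".split())

adapted = interpolate(general, slides, lam=0.7)
# In-domain slide words like "markov" now receive probability mass
# they would lack under the general model alone.
```

In the paper's setting the slide-side model would be estimated from the (correct or OCR-extracted) slide text of the lecture being transcribed, and the weight would be tuned on held-out data rather than fixed by hand.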
doi_str_mv 10.1109/ICASSP.2013.6639314
format conference_proceeding
eisbn 1479903566
9781479903566
publisher IEEE
creationdate 2013-10-18
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, p.8450-8454
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_6639314
source IEEE Xplore All Conference Series
subjects Adaptation models
Computational modeling
Hidden Markov models
Interpolation
language model adaptation
Mathematical model
Optical character recognition software
video lectures
Vocabulary
title Language model adaptation for video lectures transcription