Language model adaptation for video lectures transcription
Video lectures are currently being digitised all over the world for their enormous value as a reference resource. Many of these lectures are accompanied by slides. The slides offer a great opportunity for improving ASR system performance. We propose a simple yet powerful extension to the linear interp...
Main Authors: | Martinez-Villaronga, Adria; del Agua, Miguel A.; Andres-Ferrer, Jesus; Juan, Alfons |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 8454 |
container_issue | |
container_start_page | 8450 |
container_title | |
container_volume | |
creator | Martinez-Villaronga, Adria; del Agua, Miguel A.; Andres-Ferrer, Jesus; Juan, Alfons |
description | Video lectures are currently being digitised all over the world for their enormous value as a reference resource. Many of these lectures are accompanied by slides. The slides offer a great opportunity for improving ASR system performance. We propose a simple yet powerful extension to the linear interpolation of language models for adapting language models with slide information. Two types of slides are considered: correct slides, and slides automatically extracted from the videos with OCR. Furthermore, we compare both time-aligned and unaligned slides. Results report an improvement of up to 3.8% absolute WER points when using correct slides. Surprisingly, when using automatic slides obtained with poor OCR quality, the ASR system still improves by up to 2.2 absolute WER points. |
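The description refers to linear interpolation of language models, the standard mixing technique the paper extends with slide information. As a rough illustration only (toy unigram models; the names `general_lm`, `slide_lm`, and `lam` are illustrative, not from the paper), the combination can be sketched as:

```python
# Sketch of linear language-model interpolation:
#   P(w) = lam * P_slide(w) + (1 - lam) * P_general(w)
# Real systems interpolate n-gram (or neural) models; unigrams keep the idea visible.

def interpolate(general_lm, slide_lm, lam):
    """Mix two word-probability dicts with weight lam on the slide model."""
    vocab = set(general_lm) | set(slide_lm)
    return {
        w: lam * slide_lm.get(w, 0.0) + (1.0 - lam) * general_lm.get(w, 0.0)
        for w in vocab
    }

# A generic lecture LM and a tiny LM estimated from the slide text.
general = {"the": 0.5, "model": 0.3, "slide": 0.2}
slides = {"model": 0.4, "slide": 0.6}

mixed = interpolate(general, slides, lam=0.3)
# The mixture is still a distribution because both components are.
assert abs(sum(mixed.values()) - 1.0) < 1e-9
```

In practice the weight `lam` would be tuned (e.g. to minimise perplexity on held-out lecture transcripts), and the slide model re-estimated per lecture or, for the time-aligned variant, per slide.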
doi_str_mv | 10.1109/ICASSP.2013.6639314 |
format | conference_proceeding |
fullrecord | 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2013-10-18, pp. 8450-8454. ISSN: 1520-6149; EISSN: 2379-190X; EISBN: 1479903566, 9781479903566. DOI: 10.1109/ICASSP.2013.6639314. |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-6149 |
ispartof | 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, p.8450-8454 |
issn | 1520-6149 2379-190X |
language | eng |
recordid | cdi_ieee_primary_6639314 |
source | IEEE Xplore All Conference Series |
subjects | Adaptation models; Computational modeling; Hidden Markov models; Interpolation; language model adaptation; Mathematical model; Optical character recognition software; video lectures; Vocabulary |
title | Language model adaptation for video lectures transcription |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T23%3A04%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Language%20model%20adaptation%20for%20video%20lectures%20transcription&rft.btitle=2013%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Martinez-Villaronga,%20Adria&rft.date=2013-10-18&rft.spage=8450&rft.epage=8454&rft.pages=8450-8454&rft.issn=1520-6149&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2013.6639314&rft.eisbn=1479903566&rft.eisbn_list=9781479903566&rft_dat=%3Cieee_CHZPO%3E6639314%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i220t-677c5dfada96447fde9c46621378943d2e4c29a653938d3d4cf3c58d9fadfa793%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6639314&rfr_iscdi=true |