
Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images

Although a large number of methods for video text detection and recognition have been proposed over the past years, it is hard to find the best state-of-the-art method because of nonavailability of standard datasets, ground truth, and common evaluation measures. Therefore, in this paper, we propose...

Bibliographic Details
Published in: IEEE transactions on circuits and systems for video technology 2014-08, Vol.24 (8), p.1277-1287
Main Authors: Trung Quy Phan, Shivakumara, Palaiahnakote, Bhowmick, Souvik, Shimiao Li, Chew Lim Tan, Pal, Umapada
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3
cites cdi_FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3
container_end_page 1287
container_issue 8
container_start_page 1277
container_title IEEE transactions on circuits and systems for video technology
container_volume 24
creator Trung Quy Phan
Shivakumara, Palaiahnakote
Bhowmick, Souvik
Shimiao Li
Chew Lim Tan
Pal, Umapada
description Although a large number of methods for video text detection and recognition have been proposed over the past years, it is hard to find the best state-of-the-art method because of nonavailability of standard datasets, ground truth, and common evaluation measures. Therefore, in this paper, we propose a semiautomatic system for ground truth generation for video text detection and recognition, which includes English and Chinese text of different orientation. The system has a facility to allow the user to manually correct the ground truth if the automatic method produces incorrect results. We propose eleven attributes at the word level, namely: line index, word index, coordinate values of bounding box, area, content, script type, orientation information, type of text (caption/scene), condition of text (distortion/distortion free), start frame, and end frame to evaluate the performance of the method. We also introduce a new dataset that consists of 466 video frames collected from TRECVID 2005 and 2006 databases. The video frames in our dataset contain both horizontal texts (278 frames: 181 with English texts and 97 with Chinese texts) and nonhorizontal texts (188 frames: 140 English and 48 Chinese). Furthermore, the performance of the proposed system is compared with existing text detection methods by calculating measures manually and automatically to show usefulness of our semiautomatic system. The ground truth and the semiautomatic system will be released to the public.
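The eleven word-level attributes and the frame counts listed in the description can be illustrated with a short sketch. This is not the authors' format: the class name, field names, field types, and the corner-point layout of the bounding box are all assumptions made for illustration, and the shoelace formula is only one plausible way to derive the area attribute from a possibly rotated box.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class WordGroundTruth:
    """Hypothetical record holding the eleven word-level attributes."""
    line_index: int
    word_index: int
    bounding_box: List[Point]  # corner coordinates (layout is an assumption)
    area: float
    content: str
    script_type: str           # e.g. "English" or "Chinese"
    orientation: str           # e.g. "horizontal" or "nonhorizontal"
    text_type: str             # "caption" or "scene"
    condition: str             # "distortion" or "distortion free"
    start_frame: int
    end_frame: int


def polygon_area(corners: List[Point]) -> float:
    """Shoelace formula; works for rotated (nonhorizontal) boxes as well."""
    total = 0.0
    for i, (x1, y1) in enumerate(corners):
        x2, y2 = corners[(i + 1) % len(corners)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Frame counts exactly as stated in the description.
DATASET_FRAMES = {
    ("horizontal", "English"): 181,
    ("horizontal", "Chinese"): 97,
    ("nonhorizontal", "English"): 140,
    ("nonhorizontal", "Chinese"): 48,
}

# Example record with made-up values, for illustration only.
box = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
example = WordGroundTruth(
    line_index=1, word_index=1, bounding_box=box, area=polygon_area(box),
    content="NEWS", script_type="English", orientation="horizontal",
    text_type="caption", condition="distortion free",
    start_frame=10, end_frame=42,
)
```

Summing the four counts reproduces the stated totals: 278 horizontal frames, 188 nonhorizontal frames, and 466 frames overall.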
doi_str_mv 10.1109/TCSVT.2014.2305515
format article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_6739120</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6739120</ieee_id><sourcerecordid>3395811121</sourcerecordid><originalsourceid>FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3</originalsourceid><addsrcrecordid>eNpdkFtLxDAQhYsoeP0D-lIQwZeuuXSa5lFWXRcWBK3iW4npRLNsmzVpQf-92Qv74NPMnPnmMJwkOadkRCmRN9X45a0aMULzEeMEgMJeckQByowxAvuxJ0CzklE4TI5DmJNIlrk4St5fsLVq6F2reqvTiXdD16SVH_qvdIId-ii7LjXOpxX-9Okd9qjXkorcM2r32dn1bLv0zTbo0mmrPjGcJgdGLQKebetJ8vpwX40fs9nTZDq-nWWaQ9lnUjDDoeGKlsDzBj-KqAgFhAv6YYqGqcZoSWWpcihKUghJjCxAqAaQG2X4SXK98V169z1g6OvWBo2LherQDaGmUAhSSCryiF7-Q-du8F38LlJAo3tOVhTbUNq7EDyaeultq_xvTUm9Crteh12vwq63Ycejq621ClotjFedtmF3yUqRCwIychcbziLibl0ILikj_A_rg4cj</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1551806404</pqid></control><display><type>article</type><title>Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images</title><source>IEEE Xplore (Online service)</source><creator>Trung Quy Phan ; Shivakumara, Palaiahnakote ; Bhowmick, Souvik ; Shimiao Li ; Chew Lim Tan ; Pal, Umapada</creator><creatorcontrib>Trung Quy Phan ; Shivakumara, Palaiahnakote ; Bhowmick, Souvik ; Shimiao Li ; Chew Lim Tan ; Pal, Umapada</creatorcontrib><description>Although a large number of methods for video text detection and recognition have been proposed over the past years, it is hard to find the best state-of-the-art method because of nonavailability of standard datasets, ground truth, and common evaluation measures. Therefore, in this paper, we propose a semiautomatic system for ground truth generation for video text detection and recognition, which includes English and Chinese text of different orientation. 
The system has a facility to allow the user to manually correct the ground truth if the automatic method produces incorrect results. We propose eleven attributes at the word level, namely: line index, word index, coordinate values of bounding box, area, content, script type, orientation information, type of text (caption/scene), condition of text (distortion/distortion free), start frame, and end frame to evaluate the performance of the method. We also introduce a new dataset that consists of 466 video frames collected from TRECVID 2005 and 2006 databases. The video frames in our dataset contain both horizontal texts (278 frames: 181 with English texts and 97 with Chinese texts) and nonhorizontal texts (188 frames: 140 English and 48 Chinese). Furthermore, the performance of the proposed system is compared with existing text detection methods by calculating measures manually and automatically to show usefulness of our semiautomatic system. The ground truth and the semiautomatic system will be released to the public.</description><identifier>ISSN: 1051-8215</identifier><identifier>EISSN: 1558-2205</identifier><identifier>DOI: 10.1109/TCSVT.2014.2305515</identifier><identifier>CODEN: ITCTEM</identifier><language>eng</language><publisher>New York, NY: IEEE</publisher><subject>Accuracy ; Applied sciences ; Automation ; Chinese Video text recognition ; Distortion ; Exact sciences and technology ; Frames ; Graphics ; Ground truth ; Ground truthing ; Indexes ; Information, signal and communications theory ; Optical character recognition software ; Orientation ; Pattern recognition ; Performance indices ; Recognition ; Signal processing ; Telecommunications and information theory ; Text recognition ; Texts ; Video text detection ; Video text recognition</subject><ispartof>IEEE transactions on circuits and systems for video technology, 2014-08, Vol.24 (8), p.1277-1287</ispartof><rights>2015 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics 
Engineers, Inc. (IEEE) Aug 2014</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3</citedby><cites>FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6739120$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,54775</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=28747059$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Trung Quy Phan</creatorcontrib><creatorcontrib>Shivakumara, Palaiahnakote</creatorcontrib><creatorcontrib>Bhowmick, Souvik</creatorcontrib><creatorcontrib>Shimiao Li</creatorcontrib><creatorcontrib>Chew Lim Tan</creatorcontrib><creatorcontrib>Pal, Umapada</creatorcontrib><title>Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images</title><title>IEEE transactions on circuits and systems for video technology</title><addtitle>TCSVT</addtitle><description>Although a large number of methods for video text detection and recognition have been proposed over the past years, it is hard to find the best state-of-the-art method because of nonavailability of standard datasets, ground truth, and common evaluation measures. Therefore, in this paper, we propose a semiautomatic system for ground truth generation for video text detection and recognition, which includes English and Chinese text of different orientation. The system has a facility to allow the user to manually correct the ground truth if the automatic method produces incorrect results. 
We propose eleven attributes at the word level, namely: line index, word index, coordinate values of bounding box, area, content, script type, orientation information, type of text (caption/scene), condition of text (distortion/distortion free), start frame, and end frame to evaluate the performance of the method. We also introduce a new dataset that consists of 466 video frames collected from TRECVID 2005 and 2006 databases. The video frames in our dataset contain both horizontal texts (278 frames: 181 with English texts and 97 with Chinese texts) and nonhorizontal texts (188 frames: 140 English and 48 Chinese). Furthermore, the performance of the proposed system is compared with existing text detection methods by calculating measures manually and automatically to show usefulness of our semiautomatic system. The ground truth and the semiautomatic system will be released to the public.</description><subject>Accuracy</subject><subject>Applied sciences</subject><subject>Automation</subject><subject>Chinese Video text recognition</subject><subject>Distortion</subject><subject>Exact sciences and technology</subject><subject>Frames</subject><subject>Graphics</subject><subject>Ground truth</subject><subject>Ground truthing</subject><subject>Indexes</subject><subject>Information, signal and communications theory</subject><subject>Optical character recognition software</subject><subject>Orientation</subject><subject>Pattern recognition</subject><subject>Performance indices</subject><subject>Recognition</subject><subject>Signal processing</subject><subject>Telecommunications and information theory</subject><subject>Text recognition</subject><subject>Texts</subject><subject>Video text detection</subject><subject>Video text 
recognition</subject><issn>1051-8215</issn><issn>1558-2205</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNpdkFtLxDAQhYsoeP0D-lIQwZeuuXSa5lFWXRcWBK3iW4npRLNsmzVpQf-92Qv74NPMnPnmMJwkOadkRCmRN9X45a0aMULzEeMEgMJeckQByowxAvuxJ0CzklE4TI5DmJNIlrk4St5fsLVq6F2reqvTiXdD16SVH_qvdIId-ii7LjXOpxX-9Okd9qjXkorcM2r32dn1bLv0zTbo0mmrPjGcJgdGLQKebetJ8vpwX40fs9nTZDq-nWWaQ9lnUjDDoeGKlsDzBj-KqAgFhAv6YYqGqcZoSWWpcihKUghJjCxAqAaQG2X4SXK98V169z1g6OvWBo2LherQDaGmUAhSSCryiF7-Q-du8F38LlJAo3tOVhTbUNq7EDyaeultq_xvTUm9Crteh12vwq63Ycejq621ClotjFedtmF3yUqRCwIychcbziLibl0ILikj_A_rg4cj</recordid><startdate>20140801</startdate><enddate>20140801</enddate><creator>Trung Quy Phan</creator><creator>Shivakumara, Palaiahnakote</creator><creator>Bhowmick, Souvik</creator><creator>Shimiao Li</creator><creator>Chew Lim Tan</creator><creator>Pal, Umapada</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20140801</creationdate><title>Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images</title><author>Trung Quy Phan ; Shivakumara, Palaiahnakote ; Bhowmick, Souvik ; Shimiao Li ; Chew Lim Tan ; Pal, Umapada</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Accuracy</topic><topic>Applied sciences</topic><topic>Automation</topic><topic>Chinese Video text recognition</topic><topic>Distortion</topic><topic>Exact sciences and technology</topic><topic>Frames</topic><topic>Graphics</topic><topic>Ground truth</topic><topic>Ground truthing</topic><topic>Indexes</topic><topic>Information, signal and communications theory</topic><topic>Optical character recognition software</topic><topic>Orientation</topic><topic>Pattern recognition</topic><topic>Performance indices</topic><topic>Recognition</topic><topic>Signal processing</topic><topic>Telecommunications and information theory</topic><topic>Text recognition</topic><topic>Texts</topic><topic>Video text detection</topic><topic>Video text recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Trung Quy Phan</creatorcontrib><creatorcontrib>Shivakumara, Palaiahnakote</creatorcontrib><creatorcontrib>Bhowmick, Souvik</creatorcontrib><creatorcontrib>Shimiao Li</creatorcontrib><creatorcontrib>Chew Lim Tan</creatorcontrib><creatorcontrib>Pal, Umapada</creatorcontrib><collection>IEEE All-Society 
Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on circuits and systems for video technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Trung Quy Phan</au><au>Shivakumara, Palaiahnakote</au><au>Bhowmick, Souvik</au><au>Shimiao Li</au><au>Chew Lim Tan</au><au>Pal, Umapada</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images</atitle><jtitle>IEEE transactions on circuits and systems for video technology</jtitle><stitle>TCSVT</stitle><date>2014-08-01</date><risdate>2014</risdate><volume>24</volume><issue>8</issue><spage>1277</spage><epage>1287</epage><pages>1277-1287</pages><issn>1051-8215</issn><eissn>1558-2205</eissn><coden>ITCTEM</coden><abstract>Although a large number of methods for video text detection and recognition have been proposed over the past years, it is hard to find the best state-of-the-art method because of nonavailability of standard datasets, ground truth, and common evaluation measures. 
Therefore, in this paper, we propose a semiautomatic system for ground truth generation for video text detection and recognition, which includes English and Chinese text of different orientation. The system has a facility to allow the user to manually correct the ground truth if the automatic method produces incorrect results. We propose eleven attributes at the word level, namely: line index, word index, coordinate values of bounding box, area, content, script type, orientation information, type of text (caption/scene), condition of text (distortion/distortion free), start frame, and end frame to evaluate the performance of the method. We also introduce a new dataset that consists of 466 video frames collected from TRECVID 2005 and 2006 databases. The video frames in our dataset contain both horizontal texts (278 frames: 181 with English texts and 97 with Chinese texts) and nonhorizontal texts (188 frames: 140 English and 48 Chinese). Furthermore, the performance of the proposed system is compared with existing text detection methods by calculating measures manually and automatically to show usefulness of our semiautomatic system. The ground truth and the semiautomatic system will be released to the public.</abstract><cop>New York, NY</cop><pub>IEEE</pub><doi>10.1109/TCSVT.2014.2305515</doi><tpages>11</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1051-8215
ispartof IEEE transactions on circuits and systems for video technology, 2014-08, Vol.24 (8), p.1277-1287
issn 1051-8215
1558-2205
language eng
recordid cdi_ieee_primary_6739120
source IEEE Xplore (Online service)
subjects Accuracy
Applied sciences
Automation
Chinese Video text recognition
Distortion
Exact sciences and technology
Frames
Graphics
Ground truth
Ground truthing
Indexes
Information, signal and communications theory
Optical character recognition software
Orientation
Pattern recognition
Performance indices
Recognition
Signal processing
Telecommunications and information theory
Text recognition
Texts
Video text detection
Video text recognition
title Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T16%3A39%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semiautomatic%20Ground%20Truth%20Generation%20for%20Text%20Detection%20and%20Recognition%20in%20Video%20Images&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Trung%20Quy%20Phan&rft.date=2014-08-01&rft.volume=24&rft.issue=8&rft.spage=1277&rft.epage=1287&rft.pages=1277-1287&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2014.2305515&rft_dat=%3Cproquest_ieee_%3E3395811121%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1551806404&rft_id=info:pmid/&rft_ieee_id=6739120&rfr_iscdi=true