Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images
Although a large number of methods for video text detection and recognition have been proposed over the past years, it is hard to find the best state-of-the-art method because of nonavailability of standard datasets, ground truth, and common evaluation measures. Therefore, in this paper, we propose...
Published in: | IEEE transactions on circuits and systems for video technology 2014-08, Vol.24 (8), p.1277-1287 |
---|---|
Main Authors: | Trung Quy Phan ; Shivakumara, Palaiahnakote ; Bhowmick, Souvik ; Shimiao Li ; Chew Lim Tan ; Pal, Umapada |
Format: | Article |
Language: | English |
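Subjects: | Accuracy ; Applied sciences ; Automation ; Chinese Video text recognition ; Distortion ; Exact sciences and technology ; Frames ; Graphics ; Ground truth ; Ground truthing ; Indexes ; Information, signal and communications theory ; Optical character recognition software ; Orientation ; Pattern recognition ; Performance indices ; Recognition ; Signal processing ; Telecommunications and information theory ; Text recognition ; Texts ; Video text detection ; Video text recognition |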
cited_by | cdi_FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3 |
---|---|
cites | cdi_FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3 |
container_end_page | 1287 |
container_issue | 8 |
container_start_page | 1277 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 24 |
creator | Trung Quy Phan ; Shivakumara, Palaiahnakote ; Bhowmick, Souvik ; Shimiao Li ; Chew Lim Tan ; Pal, Umapada |
description | Although a large number of methods for video text detection and recognition have been proposed over the past years, it is hard to find the best state-of-the-art method because of nonavailability of standard datasets, ground truth, and common evaluation measures. Therefore, in this paper, we propose a semiautomatic system for ground truth generation for video text detection and recognition, which includes English and Chinese text of different orientation. The system has a facility to allow the user to manually correct the ground truth if the automatic method produces incorrect results. We propose eleven attributes at the word level, namely: line index, word index, coordinate values of bounding box, area, content, script type, orientation information, type of text (caption/scene), condition of text (distortion/distortion free), start frame, and end frame to evaluate the performance of the method. We also introduce a new dataset that consists of 466 video frames collected from TRECVID 2005 and 2006 databases. The video frames in our dataset contain both horizontal texts (278 frames: 181 with English texts and 97 with Chinese texts) and nonhorizontal texts (188 frames: 140 English and 48 Chinese). Furthermore, the performance of the proposed system is compared with existing text detection methods by calculating measures manually and automatically to show usefulness of our semiautomatic system. The ground truth and the semiautomatic system will be released to the public. |
doi_str_mv | 10.1109/TCSVT.2014.2305515 |
format | article |
coden | ITCTEM |
collection | IEEE All-Society Periodicals Package (ASPP) 2005-present ; IEEE All-Society Periodicals Package (ASPP) 1998-Present ; IEEE Electronic Library (IEL) ; Pascal-Francis ; CrossRef ; Computer and Information Systems Abstracts ; Electronics & Communications Abstracts ; Technology Research Database ; ProQuest Computer Science Collection ; Advanced Technologies Database with Aerospace ; Computer and Information Systems Abstracts – Academic ; Computer and Information Systems Abstracts Professional ; ANTE: Abstracts in New Technology & Engineering ; Engineering Research Database |
date | 2014-08-01 |
linktohtml | https://ieeexplore.ieee.org/document/6739120 |
publisher | New York, NY: IEEE |
rights | 2015 INIST-CNRS ; Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Aug 2014 |
toplevel | peer_reviewed ; online_resources |
tpages | 11 |
fulltext | fulltext |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2014-08, Vol.24 (8), p.1277-1287 |
issn | 1051-8215 ; 1558-2205 |
language | eng |
recordid | cdi_ieee_primary_6739120 |
source | IEEE Xplore (Online service) |
subjects | Accuracy ; Applied sciences ; Automation ; Chinese Video text recognition ; Distortion ; Exact sciences and technology ; Frames ; Graphics ; Ground truth ; Ground truthing ; Indexes ; Information, signal and communications theory ; Optical character recognition software ; Orientation ; Pattern recognition ; Performance indices ; Recognition ; Signal processing ; Telecommunications and information theory ; Text recognition ; Texts ; Video text detection ; Video text recognition |
title | Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T16%3A39%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semiautomatic%20Ground%20Truth%20Generation%20for%20Text%20Detection%20and%20Recognition%20in%20Video%20Images&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Trung%20Quy%20Phan&rft.date=2014-08-01&rft.volume=24&rft.issue=8&rft.spage=1277&rft.epage=1287&rft.pages=1277-1287&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2014.2305515&rft_dat=%3Cproquest_ieee_%3E3395811121%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c358t-972f35d3a18534deb69727a50371bf6d2adfc9198a456806790f9657ad5e3faf3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1551806404&rft_id=info:pmid/&rft_ieee_id=6739120&rfr_iscdi=true |