
A Two-Stage Approach for Word Spotting in Graphical Documents

Presence of multi-oriented characters, connected characters with graphical lines, intersection of text and symbols with graphical lines/curves etc. are very common in graphical documents. As a result word spotting in graphical documents is still a challenging task that we try to solve (partially) in...

Bibliographic Details
Main Authors: Tarafdar, Arundhati; Pal, Umapada; Roy, Partha Pratim; Ragot, Nicolas; Ramel, Jean-Yves
Format: Conference Proceeding
Language: English
container_end_page 323
container_start_page 319
creator Tarafdar, Arundhati
Pal, Umapada
Roy, Partha Pratim
Ragot, Nicolas
Ramel, Jean-Yves
description Presence of multi-oriented characters, connected characters with graphical lines, intersection of text and symbols with graphical lines/curves etc. are very common in graphical documents. As a result word spotting in graphical documents is still a challenging task that we try to solve (partially) in this paper. The proposed approach proceeds in two stages. In the first stage, recognition of isolated components is done using rotation invariant features and an SVM classifier. The characters having good recognition score and match in the query string are first selected for initial spotting. Because of structural complexity of graphical documents as well as of touching components, we may miss some of the query characters during initial spotting in some documents. In that case, based on the position, size and orientation of the recognized characters in the input document image, regions where missing characters may be located (candidate regions) are defined. In the second stage, Scale Invariant Feature Transform (SIFT) is used to find those missing characters in the candidate regions for possible spotting. Finally, using the position, size, orientation as well as intercharacter gap information of the recognized components, spotting is validated. Experimental results demonstrate that the method is efficient to locate a query word in multi-oriented and/or touching graphical documents.
doi_str_mv 10.1109/ICDAR.2013.71
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_6628636</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6628636</ieee_id><sourcerecordid>6628636</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-df42cbe417e6fd10dd906e06c9fda624531722a034946d5b210ccf7d115391033</originalsourceid><addsrcrecordid>eNotj8tKw0AUQEdRsNYuXbmZH5h47zw7CxchrbVQEGzFZZnOo420SZhExL9X0dXZHA4cQm4RCkSw98tqVr4UHFAUBs_IxJopGG2VtNbqczLiwljGUcIFGaHiwJTQ4opc9_07AP5KI_JQ0s1ny9aD20dadl1unT_Q1Gb61uZA1107DHWzp3VDF9l1h9q7I521_uMUm6G_IZfJHfs4-eeYvD7ON9UTWz0vllW5YjUaNbCQJPe7KNFEnQJCCBZ0BO1tCk5zqQQazh0IaaUOascRvE8mICphEYQYk7u_bh1j3Ha5Prn8tdWaT_XP0jd6NEg5</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A Two-Stage Approach for Word Spotting in Graphical Documents</title><source>IEEE Xplore All Conference Series</source><creator>Tarafdar, Arundhati ; Pal, Umapada ; Roy, Partha Pratim ; Ragot, Nicolas ; Ramel, Jean-Yves</creator><creatorcontrib>Tarafdar, Arundhati ; Pal, Umapada ; Roy, Partha Pratim ; Ragot, Nicolas ; Ramel, Jean-Yves</creatorcontrib><description>Presence of multi-oriented characters, connected characters with graphical lines, intersection of text and symbols with graphical lines/curves etc. are very common in graphical documents. As a result word spotting in graphical documents is still a challenging task that we try to solve (partially) in this paper. The proposed approach proceeds in two stages. In the first stage, recognition of isolated components is done using rotation invariant features and an SVM classifier. The characters having good recognition score and match in the query string are first selected for initial spotting. 
Because of structural complexity of graphical documents as well as of touching components, we may miss some of the query characters during initial spotting in some documents. In that case, based on the position, size and orientation of the recognized characters in the input document image, regions where missing characters may be located (candidate regions) are defined. In the second stage, Scale Invariant Feature Transform (SIFT) is used to find those missing characters in the candidate regions for possible spotting. Finally, using the position, size, orientation as well as intercharacter gap information of the recognized components, spotting is validated. Experimental results demonstrate that the method is efficient to locate a query word in multi-oriented and/or touching graphical documents.</description><identifier>ISSN: 1520-5363</identifier><identifier>EISSN: 2379-2140</identifier><identifier>EISBN: 9780769549996</identifier><identifier>EISBN: 0769549993</identifier><identifier>DOI: 10.1109/ICDAR.2013.71</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Character recognition ; Document Image Analysis ; Feature extraction ; Graphical documents ; Information Retrieval ; Shape ; SIFT ; Support vector machines ; Text analysis ; Text recognition ; Word Spotting</subject><ispartof>2013 12th International Conference on Document Analysis and Recognition, 2013, 
p.319-323</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6628636$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6628636$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Tarafdar, Arundhati</creatorcontrib><creatorcontrib>Pal, Umapada</creatorcontrib><creatorcontrib>Roy, Partha Pratim</creatorcontrib><creatorcontrib>Ragot, Nicolas</creatorcontrib><creatorcontrib>Ramel, Jean-Yves</creatorcontrib><title>A Two-Stage Approach for Word Spotting in Graphical Documents</title><title>2013 12th International Conference on Document Analysis and Recognition</title><addtitle>icdar</addtitle><description>Presence of multi-oriented characters, connected characters with graphical lines, intersection of text and symbols with graphical lines/curves etc. are very common in graphical documents. As a result word spotting in graphical documents is still a challenging task that we try to solve (partially) in this paper. The proposed approach proceeds in two stages. In the first stage, recognition of isolated components is done using rotation invariant features and an SVM classifier. The characters having good recognition score and match in the query string are first selected for initial spotting. Because of structural complexity of graphical documents as well as of touching components, we may miss some of the query characters during initial spotting in some documents. In that case, based on the position, size and orientation of the recognized characters in the input document image, regions where missing characters may be located (candidate regions) are defined. 
In the second stage, Scale Invariant Feature Transform (SIFT) is used to find those missing characters in the candidate regions for possible spotting. Finally, using the position, size, orientation as well as intercharacter gap information of the recognized components, spotting is validated. Experimental results demonstrate that the method is efficient to locate a query word in multi-oriented and/or touching graphical documents.</description><subject>Character recognition</subject><subject>Document Image Analysis</subject><subject>Feature extraction</subject><subject>Graphical documents</subject><subject>Information Retrieval</subject><subject>Shape</subject><subject>SIFT</subject><subject>Support vector machines</subject><subject>Text analysis</subject><subject>Text recognition</subject><subject>Word Spotting</subject><issn>1520-5363</issn><issn>2379-2140</issn><isbn>9780769549996</isbn><isbn>0769549993</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2013</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj8tKw0AUQEdRsNYuXbmZH5h47zw7CxchrbVQEGzFZZnOo420SZhExL9X0dXZHA4cQm4RCkSw98tqVr4UHFAUBs_IxJopGG2VtNbqczLiwljGUcIFGaHiwJTQ4opc9_07AP5KI_JQ0s1ny9aD20dadl1unT_Q1Gb61uZA1107DHWzp3VDF9l1h9q7I521_uMUm6G_IZfJHfs4-eeYvD7ON9UTWz0vllW5YjUaNbCQJPe7KNFEnQJCCBZ0BO1tCk5zqQQazh0IaaUOascRvE8mICphEYQYk7u_bh1j3Ha5Prn8tdWaT_XP0jd6NEg5</recordid><startdate>201308</startdate><enddate>201308</enddate><creator>Tarafdar, Arundhati</creator><creator>Pal, Umapada</creator><creator>Roy, Partha Pratim</creator><creator>Ragot, Nicolas</creator><creator>Ramel, Jean-Yves</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201308</creationdate><title>A Two-Stage Approach for Word Spotting in Graphical Documents</title><author>Tarafdar, Arundhati ; Pal, Umapada ; Roy, Partha Pratim ; Ragot, Nicolas ; Ramel, 
Jean-Yves</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-df42cbe417e6fd10dd906e06c9fda624531722a034946d5b210ccf7d115391033</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Character recognition</topic><topic>Document Image Analysis</topic><topic>Feature extraction</topic><topic>Graphical documents</topic><topic>Information Retrieval</topic><topic>Shape</topic><topic>SIFT</topic><topic>Support vector machines</topic><topic>Text analysis</topic><topic>Text recognition</topic><topic>Word Spotting</topic><toplevel>online_resources</toplevel><creatorcontrib>Tarafdar, Arundhati</creatorcontrib><creatorcontrib>Pal, Umapada</creatorcontrib><creatorcontrib>Roy, Partha Pratim</creatorcontrib><creatorcontrib>Ragot, Nicolas</creatorcontrib><creatorcontrib>Ramel, Jean-Yves</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Tarafdar, Arundhati</au><au>Pal, Umapada</au><au>Roy, Partha Pratim</au><au>Ragot, Nicolas</au><au>Ramel, Jean-Yves</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A Two-Stage Approach for Word Spotting in Graphical Documents</atitle><btitle>2013 12th International Conference on Document Analysis and 
Recognition</btitle><stitle>icdar</stitle><date>2013-08</date><risdate>2013</risdate><spage>319</spage><epage>323</epage><pages>319-323</pages><issn>1520-5363</issn><eissn>2379-2140</eissn><eisbn>9780769549996</eisbn><eisbn>0769549993</eisbn><coden>IEEPAD</coden><abstract>Presence of multi-oriented characters, connected characters with graphical lines, intersection of text and symbols with graphical lines/curves etc. are very common in graphical documents. As a result word spotting in graphical documents is still a challenging task that we try to solve (partially) in this paper. The proposed approach proceeds in two stages. In the first stage, recognition of isolated components is done using rotation invariant features and an SVM classifier. The characters having good recognition score and match in the query string are first selected for initial spotting. Because of structural complexity of graphical documents as well as of touching components, we may miss some of the query characters during initial spotting in some documents. In that case, based on the position, size and orientation of the recognized characters in the input document image, regions where missing characters may be located (candidate regions) are defined. In the second stage, Scale Invariant Feature Transform (SIFT) is used to find those missing characters in the candidate regions for possible spotting. Finally, using the position, size, orientation as well as intercharacter gap information of the recognized components, spotting is validated. Experimental results demonstrate that the method is efficient to locate a query word in multi-oriented and/or touching graphical documents.</abstract><pub>IEEE</pub><doi>10.1109/ICDAR.2013.71</doi><tpages>5</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1520-5363
ispartof 2013 12th International Conference on Document Analysis and Recognition, 2013, p.319-323
issn 1520-5363
2379-2140
language eng
recordid cdi_ieee_primary_6628636
source IEEE Xplore All Conference Series
subjects Character recognition
Document Image Analysis
Feature extraction
Graphical documents
Information Retrieval
Shape
SIFT
Support vector machines
Text analysis
Text recognition
Word Spotting
title A Two-Stage Approach for Word Spotting in Graphical Documents
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T19%3A03%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Two-Stage%20Approach%20for%20Word%20Spotting%20in%20Graphical%20Documents&rft.btitle=2013%2012th%20International%20Conference%20on%20Document%20Analysis%20and%20Recognition&rft.au=Tarafdar,%20Arundhati&rft.date=2013-08&rft.spage=319&rft.epage=323&rft.pages=319-323&rft.issn=1520-5363&rft.eissn=2379-2140&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICDAR.2013.71&rft.eisbn=9780769549996&rft.eisbn_list=0769549993&rft_dat=%3Cieee_CHZPO%3E6628636%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-df42cbe417e6fd10dd906e06c9fda624531722a034946d5b210ccf7d115391033%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6628636&rfr_iscdi=true
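
The description field above says that regions where missing query characters may lie are derived from the position, size and orientation of the characters already recognized. The record gives no formulas, so the following is only a minimal sketch of one way such a candidate region could be estimated, assuming the recognized characters lie roughly on a straight line and are evenly spaced. The names `Spotted` and `candidate_region` are hypothetical, and the linear interpolation is an assumption made for illustration, not the authors' method. The SVM recognition and SIFT matching stages are not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Spotted:
    """A recognized character (hypothetical structure for illustration)."""
    index: int    # position of the character within the query string
    x: float      # centre x in the document image
    y: float      # centre y in the document image
    size: float   # approximate character size in pixels


def candidate_region(spotted, missing_index):
    """Estimate where a missing query character may lie.

    Assumes evenly spaced characters along a straight line (an assumption,
    not taken from the paper). Returns (cx, cy, size, angle_deg): the
    estimated centre, the mean size of the recognized characters, and the
    word orientation in degrees.
    """
    if len(spotted) < 2:
        raise ValueError("need at least two recognized characters")
    ordered = sorted(spotted, key=lambda s: s.index)
    first, last = ordered[0], ordered[-1]
    span = last.index - first.index
    if span == 0:
        raise ValueError("recognized characters must have distinct indices")
    # Displacement per character position along the word direction.
    dx = (last.x - first.x) / span
    dy = (last.y - first.y) / span
    steps = missing_index - first.index
    cx = first.x + steps * dx
    cy = first.y + steps * dy
    mean_size = sum(s.size for s in ordered) / len(ordered)
    angle_deg = math.degrees(math.atan2(dy, dx))
    return cx, cy, mean_size, angle_deg


if __name__ == "__main__":
    found = [Spotted(0, 10.0, 10.0, 8.0), Spotted(2, 30.0, 30.0, 12.0)]
    print(candidate_region(found, 1))
```

Because the step vector is computed per character position, the same function extrapolates beyond the recognized characters (for example, a missing first or last letter) as well as interpolating between them.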