Multi scale mirror connection based encoder decoder network for text localization

Bibliographic Details
Published in: Pattern recognition letters, 2020-07, Vol.135, p.64-71
Main Authors: Dutta, Kalpita, Bal, Malyaban, Basak, Arpita, Ghosh, Swarnendu, Das, Nibaran, Kundu, Mahantapas, Nasipuri, Mita
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053
cites cdi_FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053
container_end_page 71
container_issue
container_start_page 64
container_title Pattern recognition letters
container_volume 135
creator Dutta, Kalpita
Bal, Malyaban
Basak, Arpita
Ghosh, Swarnendu
Das, Nibaran
Kundu, Mahantapas
Nasipuri, Mita
description •An encoder-decoder architecture with mirror skip connections for text localization.
•Linear, parametric, and convolutional mirror skip connections are implemented.
•Three models with different kernel sizes are ensembled to capture multi-scale features.
•The proposed model outperforms state-of-the-art pixel-level classifiers.
•Datasets used: ICDAR 2003, ICDAR 2015, SVT and Total-Text.
Encoder-decoder models with multi-scale feature concatenation have become ubiquitous for natural scene segmentation tasks. This work proposes a similar model with improved mirror connections from the encoders to the decoder. Three types of mirror connection are demonstrated: linear, parametric and convolutional. Internal skip connections are also used to facilitate better gradient propagation within the encoder-decoder architecture. The model further includes an ensemble module that combines the outputs of models with different kernel sizes (3 × 3, 5 × 5 and 7 × 7) to capture multi-scale features for efficient detection. The model was tested on the ICDAR 2003, SVT, ICDAR 2015 and Total-Text datasets, where it outperformed other state-of-the-art encoder-decoder architectures for pixel-level classification.
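The abstract does not say how the ensemble module combines the outputs of the three models, so the following is only a minimal sketch under an assumed rule: average the per-pixel text probabilities and threshold the mean. The function name `ensemble_average`, the 0.5 threshold, and the toy probability maps are all hypothetical and are not taken from the paper.

```python
from typing import List

# A per-pixel map of text probabilities in [0, 1], stored as rows of floats.
ProbMap = List[List[float]]


def ensemble_average(maps: List[ProbMap], threshold: float = 0.5) -> List[List[int]]:
    """Average per-pixel probabilities from several models, then threshold.

    Assumed combination rule (plain averaging); the paper's actual
    ensemble module may work differently.
    """
    if not maps:
        raise ValueError("need at least one probability map")
    rows, cols = len(maps[0]), len(maps[0][0])
    for m in maps:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all maps must share the same shape")

    mask = []
    for i in range(rows):
        row = []
        for j in range(cols):
            mean = sum(m[i][j] for m in maps) / len(maps)
            row.append(1 if mean >= threshold else 0)  # 1 = text pixel
        mask.append(row)
    return mask


# Hypothetical outputs of the 3x3, 5x5 and 7x7 kernel models on a 2x2 image.
p3 = [[0.9, 0.2], [0.4, 0.7]]
p5 = [[0.8, 0.1], [0.6, 0.6]]
p7 = [[0.7, 0.3], [0.3, 0.8]]
mask = ensemble_average([p3, p5, p7])
```

Here `mask` marks a pixel as text when the mean probability across the three maps reaches the threshold. A weighted average or a learned fusion layer would be equally plausible readings of "ensemble module".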
doi_str_mv 10.1016/j.patrec.2020.04.002
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2438723780</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167865520301227</els_id><sourcerecordid>2438723780</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053</originalsourceid><addsrcrecordid>eNp9kEtLxDAUhYMoOI7-AxcB1615tWk3ggy-YEQEXYc0uYXUTjMmGV-_3gx17epsvnMu90PonJKSElpfDuVWpwCmZISRkoiSEHaAFrSRrJBciEO0yJgsmrqqjtFJjAMhpOZts0DPj7sxORyNHgFvXAg-YOOnCUxyfsKdjmAxTMZbCNjCnBOkTx_ecJ_hBF8Jjz733Y_ed07RUa_HCGd_uUSvtzcvq_ti_XT3sLpeF4ZzkQpNpaCaWyt533UGWk4bI0GDMJQRWdVC0tZqqTnroepqaJjpTSfqXnatIRVfoot5dxv8-w5iUoPfhSmfVEzw_DqXDcmUmCkTfIwBerUNbqPDt6JE7eWpQc3y1F6eIkJlebl2Ndcgf_DhIKhoXNYA1mU0Kevd_wO_4BN7Tg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2438723780</pqid></control><display><type>article</type><title>Multi scale mirror connection based encoder decoder network for text localization</title><source>ScienceDirect Journals</source><creator>Dutta, Kalpita ; Bal, Malyaban ; Basak, Arpita ; Ghosh, Swarnendu ; Das, Nibaran ; Kundu, Mahantapas ; Nasipuri, Mita</creator><creatorcontrib>Dutta, Kalpita ; Bal, Malyaban ; Basak, Arpita ; Ghosh, Swarnendu ; Das, Nibaran ; Kundu, Mahantapas ; Nasipuri, Mita</creatorcontrib><description>•An encoder decoder architecture with mirror skip connections for text localization•Linear, parametric, and convolutional mirror skip connections have been implemented.•Three models with different kernel sizes ensembled to capture multi scale features.•Proposed model beats state of the art pixel level classifiers.•Datasets used: ICDAR 2003, 2015, SVT and Total Text. Encoder decoder models with multi-scale feature concatenations have become ubiquitous for various natural scene segmentation tasks. 
In the current approach, a similar model with an improved mirror connection from encoders to decoder has been proposed. Three different types of mirror connections, namely, linear, parametric and convolutional, have been demonstrated in the proposed work. We have also implemented the use of internal skips to facilitate better gradient propagation within the encoder-decoder architecture. The proposed model also consists of an ensemble module that combines outputs from models with different kernel sizes, such as, 3 × 3, 5 × 5 and 7 × 7 to combine multi-scale features for efficient detections. The model was tested on the ICDAR 2003, SVT, ICDAR 2015 and the Total-Text dataset where it proved to be superior to other state of the art encoder-decoder architectures for pixel level classification.</description><identifier>ISSN: 0167-8655</identifier><identifier>EISSN: 1872-7344</identifier><identifier>DOI: 10.1016/j.patrec.2020.04.002</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Coders ; Encoder-Decoder ; Encoders-Decoders ; Localization ; Mirror skip ; Model testing ; Scene image ; Segmentation ; Skips</subject><ispartof>Pattern recognition letters, 2020-07, Vol.135, p.64-71</ispartof><rights>2020 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Jul 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053</citedby><cites>FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053</cites><orcidid>0000-0002-2426-9915</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail></links><search><creatorcontrib>Dutta, Kalpita</creatorcontrib><creatorcontrib>Bal, Malyaban</creatorcontrib><creatorcontrib>Basak, Arpita</creatorcontrib><creatorcontrib>Ghosh, Swarnendu</creatorcontrib><creatorcontrib>Das, Nibaran</creatorcontrib><creatorcontrib>Kundu, Mahantapas</creatorcontrib><creatorcontrib>Nasipuri, Mita</creatorcontrib><title>Multi scale mirror connection based encoder decoder network for text localization</title><title>Pattern recognition letters</title><description>•An encoder decoder architecture with mirror skip connections for text localization•Linear, parametric, and convolutional mirror skip connections have been implemented.•Three models with different kernel sizes ensembled to capture multi scale features.•Proposed model beats state of the art pixel level classifiers.•Datasets used: ICDAR 2003, 2015, SVT and Total Text. Encoder decoder models with multi-scale feature concatenations have become ubiquitous for various natural scene segmentation tasks. In the current approach, a similar model with an improved mirror connection from encoders to decoder has been proposed. Three different types of mirror connections, namely, linear, parametric and convolutional, have been demonstrated in the proposed work. We have also implemented the use of internal skips to facilitate better gradient propagation within the encoder-decoder architecture. 
The proposed model also consists of an ensemble module that combines outputs from models with different kernel sizes, such as, 3 × 3, 5 × 5 and 7 × 7 to combine multi-scale features for efficient detections. The model was tested on the ICDAR 2003, SVT, ICDAR 2015 and the Total-Text dataset where it proved to be superior to other state of the art encoder-decoder architectures for pixel level classification.</description><subject>Coders</subject><subject>Encoder-Decoder</subject><subject>Encoders-Decoders</subject><subject>Localization</subject><subject>Mirror skip</subject><subject>Model testing</subject><subject>Scene image</subject><subject>Segmentation</subject><subject>Skips</subject><issn>0167-8655</issn><issn>1872-7344</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLxDAUhYMoOI7-AxcB1615tWk3ggy-YEQEXYc0uYXUTjMmGV-_3gx17epsvnMu90PonJKSElpfDuVWpwCmZISRkoiSEHaAFrSRrJBciEO0yJgsmrqqjtFJjAMhpOZts0DPj7sxORyNHgFvXAg-YOOnCUxyfsKdjmAxTMZbCNjCnBOkTx_ecJ_hBF8Jjz733Y_ed07RUa_HCGd_uUSvtzcvq_ti_XT3sLpeF4ZzkQpNpaCaWyt533UGWk4bI0GDMJQRWdVC0tZqqTnroepqaJjpTSfqXnatIRVfoot5dxv8-w5iUoPfhSmfVEzw_DqXDcmUmCkTfIwBerUNbqPDt6JE7eWpQc3y1F6eIkJlebl2Ndcgf_DhIKhoXNYA1mU0Kevd_wO_4BN7Tg</recordid><startdate>202007</startdate><enddate>202007</enddate><creator>Dutta, Kalpita</creator><creator>Bal, Malyaban</creator><creator>Basak, Arpita</creator><creator>Ghosh, Swarnendu</creator><creator>Das, Nibaran</creator><creator>Kundu, Mahantapas</creator><creator>Nasipuri, Mita</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TK</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-2426-9915</orcidid></search><sort><creationdate>202007</creationdate><title>Multi scale mirror connection based encoder decoder network for text 
localization</title><author>Dutta, Kalpita ; Bal, Malyaban ; Basak, Arpita ; Ghosh, Swarnendu ; Das, Nibaran ; Kundu, Mahantapas ; Nasipuri, Mita</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Coders</topic><topic>Encoder-Decoder</topic><topic>Encoders-Decoders</topic><topic>Localization</topic><topic>Mirror skip</topic><topic>Model testing</topic><topic>Scene image</topic><topic>Segmentation</topic><topic>Skips</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dutta, Kalpita</creatorcontrib><creatorcontrib>Bal, Malyaban</creatorcontrib><creatorcontrib>Basak, Arpita</creatorcontrib><creatorcontrib>Ghosh, Swarnendu</creatorcontrib><creatorcontrib>Das, Nibaran</creatorcontrib><creatorcontrib>Kundu, Mahantapas</creatorcontrib><creatorcontrib>Nasipuri, Mita</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Pattern recognition letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dutta, Kalpita</au><au>Bal, Malyaban</au><au>Basak, Arpita</au><au>Ghosh, Swarnendu</au><au>Das, Nibaran</au><au>Kundu, Mahantapas</au><au>Nasipuri, Mita</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multi scale mirror connection based encoder 
decoder network for text localization</atitle><jtitle>Pattern recognition letters</jtitle><date>2020-07</date><risdate>2020</risdate><volume>135</volume><spage>64</spage><epage>71</epage><pages>64-71</pages><issn>0167-8655</issn><eissn>1872-7344</eissn><abstract>•An encoder decoder architecture with mirror skip connections for text localization•Linear, parametric, and convolutional mirror skip connections have been implemented.•Three models with different kernel sizes ensembled to capture multi scale features.•Proposed model beats state of the art pixel level classifiers.•Datasets used: ICDAR 2003, 2015, SVT and Total Text. Encoder decoder models with multi-scale feature concatenations have become ubiquitous for various natural scene segmentation tasks. In the current approach, a similar model with an improved mirror connection from encoders to decoder has been proposed. Three different types of mirror connections, namely, linear, parametric and convolutional, have been demonstrated in the proposed work. We have also implemented the use of internal skips to facilitate better gradient propagation within the encoder-decoder architecture. The proposed model also consists of an ensemble module that combines outputs from models with different kernel sizes, such as, 3 × 3, 5 × 5 and 7 × 7 to combine multi-scale features for efficient detections. The model was tested on the ICDAR 2003, SVT, ICDAR 2015 and the Total-Text dataset where it proved to be superior to other state of the art encoder-decoder architectures for pixel level classification.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.patrec.2020.04.002</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-2426-9915</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0167-8655
ispartof Pattern recognition letters, 2020-07, Vol.135, p.64-71
issn 0167-8655
1872-7344
language eng
recordid cdi_proquest_journals_2438723780
source ScienceDirect Journals
subjects Coders
Encoder-Decoder
Encoders-Decoders
Localization
Mirror skip
Model testing
Scene image
Segmentation
Skips
title Multi scale mirror connection based encoder decoder network for text localization
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-06T03%3A57%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi%20scale%20mirror%20connection%20based%20encoder%20decoder%20network%20for%20text%20localization&rft.jtitle=Pattern%20recognition%20letters&rft.au=Dutta,%20Kalpita&rft.date=2020-07&rft.volume=135&rft.spage=64&rft.epage=71&rft.pages=64-71&rft.issn=0167-8655&rft.eissn=1872-7344&rft_id=info:doi/10.1016/j.patrec.2020.04.002&rft_dat=%3Cproquest_cross%3E2438723780%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2438723780&rft_id=info:pmid/&rfr_iscdi=true