Multi scale mirror connection based encoder decoder network for text localization
• An encoder decoder architecture with mirror skip connections for text localization.
• Linear, parametric, and convolutional mirror skip connections have been implemented.
• Three models with different kernel sizes ensembled to capture multi scale features.
• Proposed model beats state of the art pixel lev...
Published in: | Pattern recognition letters 2020-07, Vol.135, p.64-71 |
---|---|
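Main Authors: | Dutta, Kalpita ; Bal, Malyaban ; Basak, Arpita ; Ghosh, Swarnendu ; Das, Nibaran ; Kundu, Mahantapas ; Nasipuri, Mita |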
Format: | Article |
Language: | English |
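Subjects: | Coders ; Encoder-Decoder ; Encoders-Decoders ; Localization ; Mirror skip ; Model testing ; Scene image ; Segmentation ; Skips |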
cited_by | cdi_FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053 |
---|---|
cites | cdi_FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053 |
container_end_page | 71 |
container_issue | |
container_start_page | 64 |
container_title | Pattern recognition letters |
container_volume | 135 |
creator | Dutta, Kalpita ; Bal, Malyaban ; Basak, Arpita ; Ghosh, Swarnendu ; Das, Nibaran ; Kundu, Mahantapas ; Nasipuri, Mita |
description | • An encoder-decoder architecture with mirror skip connections for text localization. • Linear, parametric, and convolutional mirror skip connections have been implemented. • Three models with different kernel sizes are ensembled to capture multi-scale features. • The proposed model beats state-of-the-art pixel-level classifiers. • Datasets used: ICDAR 2003, 2015, SVT and Total-Text.
Encoder-decoder models with multi-scale feature concatenations have become ubiquitous for various natural scene segmentation tasks. This work proposes a similar model with an improved mirror connection from encoders to decoder. Three types of mirror connection, namely linear, parametric and convolutional, are demonstrated. Internal skips are also used to facilitate better gradient propagation within the encoder-decoder architecture. The proposed model further includes an ensemble module that combines outputs from models with different kernel sizes (3 × 3, 5 × 5 and 7 × 7) to capture multi-scale features for efficient detection. The model was tested on the ICDAR 2003, SVT, ICDAR 2015 and Total-Text datasets, where it proved superior to other state-of-the-art encoder-decoder architectures for pixel-level classification. |
doi_str_mv | 10.1016/j.patrec.2020.04.002 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2438723780</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167865520301227</els_id><sourcerecordid>2438723780</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053</originalsourceid><addsrcrecordid>eNp9kEtLxDAUhYMoOI7-AxcB1615tWk3ggy-YEQEXYc0uYXUTjMmGV-_3gx17epsvnMu90PonJKSElpfDuVWpwCmZISRkoiSEHaAFrSRrJBciEO0yJgsmrqqjtFJjAMhpOZts0DPj7sxORyNHgFvXAg-YOOnCUxyfsKdjmAxTMZbCNjCnBOkTx_ecJ_hBF8Jjz733Y_ed07RUa_HCGd_uUSvtzcvq_ti_XT3sLpeF4ZzkQpNpaCaWyt533UGWk4bI0GDMJQRWdVC0tZqqTnroepqaJjpTSfqXnatIRVfoot5dxv8-w5iUoPfhSmfVEzw_DqXDcmUmCkTfIwBerUNbqPDt6JE7eWpQc3y1F6eIkJlebl2Ndcgf_DhIKhoXNYA1mU0Kevd_wO_4BN7Tg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2438723780</pqid></control><display><type>article</type><title>Multi scale mirror connection based encoder decoder network for text localization</title><source>ScienceDirect Journals</source><creator>Dutta, Kalpita ; Bal, Malyaban ; Basak, Arpita ; Ghosh, Swarnendu ; Das, Nibaran ; Kundu, Mahantapas ; Nasipuri, Mita</creator><creatorcontrib>Dutta, Kalpita ; Bal, Malyaban ; Basak, Arpita ; Ghosh, Swarnendu ; Das, Nibaran ; Kundu, Mahantapas ; Nasipuri, Mita</creatorcontrib><description>•An encoder decoder architecture with mirror skip connections for text localization•Linear, parametric, and convolutional mirror skip connections have been implemented.•Three models with different kernel sizes ensembled to capture multi scale features.•Proposed model beats state of the art pixel level classifiers.•Datasets used: ICDAR 2003, 2015, SVT and Total Text.
Encoder decoder models with multi-scale feature concatenations have become ubiquitous for various natural scene segmentation tasks. In the current approach, a similar model with an improved mirror connection from encoders to decoder has been proposed. Three different types of mirror connections, namely, linear, parametric and convolutional, have been demonstrated in the proposed work. We have also implemented the use of internal skips to facilitate better gradient propagation within the encoder-decoder architecture. The proposed model also consists of an ensemble module that combines outputs from models with different kernel sizes, such as, 3 × 3, 5 × 5 and 7 × 7 to combine multi-scale features for efficient detections. The model was tested on the ICDAR 2003, SVT, ICDAR 2015 and the Total-Text dataset where it proved to be superior to other state of the art encoder-decoder architectures for pixel level classification.</description><identifier>ISSN: 0167-8655</identifier><identifier>EISSN: 1872-7344</identifier><identifier>DOI: 10.1016/j.patrec.2020.04.002</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Coders ; Encoder-Decoder ; Encoders-Decoders ; Localization ; Mirror skip ; Model testing ; Scene image ; Segmentation ; Skips</subject><ispartof>Pattern recognition letters, 2020-07, Vol.135, p.64-71</ispartof><rights>2020 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Jul 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053</citedby><cites>FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053</cites><orcidid>0000-0002-2426-9915</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail></links><search><creatorcontrib>Dutta, Kalpita</creatorcontrib><creatorcontrib>Bal, Malyaban</creatorcontrib><creatorcontrib>Basak, Arpita</creatorcontrib><creatorcontrib>Ghosh, Swarnendu</creatorcontrib><creatorcontrib>Das, Nibaran</creatorcontrib><creatorcontrib>Kundu, Mahantapas</creatorcontrib><creatorcontrib>Nasipuri, Mita</creatorcontrib><title>Multi scale mirror connection based encoder decoder network for text localization</title><title>Pattern recognition letters</title><description>•An encoder decoder architecture with mirror skip connections for text localization•Linear, parametric, and convolutional mirror skip connections have been implemented.•Three models with different kernel sizes ensembled to capture multi scale features.•Proposed model beats state of the art pixel level classifiers.•Datasets used: ICDAR 2003, 2015, SVT and Total Text.
Encoder decoder models with multi-scale feature concatenations have become ubiquitous for various natural scene segmentation tasks. In the current approach, a similar model with an improved mirror connection from encoders to decoder has been proposed. Three different types of mirror connections, namely, linear, parametric and convolutional, have been demonstrated in the proposed work. We have also implemented the use of internal skips to facilitate better gradient propagation within the encoder-decoder architecture. The proposed model also consists of an ensemble module that combines outputs from models with different kernel sizes, such as, 3 × 3, 5 × 5 and 7 × 7 to combine multi-scale features for efficient detections. The model was tested on the ICDAR 2003, SVT, ICDAR 2015 and the Total-Text dataset where it proved to be superior to other state of the art encoder-decoder architectures for pixel level classification.</description><subject>Coders</subject><subject>Encoder-Decoder</subject><subject>Encoders-Decoders</subject><subject>Localization</subject><subject>Mirror skip</subject><subject>Model testing</subject><subject>Scene image</subject><subject>Segmentation</subject><subject>Skips</subject><issn>0167-8655</issn><issn>1872-7344</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp9kEtLxDAUhYMoOI7-AxcB1615tWk3ggy-YEQEXYc0uYXUTjMmGV-_3gx17epsvnMu90PonJKSElpfDuVWpwCmZISRkoiSEHaAFrSRrJBciEO0yJgsmrqqjtFJjAMhpOZts0DPj7sxORyNHgFvXAg-YOOnCUxyfsKdjmAxTMZbCNjCnBOkTx_ecJ_hBF8Jjz733Y_ed07RUa_HCGd_uUSvtzcvq_ti_XT3sLpeF4ZzkQpNpaCaWyt533UGWk4bI0GDMJQRWdVC0tZqqTnroepqaJjpTSfqXnatIRVfoot5dxv8-w5iUoPfhSmfVEzw_DqXDcmUmCkTfIwBerUNbqPDt6JE7eWpQc3y1F6eIkJlebl2Ndcgf_DhIKhoXNYA1mU0Kevd_wO_4BN7Tg</recordid><startdate>202007</startdate><enddate>202007</enddate><creator>Dutta, Kalpita</creator><creator>Bal, Malyaban</creator><creator>Basak, Arpita</creator><creator>Ghosh, 
Swarnendu</creator><creator>Das, Nibaran</creator><creator>Kundu, Mahantapas</creator><creator>Nasipuri, Mita</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TK</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-2426-9915</orcidid></search><sort><creationdate>202007</creationdate><title>Multi scale mirror connection based encoder decoder network for text localization</title><author>Dutta, Kalpita ; Bal, Malyaban ; Basak, Arpita ; Ghosh, Swarnendu ; Das, Nibaran ; Kundu, Mahantapas ; Nasipuri, Mita</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Coders</topic><topic>Encoder-Decoder</topic><topic>Encoders-Decoders</topic><topic>Localization</topic><topic>Mirror skip</topic><topic>Model testing</topic><topic>Scene image</topic><topic>Segmentation</topic><topic>Skips</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dutta, Kalpita</creatorcontrib><creatorcontrib>Bal, Malyaban</creatorcontrib><creatorcontrib>Basak, Arpita</creatorcontrib><creatorcontrib>Ghosh, Swarnendu</creatorcontrib><creatorcontrib>Das, Nibaran</creatorcontrib><creatorcontrib>Kundu, Mahantapas</creatorcontrib><creatorcontrib>Nasipuri, Mita</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Pattern recognition letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dutta, Kalpita</au><au>Bal, Malyaban</au><au>Basak, Arpita</au><au>Ghosh, Swarnendu</au><au>Das, Nibaran</au><au>Kundu, Mahantapas</au><au>Nasipuri, Mita</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multi scale mirror connection based encoder decoder network for text localization</atitle><jtitle>Pattern recognition letters</jtitle><date>2020-07</date><risdate>2020</risdate><volume>135</volume><spage>64</spage><epage>71</epage><pages>64-71</pages><issn>0167-8655</issn><eissn>1872-7344</eissn><abstract>•An encoder decoder architecture with mirror skip connections for text localization•Linear, parametric, and convolutional mirror skip connections have been implemented.•Three models with different kernel sizes ensembled to capture multi scale features.•Proposed model beats state of the art pixel level classifiers.•Datasets used: ICDAR 2003, 2015, SVT and Total Text.
Encoder decoder models with multi-scale feature concatenations have become ubiquitous for various natural scene segmentation tasks. In the current approach, a similar model with an improved mirror connection from encoders to decoder has been proposed. Three different types of mirror connections, namely, linear, parametric and convolutional, have been demonstrated in the proposed work. We have also implemented the use of internal skips to facilitate better gradient propagation within the encoder-decoder architecture. The proposed model also consists of an ensemble module that combines outputs from models with different kernel sizes, such as, 3 × 3, 5 × 5 and 7 × 7 to combine multi-scale features for efficient detections. The model was tested on the ICDAR 2003, SVT, ICDAR 2015 and the Total-Text dataset where it proved to be superior to other state of the art encoder-decoder architectures for pixel level classification.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.patrec.2020.04.002</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-2426-9915</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0167-8655 |
ispartof | Pattern recognition letters, 2020-07, Vol.135, p.64-71 |
issn | 0167-8655 1872-7344 |
language | eng |
recordid | cdi_proquest_journals_2438723780 |
source | ScienceDirect Journals |
subjects | Coders ; Encoder-Decoder ; Encoders-Decoders ; Localization ; Mirror skip ; Model testing ; Scene image ; Segmentation ; Skips |
title | Multi scale mirror connection based encoder decoder network for text localization |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-06T03%3A57%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi%20scale%20mirror%20connection%20based%20encoder%20decoder%20network%20for%20text%20localization&rft.jtitle=Pattern%20recognition%20letters&rft.au=Dutta,%20Kalpita&rft.date=2020-07&rft.volume=135&rft.spage=64&rft.epage=71&rft.pages=64-71&rft.issn=0167-8655&rft.eissn=1872-7344&rft_id=info:doi/10.1016/j.patrec.2020.04.002&rft_dat=%3Cproquest_cross%3E2438723780%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c334t-a1741a3dd73fbbce9318c7eae4c1207564719da7a32fe5b6e82cfcb46f7b9c053%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2438723780&rft_id=info:pmid/&rfr_iscdi=true |
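
The abstract in this record mentions an ensemble module that combines the outputs of three models built with 3 × 3, 5 × 5 and 7 × 7 kernels. The record does not say how the outputs are combined, so the following is only a minimal sketch that assumes a plain per-pixel average followed by a 0.5 threshold. The function names `ensemble_average` and `to_mask`, the threshold, and the toy 2 × 2 probability maps are all hypothetical and are not taken from the article.

```python
# Illustrative sketch only. Averaging is an ASSUMED fusion rule;
# the catalog record does not state how the article fuses model outputs.
from typing import List

ProbMap = List[List[float]]  # per-pixel text probability, rows x cols


def ensemble_average(maps: List[ProbMap]) -> ProbMap:
    """Average per-pixel probabilities from several models of equal output shape."""
    if not maps:
        raise ValueError("need at least one probability map")
    rows, cols = len(maps[0]), len(maps[0][0])
    for m in maps:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all maps must share the same shape")
    return [
        [sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
        for r in range(rows)
    ]


def to_mask(prob: ProbMap, threshold: float = 0.5) -> List[List[int]]:
    """Binarize a probability map: 1 = text pixel, 0 = background."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


# Hypothetical outputs of the 3x3, 5x5 and 7x7 kernel models on a 2x2 image.
out_k3 = [[0.9, 0.2], [0.4, 0.1]]
out_k5 = [[0.8, 0.3], [0.6, 0.2]]
out_k7 = [[0.7, 0.1], [0.8, 0.0]]

fused = ensemble_average([out_k3, out_k5, out_k7])
mask = to_mask(fused)
```

In this toy input the bottom-left pixel falls below the threshold for the 3 × 3 model alone (0.4) but is kept as text after fusion, which is the kind of multi-scale agreement an ensemble is meant to exploit. A weighted average or a learned fusion layer would be equally plausible readings of the abstract.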