Multi scale mirror connection based encoder decoder network for text localization

Bibliographic Details
Published in:Pattern recognition letters 2020-07, Vol.135, p.64-71
Main Authors: Dutta, Kalpita, Bal, Malyaban, Basak, Arpita, Ghosh, Swarnendu, Das, Nibaran, Kundu, Mahantapas, Nasipuri, Mita
Format: Article
Language:English
Description
Summary:
• An encoder-decoder architecture with mirror skip connections for text localization.
• Linear, parametric, and convolutional mirror skip connections have been implemented.
• Three models with different kernel sizes are ensembled to capture multi-scale features.
• The proposed model beats state-of-the-art pixel-level classifiers.
• Datasets used: ICDAR 2003, ICDAR 2015, SVT, and Total-Text.

Encoder-decoder models with multi-scale feature concatenations have become ubiquitous for various natural scene segmentation tasks. In the current approach, a similar model with an improved mirror connection from encoder to decoder is proposed. Three different types of mirror connections, namely linear, parametric, and convolutional, are demonstrated in the proposed work. We have also implemented internal skips to facilitate better gradient propagation within the encoder-decoder architecture. The proposed model also includes an ensemble module that combines outputs from models with different kernel sizes (3 × 3, 5 × 5, and 7 × 7) to merge multi-scale features for efficient detection. The model was tested on the ICDAR 2003, SVT, ICDAR 2015, and Total-Text datasets, where it proved superior to other state-of-the-art encoder-decoder architectures for pixel-level classification.
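The ensemble step mentioned in the summary can be illustrated with a small sketch. The record does not say how the ensemble module combines the outputs, so the pixel-wise averaging and the 0.5 threshold below are assumptions chosen for illustration, and the names `average_maps` and `to_mask` are hypothetical. The three input grids stand in for per-pixel text probabilities produced by the 3 × 3, 5 × 5, and 7 × 7 models.

```python
from typing import List

Grid = List[List[float]]


def average_maps(maps: List[Grid]) -> Grid:
    """Pixel-wise mean of equally sized probability maps (assumed fusion rule)."""
    if not maps:
        raise ValueError("at least one map is required")
    rows, cols = len(maps[0]), len(maps[0][0])
    for m in maps:
        if len(m) != rows or any(len(row) != cols for row in m):
            raise ValueError("all maps must share the same shape")
    return [
        [sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
        for r in range(rows)
    ]


def to_mask(prob: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Mark a pixel as text (1) when its fused probability reaches the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


# Toy 2 x 2 outputs standing in for the three kernel-size models.
p3 = [[0.9, 0.2], [0.6, 0.1]]  # 3 x 3 kernels
p5 = [[0.8, 0.4], [0.3, 0.2]]  # 5 x 5 kernels
p7 = [[0.7, 0.6], [0.3, 0.0]]  # 7 x 7 kernels

fused = average_maps([p3, p5, p7])
mask = to_mask(fused)
```

In this toy case only the top-left pixel is labelled as text after fusion, because the other pixels do not reach the threshold once the three models are averaged. A learned fusion layer or a weighted vote would be equally plausible readings of "ensemble module".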
ISSN:0167-8655
1872-7344
DOI:10.1016/j.patrec.2020.04.002