MASTER: Multi-aspect non-local network for scene text recognition
Published in: Pattern Recognition 2021-09, Vol. 117, p. 107980, Article 107980
Main Authors: , , , , , ,
Format: Article
Language: English
Summary:
•The multi-aspect non-local block enables the feature extractor to model global context.
•Different types of attention focus on different aspects of spatial feature dependencies.
•Inference is fast thanks to the proposed novel memory-cached decoding mechanism.
•Our method achieves the best case-sensitive performance on the COCO-Text dataset.
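The "multi-aspect non-local block" highlighted above can be pictured, in very simplified form, as multi-head self-attention applied over every spatial position of a feature map, so each position aggregates global context. The sketch below is a toy illustration under that assumption, not the paper's actual block; all function and variable names (`multi_aspect_nonlocal`, `n_heads`, etc.) are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_aspect_nonlocal(feat, n_heads=2, seed=0):
    """Toy multi-head ("multi-aspect") non-local block over a 2D feature map.

    feat: (H, W, C) feature map. Each head attends over all H*W positions,
    so every output position can depend on every other (global context).
    Random projections stand in for learned weights.
    """
    H, W, C = feat.shape
    assert C % n_heads == 0
    rng = np.random.default_rng(seed)
    x = feat.reshape(H * W, C)
    dh = C // n_heads
    outs = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (rng.standard_normal((C, dh)) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        # (HW, HW) attention map: each head captures one "aspect" of
        # the spatial feature dependencies
        attn = softmax(q @ k.T / np.sqrt(dh))
        outs.append(attn @ v)
    out = np.concatenate(outs, axis=-1)
    return (x + out).reshape(H, W, C)  # residual connection

feat = np.random.default_rng(1).standard_normal((4, 6, 8))
out = multi_aspect_nonlocal(feat)  # same shape as the input, (4, 6, 8)
```

The residual form (input plus attention output) follows the usual non-local/transformer convention, keeping the block drop-in compatible with a CNN backbone.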
Attention-based scene text recognizers have achieved great success by leveraging a compact intermediate representation and learning 1D or 2D attention with an RNN-based encoder-decoder architecture. However, such methods suffer from the attention-drift problem: high similarity among encoded features leads to attention confusion under the RNN-based local attention mechanism. Moreover, RNN-based methods have low efficiency due to poor parallelization. To overcome these problems, we propose MASTER, a self-attention based scene text recognizer that (1) not only encodes the input-output attention but also learns self-attention, which encodes feature-feature and target-target relationships inside the encoder and decoder, (2) learns an intermediate representation that is more powerful and more robust to spatial distortion, and (3) offers high training efficiency through strong parallelization and high-speed inference through an efficient memory-cache mechanism. Extensive experiments on various benchmarks demonstrate the superior performance of MASTER on both regular and irregular scene text. PyTorch code can be found at https://github.com/wenwenyu/MASTER-pytorch, and TensorFlow code can be found at https://github.com/jiangxiluning/MASTER-TF.
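The memory-cache mechanism mentioned in the abstract is, in general transformer decoders, realized by caching the keys and values of already-decoded positions so each new step attends over the cache instead of recomputing attention for the whole prefix. The sketch below illustrates that general key/value-caching idea with a single toy attention head, assuming it matches the paper's mechanism in spirit; it is not the authors' implementation, and all names here are hypothetical.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
tokens = rng.standard_normal((5, d))  # embeddings of 5 decoded tokens

# Incremental decoding with a key/value cache: per step, project only the
# newest token and append its key/value to the cache.
k_cache = np.empty((0, d))
v_cache = np.empty((0, d))
cached_out = []
for x in tokens:
    q = x @ Wq
    k_cache = np.vstack([k_cache, x @ Wk])
    v_cache = np.vstack([v_cache, x @ Wv])
    cached_out.append(attention(q[None], k_cache, v_cache)[0])
cached_out = np.array(cached_out)

# Full causal recomputation over the whole prefix at every step, for comparison.
full_out = np.array([
    attention((tokens[i] @ Wq)[None], tokens[:i + 1] @ Wk, tokens[:i + 1] @ Wv)[0]
    for i in range(len(tokens))
])

assert np.allclose(cached_out, full_out)  # identical results, less recomputation
```

Per step, the cached variant projects one token instead of the whole prefix, which is why cached decoding speeds up autoregressive inference without changing the output.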
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2021.107980