Rethinking text rectification for scene text recognition

Bibliographic Details
Published in: Expert Systems with Applications, 2023-06, Vol. 219, p. 119647, Article 119647
Main Authors: Ke, Wenjun; Wei, Jianguo; Hou, Qingzhi; Feng, Hui
Format: Article
Language: English
Description
Summary: Existing scene text recognition methods incorporate text rectification to reduce text irregularity in images and improve recognition accuracy. Previous text rectification methods aim to convert an irregular text image into a regular form that is easier to recognize. In this study, we explore text rectification for text recognition and identify two issues that all previous works have ignored: performance degradation of the recognition network and unreliable text rectification. We therefore rethink what causes these two issues and propose a rectification-based text recognition network to mitigate them. The proposed network combines text rectification and text recognition, and includes a multi-level feature aggregation module to enhance feature learning for character representation. Concretely, we devise a mixed batch training strategy to address the performance degradation of the recognition network, and design a confidence decoding scheme to avoid unreliable text rectification. Extensive ablation studies verify the positive role of the feature aggregation module in feature learning and the effectiveness of the proposed training strategy and decoding scheme in addressing these issues. Experimental results show that the proposed method outperforms state-of-the-art results on public benchmarks.
• We explore the limitation and unreliable cases caused by the rectification network.
• Mixed batch and confidence decoding are designed for accurate recognition.
• We design feature aggregation to exploit multi-level features.
• Our method achieves superior performance on standard datasets.
ISSN: 0957-4174; 1873-6793
DOI: 10.1016/j.eswa.2023.119647