Rethinking text rectification for scene text recognition
Published in: | Expert systems with applications 2023-06, Vol.219, p.119647, Article 119647 |
---|---|
Format: | Article |
Language: | English |
Summary: | Existing scene text recognition methods incorporate text rectification to reduce text irregularity in images and thereby improve recognition accuracy. Previous text rectification methods aim to convert an irregular text image into a regular form that is easier to recognize. In this study, we examine text rectification for text recognition and identify two issues that previous works have ignored: performance degradation of the recognition network, and cases in which the rectification itself is unreliable. We therefore analyze the causes of these two issues and propose a rectification-based text recognition network to mitigate them. The proposed network consists of a text rectification stage and a text recognition stage, and includes a multi-level feature aggregation module to enhance feature learning for character representation. Concretely, we devise a mixed batch training strategy to address the performance degradation of the recognition network, and design a confidence decoding scheme to handle unreliable rectification. Extensive ablation studies verify the positive role of the feature aggregation module in feature learning and the effectiveness of the proposed training strategy and decoding scheme in addressing the two issues. In experiments, the proposed method outperforms state-of-the-art results on public benchmarks.
• We explore the limitation and unreliable cases caused by the rectification network. • Mixed batch and confidence decoding are designed for accurate recognition. • We design feature aggregation to exploit multi-level features. • Our method achieves superior performance on standard datasets. |
---|---|
ISSN: | 0957-4174, 1873-6793 |
DOI: | 10.1016/j.eswa.2023.119647 |