
Anchor-free multi-orientation text detection in natural scene images


Bibliographic Details
Published in:	Applied intelligence (Dordrecht, Netherlands), 2020-11, Vol.50 (11), p.3623-3637
Main Authors:	Lu, Liqiong; Wu, Dong; Wu, Tao; Huang, Faliang; Yi, Yaohua
Format:	Article
Language:	English
Subjects:	Accuracy; Artificial Intelligence; Artificial neural networks; Automatic pilots; Boxes; Computer Science; Computer vision; Discourse functions; Image segmentation; Language translation; Machines; Manufacturing; Mass media; Mechanical Engineering; Methods; Neural networks; Pixels; Processes; Segmentation; Semantics; Texts

Description
Summary:	Text detection in natural scene images is a key prerequisite for computer vision tasks such as image search, navigation for the blind, autopilot, and multi-language translation. Existing text detection methods often detect only part of a large-scale text region and have difficulty detecting small-scale text. To address this problem, an anchor-free multi-orientation text detection method is proposed. First, a Feature Pyramid Network (FPN) combines multiple feature layers of a Convolutional Neural Network (CNN) to predict the geometric properties of text; this expands the receptive field of each pixel and thus helps to detect more large-scale text. Second, a new loss function that is independent of text scale is designed, giving pixels in small-scale text a larger weight in the loss calculation and thereby facilitating the detection of small-scale text. Finally, the results of pixel-level semantic segmentation are used to filter out obviously unreasonable candidate text boxes, which improves both the accuracy and the recall rate of text detection. Experimental results on ICDAR 2015 and MSRA-TD500 demonstrate the good performance of the method.
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-020-01742-z
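
Illustrative note: the summary describes a loss that is independent of text scale, so that pixels in small-scale text carry a larger weight. The article's actual formula is not given in this record. The sketch below shows one common way to get that effect, assuming each pixel is weighted by the inverse of its text instance's area so that every instance contributes the same total weight. The function names `scale_balanced_weights` and `weighted_l1_loss`, the instance-map encoding, and the choice of an L1 term are all hypothetical and are not taken from the article.

```python
import numpy as np


def scale_balanced_weights(instance_map: np.ndarray) -> np.ndarray:
    """Return per-pixel weights that sum to 1 inside each text instance.

    Assumption (not from the article): instance_map is a 2-D integer array
    where 0 marks background and k > 0 marks the pixels of text instance k.
    """
    weights = np.zeros(instance_map.shape, dtype=np.float64)
    instance_ids = np.unique(instance_map)
    for k in instance_ids[instance_ids > 0]:
        mask = instance_map == k
        # Inverse-area weighting: fewer pixels means a larger weight per pixel.
        weights[mask] = 1.0 / mask.sum()
    return weights


def weighted_l1_loss(pred: np.ndarray, target: np.ndarray,
                     weights: np.ndarray) -> float:
    """Weighted mean absolute error over the pixels with non-zero weight."""
    total = weights.sum()
    if total == 0:
        return 0.0  # no text pixels in this image
    return float((weights * np.abs(pred - target)).sum() / total)


# Toy example: one 16-pixel text instance and one 2-pixel text instance.
instance_map = np.zeros((4, 8), dtype=int)
instance_map[:, :4] = 1
instance_map[0, 6:8] = 2
weights = scale_balanced_weights(instance_map)
```

In this toy example each pixel of the 2-pixel instance receives weight 0.5, while each pixel of the 16-pixel instance receives 0.0625, so both instances contribute equally to the loss regardless of their size.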