Efficient character segmentation approach for machine-typed documents


Bibliographic Details
Published in: Expert Systems with Applications, 2017-09, Vol. 80, p. 210-231
Main Authors: Vučković, Vladan, Arizanović, Boban
Format: Article
Language: English
Description
Summary:
•An efficient character segmentation algorithm for machine-typed documents is presented.
•The algorithm supports the general pipeline from grayscale conversion to segmentation.
•The mathematical background of the novel idea is presented in detail.
•The core of the algorithm is given in pseudo-code and supported by a large set of empirical results.

In this paper, an efficient approach for segmenting individual characters from scanned documents typed on old typewriters is proposed. The approach is primarily intended for processing machine-typed documents, but can be used for machine-printed documents as well. It uses a modified projection profiles technique, in which a sliding window is used to obtain information about the structure of the document image. This is followed by histogram processing to determine the spaces between lines, words and characters in the document image. The decision-making logic used in the process of character segmentation is described and represents an integral aspect of the proposed technique. Besides the character segmentation approach, an ultra-fast architecture for geometrical image transformations, which is used for image rotation in the process of skew correction, is presented, and its fast implementation using pointer arithmetic and a highly optimized low-level machine routine is provided. The proposed character segmentation approach is semi-automatic and uses threshold values to control the segmentation process. The reported segmentation accuracy results show that the proposed approach outperforms state-of-the-art approaches in most cases. In terms of time complexity, the results show that the new technique is faster than state-of-the-art approaches and can process even very large document images in less than one second, which makes it suitable for real-time tasks. Finally, the performance of the proposed approach is demonstrated visually using original documents authored by Nikola Tesla.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2017.03.027