Segmentation of Characters from Old Typewritten Documents using Radon Transform

Bibliographic Details
Published in: International Journal of Computer Applications, 2012-01, Vol. 37 (9), p. 10-15
Main Author: Desai, Apurva A
Format: Article
Language: English
Description
Summary: Optical character recognition is a very challenging area. Much work has been done, and is still being done, for many languages across the world, including a good amount for many Indian languages. For Gujarati, however, hardly any work can be found. Gujarati has a rich literary heritage, and it is therefore important to preserve it for the next generation. This paper attempts to segment words and characters from old typewritten Gujarati documents. The algorithm presented uses a global threshold to convert scanned RGB documents to black-and-white documents, and noise removal is also applied. The Radon transform is used for skew detection and, as the novel concept of this work, for segmenting documents into lines; vertical profiles are then used to segment the lines further into characters. Finally, the segmentation algorithm is also tested on documents typewritten in Hindi. The algorithm presented here gives very good results.
ISSN: 0975-8887
DOI: 10.5120/4635-6683
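
The pipeline outlined in the summary (global threshold, then line segmentation, then character segmentation by vertical profiles) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the threshold of 128, and the use of plain row and column sums are assumptions made for illustration. On an already deskewed page, the row sum corresponds to a Radon-style projection taken at a single angle. Skew detection and noise removal are omitted.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Global threshold (assumed value): True where a pixel is ink (dark)."""
    return np.asarray(gray) < threshold

def runs(profile, min_count=1):
    """Return (start, end) pairs, end exclusive, where profile >= min_count."""
    on = np.asarray(profile) >= min_count
    padded = np.concatenate(([False], on, [False]))
    # Indices where the on/off state flips: even entries are starts, odd are ends.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[0::2].tolist(), changes[1::2].tolist()))

def segment(gray, threshold=128):
    """Split a deskewed grayscale page into text lines, then into characters.

    Returns a list of ((top, bottom), [(left, right), ...]) entries.
    """
    ink = binarize(gray, threshold)
    result = []
    # Horizontal profile: ink pixels per row; gaps separate text lines.
    for top, bottom in runs(ink.sum(axis=1)):
        # Vertical profile within the line: ink pixels per column.
        chars = runs(ink[top:bottom].sum(axis=0))
        result.append(((top, bottom), chars))
    return result

# Tiny synthetic page: two "characters" on the first line, one on the second.
page = np.full((8, 10), 255)
page[1:3, 1:3] = 0
page[1:3, 5:7] = 0
page[5:7, 2:6] = 0
segments = segment(page)
```

In this sketch a gap is any row or column with no ink at all; real scans of old typewritten pages would likely need a higher `min_count` or smoothing of the profiles to tolerate residual noise.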