Discrete Script or Cursive Language Identification from Document Images
Published in: | Journal of King Saud University. Engineering sciences 2004, Vol.16 (2), p.253-268 |
---|---|
Format: | Article |
Language: | English |
Summary: | We present a method for identifying the discrete script or cursive language of a document image in a single step. The method extracts a set of global templates shared among scripts and languages that have common symbol shapes. This yields a small number of templates and reduces processing time and memory requirements during execution. A key point of our approach is a one-dimensional normalization that retains the width-to-height ratio. This preserves the relative geometrical attributes of symbols, which adds to the discriminating power of the algorithm and produces small templates. The algorithm requires less than 15 seconds on a Pentium III (866 MHz, 128 MB RAM) to identify the discrete script or cursive language of a document. These encouraging results in accuracy and speed make the approach suitable for use in commercial OCR products. |
---|---|
ISSN: | 1018-3639 |
DOI: | 10.1016/S1018-3639(18)30790-6 |