Improving Nastalique specific pre-recognition process for Urdu OCR
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Urdu is written in the Arabic script using the Nastalique writing style. Nastalique is highly cursive and context-sensitive, and it is hard to process because only the last character of a ligature sits on the baseline. In addition, it exhibits spatial overlap at both the character and ligature levels. Because of these factors, the placement of dots and other diacritics is also highly contextual and variable. A growing body of work aims to process and recognize Nastalique script in order to develop Urdu OCR. This paper proposes improvements to these methods, focusing on Nastalique-specific pre-processing methods that can be applied before text recognition. The recognition and post-recognition processes will be addressed separately. |
---|---|
DOI: | 10.1109/INMIC.2009.5383111 |