Loading…

A convolutional recursive deep architecture for unconstrained Urdu handwriting recognition

An offline handwriting recognition system for Urdu, a language with a user base of 200 Million and written in Nastaleeq script, has been a challenge for the research community. The key problems include recognition of complex ligature shapes and lack of publicly available datasets. This paper address...

Full description

Saved in:
Bibliographic Details
Published in:Neural computing & applications 2022, Vol.34 (2), p.1635-1648
Main Authors: ul Sehr Zia, Noor, Naeem, Muhammad Ferjad, Raza, Syed Muhammad Kumail, Khan, Muhammad Mubasher, Ul-Hasan, Adnan, Shafait, Faisal
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:An offline handwriting recognition system for Urdu, a language with a user base of 200 Million and written in Nastaleeq script, has been a challenge for the research community. The key problems include recognition of complex ligature shapes and lack of publicly available datasets. This paper addresses both these problems by (i) proposing an end-to-end handwriting recognition system based on a new CNN-RNN architecture with n-gram language modeling, and (ii) presenting a new unconstrained dataset called NUST-UHWR. We compiled the first unconstrained Urdu handwritten data from around 1000 people from diverse background, age and gender population. The text in this dataset is selected carefully from seven different fields to ensure the presence of commonly used words in different domains. The model architecture is capable of incorporating fine-grained features necessary for handwritten text recognition of complex ligature languages. Our method addresses the limitations of existing architectures and provides state-of-the-art performance on Urdu handwritten text. We achieve a minimum character error rate of 5.28% on Urdu handwriting recognition (UHWR) and establish a state-of-the-art. The paper further demonstrates the generalization ability of the proposed model by training on English language and bilingual (Urdu and English) handwritten data.
ISSN:0941-0643
1433-3058
DOI:10.1007/s00521-021-06498-2