
Computationally efficient recognition of unconstrained handwritten Urdu script using BERT with vision transformers

Bibliographic Details
Published in: Neural computing & applications 2023-12, Vol. 35 (34), pp. 24161-24177
Main Authors: Ganai, Aejaz Farooq; Khursheed, Farida
Format: Article
Language: English
Description
Summary: Handwritten Urdu text recognition is a challenging area of pattern recognition that has gained importance with the rapid emergence of camera-based applications on portable devices, which process large numbers of images daily. The main challenges in handwritten Urdu recognition are variations between individual Urdu writers, irregular positioning of the diacritics associated with a character, context sensitivity of characters, and the cursive nature of Urdu script. These challenges also make it difficult to build a large, generalized handwritten Urdu dataset. State-of-the-art approaches to handwritten Urdu text recognition mostly follow implicit approaches, which are error-prone and do not achieve high recognition rates. The holistic approach to handwritten Urdu recognition has been the least explored to date, and existing holistic approaches are complex and time-consuming because they mostly rely on convolutional/recurrent neural networks or statistical methods. Hence, this research proposes a novel and efficient vision transformer-based methodology using the BERT architecture for the recognition of handwritten Urdu text. The proposed approach uses convolutional feature maps as word embeddings in the transformer, making full use of the vision transformer's attention mechanism to focus on a particular connected component (ligature) in handwritten Urdu text. To cover the entire Urdu corpus, we pre-trained on several benchmark handwritten Urdu datasets, such as UNHD and NUST-UHWR, and tested on unconstrained handwritten Urdu text. Compared with state-of-the-art techniques, the experimental evaluation of the proposed approach reports better results on various performance metrics, including Ligature Error Rate (LER), precision, sensitivity, specificity, F1-score, and accuracy.
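The core idea above, convolutional feature maps serving as the token ("word") embeddings that a transformer's self-attention operates on, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a column-wise tokenisation of a C × H × W feature map, identity query/key/value projections, and a single attention head. The function names `feature_map_to_tokens` and `self_attention` are hypothetical. The paper's actual BERT/ViT configuration (learned projections, multiple heads, positional embeddings, number of layers) is not reproduced here.

```python
import math


def feature_map_to_tokens(fmap):
    """Flatten a C x H x W feature map (nested lists) into W tokens.

    Each token is the column at one horizontal position, concatenated over
    channels and rows, so every token has length C * H. Column-wise
    tokenisation is an illustrative assumption, not the paper's scheme.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [fmap[c][y][x] for c in range(channels) for y in range(height)]
        for x in range(width)
    ]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Queries, keys and values are the tokens themselves (identity
    projections, a simplifying assumption). Returns the attended outputs
    and the attention-weight matrix (each row sums to 1).
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(d)]
        for w in weights
    ]
    return outputs, weights


if __name__ == "__main__":
    # Toy 2-channel, 2-row, 3-column feature map standing in for CNN output.
    fmap = [
        [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]],
        [[0.0, 3.0, 1.0], [1.0, 1.0, 1.0]],
    ]
    tokens = feature_map_to_tokens(fmap)
    outputs, weights = self_attention(tokens)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one column position attends to every other position. In a trained model, learned projections would let this focus settle on the columns belonging to a single ligature.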
The main strengths of the proposed approach are (i) a significant reduction in the training time needed for a large handwritten Urdu dataset, (ii) minimal computational complexity, since there is no overhead of diacritic separation and re-association as in most state-of-the-art techniques, and (iii) a new state-of-the-art LER of up to 3% on unconstrained handwritten Urdu text.
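The evaluation metrics named in this summary can be sketched as follows. The abstract does not define LER. This sketch assumes it is computed like a character error rate, but over ligature sequences: the total Levenshtein edit distance between predicted and reference ligature sequences, divided by the total number of reference ligatures. The paper may instead define it as a plain ligature misclassification rate. The precision, sensitivity, specificity, F1-score, and accuracy function assumes one-vs-rest confusion counts for a single ligature class. All function names are hypothetical.

```python
def levenshtein(ref, hyp):
    """Minimum substitutions, insertions and deletions turning hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def ligature_error_rate(references, hypotheses):
    """Assumed LER: summed edit distance / total reference ligatures."""
    errors = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total


def binary_metrics(tp, fp, tn, fn):
    """One-vs-rest metrics for a single ligature class from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical ligature IDs standing in for Urdu connected components.
    refs = [["a", "b", "c", "d"], ["e", "f"]]
    hyps = [["a", "x", "c", "d"], ["e", "f", "g"]]
    print(ligature_error_rate(refs, hyps))
    print(binary_metrics(tp=8, fp=2, tn=85, fn=5))
```

In this toy case, one substitution and one insertion over six reference ligatures give an LER of 2/6. A sequence-level edit distance is used because it penalises both wrongly recognised and spurious or missing ligatures.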
ISSN: 0941-0643
EISSN: 1433-3058
DOI: 10.1007/s00521-023-08976-1