Offline handwritten mathematical expression recognition with graph encoder and transformer decoder

Bibliographic Details
Published in: Pattern recognition 2024-04, Vol.148, p.110155, Article 110155
Main Authors: Tang, Jia-Man; Guo, Hong-Yu; Wu, Jin-Wen; Yin, Fei; Huang, Lin-Lin
Format: Article
Language: English
Description
Summary: Handwritten mathematical expression recognition (HMER) has attracted extensive attention. Despite the significant progress achieved in recent years owing to the development of deep learning approaches, HMER remains a challenge due to the complex spatial structure and variable writing styles. Encoder–decoder models with attention mechanisms, which treat HMER as an image-to-sequence (i.e. LaTeX) generation task, have boosted the accuracy, but suffer from low interpretability in that the symbols are not segmented explicitly. Symbol segmentation is desired for facilitating post-processing and human interaction in real applications. In this paper, we formulate the mathematical expression as a graph and propose a Graph-Encoder-Transformer-Decoder (GETD) approach for HMER. To construct the graph from the input image, candidate symbols are first detected using an object detector and represented as the nodes of a graph, called the symbol graph, and the edges of the graph encode the between-symbol relationships. The spatial information is aggregated in a graph neural network (GNN), and a Transformer-based decoder is used to identify the symbol classes and structure from the graph. Experiments on public datasets demonstrate that our GETD model achieves competitive expression recognition performance while offering good interpretability compared with previous methods.
• A Graph-Encoder-Transformer-Decoder (GETD) approach is proposed for offline HMER, which has good interpretability.
• The approach consists of a graph encoder (GNN) for aggregating spatial structure information and a Transformer decoder for generating the recognition result.
• Experiments on public datasets demonstrate that the proposed method achieves competitive recognition performance.
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2023.110155
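
The summary describes turning detected candidate symbols into a symbol graph whose edges carry between-symbol relationships, then aggregating spatial information over that graph. The following is a minimal sketch of that idea only, not the authors' implementation. The record does not say how edges are chosen or what a GNN layer computes, so everything concrete here is an assumption made for illustration: the `Symbol` bounding-box type, the k-nearest-neighbour edge rule, the (dx, dy) offset used as the edge relationship, and the mean-neighbour averaging that stands in for a learned GNN layer. The Transformer decoder is not sketched.

```python
import math
from dataclasses import dataclass


@dataclass
class Symbol:
    """A candidate symbol as a bounding box (hypothetical detector output)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def build_symbol_graph(symbols, k=2):
    """Link each symbol to its k nearest neighbours by centre distance.

    Returns {(i, j): (dx, dy)}, the offset from symbol i to symbol j.
    Both the k-NN rule and the offset feature are assumptions.
    """
    edges = {}
    for i, a in enumerate(symbols):
        ax, ay = a.center
        candidates = []
        for j, b in enumerate(symbols):
            if i == j:
                continue
            bx, by = b.center
            dx, dy = bx - ax, by - ay
            candidates.append((math.hypot(dx, dy), j, dx, dy))
        for _, j, dx, dy in sorted(candidates)[:k]:
            edges[(i, j)] = (dx, dy)
    return edges


def aggregate(node_feats, edges):
    """One round of mean-neighbour aggregation.

    A simple, non-learned stand-in for a GNN layer: each node's new feature
    is the average of its own feature and the mean of its neighbours'.
    """
    updated = []
    for i, feat in enumerate(node_feats):
        neighbours = [node_feats[j] for (src, j) in edges if src == i]
        if not neighbours:
            updated.append(list(feat))
            continue
        mean = [sum(vals) / len(neighbours) for vals in zip(*neighbours)]
        updated.append([(f + m) / 2 for f, m in zip(feat, mean)])
    return updated


# Made-up boxes roughly laid out like "x^2 + 1" (image coordinates, y grows downward).
symbols = [
    Symbol(0, 10, 10, 20),   # x
    Symbol(10, 0, 15, 8),    # superscript 2
    Symbol(20, 10, 30, 20),  # +
    Symbol(35, 10, 40, 20),  # 1
]
graph = build_symbol_graph(symbols, k=1)
features = aggregate([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [2.0, 2.0]], graph)
```

Because each node corresponds to one detected box, any label a decoder later assigns can be traced back to a region of the image, which is the interpretability benefit the summary attributes to explicit symbol segmentation.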