
Can question-texts improve the recognition of handwritten mathematical expressions in respondents’ solutions?


Bibliographic Details
Published in: Knowledge-Based Systems, 2025-01, Vol. 307, p. 112731, Article 112731
Main Authors: Zhang, Ting; Jin, Xinxin; Ma, Xiaoyang; Peng, Xinzi; Zhao, Yiyang; Liu, Jinzheng; Yu, Xinguo
Format: Article
Language: English
Summary: The accurate recognition of respondents’ handwritten solutions is important for implementing intelligent diagnosis and tutoring. This task is significantly challenging because of scribbled and irregular writing, especially for primary or secondary students whose handwriting has not yet fully developed. In such cases, recognition is difficult even for humans relying only on the visual signals of the handwritten content without any context. However, despite decades of work on handwriting recognition, few studies have explored utilizing external information (question priors) to improve accuracy. Based on the correlation between questions and solutions, this study explores whether question-texts can improve the recognition of handwritten mathematical expressions (HMEs) in respondents’ solutions. Building on the encoder–decoder framework, the mainstream approach to HME recognition, we propose two models that fuse question-text signals and handwriting-vision signals at the encoder and decoder stages, respectively. The first, called encoder-fusion, adopts a static query to implement the interaction between the two modalities at the encoder stage; to better capture and interpret this interaction, the second, called decoder-attend, fuses the modalities with a dynamic query at the decoder stage. The two models were evaluated on a self-collected dataset of approximately 7k samples and achieved expression-level accuracies of 62.61% and 64.20%, respectively. The experimental results demonstrate that both models outperform the baseline model, which uses only visual information, and that encoder-fusion achieves results comparable to other state-of-the-art methods.
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2024.112731
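
Illustrative sketch (not part of the record): the abstract describes two points at which question-text features can be fused with handwriting-vision features in an encoder–decoder recognizer. Below is a minimal PyTorch sketch of both fusion points, assuming a standard cross-attention mechanism; the class names, dimensions, and the use of nn.MultiheadAttention are illustrative assumptions, since the record does not specify the paper's actual implementation.

# Minimal sketch of the two fusion ideas described in the abstract:
# (1) encoder-fusion: a static query (here, the visual feature positions
#     themselves) attends to question-text features once, before decoding;
# (2) decoder-attend: a dynamic query (the decoder hidden state at each
#     step) attends to question-text features during decoding.
# All names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class EncoderFusion(nn.Module):
    def __init__(self, d=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm = nn.LayerNorm(d)

    def forward(self, vis, txt):
        # vis: (B, HW, d) flattened visual features; txt: (B, T, d) text features
        fused, _ = self.attn(vis, txt, txt)   # static query: fixed visual positions
        return self.norm(vis + fused)         # residual fusion of the two modalities

class DecoderAttend(nn.Module):
    def __init__(self, d=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, h_t, txt):
        # h_t: (B, 1, d) decoder state at the current step (dynamic query)
        ctx, _ = self.attn(h_t, txt, txt)     # re-queried at every decoding step
        return ctx

vis = torch.randn(2, 16 * 32, 256)            # e.g. a 16x32 CNN feature map, flattened
txt = torch.randn(2, 20, 256)                 # e.g. 20 encoded question-text tokens
h_t = torch.randn(2, 1, 256)                  # decoder hidden state at one time step
print(EncoderFusion()(vis, txt).shape)        # torch.Size([2, 512, 256])
print(DecoderAttend()(h_t, txt).shape)        # torch.Size([2, 1, 256])

The key difference between the two sketches is the query: encoder-fusion queries the question-text once with fixed visual positions, while decoder-attend re-queries it with the evolving decoder state at each output step, which matches the static-versus-dynamic query distinction drawn in the abstract.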