CNN-BLSTM-CRF Network for Semantic Labeling of Students' Online Handwritten Assignments

Bibliographic Details
Main Authors:	Darvishzadeh, Amirali; Stahovich, Thomas; Feghahati, Amir; Entezari, Negin; Gharghabi, Shaghayegh; Kanemaru, Reed; Shelton, Christian
Format:	Conference Proceeding
Language:	English
Subjects:	CNN; CRF; Feature extraction; Hidden Markov models; Labeling; Logic gates; LSTM; Mathematical model; Semantic Labeling; Semantics; Stroke Classification; Task analysis

Description
Summary:	Automatic semantic labeling of strokes in online handwritten documents is a crucial task for many applications such as diagram interpretation, text recognition, and search. We formulate this task as a stroke classification problem in which each stroke is classified as a cross-out, free body diagram, or text. Separating free body diagram and text in this work differs from the traditional text/non-text separation problem because both classes contain text and graphics. The text class includes textual notes, mathematical symbols/equations, and graphics such as arrows that connect other elements. The free body diagram class also contains graphics and various alphanumeric characters and symbols that mark or explain the graphical objects. In this work, we present a novel deep neural network model for classification of strokes in online handwritten documents. The network takes two input sequences. The first sequence contains the trajectories of the pen strokes, while the second contains features of the strokes. Each of these sequences is fed to its own CNN-BLSTM channel to extract features and encode relationships between nearby strokes. The outputs of the two channels are concatenated and used as the input to a CRF layer that predicts the best sequence of labels for the given input sequences. We evaluated our model on a dataset of 1,060 pages written by 132 students in an undergraduate statics course. Our model achieved an overall classification accuracy of 94.70% on this dataset.
ISSN:	2379-2140
DOI:	10.1109/ICDAR.2019.00169
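
Illustrative sketch (not taken from the paper): the summary says a CRF layer predicts the best sequence of labels from the concatenated channel outputs. The Python below shows, under stated assumptions, how a linear-chain CRF can decode such a sequence with the Viterbi algorithm. The function name `viterbi_decode`, the emission scores, and the transition scores are all hypothetical; in the described model the per-stroke scores would come from the two CNN-BLSTM channels and the transitions would be learned, and the record does not give the authors' actual decoding procedure.

```python
from typing import List, Sequence

# The three stroke classes named in the summary.
LABELS = ["cross-out", "free body diagram", "text"]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence for a linear-chain CRF.

    emissions[t][k]   : score of label k for stroke t (hypothetical values)
    transitions[j][k] : score of moving from label j to label k (hypothetical)
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    # best[k] holds the best total score of any path ending in label k so far.
    best = list(emissions[0])
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = [], []
        for k in range(n_labels):
            # Choose the previous label that maximizes the score of reaching k.
            prev = max(range(n_labels), key=lambda j: best[j] + transitions[j][k])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][k] + scores[k])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label to recover the full path.
    last = max(range(n_labels), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Made-up scores for four consecutive strokes (rows) and three labels (columns).
example_emissions = [
    [0.1, 2.0, 0.3],
    [0.2, 0.9, 1.0],  # on its own, this stroke slightly prefers "text"
    [0.1, 1.8, 0.2],
    [0.0, 0.1, 3.0],
]
# Assumed transitions: staying in the same class is rewarded, switching is penalized.
example_transitions = [
    [0.5, -0.5, -0.5],
    [-0.5, 0.5, -0.5],
    [-0.5, -0.5, 0.5],
]

decoded = viterbi_decode(example_emissions, example_transitions)
print([LABELS[i] for i in decoded])
```

With these made-up numbers, labeling each stroke independently would assign the second stroke to "text", whereas sequence decoding keeps it in "free body diagram" because its neighbors belong to that class. This is the kind of dependency between nearby strokes that a sequence-level output layer is meant to capture.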