Loading…

A Chinese OCR spelling check approach based on statistical language models

This work describes an effective spelling check approach for Chinese OCR with a new multi-knowledge based statistical language model. This language model combines the conventional n-gram language model and the new LSA (latent semantic analysis) language model, so both local information (syntax) and...

Full description

Saved in:
Bibliographic Details
Main Authors: Li Zhuang, Ta Bao, Xioyan Zhu, Chunheng Wang, Naoi, S.
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This work describes an effective spelling check approach for Chinese OCR with a new multi-knowledge based statistical language model. This language model combines the conventional n-gram language model and the new LSA (latent semantic analysis) language model, so both local information (syntax) and global information (semantic) are utilized. Furthermore, Chinese similar characters are used in Viterbi search process to expand the candidate list in order to add more possible correct results. With our approach, the best recognition accuracy rate increases from 79.3% to 91.9%, which means 60.9% error reduction.
ISSN:1062-922X
2577-1655
DOI:10.1109/ICSMC.2004.1401278