Recognition of Handwritten Textual Annotations using Tesseract Open Source OCR Engine for information Just In Time (iJIT)
Published in: | arXiv.org 2010-03 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | The objective of the current work is to develop an Optical Character Recognition (OCR) engine for the information Just In Time (iJIT) system that can recognize handwritten textual annotations in lowercase Roman script. The Tesseract open source OCR engine, released under Apache License 2.0, is used to develop user-specific handwriting recognition models, viz., the language sets, for this system, where each user is identified by a unique identification tag associated with the digital pen. To generate the language set for a user, Tesseract is trained with labeled handwritten data samples of isolated and free-flow text in Roman script, collected exclusively from that user. The system is tested on five different language sets with free-flow handwritten annotations as test samples. It successfully segmented and subsequently recognized 87.92%, 81.53%, 92.88%, 86.75% and 90.80% of the handwritten characters in the test samples of the five users, respectively. |
---|---|
ISSN: | 2331-8422 |
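
The summary describes selecting a user-specific Tesseract language set from the identification tag of the digital pen. A minimal sketch of that lookup is given below, assuming the language sets are installed under names of our own choosing. The pen tags, the language-set names, and the helper `build_tesseract_command` are hypothetical and do not come from the paper; only the `tesseract <image> <outputbase> -l <lang>` command-line form is standard.

```python
from typing import Dict, List

# Hypothetical mapping from digital-pen identification tags to the names of
# user-specific Tesseract language sets. Both sides are made-up examples.
PEN_TO_LANGUAGE_SET: Dict[str, str] = {
    "pen-001": "user1",
    "pen-002": "user2",
}


def build_tesseract_command(pen_id: str, image_path: str, output_base: str) -> List[str]:
    """Return the tesseract command line for the user who owns the given pen.

    The command is only constructed here, not executed, so the sketch can be
    read and tested without a Tesseract installation.
    """
    try:
        language_set = PEN_TO_LANGUAGE_SET[pen_id]
    except KeyError:
        raise ValueError(f"no language set registered for pen {pen_id!r}")
    # Standard CLI form: tesseract <image> <outputbase> -l <language>
    return ["tesseract", image_path, output_base, "-l", language_set]
```

The returned list could be passed to `subprocess.run` on a machine where Tesseract and the corresponding trained language set are installed.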