Loading…
ICDAR2017 Competition on Post-OCR Text Correction
This paper describes the ICDAR2017 competition on post-OCR text correction and presents the different methods submitted by the participants. OCR has been an active research field for over the past 30 years but results are still imperfect, especially for historical documents. The purpose of this comp...
Saved in:
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | This paper describes the ICDAR2017 competition on post-OCR text correction and presents the different methods submitted by the participants. OCR has been an active research field for over the past 30 years but results are still imperfect, especially for historical documents. The purpose of this competition is to compare and evaluate automatic approaches for correcting (denoising) OCR-ed texts. The challenge consists of two independent tasks: 1) error detection and 2) error correction. An original dataset of 12M OCR-ed symbols along with an aligned ground truth was provided to the participants with 80% of the dataset dedicated to the training and 20% to the evaluation. Different sources were aggregated and namely contain newspapers and monographs covering 2 languages (English and French). 11 teams submitted results, while the difficulty of the task was underlined by the fact that only half of the submitted methods were able to denoise the evaluation dataset on average. In any case, this competition, which counted 35 registrations, illustrates the strong interest of the community in this essential problem, which is key to any digitization process involving textual data. |
---|---|
ISSN: | 2379-2140 |
DOI: | 10.1109/ICDAR.2017.232 |