Quality of OCR for Degraded Text Images

Commercial OCR packages work best with high-quality scanned images. They often produce poor results when the image is degraded, either because the original itself was poor quality, or because of excessive photocopying. The ability to predict the word failure rate of OCR from a statistical analysis o...

Bibliographic Details
Published in: arXiv.org 1999-02
Main Authors: Hartley, Roger T; Crumpton, Kathleen
Format: Article
Language: English
container_title arXiv.org
creator Hartley, Roger T
Crumpton, Kathleen
description Commercial OCR packages work best with high-quality scanned images. They often produce poor results when the image is degraded, either because the original itself was poor quality, or because of excessive photocopying. The ability to predict the word failure rate of OCR from a statistical analysis of the image can help in making decisions in the trade-off between the success rate of OCR and the cost of human correction of errors. This paper describes an investigation of OCR of degraded text images using a standard OCR engine (Adobe Capture). The documents were selected from those in the archive at Los Alamos National Laboratory. By introducing noise in a controlled manner into perfect documents, we show how the quality of OCR can be predicted from the nature of the noise. The preliminary results show that a simple noise model can give good prediction of the number of OCR errors.
format article
publisher Ithaca: Cornell University Library, arXiv.org
creationdate 1999-02-05
rights 1999. This work is published under https://arxiv.org/licenses/assumed-1991-2003/license.html (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 1999-02
issn 2331-8422
language eng
recordid cdi_proquest_journals_2091243016
source Publicly Available Content (ProQuest)
subjects Decision analysis
Failure analysis
Failure rates
Image degradation
Image quality
Noise
Noise prediction
OCR
Photocopying
Statistical analysis
title Quality of OCR for Degraded Text Images
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T22%3A55%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Quality%20of%20OCR%20for%20Degraded%20Text%20Images&rft.jtitle=arXiv.org&rft.au=Hartley,%20Roger%20T&rft.date=1999-02-05&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2091243016%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_20912430163%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2091243016&rft_id=info:pmid/&rfr_iscdi=true
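
The description above says that noise is introduced into clean documents in a controlled manner and that OCR errors are then related to that noise. The record does not say which noise model or error measure the authors used, so the following is only a minimal sketch under stated assumptions: it assumes a binary page image stored as rows of 0/1 pixels, a simple pixel-flip noise model, and a word error rate based on word-level edit distance. The function names `add_flip_noise` and `word_error_rate` are hypothetical and do not come from the paper.

```python
import random


def add_flip_noise(image, p, seed=0):
    """Return a copy of a binary image with each pixel flipped with probability p.

    `image` is a list of rows, each row a list of 0/1 values.
    This flip model is an assumption, not the paper's stated method.
    """
    rng = random.Random(seed)  # seeded so the degradation is reproducible
    return [[1 - px if rng.random() < p else px for px in row] for row in image]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # Standard dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from OCR output
                            curr[j - 1] + 1,      # extra word in OCR output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

In an experiment of the kind the description outlines, one would degrade the same clean page at several values of `p`, run each degraded image through the OCR engine, and compare the resulting `word_error_rate` values against `p`. The OCR step itself is outside this sketch.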