ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records
In this paper, we present a large historical database of Chinese family records with the aim to develop robust systems for historical document analysis. In this direction, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The obj...
Main Authors: | Saini, Rajkumar; Dobson, Derek; Morrey, Jon; Liwicki, Marcus; Simistira Liwicki, Foteini |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 1504 |
container_issue | |
container_start_page | 1499 |
container_title | |
container_volume | |
creator | Saini, Rajkumar ; Dobson, Derek ; Morrey, Jon ; Liwicki, Marcus ; Simistira Liwicki, Foteini |
description | In this paper, we present a large historical database of Chinese family records with the aim to develop robust systems for historical document analysis. In this direction, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The objective of the competition is to recognize and analyze the layout, and finally detect and recognize the textlines and characters of the large historical document image dataset containing more than 100000 pages. Cascade R-CNN, CRNN, and U-Net based architectures were trained to evaluate the performances in these tasks. Error rate of 0.01 has been recorded for textline recognition (Task1) whereas a Jaccard Index of 99.54% has been recorded for layout analysis (Task2). The graph edit distance based total error ratio of 1.5% has been recorded for complete integrated textline detection and recognition (Task3). |
doi_str_mv | 10.1109/ICDAR.2019.00241 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_8977999</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8977999</ieee_id><sourcerecordid>8977999</sourcerecordid><originalsourceid>FETCH-LOGICAL-i1558-2d78d828d8dee753190347518c2c5ddc28a308dedece81b0f678064cabb512523</originalsourceid><addsrcrecordid>eNotj01LAzEQhqMgWGvvgpf8ga2ZZLNJjmVrbWFBqHryULLJtEb2Q7K7h_5748dhmIHnfQZeQu6ALQGYediV69V-yRmYJWM8hwuyMEqD4hoEg9xckhkXymQccnZNbobhk6WsMcWMvP-69Mel2zCMfQzONnTdu6nFbqR7tD50J1p-2KbB7oS072hlYzpexji5cYroEw0dDkg3tg3NOUmuj364JVdH2wy4-N9z8rZ5fC23WfX8tCtXVRZASp1xr7TXPI1HVFKAYSJXErTjTnrvuLaCJebRoYaaHQulWZE7W9cSuORiTu7__gZEPHzF0Np4PmijVOoovgGrmlFI</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records</title><source>IEEE Xplore All Conference Series</source><creator>Saini, Rajkumar ; Dobson, Derek ; Morrey, Jon ; Liwicki, Marcus ; Simistira Liwicki, Foteini</creator><creatorcontrib>Saini, Rajkumar ; Dobson, Derek ; Morrey, Jon ; Liwicki, Marcus ; Simistira Liwicki, Foteini</creatorcontrib><description>In this paper, we present a large historical database of Chinese family records with the aim to develop robust systems for historical document analysis. In this direction, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The objective of the competition is to recognize and analyze the layout, and finally detect and recognize the textlines and characters of the large historical document image dataset containing more than 100000 pages. Cascade R-CNN, CRNN, and U-Net based architectures were trained to evaluate the performances in these tasks. 
Error rate of 0.01 has been recorded for textline recognition (Task1) whereas a Jaccard Index of 99.54% has been recorded for layout analysis (Task2). The graph edit distance based total error ratio of 1.5% has been recorded for complete integrated textline detection and recognition (Task3).</description><identifier>EISSN: 2379-2140</identifier><identifier>EISBN: 9781728130149</identifier><identifier>EISBN: 172813014X</identifier><identifier>DOI: 10.1109/ICDAR.2019.00241</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Character recognition ; Chinese Script ; Complete Integrated Detection and Recognition ; Document Analysis ; Foreground Background Segmentation ; Han Script ; Layout ; Layout Analysis ; Measurement ; Task analysis ; Text Recognition ; Traditional Chinese HDRC Database ; XML</subject><ispartof>2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, p.1499-1504</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8977999$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8977999$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Saini, Rajkumar</creatorcontrib><creatorcontrib>Dobson, Derek</creatorcontrib><creatorcontrib>Morrey, Jon</creatorcontrib><creatorcontrib>Liwicki, Marcus</creatorcontrib><creatorcontrib>Simistira Liwicki, Foteini</creatorcontrib><title>ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records</title><title>2019 International Conference on Document Analysis and Recognition (ICDAR)</title><addtitle>ICDAR</addtitle><description>In this 
paper, we present a large historical database of Chinese family records with the aim to develop robust systems for historical document analysis. In this direction, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The objective of the competition is to recognize and analyze the layout, and finally detect and recognize the textlines and characters of the large historical document image dataset containing more than 100000 pages. Cascade R-CNN, CRNN, and U-Net based architectures were trained to evaluate the performances in these tasks. Error rate of 0.01 has been recorded for textline recognition (Task1) whereas a Jaccard Index of 99.54% has been recorded for layout analysis (Task2). The graph edit distance based total error ratio of 1.5% has been recorded for complete integrated textline detection and recognition (Task3).</description><subject>Character recognition</subject><subject>Chinese Script</subject><subject>Complete Integrated Detection and Recognition</subject><subject>Document Analysis</subject><subject>Foreground Background Segmentation</subject><subject>Han Script</subject><subject>Layout</subject><subject>Layout Analysis</subject><subject>Measurement</subject><subject>Task analysis</subject><subject>Text Recognition</subject><subject>Traditional Chinese HDRC 
Database</subject><subject>XML</subject><issn>2379-2140</issn><isbn>9781728130149</isbn><isbn>172813014X</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2019</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj01LAzEQhqMgWGvvgpf8ga2ZZLNJjmVrbWFBqHryULLJtEb2Q7K7h_5748dhmIHnfQZeQu6ALQGYediV69V-yRmYJWM8hwuyMEqD4hoEg9xckhkXymQccnZNbobhk6WsMcWMvP-69Mel2zCMfQzONnTdu6nFbqR7tD50J1p-2KbB7oS072hlYzpexji5cYroEw0dDkg3tg3NOUmuj364JVdH2wy4-N9z8rZ5fC23WfX8tCtXVRZASp1xr7TXPI1HVFKAYSJXErTjTnrvuLaCJebRoYaaHQulWZE7W9cSuORiTu7__gZEPHzF0Np4PmijVOoovgGrmlFI</recordid><startdate>201909</startdate><enddate>201909</enddate><creator>Saini, Rajkumar</creator><creator>Dobson, Derek</creator><creator>Morrey, Jon</creator><creator>Liwicki, Marcus</creator><creator>Simistira Liwicki, Foteini</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201909</creationdate><title>ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records</title><author>Saini, Rajkumar ; Dobson, Derek ; Morrey, Jon ; Liwicki, Marcus ; Simistira Liwicki, Foteini</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i1558-2d78d828d8dee753190347518c2c5ddc28a308dedece81b0f678064cabb512523</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Character recognition</topic><topic>Chinese Script</topic><topic>Complete Integrated Detection and Recognition</topic><topic>Document Analysis</topic><topic>Foreground Background Segmentation</topic><topic>Han Script</topic><topic>Layout</topic><topic>Layout Analysis</topic><topic>Measurement</topic><topic>Task analysis</topic><topic>Text Recognition</topic><topic>Traditional Chinese HDRC 
Database</topic><topic>XML</topic><toplevel>online_resources</toplevel><creatorcontrib>Saini, Rajkumar</creatorcontrib><creatorcontrib>Dobson, Derek</creatorcontrib><creatorcontrib>Morrey, Jon</creatorcontrib><creatorcontrib>Liwicki, Marcus</creatorcontrib><creatorcontrib>Simistira Liwicki, Foteini</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Saini, Rajkumar</au><au>Dobson, Derek</au><au>Morrey, Jon</au><au>Liwicki, Marcus</au><au>Simistira Liwicki, Foteini</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records</atitle><btitle>2019 International Conference on Document Analysis and Recognition (ICDAR)</btitle><stitle>ICDAR</stitle><date>2019-09</date><risdate>2019</risdate><spage>1499</spage><epage>1504</epage><pages>1499-1504</pages><eissn>2379-2140</eissn><eisbn>9781728130149</eisbn><eisbn>172813014X</eisbn><coden>IEEPAD</coden><abstract>In this paper, we present a large historical database of Chinese family records with the aim to develop robust systems for historical document analysis. In this direction, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The objective of the competition is to recognize and analyze the layout, and finally detect and recognize the textlines and characters of the large historical document image dataset containing more than 100000 pages. 
Cascade R-CNN, CRNN, and U-Net based architectures were trained to evaluate the performances in these tasks. Error rate of 0.01 has been recorded for textline recognition (Task1) whereas a Jaccard Index of 99.54% has been recorded for layout analysis (Task2). The graph edit distance based total error ratio of 1.5% has been recorded for complete integrated textline detection and recognition (Task3).</abstract><pub>IEEE</pub><doi>10.1109/ICDAR.2019.00241</doi><tpages>6</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2379-2140 |
ispartof | 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, p.1499-1504 |
issn | 2379-2140 |
language | eng |
recordid | cdi_ieee_primary_8977999 |
source | IEEE Xplore All Conference Series |
subjects | Character recognition ; Chinese Script ; Complete Integrated Detection and Recognition ; Document Analysis ; Foreground Background Segmentation ; Han Script ; Layout ; Layout Analysis ; Measurement ; Task analysis ; Text Recognition ; Traditional Chinese HDRC Database ; XML |
title | ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T22%3A33%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=ICDAR%202019%20Historical%20Document%20Reading%20Challenge%20on%20Large%20Structured%20Chinese%20Family%20Records&rft.btitle=2019%20International%20Conference%20on%20Document%20Analysis%20and%20Recognition%20(ICDAR)&rft.au=Saini,%20Rajkumar&rft.date=2019-09&rft.spage=1499&rft.epage=1504&rft.pages=1499-1504&rft.eissn=2379-2140&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICDAR.2019.00241&rft.eisbn=9781728130149&rft.eisbn_list=172813014X&rft_dat=%3Cieee_CHZPO%3E8977999%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i1558-2d78d828d8dee753190347518c2c5ddc28a308dedece81b0f678064cabb512523%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8977999&rfr_iscdi=true |
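
The description above reports a Jaccard Index for layout analysis (Task2) without defining it. As a rough illustration only, the sketch below computes the usual set-based definition, intersection size divided by union size, over two small sets of pixel coordinates. The function name `jaccard_index`, the sample coordinates, and the empty-set convention are assumptions made for this example; the record does not describe the competition's actual evaluation protocol, which may differ (for instance, by averaging per class or per page).

```python
def jaccard_index(predicted, reference):
    """Return |A ∩ B| / |A ∪ B| for two collections of hashable items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    a, b = set(predicted), set(reference)
    union = a | b
    if not union:
        # Both sets empty: treated here as perfect agreement (an assumed convention).
        return 1.0
    return len(a & b) / len(union)


# Made-up pixel coordinates labelled as "text" by a model and by the ground truth.
predicted_text_pixels = {(0, 0), (0, 1), (1, 1), (2, 2)}
reference_text_pixels = {(0, 1), (1, 1), (2, 2), (3, 3)}

score = jaccard_index(predicted_text_pixels, reference_text_pixels)
print(f"{score:.2%}")  # 3 shared pixels out of 5 distinct ones -> 60.00%
```

Read this way, a value such as the 99.54% quoted in the description would mean that predicted and reference regions overlap almost completely.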