
ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records


Bibliographic Details
Main Authors: Saini, Rajkumar, Dobson, Derek, Morrey, Jon, Liwicki, Marcus, Simistira Liwicki, Foteini
Format: Conference Proceeding
Language: English
container_end_page 1504
container_start_page 1499
creator Saini, Rajkumar
Dobson, Derek
Morrey, Jon
Liwicki, Marcus
Simistira Liwicki, Foteini
description In this paper, we present a large historical database of Chinese family records with the aim of developing robust systems for historical document analysis. To this end, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The objective of the competition is to recognize and analyze the layout, and finally to detect and recognize the textlines and characters of a large historical document image dataset containing more than 100,000 pages. Cascade R-CNN, CRNN, and U-Net based architectures were trained to evaluate performance on these tasks. An error rate of 0.01 was recorded for textline recognition (Task1), whereas a Jaccard Index of 99.54% was recorded for layout analysis (Task2). A total error ratio of 1.5%, based on graph edit distance, was recorded for complete integrated textline detection and recognition (Task3).
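The abstract reports a Jaccard Index for layout analysis (Task2) and an error rate for textline recognition (Task1). As a rough illustration only, the Python sketch below computes both under assumed definitions: the Jaccard Index as pixel-wise intersection over union of two binary foreground masks, and the error rate as total Levenshtein edit distance divided by the number of reference characters. The function names are hypothetical, and this record does not specify the competition's exact evaluation protocol, so this is not the organizers' scoring code; the graph-edit-distance metric used for Task3 is not shown.

```python
def jaccard_index(pred, truth):
    """Pixel-wise Jaccard index |A & B| / |A | B| for two same-shaped binary masks.

    Assumption: masks are lists of rows of 0/1 values (foreground = 1).
    """
    inter = union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention.
    return 1.0 if union == 0 else inter / union


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def error_rate(predictions, references):
    """Assumed definition: summed edit distance over summed reference length."""
    edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total = sum(len(r) for r in references)
    return edits / total if total else 0.0


if __name__ == "__main__":
    print(jaccard_index([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]))  # 0.5
    print(error_rate(["張三"], ["張山"]))  # one substitution over two characters: 0.5
```

Normalizing by the reference length (rather than the prediction length) follows the usual character-error-rate convention, so the same transcript errors cost the same regardless of how many extra characters a system hallucinates.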
doi_str_mv 10.1109/ICDAR.2019.00241
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_8977999</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8977999</ieee_id><sourcerecordid>8977999</sourcerecordid><originalsourceid>FETCH-LOGICAL-i1558-2d78d828d8dee753190347518c2c5ddc28a308dedece81b0f678064cabb512523</originalsourceid><addsrcrecordid>eNotj01LAzEQhqMgWGvvgpf8ga2ZZLNJjmVrbWFBqHryULLJtEb2Q7K7h_5748dhmIHnfQZeQu6ALQGYediV69V-yRmYJWM8hwuyMEqD4hoEg9xckhkXymQccnZNbobhk6WsMcWMvP-69Mel2zCMfQzONnTdu6nFbqR7tD50J1p-2KbB7oS072hlYzpexji5cYroEw0dDkg3tg3NOUmuj364JVdH2wy4-N9z8rZ5fC23WfX8tCtXVRZASp1xr7TXPI1HVFKAYSJXErTjTnrvuLaCJebRoYaaHQulWZE7W9cSuORiTu7__gZEPHzF0Np4PmijVOoovgGrmlFI</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records</title><source>IEEE Xplore All Conference Series</source><creator>Saini, Rajkumar ; Dobson, Derek ; Morrey, Jon ; Liwicki, Marcus ; Simistira Liwicki, Foteini</creator><creatorcontrib>Saini, Rajkumar ; Dobson, Derek ; Morrey, Jon ; Liwicki, Marcus ; Simistira Liwicki, Foteini</creatorcontrib><description>In this paper, we present a large historical database of Chinese family records with the aim to develop robust systems for historical document analysis. In this direction, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The objective of the competition is to recognize and analyze the layout, and finally detect and recognize the textlines and characters of the large historical document image dataset containing more than 100000 pages. Cascade R-CNN, CRNN, and U-Net based architectures were trained to evaluate the performances in these tasks. 
Error rate of 0.01 has been recorded for textline recognition (Task1) whereas a Jaccard Index of 99.54% has been recorded for layout analysis (Task2). The graph edit distance based total error ratio of 1.5% has been recorded for complete integrated textline detection and recognition (Task3).</description><identifier>EISSN: 2379-2140</identifier><identifier>EISBN: 9781728130149</identifier><identifier>EISBN: 172813014X</identifier><identifier>DOI: 10.1109/ICDAR.2019.00241</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Character recognition ; Chinese Script ; Complete Integrated Detection and Recognition ; Document Analysis ; Foreground Background Segmentation ; Han Script ; Layout ; Layout Analysis ; Measurement ; Task analysis ; Text Recognition ; Traditional Chinese HDRC Database ; XML</subject><ispartof>2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, p.1499-1504</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8977999$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8977999$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Saini, Rajkumar</creatorcontrib><creatorcontrib>Dobson, Derek</creatorcontrib><creatorcontrib>Morrey, Jon</creatorcontrib><creatorcontrib>Liwicki, Marcus</creatorcontrib><creatorcontrib>Simistira Liwicki, Foteini</creatorcontrib><title>ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records</title><title>2019 International Conference on Document Analysis and Recognition (ICDAR)</title><addtitle>ICDAR</addtitle><description>In this 
paper, we present a large historical database of Chinese family records with the aim to develop robust systems for historical document analysis. In this direction, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The objective of the competition is to recognize and analyze the layout, and finally detect and recognize the textlines and characters of the large historical document image dataset containing more than 100000 pages. Cascade R-CNN, CRNN, and U-Net based architectures were trained to evaluate the performances in these tasks. Error rate of 0.01 has been recorded for textline recognition (Task1) whereas a Jaccard Index of 99.54% has been recorded for layout analysis (Task2). The graph edit distance based total error ratio of 1.5% has been recorded for complete integrated textline detection and recognition (Task3).</description><subject>Character recognition</subject><subject>Chinese Script</subject><subject>Complete Integrated Detection and Recognition</subject><subject>Document Analysis</subject><subject>Foreground Background Segmentation</subject><subject>Han Script</subject><subject>Layout</subject><subject>Layout Analysis</subject><subject>Measurement</subject><subject>Task analysis</subject><subject>Text Recognition</subject><subject>Traditional Chinese HDRC 
Database</subject><subject>XML</subject><issn>2379-2140</issn><isbn>9781728130149</isbn><isbn>172813014X</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2019</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj01LAzEQhqMgWGvvgpf8ga2ZZLNJjmVrbWFBqHryULLJtEb2Q7K7h_5748dhmIHnfQZeQu6ALQGYediV69V-yRmYJWM8hwuyMEqD4hoEg9xckhkXymQccnZNbobhk6WsMcWMvP-69Mel2zCMfQzONnTdu6nFbqR7tD50J1p-2KbB7oS072hlYzpexji5cYroEw0dDkg3tg3NOUmuj364JVdH2wy4-N9z8rZ5fC23WfX8tCtXVRZASp1xr7TXPI1HVFKAYSJXErTjTnrvuLaCJebRoYaaHQulWZE7W9cSuORiTu7__gZEPHzF0Np4PmijVOoovgGrmlFI</recordid><startdate>201909</startdate><enddate>201909</enddate><creator>Saini, Rajkumar</creator><creator>Dobson, Derek</creator><creator>Morrey, Jon</creator><creator>Liwicki, Marcus</creator><creator>Simistira Liwicki, Foteini</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201909</creationdate><title>ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records</title><author>Saini, Rajkumar ; Dobson, Derek ; Morrey, Jon ; Liwicki, Marcus ; Simistira Liwicki, Foteini</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i1558-2d78d828d8dee753190347518c2c5ddc28a308dedece81b0f678064cabb512523</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Character recognition</topic><topic>Chinese Script</topic><topic>Complete Integrated Detection and Recognition</topic><topic>Document Analysis</topic><topic>Foreground Background Segmentation</topic><topic>Han Script</topic><topic>Layout</topic><topic>Layout Analysis</topic><topic>Measurement</topic><topic>Task analysis</topic><topic>Text Recognition</topic><topic>Traditional Chinese HDRC 
Database</topic><topic>XML</topic><toplevel>online_resources</toplevel><creatorcontrib>Saini, Rajkumar</creatorcontrib><creatorcontrib>Dobson, Derek</creatorcontrib><creatorcontrib>Morrey, Jon</creatorcontrib><creatorcontrib>Liwicki, Marcus</creatorcontrib><creatorcontrib>Simistira Liwicki, Foteini</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Saini, Rajkumar</au><au>Dobson, Derek</au><au>Morrey, Jon</au><au>Liwicki, Marcus</au><au>Simistira Liwicki, Foteini</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records</atitle><btitle>2019 International Conference on Document Analysis and Recognition (ICDAR)</btitle><stitle>ICDAR</stitle><date>2019-09</date><risdate>2019</risdate><spage>1499</spage><epage>1504</epage><pages>1499-1504</pages><eissn>2379-2140</eissn><eisbn>9781728130149</eisbn><eisbn>172813014X</eisbn><coden>IEEPAD</coden><abstract>In this paper, we present a large historical database of Chinese family records with the aim to develop robust systems for historical document analysis. In this direction, we propose a Historical Document Reading Challenge on Large Chinese Structured Family Records (ICDAR 2019 HDRC-CHINESE). The objective of the competition is to recognize and analyze the layout, and finally detect and recognize the textlines and characters of the large historical document image dataset containing more than 100000 pages. 
Cascade R-CNN, CRNN, and U-Net based architectures were trained to evaluate the performances in these tasks. Error rate of 0.01 has been recorded for textline recognition (Task1) whereas a Jaccard Index of 99.54% has been recorded for layout analysis (Task2). The graph edit distance based total error ratio of 1.5% has been recorded for complete integrated textline detection and recognition (Task3).</abstract><pub>IEEE</pub><doi>10.1109/ICDAR.2019.00241</doi><tpages>6</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2379-2140
ispartof 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, p.1499-1504
issn 2379-2140
language eng
recordid cdi_ieee_primary_8977999
source IEEE Xplore All Conference Series
subjects Character recognition
Chinese Script
Complete Integrated Detection and Recognition
Document Analysis
Foreground Background Segmentation
Han Script
Layout
Layout Analysis
Measurement
Task analysis
Text Recognition
Traditional Chinese HDRC Database
XML
title ICDAR 2019 Historical Document Reading Challenge on Large Structured Chinese Family Records
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T22%3A33%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=ICDAR%202019%20Historical%20Document%20Reading%20Challenge%20on%20Large%20Structured%20Chinese%20Family%20Records&rft.btitle=2019%20International%20Conference%20on%20Document%20Analysis%20and%20Recognition%20(ICDAR)&rft.au=Saini,%20Rajkumar&rft.date=2019-09&rft.spage=1499&rft.epage=1504&rft.pages=1499-1504&rft.eissn=2379-2140&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICDAR.2019.00241&rft.eisbn=9781728130149&rft.eisbn_list=172813014X&rft_dat=%3Cieee_CHZPO%3E8977999%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i1558-2d78d828d8dee753190347518c2c5ddc28a308dedece81b0f678064cabb512523%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8977999&rfr_iscdi=true