Automatic Page Classification in a Large Collection of Manuscripts Based on the International Image Interoperability Framework

Bibliographic Details
Main Authors: Boros, Emanuela; Toumi, Alexis; Rouchet, Erwan; Abadie, Bastien; Stutzmann, Dominique; Kermorvant, Christopher
Format: Conference Proceeding
Language: English
Description
Summary: In patrimonial institutions such as libraries and archives, making use of the vast number of recently digitized documents remains a challenge. Most of these documents are freely accessible as images, but their textual content remains largely unreachable and unknown. Research projects dedicated to specific collections make it possible to create metadata, or even transcriptions, through volunteers or crowdsourcing. But the vast majority of the documents cannot be manually transcribed or indexed: automatic large-scale indexing processes are needed. The increasing adoption of the International Image Interoperability Framework (IIIF) by patrimonial institutions is a technological enabler for the development of such services. Images are accessible through a single protocol across institutions, and both images and data can be presented with standard tools. In this paper, we describe an architecture for the automatic processing of historical documents that are owned by different institutions but processed and presented through IIIF. We implemented this architecture and processed a large collection of books of hours with a page classifier trained on an annotated sample. The result is freely distributed and can be viewed with any IIIF-compatible viewer.
ISSN: 2379-2140
DOI: 10.1109/ICDAR.2019.00126
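
Note: The summary states that IIIF makes images accessible through one protocol across institutions. As an illustration only, and not the authors' implementation, the sketch below shows how a client might list the page images in a IIIF Presentation API 2.x manifest and build IIIF Image API request URLs for them. The function names, the example server, and the sample manifest are hypothetical; only the URL template and the manifest layout follow the IIIF specifications.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an Image API URL: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}.

    The default size "max" is defined from Image API 2.1 onward;
    pass size="full" for a server that only supports 2.0.
    """
    # The identifier must be percent-encoded, including any "/" it contains.
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


def page_image_services(manifest):
    """Return (canvas label, image service base URL) pairs from a
    Presentation API 2.x manifest already parsed into a dict."""
    pages = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                # Skip images that do not declare an Image API service.
                if isinstance(service, dict) and "@id" in service:
                    pages.append((canvas.get("label", ""), service["@id"]))
    return pages


# Hypothetical manifest fragment, for illustration only.
sample_manifest = {
    "sequences": [{
        "canvases": [{
            "label": "f. 1r",
            "images": [{
                "resource": {
                    "service": {"@id": "https://example.org/iiif/book1-f1r"}
                }
            }],
        }]
    }]
}

for label, service_id in page_image_services(sample_manifest):
    # Each image service URL already has the form {base}/{identifier},
    # so the request parameters are appended to it directly.
    print(label, service_id + "/full/max/0/default.jpg")
```

Because every institution exposes the same URL pattern, a page classifier could in principle fetch its input images this way regardless of where a manuscript is hosted.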