A Hybrid Approach and Unified Framework for Bibliographic Reference Extraction
Published in: | arXiv.org 2020-10 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Publications are an integral part of a scientific community. Extracting bibliographic references from scientific publications is a challenging task due to the diversity of referencing styles and document layouts. Existing methods perform adequately on one dataset; however, applying them to a different dataset proves challenging. A generic solution that overcomes the limitations of previous approaches is therefore needed. The contribution of this paper is three-fold. First, it presents a novel approach called DeepBiRD, which is inspired by human visual perception and exploits layout features to identify individual references in a scientific publication. Second, we release a large dataset for image-based reference detection with 2401 scans containing 38863 references, each manually annotated as an individual reference. Third, we present a unified and highly configurable end-to-end automatic bibliographic reference extraction framework called BRExSys, which employs DeepBiRD along with state-of-the-art text-based models to detect and visualize references in a bibliographic document. The proposed approach first pre-processes each image, applying different computer vision techniques to obtain a hybrid representation. It then performs layout-driven reference detection on the given scientific publication using Mask R-CNN. DeepBiRD was evaluated on two different datasets to demonstrate that the approach generalizes. The proposed system achieved an AP50 of 98.56% on our dataset, and DeepBiRD significantly outperformed the current state-of-the-art approach on that approach's own dataset. These results suggest that DeepBiRD is significantly superior in performance, generalizes well, and is independent of any domain or referencing style. |
---|---|
ISSN: | 2331-8422 |
DOI: | 10.48550/arxiv.1912.07266 |
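
The summary reports detection quality as AP50, which is average precision where a detection counts as correct only if it overlaps a ground-truth region with an intersection-over-union (IoU) of at least 0.5. The sketch below illustrates only that matching step, under assumptions that are not taken from the paper: references are treated as axis-aligned boxes `(x_min, y_min, x_max, y_max)`, predictions arrive sorted by descending confidence, and the helper names `iou` and `match_at_threshold` are hypothetical. The paper's own evaluation code may differ, for example by using mask IoU.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_at_threshold(
    predictions: List[Box],
    ground_truth: List[Box],
    threshold: float = 0.5,
) -> List[bool]:
    """Flag each prediction as a true positive (True) or false positive (False).

    Predictions are assumed to be sorted by descending confidence. Each
    prediction is greedily matched to the unmatched ground-truth box with
    the highest IoU, and it counts as correct only if that IoU reaches
    the threshold. Each ground-truth box can be matched at most once.
    """
    matched = set()
    flags = []
    for pred in predictions:
        best_index, best_iou = -1, 0.0
        for index, truth in enumerate(ground_truth):
            if index in matched:
                continue
            overlap = iou(pred, truth)
            if overlap > best_iou:
                best_index, best_iou = index, overlap
        if best_index >= 0 and best_iou >= threshold:
            matched.add(best_index)
            flags.append(True)
        else:
            flags.append(False)
    return flags
```

AP50 itself is then obtained by computing precision and recall over these flags as the confidence cutoff varies and taking the area under the resulting precision-recall curve; that step is omitted here.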