Fast Extraction of Article Titles from XML Based Large Bibliographic Datasets
Published in: | Procedia technology 2016, Vol.24, p.1263-1267 |
---|---|
Format: | Article |
Language: | English |
Summary: | Large numbers of research articles are published worldwide every day. The metadata of these articles is usually made available in bibliographic datasets, generally in XML format, which is widely used for data transfer between systems and for data processing by systems. An XML bibliographic dataset contains many article tags, whose sub-tags specify the metadata associated with each article; an article tag usually has many such metadata sub-tags. Extracting article title tags is essential for domain-based classification of articles. Extracting and subsequently classifying the research article titles in a bibliographic dataset is a laborious task that is usually done manually, so a fast and efficient technique for extracting titles from datasets is needed. This article proposes a fast MapReduce-based approach for extracting research article titles from a bibliographic dataset. Articles from the past 3 years of the DBLP bibliographic dataset are used in the study. The Hadoop MapReduce method is used to speed up title extraction from large XML-based bibliographic datasets. Performance analysis showed that the proposed method is quick, efficient and highly scalable. |
---|---|
ISSN: | 2212-0173 |
DOI: | 10.1016/j.protcy.2016.05.108 |
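
The summary describes extracting article titles from an XML bibliographic dataset with a map step and a reduce step. The sketch below illustrates that general idea in plain, single-process Python. It is not the authors' Hadoop implementation, which this record does not describe. The sample records, the function names `map_titles` and `reduce_titles`, and the `min_year` filter are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in a DBLP-like layout. The real dataset is far larger
# and uses a DTD for character entities, which this sketch ignores.
SAMPLE = """<dblp>
<article key="a1"><author>A. Author</author><title>Graph Mining at Scale</title><year>2015</year></article>
<article key="a2"><author>B. Author</author><title>Fast <i>XML</i> Parsing</title><year>2014</year></article>
<inproceedings key="c1"><title>Not an article</title><year>2015</year></inproceedings>
<article key="a3"><title>Older Work</title><year>2009</year></article>
</dblp>"""


def map_titles(xml_stream, min_year):
    """Map step: emit (year, title) for each <article> published in or after min_year."""
    # iterparse streams the document, so the whole file is never held in memory.
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != "article":
            continue
        title_elem = elem.find("title")
        year_text = elem.findtext("year")
        if title_elem is not None and year_text is not None:
            year = int(year_text)
            if year >= min_year:
                # itertext() also collects text nested in markup such as <i>.
                yield year, "".join(title_elem.itertext()).strip()
        elem.clear()  # drop the processed article's children


def reduce_titles(pairs):
    """Reduce step: group the emitted titles by publication year."""
    grouped = defaultdict(list)
    for year, title in pairs:
        grouped[year].append(title)
    return dict(grouped)


if __name__ == "__main__":
    stream = io.BytesIO(SAMPLE.encode("utf-8"))
    for year, titles in sorted(reduce_titles(map_titles(stream, 2014)).items()):
        print(year, titles)
```

In a real Hadoop job the map function would run in parallel over splits of the input file and the framework would handle grouping by key. Here the two functions are simply chained to show the data flow.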