CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction
In this paper, we present CrudeOilNews, a corpus of English crude oil news for event extraction. It is the first of its kind for commodity news and contributes to resource building for economic and financial text mining. The paper describes the data collection process, the annotation m...
Published in: | arXiv.org 2022-04 |
---|---|
Main Authors: | Lee, Meisin; Lay-Ki Soon; Eu-Gene Siew; Ly Fie Sugianto |
Format: | Article |
Language: | English |
Subjects: | Active learning; Annotations; Crude oil; Data collection; Data mining; Machine learning; News |
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Lee, Meisin; Lay-Ki Soon; Eu-Gene Siew; Ly Fie Sugianto |
description | In this paper, we present CrudeOilNews, a corpus of English crude oil news for event extraction. It is the first of its kind for commodity news and contributes to resource building for economic and financial text mining. The paper describes the data collection process, the annotation methodology, and the event typology used in producing the corpus. First, a seed set of 175 news articles was manually annotated, of which a subset of 25 articles was used as the adjudicated reference test set for inter-annotator and system evaluation. Agreement was generally substantial and annotator performance was adequate, indicating that the annotation scheme produces consistent event annotations of high quality. Subsequently, the dataset was expanded through (1) data augmentation and (2) human-in-the-loop active learning. The resulting corpus has 425 news articles with approximately 11k annotated events. As part of the active learning process, the corpus was used to train basic event extraction models for machine labeling; the resulting models also serve as validation, or as a pilot study, demonstrating the use of the corpus for machine learning. The annotated corpus is available for academic research purposes at https://github.com/meisin/CrudeOilNews-Corpus. |
format | article |
publisher | Ithaca: Cornell University Library, arXiv.org |
creationdate | 2022-04-08 |
rights | 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2022-04 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2649143453 |
source | Publicly Available Content Database |
subjects | Active learning; Annotations; Crude oil; Data collection; Data mining; Machine learning; News |
title | CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T16%3A25%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=CrudeOilNews:%20An%20Annotated%20Crude%20Oil%20News%20Corpus%20for%20Event%20Extraction&rft.jtitle=arXiv.org&rft.au=Lee,%20Meisin&rft.date=2022-04-08&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2649143453%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_26491434533%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2649143453&rft_id=info:pmid/&rfr_iscdi=true |