Joint Learning for Document-Level Threat Intelligence Relation Extraction and Coreference Resolution Based on GCN

Bibliographic Details
Main Authors:	Wang, Xuren; Xiong, Mengbo; Luo, Yali; Li, Ning; Jiang, Zhengwei; Xiong, Zihan
Format:	Conference Proceeding
Language:	English
Subjects:	APT intelligence entities; APTERC-DOC; Conferences; coreference dataset; coreference relation; coreference resolution; document-level text; document-level threat intelligence relation extraction; feature extraction; GCN; inter-sentence relation extraction; inter-sentence relations; intra-sentence relations; joint learning framework; learning (artificial intelligence); natural language processing; pattern classification; predefined relations; previous threat events; Privacy; public document-level threat intelligence dataset; relation extraction; SDP-VP-SET; Security; sentence set; Task analysis; text analysis; threat intelligence; threat intelligence document-level relation extraction; threat intelligence text analysis; Transforms; trees (mathematics)

Description
Summary:	Document-level relation extraction plays an important role in threat intelligence text analysis and processing because it helps researchers quickly understand how new threat events connect to previous ones. Because no public document-level threat intelligence dataset exists, we create APTERC-DOC, a dataset of APT intelligence entities, relations, and coreference. We treat relation extraction as a multi-class classification task. By treating the coreference relation as one of the predefined relations, we develop a joint learning framework called TIRECO, a model that performs threat intelligence relation extraction and coreference resolution simultaneously. To address the problem that document-level text is too long for effective feature extraction, we propose the concept of a sentence set, which transforms document-level relation extraction into inter-sentence relation extraction. To incorporate relevant information while removing as much irrelevant content from the sentence set as possible, we further apply a novel pruning strategy (SDP-VP-SET) to the input trees, on the premise that verbs are crucial in determining the relation between entities in a sentence set. While retaining the shortest path and the nodes that are K hops away from it, we scale the weight of the edges connected to verb nodes by a factor of w. Experimental results show that our model performs well not only on inter-sentence relations but also on intra-sentence relations, with the F1 value increasing by 15.694%.
ISSN:	2324-9013
DOI:	10.1109/TrustCom50675.2020.00083
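
Illustrative sketch (not part of the catalog record): the summary describes the pruning step only in outline, so the following Python is a minimal sketch of one possible reading, not the authors' implementation. It assumes the dependency tree is given as an undirected edge list, reads "K hops away" as "within K hops" of the shortest path, and represents verb weighting as a per-edge multiplier. The function names, the toy tokens, and the defaults k=1 and w=2.0 are all hypothetical.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first shortest path in an undirected graph {node: set(neighbors)}."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        return []  # the two entities are not connected
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def prune_and_weight(edges, verbs, src, dst, k=1, w=2.0):
    """Keep the entity-to-entity shortest path plus nodes within k hops of it,
    then give every surviving edge that touches a verb node the weight w
    (all other surviving edges get weight 1.0)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    path = shortest_path(adj, src, dst)

    # Multi-source BFS outward from the path, stopping after k hops.
    dist = {node: 0 for node in path}
    queue = deque(path)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    kept = set(dist)
    weights = {
        (a, b): (w if a in verbs or b in verbs else 1.0)
        for a, b in edges
        if a in kept and b in kept
    }
    return path, kept, weights


# Toy dependency edges (head, dependent); tokens are made up for illustration.
toy_edges = [
    ("used", "GroupX"),
    ("used", "ToolY"),
    ("ToolY", "custom"),
    ("custom", "very"),
    ("used", "yesterday"),
]
path, kept, weights = prune_and_weight(
    toy_edges, verbs={"used"}, src="GroupX", dst="ToolY", k=1, w=2.0
)
```

In this toy tree the token "very" is two hops from the path and is pruned, while the three edges touching the verb "used" receive the weight w. The resulting edge weights could then populate the adjacency matrix fed to a GCN layer, though the record does not say how the authors do this.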