
Generic SAO Similarity Measure via Extended Sørensen-Dice Index



Bibliographic Details
Published in: IEEE Access 2020-01, Vol. 8, p. 1-1
Main Authors: Li, Xiaoman, Wang, Cui, Zhang, Xuefu, Sun, Wei
Format: Article
Language: English
Description
Summary: As an essential component of many Natural Language Processing applications, semantic similarity measurement has been studied for decades. Recent research indicates that the Subject-Action-Object (SAO) structure of sentences is better suited to describing technological information, and that SAO-based similarity measures outperform classical text-based ones. The typical approach in the literature computes the similarity between two SAO structures by term matching, scoring them with the Sørensen-Dice index, i.e., twice the number of matching terms divided by the total number of terms in the two entities. However, in this paper we observe that the entities in SAO structures usually contain only a few terms, so the index can take only a handful of distinct values; as a result, the currently accepted methods have a high recurrence rate and poor accuracy. To address this issue, we extend the Sørensen-Dice index and present a new unified framework for SAO similarity measurement that offers greater discrimination. We evaluate the effectiveness of our measure on patent data sets in the Nano-Fertilizer field. The results show that our measure is significantly more accurate than the currently accepted ones. The proposed measure is flexible and robust, and can easily be applied to patent similarity measurement. In addition, the extended Sørensen-Dice index is of independent interest and has potential applications in other similarity measures.
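The classical term-matching baseline described in the summary can be sketched as follows, using the Sørensen-Dice index D(A, B) = 2|A ∩ B| / (|A| + |B|) over the term sets A and B of two entities. This is a minimal illustration of the baseline, not the authors' extended index. The names `dice`, `sao_similarity` and `distinct_dice_values` are hypothetical. Terms are assumed to be lowercase words separated by whitespace, and the equal-weight average over the subject, action and object slots is also an assumption. The last function enumerates every value the index can take when each entity has at most a given number of terms, which shows why short SAO entities produce so few distinct scores.

```python
from fractions import Fraction


def dice(a, b):
    """Sørensen-Dice index of two term collections: 2|A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty entities count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def sao_similarity(sao1, sao2):
    """Hypothetical baseline: average the Dice index over the S, A and O slots.

    Each SAO is a (subject, action, object) tuple of strings; terms are
    obtained by lowercasing and splitting on whitespace (an assumption).
    """
    scores = [dice(x.lower().split(), y.lower().split()) for x, y in zip(sao1, sao2)]
    return sum(scores) / len(scores)


def distinct_dice_values(max_terms):
    """All exact Dice values possible when each entity has 1..max_terms terms."""
    values = set()
    for m in range(1, max_terms + 1):          # size of the first term set
        for n in range(1, max_terms + 1):      # size of the second term set
            for k in range(min(m, n) + 1):     # number of shared terms
                values.add(Fraction(2 * k, m + n))
    return sorted(values)


if __name__ == "__main__":
    a = ("nano fertilizer", "improves", "crop yield")
    b = ("fertilizer", "improves", "yield")
    print(round(sao_similarity(a, b), 4))                # 0.7778
    print([str(v) for v in distinct_dice_values(2)])     # ['0', '1/2', '2/3', '1']
```

With at most two terms per entity, each slot can score only 0, 1/2, 2/3 or 1. Many different SAO pairs therefore end up with the same overall similarity, which is the low-discrimination problem the extended index is meant to address.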
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2984024