Generic SAO Similarity Measure via Extended Sørensen-Dice Index
As an essential component of many Natural Language Processing applications, semantic similarity measurement has been studied for decades. Recent research indicates that the Subject-Action-Object (SAO) structure in sentences is better suited to describing technological information, and SAO-...
Published in: | IEEE access 2020-01, Vol.8, p.1-1 |
---|---|
Main Authors: | Li, Xiaoman; Wang, Cui; Zhang, Xuefu; Sun, Wei |
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953 |
---|---|
cites | cdi_FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953 |
container_end_page | 1 |
container_issue | |
container_start_page | 1 |
container_title | IEEE access |
container_volume | 8 |
creator | Li, Xiaoman Wang, Cui Zhang, Xuefu Sun, Wei |
description | As an essential component of many Natural Language Processing applications, semantic similarity measurement has been studied for decades. Recent research indicates that the Subject-Action-Object (SAO) structure in sentences is better suited to describing technological information, and that SAO-based similarity measures outperform classical text-based ones. The typical approach in the literature to finding the similarity between two SAO structures relies on a term matching technique, which produces the similarity score by the Sørensen-Dice index, i.e., the proportion of the total number of matching terms. However, in this paper, we observe that the entities in the SAO structures usually have a small number of terms, which gives the currently acknowledged methods a high recurrence rate and poor accuracy. To settle this issue, we extend the Sørensen-Dice index and present a new unified framework for the SAO similarity measure that gives higher discrimination. The effectiveness of our measure is evaluated on patent data sets in the Nano-Fertilizer field. The results show that our measure significantly improves accuracy over the currently acknowledged ones. The proposed measure has excellent flexibility and robustness, and can be easily used for patent similarity measurement. In addition, the extended Sørensen-Dice index is of independent interest, and has potential applications for other similarity measures. |
doi_str_mv | 10.1109/ACCESS.2020.2984024 |
format | article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2453704305</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9050516</ieee_id><doaj_id>oai_doaj_org_article_2b223cb65aaf4149b94ad27eda05aad6</doaj_id><sourcerecordid>2453704305</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953</originalsourceid><addsrcrecordid>eNpNUMtOwkAUbYwmEuQL2DRxXZx3OztJRSTBsKiuJ9OZWzMEWpwWA3_m3h9zsIR4N_fm5J5HThSNMZpgjOTDNM9nRTEhiKAJkRlDhF1FA4KFTCin4vrffRuN2naNwmQB4ukgepxDDd6ZuJiu4sJt3UZ71x3jV9Dt3kP85XQ8O3RQW7Bx8fPtoW6hTp6cgXgRwMNddFPpTQuj8x5G78-zt_wlWa7mi3y6TAxDWZeknIMJ6QikgoIwFWMZNsJkQssytRRltkJGG5llpaVANYWUaECmZJRxyekwWvS6ttFrtfNuq_1RNdqpP6DxH0r7zpkNKFISQk0puNYVw0yWkmlLUrAaBciKoHXfa-1887mHtlPrZu_rEF8RxmmKGEUnR9p_Gd-0rYfq4oqROjWv-ubVqXl1bj6wxj3LAcCFIRFHHAv6Cw8GfnI</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2453704305</pqid></control><display><type>article</type><title>Generic SAO Similarity Measure via Extended Sørensen-Dice Index</title><source>IEEE Xplore Open Access Journals</source><creator>Li, Xiaoman ; Wang, Cui ; Zhang, Xuefu ; Sun, Wei</creator><creatorcontrib>Li, Xiaoman ; Wang, Cui ; Zhang, Xuefu ; Sun, Wei</creatorcontrib><description>As an essential component of many Natural Language Processing applications, semantic similarity measure has been studied for decades. Recent research results indicate that the Subject-Action- Object (SAO) structure in sentences is more desirable for describing the technological information, and SAO-based similarity measure outperforms classical text-based ones. The typical approach in the literature to finding the similarity between two SAO structures relies on a term matching technique, which produces the similarity score by the Sørensen-Dice index, i.e., the proportion of the total number of matching terms. 
However, in this paper, we observe that the entities in the SAO structures usually have a small number of terms, which makes the currently acknowledged methods have a high recurrence rate and poor accuracy. To settle this issue, we extend the Sørensen-Dice index, and present a new unified framework for the SAO similarity measure that can give a higher discrimination. The effectiveness of our measure is evaluated on the basis of patent data sets in the Nano-Fertilizer field. The results show that our measure can significantly improve the accuracy than the currently acknowledged ones. The proposed measure has an excellent flexibility and robustness, and can be easily used for patent similarity measure. In addition, the extended Sørensen-Dice index is of independent interest, and has potential applications for other similarity measures.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2020.2984024</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Atmospheric measurements ; Computational linguistics ; Current measurement ; Indexes ; Matching ; Natural language processing ; Particle measurements ; Semantic information ; Semantics ; Sentences ; Similarity ; Similarity measurement ; Similarity measures ; Subject-Action-Object ; Syntactics ; Sørensen-Dice index</subject><ispartof>IEEE access, 2020-01, Vol.8, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953</citedby><cites>FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953</cites><orcidid>0000-0001-5718-0047</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9050516$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,27633,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Li, Xiaoman</creatorcontrib><creatorcontrib>Wang, Cui</creatorcontrib><creatorcontrib>Zhang, Xuefu</creatorcontrib><creatorcontrib>Sun, Wei</creatorcontrib><title>Generic SAO Similarity Measure via Extended Sørensen-Dice Index</title><title>IEEE access</title><addtitle>Access</addtitle><description>As an essential component of many Natural Language Processing applications, semantic similarity measure has been studied for decades. Recent research results indicate that the Subject-Action- Object (SAO) structure in sentences is more desirable for describing the technological information, and SAO-based similarity measure outperforms classical text-based ones. The typical approach in the literature to finding the similarity between two SAO structures relies on a term matching technique, which produces the similarity score by the Sørensen-Dice index, i.e., the proportion of the total number of matching terms. However, in this paper, we observe that the entities in the SAO structures usually have a small number of terms, which makes the currently acknowledged methods have a high recurrence rate and poor accuracy. 
To settle this issue, we extend the Sørensen-Dice index, and present a new unified framework for the SAO similarity measure that can give a higher discrimination. The effectiveness of our measure is evaluated on the basis of patent data sets in the Nano-Fertilizer field. The results show that our measure can significantly improve the accuracy than the currently acknowledged ones. The proposed measure has an excellent flexibility and robustness, and can be easily used for patent similarity measure. In addition, the extended Sørensen-Dice index is of independent interest, and has potential applications for other similarity measures.</description><subject>Atmospheric measurements</subject><subject>Computational linguistics</subject><subject>Current measurement</subject><subject>Indexes</subject><subject>Matching</subject><subject>Natural language processing</subject><subject>Particle measurements</subject><subject>Semantic information</subject><subject>Semantics</subject><subject>Sentences</subject><subject>Similarity</subject><subject>Similarity measurement</subject><subject>Similarity measures</subject><subject>Subject-Action-Object</subject><subject>Syntactics</subject><subject>Sørensen-Dice index</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>DOA</sourceid><recordid>eNpNUMtOwkAUbYwmEuQL2DRxXZx3OztJRSTBsKiuJ9OZWzMEWpwWA3_m3h9zsIR4N_fm5J5HThSNMZpgjOTDNM9nRTEhiKAJkRlDhF1FA4KFTCin4vrffRuN2naNwmQB4ukgepxDDd6ZuJiu4sJt3UZ71x3jV9Dt3kP85XQ8O3RQW7Bx8fPtoW6hTp6cgXgRwMNddFPpTQuj8x5G78-zt_wlWa7mi3y6TAxDWZeknIMJ6QikgoIwFWMZNsJkQssytRRltkJGG5llpaVANYWUaECmZJRxyekwWvS6ttFrtfNuq_1RNdqpP6DxH0r7zpkNKFISQk0puNYVw0yWkmlLUrAaBciKoHXfa-1887mHtlPrZu_rEF8RxmmKGEUnR9p_Gd-0rYfq4oqROjWv-ubVqXl1bj6wxj3LAcCFIRFHHAv6Cw8GfnI</recordid><startdate>20200101</startdate><enddate>20200101</enddate><creator>Li, Xiaoman</creator><creator>Wang, 
Cui</creator><creator>Zhang, Xuefu</creator><creator>Sun, Wei</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-5718-0047</orcidid></search><sort><creationdate>20200101</creationdate><title>Generic SAO Similarity Measure via Extended Sørensen-Dice Index</title><author>Li, Xiaoman ; Wang, Cui ; Zhang, Xuefu ; Sun, Wei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Atmospheric measurements</topic><topic>Computational linguistics</topic><topic>Current measurement</topic><topic>Indexes</topic><topic>Matching</topic><topic>Natural language processing</topic><topic>Particle measurements</topic><topic>Semantic information</topic><topic>Semantics</topic><topic>Sentences</topic><topic>Similarity</topic><topic>Similarity measurement</topic><topic>Similarity measures</topic><topic>Subject-Action-Object</topic><topic>Syntactics</topic><topic>Sørensen-Dice index</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Xiaoman</creatorcontrib><creatorcontrib>Wang, Cui</creatorcontrib><creatorcontrib>Zhang, Xuefu</creatorcontrib><creatorcontrib>Sun, Wei</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 
1998–Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Xiaoman</au><au>Wang, Cui</au><au>Zhang, Xuefu</au><au>Sun, Wei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Generic SAO Similarity Measure via Extended Sørensen-Dice Index</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2020-01-01</date><risdate>2020</risdate><volume>8</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>As an essential component of many Natural Language Processing applications, semantic similarity measure has been studied for decades. Recent research results indicate that the Subject-Action- Object (SAO) structure in sentences is more desirable for describing the technological information, and SAO-based similarity measure outperforms classical text-based ones. The typical approach in the literature to finding the similarity between two SAO structures relies on a term matching technique, which produces the similarity score by the Sørensen-Dice index, i.e., the proportion of the total number of matching terms. 
However, in this paper, we observe that the entities in the SAO structures usually have a small number of terms, which makes the currently acknowledged methods have a high recurrence rate and poor accuracy. To settle this issue, we extend the Sørensen-Dice index, and present a new unified framework for the SAO similarity measure that can give a higher discrimination. The effectiveness of our measure is evaluated on the basis of patent data sets in the Nano-Fertilizer field. The results show that our measure can significantly improve the accuracy than the currently acknowledged ones. The proposed measure has an excellent flexibility and robustness, and can be easily used for patent similarity measure. In addition, the extended Sørensen-Dice index is of independent interest, and has potential applications for other similarity measures.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2020.2984024</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0001-5718-0047</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2020-01, Vol.8, p.1-1 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_2453704305 |
source | IEEE Xplore Open Access Journals |
subjects | Atmospheric measurements Computational linguistics Current measurement Indexes Matching Natural language processing Particle measurements Semantic information Semantics Sentences Similarity Similarity measurement Similarity measures Subject-Action-Object Syntactics Sørensen-Dice index |
title | Generic SAO Similarity Measure via Extended Sørensen-Dice Index |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T10%3A13%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generic%20SAO%20Similarity%20Measure%20via%20Extended%20S%C3%B8rensen-Dice%20Index&rft.jtitle=IEEE%20access&rft.au=Li,%20Xiaoman&rft.date=2020-01-01&rft.volume=8&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2020.2984024&rft_dat=%3Cproquest_ieee_%3E2453704305%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2453704305&rft_id=info:pmid/&rft_ieee_id=9050516&rfr_iscdi=true |
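
The abstract in this record describes the baseline approach as term matching scored by the Sørensen-Dice index. As a rough illustration of that baseline only, a minimal Python sketch might look like the following. It does not reproduce the paper's extended index, which this record does not specify. The function names `dice` and `sao_similarity`, the whitespace tokenization, the treatment of empty term sets, and the unweighted averaging over subject, action, and object are all assumptions made for illustration.

```python
def dice(terms_a, terms_b):
    """Classic Sørensen-Dice index: 2 * |A ∩ B| / (|A| + |B|)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # assumption: two empty term sets count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def sao_similarity(sao_1, sao_2):
    """Hypothetical aggregate: unweighted mean of the per-component Dice scores.

    Each argument is a (subject, action, object) tuple of strings, split
    into terms on whitespace (an assumed, simplistic tokenization).
    """
    scores = [
        dice(x.lower().split(), y.lower().split())
        for x, y in zip(sao_1, sao_2)
    ]
    return sum(scores) / len(scores)


first = ("nano fertilizer", "improves", "crop yield")
second = ("nano fertilizer", "increases", "crop growth")
print(sao_similarity(first, second))  # component scores are 1.0, 0.0, 0.5
```

With two distinct terms per entity, the per-component score can only take the values 0, 0.5, or 1.0, so many different SAO pairs end up with identical scores. This appears to be the kind of coarseness the abstract refers to as a high recurrence rate.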