
Generic SAO Similarity Measure via Extended Sørensen-Dice Index

Bibliographic Details
Published in: IEEE Access, 2020-01, Vol. 8, p. 1-1
Main Authors: Li, Xiaoman; Wang, Cui; Zhang, Xuefu; Sun, Wei
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953
cites cdi_FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953
container_end_page 1
container_issue
container_start_page 1
container_title IEEE Access
container_volume 8
creator Li, Xiaoman
Wang, Cui
Zhang, Xuefu
Sun, Wei
description As an essential component of many Natural Language Processing applications, semantic similarity measurement has been studied for decades. Recent research indicates that the Subject-Action-Object (SAO) structure of sentences is better suited to describing technological information, and that SAO-based similarity measures outperform classical text-based ones. The typical approach in the literature computes the similarity between two SAO structures by term matching and scores it with the Sørensen-Dice index, i.e., the proportion of matching terms among all terms of the two structures. However, in this paper we observe that the entities in SAO structures usually contain only a few terms, which causes the currently accepted methods to suffer from a high recurrence rate and poor accuracy. To address this issue, we extend the Sørensen-Dice index and present a new unified framework for SAO similarity measurement that offers higher discrimination. The effectiveness of our measure is evaluated on patent data sets from the Nano-Fertilizer field. The results show that our measure is significantly more accurate than the currently accepted ones. The proposed measure offers excellent flexibility and robustness and can easily be applied to patent similarity measurement. In addition, the extended Sørensen-Dice index is of independent interest and has potential applications to other similarity measures.
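The abstract says the weakness of term-matching SAO similarity comes from the small number of terms per entity. The Python sketch below covers only the classical, unextended Sørensen-Dice index, DSC(A, B) = 2|A ∩ B| / (|A| + |B|), applied to subject, action and object term sets. The extended index proposed in the paper is not reproduced here. Several things are assumptions chosen for illustration, not the authors' method:

* the tuple-of-sets SAO representation;
* the equal-weight averaging in the hypothetical `sao_similarity` helper;
* the example terms.

```python
from fractions import Fraction
from itertools import product

# An SAO triple is modeled here as three term sets: (subject, action, object).
# This representation is an illustrative assumption, not the paper's data model.
SAO = tuple[set[str], set[str], set[str]]


def dice(a: set[str], b: set[str]) -> Fraction:
    """Classical Sørensen-Dice index: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return Fraction(1)  # assumed convention: two empty sets count as identical
    return Fraction(2 * len(a & b), len(a) + len(b))


def sao_similarity(x: SAO, y: SAO) -> Fraction:
    """Hypothetical baseline: unweighted mean of per-component Dice scores."""
    return sum((dice(p, q) for p, q in zip(x, y)), Fraction(0)) / 3


def possible_scores(max_terms: int) -> set[Fraction]:
    """All Dice values reachable when each entity has 1..max_terms terms."""
    scores = set()
    for m, n in product(range(1, max_terms + 1), repeat=2):
        for k in range(min(m, n) + 1):  # k = number of shared terms
            scores.add(Fraction(2 * k, m + n))
    return scores


if __name__ == "__main__":
    x: SAO = ({"nano", "fertilizer"}, {"release"}, {"nitrogen"})
    y: SAO = ({"fertilizer"}, {"release"}, {"nitrogen", "compound"})
    print(sao_similarity(x, y))  # 7/9
    # Distinct Dice values reachable with entities of at most 3 terms.
    print(sorted(possible_scores(3)))
```

The abstract does not define "high recurrence rate". One plausible reading is frequent ties between scores. Under that reading, the enumeration shows the effect: with at most three terms per entity, `possible_scores(3)` yields only seven distinct values between 0 and 1. Many different SAO pairs must therefore receive identical scores.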
doi_str_mv 10.1109/ACCESS.2020.2984024
format article
publisher Piscataway: IEEE
coden IAECCG
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
toplevel peer_reviewed
oa free_for_read
orcidid https://orcid.org/0000-0001-5718-0047
linktohtml https://ieeexplore.ieee.org/document/9050516
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE Access, 2020-01, Vol. 8, p. 1-1
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2453704305
source IEEE Xplore Open Access Journals
subjects Atmospheric measurements
Computational linguistics
Current measurement
Indexes
Matching
Natural language processing
Particle measurements
Semantic information
Semantics
Sentences
Similarity
Similarity measurement
Similarity measures
Subject-Action-Object
Syntactics
Sørensen-Dice index
title Generic SAO Similarity Measure via Extended Sørensen-Dice Index
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T10%3A13%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generic%20SAO%20Similarity%20Measure%20via%20Extended%20S%C3%B8rensen-Dice%20Index&rft.jtitle=IEEE%20access&rft.au=Li,%20Xiaoman&rft.date=2020-01-01&rft.volume=8&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2020.2984024&rft_dat=%3Cproquest_ieee_%3E2453704305%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c408t-755ec2982e763e6cf4481c6c86a9b7d308df0cac988bd3e3a3e72ae0cb4345953%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2453704305&rft_id=info:pmid/&rft_ieee_id=9050516&rfr_iscdi=true