A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information r...

Bibliographic Details
Published in: TheScientificWorld 2014-01, Vol.2014 (2014), p.1-17
Main Authors: Lee, Ming Che, Hsieh, Tung Cheng, Chang, Jia Wei
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c638t-3662ea17e66f20fd13b98af558f50f8e0ee7ce91f75e332aa79fde66ebc01a183
cites cdi_FETCH-LOGICAL-c638t-3662ea17e66f20fd13b98af558f50f8e0ee7ce91f75e332aa79fde66ebc01a183
container_end_page 17
container_issue 2014
container_start_page 1
container_title TheScientificWorld
container_volume 2014
creator Lee, Ming Che
Hsieh, Tung Cheng
Chang, Jia Wei
description This paper presents a similarity algorithm for natural language sentences that is based on grammar and a semantic corpus. Natural language, as opposed to “artificial languages” such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even ontology-based approaches that compare concept similarity instead of co-occurring terms/words, may fail to find the best match when there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of a corpus-based ontology and grammatical rules to overcome these problems. Experiments on two well-known benchmarks demonstrate that the proposed algorithm significantly improves performance on sentences and short texts with arbitrary syntax and structure.
doi_str_mv 10.1155/2014/437162
format article
fullrecord <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_215f6934fc3342c984ee6c2dd629822b</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A413711666</galeid><doaj_id>oai_doaj_org_article_215f6934fc3342c984ee6c2dd629822b</doaj_id><sourcerecordid>A413711666</sourcerecordid><originalsourceid>FETCH-LOGICAL-c638t-3662ea17e66f20fd13b98af558f50f8e0ee7ce91f75e332aa79fde66ebc01a183</originalsourceid><addsrcrecordid>eNqNks1v1DAQxSMEokvhxB1F4oKK0vo7yQVpqaBUWsGhIHGzZp1x1qskbp2Eqv89U1KqlhPywZb9m2e_8cuy15wdc671iWBcnShZciOeZCuuZVmUSv18mq2E1KYwXLGD7MU47hmTVcn18-xAqLoStRarbLPOzxL0PaTiI4zY5BfYwzAFl1-EPnSQwnSTr7s20mLX5z6m_CtMc4Iu38DQztAilQwTDg7Hl9kzD92Ir-7mw-zH50_fT78Um29n56frTeGMrKZCGiMQeInGeMF8w-W2rsBrXXnNfIUMsXRYc19qlFIAlLVvCMatYxx4JQ-z80W3ibC3lynQ829shGD_bMTUWkjkoUMruPamlso7KZVwdaUQjRNNYwS1QGxJ68OidTlve2wceSFzj0QfnwxhZ9v4yyrGNKsYCby7E0jxasZxsn0YHXYdDBjn0XKtBPW6lpLQt_-g-zingVpFlFFC8kpwoo4XqgUyEAYf6V5Ho8E-uDigD7S_Vpy-nBtjqOD9UuBSHMeE_v71nNnbiNjbiNglIkS_eWj4nv2bCQKOFmAXhgauw_-pUWzoZngAq9pIIX8DRJjL0A</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1564231821</pqid></control><display><type>article</type><title>A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences</title><source>Open Access: PubMed Central</source><source>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</source><source>Wiley Online Library Open Access</source><creator>Lee, Ming Che ; Hsieh, Tung Cheng ; Chang, Jia Wei</creator><contributor>Melin, P. ; Fernandez-Breis, J. T. ; Duque, J. G.</contributor><creatorcontrib>Lee, Ming Che ; Hsieh, Tung Cheng ; Chang, Jia Wei ; Melin, P. ; Fernandez-Breis, J. T. ; Duque, J. G.</creatorcontrib><description>This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. 
Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure.</description><identifier>ISSN: 2356-6140</identifier><identifier>ISSN: 1537-744X</identifier><identifier>EISSN: 1537-744X</identifier><identifier>DOI: 10.1155/2014/437162</identifier><identifier>PMID: 24982952</identifier><language>eng</language><publisher>Cairo, Egypt: Hindawi Publishing Corporation</publisher><subject>Algorithms ; Computational linguistics ; Grammar ; Information retrieval ; Language ; Language processing ; Methods ; Natural language interfaces ; Natural Language Processing ; Semantics ; Syntax</subject><ispartof>TheScientificWorld, 2014-01, Vol.2014 (2014), p.1-17</ispartof><rights>Copyright © 2014 Ming Che Lee et al.</rights><rights>COPYRIGHT 2014 John Wiley &amp; Sons, Inc.</rights><rights>Copyright © 2014 Ming Che Lee et al. Ming Che Lee et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</rights><rights>Copyright © 2014 Ming Che Lee et al. 
2014</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c638t-3662ea17e66f20fd13b98af558f50f8e0ee7ce91f75e332aa79fde66ebc01a183</citedby><cites>FETCH-LOGICAL-c638t-3662ea17e66f20fd13b98af558f50f8e0ee7ce91f75e332aa79fde66ebc01a183</cites><orcidid>0000-0002-4400-9109</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/1564231821/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/1564231821?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793,75126</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/24982952$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Melin, P.</contributor><contributor>Fernandez-Breis, J. T.</contributor><contributor>Duque, J. G.</contributor><creatorcontrib>Lee, Ming Che</creatorcontrib><creatorcontrib>Hsieh, Tung Cheng</creatorcontrib><creatorcontrib>Chang, Jia Wei</creatorcontrib><title>A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences</title><title>TheScientificWorld</title><addtitle>ScientificWorldJournal</addtitle><description>This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. 
Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure.</description><subject>Algorithms</subject><subject>Computational linguistics</subject><subject>Grammar</subject><subject>Information retrieval</subject><subject>Language</subject><subject>Language processing</subject><subject>Methods</subject><subject>Natural language interfaces</subject><subject>Natural Language Processing</subject><subject>Semantics</subject><subject>Syntax</subject><issn>2356-6140</issn><issn>1537-744X</issn><issn>1537-744X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNqNks1v1DAQxSMEokvhxB1F4oKK0vo7yQVpqaBUWsGhIHGzZp1x1qskbp2Eqv89U1KqlhPywZb9m2e_8cuy15wdc671iWBcnShZciOeZCuuZVmUSv18mq2E1KYwXLGD7MU47hmTVcn18-xAqLoStRarbLPOzxL0PaTiI4zY5BfYwzAFl1-EPnSQwnSTr7s20mLX5z6m_CtMc4Iu38DQztAilQwTDg7Hl9kzD92Ir-7mw-zH50_fT78Um29n56frTeGMrKZCGiMQeInGeMF8w-W2rsBrXXnNfIUMsXRYc19qlFIAlLVvCMatYxx4JQ-z80W3ibC3lynQ829shGD_bMTUWkjkoUMruPamlso7KZVwdaUQjRNNYwS1QGxJ68OidTlve2wceSFzj0QfnwxhZ9v4yyrGNKsYCby7E0jxasZxsn0YHXYdDBjn0XKtBPW6lpLQt_-g-zingVpFlFFC8kpwoo4XqgUyEAYf6V5Ho8E-uDigD7S_Vpy-nBtjqOD9UuBSHMeE_v71nNnbiNjbiNglIkS_eWj4nv2bCQKOFmAXhgauw_-pUWzoZngAq9pIIX8DRJjL0A</recordid><startdate>20140101</startdate><enddate>20140101</enddate><creator>Lee, Ming 
Che</creator><creator>Hsieh, Tung Cheng</creator><creator>Chang, Jia Wei</creator><general>Hindawi Publishing Corporation</general><general>John Wiley &amp; Sons, Inc</general><general>Hindawi Limited</general><scope>ADJCN</scope><scope>AHFXO</scope><scope>RHU</scope><scope>RHW</scope><scope>RHX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QP</scope><scope>7TK</scope><scope>7TM</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>CWDGH</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-4400-9109</orcidid></search><sort><creationdate>20140101</creationdate><title>A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences</title><author>Lee, Ming Che ; Hsieh, Tung Cheng ; Chang, Jia Wei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c638t-3662ea17e66f20fd13b98af558f50f8e0ee7ce91f75e332aa79fde66ebc01a183</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Algorithms</topic><topic>Computational linguistics</topic><topic>Grammar</topic><topic>Information retrieval</topic><topic>Language</topic><topic>Language 
processing</topic><topic>Methods</topic><topic>Natural language interfaces</topic><topic>Natural Language Processing</topic><topic>Semantics</topic><topic>Syntax</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lee, Ming Che</creatorcontrib><creatorcontrib>Hsieh, Tung Cheng</creatorcontrib><creatorcontrib>Chang, Jia Wei</creatorcontrib><collection>الدوريات العلمية والإحصائية - e-Marefa Academic and Statistical Periodicals</collection><collection>معرفة - المحتوى العربي الأكاديمي المتكامل - e-Marefa Academic Complete</collection><collection>Hindawi Publishing Complete</collection><collection>Hindawi Publishing Subscription Journals</collection><collection>Hindawi Publishing Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Agricultural Science Collection</collection><collection>Health &amp; Medicine (ProQuest)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace 
Collection</collection><collection>Agricultural &amp; Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>Middle East &amp; Africa Database</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Agriculture Science Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Open Access: DOAJ - Directory of Open Access Journals</collection><jtitle>TheScientificWorld</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lee, Ming Che</au><au>Hsieh, Tung Cheng</au><au>Chang, Jia Wei</au><au>Melin, P.</au><au>Fernandez-Breis, J. T.</au><au>Duque, J. 
G.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences</atitle><jtitle>TheScientificWorld</jtitle><addtitle>ScientificWorldJournal</addtitle><date>2014-01-01</date><risdate>2014</risdate><volume>2014</volume><issue>2014</issue><spage>1</spage><epage>17</epage><pages>1-17</pages><issn>2356-6140</issn><issn>1537-744X</issn><eissn>1537-744X</eissn><abstract>This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure.</abstract><cop>Cairo, Egypt</cop><pub>Hindawi Publishing Corporation</pub><pmid>24982952</pmid><doi>10.1155/2014/437162</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0002-4400-9109</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2356-6140
ispartof TheScientificWorld, 2014-01, Vol.2014 (2014), p.1-17
issn 2356-6140
1537-744X
1537-744X
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_215f6934fc3342c984ee6c2dd629822b
source Open Access: PubMed Central; Publicly Available Content Database (Proquest) (PQ_SDU_P3); Wiley Online Library Open Access
subjects Algorithms
Computational linguistics
Grammar
Information retrieval
Language
Language processing
Methods
Natural language interfaces
Natural Language Processing
Semantics
Syntax
title A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T22%3A54%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Grammar-Based%20Semantic%20Similarity%20Algorithm%20for%20Natural%20Language%20Sentences&rft.jtitle=TheScientificWorld&rft.au=Lee,%20Ming%20Che&rft.date=2014-01-01&rft.volume=2014&rft.issue=2014&rft.spage=1&rft.epage=17&rft.pages=1-17&rft.issn=2356-6140&rft.eissn=1537-744X&rft_id=info:doi/10.1155/2014/437162&rft_dat=%3Cgale_doaj_%3EA413711666%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c638t-3662ea17e66f20fd13b98af558f50f8e0ee7ce91f75e332aa79fde66ebc01a183%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1564231821&rft_id=info:pmid/24982952&rft_galeid=A413711666&rfr_iscdi=true
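
The abstract's remark that vector models can miss a match when two sentences share no terms can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a plain bag-of-words cosine similarity, and the function names and example sentences are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercase the sentence and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two term-count vectors."""
    va, vb = bag_of_words(a), bag_of_words(b)
    # Only terms present in both sentences contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sentences: a paraphrase with no shared words scores zero,
# while a sentence with a different meaning but shared words scores high.
paraphrase = cosine_similarity("The physician examined the child",
                               "A doctor checked a kid")
overlap = cosine_similarity("The physician examined the child",
                            "The physician examined the report")
print(paraphrase, overlap)
```

Because the score depends only on shared surface terms, the paraphrase pair is rated as unrelated, which is the kind of gap the abstract says a corpus-based ontology and grammatical rules are meant to close.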