Word Sense Disambiguation Using Prior Probability Estimation Based on the Korean WordNet
Supervised disambiguation using a large amount of corpus data delivers better performance than other word sense disambiguation methods. However, it is not easy to construct large-scale, sense-tagged corpora since this requires high cost and time. On the other hand, implementing unsupervised disambig...
Published in: | Electronics (Basel) 2021-12, Vol.10 (23), p.2938 |
---|---|
Main Authors: | Kim, Minho ; Kwon, Hyuk-Chul |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c322t-97436787878e6be31949e2504f04fe17671be47d5dddc470947ffd9d70a5d4883 |
---|---|
cites | cdi_FETCH-LOGICAL-c322t-97436787878e6be31949e2504f04fe17671be47d5dddc470947ffd9d70a5d4883 |
container_end_page | |
container_issue | 23 |
container_start_page | 2938 |
container_title | Electronics (Basel) |
container_volume | 10 |
creator | Kim, Minho ; Kwon, Hyuk-Chul |
description | Supervised disambiguation using a large amount of corpus data delivers better performance than other word sense disambiguation methods. However, it is not easy to construct large-scale, sense-tagged corpora since this requires high cost and time. On the other hand, implementing unsupervised disambiguation is relatively easy, although most of the efforts have not been satisfactory. A primary reason for the performance degradation of unsupervised disambiguation is that the semantic occurrence probability of ambiguous words is not available. Hence, a data deficiency problem occurs while determining the dependency between words. This paper proposes an unsupervised disambiguation method using a prior probability estimation based on the Korean WordNet. This performs better than supervised disambiguation. In the Korean WordNet, all the words have similar semantic characteristics to their related words. Thus, it is assumed that the dependency between words is the same as the dependency between their related words. This resolves the data deficiency problem by determining the dependency between words by calculating the χ² statistic between related words. Moreover, in order to have the same effect as using the semantic occurrence probability as prior probability, which is used in supervised disambiguation, semantically related words of ambiguous vocabulary are obtained and utilized as prior probability data. An experiment was conducted with Korean, English, and Chinese to evaluate the performance of our proposed lexical disambiguation method. We found that our proposed method had better performance than supervised disambiguation methods even though our method is based on unsupervised disambiguation (using a knowledge-based approach). |
doi_str_mv | 10.3390/electronics10232938 |
format | article |
publisher | Basel: MDPI AG |
rights | 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
orcid | https://orcid.org/0000-0002-8280-3493 |
oa | free_for_read |
toplevel | peer_reviewed ; online_resources |
fulltext | fulltext |
identifier | ISSN: 2079-9292 |
ispartof | Electronics (Basel), 2021-12, Vol.10 (23), p.2938 |
issn | 2079-9292 2079-9292 |
language | eng |
recordid | cdi_proquest_journals_2608081892 |
source | Linguistics and Language Behavior Abstracts (LLBA); ProQuest - Publicly Available Content Database |
subjects | Accuracy ; Ambiguity ; Chi-square test ; Chinese languages ; Conditional probability ; Dictionaries ; Korean language ; Language ; Machine learning ; Methods ; Morphology ; Natural language processing ; Performance degradation ; Performance evaluation ; Semantics ; Statistical analysis ; Statistical tests ; Word sense disambiguation |
title | Word Sense Disambiguation Using Prior Probability Estimation Based on the Korean WordNet |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T10%3A17%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Word%20Sense%20Disambiguation%20Using%20Prior%20Probability%20Estimation%20Based%20on%20the%20Korean%20WordNet&rft.jtitle=Electronics%20(Basel)&rft.au=Kim,%20Minho&rft.date=2021-12-01&rft.volume=10&rft.issue=23&rft.spage=2938&rft.pages=2938-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics10232938&rft_dat=%3Cproquest_cross%3E2608081892%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c322t-97436787878e6be31949e2504f04fe17671be47d5dddc470947ffd9d70a5d4883%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2608081892&rft_id=info:pmid/&rfr_iscdi=true |
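
The abstract states that the dependency between words is determined by calculating the χ² statistic. As a rough illustration only, the sketch below computes the standard Pearson χ² statistic for a 2×2 table of co-occurrence counts. It is not the authors' implementation: the function name `chi_square_2x2` and the layout of the counts are assumptions made for this example, and the record does not say how the paper builds its counts from the Korean WordNet.

```python
def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square statistic for a 2x2 co-occurrence table.

    Hypothetical count layout (an assumption, not taken from the paper):
      n11: contexts containing both word A and word B
      n12: contexts containing word A but not word B
      n21: contexts containing word B but not word A
      n22: contexts containing neither word
    """
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        # A zero marginal gives no evidence of dependency either way.
        return 0.0
    return n * (n11 * n22 - n12 * n21) ** 2 / denom


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(chi_square_2x2(30, 10, 5, 55))
```

A larger value suggests a stronger association between the two words. According to the abstract, the proposed method computes the statistic between *related* words rather than between the words themselves in order to work around sparse counts; this sketch does not model that step.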