Semantic concept model using Wikipedia semantic features
Published in: Journal of Information Science, 2018-08, Vol. 44 (4), pp. 526-551
Main Authors: Saif, Abdulgabbar; Omar, Nazlia; Ab Aziz, Mohd Juzaiddin; Zainodin, Ummi Zakiah; Salim, Naomie
Format: Article
Language: English
Subjects: Data mining; Data models; Datasets; Dirichlet problem; Feature extraction; Hypertext; Information retrieval; Interpolation; Natural language processing; Probabilistic models; Semantics; Statistical analysis; Texts
ISSN: 0165-5515
EISSN: 1741-6485
DOI: 10.1177/0165551517706231
Publisher: SAGE Publications, London, England
Source: Library & Information Science Abstracts (LISA); Sage Journals Online
Abstract: Wikipedia has become a high-coverage knowledge source that has been used in many research areas, such as natural language processing, text mining and information retrieval. Several methods have been introduced for extracting explicit or implicit relations from Wikipedia to represent the semantics of concepts/words. However, the main challenge in semantic representation is how to incorporate different types of semantic relations so as to capture more semantic evidence of the associations between concepts. In this article, we propose a semantic concept model that incorporates different types of semantic features extracted from Wikipedia. For each concept that corresponds to an article, four semantic features are introduced: template links, categories, salient concepts and topics. The proposed model is based on probability distributions defined over these semantic features of a Wikipedia concept. The template links and categories are document-level features, extracted directly from the structured information included in the article. The salient concepts and topics, by contrast, are corpus-level features, extracted to capture implicit relations among concepts. For the salient-concepts feature, a distributional method is applied to the hypertext corpus to extract this feature for each Wikipedia concept; the probability product kernel is then used to improve the weight of each concept in this feature. For the topic feature, Labelled Latent Dirichlet Allocation is adapted to the supervised multi-label structure of Wikipedia to train the probabilistic model of this feature. Finally, linear interpolation is used to incorporate these semantic features into the probabilistic model that estimates the semantic relation probability of a specific concept over Wikipedia articles. The proposed model is evaluated on 12 benchmark datasets in three natural language processing tasks: measuring the semantic relatedness of concepts/words in general and in the biomedical domain, measuring semantic textual relatedness, and measuring the semantic compositionality of noun compounds. The model is also compared with five methods that depend on separate semantic features in Wikipedia. Experimental results show that the proposed model achieves promising results on the three tasks and outperforms the baseline methods on most of the evaluation datasets. This implies that incorporating explicit and implicit semantic features is useful for representing the semantics of concepts in Wikipedia.
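The probability product kernel mentioned in the abstract is a standard similarity measure between probability distributions. Below is a minimal sketch of how such a kernel could score two concepts' salient-concept distributions; the distributions, the vocabulary alignment and the choice of rho are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def probability_product_kernel(p: np.ndarray, q: np.ndarray, rho: float = 0.5) -> float:
    """Probability product kernel between two discrete distributions.

    With rho = 0.5 this is the Bhattacharyya kernel; with rho = 1.0 it is the
    expected-likelihood kernel. Both are standard instances of the PPK.
    """
    return float(np.sum((p ** rho) * (q ** rho)))

# Illustrative salient-concept distributions for two Wikipedia concepts,
# aligned over the same concept vocabulary (values are made up).
p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.4, 0.4, 0.1, 0.1])
print(probability_product_kernel(p, q))  # ~0.93; equals 1.0 iff p == q for rho = 0.5
```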
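For the topic feature, Labelled LDA restricts each document's admissible topics to its observed labels; in the paper these labels come from Wikipedia's supervised multi-label structure. Below is a minimal sketch of that label constraint only, expressed as a masked Dirichlet prior; the label vocabulary, article names and helper function are hypothetical, and the full Gibbs-sampling training is omitted.

```python
import numpy as np

# Toy label vocabulary: in Labelled LDA there is one topic per label.
label_vocab = ["Machine learning", "Linguistics", "Databases"]

article_labels = {
    "Article_A": ["Machine learning", "Linguistics"],
    "Article_B": ["Databases"],
}

def label_constrained_alpha(labels: list[str], alpha: float = 0.1) -> np.ndarray:
    """Dirichlet prior over topics with zero mass on topics whose label
    the article does not carry -- the defining constraint of Labelled LDA."""
    mask = np.array([1.0 if label in labels else 0.0 for label in label_vocab])
    return alpha * mask

for doc, labels in article_labels.items():
    print(doc, label_constrained_alpha(labels))
# Article_A may only use the first two topics; Article_B only the third.
```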
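The linear interpolation step combines the four feature-specific probabilities as P(c2 | c1) = Σ_f λ_f · P_f(c2 | c1), with Σ_f λ_f = 1. A minimal sketch follows; the weights and per-feature probabilities are made-up placeholders, since the abstract does not report the values used in the paper.

```python
from typing import Dict

# Hypothetical per-feature conditional probabilities P_f(c2 | c1) for one
# concept pair; in the model each comes from a feature-specific distribution
# (template links, categories, salient concepts, topics).
feature_probs: Dict[str, float] = {
    "template_links": 0.10,
    "categories": 0.25,
    "salient_concepts": 0.40,
    "topics": 0.30,
}

# Interpolation weights; must sum to 1. These values are illustrative only.
lambdas: Dict[str, float] = {
    "template_links": 0.2,
    "categories": 0.2,
    "salient_concepts": 0.3,
    "topics": 0.3,
}
assert abs(sum(lambdas.values()) - 1.0) < 1e-9

# P(c2 | c1) = sum_f lambda_f * P_f(c2 | c1)
p_relation = sum(lambdas[f] * feature_probs[f] for f in feature_probs)
print(p_relation)  # 0.28 for the toy numbers above
```

Because each P_f is a proper probability and the weights sum to 1, the interpolated score remains a valid probability, which is what lets the model treat the combined value as a semantic relation probability.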
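The abstract does not name the evaluation metric for the 12 benchmark datasets. Semantic relatedness benchmarks are conventionally scored with Spearman's rank correlation between model scores and human judgements, so a scoring harness might look like the sketch below; the data values are fabricated placeholders, and the paper's actual protocol is in the full text.

```python
from scipy.stats import spearmanr

# Hypothetical model scores vs. human relatedness judgements for a few
# word pairs; real benchmarks contain hundreds of pairs.
human = [9.0, 7.5, 3.2, 1.1]
model = [0.82, 0.70, 0.35, 0.20]

rho, pval = spearmanr(human, model)
print(f"Spearman rho = {rho:.3f} (p = {pval:.3g})")
```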