Important citation identification by exploiting the syntactic and contextual information of citations
Citations are not equally important. Researchers have presented different models and techniques to identify important citations. However, the features used in these studies are relatively limited, so they cannot achieve good recognition performance. This paper proposes a new machine learning framework to di...
Published in: | Scientometrics 2020-12, Vol.125 (3), p.2109-2129 |
---|---|
Main Authors: | Wang, Mingyang; Zhang, Jiaqi; Jiao, Shijia; Zhang, Xiangrong; Zhu, Na; Chen, Guangsheng |
Format: | Article |
Language: | English |
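Subjects: | Algorithms; Citations; Classification; Computer Science; Correlation coefficient; Correlation coefficients; Entropy; Information Storage and Retrieval; Learning algorithms; Library Science; Machine learning; Polarity; Support vector machines |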
cited_by | cdi_FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3 |
---|---|
cites | cdi_FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3 |
container_end_page | 2129 |
container_issue | 3 |
container_start_page | 2109 |
container_title | Scientometrics |
container_volume | 125 |
creator | Wang, Mingyang; Zhang, Jiaqi; Jiao, Shijia; Zhang, Xiangrong; Zhu, Na; Chen, Guangsheng |
description | Citations are not equally important. Researchers have presented different models and techniques to identify important citations. However, the features used in these studies are relatively limited, so they cannot achieve good recognition performance. This paper proposes a new machine learning framework to distinguish important from non-important citations by examining the syntactic and contextual information of citations. Syntactic features reflect the statistical characteristics of citation behavior, such as how frequently and at which positions the cited article is cited in the citing one. Contextual features reflect the semantic characteristics of citations, such as their intent and polarity. Three feature selection algorithms (Pearson correlation coefficient, relief-F, and the entropy weight method) were used to calculate the contribution of each feature to distinguishing different kinds of citations. On this basis, the key features that better identify important citations were selected. Three classifiers (support vector machine, KNN, and random forest) were used to test the classification performance of these key features. The experiment was performed on two annotated benchmark datasets. The results show that the proposed framework achieves better classification performance than contemporary state-of-the-art research. The syntactic and contextual features of citations are of great value in identifying important citations. |
doi_str_mv | 10.1007/s11192-020-03677-1 |
format | article |
publisher | Cham: Springer International Publishing |
rights | Akadémiai Kiadó, Budapest, Hungary 2020 |
orcid | https://orcid.org/0000-0003-0525-6120 |
eissn | 1588-2861 |
tpages | 21 |
fulltext | fulltext |
identifier | ISSN: 0138-9130 |
ispartof | Scientometrics, 2020-12, Vol.125 (3), p.2109-2129 |
issn | 0138-9130 1588-2861 |
language | eng |
recordid | cdi_proquest_journals_2492302599 |
source | Library & Information Science Abstracts (LISA); Springer Nature |
subjects | Algorithms; Citations; Classification; Computer Science; Correlation coefficient; Correlation coefficients; Entropy; Information Storage and Retrieval; Learning algorithms; Library Science; Machine learning; Polarity; Support vector machines |
title | Important citation identification by exploiting the syntactic and contextual information of citations |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T07%3A27%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Important%20citation%20identification%20by%20exploiting%20the%20syntactic%20and%20contextual%20information%20of%20citations&rft.jtitle=Scientometrics&rft.au=Wang,%20Mingyang&rft.date=2020-12-01&rft.volume=125&rft.issue=3&rft.spage=2109&rft.epage=2129&rft.pages=2109-2129&rft.issn=0138-9130&rft.eissn=1588-2861&rft_id=info:doi/10.1007/s11192-020-03677-1&rft_dat=%3Cproquest_cross%3E2492302599%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2492302599&rft_id=info:pmid/&rfr_iscdi=true |
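
The description field above names the Pearson correlation coefficient and the entropy weight method as two of the three feature selection algorithms. This record gives no formulas or data, so the following is only a minimal sketch using the textbook formulations of both methods; it is not the authors' implementation. The feature matrix, the labels, and the function names `pearson` and `entropy_weights` are hypothetical and chosen for illustration, and relief-F is omitted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def entropy_weights(rows):
    """Entropy weight method: one weight per feature column, summing to 1."""
    n = len(rows)
    m = len(rows[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        if hi == lo:
            divergences.append(0.0)  # constant column: no information
            continue
        # Min-max scale, then turn the column into a probability distribution.
        scaled = [(v - lo) / (hi - lo) for v in column]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Normalized Shannon entropy, treating 0 * log(0) as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    if total_divergence == 0:
        return [1.0 / m] * m  # all columns constant: fall back to equal weights
    return [d / total_divergence for d in divergences]


# Hypothetical toy data (not from the paper). Each row is one citation:
# column 0 = how often the cited article is mentioned in the citing one,
# column 1 = an unrelated score. Label 1 marks an important citation.
rows = [[1, 0.5], [1, 0.4], [2, 0.6], [5, 0.5], [6, 0.4], [7, 0.6]]
labels = [0, 0, 0, 1, 1, 1]

correlations = [pearson([row[j] for row in rows], labels) for j in range(2)]
weights = entropy_weights(rows)
print("Pearson correlation with label:", correlations)
print("Entropy weights:", weights)
```

Note that Pearson correlation is supervised (it scores each feature against the label), while the entropy weight method is unsupervised (it scores each feature by how unevenly its values are spread), which is one plausible reason to combine several selection methods as the description reports.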