
Important citation identification by exploiting the syntactic and contextual information of citations

Citations are not equally important. Researchers have presented different models and techniques to identify important citations. However, the features used in these works are relatively limited, so they cannot achieve good recognition performance. This paper proposes a new machine learning framework to di...

Bibliographic Details
Published in: Scientometrics 2020-12, Vol.125 (3), p.2109-2129
Main Authors: Wang, Mingyang, Zhang, Jiaqi, Jiao, Shijia, Zhang, Xiangrong, Zhu, Na, Chen, Guangsheng
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3
cites cdi_FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3
container_end_page 2129
container_issue 3
container_start_page 2109
container_title Scientometrics
container_volume 125
creator Wang, Mingyang
Zhang, Jiaqi
Jiao, Shijia
Zhang, Xiangrong
Zhu, Na
Chen, Guangsheng
description Citations are not equally important. Researchers have presented different models and techniques to identify important citations. However, the features used in these works are relatively limited, so they cannot achieve good recognition performance. This paper proposes a new machine learning framework to distinguish important from non-important citations by examining the syntactic and contextual information of citations. Syntactic features reflect the statistical characteristics of citation behavior, such as how often the cited article is cited and where its citations appear in the citing article. Contextual features reflect the semantic characteristics of citations, such as their intent and polarity. Three feature selection algorithms (Pearson correlation coefficient, Relief-F, and the entropy weight method) were used to calculate the contribution of each feature to distinguishing the two kinds of citations. On this basis, the key features that better identify important citations were selected. Three classifiers (support vector machine, KNN, and random forest) were used to test the classification performance of these key features. The experiment was performed on two annotated benchmark datasets. It showed that the proposed framework achieves better classification performance than contemporary state-of-the-art research. The syntactic and contextual features of citations are of great value in identifying important citations.
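The feature-scoring step named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature names (`cited_frequency`, `in_methods_section`), the toy values, and the labels are hypothetical, and the entropy weight method is shown in one common textbook variant (min-max scaling, with constant columns given zero weight as an assumption). Relief-F and the three classifiers are omitted.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # assumption: a constant column carries no correlation
    return cov / (dev_x * dev_y)


def entropy_weights(columns):
    """Entropy weight method: features whose values are spread less evenly
    across samples (lower entropy) receive a larger weight."""
    n = len(columns[0])
    divergences = []
    for col in columns:
        lo, hi = min(col), max(col)
        if hi == lo:
            divergences.append(0.0)  # assumption: constant column gets zero weight
            continue
        scaled = [(v - lo) / (hi - lo) for v in col]  # min-max scaling to [0, 1]
        total = sum(scaled)
        probs = [v / total for v in scaled]
        # Normalised Shannon entropy in [0, 1]; terms with p == 0 contribute 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    if total_div == 0:
        return [1.0 / len(columns)] * len(columns)
    return [d / total_div for d in divergences]


# Hypothetical toy data: one row per citation, label 1 = important.
features = {
    "cited_frequency": [1, 1, 2, 5, 6, 4],
    "in_methods_section": [0, 0, 1, 1, 1, 0],
}
labels = [0, 0, 0, 1, 1, 1]

for name, column in features.items():
    print(name, "pearson:", round(pearson(column, labels), 3))
print(dict(zip(features, entropy_weights(list(features.values())))))
```

Features would then be ranked by the absolute Pearson score or by entropy weight, and the top-ranked ones passed to a classifier; how the scores from the three methods are combined is not stated in this record.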
doi_str_mv 10.1007/s11192-020-03677-1
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2492302599</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2492302599</sourcerecordid><originalsourceid>FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWKt_wFPAc3SS7Ef2KMWPQsGLnkM2ydaUNlmTFNp_b3RFb56GYZ7nHXgRuqZwSwHau0Qp7RgBBgR407aEnqAZrYUgTDT0FM2AckE6yuEcXaS0gSJxEDNkl7sxxKx8xtpllV3w2Bnrsxucntb-iO1h3AaXnV_j_G5xOvqsdHYaK2-wDj7bQ96rLXZ-CHE3aWH4TUyX6GxQ22SvfuYcvT0-vC6eyerlabm4XxFdcZFJrW1Tc9Opuq96MK0eetZz3XDDuNAV65WCpmspE4UsVwaFGgRT3NSmqQyfo5spd4zhY29Tlpuwj768lKzqGAdWd12h2ETpGFKKdpBjdDsVj5KC_KpTTnXKUqf8rlPSIvFJSgX2axv_ov-xPgGv0Hre</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2492302599</pqid></control><display><type>article</type><title>Important citation identification by exploiting the syntactic and contextual information of citations</title><source>Library &amp; Information Science Abstracts (LISA)</source><source>Springer Nature</source><creator>Wang, Mingyang ; Zhang, Jiaqi ; Jiao, Shijia ; Zhang, Xiangrong ; Zhu, Na ; Chen, Guangsheng</creator><creatorcontrib>Wang, Mingyang ; Zhang, Jiaqi ; Jiao, Shijia ; Zhang, Xiangrong ; Zhu, Na ; Chen, Guangsheng</creatorcontrib><description>Citations are not equally important. Researchers presented different models and techniques to identify important citations. However, the features used in these work are relatively limited, so they cannot achieve good recognition performance. This paper proposed a new machine learning framework to distinguish important and non-important citations by examining the syntactic and contextual information of citations. 
Among them, syntactic features reflect the statistical perspective characteristics brought by citation behavior, such as the cited frequency and citation position of the cited article in the citing ones. Contextual features reflect the semantic content characteristics brought by citations, such as the intent and polarity of citations. Three feature selection algorithms, Pearson correlation coefficient, relief-F and entropy weight method, were used to calculate the contribution of each index on distinguishing different kinds of citations. On this basis, key features that can better identify the important citations were screened out. Three classifiers of support vector machine, KNN and random forest were used to test the classification performance of these key features. The experiment was performed on two annotated benchmark datasets. It showed that the framework proposed in this paper can achieve better classification performance compared with contemporary state-of-the-art research. The syntactic and contextual features of citation are of great value in identifying important citations.</description><identifier>ISSN: 0138-9130</identifier><identifier>EISSN: 1588-2861</identifier><identifier>DOI: 10.1007/s11192-020-03677-1</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><subject>Algorithms ; Citations ; Classification ; Computer Science ; Correlation coefficient ; Correlation coefficients ; Entropy ; Information Storage and Retrieval ; Learning algorithms ; Library Science ; Machine learning ; Polarity ; Support vector machines</subject><ispartof>Scientometrics, 2020-12, Vol.125 (3), p.2109-2129</ispartof><rights>Akadémiai Kiadó, Budapest, Hungary 2020</rights><rights>Akadémiai Kiadó, Budapest, Hungary 
2020.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3</citedby><cites>FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3</cites><orcidid>0000-0003-0525-6120</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902,34112</link.rule.ids></links><search><creatorcontrib>Wang, Mingyang</creatorcontrib><creatorcontrib>Zhang, Jiaqi</creatorcontrib><creatorcontrib>Jiao, Shijia</creatorcontrib><creatorcontrib>Zhang, Xiangrong</creatorcontrib><creatorcontrib>Zhu, Na</creatorcontrib><creatorcontrib>Chen, Guangsheng</creatorcontrib><title>Important citation identification by exploiting the syntactic and contextual information of citations</title><title>Scientometrics</title><addtitle>Scientometrics</addtitle><description>Citations are not equally important. Researchers presented different models and techniques to identify important citations. However, the features used in these work are relatively limited, so they cannot achieve good recognition performance. This paper proposed a new machine learning framework to distinguish important and non-important citations by examining the syntactic and contextual information of citations. Among them, syntactic features reflect the statistical perspective characteristics brought by citation behavior, such as the cited frequency and citation position of the cited article in the citing ones. Contextual features reflect the semantic content characteristics brought by citations, such as the intent and polarity of citations. 
Three feature selection algorithms, Pearson correlation coefficient, relief-F and entropy weight method, were used to calculate the contribution of each index on distinguishing different kinds of citations. On this basis, key features that can better identify the important citations were screened out. Three classifiers of support vector machine, KNN and random forest were used to test the classification performance of these key features. The experiment was performed on two annotated benchmark datasets. It showed that the framework proposed in this paper can achieve better classification performance compared with contemporary state-of-the-art research. The syntactic and contextual features of citation are of great value in identifying important citations.</description><subject>Algorithms</subject><subject>Citations</subject><subject>Classification</subject><subject>Computer Science</subject><subject>Correlation coefficient</subject><subject>Correlation coefficients</subject><subject>Entropy</subject><subject>Information Storage and Retrieval</subject><subject>Learning algorithms</subject><subject>Library Science</subject><subject>Machine learning</subject><subject>Polarity</subject><subject>Support vector machines</subject><issn>0138-9130</issn><issn>1588-2861</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>F2A</sourceid><recordid>eNp9kE1LAzEQhoMoWKt_wFPAc3SS7Ef2KMWPQsGLnkM2ydaUNlmTFNp_b3RFb56GYZ7nHXgRuqZwSwHau0Qp7RgBBgR407aEnqAZrYUgTDT0FM2AckE6yuEcXaS0gSJxEDNkl7sxxKx8xtpllV3w2Bnrsxucntb-iO1h3AaXnV_j_G5xOvqsdHYaK2-wDj7bQ96rLXZ-CHE3aWH4TUyX6GxQ22SvfuYcvT0-vC6eyerlabm4XxFdcZFJrW1Tc9Opuq96MK0eetZz3XDDuNAV65WCpmspE4UsVwaFGgRT3NSmqQyfo5spd4zhY29Tlpuwj768lKzqGAdWd12h2ETpGFKKdpBjdDsVj5KC_KpTTnXKUqf8rlPSIvFJSgX2axv_ov-xPgGv0Hre</recordid><startdate>20201201</startdate><enddate>20201201</enddate><creator>Wang, Mingyang</creator><creator>Zhang, Jiaqi</creator><creator>Jiao, 
Shijia</creator><creator>Zhang, Xiangrong</creator><creator>Zhu, Na</creator><creator>Chen, Guangsheng</creator><general>Springer International Publishing</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope><orcidid>https://orcid.org/0000-0003-0525-6120</orcidid></search><sort><creationdate>20201201</creationdate><title>Important citation identification by exploiting the syntactic and contextual information of citations</title><author>Wang, Mingyang ; Zhang, Jiaqi ; Jiao, Shijia ; Zhang, Xiangrong ; Zhu, Na ; Chen, Guangsheng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Citations</topic><topic>Classification</topic><topic>Computer Science</topic><topic>Correlation coefficient</topic><topic>Correlation coefficients</topic><topic>Entropy</topic><topic>Information Storage and Retrieval</topic><topic>Learning algorithms</topic><topic>Library Science</topic><topic>Machine learning</topic><topic>Polarity</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Mingyang</creatorcontrib><creatorcontrib>Zhang, Jiaqi</creatorcontrib><creatorcontrib>Jiao, Shijia</creatorcontrib><creatorcontrib>Zhang, Xiangrong</creatorcontrib><creatorcontrib>Zhu, Na</creatorcontrib><creatorcontrib>Chen, Guangsheng</creatorcontrib><collection>CrossRef</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><jtitle>Scientometrics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wang, 
Mingyang</au><au>Zhang, Jiaqi</au><au>Jiao, Shijia</au><au>Zhang, Xiangrong</au><au>Zhu, Na</au><au>Chen, Guangsheng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Important citation identification by exploiting the syntactic and contextual information of citations</atitle><jtitle>Scientometrics</jtitle><stitle>Scientometrics</stitle><date>2020-12-01</date><risdate>2020</risdate><volume>125</volume><issue>3</issue><spage>2109</spage><epage>2129</epage><pages>2109-2129</pages><issn>0138-9130</issn><eissn>1588-2861</eissn><abstract>Citations are not equally important. Researchers presented different models and techniques to identify important citations. However, the features used in these work are relatively limited, so they cannot achieve good recognition performance. This paper proposed a new machine learning framework to distinguish important and non-important citations by examining the syntactic and contextual information of citations. Among them, syntactic features reflect the statistical perspective characteristics brought by citation behavior, such as the cited frequency and citation position of the cited article in the citing ones. Contextual features reflect the semantic content characteristics brought by citations, such as the intent and polarity of citations. Three feature selection algorithms, Pearson correlation coefficient, relief-F and entropy weight method, were used to calculate the contribution of each index on distinguishing different kinds of citations. On this basis, key features that can better identify the important citations were screened out. Three classifiers of support vector machine, KNN and random forest were used to test the classification performance of these key features. The experiment was performed on two annotated benchmark datasets. It showed that the framework proposed in this paper can achieve better classification performance compared with contemporary state-of-the-art research. 
The syntactic and contextual features of citation are of great value in identifying important citations.</abstract><cop>Cham</cop><pub>Springer International Publishing</pub><doi>10.1007/s11192-020-03677-1</doi><tpages>21</tpages><orcidid>https://orcid.org/0000-0003-0525-6120</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0138-9130
ispartof Scientometrics, 2020-12, Vol.125 (3), p.2109-2129
issn 0138-9130
1588-2861
language eng
recordid cdi_proquest_journals_2492302599
source Library & Information Science Abstracts (LISA); Springer Nature
subjects Algorithms
Citations
Classification
Computer Science
Correlation coefficient
Correlation coefficients
Entropy
Information Storage and Retrieval
Learning algorithms
Library Science
Machine learning
Polarity
Support vector machines
title Important citation identification by exploiting the syntactic and contextual information of citations
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T07%3A27%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Important%20citation%20identification%20by%20exploiting%20the%20syntactic%20and%20contextual%20information%20of%20citations&rft.jtitle=Scientometrics&rft.au=Wang,%20Mingyang&rft.date=2020-12-01&rft.volume=125&rft.issue=3&rft.spage=2109&rft.epage=2129&rft.pages=2109-2129&rft.issn=0138-9130&rft.eissn=1588-2861&rft_id=info:doi/10.1007/s11192-020-03677-1&rft_dat=%3Cproquest_cross%3E2492302599%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c438t-5ce653d9a5b4b0d7cfb2b3c63d238c42baa06971285ce7cf20b0df82a3d5d64d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2492302599&rft_id=info:pmid/&rfr_iscdi=true