Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse
Published in: | ACM Transactions on Software Engineering and Methodology, 2023-09, Vol. 32 (6), pp. 1-37, Article 147 |
---|---|
Main Authors: | Huang, Qing ; Liao, Dianshu ; Xing, Zhenchang ; Zuo, Zhengkang ; Wang, Changjing ; Xia, Xin |
Format: | Article |
Language: | English |
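Subjects: | Reusability ; Search-based software engineering ; Security and privacy ; Semantics ; Software and its engineering ; Software security engineering |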
cited_by | cdi_FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3 |
---|---|
cites | cdi_FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3 |
container_end_page | 37 |
container_issue | 6 |
container_start_page | 1 |
container_title | ACM transactions on software engineering and methodology |
container_volume | 32 |
creator | Huang, Qing ; Liao, Dianshu ; Xing, Zhenchang ; Zuo, Zhengkang ; Wang, Changjing ; Xia, Xin |
description | Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntactic and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses the function clone as a bridge to gradually deduce the known knowledge into the problem-solving knowledge that can reveal the unknowns. This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and control flow graph annotation. This provides a systematic and coherent approach to knowledge deduction. We then structure all of the knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io and confirm high accuracy of our KG construction steps. In our experiments, our code KG effectively improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG saved novice developers 30 to 380 seconds and improved their vulnerability determination accuracy by 20% to 33% and vulnerability fixing accuracy by 24% to 40% when identifying and fixing vulnerable smart contract functions. |
doi_str_mv | 10.1145/3597206 |
format | article |
publisher | ACM, New York, NY |
date | 2023-09-30 |
short_title | ACM TOSEM |
orcid | 0000-0002-3601-4979 ; 0000-0001-7663-1421 ; 0000-0002-6302-3256 ; 0009-0000-0865-0444 ; 0000-0002-8877-4267 ; 0000-0002-7118-3727 |
status | Peer reviewed ; free to read |
fulltext | fulltext |
identifier | ISSN: 1049-331X |
ispartof | ACM transactions on software engineering and methodology, 2023-09, Vol.32 (6), p.1-37, Article 147 |
issn | 1049-331X ; 1557-7392 (electronic) |
language | eng |
recordid | cdi_crossref_primary_10_1145_3597206 |
source | Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list) |
subjects | Reusability ; Search-based software engineering ; Security and privacy ; Semantics ; Software and its engineering ; Software security engineering |
title | Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T23%3A42%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semantic-Enriched%20Code%20Knowledge%20Graph%20to%20Reveal%20Unknowns%20in%20Smart%20Contract%20Code%20Reuse&rft.jtitle=ACM%20transactions%20on%20software%20engineering%20and%20methodology&rft.au=Huang,%20Qing&rft.date=2023-09-30&rft.volume=32&rft.issue=6&rft.spage=1&rft.epage=37&rft.pages=1-37&rft.artnum=147&rft.issn=1049-331X&rft.eissn=1557-7392&rft_id=info:doi/10.1145/3597206&rft_dat=%3Cacm_cross%3E3597206%3C/acm_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
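The abstract lists co-occurrence probability calculation and function usage frequency accumulation among the five deduction steps, but this record gives no formulas for them. The Python sketch below shows one plausible reading, under loudly stated assumptions: each contract file has already been reduced by clone detection to a set of clone-class IDs, usage frequency is the number of files in which a clone class appears, and co-occurrence probability is the conditional frequency P(g | f) = (files containing both f and g) / (files containing f). The function name `cooccurrence_probabilities` and the set-of-IDs input are hypothetical; the paper's actual definitions may differ.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(files):
    """Hypothetical sketch: estimate P(g | f) over a corpus of contract files.

    `files` is a list of collections of clone-class IDs (an assumed
    representation in which each ID stands for a group of cloned functions).
    Returns (probabilities, usage), where probabilities maps an ordered
    pair (f, g) to the fraction of files containing f that also contain g,
    and usage counts the files in which each clone class appears.
    """
    usage = Counter()        # accumulated usage frequency per clone class
    pair_counts = Counter()  # files in which each ordered pair (f, g) co-occurs
    for functions in files:
        present = set(functions)  # count each clone class once per file
        usage.update(present)
        pair_counts.update(permutations(present, 2))  # ordered pairs, f != g
    probabilities = {
        (f, g): count / usage[f] for (f, g), count in pair_counts.items()
    }
    return probabilities, usage


# Usage example with three toy "files".
probs, usage = cooccurrence_probabilities(
    [{"transfer", "approve"}, {"transfer", "approve", "mint"}, {"transfer"}]
)
print(usage["transfer"], probs[("approve", "transfer")])
```

Counting each clone class once per file keeps the estimate a proportion of files rather than of call sites; a weighting by call counts would be an equally reasonable assumption.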
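The recommendation results are reported partly as NDCG (normalized discounted cumulative gain), a standard ranking metric. As a reminder of how it is computed, here is a minimal Python sketch of NDCG@k using the common log2(rank + 1) discount. It assumes graded relevance scores are available for the recommended list and, as a simplification, builds the ideal ranking from that same list; the relevance grades, cut-off k, and ideal set used in the paper's evaluation are not described in this record.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain of the first k items.

    The gain at 1-based rank i is divided by log2(i + 1).
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances, k):
    """DCG normalized by the DCG of the best possible ordering.

    Simplification: the ideal ordering is the same relevance list sorted
    in descending order. Returns 0.0 when no item is relevant.
    """
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Usage example: a relevant item ranked third instead of second lowers the score.
print(ndcg([1, 0, 1], 3))
```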