Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse

Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences...

Bibliographic Details
Published in: ACM transactions on software engineering and methodology 2023-09, Vol.32 (6), p.1-37, Article 147
Main Authors: Huang, Qing; Liao, Dianshu; Xing, Zhenchang; Zuo, Zhengkang; Wang, Changjing; Xia, Xin
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3
cites cdi_FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3
container_end_page 37
container_issue 6
container_start_page 1
container_title ACM transactions on software engineering and methodology
container_volume 32
creator Huang, Qing
Liao, Dianshu
Xing, Zhenchang
Zuo, Zhengkang
Wang, Changjing
Xia, Xin
description Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntax and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses the function clone as a bridge to gradually deduce the known knowledge into the problem-solving knowledge that can reveal the unknowns. This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and control flow graph annotation. This provides a systematic and coherent approach to knowledge deduction. We then structure all of the knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io and confirm high accuracy of our KG construction steps. In our experiments, our code KG effectively improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG improved time efficiency by 30 to 380 seconds, vulnerability determination accuracy by 20% to 33%, and vulnerability fixing accuracy by 24% to 40% for novice developers who identified and fixed vulnerable smart contract functions.
doi_str_mv 10.1145/3597206
format article
fullrecord <record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3597206</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3597206</sourcerecordid><originalsourceid>FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3</originalsourceid><addsrcrecordid>eNo9kM9LwzAcxYMoOKd495Sbp2p-Ns1RypziQNiceCtp8q2rrulIouJ_b6XT0_vC-7wvj4fQOSVXlAp5zaVWjOQHaEKlVJnimh0ONxE645y-HKOTGN8IoZwwMUHPK-iMT63NZj60dgMOl70D_OD7ry24V8DzYHYbnHq8hE8wW7z274PnI249XnUmpCHgUzA2jcklfEQ4RUeN2UY42-sUrW9nT-Vdtnic35c3i8wwpVKWg6kdLwwtcikZE7WwnBRAi5ppzfJGN1QUpFFcKdcImtdcAOdOM-mkogr4FF2Of23oYwzQVLvQDqW-K0qq3zmq_RwDeTGSxnb_0J_5A8knWNE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><creator>Huang, Qing ; Liao, Dianshu ; Xing, Zhenchang ; Zuo, Zhengkang ; Wang, Changjing ; Xia, Xin</creator><creatorcontrib>Huang, Qing ; Liao, Dianshu ; Xing, Zhenchang ; Zuo, Zhengkang ; Wang, Changjing ; Xia, Xin</creatorcontrib><description>Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntax and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. 
To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses the function clone as a bridge to gradually deduce the known knowledge into the problem-solving knowledge that can reveal the unknowns. This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and control flow graph annotation. This provides a systematic and coherent approach to knowledge deduction. We then structure all of the knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io and confirm high accuracy of our KG construction steps. In our experiments, our code KG effectively improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. 
Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG improved time efficiency by 30 to 380 seconds, vulnerability determination accuracy by 20% to 33%, and vulnerability fixing accuracy by 24% to 40% for novice developers who identified and fixed vulnerable smart contract functions.</description><identifier>ISSN: 1049-331X</identifier><identifier>EISSN: 1557-7392</identifier><identifier>DOI: 10.1145/3597206</identifier><language>eng</language><publisher>New York, NY: ACM</publisher><subject>Reusability ; Search-based software engineering ; Security and privacy ; Semantics ; Software and its engineering ; Software security engineering</subject><ispartof>ACM transactions on software engineering and methodology, 2023-09, Vol.32 (6), p.1-37, Article 147</ispartof><rights>Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 
Request permissions from</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3</citedby><cites>FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3</cites><orcidid>0000-0002-3601-4979 ; 0000-0001-7663-1421 ; 0000-0002-6302-3256 ; 0009-0000-0865-0444 ; 0000-0002-8877-4267 ; 0000-0002-7118-3727</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Huang, Qing</creatorcontrib><creatorcontrib>Liao, Dianshu</creatorcontrib><creatorcontrib>Xing, Zhenchang</creatorcontrib><creatorcontrib>Zuo, Zhengkang</creatorcontrib><creatorcontrib>Wang, Changjing</creatorcontrib><creatorcontrib>Xia, Xin</creatorcontrib><title>Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse</title><title>ACM transactions on software engineering and methodology</title><addtitle>ACM TOSEM</addtitle><description>Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntax and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses the function clone as a bridge to gradually deduce the known knowledge into the problem-solving knowledge that can reveal the unknowns. 
This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and control flow graph annotation. This provides a systematic and coherent approach to knowledge deduction. We then structure all of the knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io and confirm high accuracy of our KG construction steps. In our experiments, our code KG effectively improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG improved time efficiency by 30 to 380 seconds, vulnerability determination accuracy by 20% to 33%, and vulnerability fixing accuracy by 24% to 40% for novice developers who identified and fixed vulnerable smart contract functions.</description><subject>Reusability</subject><subject>Search-based software engineering</subject><subject>Security and privacy</subject><subject>Semantics</subject><subject>Software and its engineering</subject><subject>Software security engineering</subject><issn>1049-331X</issn><issn>1557-7392</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNo9kM9LwzAcxYMoOKd495Sbp2p-Ns1RypziQNiceCtp8q2rrulIouJ_b6XT0_vC-7wvj4fQOSVXlAp5zaVWjOQHaEKlVJnimh0ONxE645y-HKOTGN8IoZwwMUHPK-iMT63NZj60dgMOl70D_OD7ry24V8DzYHYbnHq8hE8wW7z274PnI249XnUmpCHgUzA2jcklfEQ4RUeN2UY42-sUrW9nT-Vdtnic35c3i8wwpVKWg6kdLwwtcikZE7WwnBRAi5ppzfJGN1QUpFFcKdcImtdcAOdOM-mkogr4FF2Of23oYwzQVLvQDqW-K0qq3zmq_RwDeTGSxnb_0J_5A8knWNE</recordid><startdate>20230930</startdate><enddate>20230930</enddate><creator>Huang, 
Qing</creator><creator>Liao, Dianshu</creator><creator>Xing, Zhenchang</creator><creator>Zuo, Zhengkang</creator><creator>Wang, Changjing</creator><creator>Xia, Xin</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-3601-4979</orcidid><orcidid>https://orcid.org/0000-0001-7663-1421</orcidid><orcidid>https://orcid.org/0000-0002-6302-3256</orcidid><orcidid>https://orcid.org/0009-0000-0865-0444</orcidid><orcidid>https://orcid.org/0000-0002-8877-4267</orcidid><orcidid>https://orcid.org/0000-0002-7118-3727</orcidid></search><sort><creationdate>20230930</creationdate><title>Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse</title><author>Huang, Qing ; Liao, Dianshu ; Xing, Zhenchang ; Zuo, Zhengkang ; Wang, Changjing ; Xia, Xin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Reusability</topic><topic>Search-based software engineering</topic><topic>Security and privacy</topic><topic>Semantics</topic><topic>Software and its engineering</topic><topic>Software security engineering</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, Qing</creatorcontrib><creatorcontrib>Liao, Dianshu</creatorcontrib><creatorcontrib>Xing, Zhenchang</creatorcontrib><creatorcontrib>Zuo, Zhengkang</creatorcontrib><creatorcontrib>Wang, Changjing</creatorcontrib><creatorcontrib>Xia, Xin</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on software engineering and methodology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Qing</au><au>Liao, Dianshu</au><au>Xing, Zhenchang</au><au>Zuo, Zhengkang</au><au>Wang, 
Changjing</au><au>Xia, Xin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse</atitle><jtitle>ACM transactions on software engineering and methodology</jtitle><stitle>ACM TOSEM</stitle><date>2023-09-30</date><risdate>2023</risdate><volume>32</volume><issue>6</issue><spage>1</spage><epage>37</epage><pages>1-37</pages><artnum>147</artnum><issn>1049-331X</issn><eissn>1557-7392</eissn><abstract>Programmers who work with smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntax and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses the function clone as a bridge to gradually deduce the known knowledge into the problem-solving knowledge that can reveal the unknowns. This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and control flow graph annotation. This provides a systematic and coherent approach to knowledge deduction. We then structure all of the knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io and confirm high accuracy of our KG construction steps. 
In our experiments, our code KG effectively improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG improved time efficiency by 30 to 380 seconds, vulnerability determination accuracy by 20% to 33%, and vulnerability fixing accuracy by 24% to 40% for novice developers who identified and fixed vulnerable smart contract functions.</abstract><cop>New York, NY</cop><pub>ACM</pub><doi>10.1145/3597206</doi><tpages>37</tpages><orcidid>https://orcid.org/0000-0002-3601-4979</orcidid><orcidid>https://orcid.org/0000-0001-7663-1421</orcidid><orcidid>https://orcid.org/0000-0002-6302-3256</orcidid><orcidid>https://orcid.org/0009-0000-0865-0444</orcidid><orcidid>https://orcid.org/0000-0002-8877-4267</orcidid><orcidid>https://orcid.org/0000-0002-7118-3727</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1049-331X
ispartof ACM transactions on software engineering and methodology, 2023-09, Vol.32 (6), p.1-37, Article 147
issn 1049-331X
1557-7392
language eng
recordid cdi_crossref_primary_10_1145_3597206
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
subjects Reusability
Search-based software engineering
Security and privacy
Semantics
Software and its engineering
Software security engineering
title Semantic-Enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T23%3A42%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semantic-Enriched%20Code%20Knowledge%20Graph%20to%20Reveal%20Unknowns%20in%20Smart%20Contract%20Code%20Reuse&rft.jtitle=ACM%20transactions%20on%20software%20engineering%20and%20methodology&rft.au=Huang,%20Qing&rft.date=2023-09-30&rft.volume=32&rft.issue=6&rft.spage=1&rft.epage=37&rft.pages=1-37&rft.artnum=147&rft.issn=1049-331X&rft.eissn=1557-7392&rft_id=info:doi/10.1145/3597206&rft_dat=%3Cacm_cross%3E3597206%3C/acm_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a277t-6eabd38a18655224b4c308e18b29926f9f1480f7377df416b34e33d925d5717e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true