NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions
Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix, a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. Results show varied ability across models on this novel task. Collectively, the studied LLMs are able to produce plausible fixes for 64.6% of the bugs.
Main Authors: | Fakhoury, Sarah; Chakraborty, Saikat; Musuvathi, Madanlal; Lahiri, Shuvendu K. |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Accuracy; Benchmark testing; Codes; Computer bugs; Java; llm4code; Natural languages; nl2edit; nl2fix; Task analysis |
Online Access: | Request full text |
container_start_page | 410 |
---|---|
container_end_page | 411 |
creator | Fakhoury, Sarah; Chakraborty, Saikat; Musuvathi, Madanlal; Lahiri, Shuvendu K. |
description | Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix, a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. Results show varied ability across models on this novel task. Collectively, the studied LLMs are able to produce plausible fixes for 64.6% of the bugs. |
doi_str_mv | 10.1145/3639478.3643526 |
format | conference_proceeding |
publisher | New York, NY, USA: ACM |
date | 2024-04-14 |
rights | 2024 Copyright held by the owner/author(s) |
orcid | 0000-0002-8486-7749; 0000-0002-4446-4777; 0000-0002-2482-7892; 0000-0002-6889-7171 |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9798400705021 |
ispartof | 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2024, p.410-411 |
issn | 2574-1934 |
language | eng |
recordid | cdi_ieee_primary_10554944 |
source | IEEE Xplore All Conference Series |
subjects | Accuracy; Benchmark testing; Codes; Computer bugs; Java; llm4code; Natural languages; nl2edit; nl2fix; Task analysis |
title | NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T08%3A22%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=NL2Fix:%20Generating%20Functionally%20Correct%20Code%20Edits%20from%20Bug%20Descriptions&rft.btitle=2024%20IEEE/ACM%2046th%20International%20Conference%20on%20Software%20Engineering:%20Companion%20Proceedings%20(ICSE-Companion)&rft.au=Fakhoury,%20Sarah&rft.date=2024-04-14&rft.spage=410&rft.epage=411&rft.pages=410-411&rft.eissn=2574-1934&rft.isbn=9798400705021&rft.coden=IEEPAD&rft_id=info:doi/10.1145/3639478.3643526&rft.eisbn=9798400705021&rft_dat=%3Cacm_CHZPO%3Eacm_books_10_1145_3639478_3643526%3C/acm_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a1626-7cc3e1bd499e2f8190d0772587beb79830f576a148b291bedb3d49c76bb94873%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10554944&rfr_iscdi=true |
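The abstract counts a generated edit as "plausible" when the patched program passes the project's test suite. The paper evaluates Java programs from Defects4J; as a loose illustration of that pass-or-fail criterion only (not the authors' harness, and with hypothetical candidate fixes), here is a minimal Python sketch in which a candidate edit is accepted only if every test passes:

```python
# Minimal sketch of the "plausible fix" criterion: a candidate edit is
# plausible iff the patched program passes all tests. The bug, candidates,
# and tests below are invented for illustration; the paper's actual setup
# runs Defects4J's JUnit suites against Java patches.
from typing import Callable, List, Tuple

def is_plausible_fix(candidate: Callable[[int, int], int],
                     tests: List[Tuple[tuple, int]]) -> bool:
    """Return True iff the candidate passes every (args, expected) test."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True

# Buggy original for the NL description "midpoint divides only hi, not lo+hi".
def midpoint_buggy(lo, hi):
    return lo + hi // 2          # bug: // binds tighter than +

# Two LLM-style candidate edits for that description.
def candidate_a(lo, hi):
    return (lo + hi) // 2        # correct precedence fix

def candidate_b(lo, hi):
    return lo + (hi - lo) / 2    # float division; wrong on odd spans

tests = [((2, 6), 4), ((0, 0), 0), ((1, 2), 1)]

print(is_plausible_fix(midpoint_buggy, tests))  # False
print(is_plausible_fix(candidate_a, tests))     # True
print(is_plausible_fix(candidate_b, tests))     # False
```

Under this criterion the paper's headline number means that, across all studied LLMs and prompts, some candidate passed the full test suite for 64.6% of the 283 bugs.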