
NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions

Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by prese...

Bibliographic Details
Main Authors: Fakhoury, Sarah, Chakraborty, Saikat, Musuvathi, Madanlal, Lahiri, Shuvendu K.
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 411
container_issue
container_start_page 410
container_title
container_volume
creator Fakhoury, Sarah
Chakraborty, Saikat
Musuvathi, Madanlal
Lahiri, Shuvendu K.
description Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix, a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. Results show varied ability across models on this novel task. Collectively, the studied LLMs are able to produce plausible fixes for 64.6% of the bugs.
doi_str_mv 10.1145/3639478.3643526
format conference_proceeding
fullrecord <record><control><sourceid>acm_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10554944</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10554944</ieee_id><sourcerecordid>acm_books_10_1145_3639478_3643526</sourcerecordid><originalsourceid>FETCH-LOGICAL-a1626-7cc3e1bd499e2f8190d0772587beb79830f576a148b291bedb3d49c76bb94873</originalsourceid><addsrcrecordid>eNqNkL1PwzAUxA0Iiap0ZmHIyJLib8dsUPqBVMHS3bKdl8qiTSo7leh_j6tmYmK64Xd3eu8QeiB4SggXz0wyzVU1ZZIzQeUVmmilK46xwgJTco1GVCheEs34zR92hyYpBYeFyA6i1QitPtd0EX5eiiW0EG0f2m2xOLa-D11rd7tTMetiBN9nraGY16FPRRO7ffF23BbvkHwMh7M33aPbxu4STAYdo81ivpmtyvXX8mP2ui4tkVSWynsGxNVca6BNRTSusVJUVMqBy6cy3AglLeGVo5o4qB3LXq-kc5pXio3R46U2AIA5xLC38WRI_ohrzjOeXrD1e-O67jtlZs6zmWE2M8xmXAzQ5MDTPwPsF7NCZuA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions</title><source>IEEE Xplore All Conference Series</source><creator>Fakhoury, Sarah ; Chakraborty, Saikat ; Musuvathi, Madanlal ; Lahiri, Shuvendu K.</creator><creatorcontrib>Fakhoury, Sarah ; Chakraborty, Saikat ; Musuvathi, Madanlal ; Lahiri, Shuvendu K.</creatorcontrib><description>Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix, a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. 
Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. Results show varied ability across models on this novel task. Collectively, the studied LLMs are able to produce plausible fixes for 64.6% of the bugs.</description><identifier>ISBN: 9798400705021</identifier><identifier>EISSN: 2574-1934</identifier><identifier>EISBN: 9798400705021</identifier><identifier>DOI: 10.1145/3639478.3643526</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>New York, NY, USA: ACM</publisher><subject>Accuracy ; Benchmark testing ; Codes ; Computer bugs ; Java ; llm4code ; Natural languages ; nl2edit ; nl2fix ; Task analysis</subject><ispartof>2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2024, p.410-411</ispartof><rights>2024 Copyright held by the owner/author(s)</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0002-8486-7749 ; 0000-0002-4446-4777 ; 0000-0002-2482-7892 ; 0000-0002-6889-7171</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10554944$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,27904,54534,54911</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10554944$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Fakhoury, Sarah</creatorcontrib><creatorcontrib>Chakraborty, Saikat</creatorcontrib><creatorcontrib>Musuvathi, Madanlal</creatorcontrib><creatorcontrib>Lahiri, Shuvendu K.</creatorcontrib><title>NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions</title><title>2024 IEEE/ACM 46th International Conference on Software 
Engineering: Companion Proceedings (ICSE-Companion)</title><addtitle>ICSE-COMPANION</addtitle><description>Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix, a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. Results show varied ability across models on this novel task. 
Collectively, the studied LLMs are able to produce plausible fixes for 64.6% of the bugs.</description><subject>Accuracy</subject><subject>Benchmark testing</subject><subject>Codes</subject><subject>Computer bugs</subject><subject>Java</subject><subject>llm4code</subject><subject>Natural languages</subject><subject>nl2edit</subject><subject>nl2fix</subject><subject>Task analysis</subject><issn>2574-1934</issn><isbn>9798400705021</isbn><isbn>9798400705021</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2024</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNqNkL1PwzAUxA0Iiap0ZmHIyJLib8dsUPqBVMHS3bKdl8qiTSo7leh_j6tmYmK64Xd3eu8QeiB4SggXz0wyzVU1ZZIzQeUVmmilK46xwgJTco1GVCheEs34zR92hyYpBYeFyA6i1QitPtd0EX5eiiW0EG0f2m2xOLa-D11rd7tTMetiBN9nraGY16FPRRO7ffF23BbvkHwMh7M33aPbxu4STAYdo81ivpmtyvXX8mP2ui4tkVSWynsGxNVca6BNRTSusVJUVMqBy6cy3AglLeGVo5o4qB3LXq-kc5pXio3R46U2AIA5xLC38WRI_ohrzjOeXrD1e-O67jtlZs6zmWE2M8xmXAzQ5MDTPwPsF7NCZuA</recordid><startdate>20240414</startdate><enddate>20240414</enddate><creator>Fakhoury, Sarah</creator><creator>Chakraborty, Saikat</creator><creator>Musuvathi, Madanlal</creator><creator>Lahiri, Shuvendu K.</creator><general>ACM</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><orcidid>https://orcid.org/0000-0002-8486-7749</orcidid><orcidid>https://orcid.org/0000-0002-4446-4777</orcidid><orcidid>https://orcid.org/0000-0002-2482-7892</orcidid><orcidid>https://orcid.org/0000-0002-6889-7171</orcidid></search><sort><creationdate>20240414</creationdate><title>NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions</title><author>Fakhoury, Sarah ; Chakraborty, Saikat ; Musuvathi, Madanlal ; Lahiri, Shuvendu 
K.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a1626-7cc3e1bd499e2f8190d0772587beb79830f576a148b291bedb3d49c76bb94873</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Benchmark testing</topic><topic>Codes</topic><topic>Computer bugs</topic><topic>Java</topic><topic>llm4code</topic><topic>Natural languages</topic><topic>nl2edit</topic><topic>nl2fix</topic><topic>Task analysis</topic><toplevel>online_resources</toplevel><creatorcontrib>Fakhoury, Sarah</creatorcontrib><creatorcontrib>Chakraborty, Saikat</creatorcontrib><creatorcontrib>Musuvathi, Madanlal</creatorcontrib><creatorcontrib>Lahiri, Shuvendu K.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fakhoury, Sarah</au><au>Chakraborty, Saikat</au><au>Musuvathi, Madanlal</au><au>Lahiri, Shuvendu K.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions</atitle><btitle>2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)</btitle><stitle>ICSE-COMPANION</stitle><date>2024-04-14</date><risdate>2024</risdate><spage>410</spage><epage>411</epage><pages>410-411</pages><eissn>2574-1934</eissn><isbn>9798400705021</isbn><eisbn>9798400705021</eisbn><coden>IEEPAD</coden><abstract>Despite the notable advancement of Large Language Models for Code 
Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix, a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. Results show varied ability across models on this novel task. Collectively, the studied LLMs are able to produce plausible fixes for 64.6% of the bugs.</abstract><cop>New York, NY, USA</cop><pub>ACM</pub><doi>10.1145/3639478.3643526</doi><tpages>2</tpages><orcidid>https://orcid.org/0000-0002-8486-7749</orcidid><orcidid>https://orcid.org/0000-0002-4446-4777</orcidid><orcidid>https://orcid.org/0000-0002-2482-7892</orcidid><orcidid>https://orcid.org/0000-0002-6889-7171</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 9798400705021
ispartof 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2024, p.410-411
issn 2574-1934
language eng
recordid cdi_ieee_primary_10554944
source IEEE Xplore All Conference Series
subjects Accuracy
Benchmark testing
Codes
Computer bugs
Java
llm4code
Natural languages
nl2edit
nl2fix
Task analysis
title NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T08%3A22%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=NL2Fix:%20Generating%20Functionally%20Correct%20Code%20Edits%20from%20Bug%20Descriptions&rft.btitle=2024%20IEEE/ACM%2046th%20International%20Conference%20on%20Software%20Engineering:%20Companion%20Proceedings%20(ICSE-Companion)&rft.au=Fakhoury,%20Sarah&rft.date=2024-04-14&rft.spage=410&rft.epage=411&rft.pages=410-411&rft.eissn=2574-1934&rft.isbn=9798400705021&rft.coden=IEEPAD&rft_id=info:doi/10.1145/3639478.3643526&rft.eisbn=9798400705021&rft_dat=%3Cacm_CHZPO%3Eacm_books_10_1145_3639478_3643526%3C/acm_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a1626-7cc3e1bd499e2f8190d0772587beb79830f576a148b291bedb3d49c76bb94873%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10554944&rfr_iscdi=true