NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions
Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix, a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. Results show varied ability across models on this novel task. Collectively, the studied LLMs are able to produce plausible fixes for 64.6% of the bugs.
Main Authors: | Fakhoury, Sarah; Chakraborty, Saikat; Musuvathi, Madanlal; Lahiri, Shuvendu K. |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Accuracy; Benchmark testing; Codes; Computer bugs; Java; llm4code; Natural languages; nl2edit; nl2fix; Task analysis |
Online Access: | Request full text |
container_start_page | 410 |
---|---|
container_end_page | 411 |
creator | Fakhoury, Sarah; Chakraborty, Saikat; Musuvathi, Madanlal; Lahiri, Shuvendu K. |
description | Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix, a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. Results show varied ability across models on this novel task. Collectively, the studied LLMs are able to produce plausible fixes for 64.6% of the bugs. |
doi_str_mv | 10.1145/3639478.3643526 |
format | conference_proceeding |
publisher | New York, NY, USA: ACM |
date | 2024-04-14 |
rights | 2024 Copyright held by the owner/author(s) |
orcid | 0000-0002-8486-7749; 0000-0002-4446-4777; 0000-0002-2482-7892; 0000-0002-6889-7171 |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9798400705021 |
ispartof | 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2024, p.410-411 |
issn | 2574-1934 |
language | eng |
recordid | cdi_ieee_primary_10554944 |
source | IEEE Xplore All Conference Series |
subjects | Accuracy; Benchmark testing; Codes; Computer bugs; Java; llm4code; Natural languages; nl2edit; nl2fix; Task analysis |
title | NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T08%3A22%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=NL2Fix:%20Generating%20Functionally%20Correct%20Code%20Edits%20from%20Bug%20Descriptions&rft.btitle=2024%20IEEE/ACM%2046th%20International%20Conference%20on%20Software%20Engineering:%20Companion%20Proceedings%20(ICSE-Companion)&rft.au=Fakhoury,%20Sarah&rft.date=2024-04-14&rft.spage=410&rft.epage=411&rft.pages=410-411&rft.eissn=2574-1934&rft.isbn=9798400705021&rft.coden=IEEPAD&rft_id=info:doi/10.1145/3639478.3643526&rft.eisbn=9798400705021&rft_dat=%3Cacm_CHZPO%3Eacm_books_10_1145_3639478_3643526%3C/acm_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a1626-7cc3e1bd499e2f8190d0772587beb79830f576a148b291bedb3d49c76bb94873%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10554944&rfr_iscdi=true |
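The abstract counts a generated edit as "plausible" when the patched program passes the project's test suite. The paper evaluates Java programs from Defects4J; as a loose illustration of that pass-or-fail criterion only (not the authors' harness, and with hypothetical candidate fixes), here is a minimal Python sketch in which a candidate edit is accepted only if every test passes:

```python
# Minimal sketch of the "plausible fix" criterion: a candidate edit is
# plausible iff the patched program passes all tests. The bug, candidates,
# and tests below are invented for illustration; the paper's actual setup
# runs Defects4J's JUnit suites against Java patches.
from typing import Callable, List, Tuple

def is_plausible_fix(candidate: Callable[[int, int], int],
                     tests: List[Tuple[tuple, int]]) -> bool:
    """Return True iff the candidate passes every (args, expected) test."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True

# Buggy original for the NL description "midpoint divides only hi, not lo+hi".
def midpoint_buggy(lo, hi):
    return lo + hi // 2          # bug: // binds tighter than +

# Two LLM-style candidate edits for that description.
def candidate_a(lo, hi):
    return (lo + hi) // 2        # correct precedence fix

def candidate_b(lo, hi):
    return lo + (hi - lo) / 2    # float division; wrong on odd spans

tests = [((2, 6), 4), ((0, 0), 0), ((1, 2), 1)]

print(is_plausible_fix(midpoint_buggy, tests))  # False
print(is_plausible_fix(candidate_a, tests))     # True
print(is_plausible_fix(candidate_b, tests))     # False
```

Under this criterion the paper's headline number means that, across all studied LLMs and prompts, some candidate passed the full test suite for 64.6% of the 283 bugs.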