Loading…

NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions

Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by prese...

Full description

Saved in:
Bibliographic Details
Main Authors: Fakhoury, Sarah, Chakraborty, Saikat, Musuvathi, Madanlal, Lahiri, Shuvendu K.
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descriptions of intended changes. We address this void by presenting the challenge of translating natural language descriptions of code changes, particularly bug fixes outlined in Issue reports within repositories, into accurate code fixes. To tackle this issue, we introduce Defects4J-Nl2fix, a dataset comprising 283 Java programs from the widely-used Defects4J dataset, augmented with high-level descriptions of bug fixes. Subsequently, we empirically evaluate three state-of-the-art LLMs on this task, exploring the impact of different prompting strategies on their ability to generate functionally correct edits. Results show varied ability across models on this novel task. Collectively, the studied LLMs are able to produce plausible fixes for 64.6% of the bugs.
ISSN:2574-1934
DOI:10.1145/3639478.3643526