Framework for evaluating code generation ability of large language models

Bibliographic Details
Published in:	ETRI journal 2024, Vol.46 (1), p.106-117
Main Authors:	Sangyeop Yeo, Yu-Seung Ma, Sang Cheol Kim, Hyungkook Jun, Taeho Kim
Format:	Article
Language:	Korean

Description
Summary:	Large language models (LLMs) have revolutionized various applications in natural language processing and have shown proficiency in generating program code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass-ratio@n, which measures accuracy at a finer granularity by using the pass rate of test cases. The framework is designed to be fully automatic, handling the repetitive work of generating prompts, running inference, and executing the generated code. A preliminary evaluation focusing on prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the pass-ratio@n metric.
ISSN:	1225-6463, 2233-7326
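
Illustration:	The summary describes pass-ratio@n only as a metric based on the pass rate of test cases and does not give its formula. The sketch below is one plausible reading, not the authors' definition: it assumes the metric is the plain mean, over n generated solutions, of the fraction of test cases each solution passes. The published definition may weight or transform these ratios differently. The function names and the `(passed, total)` input shape are hypothetical.

```python
from typing import Sequence, Tuple


def pass_ratio(passed: int, total: int) -> float:
    """Fraction of test cases that a single generated solution passes."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return passed / total


def pass_ratio_at_n(results: Sequence[Tuple[int, int]]) -> float:
    """Mean per-solution pass ratio over n generated solutions.

    ASSUMPTION: a plain average of the ratios. The summary does not state
    the exact formula, so this is an illustrative reading only.
    """
    if not results:
        raise ValueError("results must contain at least one solution")
    return sum(pass_ratio(p, t) for p, t in results) / len(results)


def strict_pass_rate(results: Sequence[Tuple[int, int]]) -> float:
    """All-or-nothing baseline: share of solutions passing every test."""
    if not results:
        raise ValueError("results must contain at least one solution")
    return sum(1 for p, t in results if p == t) / len(results)


# Three generated solutions for one problem with 10 test cases each.
samples = [(10, 10), (5, 10), (0, 10)]
ratio_score = pass_ratio_at_n(samples)
strict_score = strict_pass_rate(samples)
```

Under this reading, a solution that passes half of the tests contributes partial credit to the score, whereas an all-or-nothing measure counts it the same as a solution that passes none.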