Framework for evaluating code generation ability of large language models

Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass‐ratio@n, which captures the granul...

Bibliographic Details
Published in:	ETRI Journal, 2024, 46(1), pp. 106-117
Main Authors:	Yeo, Sangyeop; Ma, Yu‐Seung; Kim, Sang Cheol; Jun, Hyungkook; Kim, Taeho
Format:	Article
Language:	English
Subjects:	code generation; evaluation metric; large language model; natural language processing; software engineering; electronics/information and communication engineering

cited_by	cdi_FETCH-LOGICAL-c4309-d20da706fad898caa8c646a5cde94f6c85fc2cc1b9295567f582797f233cce823
cites	cdi_FETCH-LOGICAL-c4309-d20da706fad898caa8c646a5cde94f6c85fc2cc1b9295567f582797f233cce823
container_end_page	117
container_issue	1
container_start_page	106
container_title	ETRI journal
container_volume	46
creator	Yeo, Sangyeop ; Ma, Yu‐Seung ; Kim, Sang Cheol ; Jun, Hyungkook ; Kim, Taeho
description	Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass‐ratio@n, which captures the granularity of accuracy according to the pass rate of test cases. The framework is intended to be fully automatic to handle the repetitive work involved in generating prompts, conducting inferences, and executing the generated codes. A preliminary evaluation focusing on the prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the pass‐ratio@n metric.
doi_str_mv	10.4218/etrij.2023-0357
format	article
publisher	Electronics and Telecommunications Research Institute (ETRI)
rights	1225‐6463/$ © 2024 ETRI
orcid	0000-0002-1925-2588 ; 0000-0002-4168-5515 ; 0000-0002-5061-206X
oa	free_for_read
toplevel	peer_reviewed ; online_resources
collection	CrossRef ; DOAJ Directory of Open Access Journals ; Korean Citation Index
date	2024-02
tpages	12
backlink	https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003054740 (Access content in National Research Foundation of Korea (NRF))
fulltext	fulltext
identifier	ISSN: 1225-6463
ispartof	ETRI Journal, 2024, 46(1), pp. 106-117
issn	1225-6463 2233-7326
language	eng
recordid	cdi_nrf_kci_oai_kci_go_kr_ARTI_10402618
source	Alma/SFX Local Collection
subjects	code generation ; evaluation metric ; large language model ; natural language processing ; software engineering ; electronics/information and communication engineering
title	Framework for evaluating code generation ability of large language models
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T23%3A29%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wiley_nrf_k&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Framework%20for%20evaluating%20code%20generation%20ability%20of%20large%20language%20models&rft.jtitle=ETRI%20journal&rft.au=Yeo,%20Sangyeop&rft.date=2024-02&rft.volume=46&rft.issue=1&rft.spage=106&rft.epage=117&rft.pages=106-117&rft.issn=1225-6463&rft.eissn=2233-7326&rft_id=info:doi/10.4218/etrij.2023-0357&rft_dat=%3Cwiley_nrf_k%3EETR212649%3C/wiley_nrf_k%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c4309-d20da706fad898caa8c646a5cde94f6c85fc2cc1b9295567f582797f233cce823%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true