
Comprehensive Evaluation and Insights into the Use of Large Language Models in the Automation of Behavior-Driven Development Acceptance Test Formulation


Bibliographic Details
Published in: arXiv.org, 2024-03
Main Authors: Karpurapu, Shanthi; Myneni, Sravanthy; Nettur, Unnati; Likhit Sagar Gajja; Burke, Dave; Stiehm, Tom; Payne, Jeffery
Format: Article
Language: English
Abstract: Behavior-driven development (BDD) is an Agile testing methodology that fosters collaboration among developers, QA analysts, and stakeholders. In this manuscript, we propose a novel approach to enhance BDD practices by using large language models (LLMs) to automate acceptance test generation. Our study uses zero-shot and few-shot prompts to evaluate LLMs such as GPT-3.5, GPT-4, Llama-2-13B, and PaLM-2. The paper presents a detailed methodology covering the dataset, prompt techniques, LLMs, and the evaluation process. The results demonstrate that GPT-3.5 and GPT-4 generate error-free BDD acceptance tests and perform better overall. The few-shot prompt technique achieves higher accuracy by incorporating examples for in-context learning. Furthermore, the study examines syntax errors, validation accuracy, and a comparative analysis of the LLMs, revealing their effectiveness in enhancing BDD practices. However, our study acknowledges that the proposed approach has limitations. We emphasize that this approach can support collaborative BDD processes and create opportunities for future research into automated BDD acceptance test generation using LLMs.
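The abstract describes zero-shot and few-shot prompting for generating BDD acceptance tests. A minimal sketch of the few-shot idea is shown below: worked user-story/Gherkin pairs are assembled into a prompt that asks an LLM to emit a Given/When/Then scenario for a new story. The example stories, scenarios, helper names, and prompt wording are illustrative assumptions, not the authors' actual prompts or dataset.

```python
# Few-shot prompt assembly for BDD acceptance-test generation (sketch).
# Each example pairs a user story with a hand-written Gherkin scenario;
# the target story is appended last so the model completes the pattern.

FEW_SHOT_EXAMPLES = [
    (
        "As a user, I want to reset my password so that I can regain access.",
        "Scenario: Reset password\n"
        "  Given a registered user on the login page\n"
        "  When the user requests a password reset\n"
        "  Then a reset link is sent to the user's email",
    ),
]

def build_few_shot_prompt(user_story: str) -> str:
    """Assemble a few-shot prompt for Gherkin acceptance-test generation."""
    parts = ["Convert each user story into a Gherkin acceptance test.\n"]
    for story, scenario in FEW_SHOT_EXAMPLES:
        parts.append(f"User story: {story}\nAcceptance test:\n{scenario}\n")
    # The trailing "Acceptance test:" cue is where the LLM's output begins.
    parts.append(f"User story: {user_story}\nAcceptance test:\n")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "As a shopper, I want to remove items from my cart "
    "so that I only pay for what I need."
)
print(prompt)
```

The zero-shot variant evaluated in the paper would simply omit the worked examples, keeping only the instruction and the target user story.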
DOI: 10.48550/arxiv.2403.14965
EISSN: 2331-8422
Source: Publicly Available Content Database
Subjects: Acceptance tests; Artificial intelligence; Automation; Collaboration; Large language models