
Comprehensive Evaluation and Insights into the Use of Large Language Models in the Automation of Behavior-Driven Development Acceptance Test Formulation


Bibliographic Details
Published in: arXiv.org, 2024-03
Main Authors: Karpurapu, Shanthi; Myneni, Sravanthy; Nettur, Unnati; Likhit Sagar Gajja; Burke, Dave; Stiehm, Tom; Payne, Jeffery
Format: Article
Language: English
Abstract: Behavior-driven development (BDD) is an Agile testing methodology that fosters collaboration among developers, QA analysts, and stakeholders. In this manuscript, we propose a novel approach to enhance BDD practices by using large language models (LLMs) to automate acceptance test generation. Our study uses zero-shot and few-shot prompts to evaluate LLMs such as GPT-3.5, GPT-4, Llama-2-13B, and PaLM-2. The paper presents a detailed methodology covering the dataset, prompt techniques, LLMs, and the evaluation process. The results demonstrate that GPT-3.5 and GPT-4 generate error-free BDD acceptance tests and perform better overall. The few-shot prompt technique achieves higher accuracy by incorporating examples for in-context learning. Furthermore, the study examines syntax errors, validation accuracy, and a comparative analysis of the LLMs, revealing their effectiveness in enhancing BDD practices. However, our study acknowledges that the proposed approach has limitations. We emphasize that this approach can support collaborative BDD processes and create opportunities for future research into automated BDD acceptance test generation using LLMs.
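The abstract describes zero-shot and few-shot prompting for generating BDD acceptance tests. A minimal sketch of the few-shot idea is shown below: worked user-story/Gherkin pairs are assembled into a prompt that asks an LLM to emit a Given/When/Then scenario for a new story. The example stories, scenarios, helper names, and prompt wording are illustrative assumptions, not the authors' actual prompts or dataset.

```python
# Few-shot prompt assembly for BDD acceptance-test generation (sketch).
# Each example pairs a user story with a hand-written Gherkin scenario;
# the target story is appended last so the model completes the pattern.

FEW_SHOT_EXAMPLES = [
    (
        "As a user, I want to reset my password so that I can regain access.",
        "Scenario: Reset password\n"
        "  Given a registered user on the login page\n"
        "  When the user requests a password reset\n"
        "  Then a reset link is sent to the user's email",
    ),
]

def build_few_shot_prompt(user_story: str) -> str:
    """Assemble a few-shot prompt for Gherkin acceptance-test generation."""
    parts = ["Convert each user story into a Gherkin acceptance test.\n"]
    for story, scenario in FEW_SHOT_EXAMPLES:
        parts.append(f"User story: {story}\nAcceptance test:\n{scenario}\n")
    # The trailing "Acceptance test:" cue is where the LLM's output begins.
    parts.append(f"User story: {user_story}\nAcceptance test:\n")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "As a shopper, I want to remove items from my cart "
    "so that I only pay for what I need."
)
print(prompt)
```

The zero-shot variant evaluated in the paper would simply omit the worked examples, keeping only the instruction and the target user story.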
DOI: 10.48550/arxiv.2403.14965
EISSN: 2331-8422
Source: Publicly Available Content Database
Subjects: Acceptance tests; Artificial intelligence; Automation; Collaboration; Large language models