
Harnessing large language models to auto-evaluate the student project reports

To address the difficulty of providing timely, well-reasoned feedback on student project reports, this paper proposes a method based on large language models (LLMs) that automatically generates instant feedback evaluations for student project reports. Three LLMs, namely BART (Bidirectional and Auto-Regressive Transformer), CPTB (chatgpt_paraphraser_on_T5_base), and CGP-BLCS (chatgpt-gpt4-prompts-bart-large-cnn-samsum), were fine-tuned to generate instant text feedback for student project reports. The effectiveness of the feedback was evaluated using ROUGE metrics, BERTScore, and human expert evaluations. Experiments showed that the lightweight, fine-tuned BART model, when trained on the larger (80%) split of the dataset, generated effective feedback evaluations. When trained on the smaller (20%) split, both the BART and CPTB models performed unsatisfactorily overall, while the fine-tuned CGP-BLCS model generated feedback evaluations that approached human-level quality. The detailed descriptions of the methods used to generate effective text feedback for student project reports will be useful to AI programmers, researchers, and computer science instructional designers for improving their courses and future research.
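For readers who want a concrete picture of the generation setup, the sketch below shows, under stated assumptions, how one of the three models could be fine-tuned and then used to produce feedback. It is a minimal illustration with Hugging Face transformers, not the authors' code: the checkpoint IDs, the sample report and feedback strings, and the single training step are all placeholders (the paper trains on 80% or 20% splits of its own dataset).

```python
# Minimal sketch: fine-tune a seq2seq LLM to map a project report to feedback.
# All model IDs and data below are illustrative assumptions, not the paper's.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "facebook/bart-base"  # stand-in for the fine-tuned BART; the assumed
# Hub IDs "humarin/chatgpt_paraphraser_on_T5_base" (CPTB) and
# "Kaludi/chatgpt-gpt4-prompts-bart-large-cnn-samsum" (CGP-BLCS) would swap in here.

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One hypothetical (report, expert-feedback) pair standing in for the dataset.
report = "The project implements a course-enrollment REST API with a React front end ..."
target_feedback = "The design section is clear, but the testing plan omits load tests."

inputs = tokenizer(report, return_tensors="pt", truncation=True, max_length=1024)
labels = tokenizer(text_target=target_feedback, return_tensors="pt",
                   truncation=True).input_ids

# A single fine-tuning step; a real run would loop over the 80% (or 20%) train split.
model.train()
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Generate instant feedback for a report.
model.eval()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```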

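The automatic metrics named in the abstract can be computed as in the following sketch, which uses the Hugging Face `evaluate` package (one common implementation of ROUGE and BERTScore). The generated and reference feedback strings are placeholders, not the study's outputs.

```python
# Hedged sketch of the paper's automatic evaluation: ROUGE overlap plus
# BERTScore embedding similarity between generated and reference feedback.
import evaluate

rouge = evaluate.load("rouge")          # ROUGE-1/2/L overlap scores
bertscore = evaluate.load("bertscore")  # contextual-embedding similarity

# Placeholder model output and human reference, for illustration only.
generated = ["The design section is clear, but the testing plan omits load tests."]
reference = ["Good design writeup; however, performance testing is missing."]

print(rouge.compute(predictions=generated, references=reference))
print(bertscore.compute(predictions=generated, references=reference, lang="en"))
```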

Bibliographic Details
Published in: Computers and Education: Artificial Intelligence, 2024-12, Vol. 7, p. 100268, Article 100268
Main Authors: Du, Haoze; Jia, Qinjin; Gehringer, Edward; Wang, Xianfang
Format: Article
Language: English
Subjects: Auto-evaluation; CGP-BLCS; CPTB; Large language models; Student project reports
DOI: 10.1016/j.caeai.2024.100268
ISSN: 2666-920X
Publisher: Elsevier Ltd
Online Access: https://www.sciencedirect.com/science/article/pii/S2666920X24000717