
Digital Ink and Surgical Dreams: Perceptions of Artificial Intelligence–Generated Essays in Residency Applications

Bibliographic Details
Published in: The Journal of Surgical Research, 2024-09, Vol. 301, p. 504-511
Main Authors: Crawford, Loralai M., Hendzlik, Peter, Lam, Justine, Cannon, Lisa M., Qi, Yanjie, DeCaporale-Ryan, Lauren, Wilson, Nicole A.
Format: Article
Language: English
Description
Summary: Large language models like Chat Generative Pre-Trained Transformer (ChatGPT) are increasingly used in academic writing, and faculty may consider the use of artificial intelligence (AI)–generated responses a form of cheating. We sought to determine whether general surgery residency faculty could distinguish AI-generated from human-written responses to a text prompt, hypothesizing that they could not do so reliably. Ten essays were generated from the prompt "Tell us in 1-2 paragraphs why you are considering the University of Rochester for General Surgery residency" (current trainees: n = 5; ChatGPT: n = 5). Ten blinded faculty reviewers rated each essay on a ten-point Likert scale for desire to interview, relevance to general surgery residency, and overall impression, and judged whether it was AI- or human-generated; scores and identification error rates were compared between the groups. There were no differences between groups in percent total points (ChatGPT 66.0 ± 13.5% vs. human 70.0 ± 23.0%, P = 0.508) or identification error rates (ChatGPT 40.0 ± 35.0% vs. human 20.0 ± 30.0%, P = 0.175). All but one essay was misidentified by at least two reviewers. Essays believed to be human-generated received higher overall impression scores (area under the curve: 0.82 ± 0.04, P
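The group comparisons and ROC analysis described above can be illustrated with a short statistical sketch. The abstract does not name the specific tests used, so the Mann-Whitney U test and the ROC AUC call below are assumptions, and all input arrays are hypothetical placeholders rather than study data (Python, using SciPy and scikit-learn):

    import numpy as np
    from scipy.stats import mannwhitneyu
    from sklearn.metrics import roc_auc_score

    # Hypothetical percent-total-point scores for the five essays per group.
    chatgpt = np.array([66.0, 52.0, 70.0, 81.0, 61.0])
    human = np.array([70.0, 93.0, 48.0, 72.0, 67.0])

    # Two-sided nonparametric group comparison (test choice is an assumption;
    # the abstract reports only P values, not which test was used).
    stat, p = mannwhitneyu(chatgpt, human, alternative="two-sided")
    print(f"Mann-Whitney U = {stat:.1f}, P = {p:.3f}")

    # ROC AUC: how well overall impression scores separate essays that
    # reviewers labeled human-written (1) from those labeled AI-generated (0).
    labeled_human = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])  # placeholder labels
    impression = np.array([8, 9, 4, 7, 5, 3, 8, 4, 9, 7])     # placeholder scores
    print(f"AUC = {roc_auc_score(labeled_human, impression):.2f}")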
ISSN: 0022-4804
1095-8673
DOI: 10.1016/j.jss.2024.06.020