Assessing ChatGPT vs. Standard Medical Resources for Endoscopic Sleeve Gastroplasty Education: A Medical Professional Evaluation Study
Published in: Obesity Surgery, 2024-07, Vol. 34 (7), p. 2718-2724
Main Authors:
Format: Article
Language: English
Summary:

Background and Aims
The Chat Generative Pre-Trained Transformer (ChatGPT) represents a significant advancement in artificial intelligence (AI) chatbot technology. While ChatGPT offers promising capabilities, concerns remain about its reliability and accuracy. This study aims to evaluate ChatGPT’s responses to patients’ frequently asked questions about Endoscopic Sleeve Gastroplasty (ESG).
Methods
Expert gastroenterologists and bariatric surgeons with experience in ESG were invited to evaluate ChatGPT-generated answers to eight ESG-related questions, alongside answers sourced from hospital websites. The evaluation criteria included ease of understanding, scientific accuracy, and overall answer satisfaction. The evaluators were also asked to discern whether each response was AI-generated.
Results
Twelve medical professionals with expertise in ESG participated, 83.3% of whom had experience performing the procedure independently. The entire cohort possessed substantial knowledge about ESG. ChatGPT's utility among participants, rated on a scale of one to five, averaged 2.75. The raters demonstrated a 54% accuracy rate in distinguishing AI-generated responses, with a sensitivity of 39% and specificity of 60%, resulting in an average of 17.6 correct identifications out of a possible 31. Overall, there were no significant differences between AI-generated and non-AI responses in terms of scientific accuracy, understandability, and satisfaction, with one notable exception. For the question defining ESG, the AI-generated definition scored higher in scientific accuracy (4.33 vs. 3.61, p = 0.007) and satisfaction (4.33 vs. 3.58, p = 0.009) compared to the non-AI versions.
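Note: the abstract reports rater performance as accuracy, sensitivity, and specificity without defining these terms. Under the standard definitions, and assuming "AI-generated" is treated as the positive class (the abstract does not state this explicitly), with TP, FP, TN, and FN denoting true/false positive/negative counts:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}.$$

Read this way, a sensitivity of 39% would mean raters correctly flagged 39% of the AI-generated answers, while a specificity of 60% would mean they correctly recognized 60% of the non-AI answers as human-sourced.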
Conclusions
This study underscores ChatGPT’s efficacy in providing medical information on ESG, demonstrating its comparability to traditional sources in scientific accuracy.
Graphical Abstract
ISSN: 0960-8923 (print), 1708-0428 (electronic)
DOI: 10.1007/s11695-024-07283-5