Artificial intelligence in dental education: ChatGPT's performance on the periodontic in‐service examination
Published in: Journal of periodontology (1970), 2024-07, Vol. 95 (7), p. 682-687
Format: Article
Language: English
Summary:

Background
ChatGPT's (Chat Generative Pre‐Trained Transformer) remarkable capacity to generate human‐like output makes it an appealing learning tool for healthcare students worldwide. Nevertheless, the chatbot's responses may contain inaccuracies, posing a significant risk of misinformation. ChatGPT's capabilities should be examined across healthcare education, including dentistry and its specialties, to understand the potential for misinformation associated with the chatbot's use as a learning tool. Our investigation aims to explore ChatGPT's foundation of knowledge in the field of periodontology by evaluating the chatbot's performance on questions obtained from an in‐service examination administered by the American Academy of Periodontology (AAP).
Methods
ChatGPT3.5 and ChatGPT4 were evaluated on 311 multiple‐choice questions obtained from the 2023 in‐service examination administered by the AAP. The dataset of in‐service examination questions was accessed through Nova Southeastern University's Department of Periodontology. Questions containing an image were excluded because ChatGPT does not accept image inputs.
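The abstract does not describe how questions were submitted to the chatbots. For illustration only, the sketch below shows one way a single multiple‐choice question could be posed through the OpenAI Python client; the model names, prompt wording, and sample question are assumptions for this sketch, not material from the study.

```python
# Illustrative sketch only: one way to pose a multiple-choice question
# to a chat model via the OpenAI Python client (v1.x). The study's actual
# querying procedure is not specified in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder question in AAP in-service style; NOT from the examination.
question = (
    "Which of the following is the primary etiologic factor in periodontitis?\n"
    "A. Dental plaque biofilm\n"
    "B. Occlusal trauma\n"
    "C. Fluoride exposure\n"
    "D. Enamel hypoplasia"
)

response = client.chat.completions.create(
    model="gpt-4",  # or "gpt-3.5-turbo" for the older model
    messages=[{"role": "user",
               "content": question + "\n\nAnswer with a single letter."}],
)
print(response.choices[0].message.content)
```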
Results
ChatGPT3.5 and ChatGPT4 answered 57.9% and 73.6% of in‐service questions correctly on the 2023 Periodontics In‐Service Written Examination, respectively. A two‐tailed t test was used to compare independent sample means, and sample proportions were compared using a two‐tailed χ² test. A p value below 0.05 was considered statistically significant.
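For readers who want to reproduce the proportion comparison, below is a minimal Python sketch of a two‐tailed χ² test on a 2×2 contingency table. The correct‐answer counts are approximated here from the reported percentages of 311 questions; the exact counts and the authors' statistical software are not given in the abstract.

```python
# A minimal sketch (not the authors' code) of the two-tailed chi-squared
# comparison of correct-answer proportions reported in the abstract.
# Counts approximated from the reported percentages of 311 questions:
# 57.9% of 311 ~ 180 correct (ChatGPT3.5), 73.6% of 311 ~ 229 (ChatGPT4).
from scipy.stats import chi2_contingency

n = 311
correct_35, correct_4 = 180, 229            # approximate counts
table = [[correct_35, n - correct_35],      # rows: model; cols: correct, incorrect
         [correct_4,  n - correct_4]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")    # compare p against the 0.05 threshold
```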
Conclusion
While ChatGPT4 showed higher proficiency than ChatGPT3.5, both models leave considerable room for misinformation in their responses relating to periodontology. The findings of the study encourage residents to scrutinize periodontic information generated by ChatGPT to account for the chatbot's current limitations.
ISSN: 0022-3492, 1943-3670
DOI: 10.1002/JPER.23-0514