Loading…

Evaluation of the accuracy and readability of ChatGPT-4 and Google Gemini in providing information on retinal detachment: a multicenter expert comparative study

Large language models (LLMs) such as ChatGPT-4 and Google Gemini show potential for patient health education, but concerns about their accuracy require careful evaluation. This study evaluates the readability and accuracy of ChatGPT-4 and Google Gemini in answering questions about retinal detachment...

Full description

Saved in:

Bibliographic Details
Published in:	International journal of retina and vitreous 2024-09, Vol.10 (1), p.61-11, Article 61
Main Authors:	Strzalkowski, Piotr, Strzalkowska, Alicja, Chhablani, Jay, Pfau, Kristina, Errera, Marie-Hélène, Roth, Mathias, Schaub, Friederike, Bechrakis, Nikolaos E, Hoerauf, Hans, Reiter, Constantin, Schuster, Alexander K, Geerling, Gerd, Guthoff, Rainer
Format:	Article
Language:	English
Subjects:	Artificial intelligence ChatGPT-4 Google Gemini Language learning models Patient education Retinal detachment
Citations:	Items that this one cites
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Large language models (LLMs) such as ChatGPT-4 and Google Gemini show potential for patient health education, but concerns about their accuracy require careful evaluation. This study evaluates the readability and accuracy of ChatGPT-4 and Google Gemini in answering questions about retinal detachment. Comparative study analyzing responses from ChatGPT-4 and Google Gemini to 13 retinal detachment questions, categorized by difficulty levels (D1, D2, D3). Masked responses were reviewed by ten vitreoretinal specialists and rated on correctness, errors, thematic accuracy, coherence, and overall quality grading. Analysis included Flesch Readability Ease Score, word and sentence counts. Both Artificial Intelligence tools required college-level understanding for all difficulty levels. Google Gemini was easier to understand (p = 0.03), while ChatGPT-4 provided more correct answers for the more difficult questions (p = 0.0005) with fewer serious errors. ChatGPT-4 scored highest on most challenging questions, showing superior thematic accuracy (p = 0.003). ChatGPT-4 outperformed Google Gemini in 8 of 13 questions, with higher overall quality grades in the easiest (p = 0.03) and hardest levels (p = 0.0002), showing a lower grade as question difficulty increased. ChatGPT-4 and Google Gemini effectively address queries about retinal detachment, offering mostly accurate answers with few critical errors, though patients require higher education for comprehension. The implementation of AI tools may contribute to improving medical care by providing accurate and relevant healthcare information quickly.
ISSN:	2056-9920 2056-9920
DOI:	10.1186/s40942-024-00579-9