
B - 113 Assessing the Neuropsychology Information Base of Large Language Models

Bibliographic Details
Published in: Archives of Clinical Neuropsychology, 2024-10, Vol. 39 (7), p. 1214-1215
Main Authors: Kronenberger, Oscar, Bullinger, Leah, Kaser, Alyssa N., Cullum, Munro C., Schaffert, Jeffrey, Harder, Lana, Lacritz, Laura
Format: Article
Language: English
Summary: Abstract
Objective: Research has demonstrated that Large Language Models (LLMs) can obtain passing scores on medical board-certification examinations and have made substantial improvements in recent years (e.g., ChatGPT-4 and ChatGPT-3.5 demonstrating an accuracy of 83.4% and 73.4%, respectively, on neurosurgical practice written board-certification questions). To date, the extent of LLMs’ neuropsychology domain information has not been investigated. This study is an initial exploration of ChatGPT-3.5, ChatGPT-4, and Gemini’s performance on mock clinical neuropsychology written board-certification examination questions.
Methods: Six hundred practice examination questions were obtained from the BRAIN American Academy of Clinical Neuropsychology (AACN) website. Data for specific question domains and pediatric subclassification were available for 300 items. Using an a priori prompting strategy, the questions were input into ChatGPT-3.5, ChatGPT-4, and Gemini. Responses were scored based on BRAIN AACN answer keys. Chi-squared tests assessed LLMs’ performance overall and within domains, and significance was set at p = 0.002 using Bonferroni correction.
Results: Across all six hundred items, ChatGPT-4 had superior accuracy (74%) to ChatGPT-3.5 (62.5%) and Gemini (52.7%; p’s
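
For illustration, the analysis described in the Methods can be sketched as a chi-squared test on correct/incorrect counts, evaluated against a Bonferroni-adjusted significance threshold. The Python sketch below is not the authors' analysis code; the assumed 25-comparison correction (0.05 / 25 = 0.002) and the helper name compare_accuracy are illustrative assumptions only.

    # Illustrative sketch only: chi-squared comparison of two LLMs' accuracy
    # on the same 600-item set, with a Bonferroni-adjusted alpha.
    from scipy.stats import chi2_contingency

    N_ITEMS = 600          # items answered by each model, per the abstract
    ALPHA = 0.05 / 25      # assumed correction yielding the reported p = 0.002 threshold

    def compare_accuracy(correct_a, correct_b, n=N_ITEMS):
        """Chi-squared test on a 2x2 table of correct/incorrect counts for two models."""
        table = [[correct_a, n - correct_a],   # model A: correct, incorrect
                 [correct_b, n - correct_b]]   # model B: correct, incorrect
        chi2, p, dof, _ = chi2_contingency(table)
        return chi2, p, p < ALPHA

    # Reported overall accuracies: ChatGPT-4 at 74% and ChatGPT-3.5 at 62.5% of 600 items.
    chi2, p, significant = compare_accuracy(round(0.74 * N_ITEMS), round(0.625 * N_ITEMS))
    print(f"chi2 = {chi2:.2f}, p = {p:.4g}, significant at corrected alpha: {significant}")

Under this setup, a pairwise difference is reported as significant only if its p-value falls below the corrected threshold, which is how the abstract's p = 0.002 criterion would be applied.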
ISSN: 1873-5843
DOI: 10.1093/arclin/acae067.274