Evaluating the Adherence of Large Language Models to Surgical Guidelines: A Comparative Analysis of Chatbot Recommendations and North American Spine Society (NASS) Coverage Criteria

Bibliographic Details
Published in:Curēus (Palo Alto, CA), 2024-09, Vol.16 (9), p.e68521
Main Authors: Sarikonda, Advith; Isch, Emily; Self, Mitchell; Sambangi, Abhijeet; Carreras, Angeleah; Sivaganesan, Ahilan; Harrop, Jim; Jallo, Jack
Format: Article
Language:English
Description
Summary:
Background: There has been a significant increase in cervical fusion procedures, both anterior and posterior, across the United States. Despite this upward trend, limited research exists on adherence to evidence-based medicine (EBM) guidelines for cervical fusion, highlighting a gap between recommended practices and surgeon preferences. Additionally, patients are increasingly utilizing large language models (LLMs) to aid in decision-making.
Methodology: This observational study evaluated the capacity of four LLMs, namely, Bard, BingAI, ChatGPT-3.5, and ChatGPT-4, to adhere to EBM guidelines, specifically the 2023 North American Spine Society (NASS) cervical fusion guidelines. Ten clinical vignettes were created based on NASS recommendations to determine when fusion was indicated. This novel approach assessed LLM performance in a clinical decision-making context without requiring institutional review board approval, as no human subjects were involved.
Results: No LLM achieved complete concordance with NASS guidelines, though ChatGPT-4 and Bing Chat exhibited the highest adherence at 60%. Discrepancies were notably observed in scenarios involving head-drop syndrome and pseudoarthrosis, where all LLMs failed to align with NASS recommendations. Additionally, only 25% of LLMs agreed with NASS guidelines for fusion in cases of cervical radiculopathy and as an adjunct to facet cyst resection.
Conclusions: The study underscores the need for improved LLM training on clinical guidelines and emphasizes the importance of considering the nuances of individual patient cases. While LLMs hold promise for enhancing guideline adherence in cervical fusion decision-making, their current performance indicates a need for further refinement and integration with clinical expertise to ensure optimal patient care. This study contributes to understanding the role of AI in healthcare, advocating for a balanced approach that leverages technological advancements while acknowledging the complexities of surgical decision-making.
ISSN:2168-8184
DOI:10.7759/cureus.68521