
Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison

Objective: Large language models like chat generative pre-trained transformer (ChatGPT) have found success in various sectors, but their application in the medical field remains limited. This study aimed to assess the feasibility of using ChatGPT to provide accurate medical information to patients,...

Bibliographic Details
Published in: Neurospine 2024, 21(1), pp.149-158
Main Authors: Mejia, Mateo Restrepo; Arroyave, Juan Sebastian; Saturno, Michael; Ndjonko, Laura Chelsea Mazudie; Zaidat, Bashar; Rajjoub, Rami; Ahmed, Wasil; Zapolsky, Ivan; Cho, Samuel K.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c369t-8c13004a7c0bb792363e062c6c6a6b997de2b4456da39ae5580dec3aeedfb1ce3
cites cdi_FETCH-LOGICAL-c369t-8c13004a7c0bb792363e062c6c6a6b997de2b4456da39ae5580dec3aeedfb1ce3
container_end_page 158
container_issue 1
container_start_page 149
container_title Neurospine
container_volume 21
creator Mejia, Mateo Restrepo
Arroyave, Juan Sebastian
Saturno, Michael
Ndjonko, Laura Chelsea Mazudie
Zaidat, Bashar
Rajjoub, Rami
Ahmed, Wasil
Zapolsky, Ivan
Cho, Samuel K.
description Objective: Large language models like chat generative pre-trained transformer (ChatGPT) have found success in various sectors, but their application in the medical field remains limited. This study aimed to assess the feasibility of using ChatGPT to provide accurate medical information to patients, specifically evaluating how well ChatGPT versions 3.5 and 4 aligned with the 2012 North American Spine Society (NASS) guidelines for lumbar disk herniation with radiculopathy. Methods: ChatGPT's responses to questions based on the NASS guidelines were analyzed for accuracy. Three new categories—overconclusiveness, supplementary information, and incompleteness—were introduced to deepen the analysis. Overconclusiveness referred to recommendations not mentioned in the NASS guidelines, supplementary information denoted additional relevant details, and incompleteness indicated omitted crucial information from the NASS guidelines. Results: Out of 29 clinical guidelines evaluated, ChatGPT-3.5 demonstrated accuracy in 15 responses (52%), while ChatGPT-4 achieved accuracy in 17 responses (59%). ChatGPT-3.5 was overconclusive in 14 responses (48%), while ChatGPT-4 exhibited overconclusiveness in 13 responses (45%). Additionally, ChatGPT-3.5 provided supplementary information in 24 responses (83%), and ChatGPT-4 provided supplemental information in 27 responses (93%). In terms of incompleteness, ChatGPT-3.5 displayed this in 11 responses (38%), while ChatGPT-4 showed incompleteness in 8 responses (23%). Conclusion: ChatGPT shows promise for clinical decision-making, but both patients and healthcare providers should exercise caution to ensure safety and quality of care. While these results are encouraging, further research is necessary to validate the use of large language models in clinical settings.
doi_str_mv 10.14245/ns.2347052.526
format article
fullrecord <record><control><sourceid>nrf_doaj_</sourceid><recordid>TN_cdi_nrf_kci_oai_kci_go_kr_ARTI_10418360</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_e903d68c517f457b9e3c4967b7e6f74f</doaj_id><sourcerecordid>oai_kci_go_kr_ARTI_10418360</sourcerecordid><originalsourceid>FETCH-LOGICAL-c369t-8c13004a7c0bb792363e062c6c6a6b997de2b4456da39ae5580dec3aeedfb1ce3</originalsourceid><addsrcrecordid>eNpVkl9r2zAUxc3YWEvW573qeZBUsv5ZexnB29JA2EaTskchy9eJVlsysj3It9lHneqUQp-OdHTPj8vVzbKPBK8Iyxm_9cMqp0xinq94Lt5k1zkvxFJwRd6-nAt6ld0Mg6swY5IzSsn77IoWuSKSievs38MAKDSoPJlx8-uAmhDRVxghds47f0Rlm9SaFhlfo_0Uj_PlEMGMHfjxKbqbusqklBssuoPonRld8Oi3G0_o3tTOTm3ozXg6f0Zr9CPEZK87iAnk0b53HtA-WAfjGW0mV0P75JSh6010Q_AfsneNaQe4edZF9vD926G8W-5-brblere0VKhxWVhCMWZGWlxVUuVUUMAit8IKIyqlZA15xRgXtaHKAOcFrsFSA1A3FbFAF9mnC9fHRj9ap4Nxsx6Dfox6fX_YaoIZKajAqXh7Ka6D-aP76DoTz3NiNkI8ahNHZ1vQoDCtRWE5kQ3jslJALVNCVhJEI1mTWF8urH6qOqhtmmo07Svo6xfvTqmpv6kbpXKRfnSR3V4INoZhiNC8hAnW86ZoP-jnTdFpU-h_eOWyYQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison</title><source>PubMed Central</source><creator>Mejia, Mateo Restrepo ; Arroyave, Juan Sebastian ; Saturno, Michael ; Ndjonko, Laura Chelsea Mazudie ; Zaidat, Bashar ; Rajjoub, Rami ; Ahmed, Wasil ; Zapolsky, Ivan ; Cho, Samuel K.</creator><creatorcontrib>Mejia, Mateo Restrepo ; Arroyave, Juan Sebastian ; Saturno, Michael ; Ndjonko, Laura Chelsea Mazudie ; Zaidat, Bashar ; Rajjoub, Rami ; Ahmed, Wasil ; Zapolsky, Ivan ; Cho, Samuel K.</creatorcontrib><description>Objective: Large language models like chat generative pre-trained transformer (ChatGPT) have found success in various sectors, but their application in the medical field remains limited. 
This study aimed to assess the feasibility of using ChatGPT to provide accurate medical information to patients, specifically evaluating how well ChatGPT versions 3.5 and 4 aligned with the 2012 North American Spine Society (NASS) guidelines for lumbar disk herniation with radiculopathy. Methods: ChatGPT's responses to questions based on the NASS guidelines were analyzed for accuracy. Three new categories—overconclusiveness, supplementary information, and incompleteness—were introduced to deepen the analysis. Overconclusiveness referred to recommendations not mentioned in the NASS guidelines, supplementary information denoted additional relevant details, and incompleteness indicated omitted crucial information from the NASS guidelines. Results: Out of 29 clinical guidelines evaluated, ChatGPT-3.5 demonstrated accuracy in 15 responses (52%), while ChatGPT-4 achieved accuracy in 17 responses (59%). ChatGPT-3.5 was overconclusive in 14 responses (48%), while ChatGPT-4 exhibited overconclusiveness in 13 responses (45%). Additionally, ChatGPT-3.5 provided supplementary information in 24 responses (83%), and ChatGPT-4 provided supplemental information in 27 responses (93%). In terms of incompleteness, ChatGPT-3.5 displayed this in 11 responses (38%), while ChatGPT-4 showed incompleteness in 8 responses (23%). Conclusion: ChatGPT shows promise for clinical decision-making, but both patients and healthcare providers should exercise caution to ensure safety and quality of care.
While these results are encouraging, further research is necessary to validate the use of large language models in clinical settings.</description><identifier>ISSN: 2586-6583</identifier><identifier>EISSN: 2586-6591</identifier><identifier>DOI: 10.14245/ns.2347052.526</identifier><identifier>PMID: 38291746</identifier><language>eng</language><publisher>Korean Spinal Neurosurgery Society</publisher><subject>artificial intelligence ; chatgpt ; lumbar disk herniation with radiculopathy ; north american spine society guidelines ; Original ; qualitative study ; 신경외과학</subject><ispartof>Neurospine, 2024, 21(1), , pp.149-158</ispartof><rights>Copyright © 2024 by the Korean Spinal Neurosurgery Society 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c369t-8c13004a7c0bb792363e062c6c6a6b997de2b4456da39ae5580dec3aeedfb1ce3</citedby><cites>FETCH-LOGICAL-c369t-8c13004a7c0bb792363e062c6c6a6b997de2b4456da39ae5580dec3aeedfb1ce3</cites><orcidid>0009-0003-0457-3308 ; 0000-0002-3153-4967 ; 0000-0001-7511-2486 ; 0009-0005-2990-7874 ; 0009-0003-9480-0657 ; 0009-0001-0904-1891 ; 0000-0002-8823-720X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10992643/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10992643/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,27923,27924,53790,53792</link.rule.ids><backlink>$$Uhttps://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003067530$$DAccess content in National Research Foundation of Korea (NRF)$$Hfree_for_read</backlink></links><search><creatorcontrib>Mejia, Mateo 
Restrepo</creatorcontrib><creatorcontrib>Arroyave, Juan Sebastian</creatorcontrib><creatorcontrib>Saturno, Michael</creatorcontrib><creatorcontrib>Ndjonko, Laura Chelsea Mazudie</creatorcontrib><creatorcontrib>Zaidat, Bashar</creatorcontrib><creatorcontrib>Rajjoub, Rami</creatorcontrib><creatorcontrib>Ahmed, Wasil</creatorcontrib><creatorcontrib>Zapolsky, Ivan</creatorcontrib><creatorcontrib>Cho, Samuel K.</creatorcontrib><title>Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison</title><title>Neurospine</title><description>Objective: Large language models like chat generative pre-trained transformer (ChatGPT) have found success in various sectors, but their application in the medical field remains limited. This study aimed to assess the feasibility of using ChatGPT to provide accurate medical information to patients, specifically evaluating how well ChatGPT versions 3.5 and 4 aligned with the 2012 North American Spine Society (NASS) guidelines for lumbar disk herniation with radiculopathy.Methods: ChatGPT's responses to questions based on the NASS guidelines were analyzed for accuracy. Three new categories—overconclusiveness, supplementary information, and incompleteness—were introduced to deepen the analysis. Overconclusiveness referred to recommendations not mentioned in the NASS guidelines, supplementary information denoted additional relevant details, and incompleteness indicated omitted crucial information from the NASS guidelines.Results: Out of 29 clinical guidelines evaluated, ChatGPT-3.5 demonstrated accuracy in 15 responses (52%), while ChatGPT-4 achieved accuracy in 17 responses (59%). ChatGPT-3.5 was overconclusive in 14 responses (48%), while ChatGPT-4 exhibited overconclusiveness in 13 responses (45%). 
Additionally, ChatGPT-3.5 provided supplementary information in 24 responses (83%), and ChatGPT-4 provided supplemental information in 27 responses (93%). In terms of incompleteness, ChatGPT-3.5 displayed this in 11 responses (38%), while ChatGPT-4 showed incompleteness in 8 responses (23%).Conclusion: ChatGPT shows promise for clinical decision-making, but both patients and healthcare providers should exercise caution to ensure safety and quality of care. While these results are encouraging, further research is necessary to validate the use of large language models in clinical settings.</description><subject>artificial intelligence</subject><subject>chatgpt</subject><subject>lumbar disk herniation with radiculopathy</subject><subject>north american spine society guidelines</subject><subject>Original</subject><subject>qualitative study</subject><subject>신경외과학</subject><issn>2586-6583</issn><issn>2586-6591</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNpVkl9r2zAUxc3YWEvW573qeZBUsv5ZexnB29JA2EaTskchy9eJVlsysj3It9lHneqUQp-OdHTPj8vVzbKPBK8Iyxm_9cMqp0xinq94Lt5k1zkvxFJwRd6-nAt6ld0Mg6swY5IzSsn77IoWuSKSievs38MAKDSoPJlx8-uAmhDRVxghds47f0Rlm9SaFhlfo_0Uj_PlEMGMHfjxKbqbusqklBssuoPonRld8Oi3G0_o3tTOTm3ozXg6f0Zr9CPEZK87iAnk0b53HtA-WAfjGW0mV0P75JSh6010Q_AfsneNaQe4edZF9vD926G8W-5-brblere0VKhxWVhCMWZGWlxVUuVUUMAit8IKIyqlZA15xRgXtaHKAOcFrsFSA1A3FbFAF9mnC9fHRj9ap4Nxsx6Dfox6fX_YaoIZKajAqXh7Ka6D-aP76DoTz3NiNkI8ahNHZ1vQoDCtRWE5kQ3jslJALVNCVhJEI1mTWF8urH6qOqhtmmo07Svo6xfvTqmpv6kbpXKRfnSR3V4INoZhiNC8hAnW86ZoP-jnTdFpU-h_eOWyYQ</recordid><startdate>20240301</startdate><enddate>20240301</enddate><creator>Mejia, Mateo Restrepo</creator><creator>Arroyave, Juan Sebastian</creator><creator>Saturno, Michael</creator><creator>Ndjonko, Laura Chelsea Mazudie</creator><creator>Zaidat, Bashar</creator><creator>Rajjoub, Rami</creator><creator>Ahmed, Wasil</creator><creator>Zapolsky, 
Ivan</creator><creator>Cho, Samuel K.</creator><general>Korean Spinal Neurosurgery Society</general><general>대한척추신경외과학회</general><scope>AAYXX</scope><scope>CITATION</scope><scope>5PM</scope><scope>DOA</scope><scope>ACYCR</scope><orcidid>https://orcid.org/0009-0003-0457-3308</orcidid><orcidid>https://orcid.org/0000-0002-3153-4967</orcidid><orcidid>https://orcid.org/0000-0001-7511-2486</orcidid><orcidid>https://orcid.org/0009-0005-2990-7874</orcidid><orcidid>https://orcid.org/0009-0003-9480-0657</orcidid><orcidid>https://orcid.org/0009-0001-0904-1891</orcidid><orcidid>https://orcid.org/0000-0002-8823-720X</orcidid></search><sort><creationdate>20240301</creationdate><title>Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison</title><author>Mejia, Mateo Restrepo ; Arroyave, Juan Sebastian ; Saturno, Michael ; Ndjonko, Laura Chelsea Mazudie ; Zaidat, Bashar ; Rajjoub, Rami ; Ahmed, Wasil ; Zapolsky, Ivan ; Cho, Samuel K.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c369t-8c13004a7c0bb792363e062c6c6a6b997de2b4456da39ae5580dec3aeedfb1ce3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>artificial intelligence</topic><topic>chatgpt</topic><topic>lumbar disk herniation with radiculopathy</topic><topic>north american spine society guidelines</topic><topic>Original</topic><topic>qualitative study</topic><topic>신경외과학</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mejia, Mateo Restrepo</creatorcontrib><creatorcontrib>Arroyave, Juan Sebastian</creatorcontrib><creatorcontrib>Saturno, Michael</creatorcontrib><creatorcontrib>Ndjonko, Laura Chelsea Mazudie</creatorcontrib><creatorcontrib>Zaidat, Bashar</creatorcontrib><creatorcontrib>Rajjoub, Rami</creatorcontrib><creatorcontrib>Ahmed, 
Wasil</creatorcontrib><creatorcontrib>Zapolsky, Ivan</creatorcontrib><creatorcontrib>Cho, Samuel K.</creatorcontrib><collection>CrossRef</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><collection>Korean Citation Index</collection><jtitle>Neurospine</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mejia, Mateo Restrepo</au><au>Arroyave, Juan Sebastian</au><au>Saturno, Michael</au><au>Ndjonko, Laura Chelsea Mazudie</au><au>Zaidat, Bashar</au><au>Rajjoub, Rami</au><au>Ahmed, Wasil</au><au>Zapolsky, Ivan</au><au>Cho, Samuel K.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison</atitle><jtitle>Neurospine</jtitle><date>2024-03-01</date><risdate>2024</risdate><volume>21</volume><issue>1</issue><spage>149</spage><epage>158</epage><pages>149-158</pages><issn>2586-6583</issn><eissn>2586-6591</eissn><abstract>Objective: Large language models like chat generative pre-trained transformer (ChatGPT) have found success in various sectors, but their application in the medical field remains limited. This study aimed to assess the feasibility of using ChatGPT to provide accurate medical information to patients, specifically evaluating how well ChatGPT versions 3.5 and 4 aligned with the 2012 North American Spine Society (NASS) guidelines for lumbar disk herniation with radiculopathy.Methods: ChatGPT's responses to questions based on the NASS guidelines were analyzed for accuracy. Three new categories—overconclusiveness, supplementary information, and incompleteness—were introduced to deepen the analysis. 
Overconclusiveness referred to recommendations not mentioned in the NASS guidelines, supplementary information denoted additional relevant details, and incompleteness indicated omitted crucial information from the NASS guidelines.Results: Out of 29 clinical guidelines evaluated, ChatGPT-3.5 demonstrated accuracy in 15 responses (52%), while ChatGPT-4 achieved accuracy in 17 responses (59%). ChatGPT-3.5 was overconclusive in 14 responses (48%), while ChatGPT-4 exhibited overconclusiveness in 13 responses (45%). Additionally, ChatGPT-3.5 provided supplementary information in 24 responses (83%), and ChatGPT-4 provided supplemental information in 27 responses (93%). In terms of incompleteness, ChatGPT-3.5 displayed this in 11 responses (38%), while ChatGPT-4 showed incompleteness in 8 responses (23%).Conclusion: ChatGPT shows promise for clinical decision-making, but both patients and healthcare providers should exercise caution to ensure safety and quality of care. While these results are encouraging, further research is necessary to validate the use of large language models in clinical settings.</abstract><pub>Korean Spinal Neurosurgery Society</pub><pmid>38291746</pmid><doi>10.14245/ns.2347052.526</doi><tpages>10</tpages><orcidid>https://orcid.org/0009-0003-0457-3308</orcidid><orcidid>https://orcid.org/0000-0002-3153-4967</orcidid><orcidid>https://orcid.org/0000-0001-7511-2486</orcidid><orcidid>https://orcid.org/0009-0005-2990-7874</orcidid><orcidid>https://orcid.org/0009-0003-9480-0657</orcidid><orcidid>https://orcid.org/0009-0001-0904-1891</orcidid><orcidid>https://orcid.org/0000-0002-8823-720X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2586-6583
ispartof Neurospine, 2024, 21(1), , pp.149-158
issn 2586-6583
2586-6591
language eng
recordid cdi_nrf_kci_oai_kci_go_kr_ARTI_10418360
source PubMed Central
subjects artificial intelligence
chatgpt
lumbar disk herniation with radiculopathy
north american spine society guidelines
Original
qualitative study
Neurosurgery
title Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T15%3A18%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-nrf_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Use%20of%20ChatGPT%20for%20Determining%20Clinical%20and%20Surgical%20Treatment%20of%20Lumbar%20Disc%20Herniation%20With%20Radiculopathy:%20A%20North%20American%20Spine%20Society%20Guideline%20Comparison&rft.jtitle=Neurospine&rft.au=Mejia,%20Mateo%20Restrepo&rft.date=2024-03-01&rft.volume=21&rft.issue=1&rft.spage=149&rft.epage=158&rft.pages=149-158&rft.issn=2586-6583&rft.eissn=2586-6591&rft_id=info:doi/10.14245/ns.2347052.526&rft_dat=%3Cnrf_doaj_%3Eoai_kci_go_kr_ARTI_10418360%3C/nrf_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c369t-8c13004a7c0bb792363e062c6c6a6b997de2b4456da39ae5580dec3aeedfb1ce3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/38291746&rfr_iscdi=true