
Can ChatGPT outperform a neurosurgical trainee? A prospective comparative study

Bibliographic Details
Published in: British journal of neurosurgery, 2024-02, p.1-10
Main Authors: Williams, Simon C, Starup-Hansen, Joachim, Funnell, Jonathan P, Hanrahan, John Gerrard, Valetopoulou, Alexandra, Singh, Navneet, Sinha, Saurabh, Muirhead, William R, Marcus, Hani J
Format: Article
Language:English
Abstract:
This study aimed to compare the performance of ChatGPT, a large language model (LLM), with human neurosurgical applicants in a neurosurgical national selection interview, to assess the potential of artificial intelligence (AI) and LLMs in healthcare and provide insights into their integration into the field.

In a prospective comparative study, a set of neurosurgical national selection-style interview questions was put to eight human participants and to ChatGPT in an online interview. All participants were doctors currently practising in the UK who had applied for a neurosurgical National Training Number. Interviews were recorded, anonymised, and scored by three neurosurgical consultants with experience as interviewers for national selection. Answers provided by ChatGPT were used as a template for a virtual interview, and the interview transcripts were subsequently scored by neurosurgical consultants using the criteria used in real national selection interviews. Overall interview score and subdomain scores were compared between the human participants and ChatGPT.

On overall score, ChatGPT fell behind six of the human competitors and did not achieve a mean score higher than any individual who secured a training position. Several factors, including factual inaccuracies and deviations from the expected structure and style, may have contributed to ChatGPT's underperformance.

LLMs such as ChatGPT have considerable potential for integration into healthcare, but this study emphasises the need for further development to address their current limitations. While LLMs have not yet surpassed human performance, collaboration between humans and AI systems holds promise for the future of healthcare.

ISSN: 0268-8697
EISSN: 1360-046X
DOI: 10.1080/02688697.2024.2308222
PMID: 38305239
Published: England