News Verifiers Showdown: A Comparative Performance Evaluation of ChatGPT 3.5, ChatGPT 4.0, Bing AI, and Bard in News Fact-Checking
This study aimed to evaluate the proficiency of prominent Large Language Models (LLMs), namely OpenAI's ChatGPT 3.5 and 4.0, Google's Bard (LaMDA), and Microsoft's Bing AI, in discerning the truthfulness of news items using black box testing. A total of 100 fact-checked news items, all sourced from independent fact-checking agencies, were presented to each of these LLMs under controlled conditions. Their responses were classified into one of three categories: True, False, and Partially True/False. The effectiveness of the LLMs was gauged based on the accuracy of their classifications against the verified facts provided by the independent agencies. The results showed a moderate proficiency across all models, with an average score of 65.25 out of 100. Among the models, OpenAI's GPT-4.0 stood out with a score of 71, suggesting an edge in newer LLMs' abilities to differentiate fact from deception. However, when juxtaposed against the performance of human fact-checkers, the AI models, despite showing promise, lag in comprehending the subtleties and contexts inherent in news information. The findings highlight the potential of AI in the domain of fact-checking while underscoring the continued importance of human cognitive skills and the necessity for persistent advancements in AI capabilities. Finally, the experimental data produced from the simulation of this work are openly available on Kaggle.
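The abstract describes a straightforward evaluation protocol: each model's verdict (True, False, or Partially True/False) on each of the 100 news items is compared against the fact-checker's label, and accuracy is reported as a score out of 100. The paper itself does not include code, and the schema of the Kaggle dataset is not given in this record; the sketch below is a minimal, hypothetical reconstruction of that scoring step, assuming a CSV with one row per news item, a `fact_check_label` column, and one verdict column per model. Column and file names are illustrative, not the author's actual artifacts.

```python
# Minimal sketch of the scoring described in the abstract: compare each
# model's True / False / Partially True/False verdict against the
# fact-checker label and report accuracy out of 100.
# The file name and column names below are hypothetical; adjust them to
# match the actual Kaggle dataset.
import csv
from collections import defaultdict

MODELS = ["ChatGPT 3.5", "ChatGPT 4.0", "Bing AI", "Bard"]

def score_models(path: str) -> dict[str, float]:
    """Return each model's classification accuracy as a score out of 100."""
    correct = defaultdict(int)
    total = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            for model in MODELS:
                # A verdict counts as correct only if it matches the
                # fact-checker's label exactly.
                if row[model].strip() == row["fact_check_label"].strip():
                    correct[model] += 1
    return {m: 100 * correct[m] / total for m in MODELS}

if __name__ == "__main__":
    scores = score_models("news_verifier_results.csv")  # hypothetical file
    for model, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{model}: {score:.2f} / 100")
    print(f"Average: {sum(scores.values()) / len(scores):.2f} / 100")
```

Under this scheme, the reported figures (an average of 65.25 and a top score of 71 for GPT-4.0 over 100 items) correspond directly to the number of items each model labeled correctly.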
| Published in: | arXiv.org, 2023-06 |
|---|---|
| Main Author: | Kevin Matthe Caramancion |
| Format: | Article |
| Language: | English |
| Publisher: | Cornell University Library, arXiv.org (Ithaca) |
| Publication Date: | 2023-06-18 |
| Identifier: | EISSN: 2331-8422 |
| Source: | Publicly Available Content (ProQuest) |
| Subjects: | Artificial intelligence; Chatbots; Cognition & reasoning; Human performance; Large language models; News; Performance evaluation; Search engines |
| Rights: | 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| Online Access: | Get full text |