Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish

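The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.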

Bibliographic Details
Published in:	arXiv.org 2024-03
Main Authors:	Cekinel, Recep Firat; Karagoz, Pinar; Coltekin, Cagri
Format:	Article
Language:	English
Subjects:	Context; Datasets; English language; Languages; Large language models

container_title	arXiv.org
creator	Cekinel, Recep Firat; Karagoz, Pinar; Coltekin, Cagri
description	The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.
format	article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2937136923</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2937136923</sourcerecordid><originalsourceid>FETCH-proquest_journals_29371369233</originalsourceid><addsrcrecordid>eNqNjsEKgkAYhJcgSMp3-KHzhu6mZreQpIOn8i6Lbbkqa-2_m_T2GfQAnWbgmxlmRjzGeUh3W8YWxEdsgyBgccKiiHukysyASAul7070UEhh9OThhRsohpGeJQ7O1BJypSUt3Rfu4QCZQAkX665vGJVtIBe1pVkj6-7bVhpKZzqFzYrMb6JH6f90Sdb5scxO9GGGp5Noq3ba1xOqWMqTkMfp9Pa_1AfQ4UMH</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2937136923</pqid></control><display><type>article</type><title>Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish</title><source>Publicly Available Content (ProQuest)</source><creator>Cekinel, Recep Firat ; Karagoz, Pinar ; Coltekin, Cagri</creator><creatorcontrib>Cekinel, Recep Firat ; Karagoz, Pinar ; Coltekin, Cagri</creatorcontrib><description>The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. 
The experimental results indicate that the dataset has the potential to advance research in the Turkish language.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Context ; Datasets ; English language ; Languages ; Large language models</subject><ispartof>arXiv.org, 2024-03</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by-nc-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2937136923?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,37012,44590</link.rule.ids></links><search><creatorcontrib>Cekinel, Recep Firat</creatorcontrib><creatorcontrib>Karagoz, Pinar</creatorcontrib><creatorcontrib>Coltekin, Cagri</creatorcontrib><title>Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish</title><title>arXiv.org</title><description>The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. 
Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.</description><subject>Context</subject><subject>Datasets</subject><subject>English language</subject><subject>Languages</subject><subject>Large language models</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqNjsEKgkAYhJcgSMp3-KHzhu6mZreQpIOn8i6Lbbkqa-2_m_T2GfQAnWbgmxlmRjzGeUh3W8YWxEdsgyBgccKiiHukysyASAul7070UEhh9OThhRsohpGeJQ7O1BJypSUt3Rfu4QCZQAkX665vGJVtIBe1pVkj6-7bVhpKZzqFzYrMb6JH6f90Sdb5scxO9GGGp5Noq3ba1xOqWMqTkMfp9Pa_1AfQ4UMH</recordid><startdate>20240322</startdate><enddate>20240322</enddate><creator>Cekinel, Recep Firat</creator><creator>Karagoz, Pinar</creator><creator>Coltekin, Cagri</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240322</creationdate><title>Cross-Lingual Learning vs. 
Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish</title><author>Cekinel, Recep Firat ; Karagoz, Pinar ; Coltekin, Cagri</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_29371369233</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Context</topic><topic>Datasets</topic><topic>English language</topic><topic>Languages</topic><topic>Large language models</topic><toplevel>online_resources</toplevel><creatorcontrib>Cekinel, Recep Firat</creatorcontrib><creatorcontrib>Karagoz, Pinar</creatorcontrib><creatorcontrib>Coltekin, Cagri</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content (ProQuest)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cekinel, Recep Firat</au><au>Karagoz, Pinar</au><au>Coltekin, Cagri</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Cross-Lingual Learning vs. 
Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish</atitle><jtitle>arXiv.org</jtitle><date>2024-03-22</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-03
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2937136923
source	Publicly Available Content (ProQuest)
subjects	Context; Datasets; English language; Languages; Large language models
title	Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T06%3A40%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Cross-Lingual%20Learning%20vs.%20Low-Resource%20Fine-Tuning:%20A%20Case%20Study%20with%20Fact-Checking%20in%20Turkish&rft.jtitle=arXiv.org&rft.au=Cekinel,%20Recep%20Firat&rft.date=2024-03-22&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2937136923%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_29371369233%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2937136923&rft_id=info:pmid/&rfr_iscdi=true