
Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish

The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.
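
The abstract mentions evaluating large language models with in-context learning (zero-shot and few-shot). The short Python sketch below illustrates how such prompts might be assembled; it is a hedged illustration rather than the paper's code, and the prompt wording, the example claim, and the true/false label set are all hypothetical assumptions, not items from the FCTR dataset.

    # Minimal sketch (not the authors' code) of zero-shot vs. few-shot
    # prompt construction for claim verification with an LLM. Template
    # wording and examples are illustrative assumptions only.

    ZERO_SHOT_TEMPLATE = (
        "Claim: {claim}\n"
        "Evidence: {evidence}\n"
        "Based on the evidence, is the claim true or false? "
        "Answer with one word: true or false.\n"
        "Answer:"
    )

    def build_prompt(claim, evidence, demonstrations=()):
        """With no demonstrations this yields a zero-shot prompt; each
        (claim, evidence, label) triple prepended makes it few-shot."""
        blocks = [
            ZERO_SHOT_TEMPLATE.format(claim=c, evidence=e) + " " + label
            for c, e, label in demonstrations
        ]
        blocks.append(ZERO_SHOT_TEMPLATE.format(claim=claim, evidence=evidence))
        return "\n\n".join(blocks)

    # Hypothetical demonstration and query; the resulting string could be
    # sent to any instruction-tuned LLM for evaluation.
    demos = [("Istanbul is the capital of Turkey.",
              "Ankara has been the capital of Turkey since 1923.",
              "false")]
    print(build_prompt("Ankara is the capital of Turkey.",
                       "Ankara has been the capital of Turkey since 1923.",
                       demos))

In a zero-shot run the demonstrations list is simply left empty, so the model must infer the task from the instruction alone.
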
Bibliographic Details
Published in: arXiv.org, 2024-03
Main Authors: Cekinel, Recep Firat; Karagoz, Pinar; Coltekin, Cagri
Format: Article
Language: English
Subjects: Context; Datasets; English language; Languages; Large language models
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Rights: 2024. This work is published under http://creativecommons.org/licenses/by-nc-sa/4.0/ (the “License”).
Source: Publicly Available Content (ProQuest)