Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish
Published in: | arXiv.org 2024-03 |
---|---|
Main Authors: | Cekinel, Recep Firat; Karagoz, Pinar; Coltekin, Cagri |
Format: | Article |
Language: | English |
Subjects: | Context; Datasets; English language; Languages; Large language models |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Cekinel, Recep Firat; Karagoz, Pinar; Coltekin, Cagri |
description | The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. The experimental results indicate that the dataset has the potential to advance research in the Turkish language. |
format | article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2937136923</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2937136923</sourcerecordid><originalsourceid>FETCH-proquest_journals_29371369233</originalsourceid><addsrcrecordid>eNqNjsEKgkAYhJcgSMp3-KHzhu6mZreQpIOn8i6Lbbkqa-2_m_T2GfQAnWbgmxlmRjzGeUh3W8YWxEdsgyBgccKiiHukysyASAul7070UEhh9OThhRsohpGeJQ7O1BJypSUt3Rfu4QCZQAkX665vGJVtIBe1pVkj6-7bVhpKZzqFzYrMb6JH6f90Sdb5scxO9GGGp5Noq3ba1xOqWMqTkMfp9Pa_1AfQ4UMH</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2937136923</pqid></control><display><type>article</type><title>Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish</title><source>Publicly Available Content (ProQuest)</source><creator>Cekinel, Recep Firat ; Karagoz, Pinar ; Coltekin, Cagri</creator><creatorcontrib>Cekinel, Recep Firat ; Karagoz, Pinar ; Coltekin, Cagri</creatorcontrib><description>The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. 
The experimental results indicate that the dataset has the potential to advance research in the Turkish language.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Context ; Datasets ; English language ; Languages ; Large language models</subject><ispartof>arXiv.org, 2024-03</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by-nc-sa/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2937136923?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,37012,44590</link.rule.ids></links><search><creatorcontrib>Cekinel, Recep Firat</creatorcontrib><creatorcontrib>Karagoz, Pinar</creatorcontrib><creatorcontrib>Coltekin, Cagri</creatorcontrib><title>Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish</title><title>arXiv.org</title><description>The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. 
Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.</description><subject>Context</subject><subject>Datasets</subject><subject>English language</subject><subject>Languages</subject><subject>Large language models</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqNjsEKgkAYhJcgSMp3-KHzhu6mZreQpIOn8i6Lbbkqa-2_m_T2GfQAnWbgmxlmRjzGeUh3W8YWxEdsgyBgccKiiHukysyASAul7070UEhh9OThhRsohpGeJQ7O1BJypSUt3Rfu4QCZQAkX665vGJVtIBe1pVkj6-7bVhpKZzqFzYrMb6JH6f90Sdb5scxO9GGGp5Noq3ba1xOqWMqTkMfp9Pa_1AfQ4UMH</recordid><startdate>20240322</startdate><enddate>20240322</enddate><creator>Cekinel, Recep Firat</creator><creator>Karagoz, Pinar</creator><creator>Coltekin, Cagri</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240322</creationdate><title>Cross-Lingual Learning vs. 
Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish</title><author>Cekinel, Recep Firat ; Karagoz, Pinar ; Coltekin, Cagri</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_29371369233</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Context</topic><topic>Datasets</topic><topic>English language</topic><topic>Languages</topic><topic>Large language models</topic><toplevel>online_resources</toplevel><creatorcontrib>Cekinel, Recep Firat</creatorcontrib><creatorcontrib>Karagoz, Pinar</creatorcontrib><creatorcontrib>Coltekin, Cagri</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content (ProQuest)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cekinel, Recep Firat</au><au>Karagoz, Pinar</au><au>Coltekin, Cagri</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Cross-Lingual Learning vs. 
Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish</atitle><jtitle>arXiv.org</jtitle><date>2024-03-22</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>The rapid spread of misinformation through social media platforms has raised concerns regarding its impact on public opinion. While misinformation is prevalent in other languages, the majority of research in this field has concentrated on the English language. Hence, there is a scarcity of datasets for other languages, including Turkish. To address this concern, we have introduced the FCTR dataset, consisting of 3238 real-world claims. This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. Additionally, we aim to assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish. We demonstrate in-context learning (zero-shot and few-shot) performance of large language models in this context. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-03 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2937136923 |
source | Publicly Available Content (ProQuest) |
subjects | Context; Datasets; English language; Languages; Large language models |
title | Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T06%3A40%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Cross-Lingual%20Learning%20vs.%20Low-Resource%20Fine-Tuning:%20A%20Case%20Study%20with%20Fact-Checking%20in%20Turkish&rft.jtitle=arXiv.org&rft.au=Cekinel,%20Recep%20Firat&rft.date=2024-03-22&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2937136923%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_29371369233%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2937136923&rft_id=info:pmid/&rfr_iscdi=true |