
Effectiveness of Privacy-Preserving Algorithms for Large Language Models: A Benchmark Analysis

Recently, several privacy-preserving algorithms for NLP have emerged. These algorithms can be suitable for LLMs as they can protect both training and query data. However, no benchmark exists to guide the evaluation of these algorithms when applied to LLMs. This paper presents a benchmark framework for evaluating the effectiveness of privacy-preserving algorithms applied to training and query data for fine-tuning LLMs under various scenarios. The proposed benchmark is designed to be transferable, enabling researchers to assess other privacy-preserving algorithms and LLMs. We evaluated the Santext+ algorithm on the open-source Llama2-7b LLM using a sensitive medical transcription dataset. Results demonstrate the algorithm's effectiveness while highlighting the importance of considering specific situations when determining algorithm parameters. This work aims to facilitate the development and evaluation of effective privacy-preserving algorithms for LLMs, contributing to the creation of trusted LLMs that mitigate concerns regarding the misuse of sensitive information.


Bibliographic Details
Main Authors: Sun, Jinglin, Suleiman, Basem, Ullah, Imdad
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
cited_by
cites
container_end_page 8
container_issue
container_start_page 1
container_title
container_volume
creator Sun, Jinglin
Suleiman, Basem
Ullah, Imdad
description Recently, several privacy-preserving algorithms for NLP have emerged. These algorithms can be suitable for LLMs as they can protect both training and query data. However, no benchmark exists to guide the evaluation of these algorithms when applied to LLMs. This paper presents a benchmark framework for evaluating the effectiveness of privacy-preserving algorithms applied to training and query data for fine-tuning LLMs under various scenarios. The proposed benchmark is designed to be transferable, enabling researchers to assess other privacy-preserving algorithms and LLMs. We evaluated the Santext+ algorithm on the open-source Llama2-7b LLM using a sensitive medical transcription dataset. Results demonstrate the algorithm's effectiveness while highlighting the importance of considering specific situations when determining algorithm parameters. This work aims to facilitate the development and evaluation of effective privacy-preserving algorithms for LLMs, contributing to the creation of trusted LLMs that mitigate concerns regarding the misuse of sensitive information.
doi_str_mv 10.1109/PST62714.2024.10788045
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10788045</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10788045</ieee_id><sourcerecordid>10788045</sourcerecordid><originalsourceid>FETCH-ieee_primary_107880453</originalsourceid><addsrcrecordid>eNqFjs1Kw0AUhUehYLF5A5F5gcQ7P80k7tJS6UIh0K5bhngnHU0nZW4byNubha7dnHPg-xaHsWcBmRBQvtS7fS6N0JkEqTMBpihAL-9YUpqyUEtQuYEyv2dzmWuV6sl6YAnRFwAoCcpoOWeHjXPYXP2AAYl473gd_WCbMa0jEsbBh5ZXXdtHfz2dibs-8ncbW5wytDc7jY_-Ezt65RVfYWhOZxu_eRVsN5KnBZs52xEmv_3Int42-_U29Yh4vEQ_2ePx77r6B_8A</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Effectiveness of Privacy-Preserving Algorithms for Large Language Models: A Benchmark Analysis</title><source>IEEE Xplore All Conference Series</source><creator>Sun, Jinglin ; Suleiman, Basem ; Ullah, Imdad</creator><creatorcontrib>Sun, Jinglin ; Suleiman, Basem ; Ullah, Imdad</creatorcontrib><description>Recently, several privacy-preserving algorithms for NLP have emerged. These algorithms can be suitable for LLMs as they can protect both training and query data. However, no benchmark exists to guide the evaluation of these algorithms when applied to LLMs. This paper presents a benchmark framework for evaluating the effectiveness of privacy-preserving algorithms applied to training and query data for fine-tuning LLMs under various scenarios. The proposed benchmark is designed to be transferable, enabling researchers to assess other privacy-preserving algorithms and LLMs. We evaluated the Santext+ algorithm on the open-source Llama2-7b LLM using a sensitive medical transcription dataset. Results demonstrate the algorithm's effectiveness while highlighting the importance of considering specific situations when determining algorithm parameters. This work aims to facilitate the development and evaluation of effective privacy-preserving algorithms for LLMs, contributing to the creation of trusted LLMs that mitigate concerns regarding the misuse of sensitive information.</description><identifier>EISSN: 2643-4202</identifier><identifier>EISBN: 9798350367096</identifier><identifier>DOI: 10.1109/PST62714.2024.10788045</identifier><language>eng</language><publisher>IEEE</publisher><subject>Adaptation models ; Benchmark testing ; benchmarks ; Data models ; Data privacy ; differential privacy ; large language models ; Measurement ; Organizations ; Privacy ; privacy-preserving algorithms ; Protection ; Security ; Training</subject><ispartof>Annual International Conference on Privacy, Security and Trust (Online), 2024, p.1-8</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10788045$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10788045$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Sun, Jinglin</creatorcontrib><creatorcontrib>Suleiman, Basem</creatorcontrib><creatorcontrib>Ullah, Imdad</creatorcontrib><title>Effectiveness of Privacy-Preserving Algorithms for Large Language Models: A Benchmark Analysis</title><title>Annual International Conference on Privacy, Security and Trust (Online)</title><addtitle>PST</addtitle><description>Recently, several privacy-preserving algorithms for NLP have emerged. These algorithms can be suitable for LLMs as they can protect both training and query data. However, no benchmark exists to guide the evaluation of these algorithms when applied to LLMs. This paper presents a benchmark framework for evaluating the effectiveness of privacy-preserving algorithms applied to training and query data for fine-tuning LLMs under various scenarios. The proposed benchmark is designed to be transferable, enabling researchers to assess other privacy-preserving algorithms and LLMs. We evaluated the Santext+ algorithm on the open-source Llama2-7b LLM using a sensitive medical transcription dataset. Results demonstrate the algorithm's effectiveness while highlighting the importance of considering specific situations when determining algorithm parameters. This work aims to facilitate the development and evaluation of effective privacy-preserving algorithms for LLMs, contributing to the creation of trusted LLMs that mitigate concerns regarding the misuse of sensitive information.</description><subject>Adaptation models</subject><subject>Benchmark testing</subject><subject>benchmarks</subject><subject>Data models</subject><subject>Data privacy</subject><subject>differential privacy</subject><subject>large language models</subject><subject>Measurement</subject><subject>Organizations</subject><subject>Privacy</subject><subject>privacy-preserving algorithms</subject><subject>Protection</subject><subject>Security</subject><subject>Training</subject><issn>2643-4202</issn><isbn>9798350367096</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2024</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNqFjs1Kw0AUhUehYLF5A5F5gcQ7P80k7tJS6UIh0K5bhngnHU0nZW4byNubha7dnHPg-xaHsWcBmRBQvtS7fS6N0JkEqTMBpihAL-9YUpqyUEtQuYEyv2dzmWuV6sl6YAnRFwAoCcpoOWeHjXPYXP2AAYl473gd_WCbMa0jEsbBh5ZXXdtHfz2dibs-8ncbW5wytDc7jY_-Ezt65RVfYWhOZxu_eRVsN5KnBZs52xEmv_3Int42-_U29Yh4vEQ_2ePx77r6B_8A</recordid><startdate>20240828</startdate><enddate>20240828</enddate><creator>Sun, Jinglin</creator><creator>Suleiman, Basem</creator><creator>Ullah, Imdad</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20240828</creationdate><title>Effectiveness of Privacy-Preserving Algorithms for Large Language Models: A Benchmark Analysis</title><author>Sun, Jinglin ; Suleiman, Basem ; Ullah, Imdad</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-ieee_primary_107880453</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Adaptation models</topic><topic>Benchmark testing</topic><topic>benchmarks</topic><topic>Data models</topic><topic>Data privacy</topic><topic>differential privacy</topic><topic>large language models</topic><topic>Measurement</topic><topic>Organizations</topic><topic>Privacy</topic><topic>privacy-preserving algorithms</topic><topic>Protection</topic><topic>Security</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Sun, Jinglin</creatorcontrib><creatorcontrib>Suleiman, Basem</creatorcontrib><creatorcontrib>Ullah, Imdad</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Sun, Jinglin</au><au>Suleiman, Basem</au><au>Ullah, Imdad</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Effectiveness of Privacy-Preserving Algorithms for Large Language Models: A Benchmark Analysis</atitle><btitle>Annual International Conference on Privacy, Security and Trust (Online)</btitle><stitle>PST</stitle><date>2024-08-28</date><risdate>2024</risdate><spage>1</spage><epage>8</epage><pages>1-8</pages><eissn>2643-4202</eissn><eisbn>9798350367096</eisbn><abstract>Recently, several privacy-preserving algorithms for NLP have emerged. These algorithms can be suitable for LLMs as they can protect both training and query data. However, no benchmark exists to guide the evaluation of these algorithms when applied to LLMs. This paper presents a benchmark framework for evaluating the effectiveness of privacy-preserving algorithms applied to training and query data for fine-tuning LLMs under various scenarios. The proposed benchmark is designed to be transferable, enabling researchers to assess other privacy-preserving algorithms and LLMs. We evaluated the Santext+ algorithm on the open-source Llama2-7b LLM using a sensitive medical transcription dataset. Results demonstrate the algorithm's effectiveness while highlighting the importance of considering specific situations when determining algorithm parameters. This work aims to facilitate the development and evaluation of effective privacy-preserving algorithms for LLMs, contributing to the creation of trusted LLMs that mitigate concerns regarding the misuse of sensitive information.</abstract><pub>IEEE</pub><doi>10.1109/PST62714.2024.10788045</doi></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2643-4202
ispartof Annual International Conference on Privacy, Security and Trust (Online), 2024, p.1-8
issn 2643-4202
language eng
recordid cdi_ieee_primary_10788045
source IEEE Xplore All Conference Series
subjects Adaptation models
Benchmark testing
benchmarks
Data models
Data privacy
differential privacy
large language models
Measurement
Organizations
Privacy
privacy-preserving algorithms
Protection
Security
Training
title Effectiveness of Privacy-Preserving Algorithms for Large Language Models: A Benchmark Analysis
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T13%3A22%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Effectiveness%20of%20Privacy-Preserving%20Algorithms%20for%20Large%20Language%20Models:%20A%20Benchmark%20Analysis&rft.btitle=Annual%20International%20Conference%20on%20Privacy,%20Security%20and%20Trust%20(Online)&rft.au=Sun,%20Jinglin&rft.date=2024-08-28&rft.spage=1&rft.epage=8&rft.pages=1-8&rft.eissn=2643-4202&rft_id=info:doi/10.1109/PST62714.2024.10788045&rft.eisbn=9798350367096&rft_dat=%3Cieee_CHZPO%3E10788045%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-ieee_primary_107880453%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10788045&rfr_iscdi=true