
With Great Dispersion Comes Greater Resilience: Efficient Poisoning Attacks and Defenses for Linear Regression Models

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, 2021, Vol. 16, p. 3709-3723
Main Authors: Wen, Jialin; Zhao, Benjamin Zi Hao; Xue, Minhui; Oprea, Alina; Qian, Haifeng
Format: Article
Language: English
Subjects: Algorithms; Complexity; Data models; Data poisoning attacks and defenses; Datasets; Distance learning; Linear regression; linear regression models; Loans; Machine learning; Numerical models; Poisoning; Predictive models; Regression models; Security; Statistical analysis; Time complexity; Training
DOI: 10.1109/TIFS.2021.3087332
ISSN: 1556-6013
EISSN: 1556-6021
Online Access: https://ieeexplore.ieee.org/document/9448089
Description:
With the rise of third parties in the machine learning pipeline, such as the service provider in "Machine Learning as a Service" (MLaaS), external data contributors in online learning, or the retraining of existing models, the need to ensure the security of the resulting machine learning models has become an increasingly important topic. The security community has demonstrated that without transparency of the data and the resulting model, many potential security risks exist, with new risks constantly being discovered. In this paper, we focus on one of these security risks: poisoning attacks. Specifically, we analyze how attackers may interfere with the results of regression learning by poisoning the training datasets. To this end, we analyze and develop a new poisoning attack algorithm. Our attack, termed Nopt, in contrast with previous poisoning attack algorithms, can produce larger errors with the same proportion of poisoning data-points. Furthermore, we significantly improve the state-of-the-art defense algorithm TRIM, proposed by Jagielski et al. (IEEE S&P 2018), by incorporating the concept of probability estimation of clean data-points into the algorithm. Our new defense algorithm, termed Proda, demonstrates increased effectiveness in reducing errors arising from the poisoned dataset by optimizing ensemble models. We highlight that the time complexity of TRIM had not been estimated; however, we deduce from their work that TRIM can take exponential time in the worst case, in excess of Proda's logarithmic time. The performance of both our proposed attack and defense algorithms is extensively evaluated on four real-world datasets of housing prices, loans, health care, and bike sharing services. We hope that our work will inspire future research to develop more robust learning algorithms immune to poisoning attacks.
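
The abstract describes two mechanisms concretely enough to sketch: an attacker who injects adversarial points into a regression training set, and TRIM, which defends by repeatedly refitting the model on the subset of points with the smallest residuals. The Python sketch below illustrates both against ordinary least squares. It is a minimal illustration under stated assumptions: the injected points use a naive label-flipping baseline, not the paper's Nopt attack, and the trimming loop follows the general TRIM recipe of Jagielski et al. (IEEE S&P 2018), not the Proda defense, whose details this record does not give.

```python
# Minimal sketch: a naive poisoning baseline against ordinary least
# squares plus a TRIM-style trimmed-regression defense. The paper's
# Nopt attack and Proda defense are more sophisticated and are NOT
# reproduced here.
import numpy as np

rng = np.random.default_rng(0)

# Clean training data drawn from a linear model y = Xw + noise.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Naive poisoning baseline (hypothetical, not the paper's Nopt attack):
# inject 10% extra points whose labels are flipped and scaled so they
# drag the least-squares fit away from w_true.
p = n // 10
X_p = rng.normal(size=(p, d))
y_p = -5.0 * (X_p @ w_true)
X_all = np.vstack([X, X_p])
y_all = np.concatenate([y, y_p])

def ols(X, y):
    """Ordinary least-squares fit."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def trim(X, y, n_clean, iters=50, seed=1):
    """TRIM-style defense (after Jagielski et al., IEEE S&P 2018):
    alternately fit the model and keep the n_clean points with the
    smallest squared residuals, trimming likely-poisoned outliers.
    Assumes the defender knows (or bounds) the poisoning rate."""
    local_rng = np.random.default_rng(seed)
    keep = local_rng.choice(len(y), size=n_clean, replace=False)
    for _ in range(iters):
        w = ols(X[keep], y[keep])
        resid = (X @ w - y) ** 2
        new_keep = np.sort(np.argsort(resid)[:n_clean])
        if np.array_equal(new_keep, np.sort(keep)):
            break  # kept subset stopped changing: converged
        keep = new_keep
    return ols(X[keep], y[keep])

for name, w in [("clean fit", ols(X, y)),
                ("poisoned fit", ols(X_all, y_all)),
                ("TRIM-style fit", trim(X_all, y_all, n_clean=n))]:
    print(f"{name:>14}: ||w - w_true|| = {np.linalg.norm(w - w_true):.4f}")
```

On synthetic data like this, the poisoned fit typically drifts away from w_true while the trimmed fit recovers it nearly as well as the clean fit, which is the qualitative behavior residual-based trimming is designed to achieve.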