
With Great Dispersion Comes Greater Resilience: Efficient Poisoning Attacks and Defenses for Linear Regression Models

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, 2021, Vol. 16, p. 3709-3723
Main Authors: Wen, Jialin; Zhao, Benjamin Zi Hao; Xue, Minhui; Oprea, Alina; Qian, Haifeng
Format: Article
Language: English
Subjects: Algorithms; Complexity; Data models; Data poisoning attacks and defenses; Datasets; Distance learning; Linear regression; linear regression models; Loans; Machine learning; Numerical models; Poisoning; Predictive models; Regression models; Security; Statistical analysis; Time complexity; Training
DOI: 10.1109/TIFS.2021.3087332
ISSN: 1556-6013
EISSN: 1556-6021
Online Access: https://ieeexplore.ieee.org/document/9448089
Description:
With the rise of third parties in the machine learning pipeline, such as the service provider in "Machine Learning as a Service" (MLaaS), external data contributors in online learning, or the retraining of existing models, the need to ensure the security of the resulting machine learning models has become an increasingly important topic. The security community has demonstrated that without transparency of the data and the resulting model, many potential security risks exist, with new risks constantly being discovered. In this paper, we focus on one of these security risks: poisoning attacks. Specifically, we analyze how attackers may interfere with the results of regression learning by poisoning the training datasets. To this end, we analyze and develop a new poisoning attack algorithm. Our attack, termed Nopt, in contrast with previous poisoning attack algorithms, can produce larger errors with the same proportion of poisoning data-points. Furthermore, we significantly improve the state-of-the-art defense algorithm TRIM, proposed by Jagielski et al. (IEEE S&P 2018), by incorporating the concept of probability estimation of clean data-points into the algorithm. Our new defense algorithm, termed Proda, demonstrates increased effectiveness in reducing errors arising from the poisoned dataset by optimizing ensemble models. We highlight that the time complexity of TRIM had not been estimated; however, we deduce from their work that TRIM can take exponential time in the worst case, in excess of Proda's logarithmic time. The performance of both our proposed attack and defense algorithms is extensively evaluated on four real-world datasets of housing prices, loans, health care, and bike sharing services. We hope that our work will inspire future research to develop more robust learning algorithms immune to poisoning attacks.
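
The abstract describes two mechanisms concretely enough to sketch: an attacker who injects adversarial points into a regression training set, and TRIM, which defends by repeatedly refitting the model on the subset of points with the smallest residuals. The Python sketch below illustrates both against ordinary least squares. It is a minimal illustration under stated assumptions: the injected points use a naive label-flipping baseline, not the paper's Nopt attack, and the trimming loop follows the general TRIM recipe of Jagielski et al. (IEEE S&P 2018), not the Proda defense, whose details this record does not give.

```python
# Minimal sketch: a naive poisoning baseline against ordinary least
# squares plus a TRIM-style trimmed-regression defense. The paper's
# Nopt attack and Proda defense are more sophisticated and are NOT
# reproduced here.
import numpy as np

rng = np.random.default_rng(0)

# Clean training data drawn from a linear model y = Xw + noise.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Naive poisoning baseline (hypothetical, not the paper's Nopt attack):
# inject 10% extra points whose labels are flipped and scaled so they
# drag the least-squares fit away from w_true.
p = n // 10
X_p = rng.normal(size=(p, d))
y_p = -5.0 * (X_p @ w_true)
X_all = np.vstack([X, X_p])
y_all = np.concatenate([y, y_p])

def ols(X, y):
    """Ordinary least-squares fit."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def trim(X, y, n_clean, iters=50, seed=1):
    """TRIM-style defense (after Jagielski et al., IEEE S&P 2018):
    alternately fit the model and keep the n_clean points with the
    smallest squared residuals, trimming likely-poisoned outliers.
    Assumes the defender knows (or bounds) the poisoning rate."""
    local_rng = np.random.default_rng(seed)
    keep = local_rng.choice(len(y), size=n_clean, replace=False)
    for _ in range(iters):
        w = ols(X[keep], y[keep])
        resid = (X @ w - y) ** 2
        new_keep = np.sort(np.argsort(resid)[:n_clean])
        if np.array_equal(new_keep, np.sort(keep)):
            break  # kept subset stopped changing: converged
        keep = new_keep
    return ols(X[keep], y[keep])

for name, w in [("clean fit", ols(X, y)),
                ("poisoned fit", ols(X_all, y_all)),
                ("TRIM-style fit", trim(X_all, y_all, n_clean=n))]:
    print(f"{name:>14}: ||w - w_true|| = {np.linalg.norm(w - w_true):.4f}")
```

On synthetic data like this, the poisoned fit typically drifts away from w_true while the trimmed fit recovers it nearly as well as the clean fit, which is the qualitative behavior residual-based trimming is designed to achieve.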