With Great Dispersion Comes Greater Resilience: Efficient Poisoning Attacks and Defenses for Linear Regression Models
With the rise of third parties in the machine learning pipeline, whether the service provider in "Machine Learning as a Service" (MLaaS), external data contributors in online learning, or the retraining of existing models, the need to ensure the security of the resulting machine learning models has become an increasingly important topic. The security community has demonstrated that without transparency of the data and the resulting model, many potential security risks exist, with new risks constantly being discovered. In this paper, we focus on one of these security risks: poisoning attacks. Specifically, we analyze how attackers may interfere with the results of regression learning by poisoning the training datasets. To this end, we analyze and develop a new poisoning attack algorithm. Our attack, termed Nopt, in contrast with previous poisoning attack algorithms, can produce larger errors with the same proportion of poisoning data-points. Furthermore, we significantly improve the state-of-the-art defense algorithm, termed TRIM, proposed by Jagielski et al. (IEEE S&P 2018), by incorporating the concept of probability estimation of clean data-points into the algorithm. Our new defense algorithm, termed Proda, demonstrates increased effectiveness in reducing errors arising from the poisoning dataset through optimizing ensemble models. We highlight that the time complexity of TRIM had not been estimated; however, we deduce from their work that TRIM can take exponential time complexity in the worst-case scenario, in excess of Proda's logarithmic time. The performance of both our proposed attack and defense algorithms is extensively evaluated on four real-world datasets of housing prices, loans, health care, and bike sharing services. We hope that our work will inspire future research to develop more robust learning algorithms immune to poisoning attacks.
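For context on the techniques the abstract names (hedged, since the record itself gives no algorithmic detail): in the prior literature on regression poisoning (e.g., Jagielski et al., IEEE S&P 2018), the attacker's problem is typically cast as a bilevel optimization, choosing poisoning points $D_p$ to maximize the victim's loss on clean data while the victim trains on the union of clean and poisoned data. This is the standard formulation from that line of work, not necessarily Nopt's exact objective:

$$\max_{D_p} \; \mathcal{L}\big(D_{val}, \hat{\theta}(D_p)\big) \quad \text{s.t.} \quad \hat{\theta}(D_p) \in \arg\min_{\theta} \sum_{(x,y) \in D_{tr} \cup D_p} \big(\theta^{\top} x - y\big)^2 + \lambda\,\Omega(\theta)$$

On the defense side, TRIM-style defenses alternate between fitting the model on a candidate subset and re-selecting the points the current model explains best. The sketch below is a minimal illustration of that idea, assuming the number of clean points is known; the function name, parameters, and toy data are our own assumptions and do not reproduce the paper's Nopt or Proda algorithms.

```python
import numpy as np

def trimmed_regression(X, y, n_clean, n_iters=50, seed=0):
    """Iterative trimmed least squares: alternately fit on a subset and
    keep the n_clean points with the smallest squared residuals."""
    rng = np.random.default_rng(seed)
    subset = rng.choice(len(y), size=n_clean, replace=False)
    for _ in range(n_iters):
        # Ordinary least squares on the current candidate subset.
        w, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        # Re-select the n_clean points the fitted model explains best.
        residuals = (X @ w - y) ** 2
        new_subset = np.argsort(residuals)[:n_clean]
        if np.array_equal(np.sort(new_subset), np.sort(subset)):
            break  # subset is stable, so the estimate has converged
        subset = new_subset
    return w, subset

# Toy demonstration: 100 clean points plus 10 poisoned responses.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(110), rng.normal(size=110)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=110)
y[100:] += 50.0  # poison the last 10 responses
w, kept = trimmed_regression(X, y, n_clean=100)
print(w)  # close to [1.0, 2.0] despite the poisoned points
```

Per the abstract, Proda goes further by estimating the probability that individual data-points are clean and by optimizing ensembles of such models; those details are not recoverable from this record.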
Published in: IEEE Transactions on Information Forensics and Security, 2021, Vol. 16, pp. 3709-3723
Main Authors: Wen, Jialin; Zhao, Benjamin Zi Hao; Xue, Minhui; Oprea, Alina; Qian, Haifeng
Format: Article
Language: English
Subjects: Algorithms; Complexity; Data models; Data poisoning attacks and defenses; Datasets; Distance learning; Linear regression; linear regression models; Loans; Machine learning; Numerical models; Poisoning; Predictive models; Regression models; Security; Statistical analysis; Time complexity; Training
Online Access: https://doi.org/10.1109/TIFS.2021.3087332
| Field | Value |
| --- | --- |
| container_end_page | 3723 |
| container_issue | |
| container_start_page | 3709 |
| container_title | IEEE Transactions on Information Forensics and Security |
| container_volume | 16 |
| creator | Wen, Jialin; Zhao, Benjamin Zi Hao; Xue, Minhui; Oprea, Alina; Qian, Haifeng |
| description | With the rise of third parties in the machine learning pipeline, whether the service provider in "Machine Learning as a Service" (MLaaS), external data contributors in online learning, or the retraining of existing models, the need to ensure the security of the resulting machine learning models has become an increasingly important topic. The security community has demonstrated that without transparency of the data and the resulting model, many potential security risks exist, with new risks constantly being discovered. In this paper, we focus on one of these security risks: poisoning attacks. Specifically, we analyze how attackers may interfere with the results of regression learning by poisoning the training datasets. To this end, we analyze and develop a new poisoning attack algorithm. Our attack, termed Nopt, in contrast with previous poisoning attack algorithms, can produce larger errors with the same proportion of poisoning data-points. Furthermore, we significantly improve the state-of-the-art defense algorithm, termed TRIM, proposed by Jagielski et al. (IEEE S&P 2018), by incorporating the concept of probability estimation of clean data-points into the algorithm. Our new defense algorithm, termed Proda, demonstrates increased effectiveness in reducing errors arising from the poisoning dataset through optimizing ensemble models. We highlight that the time complexity of TRIM had not been estimated; however, we deduce from their work that TRIM can take exponential time complexity in the worst-case scenario, in excess of Proda's logarithmic time. The performance of both our proposed attack and defense algorithms is extensively evaluated on four real-world datasets of housing prices, loans, health care, and bike sharing services. We hope that our work will inspire future research to develop more robust learning algorithms immune to poisoning attacks. |
| doi_str_mv | 10.1109/TIFS.2021.3087332 |
| format | article |
| fulltext | fulltext |
| identifier | ISSN: 1556-6013 |
| ispartof | IEEE Transactions on Information Forensics and Security, 2021, Vol. 16, p. 3709-3723 |
| issn | 1556-6013; 1556-6021 |
| language | eng |
| recordid | cdi_crossref_primary_10_1109_TIFS_2021_3087332 |
| source | IEEE Electronic Library (IEL) Journals |
| subjects | Algorithms; Complexity; Data models; Data poisoning attacks and defenses; Datasets; Distance learning; Linear regression; linear regression models; Loans; Machine learning; Numerical models; Poisoning; Predictive models; Regression models; Security; Statistical analysis; Time complexity; Training |
| title | With Great Dispersion Comes Greater Resilience: Efficient Poisoning Attacks and Defenses for Linear Regression Models |