Manipulating Pre-trained Encoder for Targeted Poisoning Attacks in Contrastive Learning

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, 2024-01, Vol. 19, p. 1-1
Main Authors: Chen, Jian; Gao, Yuan; Liu, Gaoyang; Abdelmoniem, Ahmed M.; Wang, Chen
Format: Article
Language: English
Description
Summary: In recent years, contrastive learning has become a powerful approach to representation learning on large-scale unlabeled data, producing pretrained encoders on which downstream classifiers are fine-tuned. However, recent research indicates that contrastive learning is vulnerable to data poisoning attacks, in which an attacker injects maliciously crafted poisoned samples into the unlabeled pretraining data. Going a step further, this paper presents a stealthier poisoning attack, dubbed PA-CL, that directly poisons the pretrained encoder so that the downstream classifier's behavior on a single target instance can be steered to an attacker-desired class without degrading overall downstream classification performance. We observe that the poisoned pretrained encoder produces a feature representation for the target sample that is highly similar to those of samples from the attacker-desired class, which leads the downstream classifier to misclassify the target sample as the attacker-desired class. We therefore formulate the attack as an optimization problem and design two novel loss functions: a target effectiveness loss that effectively poisons the pretrained encoder, and a model utility loss that maintains downstream classification performance. Experimental results on four real-world datasets demonstrate that the attack success rate of the proposed attack is, on average, 40% higher than that of three baseline attacks, while the fluctuation in the downstream classifier's prediction accuracy stays within 5%.
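The two losses named in the summary can be illustrated with a minimal sketch. This is a hypothetical reading of the formulation, not the authors' implementation: the exact loss forms, the use of cosine similarity, and the weighting factor `lam` are all assumptions made for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def target_effectiveness_loss(target_emb, desired_class_embs):
    # Hypothetical form: pull the target sample's representation toward
    # the attacker-desired class by minimizing the negative mean cosine
    # similarity to that class's embeddings.
    sims = [cosine_sim(target_emb, e) for e in desired_class_embs]
    return -float(np.mean(sims))

def model_utility_loss(clean_embs, poisoned_embs):
    # Hypothetical form: keep the poisoned encoder's representations of
    # ordinary samples close to the clean encoder's (cosine distance),
    # so overall downstream accuracy is preserved.
    dists = [1.0 - cosine_sim(c, p) for c, p in zip(clean_embs, poisoned_embs)]
    return float(np.mean(dists))

def total_attack_loss(target_emb, desired_class_embs,
                      clean_embs, poisoned_embs, lam=1.0):
    # Combined objective: effectiveness plus lam-weighted utility.
    return (target_effectiveness_loss(target_emb, desired_class_embs)
            + lam * model_utility_loss(clean_embs, poisoned_embs))
```

Under this reading, optimizing the encoder to minimize the combined objective drives the target's representation into the attacker-desired cluster while leaving representations of other samples essentially unchanged.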
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2024.3350389