
Private prediction for large-scale synthetic text generation


Bibliographic Details
Published in: arXiv.org 2024-10
Main Authors: Amin, Kareem, Bie, Alex, Kong, Weiwei, Kurakin, Alexey, Ponomareva, Natalia, Syed, Umar, Terzis, Andreas, Vassilvitskii, Sergei
Format: Article
Language: English
Description
Summary: We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the model itself is safe to release. We prompt a pretrained LLM with source data, but ensure that next-token predictions are made with differential privacy guarantees. Previous work in this paradigm reported generating a small number of examples (…)
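
The summary describes the mechanism only at a high level: prompt a pretrained LLM with sensitive source data, but release each next-token prediction through a differentially private step. As a rough illustration of that idea, and not the paper's actual algorithm, the sketch below averages per-example next-token distributions and perturbs the average with Gaussian noise before sampling. Every name here is a hypothetical placeholder: toy_next_token_probs stands in for a real LLM call, and the noise scale sigma is left uncalibrated rather than tied to a specific (epsilon, delta) guarantee.

```python
# Hypothetical sketch of private prediction for synthetic text.
# NOT the paper's algorithm: the model call is a toy stand-in, and
# sigma is not calibrated to any (epsilon, delta) guarantee.
import numpy as np

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]  # toy vocabulary


def toy_next_token_probs(prompt: str, example: str) -> np.ndarray:
    """Stand-in for an LLM prompted with one sensitive example.

    Returns a next-token distribution over VOCAB; a real system would
    query the LLM with `example` in the prompt context.
    """
    rng = np.random.default_rng(abs(hash((prompt, example))) % 2**32)
    logits = rng.normal(size=len(VOCAB))
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


def private_next_token(prompt: str, examples: list[str], sigma: float,
                       rng: np.random.Generator) -> int:
    """Gaussian-mechanism aggregation: average one probability vector per
    sensitive example, add noise to the average, then sample a token."""
    avg = np.mean([toy_next_token_probs(prompt, ex) for ex in examples],
                  axis=0)
    noisy = avg + rng.normal(scale=sigma / len(examples), size=len(VOCAB))
    noisy = np.clip(noisy, 0.0, None)  # project back to valid probabilities
    total = noisy.sum()
    probs = noisy / total if total > 0 else np.full(len(VOCAB), 1 / len(VOCAB))
    return int(rng.choice(len(VOCAB), p=probs))


def generate(examples: list[str], n_tokens: int = 8, sigma: float = 0.5,
             seed: int = 0) -> str:
    """Autoregressively generate synthetic text; each released token is one
    noisy-aggregation step, so privacy cost accumulates per token."""
    rng = np.random.default_rng(seed)
    tokens: list[str] = []
    for _ in range(n_tokens):
        idx = private_next_token(" ".join(tokens), examples, sigma, rng)
        tokens.append(VOCAB[idx])
    return " ".join(tokens)


if __name__ == "__main__":
    print(generate(["sensitive record A", "sensitive record B",
                    "sensitive record C"]))
```

A consequence of this style of design, common to private-prediction approaches generally, is that every released token consumes privacy budget, so the total cost grows with the length and number of synthetic examples unless composition is carefully accounted for.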
ISSN: 2331-8422