
DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection

Bibliographic Details
Published in: arXiv.org, 2024-06
Main Authors: Ko, Donggeun; Jo, Sangwoo; Lee, Dongjun; Park, Namjun; Kim, Jaekwang
Format: Article
Language: English
Online Access: Get full text
Description: Dataset bias is a significant challenge in machine learning, where specific attributes, such as the texture or color of images, are unintentionally learned, resulting in degraded performance. To address this, previous efforts have focused on debiasing models either by developing novel debiasing algorithms or by generating synthetic data to mitigate prevalent dataset biases. However, generative approaches to date have largely relied on bias-specific samples from the dataset, which are typically too scarce. In this work, we propose DiffInject, a straightforward yet powerful method for augmenting synthetic bias-conflict samples using a pretrained diffusion model. This approach significantly advances the use of diffusion models for debiasing by manipulating the latent space. Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing. Our methodology demonstrates substantial results in effectively reducing dataset bias.
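The abstract describes injecting the "style" of a bias-conflict sample into another sample's diffusion latent to synthesize debiasing data. The toy sketch below is not the authors' implementation; it only illustrates the core idea of blending a style latent into a content latent, with a hypothetical `inject_style` helper and a made-up mixing weight `gamma`:

```python
import numpy as np

def inject_style(content_latent, style_latent, gamma=0.7):
    """Blend a style latent into a content latent.

    gamma keeps that fraction of the original content;
    (1 - gamma) is the strength of the injected style.
    In an actual diffusion pipeline this blend would happen at a chosen
    bottleneck layer during the reverse (denoising) steps.
    """
    return gamma * content_latent + (1.0 - gamma) * style_latent

rng = np.random.default_rng(0)
content = rng.standard_normal((4, 8, 8))  # toy bottleneck latent (C, H, W)
style = rng.standard_normal((4, 8, 8))    # latent from a bias-conflict sample

mixed = inject_style(content, style, gamma=0.7)
print(mixed.shape)  # (4, 8, 8)
```

Decoding such a mixed latent with the pretrained diffusion model would then yield a synthetic sample that keeps the content but carries the injected (bias-conflicting) style, without requiring bias labels.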
EISSN: 2331-8422
Subjects: Algorithms; Bias; Datasets; Machine learning; Synthetic data