Iterative Adversarial Attack on Image-Guided Story Ending Generation

Multimodal learning involves developing models that can integrate information from various sources like images and texts. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. The image-guided story ending generation...

Bibliographic Details
Published in:	IEEE transactions on multimedia 2024, Vol.26, p.6117-6130
Main Authors:	Wang, Youze; Hu, Wenbo; Hong, Richang
Format:	Article
Language:	English
Subjects:	adversarial attack; Artificial neural networks; Computational modeling; Data models; Data processing; Fuses; Iterative methods; Machine learning; Machine translation; Multimodal; multimodal text generation; Perturbation methods; Task analysis; Visualization

cited_by
cites	cdi_FETCH-LOGICAL-c245t-be3fcf1c12a30bd4b9d47526e569d5f57deeb5a6deb0ffb38786e9678df4df083
container_end_page	6130
container_issue
container_start_page	6117
container_title	IEEE transactions on multimedia
container_volume	26
creator	Wang, Youze ; Hu, Wenbo ; Hong, Richang
description	Multimodal learning involves developing models that can integrate information from various sources, such as images and text. Within this field, multimodal text generation processes data from multiple modalities and outputs text. Image-guided story ending generation (IgSEG) is a particularly significant task: it requires understanding the complex relationships between text and image data in order to generate a complete story ending. Unfortunately, deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples. Current adversarial attack methods mainly focus on single-modality data and do not analyze adversarial attacks for multimodal text generation tasks that use cross-modal information. To this end, we propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks, allowing a more effective iterative search for adversarial text and images. Experimental results demonstrate that the proposed method outperforms existing single-modal and non-iterative multimodal attack methods, indicating the potential for improving the adversarial robustness of multimodal text generation models, such as those for multimodal machine translation and multimodal question answering.
doi_str_mv	10.1109/TMM.2023.3345167
format	article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TMM_2023_3345167</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10366855</ieee_id><sourcerecordid>3035274133</sourcerecordid><originalsourceid>FETCH-LOGICAL-c245t-be3fcf1c12a30bd4b9d47526e569d5f57deeb5a6deb0ffb38786e9678df4df083</originalsourceid><addsrcrecordid>eNpNkE1PwkAQQDdGExG9e_DQxHNx9rs9EkQkgXgQz5ttd5YUocXdloR_bxEOnmYO780kj5BHCiNKIX9ZLZcjBoyPOBeSKn1FBjQXNAXQ-rrfJYM0ZxRuyV2MGwAqJOgBeZ23GGxbHTAZuwOGaENlt8m4bW35nTR1Mt_ZNaazrnLoks-2CcdkWruqXiczrP_Upr4nN95uIz5c5pB8vU1Xk_d08TGbT8aLtGRCtmmB3JeelpRZDoUTRe6ElkyhVLmTXmqHWEirHBbgfcEznSnMlc6cF85Dxofk-Xx3H5qfDmNrNk0X6v6l4cAl04Jy3lNwpsrQxBjQm32odjYcDQVzamX6VubUylxa9crTWakQ8R_Olcqk5L9E_WVd</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3035274133</pqid></control><display><type>article</type><title>Iterative Adversarial Attack on Image-Guided Story Ending Generation</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Wang, Youze ; Hu, Wenbo ; Hong, Richang</creator><creatorcontrib>Wang, Youze ; Hu, Wenbo ; Hong, Richang</creatorcontrib><description>Multimodal learning involves developing models that can integrate information from various sources like images and texts. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. The image-guided story ending generation (IgSEG) is a particularly significant task, targeting on an understanding of complex relationships between text and image data with a complete story text ending. Unfortunately, deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples. 
Current adversarial attack methods mainly focus on single-modality data and do not analyze adversarial attacks for multimodal text generation tasks that use cross-modal information. To this end, we propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks, allowing for an attack search for adversarial text and image in a more effective iterative way. Experimental results demonstrate that the proposed method outperforms existing single-modal and non-iterative multimodal attack methods, indicating the potential for improving the adversarial robustness of multimodal text generation models, such as multimodal machine translation, multimodal question answering, etc.</description><identifier>ISSN: 1520-9210</identifier><identifier>EISSN: 1941-0077</identifier><identifier>DOI: 10.1109/TMM.2023.3345167</identifier><identifier>CODEN: ITMUF8</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>adversarial attack ; Artificial neural networks ; Computational modeling ; Data models ; Data processing ; Fuses ; Iterative methods ; Machine learning ; Machine translation ; Multimodal ; multimodal text generation ; Perturbation methods ; Task analysis ; Visualization</subject><ispartof>IEEE transactions on multimedia, 2024, Vol.26, p.6117-6130</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c245t-be3fcf1c12a30bd4b9d47526e569d5f57deeb5a6deb0ffb38786e9678df4df083</cites><orcidid>0000-0002-0639-2012 ; 0009-0003-5621-6310 ; 0000-0001-5461-3986</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10366855$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,4010,27900,27901,27902,54771</link.rule.ids></links><search><creatorcontrib>Wang, Youze</creatorcontrib><creatorcontrib>Hu, Wenbo</creatorcontrib><creatorcontrib>Hong, Richang</creatorcontrib><title>Iterative Adversarial Attack on Image-Guided Story Ending Generation</title><title>IEEE transactions on multimedia</title><addtitle>TMM</addtitle><description>Multimodal learning involves developing models that can integrate information from various sources like images and texts. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. The image-guided story ending generation (IgSEG) is a particularly significant task, targeting on an understanding of complex relationships between text and image data with a complete story text ending. Unfortunately, deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples. Current adversarial attack methods mainly focus on single-modality data and do not analyze adversarial attacks for multimodal text generation tasks that use cross-modal information. To this end, we propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks, allowing for an attack search for adversarial text and image in a more effective iterative way. 
Experimental results demonstrate that the proposed method outperforms existing single-modal and non-iterative multimodal attack methods, indicating the potential for improving the adversarial robustness of multimodal text generation models, such as multimodal machine translation, multimodal question answering, etc.</description><subject>adversarial attack</subject><subject>Artificial neural networks</subject><subject>Computational modeling</subject><subject>Data models</subject><subject>Data processing</subject><subject>Fuses</subject><subject>Iterative methods</subject><subject>Machine learning</subject><subject>Machine translation</subject><subject>Multimodal</subject><subject>multimodal text generation</subject><subject>Perturbation methods</subject><subject>Task analysis</subject><subject>Visualization</subject><issn>1520-9210</issn><issn>1941-0077</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNpNkE1PwkAQQDdGExG9e_DQxHNx9rs9EkQkgXgQz5ttd5YUocXdloR_bxEOnmYO780kj5BHCiNKIX9ZLZcjBoyPOBeSKn1FBjQXNAXQ-rrfJYM0ZxRuyV2MGwAqJOgBeZ23GGxbHTAZuwOGaENlt8m4bW35nTR1Mt_ZNaazrnLoks-2CcdkWruqXiczrP_Upr4nN95uIz5c5pB8vU1Xk_d08TGbT8aLtGRCtmmB3JeelpRZDoUTRe6ElkyhVLmTXmqHWEirHBbgfcEznSnMlc6cF85Dxofk-Xx3H5qfDmNrNk0X6v6l4cAl04Jy3lNwpsrQxBjQm32odjYcDQVzamX6VubUylxa9crTWakQ8R_Olcqk5L9E_WVd</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Wang, Youze</creator><creator>Hu, Wenbo</creator><creator>Hong, Richang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-0639-2012</orcidid><orcidid>https://orcid.org/0009-0003-5621-6310</orcidid><orcidid>https://orcid.org/0000-0001-5461-3986</orcidid></search><sort><creationdate>2024</creationdate><title>Iterative Adversarial Attack on Image-Guided Story Ending Generation</title><author>Wang, Youze ; Hu, Wenbo ; Hong, Richang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c245t-be3fcf1c12a30bd4b9d47526e569d5f57deeb5a6deb0ffb38786e9678df4df083</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>adversarial attack</topic><topic>Artificial neural networks</topic><topic>Computational modeling</topic><topic>Data models</topic><topic>Data processing</topic><topic>Fuses</topic><topic>Iterative methods</topic><topic>Machine learning</topic><topic>Machine translation</topic><topic>Multimodal</topic><topic>multimodal text generation</topic><topic>Perturbation methods</topic><topic>Task analysis</topic><topic>Visualization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Youze</creatorcontrib><creatorcontrib>Hu, Wenbo</creatorcontrib><creatorcontrib>Hong, Richang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science 
Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on multimedia</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wang, Youze</au><au>Hu, Wenbo</au><au>Hong, Richang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Iterative Adversarial Attack on Image-Guided Story Ending Generation</atitle><jtitle>IEEE transactions on multimedia</jtitle><stitle>TMM</stitle><date>2024</date><risdate>2024</risdate><volume>26</volume><spage>6117</spage><epage>6130</epage><pages>6117-6130</pages><issn>1520-9210</issn><eissn>1941-0077</eissn><coden>ITMUF8</coden><abstract>Multimodal learning involves developing models that can integrate information from various sources like images and texts. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. The image-guided story ending generation (IgSEG) is a particularly significant task, targeting on an understanding of complex relationships between text and image data with a complete story text ending. Unfortunately, deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples. Current adversarial attack methods mainly focus on single-modality data and do not analyze adversarial attacks for multimodal text generation tasks that use cross-modal information. To this end, we propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks, allowing for an attack search for adversarial text and image in a more effective iterative way. 
Experimental results demonstrate that the proposed method outperforms existing single-modal and non-iterative multimodal attack methods, indicating the potential for improving the adversarial robustness of multimodal text generation models, such as multimodal machine translation, multimodal question answering, etc.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TMM.2023.3345167</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-0639-2012</orcidid><orcidid>https://orcid.org/0009-0003-5621-6310</orcidid><orcidid>https://orcid.org/0000-0001-5461-3986</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1520-9210
ispartof	IEEE transactions on multimedia, 2024, Vol.26, p.6117-6130
issn	1520-9210 1941-0077
language	eng
recordid	cdi_crossref_primary_10_1109_TMM_2023_3345167
source	IEEE Electronic Library (IEL) Journals
subjects	adversarial attack ; Artificial neural networks ; Computational modeling ; Data models ; Data processing ; Fuses ; Iterative methods ; Machine learning ; Machine translation ; Multimodal ; multimodal text generation ; Perturbation methods ; Task analysis ; Visualization
title	Iterative Adversarial Attack on Image-Guided Story Ending Generation
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T23%3A58%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Iterative%20Adversarial%20Attack%20on%20Image-Guided%20Story%20Ending%20Generation&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Wang,%20Youze&rft.date=2024&rft.volume=26&rft.spage=6117&rft.epage=6130&rft.pages=6117-6130&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2023.3345167&rft_dat=%3Cproquest_cross%3E3035274133%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c245t-be3fcf1c12a30bd4b9d47526e569d5f57deeb5a6deb0ffb38786e9678df4df083%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3035274133&rft_id=info:pmid/&rft_ieee_id=10366855&rfr_iscdi=true