Iterative Adversarial Attack on Image-Guided Story Ending Generation

Multimodal learning involves developing models that can integrate information from various sources like images and texts. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. The image-guided story ending generation...

Bibliographic Details
Published in:IEEE transactions on multimedia 2024, Vol.26, p.6117-6130
Main Authors: Wang, Youze, Hu, Wenbo, Hong, Richang
Format: Article
Language:English
cited_by
cites cdi_FETCH-LOGICAL-c245t-be3fcf1c12a30bd4b9d47526e569d5f57deeb5a6deb0ffb38786e9678df4df083
container_end_page 6130
container_issue
container_start_page 6117
container_title IEEE transactions on multimedia
container_volume 26
creator Wang, Youze
Hu, Wenbo
Hong, Richang
description Multimodal learning involves developing models that can integrate information from multiple sources, such as images and text. Within this field, multimodal text generation processes data from multiple modalities and outputs text. Image-guided story ending generation (IgSEG) is a particularly significant task: it requires understanding the complex relationships between text and image data in order to produce a complete story ending. Unfortunately, deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples. Current adversarial attack methods mainly focus on single-modality data and do not analyze adversarial attacks on multimodal text generation tasks that use cross-modal information. To this end, we propose an iterative adversarial attack method (Iterative-attack) that fuses image-modality and text-modality attacks, searching for adversarial text and images in a more effective, iterative way. Experimental results demonstrate that the proposed method outperforms existing single-modal and non-iterative multimodal attack methods, indicating its potential for improving the adversarial robustness of multimodal text generation models in tasks such as multimodal machine translation and multimodal question answering.
doi_str_mv 10.1109/TMM.2023.3345167
format article
coden ITMUF8
collection IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
eissn 1941-0077
genre article
ieee_id 10366855
linktohtml https://ieeexplore.ieee.org/document/10366855
orcid https://orcid.org/0000-0002-0639-2012
https://orcid.org/0009-0003-5621-6310
https://orcid.org/0000-0001-5461-3986
pqid 3035274133
publisher Piscataway: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
toplevel peer_reviewed
online_resources
tpages 14
fulltext fulltext
identifier ISSN: 1520-9210
ispartof IEEE transactions on multimedia, 2024, Vol.26, p.6117-6130
issn 1520-9210
1941-0077
language eng
recordid cdi_crossref_primary_10_1109_TMM_2023_3345167
source IEEE Electronic Library (IEL) Journals
subjects adversarial attack
Artificial neural networks
Computational modeling
Data models
Data processing
Fuses
Iterative methods
Machine learning
Machine translation
Multimodal
multimodal text generation
Perturbation methods
Task analysis
Visualization
title Iterative Adversarial Attack on Image-Guided Story Ending Generation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T23%3A58%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Iterative%20Adversarial%20Attack%20on%20Image-Guided%20Story%20Ending%20Generation&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Wang,%20Youze&rft.date=2024&rft.volume=26&rft.spage=6117&rft.epage=6130&rft.pages=6117-6130&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2023.3345167&rft_dat=%3Cproquest_cross%3E3035274133%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c245t-be3fcf1c12a30bd4b9d47526e569d5f57deeb5a6deb0ffb38786e9678df4df083%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3035274133&rft_id=info:pmid/&rft_ieee_id=10366855&rfr_iscdi=true