Iterative Adversarial Attack on Image-Guided Story Ending Generation
Multimodal learning involves developing models that can integrate information from various sources like images and texts. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. The image-guided story ending generation...
Published in: | IEEE transactions on multimedia 2024, Vol.26, p.6117-6130 |
---|---|
Main Authors: | Wang, Youze ; Hu, Wenbo ; Hong, Richang |
Format: | Article |
Language: | English |
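Subjects: | adversarial attack ; Artificial neural networks ; Computational modeling ; Data models ; Data processing ; Fuses ; Iterative methods ; Machine learning ; Machine translation ; Multimodal ; multimodal text generation ; Perturbation methods ; Task analysis ; Visualization |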
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c245t-be3fcf1c12a30bd4b9d47526e569d5f57deeb5a6deb0ffb38786e9678df4df083 |
container_end_page | 6130 |
container_issue | |
container_start_page | 6117 |
container_title | IEEE transactions on multimedia |
container_volume | 26 |
creator | Wang, Youze ; Hu, Wenbo ; Hong, Richang |
description | Multimodal learning involves developing models that can integrate information from various sources such as images and text. In this field, multimodal text generation is a crucial aspect that involves processing data from multiple modalities and outputting text. Image-guided story ending generation (IgSEG) is a particularly significant task, which aims to understand the complex relationships between text and image data and to generate a complete story ending. Unfortunately, deep neural networks, which are the backbone of recent IgSEG models, are vulnerable to adversarial samples. Current adversarial attack methods mainly focus on single-modality data and do not analyze adversarial attacks for multimodal text generation tasks that use cross-modal information. To this end, we propose an iterative adversarial attack method (Iterative-attack) that fuses image and text modality attacks, enabling a more effective iterative search for adversarial text and images. Experimental results demonstrate that the proposed method outperforms existing single-modal and non-iterative multimodal attack methods, indicating the potential for improving the adversarial robustness of multimodal text generation models, such as multimodal machine translation and multimodal question answering. |
doi_str_mv | 10.1109/TMM.2023.3345167 |
format | article |
publisher | Piscataway: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
coden | ITMUF8 |
ieee_id | 10366855 |
orcidid | 0000-0002-0639-2012 ; 0009-0003-5621-6310 ; 0000-0001-5461-3986 |
tpages | 14 |
toplevel | peer_reviewed ; online_resources |
linktohtml | https://ieeexplore.ieee.org/document/10366855 |
collection | IEEE All-Society Periodicals Package (ASPP) 2005-present ; IEEE All-Society Periodicals Package (ASPP) 1998-Present ; IEEE Electronic Library (IEL) ; CrossRef ; Computer and Information Systems Abstracts ; Electronics & Communications Abstracts ; Technology Research Database ; ProQuest Computer Science Collection ; Advanced Technologies Database with Aerospace ; Computer and Information Systems Abstracts – Academic ; Computer and Information Systems Abstracts Professional |
fulltext | fulltext |
identifier | ISSN: 1520-9210 |
ispartof | IEEE transactions on multimedia, 2024, Vol.26, p.6117-6130 |
issn | 1520-9210 ; 1941-0077 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TMM_2023_3345167 |
source | IEEE Electronic Library (IEL) Journals |
subjects | adversarial attack ; Artificial neural networks ; Computational modeling ; Data models ; Data processing ; Fuses ; Iterative methods ; Machine learning ; Machine translation ; Multimodal ; multimodal text generation ; Perturbation methods ; Task analysis ; Visualization |
title | Iterative Adversarial Attack on Image-Guided Story Ending Generation |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T23%3A58%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Iterative%20Adversarial%20Attack%20on%20Image-Guided%20Story%20Ending%20Generation&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Wang,%20Youze&rft.date=2024&rft.volume=26&rft.spage=6117&rft.epage=6130&rft.pages=6117-6130&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2023.3345167&rft_dat=%3Cproquest_cross%3E3035274133%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c245t-be3fcf1c12a30bd4b9d47526e569d5f57deeb5a6deb0ffb38786e9678df4df083%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3035274133&rft_id=info:pmid/&rft_ieee_id=10366855&rfr_iscdi=true |