
Multistage Spatio-Temporal Networks for Robust Sketch Recognition

Bibliographic Details
Published in: IEEE Transactions on Image Processing, 2022, Vol. 31, pp. 2683-2694
Main Authors: Li, Hanhui; Jiang, Xudong; Guan, Boliang; Wang, Ruomei; Thalmann, Nadia Magnenat
Format: Article
Language: English
container_end_page 2694
container_start_page 2683
container_title IEEE transactions on image processing
container_volume 31
creator Li, Hanhui
Jiang, Xudong
Guan, Boliang
Wang, Ruomei
Thalmann, Nadia Magnenat
description Sketch recognition relies on two types of information: spatial context, such as the local structures in an image, and temporal context, such as the order of strokes. Existing methods usually adopt convolutional neural networks (CNNs) to model spatial context and recurrent neural networks (RNNs) to model temporal context. However, most of them combine spatial and temporal features through late fusion or a single-stage transformation, which tends to lose informative details in sketches. To tackle this problem, we propose a novel framework built around multi-stage interaction and refinement of spatial and temporal features. Specifically, given a sketch represented by a stroke array, we first generate a temporal-enriched image (TEI), a pseudo-color image that retains the temporal order of strokes, to overcome the difficulty CNNs have in exploiting temporal information. We then construct a dual-branch network in which a CNN branch processes the TEI and an RNN branch processes the stroke array. In the early stages of the network, because RNNs have limited ability to capture spatial structures, we use multiple enhancement modules to enhance the stroke features with the TEI features. In the last stage, we propose a spatio-temporal enhancement module that refines the stroke features and the TEI features in a joint feature space. Furthermore, we propose a bidirectional temporal-compatible unit that adaptively merges features computed in opposite temporal orders, to help RNNs handle abrupt strokes. Comprehensive experimental results on QuickDraw and TU-Berlin demonstrate that the proposed method is a robust and efficient solution for sketch recognition.
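The abstract describes two mechanisms concretely enough to sketch: turning a stroke array into a pseudo-color image whose colors encode stroke order (the TEI idea), and adaptively merging features computed in opposite temporal orders. The Python below is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The stroke-3 input format `(dx, dy, pen_lifted)`, the red-to-blue color ramp in `render_tei`, and the sigmoid-gated blend in `gated_merge` are all hypothetical choices; the paper's actual TEI encoding and bidirectional temporal-compatible unit may differ.

```python
import math


def decode_stroke3(points):
    """Turn stroke-3 offsets (dx, dy, pen_lifted) into absolute strokes.

    Assumption: pen_lifted == 1 marks the last point of a stroke.
    """
    strokes, current = [], []
    x = y = 0
    for dx, dy, pen_lifted in points:
        x += dx
        y += dy
        current.append((x, y))
        if pen_lifted:
            strokes.append(current)
            current = []
    if current:  # trailing stroke without an explicit pen lift
        strokes.append(current)
    return strokes


def line_pixels(x0, y0, x1, y1):
    """Yield the integer pixels on the segment (x0, y0)-(x1, y1) (Bresenham)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield x0, y0
        if x0 == x1 and y0 == y1:
            return
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_tei(strokes, size=64):
    """Rasterize strokes into a size x size grid of RGB tuples whose color
    encodes stroke order: first stroke red, last stroke blue (hypothetical)."""
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y, 1)  # avoid divide-by-zero
    scale = (size - 1) / extent  # keep the aspect ratio

    def to_px(v, lo):
        return min(size - 1, round((v - lo) * scale))

    image = [[(0, 0, 0)] * size for _ in range(size)]  # black background
    n = len(strokes)
    for i, stroke in enumerate(strokes):
        t = i / (n - 1) if n > 1 else 0.0  # normalized temporal position
        color = (round(255 * (1 - t)), 0, round(255 * t))
        pts = [(to_px(x, min_x), to_px(y, min_y)) for x, y in stroke]
        # A single-point stroke is paired with itself so it is still drawn.
        for (x0, y0), (x1, y1) in zip(pts, pts[1:] or pts):
            for px, py in line_pixels(x0, y0, x1, y1):
                image[py][px] = color  # later strokes overwrite earlier ones
    return image


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_merge(forward, backward, w_f, w_b, bias):
    """Blend per-step forward and backward features with a scalar gate
    g = sigmoid(w_f . f + w_b . b + bias); out = g * f + (1 - g) * b.
    `backward` must already be re-aligned to forward time order."""
    merged = []
    for f, b in zip(forward, backward):
        z = bias + sum(w * v for w, v in zip(w_f, f)) + sum(w * v for w, v in zip(w_b, b))
        g = sigmoid(z)
        merged.append([g * fv + (1.0 - g) * bv for fv, bv in zip(f, b)])
    return merged
```

For example, `render_tei(decode_stroke3(arr), size=64)` yields a 64x64 grid of RGB tuples that a CNN branch could consume after conversion to a tensor. In a trained model the gate weights `w_f`, `w_b`, and `bias` would be learned; the per-step gate lets the merge lean toward whichever direction's features are more reliable around an abrupt stroke.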
doi_str_mv 10.1109/TIP.2022.3160240
format article
coden IIPRE4
ieee_id 9740528
orcidid https://orcid.org/0000-0002-9838-6532
https://orcid.org/0000-0002-3897-4041
https://orcid.org/0000-0001-5221-8018
https://orcid.org/0000-0002-9104-2315
pmid 35320102
publisher United States: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
toplevel peer_reviewed
tpages 12
identifier ISSN: 1057-7149
ispartof IEEE transactions on image processing, 2022, Vol.31, p.2683-2694
issn 1057-7149
1941-0042
language eng
recordid cdi_proquest_journals_2645985432
source IEEE Xplore (Online service)
subjects Arrays
Artificial neural networks
Color imagery
Convolutional neural networks
Feature extraction
feature fusion
Image recognition
Image segmentation
Modules
multi-modal networks
Network architecture
Neural networks
Recognition
Recurrent neural networks
Robustness (mathematics)
Sketch recognition
Sketches
spatio-temporal feature
Stroke (medical condition)
title Multistage Spatio-Temporal Networks for Robust Sketch Recognition
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T15%3A07%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multistage%20Spatio-Temporal%20Networks%20for%20Robust%20Sketch%20Recognition&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Li,%20Hanhui&rft.date=2022&rft.volume=31&rft.spage=2683&rft.epage=2694&rft.pages=2683-2694&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2022.3160240&rft_dat=%3Cproquest_pubme%3E2645985432%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c347t-351dc29fe302927ad502f70addf6de58dbd68e24789aa669987ca3064c4b33113%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2645985432&rft_id=info:pmid/35320102&rft_ieee_id=9740528&rfr_iscdi=true