MT-ASM: a multi-task attention strengthening model for fine-grained object recognition

Fine-Grained Object Recognition (FGOR) equips intelligent systems with recognition capabilities at or even beyond the level of human experts, making it a core technology for numerous applications such as biodiversity monitoring systems and advanced driver assistance systems. FGOR is highly challengi...

Bibliographic Details
Published in:	Multimedia systems 2024-10, Vol.30 (5), Article 297
Main Authors:	Liu, Dichao; Wang, Yu; Mase, Kenji; Kato, Jien
Format:	Article
Language:	English
Subjects:	Advanced driver assistance systems; Algorithms; Attention; Computer Communication Networks; Computer Graphics; Computer Science; Cryptology; Data Storage Representation; Human performance; Machine learning; Multimedia Information Systems; Object recognition; Operating Systems; Performance evaluation; Physical work; Regular Paper; Strengthening

cited_by
cites	cdi_FETCH-LOGICAL-c200t-a260a0ce70d25aa1702530a978cad336fe89a8d7bf69c68eb948a250c6a5cb433
container_end_page
container_issue	5
container_start_page
container_title	Multimedia systems
container_volume	30
creator	Liu, Dichao; Wang, Yu; Mase, Kenji; Kato, Jien
description	Fine-Grained Object Recognition (FGOR) equips intelligent systems with recognition capabilities at or even beyond the level of human experts, making it a core technology for numerous applications such as biodiversity monitoring systems and advanced driver assistance systems. FGOR is highly challenging, and recent research has primarily focused on identifying discriminative regions to tackle this task. However, these methods often require extensive manual labor or expensive algorithms, which may lead to irreversible information loss and pose significant barriers to their practical application. Instead of learning region capturing, this work enhances networks’ response to discriminative regions. We propose a multitask attention-strengthening model (MT-ASM), inspired by the human ability to effectively utilize experiences from related tasks when solving a specific task. When faced with an FGOR task, humans naturally compare images from the same and different categories to identify discriminative and non-discriminative regions. MT-ASM employs two networks during the training phase: the major network, tasked with the main goal of category classification, and a subordinate task that involves comparing images from the same and different categories to find discriminative and non-discriminative regions. The subordinate network evaluates the major network’s performance on the subordinate task, compelling the major network to improve its subordinate task performance. Once training is complete, the subordinate network is removed, ensuring no additional overhead during inference. Experimental results on CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets demonstrate that MT-ASM significantly outperforms baseline methods. Given its simplicity and low overhead, it remains highly competitive with state-of-the-art methods. The code is available at https://github.com/Dichao-Liu/Find-Attention-with-Comparison .
doi_str_mv	10.1007/s00530-024-01446-1
format	article
date	2024-10-01
publisher	Berlin/Heidelberg: Springer Berlin Heidelberg
rights	The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
fulltext	fulltext
identifier	ISSN: 0942-4962
ispartof	Multimedia systems, 2024-10, Vol.30 (5), Article 297
issn	0942-4962 1432-1882
language	eng
recordid	cdi_proquest_journals_3110895466
source	Springer Link
subjects	Advanced driver assistance systems; Algorithms; Attention; Computer Communication Networks; Computer Graphics; Computer Science; Cryptology; Data Storage Representation; Human performance; Machine learning; Multimedia Information Systems; Object recognition; Operating Systems; Performance evaluation; Physical work; Regular Paper; Strengthening
title	MT-ASM: a multi-task attention strengthening model for fine-grained object recognition
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T19%3A56%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MT-ASM:%20a%20multi-task%20attention%20strengthening%20model%20for%20fine-grained%20object%20recognition&rft.jtitle=Multimedia%20systems&rft.au=Liu,%20Dichao&rft.date=2024-10-01&rft.volume=30&rft.issue=5&rft.artnum=297&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-024-01446-1&rft_dat=%3Cproquest_cross%3E3110895466%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c200t-a260a0ce70d25aa1702530a978cad336fe89a8d7bf69c68eb948a250c6a5cb433%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3110895466&rft_id=info:pmid/&rfr_iscdi=true