MT-ASM: a multi-task attention strengthening model for fine-grained object recognition

Fine-Grained Object Recognition (FGOR) equips intelligent systems with recognition capabilities at or even beyond the level of human experts, making it a core technology for numerous applications such as biodiversity monitoring systems and advanced driver assistance systems. FGOR is highly challengi...

Bibliographic Details
Published in: Multimedia systems 2024-10, Vol.30 (5), Article 297
Main Authors: Liu, Dichao; Wang, Yu; Mase, Kenji; Kato, Jien
Format: Article
Language: English
Subjects:
Citations: Items that this one cites
cited_by
cites cdi_FETCH-LOGICAL-c200t-a260a0ce70d25aa1702530a978cad336fe89a8d7bf69c68eb948a250c6a5cb433
container_end_page
container_issue 5
container_start_page
container_title Multimedia systems
container_volume 30
creator Liu, Dichao
Wang, Yu
Mase, Kenji
Kato, Jien
description Fine-Grained Object Recognition (FGOR) equips intelligent systems with recognition capabilities at or even beyond the level of human experts, making it a core technology for numerous applications such as biodiversity monitoring systems and advanced driver assistance systems. FGOR is highly challenging, and recent research has primarily focused on identifying discriminative regions to tackle this task. However, these methods often require extensive manual labor or expensive algorithms, which may lead to irreversible information loss and pose significant barriers to their practical application. Instead of learning region capturing, this work enhances networks’ response to discriminative regions. We propose a multitask attention-strengthening model (MT-ASM), inspired by the human ability to effectively utilize experiences from related tasks when solving a specific task. When faced with an FGOR task, humans naturally compare images from the same and different categories to identify discriminative and non-discriminative regions. MT-ASM employs two networks during the training phase: the major network, tasked with the main goal of category classification, and a subordinate task that involves comparing images from the same and different categories to find discriminative and non-discriminative regions. The subordinate network evaluates the major network’s performance on the subordinate task, compelling the major network to improve its subordinate task performance. Once training is complete, the subordinate network is removed, ensuring no additional overhead during inference. Experimental results on CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets demonstrate that MT-ASM significantly outperforms baseline methods. Given its simplicity and low overhead, it remains highly competitive with state-of-the-art methods. The code is available at https://github.com/Dichao-Liu/Find-Attention-with-Comparison .
doi_str_mv 10.1007/s00530-024-01446-1
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3110895466</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3110895466</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-a260a0ce70d25aa1702530a978cad336fe89a8d7bf69c68eb948a250c6a5cb433</originalsourceid><addsrcrecordid>eNp9kDtPwzAUhS0EEqXwB5gsMRuuH3EctqriJbVioLBajuOElNYutjvw70kJEhvTWc53ru6H0CWFawpQ3iSAggMBJghQISShR2hCBWeEKsWO0QQqwYioJDtFZymtAWgpOUzQ23JFZi_LW2zwdr_JPckmfWCTs_O5Dx6nHJ3v8rvzve_wNjRug9sQcdt7R7pohmhwqNfOZhydDZ3vD9w5OmnNJrmL35yi1_u71fyRLJ4fnuazBbEMIBPDJBiwroSGFcbQEtjwhqlKZU3DuWydqoxqyrqVlZXK1ZVQhhVgpSlsLTifoqtxdxfD596lrNdhH_1wUnNKQVWFkHJosbFlY0gpulbvYr818UtT0Ad_evSnB3_6x5-mA8RHKA1l37n4N_0P9Q2uSXLJ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3110895466</pqid></control><display><type>article</type><title>MT-ASM: a multi-task attention strengthening model for fine-grained object recognition</title><source>Springer Link</source><creator>Liu, Dichao ; Wang, Yu ; Mase, Kenji ; Kato, Jien</creator><creatorcontrib>Liu, Dichao ; Wang, Yu ; Mase, Kenji ; Kato, Jien</creatorcontrib><description>Fine-Grained Object Recognition (FGOR) equips intelligent systems with recognition capabilities at or even beyond the level of human experts, making it a core technology for numerous applications such as biodiversity monitoring systems and advanced driver assistance systems. FGOR is highly challenging, and recent research has primarily focused on identifying discriminative regions to tackle this task. However, these methods often require extensive manual labor or expensive algorithms, which may lead to irreversible information loss and pose significant barriers to their practical application. Instead of learning region capturing, this work enhances networks’ response to discriminative regions. 
We propose a multitask attention-strengthening model (MT-ASM), inspired by the human ability to effectively utilize experiences from related tasks when solving a specific task. When faced with an FGOR task, humans naturally compare images from the same and different categories to identify discriminative and non-discriminative regions. MT-ASM employs two networks during the training phase: the major network, tasked with the main goal of category classification, and a subordinate task that involves comparing images from the same and different categories to find discriminative and non-discriminative regions. The subordinate network evaluates the major network’s performance on the subordinate task, compelling the major network to improve its subordinate task performance. Once training is complete, the subordinate network is removed, ensuring no additional overhead during inference. Experimental results on CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets demonstrate that MT-ASM significantly outperforms baseline methods. Given its simplicity and low overhead, it remains highly competitive with state-of-the-art methods. 
The code is available at https://github.com/Dichao-Liu/Find-Attention-with-Comparison .</description><identifier>ISSN: 0942-4962</identifier><identifier>EISSN: 1432-1882</identifier><identifier>DOI: 10.1007/s00530-024-01446-1</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Advanced driver assistance systems ; Algorithms ; Attention ; Computer Communication Networks ; Computer Graphics ; Computer Science ; Cryptology ; Data Storage Representation ; Human performance ; Machine learning ; Multimedia Information Systems ; Object recognition ; Operating Systems ; Performance evaluation ; Physical work ; Regular Paper ; Strengthening</subject><ispartof>Multimedia systems, 2024-10, Vol.30 (5), Article 297</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-a260a0ce70d25aa1702530a978cad336fe89a8d7bf69c68eb948a250c6a5cb433</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27922,27923</link.rule.ids></links><search><creatorcontrib>Liu, Dichao</creatorcontrib><creatorcontrib>Wang, Yu</creatorcontrib><creatorcontrib>Mase, Kenji</creatorcontrib><creatorcontrib>Kato, Jien</creatorcontrib><title>MT-ASM: a multi-task attention strengthening model for fine-grained object recognition</title><title>Multimedia systems</title><addtitle>Multimedia 
Systems</addtitle><description>Fine-Grained Object Recognition (FGOR) equips intelligent systems with recognition capabilities at or even beyond the level of human experts, making it a core technology for numerous applications such as biodiversity monitoring systems and advanced driver assistance systems. FGOR is highly challenging, and recent research has primarily focused on identifying discriminative regions to tackle this task. However, these methods often require extensive manual labor or expensive algorithms, which may lead to irreversible information loss and pose significant barriers to their practical application. Instead of learning region capturing, this work enhances networks’ response to discriminative regions. We propose a multitask attention-strengthening model (MT-ASM), inspired by the human ability to effectively utilize experiences from related tasks when solving a specific task. When faced with an FGOR task, humans naturally compare images from the same and different categories to identify discriminative and non-discriminative regions. MT-ASM employs two networks during the training phase: the major network, tasked with the main goal of category classification, and a subordinate task that involves comparing images from the same and different categories to find discriminative and non-discriminative regions. The subordinate network evaluates the major network’s performance on the subordinate task, compelling the major network to improve its subordinate task performance. Once training is complete, the subordinate network is removed, ensuring no additional overhead during inference. Experimental results on CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets demonstrate that MT-ASM significantly outperforms baseline methods. Given its simplicity and low overhead, it remains highly competitive with state-of-the-art methods. 
The code is available at https://github.com/Dichao-Liu/Find-Attention-with-Comparison .</description><subject>Advanced driver assistance systems</subject><subject>Algorithms</subject><subject>Attention</subject><subject>Computer Communication Networks</subject><subject>Computer Graphics</subject><subject>Computer Science</subject><subject>Cryptology</subject><subject>Data Storage Representation</subject><subject>Human performance</subject><subject>Machine learning</subject><subject>Multimedia Information Systems</subject><subject>Object recognition</subject><subject>Operating Systems</subject><subject>Performance evaluation</subject><subject>Physical work</subject><subject>Regular Paper</subject><subject>Strengthening</subject><issn>0942-4962</issn><issn>1432-1882</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kDtPwzAUhS0EEqXwB5gsMRuuH3EctqriJbVioLBajuOElNYutjvw70kJEhvTWc53ru6H0CWFawpQ3iSAggMBJghQISShR2hCBWeEKsWO0QQqwYioJDtFZymtAWgpOUzQ23JFZi_LW2zwdr_JPckmfWCTs_O5Dx6nHJ3v8rvzve_wNjRug9sQcdt7R7pohmhwqNfOZhydDZ3vD9w5OmnNJrmL35yi1_u71fyRLJ4fnuazBbEMIBPDJBiwroSGFcbQEtjwhqlKZU3DuWydqoxqyrqVlZXK1ZVQhhVgpSlsLTifoqtxdxfD596lrNdhH_1wUnNKQVWFkHJosbFlY0gpulbvYr818UtT0Ad_evSnB3_6x5-mA8RHKA1l37n4N_0P9Q2uSXLJ</recordid><startdate>20241001</startdate><enddate>20241001</enddate><creator>Liu, Dichao</creator><creator>Wang, Yu</creator><creator>Mase, Kenji</creator><creator>Kato, Jien</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20241001</creationdate><title>MT-ASM: a multi-task attention strengthening model for fine-grained object recognition</title><author>Liu, Dichao ; Wang, Yu ; Mase, Kenji ; Kato, 
Jien</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-a260a0ce70d25aa1702530a978cad336fe89a8d7bf69c68eb948a250c6a5cb433</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Advanced driver assistance systems</topic><topic>Algorithms</topic><topic>Attention</topic><topic>Computer Communication Networks</topic><topic>Computer Graphics</topic><topic>Computer Science</topic><topic>Cryptology</topic><topic>Data Storage Representation</topic><topic>Human performance</topic><topic>Machine learning</topic><topic>Multimedia Information Systems</topic><topic>Object recognition</topic><topic>Operating Systems</topic><topic>Performance evaluation</topic><topic>Physical work</topic><topic>Regular Paper</topic><topic>Strengthening</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Dichao</creatorcontrib><creatorcontrib>Wang, Yu</creatorcontrib><creatorcontrib>Mase, Kenji</creatorcontrib><creatorcontrib>Kato, Jien</creatorcontrib><collection>CrossRef</collection><jtitle>Multimedia systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Dichao</au><au>Wang, Yu</au><au>Mase, Kenji</au><au>Kato, Jien</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MT-ASM: a multi-task attention strengthening model for fine-grained object recognition</atitle><jtitle>Multimedia systems</jtitle><stitle>Multimedia Systems</stitle><date>2024-10-01</date><risdate>2024</risdate><volume>30</volume><issue>5</issue><artnum>297</artnum><issn>0942-4962</issn><eissn>1432-1882</eissn><abstract>Fine-Grained Object Recognition (FGOR) equips intelligent systems with recognition capabilities at or even beyond the level of human experts, making it a core technology for numerous applications such as biodiversity monitoring systems and 
advanced driver assistance systems. FGOR is highly challenging, and recent research has primarily focused on identifying discriminative regions to tackle this task. However, these methods often require extensive manual labor or expensive algorithms, which may lead to irreversible information loss and pose significant barriers to their practical application. Instead of learning region capturing, this work enhances networks’ response to discriminative regions. We propose a multitask attention-strengthening model (MT-ASM), inspired by the human ability to effectively utilize experiences from related tasks when solving a specific task. When faced with an FGOR task, humans naturally compare images from the same and different categories to identify discriminative and non-discriminative regions. MT-ASM employs two networks during the training phase: the major network, tasked with the main goal of category classification, and a subordinate task that involves comparing images from the same and different categories to find discriminative and non-discriminative regions. The subordinate network evaluates the major network’s performance on the subordinate task, compelling the major network to improve its subordinate task performance. Once training is complete, the subordinate network is removed, ensuring no additional overhead during inference. Experimental results on CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets demonstrate that MT-ASM significantly outperforms baseline methods. Given its simplicity and low overhead, it remains highly competitive with state-of-the-art methods. The code is available at https://github.com/Dichao-Liu/Find-Attention-with-Comparison .</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00530-024-01446-1</doi></addata></record>
fulltext fulltext
identifier ISSN: 0942-4962
ispartof Multimedia systems, 2024-10, Vol.30 (5), Article 297
issn 0942-4962
1432-1882
language eng
recordid cdi_proquest_journals_3110895466
source Springer Link
subjects Advanced driver assistance systems
Algorithms
Attention
Computer Communication Networks
Computer Graphics
Computer Science
Cryptology
Data Storage Representation
Human performance
Machine learning
Multimedia Information Systems
Object recognition
Operating Systems
Performance evaluation
Physical work
Regular Paper
Strengthening
title MT-ASM: a multi-task attention strengthening model for fine-grained object recognition
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T19%3A56%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MT-ASM:%20a%20multi-task%20attention%20strengthening%20model%20for%20fine-grained%20object%20recognition&rft.jtitle=Multimedia%20systems&rft.au=Liu,%20Dichao&rft.date=2024-10-01&rft.volume=30&rft.issue=5&rft.artnum=297&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-024-01446-1&rft_dat=%3Cproquest_cross%3E3110895466%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c200t-a260a0ce70d25aa1702530a978cad336fe89a8d7bf69c68eb948a250c6a5cb433%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3110895466&rft_id=info:pmid/&rfr_iscdi=true