MT-ASM: a multi-task attention strengthening model for fine-grained object recognition
Fine-Grained Object Recognition (FGOR) equips intelligent systems with recognition capabilities at or even beyond the level of human experts, making it a core technology for numerous applications such as biodiversity monitoring systems and advanced driver assistance systems. FGOR is highly challenging, and recent research has primarily focused on identifying discriminative regions to tackle this task. However, these methods often require extensive manual labor or expensive algorithms, which may lead to irreversible information loss and pose significant barriers to their practical application. Instead of learning region capturing, this work enhances networks’ response to discriminative regions. We propose a multitask attention-strengthening model (MT-ASM), inspired by the human ability to effectively utilize experiences from related tasks when solving a specific task. When faced with an FGOR task, humans naturally compare images from the same and different categories to identify discriminative and non-discriminative regions. MT-ASM employs two networks during the training phase: the major network, tasked with the main goal of category classification, and the subordinate network, which introduces a subordinate task of comparing images from the same and different categories to find discriminative and non-discriminative regions. The subordinate network evaluates the major network’s performance on the subordinate task, compelling the major network to improve its subordinate task performance. Once training is complete, the subordinate network is removed, ensuring no additional overhead during inference. Experimental results on CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets demonstrate that MT-ASM significantly outperforms baseline methods. Given its simplicity and low overhead, it remains highly competitive with state-of-the-art methods. The code is available at https://github.com/Dichao-Liu/Find-Attention-with-Comparison.
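The training scheme sketched in the abstract (a major classification network plus a training-time-only subordinate network that judges whether two images belong to the same category) can be illustrated with a minimal sketch. The code below assumes a PyTorch implementation with a ResNet-50 backbone, a concatenation-based comparison head, and a simple weighted sum of the two losses; these class names, architectural choices, and the loss formulation are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
# Minimal sketch of the two-network training idea described in the abstract.
# All module names, loss choices, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class MajorNetwork(nn.Module):
    """Backbone + classifier; the only part kept at inference time."""
    def __init__(self, num_classes=200):  # e.g. 200 for CUB-200-2011
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.classifier = nn.Linear(2048, num_classes)

    def forward(self, x):
        feat = self.features(x).flatten(1)   # (B, 2048) pooled features
        return self.classifier(feat), feat

class SubordinateNetwork(nn.Module):
    """Training-time-only head that scores whether two feature vectors
    come from the same category."""
    def __init__(self, dim=2048):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * dim, 512), nn.ReLU(), nn.Linear(512, 1))

    def forward(self, f1, f2):
        return self.head(torch.cat([f1, f2], dim=1)).squeeze(1)  # (B,) logits

def training_step(major, sub, x1, x2, y1, y2, alpha=1.0):
    """Joint loss for one mini-batch of image pairs (x1, y1) and (x2, y2)."""
    ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()
    logits1, f1 = major(x1)
    logits2, f2 = major(x2)
    # Main task: category classification on both images.
    loss_cls = ce(logits1, y1) + ce(logits2, y2)
    # Subordinate task: predict whether the pair shares a category, which
    # pressures the major network toward discriminative regions.
    same = (y1 == y2).float()
    loss_sub = bce(sub(f1, f2), same)
    return loss_cls + alpha * loss_sub
```

At inference time only the major network would be kept, e.g. `logits, _ = major(images)`, consistent with the abstract's claim that the subordinate network adds no overhead after training.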
Published in: Multimedia systems, 2024-10, Vol. 30 (5), Article 297
Main Authors: Liu, Dichao; Wang, Yu; Mase, Kenji; Kato, Jien
Format: Article
Language: English
Subjects: Advanced driver assistance systems; Algorithms; Attention; Computer Communication Networks; Computer Graphics; Computer Science; Cryptology; Data Storage Representation; Human performance; Machine learning; Multimedia Information Systems; Object recognition; Operating Systems; Performance evaluation; Physical work; Regular Paper; Strengthening
DOI: 10.1007/s00530-024-01446-1
Publisher: Springer Berlin Heidelberg, Berlin/Heidelberg
ISSN: 0942-4962
EISSN: 1432-1882
Source: Springer Link