Context Enhanced Transformer for Single Image Object Detection
With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and the additional network complexity needed to incorporate temporal information limit their practical applicability. In this paper, we propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR), which incorporates temporal context into DETR using a newly designed memory module. To efficiently store temporal information, we construct a class-wise memory that collects contextual information across the dataset. Additionally, we present a classification-based sampling technique to selectively utilize the memory relevant to the current image. At test time, we introduce a memory adaptation method that updates the individual class memories by considering the test distribution. Experiments on the CityCam and ImageNet VID datasets demonstrate the efficiency of the framework across various video systems. The project page and code will be made available at: https://ku-cvlab.github.io/CETR.
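The abstract is all this record provides, but the three mechanisms it names (a class-wise memory, classification-based sampling, and test-time memory adaptation) are concrete enough to sketch. Below is a minimal PyTorch-style sketch of how such a module might look; the class `ClassWiseMemory`, all tensor shapes, the top-k sampling heuristic, and the EMA update rule are illustrative assumptions, not the authors' actual CETR implementation.

```python
# Hypothetical sketch of the class-wise memory described in the abstract.
# All names, shapes, and the EMA update rule are assumptions for
# illustration -- not the authors' actual CETR code.
import torch
import torch.nn as nn


class ClassWiseMemory(nn.Module):
    def __init__(self, num_classes: int, slots_per_class: int, dim: int,
                 momentum: float = 0.99):
        super().__init__()
        # One bank of feature slots per class, collected across the dataset.
        self.register_buffer("memory",
                             torch.zeros(num_classes, slots_per_class, dim))
        self.momentum = momentum

    def sample(self, class_logits: torch.Tensor, top_k: int = 3) -> torch.Tensor:
        """Classification-based sampling: keep only the memory banks of the
        classes the classifier believes are present in the current image."""
        probs = class_logits.softmax(dim=-1)           # (num_classes,)
        top_classes = probs.topk(top_k).indices        # indices of likely classes
        return self.memory[top_classes].flatten(0, 1)  # (top_k * slots, dim)

    @torch.no_grad()
    def test_time_update(self, class_id: int, features: torch.Tensor) -> None:
        """Test-time adaptation: nudge one class's memory toward features
        observed at test time (EMA is an assumed update rule)."""
        mean_feat = features.mean(dim=0)               # (dim,)
        self.memory[class_id] = (
            self.momentum * self.memory[class_id]
            + (1 - self.momentum) * mean_feat
        )


if __name__ == "__main__":
    mem = ClassWiseMemory(num_classes=30, slots_per_class=4, dim=256)
    logits = torch.randn(30)              # image-level class scores
    context = mem.sample(logits)          # context fed to the detection decoder
    print(context.shape)                  # torch.Size([12, 256])
    mem.test_time_update(class_id=7, features=torch.randn(5, 256))
```

In the paper, the sampled memory presumably enters the DETR decoder as additional context tokens via cross-attention; the EMA rule above is just one plausible reading of "updates the individual class memories by considering the test distribution."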
Published in: | arXiv.org 2023-12 |
---|---|
Main Authors: | An, Seungjun; Park, Seonghoon; Kim, Gyeongnyeon; Baek, Jeongyeol; Lee, Byeongwon; Kim, Seungryong |
Format: | Article |
Language: | English |
Subjects: | Context; Image enhancement; Object recognition; Sampling methods; Testing time; Transformers; Video data |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | An, Seungjun; Park, Seonghoon; Kim, Gyeongnyeon; Baek, Jeongyeol; Lee, Byeongwon; Kim, Seungryong |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-12 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2905670508 |
source | Publicly Available Content Database |
subjects | Context; Image enhancement; Object recognition; Sampling methods; Testing time; Transformers; Video data |
title | Context Enhanced Transformer for Single Image Object Detection |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T23%3A43%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Context%20Enhanced%20Transformer%20for%20Single%20Image%20Object%20Detection&rft.jtitle=arXiv.org&rft.au=An,%20Seungjun&rft.date=2023-12-26&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2905670508%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_29056705083%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2905670508&rft_id=info:pmid/&rfr_iscdi=true |