
Context Enhanced Transformer for Single Image Object Detection

Bibliographic Details
Published in:arXiv.org 2023-12
Main Authors: An, Seungjun, Park, Seonghoon, Kim, Gyeongnyeon, Baek, Jeongyeol, Lee, Byeongwon, Kim, Seungryong
Format: Article
Language:English
Subjects:
description With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and additional network complexity to incorporate temporal information limits their practical applicability. In this paper, we propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR), which incorporates temporal context into DETR using a newly designed memory module. To efficiently store temporal information, we construct a class-wise memory that collects contextual information across the data. Additionally, we present a classification-based sampling technique that selectively utilizes the relevant memory for the current image. At test time, we introduce a memory adaptation method that updates individual memory entries by considering the test distribution. Experiments on the CityCam and ImageNet VID datasets demonstrate the efficiency of the framework on various video systems. The project page and code will be made available at: https://ku-cvlab.github.io/CETR.
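The class-wise memory and classification-based sampling described in the abstract can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the class names, momentum-style update rule, and score threshold are all assumptions introduced for the example.

```python
# Hypothetical sketch: one running-average feature slot per class,
# updated with momentum, plus a classification-based sampler that
# returns only the slots for classes predicted in the current image.
import numpy as np

class ClassWiseMemory:
    def __init__(self, num_classes, dim, momentum=0.9):
        self.memory = np.zeros((num_classes, dim))      # one slot per class
        self.filled = np.zeros(num_classes, dtype=bool)  # which slots hold data
        self.momentum = momentum

    def update(self, class_id, feature):
        """Momentum update of the per-class context vector."""
        if self.filled[class_id]:
            self.memory[class_id] = (self.momentum * self.memory[class_id]
                                     + (1 - self.momentum) * feature)
        else:
            self.memory[class_id] = feature
            self.filled[class_id] = True

    def sample(self, class_scores, threshold=0.5):
        """Classification-based sampling: keep memory only for classes
        whose predicted score clears the threshold and that hold data."""
        keep = (class_scores >= threshold) & self.filled
        return self.memory[keep]

mem = ClassWiseMemory(num_classes=3, dim=4)
mem.update(0, np.ones(4))
mem.update(0, np.zeros(4))                    # momentum-averaged with first write
selected = mem.sample(np.array([0.9, 0.2, 0.8]))
print(selected.shape)                         # only class 0 is predicted AND filled
```

The selected memory rows would then be fed to the detector (e.g., as extra context tokens for the DETR decoder); the test-time adaptation the abstract mentions would amount to continuing these `update` calls on the test stream.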
identifier EISSN: 2331-8422
source Publicly Available Content Database
subjects Context; Image enhancement; Object recognition; Sampling methods; Testing time; Transformers; Video data