Event-Free Moving Object Segmentation from Moving Ego Vehicle
Published in: | arXiv.org 2024-09 |
---|---|
Main Authors: | Zhou, Zhuyun; Wu, Zongwei; Paudel, Danda Pani; Boutteau, Rémi; Yang, Fan; Van Gool, Luc; Timofte, Radu; Ginhac, Dominique |
Format: | Article |
Language: | English |
Subjects: | Annotations; Cameras; Computer vision; Datasets; Frames (data processing); Image segmentation; Object motion; Temporal resolution |
Online Access: | Get full text |
container_title | arXiv.org |
---|---|
creator | Zhou, Zhuyun; Wu, Zongwei; Paudel, Danda Pani; Boutteau, Rémi; Yang, Fan; Van Gool, Luc; Timofte, Radu; Ginhac, Dominique |
description | Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving, especially for sequences obtained from moving ego vehicles. Most segmentation methods leverage motion cues obtained from optical flow maps. However, since these methods are often based on optical flow pre-computed from successive RGB frames, they neglect the temporal dynamics occurring between frames, which constrains their ability to discern objects that appear relatively static yet are genuinely in motion. To address these limitations, we propose to exploit event cameras, which provide rich motion cues without relying on optical flow, for better video understanding. To foster research in this area, we first introduce DSEC-MOS, a novel large-scale dataset for moving object segmentation from moving ego vehicles and the first of its kind. For benchmarking, we select various mainstream methods and rigorously evaluate them on our dataset. Subsequently, we devise EmoFormer, a novel network able to exploit the event data. For this purpose, we fuse the event temporal prior with spatial semantic maps to distinguish genuinely moving objects from the static background, adding another level of dense supervision around our object of interest. Our proposed network uses event data only for training and does not require event input during inference, making it directly comparable to frame-only methods in terms of efficiency and more widely usable in many application cases. The exhaustive comparison highlights a significant performance improvement of our method over all the others. The source code and dataset are publicly available at: https://github.com/ZZY-Zhou/DSEC-MOS. |
format | article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2808433436</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2808433436</sourcerecordid><originalsourceid>FETCH-proquest_journals_28084334363</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSwdS1LzSvRdStKTVXwzS_LzEtX8E_KSk0uUQhOTc8FSiWWZObnKaQV5efC5F3T8xXCUjMyk3NSeRhY0xJzilN5oTQ3g7Kba4izh25BUX5haWpxSXxWfmlRHlAq3sjCwMLE2NjE2MyYOFUAMnc3WQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2808433436</pqid></control><display><type>article</type><title>Event-Free Moving Object Segmentation from Moving Ego Vehicle</title><source>Publicly Available Content Database</source><creator>Zhou, Zhuyun ; Wu, Zongwei ; Danda Pani Paudel ; Boutteau, Rémi ; Yang, Fan ; Luc Van Gool ; Timofte, Radu ; Ginhac, Dominique</creator><creatorcontrib>Zhou, Zhuyun ; Wu, Zongwei ; Danda Pani Paudel ; Boutteau, Rémi ; Yang, Fan ; Luc Van Gool ; Timofte, Radu ; Ginhac, Dominique</creatorcontrib><description>Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving, especially for sequences obtained from moving ego vehicles. Most segmentation methods leverage motion cues obtained from optical flow maps. However, since these methods are often based on optical flows that are pre-computed from successive RGB frames, this neglects the temporal consideration of events occurring within the inter-frame, consequently constraining its ability to discern objects exhibiting relative staticity but genuinely in motion. To address these limitations, we propose to exploit event cameras for better video understanding, which provide rich motion cues without relying on optical flow. To foster research in this area, we first introduce a novel large-scale dataset called DSEC-MOS for moving object segmentation from moving ego vehicles, which is the first of its kind. For benchmarking, we select various mainstream methods and rigorously evaluate them on our dataset. Subsequently, we devise EmoFormer, a novel network able to exploit the event data. For this purpose, we fuse the event temporal prior with spatial semantic maps to distinguish genuinely moving objects from the static background, adding another level of dense supervision around our object of interest. Our proposed network relies only on event data for training but does not require event input during inference, making it directly comparable to frame-only methods in terms of efficiency and more widely usable in many application cases. The exhaustive comparison highlights a significant performance improvement of our method over all other methods. The source code and dataset are publicly available at: https://github.com/ZZY-Zhou/DSEC-MOS.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Annotations ; Cameras ; Computer vision ; Datasets ; Frames (data processing) ; Image segmentation ; Object motion ; Temporal resolution</subject><ispartof>arXiv.org, 2024-09</ispartof><rights>2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2808433436?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,37012,44590</link.rule.ids></links><search><creatorcontrib>Zhou, Zhuyun</creatorcontrib><creatorcontrib>Wu, Zongwei</creatorcontrib><creatorcontrib>Danda Pani Paudel</creatorcontrib><creatorcontrib>Boutteau, Rémi</creatorcontrib><creatorcontrib>Yang, Fan</creatorcontrib><creatorcontrib>Luc Van Gool</creatorcontrib><creatorcontrib>Timofte, Radu</creatorcontrib><creatorcontrib>Ginhac, Dominique</creatorcontrib><title>Event-Free Moving Object Segmentation from Moving Ego Vehicle</title><title>arXiv.org</title><description>Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving, especially for sequences obtained from moving ego vehicles. Most segmentation methods leverage motion cues obtained from optical flow maps. However, since these methods are often based on optical flows that are pre-computed from successive RGB frames, this neglects the temporal consideration of events occurring within the inter-frame, consequently constraining its ability to discern objects exhibiting relative staticity but genuinely in motion. To address these limitations, we propose to exploit event cameras for better video understanding, which provide rich motion cues without relying on optical flow. To foster research in this area, we first introduce a novel large-scale dataset called DSEC-MOS for moving object segmentation from moving ego vehicles, which is the first of its kind. For benchmarking, we select various mainstream methods and rigorously evaluate them on our dataset. Subsequently, we devise EmoFormer, a novel network able to exploit the event data. For this purpose, we fuse the event temporal prior with spatial semantic maps to distinguish genuinely moving objects from the static background, adding another level of dense supervision around our object of interest. Our proposed network relies only on event data for training but does not require event input during inference, making it directly comparable to frame-only methods in terms of efficiency and more widely usable in many application cases. The exhaustive comparison highlights a significant performance improvement of our method over all other methods. 
The source code and dataset are publicly available at: https://github.com/ZZY-Zhou/DSEC-MOS.</description><subject>Annotations</subject><subject>Cameras</subject><subject>Computer vision</subject><subject>Datasets</subject><subject>Frames (data processing)</subject><subject>Image segmentation</subject><subject>Object motion</subject><subject>Temporal resolution</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSwdS1LzSvRdStKTVXwzS_LzEtX8E_KSk0uUQhOTc8FSiWWZObnKaQV5efC5F3T8xXCUjMyk3NSeRhY0xJzilN5oTQ3g7Kba4izh25BUX5haWpxSXxWfmlRHlAq3sjCwMLE2NjE2MyYOFUAMnc3WQ</recordid><startdate>20240925</startdate><enddate>20240925</enddate><creator>Zhou, Zhuyun</creator><creator>Wu, Zongwei</creator><creator>Danda Pani Paudel</creator><creator>Boutteau, Rémi</creator><creator>Yang, Fan</creator><creator>Luc Van Gool</creator><creator>Timofte, Radu</creator><creator>Ginhac, Dominique</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240925</creationdate><title>Event-Free Moving Object Segmentation from Moving Ego Vehicle</title><author>Zhou, Zhuyun ; Wu, Zongwei ; Danda Pani Paudel ; Boutteau, Rémi ; Yang, Fan ; Luc Van Gool ; Timofte, Radu ; Ginhac, Dominique</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_28084334363</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Annotations</topic><topic>Cameras</topic><topic>Computer vision</topic><topic>Datasets</topic><topic>Frames (data processing)</topic><topic>Image segmentation</topic><topic>Object motion</topic><topic>Temporal resolution</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Zhuyun</creatorcontrib><creatorcontrib>Wu, Zongwei</creatorcontrib><creatorcontrib>Danda Pani Paudel</creatorcontrib><creatorcontrib>Boutteau, Rémi</creatorcontrib><creatorcontrib>Yang, Fan</creatorcontrib><creatorcontrib>Luc Van Gool</creatorcontrib><creatorcontrib>Timofte, Radu</creatorcontrib><creatorcontrib>Ginhac, Dominique</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI 
Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhou, Zhuyun</au><au>Wu, Zongwei</au><au>Danda Pani Paudel</au><au>Boutteau, Rémi</au><au>Yang, Fan</au><au>Luc Van Gool</au><au>Timofte, Radu</au><au>Ginhac, Dominique</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Event-Free Moving Object Segmentation from Moving Ego Vehicle</atitle><jtitle>arXiv.org</jtitle><date>2024-09-25</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Moving object segmentation (MOS) in dynamic scenes is an important, challenging, but under-explored research topic for autonomous driving, especially for sequences obtained from moving ego vehicles. Most segmentation methods leverage motion cues obtained from optical flow maps. However, since these methods are often based on optical flows that are pre-computed from successive RGB frames, this neglects the temporal consideration of events occurring within the inter-frame, consequently constraining its ability to discern objects exhibiting relative staticity but genuinely in motion. To address these limitations, we propose to exploit event cameras for better video understanding, which provide rich motion cues without relying on optical flow. To foster research in this area, we first introduce a novel large-scale dataset called DSEC-MOS for moving object segmentation from moving ego vehicles, which is the first of its kind. For benchmarking, we select various mainstream methods and rigorously evaluate them on our dataset. Subsequently, we devise EmoFormer, a novel network able to exploit the event data. For this purpose, we fuse the event temporal prior with spatial semantic maps to distinguish genuinely moving objects from the static background, adding another level of dense supervision around our object of interest. Our proposed network relies only on event data for training but does not require event input during inference, making it directly comparable to frame-only methods in terms of efficiency and more widely usable in many application cases. The exhaustive comparison highlights a significant performance improvement of our method over all other methods. The source code and dataset are publicly available at: https://github.com/ZZY-Zhou/DSEC-MOS.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-09 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2808433436 |
source | Publicly Available Content Database |
subjects | Annotations; Cameras; Computer vision; Datasets; Frames (data processing); Image segmentation; Object motion; Temporal resolution |
title | Event-Free Moving Object Segmentation from Moving Ego Vehicle |
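The description field above states that EmoFormer fuses an event-derived temporal prior with spatial semantic maps as an extra level of dense supervision, and that event data is needed only during training, not at inference. The sketch below is a minimal, hypothetical illustration of that training scheme, not the authors' implementation: `FrameOnlySegmenter`, `training_step`, the auxiliary `prior_head`, and the `aux_weight` value are all invented here for illustration; the actual EmoFormer code is available at the linked repository.

```python
# Hypothetical sketch of event-as-supervision training (assumed PyTorch style,
# not the authors' EmoFormer): the network sees only RGB frames, while an
# event-derived motion prior serves as an auxiliary training target.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameOnlySegmenter(nn.Module):
    """Toy encoder with two heads: a moving-object mask and an auxiliary event-prior prediction."""

    def __init__(self, in_ch: int = 3, feat_ch: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.seg_head = nn.Conv2d(feat_ch, 1, 1)    # moving-object mask logits
        self.prior_head = nn.Conv2d(feat_ch, 1, 1)  # auxiliary: predicts the event prior

    def forward(self, rgb: torch.Tensor):
        feats = self.encoder(rgb)
        return self.seg_head(feats), self.prior_head(feats)


def training_step(model, rgb, gt_mask, event_prior, aux_weight: float = 0.5):
    """Event prior (e.g., an accumulated event-count map) is used only as a target here."""
    seg_logits, prior_logits = model(rgb)
    loss_seg = F.binary_cross_entropy_with_logits(seg_logits, gt_mask)
    loss_aux = F.binary_cross_entropy_with_logits(prior_logits, event_prior)
    return loss_seg + aux_weight * loss_aux


if __name__ == "__main__":
    model = FrameOnlySegmenter()
    rgb = torch.rand(1, 3, 64, 64)
    gt_mask = (torch.rand(1, 1, 64, 64) > 0.5).float()   # dummy moving-object mask
    event_prior = torch.rand(1, 1, 64, 64)                # dummy event-derived prior
    loss = training_step(model, rgb, gt_mask, event_prior)

    # Inference needs frames only -- no event input is required.
    with torch.no_grad():
        moving_mask = torch.sigmoid(model(rgb)[0])
```

Because only the segmentation head is read at inference and the event prior enters solely as a training target, the runtime cost of such a scheme matches a frame-only baseline, which is the comparability point made in the abstract.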