CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers

Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing...

Bibliographic Details
Published in:	arXiv.org 2023-05
Main Authors:	Ma, Yunsheng; Ye, Wenqian; Cao, Xu; Abdelraouf, Amr; Han, Kyungtae; Gupta, Rohit; Wang, Ziran
Format:	Article
Language:	English
Subjects:	Cameras; Coders; Consistency; Context; Representations; Transformers

container_title	arXiv.org
creator	Ma, Yunsheng; Ye, Wenqian; Cao, Xu; Abdelraouf, Amr; Han, Kyungtae; Gupta, Rohit; Wang, Ziran
description	Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing driving contexts. In this paper, we introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer), which employs spatio-temporal transformers to learn unified memory representations for an improved driver intention prediction. Specifically, we develop a spatial-temporal encoder to integrate information from both in-cabin and external camera views, along with episodic memory representations to continuously fuse historical data. Furthermore, we propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance. Comprehensive experiments on the Brain4Cars dataset demonstrate that CEMFormer consistently outperforms existing state-of-the-art methods in driver intention prediction.
format	article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2814210622</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2814210622</sourcerecordid><originalsourceid>FETCH-proquest_journals_28142106223</originalsourceid><addsrcrecordid>eNqNjcFqwlAQRR8FoVL9h4GuA8lEo3SbRiy0UGj2MppJeZLMizMv0s9vKn5AV5fLPZz74OaY51myXSE-uqXZOU1TLDa4XudzN5bVxy5oz_oC70wqXr4hBvhUbvwpwqv6Kyu8SWSJPohBq6GfelLS0QuQNFD9RFahDkqaPGRw9QRfA0VPXVJzPwSdxlpJrL1d2cLNWuqMl_d8cs-7qi73yaDhMrLFwzmMf0o74DZbYZYWiPn_qF_iekvt</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2814210622</pqid></control><display><type>article</type><title>CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers</title><source>ProQuest Publicly Available Content</source><creator>Ma, Yunsheng ; Ye, Wenqian ; Cao, Xu ; Abdelraouf, Amr ; Han, Kyungtae ; Gupta, Rohit ; Wang, Ziran</creator><creatorcontrib>Ma, Yunsheng ; Ye, Wenqian ; Cao, Xu ; Abdelraouf, Amr ; Han, Kyungtae ; Gupta, Rohit ; Wang, Ziran</creatorcontrib><description>Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing driving contexts. In this paper, we introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer), which employs spatio-temporal transformers to learn unified memory representations for an improved driver intention prediction. Specifically, we develop a spatial-temporal encoder to integrate information from both in-cabin and external camera views, along with episodic memory representations to continuously fuse historical data. 
Furthermore, we propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance. Comprehensive experiments on the Brain4Cars dataset demonstrate that CEMFormer consistently outperforms existing state-of-the-art methods in driver intention prediction.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Cameras ; Coders ; Consistency ; Context ; Representations ; Transformers</subject><ispartof>arXiv.org, 2023-05</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2814210622?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml></links><search><creatorcontrib>Ma, Yunsheng</creatorcontrib><creatorcontrib>Ye, Wenqian</creatorcontrib><creatorcontrib>Cao, Xu</creatorcontrib><creatorcontrib>Abdelraouf, Amr</creatorcontrib><creatorcontrib>Han, Kyungtae</creatorcontrib><creatorcontrib>Gupta, Rohit</creatorcontrib><creatorcontrib>Wang, Ziran</creatorcontrib><title>CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers</title><title>arXiv.org</title><description>Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. 
Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing driving contexts. In this paper, we introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer), which employs spatio-temporal transformers to learn unified memory representations for an improved driver intention prediction. Specifically, we develop a spatial-temporal encoder to integrate information from both in-cabin and external camera views, along with episodic memory representations to continuously fuse historical data. Furthermore, we propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance. Comprehensive experiments on the Brain4Cars dataset demonstrate that CEMFormer consistently outperforms existing state-of-the-art methods in driver intention prediction.</description><subject>Cameras</subject><subject>Coders</subject><subject>Consistency</subject><subject>Context</subject><subject>Representations</subject><subject>Transformers</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqNjcFqwlAQRR8FoVL9h4GuA8lEo3SbRiy0UGj2MppJeZLMizMv0s9vKn5AV5fLPZz74OaY51myXSE-uqXZOU1TLDa4XudzN5bVxy5oz_oC70wqXr4hBvhUbvwpwqv6Kyu8SWSJPohBq6GfelLS0QuQNFD9RFahDkqaPGRw9QRfA0VPXVJzPwSdxlpJrL1d2cLNWuqMl_d8cs-7qi73yaDhMrLFwzmMf0o74DZbYZYWiPn_qF_iekvt</recordid><startdate>20230513</startdate><enddate>20230513</enddate><creator>Ma, Yunsheng</creator><creator>Ye, Wenqian</creator><creator>Cao, Xu</creator><creator>Abdelraouf, Amr</creator><creator>Han, Kyungtae</creator><creator>Gupta, Rohit</creator><creator>Wang, Ziran</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PHGZM</scope><scope>PHGZT</scope><scope>PIMPY</scope><scope>PKEHL</scope><scope>PQEST</scope><scope>PQGLB</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20230513</creationdate><title>CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers</title><author>Ma, Yunsheng ; Ye, Wenqian ; Cao, Xu ; Abdelraouf, Amr ; Han, Kyungtae ; Gupta, Rohit ; Wang, Ziran</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_28142106223</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Cameras</topic><topic>Coders</topic><topic>Consistency</topic><topic>Context</topic><topic>Representations</topic><topic>Transformers</topic><toplevel>online_resources</toplevel><creatorcontrib>Ma, Yunsheng</creatorcontrib><creatorcontrib>Ye, Wenqian</creatorcontrib><creatorcontrib>Cao, Xu</creatorcontrib><creatorcontrib>Abdelraouf, Amr</creatorcontrib><creatorcontrib>Han, Kyungtae</creatorcontrib><creatorcontrib>Gupta, Rohit</creatorcontrib><creatorcontrib>Wang, Ziran</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central (subscription)</collection><collection>Technology collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest 
Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>ProQuest Central (New)</collection><collection>ProQuest One Academic (New)</collection><collection>ProQuest Publicly Available Content</collection><collection>ProQuest One Academic Middle East (New)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Applied &amp; Life Sciences</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ma, Yunsheng</au><au>Ye, Wenqian</au><au>Cao, Xu</au><au>Abdelraouf, Amr</au><au>Han, Kyungtae</au><au>Gupta, Rohit</au><au>Wang, Ziran</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers</atitle><jtitle>arXiv.org</jtitle><date>2023-05-13</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing driving contexts. In this paper, we introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer), which employs spatio-temporal transformers to learn unified memory representations for an improved driver intention prediction. 
Specifically, we develop a spatial-temporal encoder to integrate information from both in-cabin and external camera views, along with episodic memory representations to continuously fuse historical data. Furthermore, we propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance. Comprehensive experiments on the Brain4Cars dataset demonstrate that CEMFormer consistently outperforms existing state-of-the-art methods in driver intention prediction.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2023-05
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2814210622
source	ProQuest Publicly Available Content
subjects	Cameras; Coders; Consistency; Context; Representations; Transformers
title	CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-09T19%3A37%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=CEMFormer:%20Learning%20to%20Predict%20Driver%20Intentions%20from%20In-Cabin%20and%20External%20Cameras%20via%20Spatial-Temporal%20Transformers&rft.jtitle=arXiv.org&rft.au=Ma,%20Yunsheng&rft.date=2023-05-13&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2814210622%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_28142106223%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2814210622&rft_id=info:pmid/&rfr_iscdi=true