CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers

Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing...

Bibliographic Details
Published in: arXiv.org 2023-05
Main Authors: Ma, Yunsheng; Ye, Wenqian; Cao, Xu; Abdelraouf, Amr; Han, Kyungtae; Gupta, Rohit; Wang, Ziran
Format: Article
Language: English
container_title arXiv.org
creator Ma, Yunsheng
Ye, Wenqian
Cao, Xu
Abdelraouf, Amr
Han, Kyungtae
Gupta, Rohit
Wang, Ziran
description Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing driving contexts. In this paper, we introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer), which employs spatial-temporal transformers to learn unified memory representations for improved driver intention prediction. Specifically, we develop a spatial-temporal encoder to integrate information from both in-cabin and external camera views, along with episodic memory representations to continuously fuse historical data. Furthermore, we propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance. Comprehensive experiments on the Brain4Cars dataset demonstrate that CEMFormer consistently outperforms existing state-of-the-art methods in driver intention prediction.
format article
date 2023-05-13
oa free_for_read
publisher Ithaca: Cornell University Library, arXiv.org
rights 2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-05
issn 2331-8422
language eng
recordid cdi_proquest_journals_2814210622
source ProQuest Publicly Available Content
subjects Cameras
Coders
Consistency
Context
Representations
Transformers
title CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-09T19%3A37%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=CEMFormer:%20Learning%20to%20Predict%20Driver%20Intentions%20from%20In-Cabin%20and%20External%20Cameras%20via%20Spatial-Temporal%20Transformers&rft.jtitle=arXiv.org&rft.au=Ma,%20Yunsheng&rft.date=2023-05-13&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2814210622%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_28142106223%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2814210622&rft_id=info:pmid/&rfr_iscdi=true