
Demonstrating Multi-modal Human Instruction Comprehension with AR Smart Glass

We present a multi-modal human instruction comprehension prototype for object acquisition tasks that involve verbal, visual and pointing gesture cues. Our prototype includes an AR smart-glass for issuing the instructions and a Jetson TX2 pervasive device for executing comprehension algorithms. With...

Bibliographic Details
Main Authors: Weerakoon, Dulanga; Subbaraju, Vigneshwaran; Tran, Tuan; Misra, Archan
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 233
container_issue
container_start_page 231
container_title
container_volume
creator Weerakoon, Dulanga
Subbaraju, Vigneshwaran
Tran, Tuan
Misra, Archan
description We present a multi-modal human instruction comprehension prototype for object acquisition tasks that involve verbal, visual and pointing gesture cues. Our prototype includes an AR smart-glass for issuing the instructions and a Jetson TX2 pervasive device for executing comprehension algorithms. With this setup, we enable on-device, computationally efficient object acquisition task comprehension with an average latency in the range of 150-330 ms.
doi_str_mv 10.1109/COMSNETS56262.2023.10041269
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10041269</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10041269</ieee_id><sourcerecordid>10041269</sourcerecordid><originalsourceid>FETCH-LOGICAL-i187t-89d84ad6407c3a637bb454b216b48d8142fa022b3e7fe7864426dbe6fed42cb33</originalsourceid><addsrcrecordid>eNo1kF1LwzAYhaMgOGb_gRcBrzuTN5-9HHVug9WBndcjaVMX6cdoUsR_r0O9OhweODwchB4oWVBKssd8X5Qvq0MpJEhYAAG2oIRwCjK7QkmmNJVScKWIhGs0AypECoJktygJ4YMQwqjOBIMZKp5cN_Qhjib6_h0XUxt92g21afFm6kyPtxc4VdEPPc6H7jy6k-vDpX36eMLLV1x2Zox43ZoQ7tBNY9rgkr-co7fn1SHfpLv9epsvd6mnWsVUZ7XmppacqIoZyZS1XHALVFqua005NIYAWOZU45SWnIOsrZONqzlUlrE5uv_d9c6543n0Pwpfx_8H2DdLAFGu</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Demonstrating Multi-modal Human Instruction Comprehension with AR Smart Glass</title><source>IEEE Xplore All Conference Series</source><creator>Weerakoon, Dulanga ; Subbaraju, Vigneshwaran ; Tran, Tuan ; Misra, Archan</creator><creatorcontrib>Weerakoon, Dulanga ; Subbaraju, Vigneshwaran ; Tran, Tuan ; Misra, Archan</creatorcontrib><description>We present a multi-modal human instruction comprehension prototype for object acquisition tasks that involve verbal, visual and pointing gesture cues. Our prototype includes an AR smart-glass for issuing the instructions and a Jetson TX2 pervasive device for executing comprehension algorithms. 
With this setup, we enable on-device, computationally efficient object acquisition task comprehension with an average latency in the range of 150-330msec.</description><identifier>EISSN: 2155-2509</identifier><identifier>EISBN: 9781665477062</identifier><identifier>EISBN: 1665477067</identifier><identifier>DOI: 10.1109/COMSNETS56262.2023.10041269</identifier><language>eng</language><publisher>IEEE</publisher><subject>Computational efficiency ; Computational modeling ; Human-AI Collaboration ; Multi-Modal Networks ; Pervasive Systems ; Predictive models ; Prototypes ; Real-time systems ; Referring Expression Comprehension ; Task analysis ; Visual Grounding ; Visualization</subject><ispartof>2023 15th International Conference on COMmunication Systems &amp; NETworkS (COMSNETS), 2023, p.231-233</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10041269$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10041269$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Weerakoon, Dulanga</creatorcontrib><creatorcontrib>Subbaraju, Vigneshwaran</creatorcontrib><creatorcontrib>Tran, Tuan</creatorcontrib><creatorcontrib>Misra, Archan</creatorcontrib><title>Demonstrating Multi-modal Human Instruction Comprehension with AR Smart Glass</title><title>2023 15th International Conference on COMmunication Systems &amp; NETworkS (COMSNETS)</title><addtitle>COMSNETS</addtitle><description>We present a multi-modal human instruction comprehension prototype for object acquisition tasks that involve verbal, visual and pointing gesture cues. 
Our prototype includes an AR smart-glass for issuing the instructions and a Jetson TX2 pervasive device for executing comprehension algorithms. With this setup, we enable on-device, computationally efficient object acquisition task comprehension with an average latency in the range of 150-330msec.</description><subject>Computational efficiency</subject><subject>Computational modeling</subject><subject>Human-AI Collaboration</subject><subject>Multi-Modal Networks</subject><subject>Pervasive Systems</subject><subject>Predictive models</subject><subject>Prototypes</subject><subject>Real-time systems</subject><subject>Referring Expression Comprehension</subject><subject>Task analysis</subject><subject>Visual Grounding</subject><subject>Visualization</subject><issn>2155-2509</issn><isbn>9781665477062</isbn><isbn>1665477067</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2023</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNo1kF1LwzAYhaMgOGb_gRcBrzuTN5-9HHVug9WBndcjaVMX6cdoUsR_r0O9OhweODwchB4oWVBKssd8X5Qvq0MpJEhYAAG2oIRwCjK7QkmmNJVScKWIhGs0AypECoJktygJ4YMQwqjOBIMZKp5cN_Qhjib6_h0XUxt92g21afFm6kyPtxc4VdEPPc6H7jy6k-vDpX36eMLLV1x2Zox43ZoQ7tBNY9rgkr-co7fn1SHfpLv9epsvd6mnWsVUZ7XmppacqIoZyZS1XHALVFqua005NIYAWOZU45SWnIOsrZONqzlUlrE5uv_d9c6543n0Pwpfx_8H2DdLAFGu</recordid><startdate>20230103</startdate><enddate>20230103</enddate><creator>Weerakoon, Dulanga</creator><creator>Subbaraju, Vigneshwaran</creator><creator>Tran, Tuan</creator><creator>Misra, Archan</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20230103</creationdate><title>Demonstrating Multi-modal Human Instruction Comprehension with AR Smart Glass</title><author>Weerakoon, Dulanga ; Subbaraju, Vigneshwaran ; Tran, Tuan ; Misra, 
Archan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i187t-89d84ad6407c3a637bb454b216b48d8142fa022b3e7fe7864426dbe6fed42cb33</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computational efficiency</topic><topic>Computational modeling</topic><topic>Human-AI Collaboration</topic><topic>Multi-Modal Networks</topic><topic>Pervasive Systems</topic><topic>Predictive models</topic><topic>Prototypes</topic><topic>Real-time systems</topic><topic>Referring Expression Comprehension</topic><topic>Task analysis</topic><topic>Visual Grounding</topic><topic>Visualization</topic><toplevel>online_resources</toplevel><creatorcontrib>Weerakoon, Dulanga</creatorcontrib><creatorcontrib>Subbaraju, Vigneshwaran</creatorcontrib><creatorcontrib>Tran, Tuan</creatorcontrib><creatorcontrib>Misra, Archan</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore (Online service)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Weerakoon, Dulanga</au><au>Subbaraju, Vigneshwaran</au><au>Tran, Tuan</au><au>Misra, Archan</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Demonstrating Multi-modal Human Instruction Comprehension with AR Smart Glass</atitle><btitle>2023 15th International Conference on COMmunication Systems &amp; NETworkS 
(COMSNETS)</btitle><stitle>COMSNETS</stitle><date>2023-01-03</date><risdate>2023</risdate><spage>231</spage><epage>233</epage><pages>231-233</pages><eissn>2155-2509</eissn><eisbn>9781665477062</eisbn><eisbn>1665477067</eisbn><abstract>We present a multi-modal human instruction comprehension prototype for object acquisition tasks that involve verbal, visual and pointing gesture cues. Our prototype includes an AR smart-glass for issuing the instructions and a Jetson TX2 pervasive device for executing comprehension algorithms. With this setup, we enable on-device, computationally efficient object acquisition task comprehension with an average latency in the range of 150-330msec.</abstract><pub>IEEE</pub><doi>10.1109/COMSNETS56262.2023.10041269</doi><tpages>3</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2155-2509
ispartof 2023 15th International Conference on COMmunication Systems & NETworkS (COMSNETS), 2023, p.231-233
issn 2155-2509
language eng
recordid cdi_ieee_primary_10041269
source IEEE Xplore All Conference Series
subjects Computational efficiency
Computational modeling
Human-AI Collaboration
Multi-Modal Networks
Pervasive Systems
Predictive models
Prototypes
Real-time systems
Referring Expression Comprehension
Task analysis
Visual Grounding
Visualization
title Demonstrating Multi-modal Human Instruction Comprehension with AR Smart Glass
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T14%3A19%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Demonstrating%20Multi-modal%20Human%20Instruction%20Comprehension%20with%20AR%20Smart%20Glass&rft.btitle=2023%2015th%20International%20Conference%20on%20COMmunication%20Systems%20&%20NETworkS%20(COMSNETS)&rft.au=Weerakoon,%20Dulanga&rft.date=2023-01-03&rft.spage=231&rft.epage=233&rft.pages=231-233&rft.eissn=2155-2509&rft_id=info:doi/10.1109/COMSNETS56262.2023.10041269&rft.eisbn=9781665477062&rft.eisbn_list=1665477067&rft_dat=%3Cieee_CHZPO%3E10041269%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i187t-89d84ad6407c3a637bb454b216b48d8142fa022b3e7fe7864426dbe6fed42cb33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10041269&rfr_iscdi=true