Multi-Modal Learning from Video, Eye Tracking, and Pupillometry for Operator Skill Characterization in Clinical Fetal Ultrasound

This paper presents a novel multi-modal learning approach for automated skill characterization of obstetric ultrasound operators using heterogeneous spatio-temporal sensory cues, namely, scan video, eye-tracking data, and pupillometric data, acquired in the clinical environment. We address pertinent...

Bibliographic Details
Main Authors: Sharma, Harshita; Drukker, Lior; Papageorghiou, Aris T.; Noble, J. Alison
Format: Conference Proceeding
Language: English
container_end_page 1649
container_issue
container_start_page 1646
container_title
container_volume 2021
creator Sharma, Harshita
Drukker, Lior
Papageorghiou, Aris T.
Noble, J. Alison
description This paper presents a novel multi-modal learning approach for automated skill characterization of obstetric ultrasound operators using heterogeneous spatio-temporal sensory cues, namely, scan video, eye-tracking data, and pupillometric data, acquired in the clinical environment. We address pertinent challenges such as combining heterogeneous, small-scale and variable-length sequential datasets, to learn deep convolutional neural networks in real-world scenarios. We propose spatial encoding for multi-modal analysis using sonography standard plane images, spatial gaze maps, gaze trajectory images, and pupillary response images. We present and compare five multi-modal learning network architectures using late, intermediate, hybrid, and tensor fusion. We build models for the Heart and the Brain scanning tasks, and performance evaluation suggests that multi-modal learning networks outperform uni-modal networks, with the best-performing model achieving accuracies of 82.4% (Brain task) and 76.4% (Heart task) for the operator skill classification problem.
doi_str_mv 10.1109/ISBI48211.2021.9433863
format conference_proceeding
fullrecord <record><control><sourceid>proquest_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_9433863</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9433863</ieee_id><sourcerecordid>2563426276</sourcerecordid><originalsourceid>FETCH-LOGICAL-i362t-4f1eee0dc72541d61e02af33b787856f537e884650ceb6af1fb501f610911b7b3</originalsourceid><addsrcrecordid>eNpVUU1P3DAQdSuqQim_oBLykQPZevyZXCqVFbQrLaIS0GvkJBNwceytk1TanvjpWGKLyhxmRnpP780HIcfAFgCs-ry6PlvJkgMsOOOwqKQQpRZvyAfQWkngUsNbcgCVVEUpFd_b9abi5T45GsdfLIeRUjD5nuwLKUFUQhyQx8vZT664jJ31dI02BRfuaJ_iQH-6DuMpPd8ivUm2fcjAKbWhoz_mjfM-DjilLe1jolcbTHbKzfVDBujy3mb-hMn9tZOLgbpAl94F12aPC5xyvvVTsmOcQ_eRvOutH_FoVw_J7cX5zfJ7sb76tlp-XRdOaD4VsgdEZF1reF6304CM216IxpSmVLpXwmBZSq1Yi422PfSNYtDrfDuAxjTikHx51t3MzYBdiyFP4OtNcoNN2zpaV79Ggruv7-Kf2mgAxXkWONkJpPh7xnGqBze26L0NGOex5koLyTU3OlOP__d6Mfl39kz49ExweakXePdU8QQvHpPk</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype><pqid>2563426276</pqid></control><display><type>conference_proceeding</type><title>Multi-Modal Learning from Video, Eye Tracking, and Pupillometry for Operator Skill Characterization in Clinical Fetal Ultrasound</title><source>IEEE Xplore All Conference Series</source><creator>Sharma, Harshita ; Drukker, Lior ; Papageorghiou, Aris T. ; Noble, J. Alison</creator><creatorcontrib>Sharma, Harshita ; Drukker, Lior ; Papageorghiou, Aris T. ; Noble, J. Alison</creatorcontrib><description>This paper presents a novel multi-modal learning approach for automated skill characterization of obstetric ultrasound operators using heterogeneous spatio-temporal sensory cues, namely, scan video, eye-tracking data, and pupillometric data, acquired in the clinical environment. We address pertinent challenges such as combining heterogeneous, small-scale and variable-length sequential datasets, to learn deep convolutional neural networks in real-world scenarios. 
We propose spatial encoding for multi-modal analysis using sonography standard plane images, spatial gaze maps, gaze trajectory images, and pupillary response images. We present and compare five multi-modal learning network architectures using late, intermediate, hybrid, and tensor fusion. We build models for the Heart and the Brain scanning tasks, and performance evaluation suggests that multi-modal learning networks outperform uni-modal networks, with the best-performing model achieving accuracies of 82.4% (Brain task) and 76.4% (Heart task) for the operator skill classification problem.</description><identifier>ISSN: 1945-7928</identifier><identifier>EISSN: 1945-8452</identifier><identifier>EISBN: 1665412461</identifier><identifier>EISBN: 9781665412469</identifier><identifier>DOI: 10.1109/ISBI48211.2021.9433863</identifier><identifier>PMID: 34413933</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Biological system modeling ; Brain modeling ; convolutional neural networks ; eye tracking ; Heart ; Multi-modal learning ; Network architecture ; Performance evaluation ; pupillometry ; Tensors ; Ultrasonic imaging ; ultrasound</subject><ispartof>2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021, Vol.2021, p.1646-1649</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9433863$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,309,310,314,780,784,789,790,885,23930,23931,25140,27924,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9433863$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34413933$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Sharma, Harshita</creatorcontrib><creatorcontrib>Drukker, Lior</creatorcontrib><creatorcontrib>Papageorghiou, Aris T.</creatorcontrib><creatorcontrib>Noble, J. Alison</creatorcontrib><title>Multi-Modal Learning from Video, Eye Tracking, and Pupillometry for Operator Skill Characterization in Clinical Fetal Ultrasound</title><title>2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI)</title><addtitle>ISBI</addtitle><addtitle>Proc IEEE Int Symp Biomed Imaging</addtitle><description>This paper presents a novel multi-modal learning approach for automated skill characterization of obstetric ultrasound operators using heterogeneous spatio-temporal sensory cues, namely, scan video, eye-tracking data, and pupillometric data, acquired in the clinical environment. We address pertinent challenges such as combining heterogeneous, small-scale and variable-length sequential datasets, to learn deep convolutional neural networks in real-world scenarios. We propose spatial encoding for multi-modal analysis using sonography standard plane images, spatial gaze maps, gaze trajectory images, and pupillary response images. We present and compare five multi-modal learning network architectures using late, intermediate, hybrid, and tensor fusion. 
We build models for the Heart and the Brain scanning tasks, and performance evaluation suggests that multi-modal learning networks outperform uni-modal networks, with the best-performing model achieving accuracies of 82.4% (Brain task) and 76.4% (Heart task) for the operator skill classification problem.</description><subject>Biological system modeling</subject><subject>Brain modeling</subject><subject>convolutional neural networks</subject><subject>eye tracking</subject><subject>Heart</subject><subject>Multi-modal learning</subject><subject>Network architecture</subject><subject>Performance evaluation</subject><subject>pupillometry</subject><subject>Tensors</subject><subject>Ultrasonic imaging</subject><subject>ultrasound</subject><issn>1945-7928</issn><issn>1945-8452</issn><isbn>1665412461</isbn><isbn>9781665412469</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2021</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNpVUU1P3DAQdSuqQim_oBLykQPZevyZXCqVFbQrLaIS0GvkJBNwceytk1TanvjpWGKLyhxmRnpP780HIcfAFgCs-ry6PlvJkgMsOOOwqKQQpRZvyAfQWkngUsNbcgCVVEUpFd_b9abi5T45GsdfLIeRUjD5nuwLKUFUQhyQx8vZT664jJ31dI02BRfuaJ_iQH-6DuMpPd8ivUm2fcjAKbWhoz_mjfM-DjilLe1jolcbTHbKzfVDBujy3mb-hMn9tZOLgbpAl94F12aPC5xyvvVTsmOcQ_eRvOutH_FoVw_J7cX5zfJ7sb76tlp-XRdOaD4VsgdEZF1reF6304CM216IxpSmVLpXwmBZSq1Yi422PfSNYtDrfDuAxjTikHx51t3MzYBdiyFP4OtNcoNN2zpaV79Ggruv7-Kf2mgAxXkWONkJpPh7xnGqBze26L0NGOex5koLyTU3OlOP__d6Mfl39kz49ExweakXePdU8QQvHpPk</recordid><startdate>20210401</startdate><enddate>20210401</enddate><creator>Sharma, Harshita</creator><creator>Drukker, Lior</creator><creator>Papageorghiou, Aris T.</creator><creator>Noble, J. 
Alison</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>NPM</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20210401</creationdate><title>Multi-Modal Learning from Video, Eye Tracking, and Pupillometry for Operator Skill Characterization in Clinical Fetal Ultrasound</title><author>Sharma, Harshita ; Drukker, Lior ; Papageorghiou, Aris T. ; Noble, J. Alison</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i362t-4f1eee0dc72541d61e02af33b787856f537e884650ceb6af1fb501f610911b7b3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Biological system modeling</topic><topic>Brain modeling</topic><topic>convolutional neural networks</topic><topic>eye tracking</topic><topic>Heart</topic><topic>Multi-modal learning</topic><topic>Network architecture</topic><topic>Performance evaluation</topic><topic>pupillometry</topic><topic>Tensors</topic><topic>Ultrasonic imaging</topic><topic>ultrasound</topic><toplevel>online_resources</toplevel><creatorcontrib>Sharma, Harshita</creatorcontrib><creatorcontrib>Drukker, Lior</creatorcontrib><creatorcontrib>Papageorghiou, Aris T.</creatorcontrib><creatorcontrib>Noble, J. 
Alison</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>PubMed</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Sharma, Harshita</au><au>Drukker, Lior</au><au>Papageorghiou, Aris T.</au><au>Noble, J. Alison</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Multi-Modal Learning from Video, Eye Tracking, and Pupillometry for Operator Skill Characterization in Clinical Fetal Ultrasound</atitle><btitle>2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI)</btitle><stitle>ISBI</stitle><addtitle>Proc IEEE Int Symp Biomed Imaging</addtitle><date>2021-04-01</date><risdate>2021</risdate><volume>2021</volume><spage>1646</spage><epage>1649</epage><pages>1646-1649</pages><issn>1945-7928</issn><eissn>1945-8452</eissn><eisbn>1665412461</eisbn><eisbn>9781665412469</eisbn><abstract>This paper presents a novel multi-modal learning approach for automated skill characterization of obstetric ultrasound operators using heterogeneous spatio-temporal sensory cues, namely, scan video, eye-tracking data, and pupillometric data, acquired in the clinical environment. We address pertinent challenges such as combining heterogeneous, small-scale and variable-length sequential datasets, to learn deep convolutional neural networks in real-world scenarios. We propose spatial encoding for multi-modal analysis using sonography standard plane images, spatial gaze maps, gaze trajectory images, and pupillary response images. 
We present and compare five multi-modal learning network architectures using late, intermediate, hybrid, and tensor fusion. We build models for the Heart and the Brain scanning tasks, and performance evaluation suggests that multi-modal learning networks outperform uni-modal networks, with the best-performing model achieving accuracies of 82.4% (Brain task) and 76.4% (Heart task) for the operator skill classification problem.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>34413933</pmid><doi>10.1109/ISBI48211.2021.9433863</doi><tpages>4</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1945-7928
ispartof 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021, Vol.2021, p.1646-1649
issn 1945-7928
1945-8452
language eng
recordid cdi_ieee_primary_9433863
source IEEE Xplore All Conference Series
subjects Biological system modeling
Brain modeling
convolutional neural networks
eye tracking
Heart
Multi-modal learning
Network architecture
Performance evaluation
pupillometry
Tensors
Ultrasonic imaging
ultrasound
title Multi-Modal Learning from Video, Eye Tracking, and Pupillometry for Operator Skill Characterization in Clinical Fetal Ultrasound
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T18%3A35%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Multi-Modal%20Learning%20from%20Video,%20Eye%20Tracking,%20and%20Pupillometry%20for%20Operator%20Skill%20Characterization%20in%20Clinical%20Fetal%20Ultrasound&rft.btitle=2021%20IEEE%2018th%20International%20Symposium%20on%20Biomedical%20Imaging%20(ISBI)&rft.au=Sharma,%20Harshita&rft.date=2021-04-01&rft.volume=2021&rft.spage=1646&rft.epage=1649&rft.pages=1646-1649&rft.issn=1945-7928&rft.eissn=1945-8452&rft_id=info:doi/10.1109/ISBI48211.2021.9433863&rft.eisbn=1665412461&rft.eisbn_list=9781665412469&rft_dat=%3Cproquest_CHZPO%3E2563426276%3C/proquest_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i362t-4f1eee0dc72541d61e02af33b787856f537e884650ceb6af1fb501f610911b7b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2563426276&rft_id=info:pmid/34413933&rft_ieee_id=9433863&rfr_iscdi=true