
SURGIVID: Annotation-Efficient Surgical Video Object Discovery

Surgical scenes convey crucial information about the quality of surgery. Pixel-wise localization of tools and anatomical structures is the first task towards deeper surgical analysis for microscopic or endoscopic surgical views. This is typically done via fully-supervised methods which are annotatio...

Bibliographic Details
Published in: arXiv.org 2024-09
Main Authors: Köksal, Çağhan; Ghazal Ghazaei; Navab, Nassir
Format: Article
Language: English
container_title arXiv.org
creator Köksal, Çağhan
Ghazal Ghazaei
Navab, Nassir
description Surgical scenes convey crucial information about the quality of surgery. Pixel-wise localization of tools and anatomical structures is the first task towards deeper surgical analysis for microscopic or endoscopic surgical views. This is typically done via fully-supervised methods, which are annotation-greedy and, in several cases, demand medical expertise. Considering the profusion of surgical videos obtained through standardized surgical workflows, we propose an annotation-efficient framework for the semantic segmentation of surgical scenes. We employ image-based self-supervised object discovery to identify the most salient tools and anatomical structures in surgical videos. These proposals are further refined within a minimally supervised fine-tuning step. Our unsupervised setup, reinforced with only 36 annotation labels, shows localization performance comparable to that of fully-supervised segmentation models. Further, leveraging surgical phase labels as weak labels can better guide model attention towards surgical tools, leading to a \(\sim 2\%\) improvement in tool localization. Extensive ablation studies on the CaDIS dataset validate the effectiveness of our proposed solution in discovering relevant surgical objects with minimal or no supervision.
format article
fullrecord sourceid: proquest
sourcerecordid: 3104277524
recordtype: article
publisher: Ithaca: Cornell University Library, arXiv.org
creationdate: 2024-09-12
rights: 2024. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa: free_for_read
linktohtml: https://www.proquest.com/docview/3104277524?pq-origsite=primo
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-09
issn 2331-8422
language eng
recordid cdi_proquest_journals_3104277524
source Publicly Available Content Database
subjects Ablation
Annotations
Image annotation
Image segmentation
Labels
Localization
Semantic segmentation
Surgical instruments
Video
title SURGIVID: Annotation-Efficient Surgical Video Object Discovery
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T15%3A47%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=SURGIVID:%20Annotation-Efficient%20Surgical%20Video%20Object%20Discovery&rft.jtitle=arXiv.org&rft.au=K%C3%B6ksal,%20%C3%87a%C4%9Fhan&rft.date=2024-09-12&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3104277524%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_31042775243%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3104277524&rft_id=info:pmid/&rfr_iscdi=true