
Enhancing the reliability of deep learning-based head and neck tumour segmentation using uncertainty estimation with multi-modal images

Bibliographic Details
Published in: Physics in medicine & biology, 2024-08, Vol. 69 (16), p. 165018
Main Authors: Ren, Jintao, Teuwen, Jonas, Nijkamp, Jasper, Rasmussen, Mathis, Gouw, Zeno, Grau Eriksen, Jesper, Sonke, Jan-Jakob, Korreman, Stine
Format: Article
Language: English
container_issue 16
container_start_page 165018
container_title Physics in medicine & biology
container_volume 69
creator Ren, Jintao
Teuwen, Jonas
Nijkamp, Jasper
Rasmussen, Mathis
Gouw, Zeno
Grau Eriksen, Jesper
Sonke, Jan-Jakob
Korreman, Stine
description Deep learning shows promise in autosegmentation of head and neck cancer (HNC) primary tumours (GTV-T) and nodal metastases (GTV-N). However, errors such as including non-tumour regions or missing nodal metastases still occur. Conventional methods often make overconfident predictions, compromising reliability. Incorporating uncertainty estimation, which provides calibrated confidence intervals, can address this issue. Our aim was to investigate the efficacy of various uncertainty estimation methods in improving segmentation reliability. We evaluated their confidence levels in voxel predictions and their ability to reveal potential segmentation errors. Approach. We retrospectively collected data from 567 HNC patients with diverse cancer sites and multi-modality images (CT, PET, T1- and T2-weighted MRI), along with their clinical GTV-T/N delineations. Using the nnUNet 3D segmentation pipeline, we compared seven uncertainty estimation methods, evaluating them on segmentation accuracy (Dice similarity coefficient, DSC), confidence calibration (Expected Calibration Error, ECE), and their ability to reveal segmentation errors (Uncertainty-Error overlap using DSC, UE-DSC). Main Results. On the hold-out test dataset (n=97), the median DSC for GTV-T and GTV-N segmentation varied little across the uncertainty estimation methods, ranging from 0.73 to 0.76 and from 0.78 to 0.80, respectively. In contrast, the median ECE varied more widely, from 0.30 to 0.12 for GTV-T and from 0.25 to 0.09 for GTV-N. Similarly, the median UE-DSC ranged broadly, from 0.21 to 0.38 for GTV-T and from 0.22 to 0.36 for GTV-N. A probabilistic network, PhiSeg, consistently achieved the best ECE and UE-DSC. Significance. Our study highlights the importance of uncertainty estimation in enhancing the reliability of deep learning for autosegmentation of HNC GTV.
The results show that while segmentation accuracy can be similar across methods, their reliability, measured by calibration error and uncertainty-error overlap, varies significantly. Used with visualisation maps, these methods may effectively pinpoint uncertainties and potential errors at the voxel level.
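The description reports results in terms of DSC, ECE and UE-DSC. The sketch below shows one common way to compute each metric on a flattened volume in plain Python. It is not the authors' code. The 0.5 probability threshold, the 10 equal-width confidence bins, the top-label form of ECE, the use of binary predictive entropy as the voxel-wise uncertainty map, and the 0.5-nat uncertainty threshold are all assumptions, because this record does not say how the paper binarises or bins its maps. Helper names such as `binarise` and `ue_dsc` are hypothetical.

```python
"""Illustrative voxel-level metrics: DSC, ECE and a UE-DSC-style overlap.

Assumptions (not taken from the paper): volumes are flattened to Python lists,
a voxel is predicted as tumour when its foreground probability is >= 0.5,
ECE uses 10 equal-width confidence bins, and the uncertainty map is the
binary entropy of the foreground probability thresholded at 0.5 nats.
"""
import math
from typing import List, Sequence, Tuple


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for binary masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 1.0 if total == 0 else 2.0 * inter / total


def binarise(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Hard segmentation: 1 where the foreground probability >= threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def binary_entropy(p: float) -> float:
    """Voxel-wise predictive entropy (nats) of a foreground probability p."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Top-label ECE: bin voxels by confidence max(p, 1 - p) and sum the
    |accuracy - mean confidence| gap of each bin, weighted by bin size."""
    preds = binarise(probs)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, pred, y in zip(probs, preds, labels):
        conf = max(p, 1.0 - p)
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, int(pred == y)))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(accuracy - mean_conf)
    return ece


def ue_dsc(probs: Sequence[float], labels: Sequence[int],
           unc_threshold: float = 0.5) -> float:
    """Dice overlap between the thresholded uncertainty map and the error map
    (voxels where the hard prediction disagrees with the reference label)."""
    errors = [int(pred != y) for pred, y in zip(binarise(probs), labels)]
    uncertain = [int(binary_entropy(p) >= unc_threshold) for p in probs]
    return dice(uncertain, errors)


if __name__ == "__main__":
    # Toy "volume" of six voxels: reference labels and predicted probabilities.
    labels = [1, 1, 1, 0, 0, 0]
    probs = [0.9, 0.7, 0.4, 0.1, 0.3, 0.6]
    print("DSC   ", round(dice(binarise(probs), labels), 3))
    print("ECE   ", round(expected_calibration_error(probs, labels), 3))
    print("UE-DSC", round(ue_dsc(probs, labels), 3))
```

On this toy six-voxel volume the sketch gives a DSC of about 0.667, an ECE of about 0.333 and a UE-DSC of about 0.667. Because UE-DSC depends directly on the chosen uncertainty threshold, comparing methods fairly would require a consistent rule for setting it.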
doi_str_mv 10.1088/1361-6560/ad682d
format article
publisher IOP Publishing, England
date 2024-08-21
page_count 14
abbreviated_title PMB
Phys. Med. Biol
identifier EISSN: 1361-6560
identifier PMID: 39059432
identifier CODEN: PHMBA7
rights 2024 The Author(s). Published by IOP Publishing Ltd
2024 Institute of Physics and Engineering in Medicine
review_status peer reviewed
access free to read
orcidid 0000-0001-5155-5274
0000-0002-1558-7196
0000-0001-6206-6839
0000-0002-1825-1428
0000-0001-7523-5881
0000-0002-1145-6033
0000-0002-3523-382X
0000-0002-7853-3531
pubmed https://www.ncbi.nlm.nih.gov/pubmed/39059432
identifier ISSN: 0031-9155
ispartof Physics in medicine & biology, 2024-08, Vol.69 (16), p.165018
issn 0031-9155
1361-6560
1361-6560
language eng
recordid cdi_crossref_primary_10_1088_1361_6560_ad682d
source Institute of Physics
subjects deep learning
gross tumour volume
head and neck cancer
radiotherapy
tumour segmentation
uncertainty estimation
uncertainty quantification
title Enhancing the reliability of deep learning-based head and neck tumour segmentation using uncertainty estimation with multi-modal images
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T07%3A41%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20the%20reliability%20of%20deep%20learning-based%20head%20and%20neck%20tumour%20segmentation%20using%20uncertainty%20estimation%20with%20multi-modal%20images&rft.jtitle=Physics%20in%20medicine%20&%20biology&rft.au=Ren,%20Jintao&rft.date=2024-08-21&rft.volume=69&rft.issue=16&rft.spage=165018&rft.pages=165018-&rft.issn=0031-9155&rft.eissn=1361-6560&rft.coden=PHMBA7&rft_id=info:doi/10.1088/1361-6560/ad682d&rft_dat=%3Cproquest_cross%3E3085114237%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c219t-e4644039dd06095974394071e02d986e31db4da3e94d32ce7a9868a05c0b5e603%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3085114237&rft_id=info:pmid/39059432&rfr_iscdi=true