
A new dataset for measuring the performance of blood vessel segmentation methods under distribution shifts

Bibliographic Details
Published in:	arXiv.org 2024-04
Main Authors:	Matheus Viana da Silva, Natália de Carvalho Santos, Julie Ouellette, Baptiste Lacoste, Cesar Henrique Comin
Format:	Article
Language:	English
Subjects:	Algorithms; Blood vessels; Datasets; Image acquisition; Image annotation; Image segmentation; Machine learning; Medical imaging; Methodology; Outliers (statistics); Sampling methods; Supervised learning; Training

Description
Summary:	Creating a dataset for training supervised machine learning algorithms can be a demanding task. This is especially true for medical image segmentation, since one or more specialists are usually required for image annotation, and creating ground truth labels for a single image can take up to several hours. In addition, the annotated samples must represent well both the different conditions that might affect the imaged tissues and possible changes in the image acquisition process. This can only be achieved by considering samples that are typical of the dataset as well as atypical, or even outlier, samples. We introduce VessMAP, a heterogeneous blood vessel segmentation dataset acquired by carefully sampling relevant images from a larger non-annotated dataset. A methodology was developed to select both prototypical and atypical samples from the base dataset, thus defining an assorted set of images that can be used for measuring the performance of segmentation algorithms on samples that are highly distinct from each other. To demonstrate the potential of the new dataset, we show that the validation performance of a neural network changes significantly depending on the splits used for training the network.
ISSN:	2331-8422
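
The summary mentions selecting both prototypical and atypical samples from a larger non-annotated dataset but does not say how. The sketch below is only an illustration of the general idea, not the authors' methodology: it assumes each image has already been reduced to a numeric feature vector, and it treats the samples closest to the feature centroid as "typical" and the farthest ones as "atypical". The function name `select_typical_and_atypical` and its parameters are hypothetical.

```python
import numpy as np


def select_typical_and_atypical(features, n_typical, n_atypical):
    """Return indices of the samples nearest to and farthest from the centroid.

    Assumption (not from the paper): distance to the mean feature vector
    is used as a simple proxy for how typical a sample is.
    """
    features = np.asarray(features, dtype=float)
    if n_typical + n_atypical > len(features):
        raise ValueError("requested more samples than are available")

    # Euclidean distance of every sample to the mean feature vector.
    centroid = features.mean(axis=0)
    distances = np.linalg.norm(features - centroid, axis=1)

    # Sort sample indices from most central to most distant.
    order = np.argsort(distances, kind="stable")
    typical = order[:n_typical]
    # Slice from an explicit start so that n_atypical == 0 yields an empty array.
    atypical = order[len(order) - n_atypical:]
    return typical, atypical


# Synthetic example: 200 random feature vectors, with three shifted far away.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
features[:3] += 25.0  # artificial outliers at indices 0, 1 and 2
typical, atypical = select_typical_and_atypical(features, n_typical=5, n_atypical=3)
```

In this synthetic example the three shifted rows are the ones returned as atypical. A real selection procedure would likely need features tailored to vessel images and a more robust notion of typicality than distance to the mean, which outliers themselves can shift.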