Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions

Bibliographic Details
Published in:	arXiv.org 2021-04
Main Authors:	Abhijit Guha Roy; Ren, Jie; Azizi, Shekoofeh; Loh, Aaron; Natarajan, Vivek; Mustafa, Basil; Pawlowski, Nick; Freyberg, Jan; Liu, Yuan; Beaver, Zach; Vo, Nam; Bui, Peggy; Winter, Samantha; MacWilliams, Patricia; Corrado, Greg S; Telang, Umesh; Liu, Yun; Cemgil, Taylan; Karthikesalingam, Alan; Lakshminarayanan, Balaji; Winkens, Jim
Format:	Article
Language:	English
Subjects:	Classification; Classifiers; Data analysis; Dermatology; Inliers (landforms); Machine learning; Outliers (statistics); Risk levels; Subgroups; System effectiveness; Training

Description
Summary:	We develop and rigorously evaluate a deep-learning-based system that can accurately classify skin conditions while detecting rare conditions for which there is not enough data available to train a confident classifier. We frame this task as an out-of-distribution (OOD) detection problem. Our novel approach, hierarchical outlier detection (HOD), assigns multiple abstention classes, one for each training outlier class, and jointly performs a coarse classification of inliers vs. outliers along with a fine-grained classification of the individual classes. We demonstrate the effectiveness of the HOD loss in conjunction with modern representation learning approaches (BiT, SimCLR, MICLe) and explore different ensembling strategies for further improving the results. We perform an extensive subgroup analysis over conditions of varying risk levels and different skin types to investigate how OOD detection performance changes across subgroups, and demonstrate the gains of our framework over baselines. Finally, we introduce a cost metric to approximate downstream clinical impact. We use this cost metric to compare the proposed method against a baseline system, thereby making a stronger case for the overall system effectiveness in a real-world deployment scenario.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2104.03829
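
Illustrative note: the summary describes a loss that combines a fine-grained classification over individual classes with a coarse inlier-vs-outlier decision. The sketch below shows one way such a two-level loss could be written. It is not taken from the paper: the function names, the `coarse_weight` value, the choice to sum softmax mass over the outlier classes, and the use of that mass as an OOD score are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hod_style_loss(logits, labels, num_inlier, coarse_weight=0.1, eps=1e-12):
    """Hypothetical two-level loss (assumed form, not the paper's definition).

    logits: array of shape (batch, num_inlier + num_outlier).
    labels: integer class ids; ids >= num_inlier are training outlier classes.
    """
    probs = softmax(logits)
    rows = np.arange(len(labels))

    # Fine-grained term: cross-entropy over every individual class.
    fine = -np.log(probs[rows, labels] + eps)

    # Coarse term: probability mass on the inlier group vs. the outlier group.
    p_in = probs[:, :num_inlier].sum(axis=-1)
    p_out = probs[:, num_inlier:].sum(axis=-1)
    is_outlier = labels >= num_inlier
    coarse = -np.log(np.where(is_outlier, p_out, p_in) + eps)

    return float((fine + coarse_weight * coarse).mean())


def ood_score(logits, num_inlier):
    """Assumed OOD score: total softmax mass on the outlier classes."""
    return softmax(logits)[:, num_inlier:].sum(axis=-1)


# Toy usage: 2 inlier classes (ids 0-1) and 2 training outlier classes (ids 2-3).
toy_logits = np.array([[5.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 5.0, 0.0]])
toy_labels = np.array([0, 2])
print(hod_style_loss(toy_logits, toy_labels, num_inlier=2))
print(ood_score(toy_logits, num_inlier=2))
```

At inference time, under these assumptions, an input would be flagged as out-of-distribution when its score exceeds a threshold chosen on held-out data.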