
Crowdsourcing Dermatology Images with Google Search Ads: Creating a Real-World Skin Condition Dataset

Bibliographic Details
Published in: arXiv.org, 2024-02
Main Authors: Ward, Abbi, Li, Jimmy, Wang, Julie, Sriram Lakshminarasimhan, Carrick, Ashley, Campana, Bilson, Hartford, Jay, Pradeep Kumar S, Tiyasirichokchai, Tiya, Virmani, Sunny, Wong, Renee, Matias, Yossi, Corrado, Greg S, Webster, Dale R, Siegel, Dawn, Lin, Steven, Ko, Justin, Karthikesalingam, Alan, Semturs, Christopher, Rao, Pooja
Format: Article
Language: English
Description
Summary:
Background: Health datasets from clinical sources do not reflect the breadth and diversity of disease in the real world, which affects research, medical education, and the development of artificial intelligence (AI) tools. Dermatology is a suitable area in which to develop and test a new, scalable method for creating representative health datasets.

Methods: We used Google Search advertisements to invite contributions to an open-access dataset of images of dermatology conditions, together with demographic and symptom information. With informed contributor consent, we describe and release this dataset, which contains 10,408 images from 5,033 contributions by internet users in the United States, collected over 8 months starting in March 2023. The dataset includes dermatologist condition labels as well as estimated Fitzpatrick Skin Type (eFST) and estimated Monk Skin Tone (eMST) labels for the images.

Results: We received a median of 22 submissions/day (IQR 14-30). Female (66.72%) and younger (52% under age 40) contributors were more highly represented in the dataset than in the US population, and 32.6% of contributors reported a non-White racial or ethnic identity. Over 97.5% of contributions were genuine images of skin conditions. Dermatologist confidence in assigning a differential diagnosis increased with the number of available variables, and showed a weaker correlation with image sharpness (Spearman's P values
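The summary reports daily submissions as a median with an interquartile range and relates dermatologist confidence to other variables using Spearman statistics. As a rough illustration of what those statistics compute, the sketch below implements Spearman's rank correlation coefficient (the Pearson correlation of tie-averaged ranks) and a median/IQR summary using only the Python standard library. It is not the authors' analysis code: the function names are hypothetical, the quartile convention (`statistics.quantiles` with its default "exclusive" method) is an assumption that may differ from the paper's, and only the coefficient is computed, not the P values the summary mentions (a library routine such as `scipy.stats.spearmanr` returns both).

```python
from statistics import median, quantiles


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def median_iqr(counts):
    """Median and (Q1, Q3) of counts, e.g. submissions per day."""
    q1, _, q3 = quantiles(counts, n=4)  # default method="exclusive"
    return median(counts), q1, q3
```

Because Spearman's rho depends only on ranks, it captures any monotonic relationship (for example, confidence rising with sharpness) without assuming linearity, and averaging ranks for ties keeps the coefficient well defined for ordinal scores such as confidence ratings.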
ISSN: 2331-8422
DOI: 10.48550/arxiv.2402.18545