Synthetic Bone Marrow Smears Are a Privacy-Preserving Substitute for Developing Accurate Leukemia Classification Models in Hematological Microscopy

Bibliographic Details
Published in: Blood 2024-11, Vol.144, p.1057-1057
Main Authors: Eckardt, Jan-Niklas; Srivastava, Ishan; Wang, Zizhe; Winter, Susann; Schmittmann, Tim; Riechert, Sebastian; Gediga, Miriam Eva Helena; Sulaiman, Anas Shekh; Schneider, Martin M. K.; Schulze, Freya; Thiede, Christian; Sockel, Katja; Kroschinsky, Frank P.; Röllig, Christoph; Bornhäuser, Martin; Wendt, Karsten; Middeke, Jan Moritz
Format: Article
Language: English
Description
Summary: The term ‘big data’ has become a buzzword in the medical literature, yet medical data remains largely inaccessible due to insufficient digitization, proprietary restrictions, and privacy concerns. This inaccessibility is particularly detrimental for developing and validating deep learning models for cancer detection in rare diseases like acute myeloid leukemia (AML) and acute promyelocytic leukemia (APL) using bone marrow smears. While conventional methods for sample size augmentation, such as geometric or photometric transformations, can boost training set sizes, we hypothesized that using synthetically generated bone marrow smear images for model training can enhance performance while preserving patient privacy, thereby facilitating unrestricted image data sharing. We digitized bone marrow smears of 1251 AML and 51 APL patients as well as 236 healthy bone marrow donors by capturing field-of-view images at a resolution of 2560 × 1920 pixels, covering an area of 171 × 128 µm. StyleGAN2-ADA, a generative adversarial network, was used with initialized features for shape and color generation to generate bone marrow smear image data for AML, APL, and healthy donors. Both real and synthetic image data were then fed at varying proportions into a convolutional neural network classification model tasked with disease detection to determine the ratio of real-to-synthetic images needed to train an accurate disease detection model. Imbalances in data set sizes were accommodated using standard image augmentation techniques such as rotation, mirroring, or linear transformations. Hyperparameter search was performed using the Optuna framework. To evaluate the quality of synthetic images, a visual Turing test was conducted with 14 hematologists. Using a web application that displayed one image at a time (either real or synthetic), participants were asked to distinguish between the two.
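The real-to-synthetic mixing described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mixing_schedule` and `build_training_set` are hypothetical, the 10% step size is taken from the protocol reported later in the abstract, and keeping the total training set size fixed while swapping real for synthetic samples is an assumption. The abstract reports cohort sizes per patient, so the items below are generic sample identifiers rather than actual images.

```python
import random

# Cohort sizes as reported in the abstract (patients and donors).
REAL_COUNTS = {"AML": 1251, "APL": 51, "donor": 236}


def mixing_schedule(step=10):
    """Yield (real %, synthetic %) pairs from 100/0 down to 0/100."""
    for synthetic_pct in range(0, 101, step):
        yield 100 - synthetic_pct, synthetic_pct


def build_training_set(real, synthetic, synthetic_pct, seed=0):
    """Replace a share of the real samples with synthetic ones.

    Assumption: the total number of training samples stays equal to the
    number of available real samples at every mixing step.
    """
    rng = random.Random(seed)
    total = len(real)
    n_synthetic = round(total * synthetic_pct / 100)
    n_real = total - n_synthetic
    if n_synthetic > len(synthetic):
        raise ValueError("not enough synthetic samples for this proportion")
    return rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)


if __name__ == "__main__":
    # Hypothetical identifiers standing in for APL samples.
    real_apl = [f"real_apl_{i}" for i in range(REAL_COUNTS["APL"])]
    synthetic_apl = [f"synthetic_apl_{i}" for i in range(REAL_COUNTS["APL"])]
    for real_pct, synthetic_pct in mixing_schedule():
        train = build_training_set(real_apl, synthetic_apl, synthetic_pct)
        n_synthetic = sum(name.startswith("synthetic") for name in train)
        print(real_pct, synthetic_pct, len(train), n_synthetic)
```

Each mixed set produced this way would then be used to train and evaluate one classifier per binary task, so that performance can be compared across proportions.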
The visual Turing test yielded an area-under-the-curve (AUC) of 0.63, indicating that experts could not reliably differentiate synthetic images from real bone marrow smears. Next, classification performance for binary decisions (AML vs. donors, APL vs. donors, AML vs. APL) was assessed, starting with all (100%) available real samples (1251 AML, 51 APL, 236 donors) and no (0%) synthetic samples. The proportion of synthetic images was then increased in 10% increments, with a corresponding decrease in real images, until training was performed solely on synthetic samples. Starting with real samples only, we obtained
ISSN: 0006-4971
DOI: 10.1182/blood-2024-198486