CT-Less Whole-Body Bone Segmentation of PET Images Using a Multimodal Deep Learning Network
Published in: IEEE Journal of Biomedical and Health Informatics, 2024-11, pp. 1-16
Main Authors: , , , , , , , ,
Format: Article
Language: English
Summary: In bone cancer imaging, positron emission tomography (PET) is ideal for the diagnosis and staging of bone cancers due to its high sensitivity to malignant tumors. The diagnosis of bone cancer requires tumor analysis and localization, where accurate and automated whole-body bone segmentation (WBBS) is often needed. Current WBBS for PET imaging is based on paired Computed Tomography (CT) images. However, mismatches between CT and PET images often occur due to patient motion, which leads to erroneous bone segmentation and thus to inaccurate tumor analysis. Furthermore, in some instances CT images are unavailable for WBBS. In this work, we propose a novel multimodal fusion network (MMF-Net) for WBBS of PET images, without the need for CT images. Specifically, the tracer activity (λ-MLAA), attenuation map (μ-MLAA), and synthetic attenuation map (μ-DL) images are introduced into the training data. We first design a multi-encoder structure to fully learn modality-specific encoding representations of the three PET modality images through independent encoding branches. Then, we propose a multimodal fusion module in the decoder to further integrate the complementary information across the three modalities. Additionally, we introduce revised convolution units, SE (Squeeze-and-Excitation) Normalization, and deep supervision to improve segmentation performance. Extensive comparisons and ablation experiments, using 130 whole-body PET image datasets, show promising results, with Dice similarity coefficient (DSC) values of …
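The summary describes the architecture only at a high level: three modality-specific encoders (for λ-MLAA, μ-MLAA, and μ-DL) whose per-scale features are fused in a shared decoder, with deep supervision on the decoder outputs. The PyTorch sketch below illustrates that general pattern under stated assumptions; the fusion-by-concatenation, channel widths, and names (`EncoderBranch`, `MMFNetSketch`) are illustrative choices, not the authors' released implementation or their actual fusion module.

```python
# Minimal sketch of a three-branch encoder / fusion-decoder segmentation net,
# assuming 3D PET volumes with spatial dims divisible by 8. Illustrative only.
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3x3 convolutions with instance norm and LeakyReLU."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.LeakyReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.LeakyReLU(inplace=True),
    )


class EncoderBranch(nn.Module):
    """Independent encoder for one modality (lambda-MLAA, mu-MLAA, or mu-DL)."""

    def __init__(self, widths=(16, 32, 64, 128)):
        super().__init__()
        chans = [1, *widths]
        self.stages = nn.ModuleList(
            conv_block(chans[i], chans[i + 1]) for i in range(len(widths))
        )
        self.pool = nn.MaxPool3d(2)

    def forward(self, x):
        feats = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            feats.append(x)                 # keep per-scale features as skips
            if i < len(self.stages) - 1:
                x = self.pool(x)
        return feats                        # ordered shallow -> deep


class MMFNetSketch(nn.Module):
    """Three modality-specific encoders, one fusing decoder, deep supervision."""

    def __init__(self, n_classes=2, widths=(16, 32, 64, 128)):
        super().__init__()
        self.encoders = nn.ModuleList(EncoderBranch(widths) for _ in range(3))
        self.up = nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False)
        # Fusion here is plain concatenation of the three branches per scale;
        # the paper's fusion module is more elaborate.
        self.bottleneck = conv_block(3 * widths[-1], widths[-1])
        self.dec_blocks, self.heads = nn.ModuleList(), nn.ModuleList()
        prev = widths[-1]
        for w in reversed(widths[:-1]):     # e.g. 64, 32, 16
            self.dec_blocks.append(conv_block(prev + 3 * w, w))
            self.heads.append(nn.Conv3d(w, n_classes, kernel_size=1))
            prev = w

    def forward(self, lam, mu_mlaa, mu_dl):
        # Per-modality multi-scale features from the independent branches.
        feats = [enc(x) for enc, x in zip(self.encoders, (lam, mu_mlaa, mu_dl))]
        # Fuse the deepest representations of the three branches.
        x = self.bottleneck(torch.cat([f[-1] for f in feats], dim=1))
        logits = []
        for i, (block, head) in enumerate(zip(self.dec_blocks, self.heads)):
            skip = torch.cat([f[-2 - i] for f in feats], dim=1)
            x = block(torch.cat([self.up(x), skip], dim=1))
            logits.append(head(x))          # one prediction head per scale
        return logits[-1], logits[:-1]      # full-res logits + auxiliary outputs
```

For deep supervision, the auxiliary lower-resolution logits are typically compared against correspondingly downsampled labels and added to the main loss with decaying weights; at inference only the full-resolution output is used.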
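The summary also names SE (Squeeze-and-Excitation) Normalization among the revised components. A minimal sketch of one common SE-Norm formulation is shown below: instance normalization whose per-channel scale and shift are predicted by SE-style gates from globally pooled features. The module name `SENorm3d`, the reduction ratio, and the sigmoid/tanh gating are assumptions for illustration, not necessarily the paper's exact layer.

```python
# Sketch of an SE-Normalization unit: instance norm with affine parameters
# produced by squeeze-and-excitation gates. Assumed formulation, not the
# paper's verified implementation.
import torch
import torch.nn as nn


class SENorm3d(nn.Module):
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        # Instance norm without a learned affine; scale/shift come from gates.
        self.norm = nn.InstanceNorm3d(channels, affine=False)
        hidden = max(channels // reduction, 1)

        def gate(out_act: nn.Module) -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(channels, hidden),
                nn.ReLU(inplace=True),
                nn.Linear(hidden, channels),
                out_act,
            )

        self.gamma = gate(nn.Sigmoid())   # multiplicative recalibration in (0, 1)
        self.beta = gate(nn.Tanh())       # additive shift in (-1, 1)

    def forward(self, x):
        b, c = x.shape[:2]
        s = x.mean(dim=(2, 3, 4))                  # squeeze: global average pool
        g = self.gamma(s).view(b, c, 1, 1, 1)      # excitation: per-channel scale
        h = self.beta(s).view(b, c, 1, 1, 1)       # excitation: per-channel shift
        return g * self.norm(x) + h


# Quick shape check on a toy volume.
if __name__ == "__main__":
    x = torch.randn(2, 16, 8, 32, 32)
    print(SENorm3d(16)(x).shape)  # torch.Size([2, 16, 8, 32, 32])
```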
ISSN: 2168-2194, 2168-2208
DOI: | 10.1109/JBHI.2024.3501386 |