A Multilevel Transfer Learning Technique and LSTM Framework for Generating Medical Captions for Limited CT and DBT Images
Published in: Journal of Digital Imaging, 2022-06, Vol. 35 (3), pp. 564-580
Main Authors:
Format: Article
Language: English
Summary: Medical image captioning has recently attracted the attention of the medical community, and generating captions for images involving multiple organs is an even more challenging task, so any progress toward such medical image captioning is timely. In recent years, rapid developments in deep learning have made it an effective option for analyzing medical images and generating reports automatically. However, analyzing medical images that are scarce and limited is difficult, even with machine learning approaches. Transfer learning can be employed in such applications that suffer from insufficient training data. This paper presents an approach to developing a medical image captioning model based on a deep recurrent architecture that combines a Multilevel Transfer Learning (MLTL) framework with a Long Short-Term Memory (LSTM) model. A basic MLTL framework with three models is designed to detect and classify very limited datasets using knowledge acquired from easily available ones. The first model, for the source domain, uses abundantly available non-medical images and learns generalized features. The acquired knowledge is transferred to the second model, for an intermediate, auxiliary domain related to the target domain. This information is then used for the final target domain, which consists of medical datasets that are very limited in nature. Knowledge learned from a non-medical source domain thus improves learning in a target domain of medical images. An LSTM model of the kind used for sequence generation and machine translation is then proposed to generate captions for a given medical image from the MLTL framework. To further improve the captioning of the target sentence, an enhanced multi-input Convolutional Neural Network (CNN) model, along with feature extraction techniques, is proposed. This enhanced multi-input CNN model extracts the most important features of an image, which help in generating a more precise and detailed caption of the medical image. Experimental results show that the proposed model performs well, with an accuracy of 96.90% and a BLEU score of 76.9%, even with very limited datasets, compared to work reported in the literature.
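The three-stage knowledge transfer described in the abstract (source domain, then an intermediate auxiliary domain, then the scarce medical target domain) can be sketched in miniature as follows. This is purely an illustrative stand-in, not the authors' implementation: the paper uses CNN feature extractors, whereas `TinyModel` here is a single linear layer trained on random toy data, and the `init_from`/`fine_tune` names are hypothetical.

```python
import numpy as np

class TinyModel:
    """Toy stand-in for a CNN feature extractor: one linear layer."""
    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.standard_normal((in_dim, out_dim)) * 0.01

    def init_from(self, other):
        # The transfer-learning step: start from the previous stage's weights
        # instead of a random initialization.
        self.W = other.W.copy()

    def fine_tune(self, X, Y, lr=0.1, epochs=50):
        # Plain least-squares gradient descent as a toy "training" loop.
        for _ in range(epochs):
            grad = X.T @ (X @ self.W - Y) / len(X)
            self.W -= lr * grad
        return self

rng = np.random.default_rng(0)
D, C = 8, 3  # toy feature dimension and number of classes

# Stage 1: source domain -- abundant non-medical images (500 toy samples).
source = TinyModel(D, C, rng).fine_tune(
    rng.standard_normal((500, D)), rng.standard_normal((500, C)))

# Stage 2: intermediate/auxiliary domain related to the target (100 samples).
intermediate = TinyModel(D, C, rng)
intermediate.init_from(source)            # knowledge transfer 1
intermediate.fine_tune(
    rng.standard_normal((100, D)), rng.standard_normal((100, C)))

# Stage 3: target domain -- scarce medical (CT/DBT) data (20 samples).
target = TinyModel(D, C, rng)
target.init_from(intermediate)            # knowledge transfer 2
target.fine_tune(
    rng.standard_normal((20, D)), rng.standard_normal((20, C)))
```

The point of the sketch is only the weight flow: each stage is initialized from the previous stage's learned parameters, so the data-poor final stage does not start from scratch. The paper's LSTM caption decoder and multi-input CNN are not modeled here.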
ISSN: 0897-1889, 1618-727X
DOI: 10.1007/s10278-021-00567-7