Enhanced Image Captioning Using Bahdanau Attention Mechanism and Heuristic Beam Search Algorithm
| Published in: | IEEE Access, 2024, Vol. 12, pp. 100991-101003 |
|---|---|
| Main Authors: | , , |
| Format: | Article |
| Language: | English |
| Summary: | Captioning images is a challenging task at the intersection of Computer Vision (CV) and Natural Language Processing (NLP) that involves generating descriptive text to depict the content of an image. Existing methodologies typically employ Convolutional Neural Networks (CNNs) for feature extraction and Recurrent Neural Networks (RNNs) for generating captions. However, these approaches often suffer from a lack of contextual understanding, an inability to capture fine-grained details, and a tendency to generate generic captions. This study proposes VisualCaptionNet (VCN), a novel image captioning model that leverages ResNet50 for rich visual feature extraction and a Long Short-Term Memory (LSTM) network for sequential caption generation while retaining context. By incorporating the Bahdanau attention mechanism to focus on relevant image regions and integrating beam search for coherent and contextually relevant descriptions, VCN addresses the limitations of previous methodologies. Extensive experimentation on benchmark datasets such as Flickr30K and Flickr8K demonstrates VCN's notable improvements of 10% and 12% over baseline models in terms of caption quality, coherence, and relevance. These enhancements emphasize VCN's effectiveness in advancing image captioning tasks, promising more accurate and contextually relevant descriptions for images. |
|---|---|
| ISSN: | 2169-3536 |
| DOI: | 10.1109/ACCESS.2024.3431091 |
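The summary names two off-the-shelf components: additive (Bahdanau) attention over image-region features, and beam-search decoding of the caption. The sketch below is a minimal NumPy illustration of both ideas, not the authors' implementation; the weight matrices `W1`, `W2`, the projection vector `v`, and the toy `step_fn` are illustrative assumptions.

```python
import numpy as np

def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive (Bahdanau) attention over image regions.

    features: (num_regions, feat_dim) region features (e.g. from a ResNet50 backbone)
    hidden:   (hid_dim,) current LSTM decoder state
    Score for region i is v . tanh(W1 f_i + W2 h).
    """
    scores = np.tanh(features @ W1 + hidden @ W2) @ v   # (num_regions,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                            # softmax over regions
    context = weights @ features                        # (feat_dim,) weighted sum
    return context, weights

def beam_search(step_fn, start_token, beam_width=3, max_len=10):
    """Keep the beam_width highest log-probability partial captions at each step.

    step_fn(tokens) returns candidate (next_token, log_prob) pairs.
    """
    beams = [([start_token], 0.0)]
    for _ in range(max_len):
        candidates = [
            (tokens + [tok], lp + tok_lp)
            for tokens, lp in beams
            for tok, tok_lp in step_fn(tokens)
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]  # best-scoring sequence
```

In the paper's setting, `step_fn` would be the LSTM decoder conditioned on the attention context at each step; greedy decoding is the `beam_width=1` special case, and wider beams trade compute for more coherent captions.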