Loading…

Deep Learning Approaches for Image Captioning: Opportunities, Challenges and Future Potential

Generative intelligence relies heavily on the integration of vision and language. Much of the research has focused on image captioning, which involves describing images with meaningful sentences. Typically, when generating sentences that describe the visual content, a language model and a vision enc...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE access 2024, p.1-1
Main Authors:	Jamil, Azhar, Saif-Ur-Rehman, Mahmood, Khalid, Villar, Monica Gracia, Prola, Thomas, Diez, Isabel De La Torre, Samad, Md Abdus, Ashraf, Imran
Format:	Article
Language:	English
Subjects:	Artificial intelligence Computational modeling Context modeling Convolutional neural networks Deep learning Feature extraction Image captioning Image capture Image processing Natural language processing Surveys Task analysis Visualization
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Generative intelligence relies heavily on the integration of vision and language. Much of the research has focused on image captioning, which involves describing images with meaningful sentences. Typically, when generating sentences that describe the visual content, a language model and a vision encoder are commonly employed. Because of the incorporation of object areas, properties, multi-modal connections, attentive techniques, and early fusion approaches like bidirectional encoder representations from transformers (BERT), these components have experienced substantial advancements over the years. This research offers a reference to the body of literature, identifies emerging trends in an area that blends computer vision as well as natural language processing in order to maximize their complementary effects, and identifies the most significant technological improvements in architectures employed for image captioning. It also discusses various problem variants and open challenges. This comparison allows for an objective assessment of different techniques, architectures, and training strategies by identifying the most significant technical innovations, and offers valuable insights into the current landscape of image captioning research.
ISSN:	2169-3536 2169-3536
DOI:	10.1109/ACCESS.2024.3365528