A Review of Deep Learning Image Captioning Approaches

Keywords:
Image captioning, Computer vision, Natural language processing, Attention mechanism, Deep learning

Abstract
In today's information-driven world, images have become a prevalent and influential means of communication and artistic expression. While humans effortlessly understand visual scenes and describe them in nuanced language, replicating this ability in machines has been a significant hurdle. Image captioning, a growing field at the intersection of computer vision and natural language processing (NLP), aims to overcome this challenge by developing algorithms and models that interpret visual data and generate accurate, contextually relevant, human-like textual descriptions of images. This survey paper presents a systematic examination of deep learning approaches to image captioning, offering a detailed taxonomy for each method category. It also covers widely used datasets and the evaluation metrics designed to assess the performance of image captioning models. The discussion highlights the challenges encountered in the field as well as the current state-of-the-art technologies.
URN:NBN:sciencein.jist.2024.v12.712
License
Copyright (c) 2023 Yugandhara A. Thakare, Kishor H. Walse

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.