A review of deep learning image captioning approaches
Keywords: Image captioning, Computer vision, Natural language processing, Attention mechanism, Deep learning
In today's information-driven world, images have become a prevalent and influential means of communication and artistic expression. While humans effortlessly understand visual scenes and describe them in nuanced language, replicating this ability in machines has been a significant hurdle. Image captioning, a field at the intersection of computer vision and natural language processing (NLP), aims to overcome this challenge by developing algorithms and models that interpret visual data and generate accurate, contextually relevant, human-like textual descriptions of images. This survey presents a systematic examination of deep learning approaches to image captioning and offers a detailed taxonomy of each method category. It also covers widely used datasets and the evaluation metrics designed to assess the performance of image captioning models. The discussion highlights the challenges encountered in the field and the current state-of-the-art techniques.
Copyright (c) 2023 Yugandhara A. Thakare, Kishor H. Walse
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.