A review of Deep learning image captioning approaches

Image captioning using deep learning


  • Yugandhara A. Thakare, Sant Gadge Baba Amravati University
  • Kishor H. Walse, Sant Bhagwan Baba Kala Mahavidyalaya


Image captioning, Computer vision, Natural language processing, Attention mechanism, Deep learning


In today's information-driven world, images have become a prevalent and influential means of communication and artistic expression. While humans effortlessly understand visual scenes and describe them in nuanced language, replicating this ability in machines has proven a significant hurdle. Image captioning, a growing field at the intersection of computer vision and natural language processing (NLP), addresses this challenge by developing algorithms and models that interpret visual data and generate accurate, contextually relevant, human-like textual descriptions of images. This survey presents a systematic examination of deep learning approaches to image captioning, offering a detailed taxonomy for each method category. It extensively covers widely used datasets and the evaluation metrics designed to assess the performance of image captioning models. The discussion highlights the open challenges in the field as well as the current state-of-the-art technologies.


Author Biography

Yugandhara A. Thakare, Sant Gadge Baba Amravati University

PG Department of Computer Science




How to Cite

Thakare, Y. A., & Walse, K. H. (2023). A review of Deep learning image captioning approaches. Journal of Integrated Science and Technology, 12(1), 712. Retrieved from https://pubs.thesciencein.org/journal/index.php/jist/article/view/a712