Machine translation of low resource Indian language using deep learning approach

English to Bhojpuri language translation protocol

Authors

  • Madhuri Tayal, G.H. Raisoni College of Engineering and Management, Nagpur
  • Aniket Tiwari, Deloitte, Pune
  • Anuj Dharme, IBM, Pune
  • Pratik K Agrawal, Symbiosis International (Deemed University), Pune
  • Animesh Tayal, 5Codemate IT Services Pvt LTD, Nagpur
  • Nilima V. Pardakhe, Prof. Ram Meghe Institute of Technology & Research, Badnera

DOI:

https://doi.org/10.62110/sciencein.jist.2025.v13.1127

Keywords:

Neural Network, Machine Translation, Sparse Occurrences, Bahdanau Attention Mechanism

Abstract

This paper presents an English-to-Bhojpuri machine translation (MT) system, with an emphasis on the difficulties of translating words written in the Devanagari script. Bhojpuri is a language spoken in Bihar, India, and accurate translation into it is challenging because distinct Devanagari words occur sparsely and training data is scarce. The system is built on a sequence-to-sequence model trained on a self-compiled dataset of 10,105 English-Bhojpuri sentence pairs, with particular care taken to add examples of these distinct words to the dataset. The model uses word embeddings and attention mechanisms to capture the semantic and contextual relationships required for precise translation. By addressing the linguistic complexities of Devanagari words, the system offers a customized solution for Indian languages written in the Devanagari script and improves the accuracy and fluency of English-to-Bhojpuri translation.
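The Bahdanau attention mechanism named in the keywords can be illustrated with a minimal sketch of a single decoder step. This is not the authors' implementation: the abstract does not give the architecture, dimensions, or framework, so the function names, the weight matrices `W_s`, `W_h`, the vector `v`, and all sizes below are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Additive (Bahdanau-style) attention for one decoder step.

    score_j = v . tanh(W_s @ decoder_state + W_h @ encoder_states[j])
    Returns the attention weights and the context vector.
    """
    projected_state = matvec(W_s, decoder_state)
    scores = []
    for h in encoder_states:
        projected_h = matvec(W_h, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)
    ]
    return weights, context


def random_matrix(rows, cols, rng):
    """Hypothetical helper: untrained random weights for the demo."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Illustrative sizes only; none of these come from the paper.
rng = random.Random(0)
enc_dim, dec_dim, attn_dim, src_len = 4, 3, 5, 6
encoder_states = random_matrix(src_len, enc_dim, rng)  # one vector per English token
decoder_state = [rng.uniform(-0.5, 0.5) for _ in range(dec_dim)]
W_s = random_matrix(attn_dim, dec_dim, rng)
W_h = random_matrix(attn_dim, enc_dim, rng)
v = [rng.uniform(-0.5, 0.5) for _ in range(attn_dim)]

weights, context = additive_attention(decoder_state, encoder_states, W_s, W_h, v)
```

In a trained model the weights would concentrate on the source tokens most relevant to the Bhojpuri word being generated, which is one plausible way a rare Devanagari word could still be tied to its English counterpart.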

Author Biographies

  • Madhuri Tayal, G.H. Raisoni College of Engineering and Management, Nagpur

    Department of Data Science

  • Aniket Tiwari, Deloitte, Pune

    USI, AI and Data Department

  • Pratik K Agrawal, Symbiosis International (Deemed University), Pune

    Symbiosis Institute of Technology, Nagpur Campus

  • Nilima V. Pardakhe, Prof. Ram Meghe Institute of Technology & Research, Badnera

    Department of Computer Science & Engineering

Published

2025-04-09

Section

Computer Science and Engineering

How to Cite

Tayal, M., Tiwari, A., Dharme, A., Agrawal, P. K., Tayal, A., & Pardakhe, N. V. (2025). Machine translation of low resource Indian language using deep learning approach. Journal of Integrated Science and Technology, 13(6), 1127. https://doi.org/10.62110/sciencein.jist.2025.v13.1127
