Machine translation of low resource Indian language using deep learning approach

Madhuri Tayal; Aniket Tiwari; Anuj Dharme; Pratik K Agrawal; Animesh Tayal; Nilima V. Pardakhe

doi:10.62110/sciencein.jist.2025.v13.1127

English to Bhojpuri language translation protocol

Authors

Madhuri Tayal, G.H. Raisoni College of Engineering and Management, Nagpur
Aniket Tiwari, Deloitte, Pune
Anuj Dharme, IBM Pune
Pratik K Agrawal, Symbiosis International (Deemed University), Pune
Animesh Tayal, 5Codemate IT Services Pvt LTD, Nagpur
Nilima V. Pardakhe, Prof. Ram Meghe Institute of Technology & Research, Badnera

DOI:

https://doi.org/10.62110/sciencein.jist.2025.v13.1127

Keywords:

Neural Network, Machine Translation, sparse occurrences, Bahdanau Attention Mechanism

Abstract

This paper presents an English-to-Bhojpuri machine translation (MT) system, with an emphasis on the difficulty of translating words written in the Devanagari script. Bhojpuri, a language spoken in Bihar, India, is hard to translate accurately because distinct Devanagari words occur sparsely and training data is scarce. The system is built on a sequence-to-sequence model trained on a self-made dataset of 10,105 English-Bhojpuri sentence pairs, with particular focus on adding examples of these sparsely occurring words to the dataset. The model uses word embeddings and attention mechanisms to capture the semantic and contextual relationships required for precise translation. By tackling the linguistic complexities of Devanagari words, the MT system provides a customized solution for handling Indian languages in the Devanagari script, improving the accuracy and fluency of English-to-Bhojpuri translation.
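The keywords name the Bahdanau attention mechanism, so the following is a minimal sketch of how additive (Bahdanau-style) attention scores encoder states against a decoder state. It is not the authors' implementation: the function names, the toy vectors, and the identity projection matrices are all assumptions chosen for illustration, and a trained model would learn these weights.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(query, keys, w_query, w_key, v):
    """Additive attention: score_i = v . tanh(W_q query + W_k key_i).

    Returns (weights, context), where context is the attention-weighted
    sum of the keys (the encoder states).
    """
    projected_query = matvec(w_query, query)
    scores = []
    for key in keys:
        projected_key = matvec(w_key, key)
        hidden = [math.tanh(q + k) for q, k in zip(projected_query, projected_key)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


# Hypothetical toy inputs: three encoder states and one decoder state, size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, -0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections

weights, context = additive_attention(
    decoder_state, encoder_states, identity, identity, [1.0, 1.0]
)
```

In a sequence-to-sequence translator, `weights` would indicate how strongly each source (English) position contributes to the next target (Bhojpuri) word, and `context` would be fed to the decoder alongside its own state.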

Author Biographies

Madhuri Tayal, G.H. Raisoni College of Engineering and Management, Nagpur
Department of Data Science

Aniket Tiwari, Deloitte, Pune
USI, AI and Data Department

Pratik K Agrawal, Symbiosis International (Deemed University), Pune
Symbiosis Institute of Technology, Nagpur Campus

Nilima V. Pardakhe, Prof. Ram Meghe Institute of Technology & Research, Badnera
Department of Computer Science & Engineering