Research Article Open Access

shu-torjoma: An English↔Bangla Statistical Machine Translation System

Mohammad Abdullah Al Mumin1, Md Hanif Seddiqui2, Muhammed Zafar Iqbal1 and Mohammed Jahirul Islam1
  • 1 Shahjalal University of Science and Technology, Bangladesh
  • 2 University of Chittagong, Bangladesh
Journal of Computer Science
Volume 15 No. 7, 2019, 1022-1039

DOI: https://doi.org/10.3844/jcssp.2019.1022.1039

Submitted On: 25 March 2019
Published On: 29 July 2019

How to Cite: Al Mumin, M. A., Seddiqui, M. H., Iqbal, M. Z. & Islam, M. J. (2019). shu-torjoma: An English↔Bangla Statistical Machine Translation System. Journal of Computer Science, 15(7), 1022-1039. https://doi.org/10.3844/jcssp.2019.1022.1039

Abstract

An efficient and publicly available machine translation system is badly needed to obtain the full benefits of Information and Communication Technology by removing the language barrier in this era of globalization. In this study, we present a Phrase-Based Statistical Machine Translation (PBMT) system that translates between English and Bangla in both directions. To the best of our knowledge, the system is trained on the largest dataset used for the English↔Bangla translation task, with more than three million tokens on each side. In building the system, we preprocess the data and use optimized parameters to improve the quality of the output. We analyze the system output from several viewpoints: overall results, comparisons with the available systems, the effect of sentence type and length, and the behaviour of two challenging linguistic properties, prepositional phrases and noun inflections. Our analysis indicates that translating into a morphologically richer language is harder than translating from one, and that this is mainly due to the difficulty of translating noun inflections. Comparisons with the available systems show that our system significantly outperforms them, gaining 10.84 BLEU, 2.18 NIST and 19.02 TER points over the next best system. The analysis of sentence type and length shows that, for the English↔Bangla translation task, simple sentences are easier to translate and sentences longer than 15 words are harder to translate. To foster English↔Bangla machine translation research, we have built development and test datasets that are representative in sentence length and balanced in genre, intended for use as a benchmark, and have made them publicly available.

Keywords

  • English-Bangla Machine Translation
  • Machine Translation System
  • Morphologically Rich
  • Statistical Machine Translation