A Systematic Literature Review on English and Bangla Topic Modeling
Shahjalal University of Science and Technology, Bangladesh
Abstract
With the enormous growth of information and technology, digitized text and data are being generated at an immense scale. As a result, manually identifying the main topics in a vast collection of documents is nearly impossible. Topic modeling is a statistical framework that uses a probabilistic approach to infer the latent topics underlying text documents, corpora, or electronic archives. It is a promising field in Natural Language Processing (NLP). Although many researchers have worked in this field, only a few significant studies have been conducted for Bangla. In this literature review, we have followed a systematic approach to review topic modeling studies published from 2003 to 2020. We have analyzed topic modeling methods from different aspects and identified the research gap between topic modeling in English and in Bangla. From these papers, we have identified several techniques used in topic modeling studies, such as Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Support Vector Machine (SVM), and the Biterm Topic Model (BTM). This review also highlights real-world applications of topic modeling and discusses the evaluation methods used to assess these models' performance. We conclude by outlining the considerable scope for future topic modeling research in Bangla.
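To make the abstract's description of topic modeling concrete, the following is a minimal sketch of how LDA can infer latent topics using collapsed Gibbs sampling, written in pure Python. It is an illustration only, not the method of any particular study covered by this review. The function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta` with their default values, and the toy corpus are all hypothetical choices made for this example.

```python
import random

# Minimal collapsed Gibbs sampler for LDA (illustrative sketch).
# Function names, default hyperparameters and the toy corpus are hypothetical.


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to `docs` (a list of token lists) by collapsed Gibbs sampling.

    Returns (n_kw, n_dk, vocab): topic-word counts, document-topic counts
    and the sorted vocabulary used to index n_kw.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    n_k = [0] * K                        # total tokens assigned to topic k
    z = []                               # topic assignment of every token

    # Start from a random topic assignment for each token.
    for d, doc in enumerate(docs):
        z_doc = []
        for word in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[word]] += 1
            n_k[k] += 1
        z.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = word_id[word], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # p(z_i = t | all other assignments), up to a normalising constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw, n_dk, vocab


def top_words(n_kw, vocab, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in n_kw
    ]


if __name__ == "__main__":
    corpus = [
        "cricket match bat ball wicket".split(),
        "bat ball cricket team match".split(),
        "election vote party government".split(),
        "government party vote minister election".split(),
    ]
    n_kw, n_dk, vocab = lda_gibbs(corpus, num_topics=2)
    for k, words in enumerate(top_words(n_kw, vocab)):
        print(f"topic {k}: {words}")
```

On a corpus this small, how the words are split between topics depends on the random seed. In practice, studies usually work with much larger corpora and a library implementation, such as gensim's `LdaModel`. They then tune the number of topics and check the results with evaluation measures such as topic coherence or perplexity.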
DOI: https://doi.org/10.3844/jcssp.2021.1.18
Copyright: © 2021 Md. Basim Uddin Ahmed, Ananta Akash Podder, Mahruba Sharmin Chowdhury and Mohammad Abdullah Al Mumin. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- English Bangla Comparison
- Latent Dirichlet Allocation (LDA)
- Systematic Literature Review (SLR)
- Topic Modeling Bangla
- Topic Modeling Methods
- Topic Extraction