TY  - JOUR
AU  - Defersha, Naol Bakala
AU  - Tune, Kula Kekeba
AU  - Abate, Solomon Teferra
PY  - 2025
TI  - EthioLSocMDMTLM: Exploring Application of Topic Modeling for Building Ethiopian Language Social Media Data-Based Multilingual Transformer Language Models for Multilingual Hateful Content Detection
JF  - Journal of Computer Science
VL  - 21
IS  - 2
DO  - 10.3844/jcssp.2025.250.262
UR  - https://thescipub.com/abstract/jcssp.2025.250.262
AB  - This study proposes topic modeling techniques to develop Ethiopian Language Social Media Data-Based Multilingual Transformer Language Models for multilingual hateful content detection. We modified various multilingual pre-trained models, investigated the challenges of using pre-trained transformer language models, and built multilingual hateful content detection models. Topic words (1561, 70, and 1044 rows) extracted from Afaan Oromo, Tigrigna, and Amharic, respectively, were used to train the transformers. The proposed models were also tested by developing a multilingual hateful content detection model for low-resource Ethiopian languages using deep learning techniques. A total of 45,522, 59,529, and 48,882 text documents in Amharic, Afaan Oromo, and Tigrigna, respectively, were collected, and three annotators labeled the data into binary classes; inter-annotator agreement was 87% for Amharic, 82% for Tigrigna, and 84% for Afaan Oromo. LSTM, CNN, and BiLSTM deep learning algorithms were applied, integrating EthioLan_mBERT, EthioLan_BERT, and EthioLan_XLM-Roberta contextual embeddings. Among the applied techniques, LSTM + EthioLan_mBERT performed best, with an F1-score of 81%. We publicly release the modified pre-trained models, dataset, and related code.
ER  - 