Comparison of Stochastic and Rule-Based POS Tagging on Malay Online Text

Kalaiarasi Sonai Muthu Anbananthen; Jaya Kumar Krishnan; Mohd. Shohel Sayeed; Praviny Muniapan

doi:10.3844/ajassp.2017.843.851

Technical Report Open Access

Kalaiarasi Sonai Muthu Anbananthen¹, Jaya Kumar Krishnan¹, Mohd. Shohel Sayeed¹ and Praviny Muniapan¹

¹ Department of Information Science and Technology, Multimedia University, Melaka, Malaysia

Abstract

The rapid development of Web 2.0 has led to the production of a vast amount of user-generated data, which contains much useful information. Manually analyzing these data and classifying the sentiment in them is an exhausting task, so an opinion mining method is needed. Opinion mining relies on natural language processing (NLP), in which Part-of-Speech (POS) tagging is a crucial step. The performance of an NLP system depends on the accuracy of its POS tagger. Two main issues affect the accuracy of a POS tagger: unknown words and ambiguity. Although research on POS tagging dates back several decades, it has focused mostly on English; research on the Malay language is still at an early stage. Moreover, online Malay text differs from formal Malay text in both structure and grammar. Online users tend to use many abbreviations and short forms in their text. In addition, the “Bahasa Rojak” phenomenon complicates the tagging process even further. Taking all of this into consideration, in this study we review stochastic and rule-based POS tagging methodologies for dealing with ambiguous and unknown words in online Malay text.

American Journal of Applied Sciences

Volume 14 No. 9, 2017, 843-851

DOI: https://doi.org/10.3844/ajassp.2017.843.851

Submitted On: 28 September 2016; Published On: 5 April 2017

How to Cite: Anbananthen, K. S. M., Krishnan, J. K., Sayeed, M. S. & Muniapan, P. (2017). Comparison of Stochastic and Rule-Based POS Tagging on Malay Online Text. American Journal of Applied Sciences, 14(9), 843-851. https://doi.org/10.3844/ajassp.2017.843.851

Copyright: © 2017 Kalaiarasi Sonai Muthu Anbananthen, Jaya Kumar Krishnan, Mohd. Shohel Sayeed and Praviny Muniapan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Opinion Mining
Part-of-Speech Tagging
Malay Language
Malay Online Text
Rule-Based Approach
Stochastic Approach