Investigation of Naive Bayes Combined with Multilayered Perceptron for Arabic Sentiment Analysis and Opinion Mining

Sentiment analysis and opinion mining have recently become an active research area concerned with studying people's opinions, emotions, and evaluations expressed in written text. The rapid growth of social networks, where people constantly share their opinions on many subjects and topics over the internet, has increased the importance of sentiment analysis. It is therefore useful to classify people's reviews, discussions, and blogs to identify whether their opinion of a specific product, movie, or hotel is positive or negative, which can further help companies and individuals in decision making. Sentiment analysis for English has been well studied; in contrast, work on Arabic remains in its infancy, and more cooperation between research communities is required to offer a mature sentiment analysis system for Arabic. In this paper, the Naive Bayes (NB) algorithm and a Multilayered Perceptron (MLP) network are combined into a hybrid system for Arabic sentiment classification. The proposed system classifies the dataset into positive or negative sentiment polarity, and several classifiers are compared on a tweets dataset. The combined MLP ranked first, with an accuracy of 99.5%, compared with 93.0%, 84.9%, 84.1%, 88.2%, 80.7%, and 78.1% for NB, Decision Table, JRip, OneR, PART, and ZeroR, respectively.


I. INTRODUCTION
Sentiment classification is the task of classifying people's written reviews, using text analysis and natural language processing techniques, to determine whether their attitude toward a topic (a product, hotel, movie, etc.) is positive, negative, or neutral. Sentiment analysis and opinion mining have recently become a very popular and active research area because of their importance in various fields, such as politics [1], marketing [2], education [3], and healthcare [4]. In addition, sentiment analysis plays an important role in decision making for individuals, organizations, and governments, since a wiser decision can be made by taking the opinions of others into account [5]. Many researchers have studied sentiment analysis for English; nonetheless, research on Arabic is still insufficient, and more work on Arabic is required [6]. Farra et al. [7] performed sentiment mining of Arabic text at both the sentence level and the document level. For sentence-level mining they applied a grammatical approach and a semantic approach, and they also constructed a semantic dictionary of Arabic roots. The results showed high accuracy for the grammatical approach. El-Beltagy et al. [8] studied the possibility of determining semantic orientation in Egyptian Arabic. The authors used two datasets: the first contained 500 tweets from Twitter, classified into 155 positive, 310 negative, and 35 neutral tweets; the second, the Dostour dataset, contained 100 random comments, classified into 40 positive, 38 negative, and 22 neutral. Rushdi-Saleh et al. [9] proposed an Arabic corpus for opinion mining, collected from Arabic review web pages about movies and films, and used Support Vector Machines (SVM) and Naïve Bayes (NB) to classify the polarity of the Arabic reviews. The authors translated the Opinion Corpus for Arabic (OCA) into English, generating the EVOCA corpus (English Version of OCA); the results showed that accuracy on EVOCA was low because of the translation. Refaee et al. [10] used a gold-standard annotated corpus of 8,868 tweets collected manually at different times and divided into a development set (7,503 tweets) and a test set (1,365 tweets). The analysis reported the five most frequent words. Al-Kabi et al. [11] used a lexicon-based tool to analyze Arabic reviews and comments written in Modern Standard Arabic (MSA). The tool identifies the polarity, intensity, and subjectivity of Arabic reviews, and the study reported it to be more accurate for the Arabic domain. Cheong et al. [12] studied tweets collected from Twitter during the 2010-2011 Australian floods. The authors used social network analysis (SNA) techniques to generate and analyze the online network traffic that arose at that time, and SNA metrics were used to identify influential members of online communities. However, one limitation is that the number of tweets collected was insufficient to generalize the findings. Kouloumpis et al. [13] studied the usefulness of linguistic features for detecting the sentiment of Twitter messages and evaluated the usefulness of existing lexical resources. The researchers used three Twitter datasets (hashtag, emotion, and iSieve) and classified the data using unigrams, bigrams, features representing information from a sentiment lexicon, and part-of-speech (POS) features.
Shakir Mrayyen and Mohammad Subhi Al-Batah are with the
Duwairi and Qarqaz [14] used crowdsourcing to collect a large dataset of tweets and developed a framework for analyzing tweets to detect positive, negative, or neutral sentiment; however, the dictionaries needed to be expanded. El-Makky et al. [15] designed a novel subjectivity and sentiment analysis system for Arabic tweets by building a new Arabic lexicon that merges Modern Standard Arabic lexica with two Egyptian Arabic lexica. The novel hybrid approach achieved good results compared with previous work. Shoukry and Rafea [16] performed sentence-level sentiment analysis for Arabic: 1000 tweets were collected from Twitter and classified in Weka using two algorithms, Support Vector Machine (SVM) and naïve Bayes (NB). SVM achieved higher accuracy than NB; however, the study ignored neutral tweets. Butgereit [17] aimed to monitor the weather status of a specific city on Twitter, using crowdsourcing to collect almost 600 tweets and the μ Model to analyze them. Rushdi-Saleh et al. [18] used various Arabic web pages and blogs to build a small Arabic opinion corpus of 500 movie reviews (250 positive and 250 negative) and classified it with two machine learning algorithms, support vector machines and Naïve Bayes. The accuracy was 84% for the NB classifier and 90% for the SVM classifier; however, the corpus needed to be expanded. Siqi Zhao [19] developed Sport Sense to detect sports events, with National Football League (NFL) games as the targeted and tested domain. The results showed that major game events can be extracted accurately and effectively from open-access Twitter data. The Sport Sense API is accessible to developers for creating Twitter-enabled applications.
The study in [20] analyzed tweets posted during emergency events in North America, the Oklahoma grass fires and the Red River floods in spring 2009, using the Twitter Search API and geo-location, with the aim of improving situational awareness in emergency situations. Godfrey et al. [21] used k-means and Non-Negative Matrix Factorization (NMF) to cluster the topics of a set of tweets; the results demonstrated that k-means and NMF produce comparable results, but NMF was faster. Duwairi [22] introduced a framework for sentiment analysis of tweets written in Modern Standard Arabic (MSA) and Jordanian Dialectical Arabic (JDA). The researcher produced a dialect lexicon that maps JDA words to their corresponding MSA words. The dataset consisted of 22,550 tweets, and NB and SVM classifiers were used to determine the polarity of the tweets. Korayem et al. [23] presented a survey of techniques for subjectivity and sentiment analysis (SSA) of Arabic, describing the main existing techniques and test corpora for Arabic SSA that have been introduced in the literature. Ibrahim and Salim [24] reviewed 65 studies to determine the features and methods used for Twitter opinion mining. The results showed that n-gram features are commonly used for Twitter sentiment analysis, including for Arabic tweets. The study also reported that the most commonly used methods are lexical-based classification and classification with Support Vector Machines (SVM) and Naive Bayes (NB). In addition, Al-Kabi et al. [25] used 4625 reviews and comments collected from Yahoo and Maktoob social media to study review length and likes/dislikes. The dataset contains MSA and various Arabic dialects. SVM achieved an accuracy of 68.2% and NB an accuracy of 61.43%.

II. ARABIC SENTIMENT ANALYSIS
Sentiment analysis is a part of machine learning and natural language processing (NLP) that consists of extracting people's sentiments, emotions, or opinions about a specific topic from their writing. Many terms have been used in the literature for sentiment analysis, such as subjectivity analysis, opinion mining, review mining, emotion detection from text, and appraisal extraction [26]. Furthermore, sentiment in a text can be expressed either explicitly or implicitly. In the explicit case the sentiment is stated directly, as in (إنها مقالة جيدة / It is a good article), while in the implicit case the sentiment is implied by the text, as in (لها كثير من الاستشهادات / it has a lot of citations). Sentiment analysis can be defined formally as: "Given a text t from a text set T, computationally assigning polarity labels p from a set of polarities P in such a way that p would reflect the actual polarity that is found in T" [27].
Sentiment analysis is an active research area closely related to text mining. It can be a powerful decision-making tool when applied to online reviews or tweets [28], and sentiment analysis of tweets is a powerful application of social media mining that can be used for a variety of social sensing tasks [29].
A tweets dataset is used in our experiments. The Twitter dataset for Arabic sentiment analysis was collected from Twitter on various topics, such as politics and the arts, and includes 2000 labelled tweets (1000 positive and 1000 negative). The tweets contain opinions written in both Modern Standard Arabic (MSA) and the Jordanian dialect [30].

III. NAIVE BAYES CLASSIFIER
The Naive Bayes (NB) classifier is a probabilistic classifier that belongs to the supervised classification methods. NB applies Bayes' theorem with strong (naive) independence assumptions and has proved to be fast, accurate, effective, and simple in text classification [31]. It can be applied to problems that involve assigning an object to a discrete category.
To perform text classification, features must first be extracted from the given text. The features are stored in a vector, called the feature vector or term vector, which is analyzed by the classifier. The vector is built from the vocabulary of unique words in the training dataset. As presented by McCallum and Nigam [32], NB text classification methods fall into two categories: the multinomial model and the multivariate Bernoulli model. Both apply Bayes' rule as follows [33]:

$$P(c_i \mid d_j) = \frac{P(d_j \mid c_i)\,P(c_i)}{P(d_j)} \qquad (1)$$

where $c_i$ is a specific class (positive or negative), $d_j$ is the text to be classified, $P(c_i)$ is the prior probability of the class, $P(d_j)$ is the probability of the text, $P(c_i \mid d_j)$ is the posterior probability, and $P(d_j \mid c_i)$ is the likelihood of the text given the class. This study uses the multinomial model because, in text classification, it outperforms the multivariate Bernoulli model [34].
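The feature-vector construction described above can be sketched in Python as follows. This is a minimal sketch, assuming plain whitespace tokenization with no Arabic-specific preprocessing (no stemming or letter normalization); the function names `build_vocabulary` and `term_vector` and the toy English documents are hypothetical. Each vector entry is a word count, which is the representation the multinomial model works with.

```python
from collections import Counter

def build_vocabulary(docs):
    """Return the sorted list of unique words over all training documents."""
    vocab = set()
    for doc in docs:
        vocab.update(doc.split())  # assumption: whitespace tokenization
    return sorted(vocab)

def term_vector(doc, vocab):
    """Multinomial feature vector: how often each vocabulary word occurs in doc."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

# Hypothetical toy documents (not the paper's tweets).
train = ["good good idea", "bad idea"]
vocab = build_vocabulary(train)
print(vocab)                                 # ['bad', 'good', 'idea']
print(term_vector("good idea good", vocab))  # [0, 2, 1]
```

A Bernoulli model would instead record only presence or absence (0/1) for each vocabulary word.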
The multinomial NB model takes the word frequencies in a given text into account [35]. Maximum Likelihood Estimation (MLE) estimates the model parameters from the training data as relative frequencies [36]: it selects the parameter values that maximize the likelihood function. The prior probability is estimated as in Equation 2:

$$P(c_i) = \frac{N_c}{N} \qquad (2)$$

where $N_c$ is the number of documents in class $c_i$ and $N$ is the total number of documents.
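Equation 2 amounts to counting labels. A minimal Python sketch, where `class_priors` is a hypothetical helper name and the four labels mirror the two-positive, two-negative split of the worked example in this section:

```python
from collections import Counter

def class_priors(labels):
    """MLE prior of Equation 2: P(c) = N_c / N for every class c."""
    n = len(labels)
    return {c: count / n for c, count in Counter(labels).items()}

priors = class_priors(["pos", "neg", "pos", "neg"])
print(priors)  # {'pos': 0.5, 'neg': 0.5}
```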
The multinomial model assumes that all attributes are conditionally independent given the class. A drawback of MLE arises when the count of a word in class $c_i$ is zero, that is, when the word did not occur in that class in the training data. In other words, the training data are insufficient to represent the frequency of rarely occurring words, and a single zero probability makes the whole product of word probabilities zero. To overcome this drawback, add-one (Laplace) smoothing is used, which simply adds one to each word count. The probability $P(w_k \mid c_i)$ follows a multinomial distribution and gives the relative frequency of word $w_k$ in class $c_i$, as shown in Equation 3:

$$P(w_k \mid c_i) = \frac{n_k(c_i) + 1}{n(c_i) + |V|} \qquad (3)$$

where $n(c_i)$ is the number of words belonging to class $c_i$ (pos or neg), $n_k(c_i)$ is the number of occurrences of word $k$ in class $c_i$, and $V$ is the vocabulary of unique words, whose size is $|V|$.
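A minimal sketch of Equation 3, assuming whitespace tokenization; `word_likelihoods` and the toy documents are hypothetical. It counts every token in a class (so $n(c_i)$ is the class's total word count) and applies add-one smoothing over the whole vocabulary, so every word receives a non-zero probability and each class's probabilities sum to one.

```python
from collections import Counter

def word_likelihoods(docs, labels, vocab):
    """Add-one smoothed P(w_k | c_i) = (n_k(c_i) + 1) / (n(c_i) + |V|)."""
    likelihoods = {}
    for c in sorted(set(labels)):
        # Count every token of every training document labelled c.
        counts = Counter(w for doc, lab in zip(docs, labels) if lab == c
                         for w in doc.split())
        n_c = sum(counts.values())  # n(c_i): total number of words in class c
        likelihoods[c] = {w: (counts[w] + 1) / (n_c + len(vocab)) for w in vocab}
    return likelihoods

# Hypothetical toy training set (not the paper's tweets).
docs = ["good good idea", "bad idea"]
labels = ["pos", "neg"]
vocab = sorted({w for d in docs for w in d.split()})  # ['bad', 'good', 'idea']
lk = word_likelihoods(docs, labels, vocab)
print(lk["pos"]["good"], lk["neg"]["good"])  # 0.5 0.2
```

Here "good" never occurs in the neg document, yet it still gets $P(\text{good} \mid neg) = 1/5$ instead of zero.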
A new text is then classified according to the value of the naive Bayes classifier $V_{NB}$, given in Equation 4:

$$V_{NB} = \arg\max_{c_j \in \{pos,\, neg\}} P(c_j) \prod_{i} P(w_i \mid c_j) \qquad (4)$$

where $c_j$ is the $j$th class and $w_i$ is the $i$th word of the new text.
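Equation 4 can be evaluated as below. This sketch sums logarithms instead of multiplying raw probabilities, a common numerical choice that leaves the argmax unchanged but avoids floating-point underflow on long texts. It also assumes that words absent from the training vocabulary are simply ignored. The function name `classify` and the probability tables (add-one smoothed estimates from the toy documents "good good idea" (pos) and "bad idea" (neg)) are illustrative, not the paper's data.

```python
import math

def classify(tokens, priors, likelihoods):
    """V_NB = argmax_c P(c) * prod_i P(w_i | c), computed in log space.
    Assumption: tokens outside the training vocabulary are skipped."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for w in tokens:
            if w in likelihoods[c]:
                score += math.log(likelihoods[c][w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

priors = {"pos": 0.5, "neg": 0.5}
likelihoods = {
    "pos": {"bad": 1 / 6, "good": 1 / 2, "idea": 1 / 3},
    "neg": {"bad": 2 / 5, "good": 1 / 5, "idea": 2 / 5},
}
print(classify("good idea".split(), priors, likelihoods))  # pos
print(classify("bad idea".split(), priors, likelihoods))   # neg
```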
The operation of the Naive Bayes classifier is best explained with an illustrative example. The goal is to classify whether a new tweet has a positive or a negative attitude. Consider a set of classified tweets on a topic used as the training set, as shown in Table I.

Table I. Training set of classified tweets (last row: تدعو للعنصريه | Reinforce Racist | neg)

This example is used to estimate the probabilities of the positive and negative attitudes and the probability of the words in each tweet for each attitude:
- P(pos): prior probability of the positive attitude
- P(neg): prior probability of the negative attitude
- P(word in doc|pos): positive probability of a word
- P(word in doc|neg): negative probability of a word
There are 10 unique Arabic words, which are stored in a vector of words containing all unique words across the tweets.
Note that the words "انا" (I) and "جدا" (very) each occur twice but are counted only once.
Training part:
Step 1: calculate the prior probabilities of pos and neg using Equation 2: P(pos) = 2/4 = 0.5 (two positive tweets out of four) and P(neg) = 2/4 = 0.5 (two negative tweets out of four).
Step 2: calculate the individual probability of every word under the positive and negative attitudes using the smoothed likelihood estimate of Equation 3. For the positive attitude, as shown in Table I, the word "فكرة" occurs once in the pos class; adding 1 for smoothing gives a numerator of (1+1). The number of unique words in the pos class is 5 and the total number of unique words is 10, giving a denominator of (5+10), so P(فكرة|pos) = 2/15 ≈ 0.133. The same computation is repeated for all unique words. Table II shows the probability of each word in the pos and neg classes obtained in Step 2.
Using Bayes' rule, the new tweet TS is labelled with the class that achieves the highest probability. Since p(pos | TS) is greater than p(neg | TS), the test sentence has a positive attitude.
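The arithmetic of Steps 1 and 2 can be checked exactly with Python's standard `fractions` module. Only the counts stated in the example are used (four training tweets, two per class, 10 unique words overall, 5 words in the pos class, and one occurrence of "فكرة"); the remaining word probabilities of Table II are not reproduced here, and the variable names are illustrative.

```python
from fractions import Fraction

# Counts stated in the worked example.
n_pos_tweets, n_neg_tweets = 2, 2
vocab_size = 10       # |V|: unique words across all training tweets
n_words_pos = 5       # n(pos) as used in Step 2
count_fikra_pos = 1   # occurrences of "فكرة" in the pos class

# Step 1: prior probability (Equation 2).
p_pos = Fraction(n_pos_tweets, n_pos_tweets + n_neg_tweets)
# Step 2: add-one smoothed likelihood (Equation 3).
p_fikra_given_pos = Fraction(count_fikra_pos + 1, n_words_pos + vocab_size)

print(p_pos, p_fikra_given_pos)  # 1/2 2/15
```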

IV. CLASSIFICATION
In this paper, the MLP, Decision Table, JRip, OneR, PART, and ZeroR classifiers are applied to tweet classification.
Because the tweets are raw text, these classifiers cannot recognize and categorize the text data directly; only NB is able to classify the polarity of the text dataset. Thus, a method that combines NB with each of the other classifiers is proposed. In the combined method, the weights of the tweets and the outputs predicted by NB are automatically fed to the next classifier.
- … [38].
- One Rule (OneR): uses the minimum-error attribute for prediction and discretizes numeric attributes [39].
- Partial C4.5 (PART): a separate-and-conquer rule learner that builds a partial C4.5 decision tree in each iteration and turns the "best" leaf into a rule [40].
- 0-R classifier (ZeroR): predicts the mean (for a numeric class) or the mode (for a nominal class) [41].
- Multilayer perceptron (MLP): uses a feedforward architecture and can have multiple hidden layers [42][43][44]. An MLP computes dot products between inputs and weights and applies sigmoidal activation functions (or other monotonic functions such as the rectified linear unit, ReLU). Training is usually done through back-propagation across all layers [45][46][47]. This type of neural network is used in deep learning together with techniques such as dropout and batch normalization [48][49][50]. The structure of the three-layer back-propagation neural network is shown in Figure I.
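The combined method can be sketched as a two-stage pipeline in which NB outputs become the input features of a small MLP. This is a hypothetical illustration, not the authors' implementation: the text does not specify exactly what the "weight" of a tweet is, so the sketch assumes two features per tweet, NB's posterior P(pos | tweet) and NB's predicted label, with made-up values. The network follows the three-layer structure described above (input layer, one sigmoid hidden layer, sigmoid output) and is trained by per-sample back-propagation on the binary cross-entropy loss; `TinyMLP` and its parameters are illustrative names.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Dot products between inputs and weights, then sigmoid activations.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train(self, X, Y, lr=0.5, epochs=2000):
        for _ in range(epochs):
            for x, t in zip(X, Y):
                h, y = self.forward(x)
                d_out = y - t  # gradient of cross-entropy w.r.t. the output pre-activation
                # Hidden deltas use the old output weights (before the update).
                d_hid = [d_out * w * hi * (1 - hi) for w, hi in zip(self.w2, h)]
                self.w2 = [w - lr * d_out * hi for w, hi in zip(self.w2, h)]
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hid):
                    self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * dj

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0

# Hypothetical NB outputs per tweet: [P(pos | tweet), NB label (1 = pos)].
X = [[0.92, 1], [0.81, 1], [0.64, 1], [0.35, 0], [0.18, 0], [0.07, 0]]
Y = [1, 1, 1, 0, 0, 0]
mlp = TinyMLP(n_in=2, n_hidden=3)
mlp.train(X, Y)
print([mlp.predict(x) for x in X])
```

If this pipeline is reproduced, note that NB outputs computed on the same tweets NB was trained on tend to be over-confident; generating them on held-out folds, as in standard stacking, is a common precaution.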

V. RESULTS AND DISCUSSION
The proposed approach combines NB with several classifiers for Arabic sentiment classification. Six classifiers were tested: MLP, Decision Table, JRip, OneR, PART, and ZeroR. Several test options were used to divide the data into training and testing sets [51]: the full training set, cross-validation (2 to 10 folds), and percentage split (10%-99%) [52][53][54]. For each classifier, the best accuracy obtained in these experiments is reported. In addition, the results in Figure II show that the combined MLP achieves better classification performance than the other classifiers. The combined MLP outperformed the others in training accuracy by more than 1.6% for NB, 14.2% for Decision Table, … These results show the effectiveness of the combined MLP compared with the other classifiers. In future work, we aim to apply this method to other domains, together with feature selection schemes that separate features carrying sentiment from those that do not.
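The two split-based test options can be sketched as index generators over the 2000-tweet dataset. This is a minimal sketch with hypothetical function names: it shuffles indices for the percentage split and uses interleaved, unstratified folds for cross-validation, which may differ from how the tool used in the experiments builds its splits.

```python
import random

def percentage_split(n, train_percent, seed=1):
    """Shuffle indices 0..n-1; the first train_percent% form the training set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_percent / 100)
    return idx[:cut], idx[cut:]

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation.
    Fold i holds indices i, i+k, i+2k, ... (interleaved, not stratified)."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

train, test = percentage_split(2000, 66)
print(len(train), len(test))  # 1320 680
for train, test in k_fold_indices(2000, 10):
    print(len(train), len(test))  # 1800 200 for each of the 10 folds
```

Every tweet appears in exactly one test fold, so each instance is tested once across the k runs.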