Advanced Automatic Lexicon with Sentiment Analysis Algorithms for Arabic Reviews

Sentiment analysis is a statistical analysis of people’s attitudes, orientations and emotions about a specific domain. The advance of telecommunication networks makes it important to develop sentiment analysis algorithms that gather information on user preferences from multiple specialized sources. The next step is to analyze the polarity of that information and, finally, to derive knowledge that helps anticipate future results. The Arabic language has gained much interest in the past few years because of its rich morphology and wide vocabulary. In this study, automatic lexicons for both Modern Standard Arabic and colloquial vocabularies are developed for multi-domain classical and vernacular terms. In addition, novel sentiment analysis algorithms are developed and implemented for analyzing the polarity of multi-domain datasets and increasing the efficiency and flexibility of sentiment lexicons. The algorithms analyze datasets based on their intensification weights to increase the performance of the lexicon analyzer. The experimental results show high accuracy, precision, recall and F-measure compared with other recent research experiments.


Introduction
Sentiment analysis is the computational study of people's opinions, attitudes and emotions toward an entity. This entity can represent individuals, events or topics (Medhat et al., 2014). Opinion mining and sentiment analysis have gained a lot of attention in the last few years due to the wide spread of social networks, blogs and online news sites. Sentiment analysis can be defined as the task of determining the semantic orientation of an opinion holder H on an object O or a feature of O. Semantic Orientation (SO) can be positive, negative or neutral (El-Beltagy and Ali, 2013). The Arabic language has few resources because few datasets are available: most resources developed for Arabic sentiment analysis are either limited in size, not publicly available, or developed for a very specific domain (El-Sahar and El-Beltagy, 2015).
Finding and extracting Arabic review content from the internet is considered a hard task relative to carrying out the same task in English (Saleh and Valdivia, 2011). This is due to the smaller number of Arabic websites and to the fact that many Arabic speakers comment in English or in transliterated Arabic. Research on Arabic sentiment analysis has increased in the last few years due to the growth of social networks and microblogging services. Statistics on Arabic speakers and internet users in 2015 state that there are more than 375 million Arabic speakers, about 5.2% of the total population of the world. The number of Arab internet users in June 2015 was more than 155 million, reported as 4.8% of the total world population (http://www.internetworldstats.com/stats19.htm).
The Arabic language contains a large number of morphological forms, which makes collecting and analyzing them difficult. In addition, the dichotomy between formal and informal Arabic poses a specific challenge when dealing with the content found on Arabic websites.
Arabic is generally known to be non-structured and difficult to standardize and analyze. Consequently, research on the processing of informal Arabic is known to be scarce (Akaichi, 2014). The main concern in sentiment analysis is choosing an analysis and classification methodology that yields accurate and efficient results, especially for Arabic, which contains a wide variety of expressions. The contributions of this paper are as follows:
• Collecting multi-domain resources with a large number of Arabic reviews at the sentence level
• Building three efficient Arabic lexicons that support Modern Standard Arabic (MSA), colloquial Arabic and negation terms
• Building an automatic analyzer algorithm that discriminates between subjective and objective comments instead of manual discrimination
• Developing different automatic analyzer algorithms for analyzing Arabic sentiments with and without weights to increase flexibility and efficiency
• Performing the analysis of sentiment polarity with different weights to clarify the term orientation and reduce false polarities
• Analyzing mixed sentiments that contain both MSA and colloquial terms at the same time
• Achieving high accuracy, precision, recall and F-measure due to the reduction of false positive and false negative polarities

Related Work
A recent evaluation of Arabic sentiment analysis is presented in our paper (Mostafa, 2017). In that study, the most recent Arabic sentiment analysis studies are evaluated along with the machine learning algorithms used to measure performance on their datasets. An Arabic sentiment lexicon for assigning sentiment scores to the words found in the Arabic WordNet is presented in (Mahyoub et al., 2014). In that research, the seed list of Arabic words is determined by identifying the nouns, verbs, adjectives and adverbs. An opinion mining and analysis approach for the Arabic language is presented in (Al-Kabi et al., 2014). That research focuses on determining the weight of each term polarity in MSA but does not cover colloquial Arabic. In addition, the weight value is static for all terms and does not express the variability of Arabic expressions.
Arabic datasets are collected and tested using supervised classification as presented in (Duwairi and Qarqaz, 2014). In that research, the SVM and KNN algorithms are used to test the precision and recall of the data. An analytical study of Arabic sentiments is presented in (Al-Kabi et al., 2013), where the dataset is tested using SVM and naïve Bayes algorithms. Another classification of Arabic datasets is presented in (Abdulla et al., 2013), where datasets are tested using both supervised classification and unsupervised experiments.

Data Collection
The first consideration in building sentiment lexicons is the size of the collected datasets. As presented in (Akaichi et al., 2013), only about 260 posts are used for classifying sentiments. The authors of (Akaichi, 2014) collected about 500 posts, which were reduced to 300 posts after removing special characters and correcting spelling mistakes. In addition, the collected datasets are categorized into positive and negative corpora and do not cover neutral comments. In the developed lexicons, a large number of comments are collected from different domains to create multi-domain sentiment lexicons that efficiently analyze the polarity of Arabic sentiments based on their weight. Sentiment lexicons, which contain opinion terms along with their polarity and strength, are considered an essential part of any sentiment analysis tool. There are currently no publicly available colloquial Arabic sentiment lexicons (El-Beltagy and Ali, 2013). In this study, Modern Standard Arabic (MSA) and colloquial lexicons are constructed separately from multi-domain datasets.
As presented in (Abdulla et al., 2014), lexicons are developed either manually or automatically. Automatic construction requires little effort and time but suffers from low accuracy and poor robustness. In this study, automatic lexicons are constructed together with analysis algorithms that achieve high accuracy and robustness.
As presented in Table 1, the datasets of this paper are collected from different domains to increase the efficiency and flexibility of the developed lexicons. A total of 6364 comments are collected from domains such as economy, politics, sport and history, drawn from Twitter, Facebook and Arabic book reviews.

Domain Selection
The first stage of lexicon construction is to select the domain of the data to be analyzed, whether single-domain or multi-domain. As presented in Table 1, the user can select one of the dataset topics as a single domain, or select multi-domain if the data to be added to the lexicon come from different sources and topics.

Intensification Weight
After defining the domain of the data, the intensification weight that will be used by the automatic analyzer must be defined. The same domain can be used with different intensification weights according to the variability and breadth of the vocabulary in the data. Three weights are available for rating the value of each term: 3, 5 or 7. For weight 3, the polarity takes discrete values from 1 to -1 (1 Positive, 0 Neutral and -1 Negative). For weight 5, the polarity takes discrete values from 2 to -2 (2 Positive, 1 Nearly Positive, 0 Neutral, -1 Nearly Negative and -2 Negative). For weight 7, the polarity takes discrete values from 3 to -3 (3 High Positive, 2 Positive, 1 Nearly Positive, 0 Neutral, -1 Nearly Negative, -2 Negative and -3 High Negative).
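The three scales can be written down as simple lookup tables. The following is a minimal sketch in Python; the names `POLARITY_SCALES` and `polarity_label` are illustrative assumptions and do not come from the implemented system.

```python
# Polarity labels per intensification weight, as described in the text.
# The dictionary and function names are hypothetical.
POLARITY_SCALES = {
    3: {1: "Positive", 0: "Neutral", -1: "Negative"},
    5: {2: "Positive", 1: "Nearly Positive", 0: "Neutral",
        -1: "Nearly Negative", -2: "Negative"},
    7: {3: "High Positive", 2: "Positive", 1: "Nearly Positive",
        0: "Neutral",
        -1: "Nearly Negative", -2: "Negative", -3: "High Negative"},
}


def polarity_label(value: int, weight: int) -> str:
    """Return the label of a discrete polarity value under a given weight."""
    if weight not in POLARITY_SCALES:
        raise ValueError("weight must be 3, 5 or 7")
    scale = POLARITY_SCALES[weight]
    if value not in scale:
        raise ValueError(f"value {value} is outside the scale for weight {weight}")
    return scale[value]
```

Under this sketch the same numeric value can carry a different label depending on the weight: a value of 1 is "Positive" for weight 3 but only "Nearly Positive" for weights 5 and 7.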

Building Lexicons
After identifying the domain and the intensification weight, three lexicons are constructed for storing user terms and orientations. The three lexicons are: Modern Standard Arabic (MSA) lexicon for classical Arabic, colloquial lexicon for slang Arabic and negation lexicon for storing all negation terms that will convert the orientation of the term from positive to negative and vice versa.
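One way to picture the three lexicons is as two term-to-polarity maps plus a set of negation terms. The sketch below is an assumption about the data layout, not the schema of the implemented database; the class name `Lexicons`, its methods and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Lexicons:
    """Hypothetical container for the MSA, colloquial and negation lexicons."""
    weight: int = 5                                   # intensification weight: 3, 5 or 7
    msa: Dict[str, int] = field(default_factory=dict)
    colloquial: Dict[str, int] = field(default_factory=dict)
    negation: Set[str] = field(default_factory=set)

    def add_term(self, term: str, polarity: int, colloquial: bool = False) -> None:
        # Weight 3 allows -1..1, weight 5 allows -2..2, weight 7 allows -3..3.
        limit = (self.weight - 1) // 2
        if not -limit <= polarity <= limit:
            raise ValueError(f"polarity must lie in [-{limit}, {limit}]")
        target = self.colloquial if colloquial else self.msa
        target[term] = polarity

    def lookup(self, term: str) -> Optional[int]:
        # Search the MSA lexicon first, then the colloquial one.
        if term in self.msa:
            return self.msa[term]
        return self.colloquial.get(term)


# Illustrative entries only; they are not taken from the paper's lexicons.
lexicons = Lexicons(weight=5)
lexicons.add_term("ممتاز", 2)                    # MSA, "excellent"
lexicons.add_term("كويس", 1, colloquial=True)    # colloquial, "good"
lexicons.negation.update({"لا", "ليس"})          # "no", "not"
```

Keeping the negation terms in a separate set lets one negation lexicon serve both the MSA and the colloquial lexicons.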

MSA Lexicon
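All classical Arabic terms are added to the Modern Standard Arabic (MSA) lexicon. For each term added to the lexicon, the polarity weight is determined as positive, negative or neutral based on the value of the intensification weight. If a weight of 5 is used, the polarity is set based on formula (1), which follows the five-level scale defined for that weight:

$$\text{Polarity}(t) = \begin{cases} \text{Positive}, & W_i = 2 \\ \text{Nearly Positive}, & W_i = 1 \\ \text{Neutral}, & W_i = 0 \\ \text{Nearly Negative}, & W_i = -1 \\ \text{Negative}, & W_i = -2 \end{cases} \tag{1}$$

where $t$ is the term and $W_i$ is the weight value.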

Colloquial Lexicon
All Arabic slang terms are added to the colloquial lexicon. Each slang term is added with its weight value, whether positive, negative or neutral. For example, the term " ", which means "sweet or good", is added. If the lexicon is constructed with a weight of 5, this term has a "nearly positive" polarity. If the lexicon is constructed with a weight of 3, this term has a "positive" polarity because the coarser scale leaves no room for finer shades of meaning.

Negation Lexicon
The negation lexicon is constructed for all Arabic negation words. For example, the terms " ", which mean "no", "not", "will not" and "never", are added to the negation lexicon and are used with both the MSA and colloquial lexicons. During the analysis process, the automatic lexicon analyzer scans the whole comment. If a negation word is detected, the term weight is inverted.
For example, the comment " " means "not good program". If the term " " is nearly positive with a weight value of 1, it is converted to nearly negative with a weight value of -1. As presented in (Rabab'ah et al., 2016), the SentiStrength evaluation for Arabic sentiment analysis achieves low accuracy due to its inability to detect negation terms and dialectal Arabic. SentiStrength treats each term in isolation without considering the context of its meaning. Handling negation is therefore one of the contributions of this research. In this study, lexicons with classical MSA, colloquial words and negation terms are automatically created, with automatic analysis of Arabic sentiment polarities.

Sentiment Analysis Framework
One of the recent frameworks in sentiment analysis is presented in (Hathlian and Hafezs, 2016). This framework presents the normalization and stemming of Arabic tweets but it lacks the construction of lexicons and analysis processes. An enhanced framework with lexicons and automatic analyzer process is presented in Fig. 1.

Fig. 1. Sentiment analysis framework
As presented in Fig. 1, the automatic lexicon algorithms are based on five processes:
Step 1: Data Collection
The datasets are collected from multiple domains to increase the variability of the developed application.
Step 2: Data Preprocessing
In this stage, stop words are removed from sentences to reduce the search space.
Step 3: Lexicon Construction
In this stage, three lexicons are developed: the Modern Standard Arabic (MSA) lexicon, the colloquial lexicon and the negation lexicon.
Step 4: Subjective-Objective Discrimination
In this stage, an algorithm (Algorithm 1) is presented for discriminating between objective sentences, which have no polarity terms, and subjective sentences, which have polarity terms.
Step 5: Automatic Analyzer
In this stage, three algorithms are presented: an automatic analyzer without weight (Algorithm 2), an automatic analyzer with intensification weight (Algorithm 3) and a negation algorithm that inverts the polarity of a term if a negation term is detected (Algorithm 4).

Sentiment Analysis Algorithms
After building the lexicons, automatic algorithms for analyzing the sentiment polarity are used. These algorithms are used instead of manual methods presented in (Akaichi, 2014). The authors of (Akaichi, 2014) focus on positive and negative polarities only without considering neutral polarities. One of the recent algorithms for sentiment analysis is presented in (Sghaier and Zrigui, 2016). This algorithm is used for data preprocessing without analyzing and determining the polarity for each term. This research presents different interactive algorithms for analyzing the polarity of sentences with multi-weight automatic analyzer.

Subjective-Objective Discrimination
A subjective sentence consists of terms that have positive, negative or neutral polarities. It reflects the orientation of the user's preferences. An objective sentence has no polarity because it states a fact. Algorithm 1 presents an automatic subjective-objective analyzer. Instead of the manual operations presented in (Al-Kabi et al., 2014), Algorithm 1 can scan thousands of sentences and determine whether each is subjective or objective. This reduces the time required compared with manual discrimination.
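The discrimination rule described here (a sentence is subjective if it contains at least one lexicon term, objective otherwise) can be sketched as follows. This is a minimal illustration, not Algorithm 1 itself: whitespace tokenization, the function names and the sample lexicon entry are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def is_subjective(sentence: str, polarity_lexicon: Dict[str, int]) -> bool:
    """A sentence is subjective if any of its tokens is a known polarity term.

    Whitespace tokenization is an assumption made for this sketch.
    """
    return any(token in polarity_lexicon for token in sentence.split())


def split_by_subjectivity(
    sentences: Iterable[str], polarity_lexicon: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """Partition sentences into (subjective, objective) lists."""
    subjective: List[str] = []
    objective: List[str] = []
    for sentence in sentences:
        if is_subjective(sentence, polarity_lexicon):
            subjective.append(sentence)
        else:
            objective.append(sentence)
    return subjective, objective


# Illustrative data; the entry below is not taken from the paper's lexicons.
sample_lexicon = {"جيد": 1}                      # "good"
sample_sentences = [
    "البرنامج جيد",                              # "the program is good"
    "القاهرة عاصمة مصر",                         # "Cairo is the capital of Egypt"
]
subjective, objective = split_by_subjectivity(sample_sentences, sample_lexicon)
```

Because a neutral term is still a lexicon entry, a sentence containing only neutral terms counts as subjective under this rule, which matches the definition given above.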

Automatic Analyzer without Weight
The first algorithm for performing the automatic analysis process is presented in Algorithm 2. In this algorithm, the number of possible alternatives for determining the polarity of each sentence is given by formula (2). Each sentence is analyzed against the lexicons without weight. The algorithm counts the positive, negative and neutral terms and takes the largest count. If the numbers of positive and negative terms are equal and greater than the number of neutral terms, the sentence is labeled "neutral orientation". If the numbers of positive and neutral terms are equal and greater than the number of negative terms, the sentence is labeled "positive orientation". The difference between a "positive" sentence and a "positive orientation" sentence is that the first is clearly positive whereas the second is only approximately positive. All remaining alternatives are explained in Algorithm 2.
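The counting rules can be sketched in Python as below. Only the two tie cases named in the text are taken from the paper; the handling of the other ties (negative tied with neutral, and a three-way tie) is an assumption made by symmetry, and the function name is hypothetical.

```python
from typing import Iterable


def classify_without_weight(term_polarities: Iterable[int]) -> str:
    """Label a sentence from the signs of its detected lexicon terms.

    term_polarities holds one value per detected term: > 0 positive,
    < 0 negative, 0 neutral.
    """
    values = list(term_polarities)
    pos = sum(1 for p in values if p > 0)
    neg = sum(1 for p in values if p < 0)
    neu = sum(1 for p in values if p == 0)

    if not values:
        return "objective"                 # no polarity terms detected
    if pos > neg and pos > neu:
        return "positive"
    if neg > pos and neg > neu:
        return "negative"
    if neu > pos and neu > neg:
        return "neutral"
    if pos == neg and pos > neu:
        return "neutral orientation"       # stated in the text
    if pos == neu and pos > neg:
        return "positive orientation"      # stated in the text
    if neg == neu and neg > pos:
        return "negative orientation"      # assumption, by symmetry
    return "neutral orientation"           # three-way tie: assumption
```

For instance, a sentence with two positive and two neutral terms, as in the worked example of the experimental results, is labeled "positive orientation" by this sketch.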

Automatic Analyzer with Weight
Recent research in Arabic sentiment analysis is presented in (Al-Kabi et al., 2014; Ibrahim et al., 2015). It builds a corpus for Modern Standard Arabic and colloquial Arabic, but it collects datasets and performs the annotation process manually. In addition, a fixed weight value is used and tested in (Ibrahim et al., 2015) with only classical Arabic comments, which can reduce accuracy given the wide range of vocabularies and vernaculars.
As presented in Algorithm 3, the automatic analyzer analyzes each sentence based on the weight of each term $t$ detected during the scanning process. For each sentence, the average weight is calculated based on formula (3):

$$\overline{W} = \frac{\sum_{i=1}^{n} w_i}{n} \tag{3}$$

where $w_i$ is the weight of term $t_i$ and $n$ is the number of polarity terms detected in the sentence. The general sentence polarity is thus defined automatically by dividing the sum of all term weights in the sentence by the number of polarity terms in the sentence. Algorithm 3 performs the automatic analysis process for the weights 3, 5 and 7. For each weight, different calculations are used to determine the sentence polarity.
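The averaging step can be illustrated with a short sketch for weight 7. The band 0 < average <= 1.5 for "nearly positive" is the one used in the worked example of the experimental results; the remaining cut-offs (2.5 and the mirrored negative bands) are assumptions, since Algorithm 3 is not reproduced here, and the function names are hypothetical.

```python
from typing import Sequence


def average_weight(term_weights: Sequence[int]) -> float:
    """Formula (3): sum of the term weights divided by the number of terms."""
    if not term_weights:
        raise ValueError("sentence has no polarity terms")
    return sum(term_weights) / len(term_weights)


def classify_weight7(term_weights: Sequence[int]) -> str:
    """Map the average weight to a label on the 7-level scale."""
    avg = average_weight(term_weights)
    if avg == 0:
        return "neutral"
    sign = "positive" if avg > 0 else "negative"
    magnitude = abs(avg)
    if magnitude <= 1.5:        # band taken from the paper's worked example
        return f"nearly {sign}"
    if magnitude <= 2.5:        # assumed cut-off
        return sign
    return f"high {sign}"       # assumed band (2.5, 3]
```

With the term weights 0, 2, 3 and 0 the average is 5 / 4 = 1.25, which falls in the "nearly positive" band.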

Negation Algorithm
To complete the sentiment analysis process, the automatic analyzer must perform the negation process automatically before assigning the polarity of the sentence. During lexicon construction, a negation lexicon is built. The automatic analyzer retrieves all negation terms and then scans for them alongside the terms of both the MSA and colloquial lexicons. If a negation term precedes a polarity term, the weight value of the term is inverted to its opposite value. For example, if a negated term of weight -2 is detected, its weight is inverted to 2.
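A minimal sketch of the inversion step follows. It assumes that a negation term affects the next polarity term found in the sentence and nothing after it; the actual scope used by Algorithm 4 is not stated in the text. The function name and sample terms are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def apply_negation(
    tokens: Sequence[str],
    polarity_lexicon: Dict[str, int],
    negation_terms: Set[str],
) -> List[int]:
    """Return the weights of the polarity terms, inverted where negated."""
    weights: List[int] = []
    negate_next = False
    for token in tokens:
        if token in negation_terms:
            negate_next = True          # remember the negation for the next term
            continue
        if token in polarity_lexicon:
            weight = polarity_lexicon[token]
            weights.append(-weight if negate_next else weight)
            negate_next = False         # assumed scope: one polarity term
    return weights


# Illustrative entries only; they are not taken from the paper's lexicons.
demo_lexicon = {"كويس": 1, "سيئ": -2}           # "good", "bad"
demo_negation = {"مش", "ليس"}                   # colloquial and MSA "not"
negated = apply_negation("برنامج مش كويس".split(), demo_lexicon, demo_negation)
```

The resulting list can be passed directly to an averaging step, so negation is resolved before the sentence polarity is assigned.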

Experimental Results
The automatic lexicon was developed using Microsoft Visual Studio.net 2015 with a Microsoft SQL Server 2010 database. The experiments were conducted on an Intel® Core i5 CPU @ 1.8 GHz machine with 4 GB RAM running Microsoft Windows 10. A total of 5181 slang and classical Arabic terms were embedded into the automatic lexicon as presented in Table 2. A total of 4394 Arabic seed terms are collected from (Aly and Atiya, 2013). In addition, 1574 new terms are added, for a total of 5968 terms. These datasets are stored in the automatic lexicons and tested on multi-domain data with different weight values.
The seed words are tested for determining the polarity of sentences without weight value as presented in Algorithm 2 and with weight value as presented in Algorithm 3. The weight value is tested using $W_i = 3$, $W_i = 5$ and $W_i = 7$.
For measuring the performance of the developed automatic lexicon and sentiment analysis algorithms, the precision, recall, accuracy and F-measure are used based on the following standard formulas:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad F\text{-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP, TN, FP and FN are the numbers of true positive, true negative, false positive and false negative classifications.
The two algorithms presented in (Abdulla et al., 2014; Duwairi, 2015) achieve accuracies of 74.6% and 87.8% respectively. By comparison, the experimental results for the developed automatic lexicon show a high accuracy of 90.37% without weight value and 90.79%, 93.67% and 96.15% with weight values of $W_i = 3$, $W_i = 5$ and $W_i = 7$ respectively. The overall performance of the automatic lexicon is presented in Tables 3-6. As presented in Tables 3-6, when the weight value is small, the polarity of sentences is very strict. With the multi-weight algorithm, the variability and distribution of sentence polarity become clearer and the context of the sentence is better captured, especially for large datasets. In addition, the accuracy, precision, recall and F-measure increase due to the reduction of False Positive (FP) and False Negative (FN) polarities. For example, consider the sentence that means "A normal reaction to sudden and accelerated growth rates in the global economy".
If a weight of 7 is used, the automatic analyzer detects four terms:
• " ", which means "normal", takes a neutral polarity with a value of 0
• " ", which means "growth", takes a positive polarity with a value of 2
• " ", which means "accelerated", takes a high positive polarity with a value of 3
• " ", which means "sudden", takes a neutral polarity with a value of 0
Based on Algorithm 3, the automatic analyzer calculates the sentence polarity using formula (3): (0 + 2 + 3 + 0) / 4 = 1.25.
In Algorithm 3, the value 1.25 satisfies the condition 0 < 1.25 <= 1.5, so the sentence is labeled positive orientation (nearly positive). In Algorithm 2, where no weight is used by the automatic analyzer, the number of positive terms is 2 (the terms meaning "growth" and "accelerated") and the number of neutral terms is 2 (the terms meaning "normal" and "sudden"). Because the numbers of positive and neutral terms are equal, the sentence polarity is again positive orientation (nearly positive).
A performance comparison between the automatic analyzer without weight and with weight is presented in Fig. 2-5. As presented in Fig. 2-5, the automatic analyzer algorithms achieve high average precision, recall, accuracy and F-measure compared with the much lower average performance reported in (Al Shboul et al., 2015). In (Al Shboul et al., 2015), the precision, recall and F-measure using SVM are 43.3%, 43.6% and 43.1% respectively. Using naïve Bayes, the precision, recall and F-measure are 42.6%, 43.3% and 42.4% respectively. Figure 2 shows an average precision of 89.39% without weight and 91.46% with weight. Figure 3 shows an average recall of 92.39% without weight and 95.94% with weight. Figure 4 shows an average accuracy of 90.37% without weight and 93.54% with weight. Figure 5 shows an average F-measure of 90.42% without weight and 93.65% with weight. To illustrate the efficiency and flexibility of the developed automatic lexicon, Fig. 6 and 7 present the general GUI for user input to the colloquial and MSA lexicons.
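The four measures compared above follow the standard definitions based on true/false positive and negative counts. The sketch below shows how they can be computed; the function name is hypothetical and the counts in the usage line are made-up illustration values, not results from this study.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute precision, recall, accuracy and F-measure from confusion counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    accuracy = _ratio(tp + tn, tp + fp + fn + tn)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Made-up counts for illustration only.
metrics = evaluation_metrics(tp=90, fp=10, fn=5, tn=95)
```

Reducing false positives raises precision and reducing false negatives raises recall, which is why fewer false polarities improve all four measures.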
The implemented automatic lexicon is applied by tracing all sentiments in the MSA, colloquial and negation lexicons and the final polarity for each sentiment is determined as presented in Fig. 8.
Based on the analysis of sentiments, the automatic analyzer presents each sentence with its polarity. The overall polarity of all sentences is determined as presented in Fig. 9.
On a sample of 112 sentences, the automatic analyzer presents a total of 61 negative sentences with different polarities (high negative, negative, nearly negative) and a total of 45 positive sentences with different polarities (high positive, positive, nearly positive) and only 6 neutral sentences.
Based on this knowledge, the general direction of this domain is negative. This can help institutions, government and private departments to anticipate the reactions on a particular topic or domain.
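The step from per-sentence labels to an overall direction can be sketched as a simple tally. The grouping of graded labels by the words "positive" and "negative", the function name, and the tie-breaking order are assumptions. The usage line reuses the counts reported for the 112-sentence sample, collapsed to three classes because the breakdown by grade is not given.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def overall_direction(sentence_labels: Iterable[str]) -> Tuple[str, Dict[str, int]]:
    """Collapse graded labels into three classes and return the majority class."""
    counts: Counter = Counter({"positive": 0, "negative": 0, "neutral": 0})
    for label in sentence_labels:
        lowered = label.lower()
        if "positive" in lowered:
            counts["positive"] += 1      # high positive, positive, nearly positive
        elif "negative" in lowered:
            counts["negative"] += 1      # high negative, negative, nearly negative
        else:
            counts["neutral"] += 1
    # On a tie, the first class in this order wins (an arbitrary assumption).
    direction = max(("positive", "negative", "neutral"), key=lambda k: counts[k])
    return direction, dict(counts)


# Counts from the 112-sentence sample described above.
sample_labels = ["negative"] * 61 + ["positive"] * 45 + ["neutral"] * 6
direction, totals = overall_direction(sample_labels)
```

For this sample the tally gives 61 negative, 45 positive and 6 neutral sentences, so the overall direction is negative.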

Discussion
Based on the developed automatic lexicons, the experimental results show high performance in acquiring and analyzing datasets, with the ability to determine sentiment polarity with high accuracy, precision, recall and F-measure. Applying the automatic lexicon without weight achieves an average accuracy of 90.37%. Applying the automatic lexicon with weight achieves an average accuracy of 93.54%. In future work, the negation lexicon will be enhanced to increase the performance of the lexicon analyzer.

Conclusion and Future Work
Sentiment analysis is a statistical analysis of comments and reviews to determine the polarity of sentences. Most research papers in this field target the English language, and few focus on Arabic sentiment analysis. Due to the growth of social networks, recent applications analyze Arabic reviews from different domains, but the majority of these studies collect and analyze datasets manually. This paper presents an automatic lexicon with enhanced algorithms for analyzing sentiments and improving the analysis performance on multi-domain datasets. The experimental results achieve high accuracy, precision, recall and F-measure compared with recent research papers. Future work is to enhance the negation algorithm and to increase the number of terms in the automatic lexicon in order to increase the accuracy and performance of the system.