A Review on Challenging Issues in Arabic Sentiment Analysis

Understanding what people think about an idea or how they evaluate a product, a service or a policy is important for individuals, companies and governments. Sentiment analysis is the process of automatically identifying opinions expressed in text on certain subjects. The accuracy of sentiment analysis has a direct effect on decision making in both business and government. Working with the Arabic language is very important because the amount of online content in Arabic is growing, while the existing resources are limited and the accuracy of existing methods is low. In this study, we survey the challenging issues in Arabic sentiment analysis from two main perspectives: Arabic-specific and general linguistic issues. The Arabic-specific challenges are mainly caused by Arabic morphological complexity, limited resources and dialects, while the general linguistic issues include polarity fuzziness, polarity strength, implicit sentiment, sarcasm, spam, review quality and domain dependence.


Introduction
The use of microblogging services has led to the widespread availability of opinionated posts (El-Beltagy and Ali, 2013). These available data provide an advantage for using social media websites and blogs in opinion studies. Understanding what people think about an idea or how they evaluate a product, a service or a policy is important for individuals, companies and governments. Sentiment Analysis (SA), also referred to in the literature as opinion mining, is the process of automatically identifying opinions expressed in text on certain subjects (Baly et al., 2016). SA has been performed at various levels of granularity: word, phrase, sentence (Wiebe et al., 1999), document (Pang et al., 2002; Turney, 2002), or aspect (Pontiki et al., 2014; Negi and Buitelaar, 2014), and from different perspectives: subjectivity identification or sentiment analysis. To apply SA, two main approaches are adopted: lexicon-based and machine-learning approaches. Lexicon-based approaches use a dictionary of subjective words with their polarities and use simple matching methods to calculate the polarity scores (Al-Kabi et al., 2013; Li and Li, 2013; Badaro et al., 2014a). Machine-learning approaches use annotated datasets to train classifiers such as Support Vector Machines (SVM) (Kontopoulos et al., 2013; Tang et al., 2014), Naïve Bayes (NB) (Alhumoud et al., 2015; Farra et al., 2010), Neural Networks (NN) (Sharma and Dey, 2012; Bollen et al., 2011) and more recently Deep Learning NN (Socher et al., 2013; Yuan and Zhou, 2015; Al Sallab et al., 2015). However, the obtained results are generally low in terms of accuracy, especially in languages other than English such as Arabic, on which we focus in this study.
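The lexicon-based approach described above can be illustrated with a minimal sketch. The lexicon entries, the function name `lexicon_polarity` and the use of English placeholder words are assumptions made for illustration only; a real system would use an Arabic sentiment lexicon and proper morphological preprocessing.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (+1 positive, -1 negative).
# English placeholders are used here; a real lexicon would hold Arabic entries.
TOY_LEXICON = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Label a text by summing the scores of the tokens found in the lexicon."""
    # Simple matching: split into word tokens and look each one up.
    tokens = re.findall(r"\w+", text.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("The service was excellent and the staff were good."))
```

Exact token matching of this kind is what makes the morphological richness of Arabic a problem for lexicon-based methods: an inflected form that is not listed in the lexicon contributes nothing to the score.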
According to the Internet World Stats ranking of June 2016, Arabic is the fourth of the top ten languages used on the Internet (IWS, 2016). Most efforts in SA are focused on English and other Indo-European languages and little work has been done on Arabic (El-Halees, 2011; Abdul-Mageed et al., 2011). Most of the SA methods have been developed for English text and are difficult to apply to other languages like Arabic (Al-Kabi et al., 2014). Arabic is a morphologically rich language that poses significant challenges to Natural Language Processing (NLP) systems in general (Abdul-Mageed et al., 2011). Annotated Arabic corpora, necessary for training machine learning classifiers, are not only small, but also rarely publicly available. Moreover, almost all efforts on Arabic SA are focused on processing text in the general domain or text from news articles and little has been developed for targeted and specific domains such as finance, sports, legal, etc. Assiri et al. (2015) reported that this lack of support for the Arabic language is due to the limited scholarly work and research funding and to the morphological complexities and different dialects of the Arabic language.
There are many survey studies covering SA. For example, alOwisheq et al. (2016) reviewed works pertaining to the recent resources (i.e., lexica and corpora) that target the Arabic language, Korayem (2016) studied the sentiment and subjectivity methods for languages other than English, Schouten and Frasincar (2015) focused on aspect detection for SA, Korayem et al. (2012) surveyed different techniques for subjectivity and SA of the Arabic language, Vinodhini and Chandrasekaran (2014) discussed the different SA techniques, methods and applications and there are other survey papers (Yadav, 2015; Sadegh et al., 2012; Buche et al., 2013; Mahadik and Bharambe, 2015; Wiegand et al., 2010; Tang et al., 2009; Kaur and Duhan, 2015; Ahire, 2014; Liu and Zhang, 2012; Assiri et al., 2015). Moreover, El-Beltagy and Ali (2013) discussed some of the open issues of SA in Arabic social media. These previous surveys focused on the categorization of recent works, SA techniques and applications. While some of them highlight the challenging issues in SA, the current manuscript attempts to cover such issues differently and more comprehensively: it discusses how they contribute to the low accuracy of SA, focuses on the Arabic language and highlights how previous work dealt with those issues.

Arabic Sentiment Analysis Issues
The main aim in any SA work is to produce highly accurate results; thus, we discuss the challenges that contribute to low accuracy in Arabic SA. Figure 1 illustrates the challenging issues, which are divided based on two main perspectives: Arabic-specific issues and general linguistic issues that are common to all languages.

Arabic-Specific Challenges
Arabic, the language considered in this study, introduces additional difficulties when developing SA systems because of its morphological complexity, the existence of a large number of dialectal variants and the lack of resources. The Arabic language has a complex morphological structure based on root-pattern schemes (Al-Sughaiyer and Al-Kharashi, 2004). Also it has many variants, such as classical Arabic, which is the language of the Quran; modern standard Arabic (MSA), the official language that is standardized, written in news and taught in schools; and dialectal Arabic (DA), which is used in daily lives and spoken communications (Zaidan and Callison-Burch, 2011;Habash, 2010). Arabic used in social media is usually a mixture of MSA and one or more of the Arabic dialects (Refaee and Rieser, 2015).

Morphological Complexity
Arabic is a morphologically rich language that poses significant challenges to NLP systems in general (Abdul-Mageed et al., 2011). Arabic is a highly inflectional and derivational language and various forms can exist for the same Arabic word using different suffixes, affixes and prefixes (Shoukry and Rafea, 2012). Inflectional morphology refers to the process of adding extensions to a word while the Part Of Speech (POS) and the meaning of the word remain intact. For example, one word may have more than one lexical category in different contexts (El-Halees, 2011), such as a tense-based affix used as the present-tense prefix يـ /y-/ in نظر - ينظر /ynẓr/, 'he looks' (Al-Sabbagh and Girju, 2012). Derivational morphology refers to deriving new words from other words while modifying the core meaning of the word, e.g., the Arabic verb 'قال', /qạl/, 'to say', is the source for the Arabic noun 'قائل', /qạỷl/, 'the person who is saying' and the noun 'قول', /qwl/, 'say (n)' (Habash, 2010). In addition, Arabic nouns and verbs are typically derived from a set of 10,000 roots (Mourad and Darwish, 2013), and different words with completely different meanings can be composed from the same root (Shoukry and Rafea, 2012). Almuqren and Cristea (2016) reported challenges of dealing with the script of the Arabic language such as diacritization, negation and spelling errors. These different challenges in the Arabic language have led to the lack of SA resources such as comprehensive sentiment lexica and corpora.
General tools for NLP are also only moderately developed for Arabic. Khoja and Garside (1999) developed an Arabic stemmer and Khoja (2001) developed a POS tagger for Arabic. The ISRI Arabic stemmer algorithm (Taghva et al., 2005), unlike the Khoja Arabic stemmer, was implemented without a root dictionary. Pasha et al. (2014) developed MADAMIRA, a tool for morphological analysis and disambiguation. Elfardy et al. (2014) introduced AIDA, a system for identifying code switching in informal Arabic text that relies on language models and MADAMIRA (Pasha et al., 2014) to identify the class of each word in a given sentence.

Dialectal Arabic
Dealing with DA creates additional challenges (Zaidan and Callison-Burch, 2011; Refaee and Rieser, 2014). The use of different dialects in social media, where Arab users freely express themselves, adds further challenges to SA because the majority of the NLP tools for the Arabic language have been developed for MSA (alOwisheq et al., 2016). According to Habash (2010), Arabic dialects differ significantly from MSA in terms of phonology, morphology, lexical choice and syntax. The Arabic dialects are divided as:
• EA: Egyptian Arabic for Egypt and Sudan
Habash (2010) added that each of the dialects contains three sub-dialects: City, Rural and Bedouin. These dialects lack standardization, are written as free text and differ from MSA. For example, the word 'العافية' /ạlʿạfyẗ/ means 'wellness' in MSA, but in the MA dialect it means 'hell'. It is widely used in a sentiment-bearing sentence like 'الله يعطيك العافية' /ạllh yʿṭyk ạlʿạfyẗ/, which means 'May Allah bless you' in almost all of the other Arabic dialects; however, in MA it means 'go to hell'. Also, the word 'مبسوط' /mbswṭ/ means 'happy' or 'rich', but in IA it means 'severe beatings'. Table 1 compares the polarity of words/phrases in MSA and DA.

Limited Resources
There is little focus from researchers on tackling the challenge of Arabic SA (Refaee and Rieser, 2014; El-Beltagy and Ali, 2013). Therefore, Arabic resources for SA are difficult to find. There is a lack of labeled corpora and polarity lexica (Refaee and Rieser, 2015; Abdul-Mageed et al., 2011; alOwisheq et al., 2016). In addition, the size of existing subjectivity lexicons is small (Mourad and Darwish, 2013). The complexity of the Arabic language, as discussed earlier, has negatively affected the amount of existing resources.
Arabic corpora annotated for SA that have been introduced in the literature include:
• OCA, the Opinion Corpus for Arabic (Rushdi-Saleh et al., 2011), an Arabic dataset consisting of 500 movie reviews
• COLABA (Diab et al., 2010), which targeted EA, IA and LA, with a much smaller effort on MA
• ASTD (Nabil et al., 2015), which contains 10K tweets in the Egyptian dialect
• YADAC (Al-Sabbagh and Girju, 2012), presented as a multi-genre dialectal Arabic corpus using data from micro-blogs like Twitter, blogs, forums and online market services
• AWATIF (Abdul-Mageed and Diab, 2012)
In spite of these different resources, more effort is still needed to cover the variety of Arabic dialects. Although some domains like news have been covered in many resources, many other domains have not been targeted yet. On the other hand, there are some polarity lexica, such as the lexicon introduced in (Abdul-Mageed et al., 2011) that consisted of adjectives, the lexicon in (Mourad and Darwish, 2013) that used Machine Translation (MT) to translate an existing English lexicon and a random graph walk to expand a manually prepared Arabic lexicon and ArSenL (Badaro et al., 2014b) that used existing resources including English SentiWordNet (ESWN) and Arabic WordNet to produce a large-scale Standard Arabic sentiment lexicon. Depending on an adjective-only polarity lexicon is not enough, because Arabic is rich in ways of expressing feelings and sentiment. In addition, the change in the polarity classification of each polarity word across different dialects, contexts and domains is still an open issue (Liu, 2012; Varghese and Jayasree, 2013; Refaee and Rieser, 2014).

General Linguistic Issues
This section discusses the general linguistic issues that cause the low accuracy of SA in any language. We discuss these issues in relation to the Arabic language.

Polarity Fuzziness
Most of the methods used in sentiment classification consider only the polarity (e.g., positivity and negativity) and do not pay attention to polarity fuzziness (Wang et al., 2015). Sometimes it is hard to identify the polarity of a text. Even two humans may not agree on the same annotation; each can have a different point of view. A sentence containing sentiment words may not express any sentiment, such as a question, e.g., 'هل استخدام الضوء العالي مخالف؟' /hl ạstkẖdạm ạlḍwʾ ạlʿạly mkẖạlf?/; 'Is the use of the car main beam a violation?'. This question contains the word 'مخالف' /mkẖạlf/; 'violation', which bears a negative sentiment, but in this context it bears no sentiment. Another example is conditional sentences, e.g., 'المرور مش بيعاقب أي شخص الا إذا خالف قواعد المرور.' /ạlmrwr msẖ byʿạqb ạỷ sẖkẖṣ ạlạ ạ̹dẖạ kẖạlf qwạʿd ạlmrwr./; 'Traffic police doesn't punish anyone but those who violate the traffic rules', which contains 'خالف' /kẖạlf/; 'violate', bearing negative sentiment, but there is no sentiment in this conditional sentence (Liu, 2012). However, questions and conditional sentences may also express sentiments, e.g., 'ممكن لو سمحت أعرف تأشيرتي هتخلص بعد كم شهر أنا مستنى من شهرين؟' /mmkn lw smḥt ạʿrf tạsẖyrty htkẖlṣ bʿd km sẖhr ạnạ mstny̱ mn sẖhryn?/; 'Please let me know, how many months still for my visa to be issued, I have been waiting for two months?' and 'لو السائقين ملتزمين كانت خفت الزحمة.' /lw ạlsạỷqyn mltzmyn kạnt kẖft ạlzḥmẗ./; 'If the drivers were committed, jams would be reduced'. Jindal and Liu (2006) proposed a novel rule mining and machine learning approach to identifying comparative sentences, which are useful in many applications such as marketing intelligence, product benchmarking and e-commerce. Narayanan et al. (2009) studied sentiment analysis of conditional sentences from both the linguistic and computational perspectives.
The linguistic study focused on canonical tense patterns, which have proven useful in classification, while in the computational study, they automatically predicted whether opinions on topics were positive, negative, or neutral by building SVM models. In the Arabic language, using adjectives and nouns as people's names is common (Table 2). Thus, it is confusing to use one of them in a context similar to this: 'المجرم حسن جميل.' /ạlmjrm ḥsn jmyl./, which literally has two different meanings.
(1) Used as adjectives, the phrase means 'The offender is good and beautiful'. (2) Used as a person's name, the phrase means 'Hassan Jamil is the offender'.

Polarity Strength
The sentiment word or phrase is a dominating factor in SA and the strength of the polarity is an important indicator of a person's opinion or sentiment. To calculate document-level sentiment scores, Taboada et al. (2011) used dictionaries of sentiment words and phrases annotated with polarity and strength, together with negations and intensifiers. To classify the sentiment strength in English text, Thelwall et al. (2011; 2012) proposed and improved an algorithm, SentiStrength, using methods that exploit the de-facto grammar and spelling styles of cyberspace. Oraby et al. (2013) proposed a rule-based approach to extract opinion phrases using a sentiment lexicon with opinion indicators; after measuring the strength of the opinion, they developed a calculation method with four polarity categories (positive weak, positive strong, negative weak and negative strong). In Arabic, the strength level can be expressed in various forms. For instance, the reviews 'الجو بارد' /ạljw bạrd/, 'الجو قارص البرودة' /ạljw qạrṣ ạlbrwdẗ/, 'الجو شديد البرودة' /ạljw sẖdyd ạlbrwdẗ/, 'الجو بارد بشكل غير إعتيادي' /ạljw bạrd bsẖkl gẖyr ạ̹ʿtyạdy/ and other forms all say 'The weather is cold' with different degrees of coldness.
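As a rough illustration of how polarity strength might be computed, the sketch below combines a scored lexicon with intensifiers and negators and maps the result onto four strength categories like those mentioned above. All lexicon entries, weights, the threshold and the function name `strength_category` are hypothetical; this is not the algorithm of any of the cited works, and flipping the sign under negation is a deliberate simplification.

```python
import re

# Hypothetical scored lexicon and modifiers (English placeholders).
LEXICON = {"good": 2, "great": 4, "bad": -2, "awful": -4, "cold": -2}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
NEGATORS = {"not", "never"}


def strength_category(text, strong_threshold=3.0):
    """Return 'positive weak', 'positive strong', 'negative weak',
    'negative strong' or 'neutral' for a short text."""
    tokens = re.findall(r"\w+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = float(LEXICON[tok])
        neg_pos = i - 1
        # An intensifier directly before the sentiment word scales its score.
        if i >= 1 and tokens[i - 1] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[i - 1]]
            neg_pos = i - 2
        # A negator before the (possibly intensified) word flips the sign.
        if neg_pos >= 0 and tokens[neg_pos] in NEGATORS:
            score = -score
        total += score
    if total == 0:
        return "neutral"
    polarity = "positive" if total > 0 else "negative"
    strength = "strong" if abs(total) >= strong_threshold else "weak"
    return f"{polarity} {strength}"


print(strength_category("The weather is cold"))
print(strength_category("The weather is extremely cold"))
```

The Arabic examples above show why such a scheme is hard to transfer: strength is often carried by a whole phrase rather than by a single intensifier word next to the sentiment word.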

Domain Dependency
Sentiment is expressed differently in different domains (Varghese and Jayasree, 2013), so a sentiment classifier trained to classify opinion polarities in one domain may produce poor or useless results when used in another domain; the results are only accurate in the domain for which the classifier was trained (Oraby et al., 2013). In addition, a sentiment word may have opposite orientations in different domains. For example, 'أقل' /ạql/ usually expresses negative sentiment, e.g., 'فلوس أقل من سعرها' /flws ạql mn sʿrhạ/; 'Money is less than its worth', but it may also express a positive one, e.g., 'الإجراءات خلصت في وقت أقل' /ạlạ̹jrạʾạt kẖlṣt fy wqt ạql/; 'The procedures were done in less time'. Another example is the sentence 'يستخدمون الهواتف' /ystkẖdmwn ạlhwạtf/; 'They are using the phones'. This sentence is positive in the domain of public services that provide appointment booking by phone, but it bears a negative sentiment when used in the context of people's driving habits. Aue and Gamon (2005) discussed the challenges in using sentiment classifiers in new domains, showing that although the approaches are different, they all need a labeled training dataset. Blitzer (2007) reported that domain adaptation addresses the situation in which labeled data from a source domain is used to train a model, but little or no labeled data is available from the target domain where the model will be applied. That work learned representations that minimize the difference between the source and target domains. The approach proposed in (Wu et al., 2009) integrated the sentiment orientations of documents into a graph-ranking algorithm, which uses the accurate labels of old-domain documents as well as the 'pseudo' labels of new-domain documents. Pan et al. (2010) proposed a general framework for cross-domain sentiment classification. They first build a bipartite graph between domain-independent and domain-specific features and then propose a Spectral Feature Alignment (SFA) algorithm to align the domain-specific words from the source and target domains into meaningful clusters, with the help of domain-independent words as a bridge. ElSahar and El-Beltagy (2015) introduced large, multi-domain datasets for SA in the domains of movies, hotels, restaurants and products. Additionally, a multi-domain lexicon of 2,000 entries was extracted from the datasets. The researchers used SVM and K-Nearest Neighbors (KNN) classifiers. SVM results were better than KNN ones and the best performing feature representations were the combinations of the lexicon-based features with the other features.
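The first step of the cross-domain framework described above, building a bipartite graph between domain-independent and domain-specific features, can be sketched as follows. This is only an illustration under simplifying assumptions: a word is treated as domain-independent when it appears in both domains (a hypothetical selection rule, controlled by the assumed parameter `min_df`), edge weights are document-level co-occurrence counts, and the spectral alignment step itself is not shown. The function names are made up for this example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Return the set of lowercase word tokens in a document."""
    return set(re.findall(r"\w+", text.lower()))


def build_bipartite_graph(source_docs, target_docs, min_df=1):
    """Return (domain_independent_words, edges), where edges maps
    (domain_specific_word, domain_independent_word) -> co-occurrence count."""
    src = [tokenize(d) for d in source_docs]
    tgt = [tokenize(d) for d in target_docs]
    src_df = Counter(w for doc in src for w in doc)
    tgt_df = Counter(w for doc in tgt for w in doc)
    # Assumption: a word is domain-independent if it occurs in at least
    # min_df documents of BOTH domains; every other word is domain-specific.
    independent = {w for w in src_df
                   if src_df[w] >= min_df and tgt_df[w] >= min_df}
    edges = defaultdict(int)
    for doc in src + tgt:
        for specific in doc - independent:
            for pivot in doc & independent:
                edges[(specific, pivot)] += 1
    return independent, dict(edges)


pivots, graph = build_bipartite_graph(
    ["the plot was great", "boring plot"],
    ["the battery was great", "battery died"],
)
print(sorted(pivots))
print(graph[("plot", "great")], graph[("battery", "great")])
```

In this toy example 'plot' and 'battery' are both linked to the shared word 'great', which is the kind of bridge that lets domain-specific words from two domains be aligned.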

Implicit Sentiment
Sentences with implicit sentiment are opinionated objective sentences (Yazdavar, 2013; Pang and Lee, 2004). Many sentences without sentiment words can also imply opinions. For example: 'This washer uses a lot of water' implies negative sentiment, 'After sleeping on the mattress for two days, a valley has formed in the middle' expresses a negative opinion, 'Phone doesn't fit pocket' implies that the phone size is inappropriate and 'Phone is cheap' implies a bad quality of the phone rather than a good price for it. In the Arabic language, implicit sentiment is common in sentences like 'حسبي الله ونعم الوكيل' /ḥsby ạllh wnʿm ạlwkyl/; 'in Allah (God) I trust and He is best to trust', which is used when someone is being oppressed. Implicit sentiment is also found in 'ده راجل طيب' /dh rạjl ṭyb/; 'he is a good man', as it may bear a negative sentiment when describing the man as unintelligent. Zhang and Liu (2011) studied the problem of objective nouns and sentences with implied opinions. They proposed a method that determines the feature polarity of opinion words that modify features and their surrounding context. Van de Kauter et al. (2015) introduced a fine-grained scheme for the annotation of polar sentiment, explicit sentiment (polar expressions) and implicit expressions of sentiment (polar facts) in text.

Politeness and Euphemism
Politeness is the practical application of good manners or etiquette. It is a culturally defined phenomenon and therefore what is considered polite in one culture can sometimes be quite rude or simply eccentric in another cultural context (Wikipedia, 2015). On the one hand, politeness may affect how people express their opinions or sentiments. For example, consider asking someone, 'Could you please activate my account?' The idea of the account being blocked gives a negative sentiment, but in such a polite sentence it is very hard to classify. On the other hand, direct requests like 'Activate my account' have a negative sentiment (Abdul-Mageed and Diab, 2012). As mentioned above, politeness varies according to society and culture, so the different Arabic dialects present a big challenge, even in MSA. For example, 'بيض الله وجهك' /byḍ ạllh wjhk/; 'God whiten your face', is used in daily communication as a positive and polite sentence, while its original meaning is 'to wish death for someone who is bad'. Euphemism (e.g., using 'story' or 'cover' instead of 'lie') is also used widely in Arabic. For example, in the Quran, 'هُنَّ لِبَاسٌ لَكُمْ وَأَنْتُمْ لِبَاسٌ لَهُنَّ' /hunã libāsuⁿ lãkum̊ wāảntum̊ libāsuⁿ lãhunã/ [Al-Baqara 187] describes the relationship between husbands and wives, as the wives cover their husbands and protect them from sins.

Sarcasm
Sarcasm is difficult to detect (Refaee and Rieser, 2014) because it uses positive indicators to express negative emotions, e.g., 'What a great car! It stopped working in two days'. Sarcasm is rare in consumer reviews, but is very common in political reviews. Mourad and Darwish (2013) reported that nearly 13.5% of the annotated tweets in their corpus were sarcastic. Using a positive sentiment in bad situations for the purpose of sarcasm is popular in Arabic. For instance, 'لقد أعطى الراتب لزوجته وهو يبستم، منافق' /lqd ạʿṭy̱ ạlrạtb lzwjth whw ybstm, mnạfq/; 'He was happy when he gave the salary to his wife, he is hypocrite'. Davidov et al. (2010) used a semi-supervised approach to classify sentences in online product reviews into sarcastic classes. González-Ibáñez et al. (2011) studied the problem of automatically detecting sarcasm in Twitter messages. Using an annotated corpus, they explored the contribution of linguistic and pragmatic features of tweets to the automatic identification of sarcastic messages and found that the three pragmatic features (ToUser, smiley and frown) were among the ten most discriminating features in the classification tasks. Maynard and Greenwood (2014) investigated the characteristics of sarcasm on Twitter and the effect of sarcasm on sentiment analysis.

Spam
The abundance of social media allows spammers to post fake opinions to promote a product or to discredit another. Spam also spreads in political and governmental reviews. Jindal and Liu (2007) studied the spam review problem in a dataset of manufactured product reviews using logistic regression. Three types of duplicate reviews are most likely to be spam: (1) from different userids on the same product, (2) from the same userid on different products and (3) from different userids on different products. Jindal and Liu (2008) reported three types of spam reviews: (1) fake reviews: untruthful reviews containing positive or negative opinions about target entities (products or services) in order to promote or damage their reputations, (2) brand reviews: reviews that do not comment on the specific products or services but on the brands or the manufacturers and (3) non-reviews, which have two main subtypes: advertisements and other irrelevant texts containing no opinions (e.g., questions, answers and random texts). Strictly speaking, non-reviews are not opinion spam, as they do not give user opinions.
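The three duplicate types listed above can be detected with a simple near-duplicate check. The sketch below is an illustration only: it assumes reviews are given as hypothetical `(userid, product_id, text)` tuples, measures similarity with Jaccard overlap of word 2-grams and uses an assumed threshold of 0.9. None of these choices should be read as the exact procedure of the cited studies.

```python
import re
from itertools import combinations


def shingles(text, n=2):
    """Return the set of word n-grams (shingles) of a text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suspicious_duplicates(reviews, threshold=0.9):
    """reviews: list of (userid, product_id, text) tuples.
    Returns (i, j, kind) for each near-duplicate pair of one of the three types."""
    sets = [shingles(text) for _, _, text in reviews]
    found = []
    for i, j in combinations(range(len(reviews)), 2):
        if jaccard(sets[i], sets[j]) < threshold:
            continue
        same_user = reviews[i][0] == reviews[j][0]
        same_product = reviews[i][1] == reviews[j][1]
        if same_user and same_product:
            continue  # same user, same product: not one of the three types
        if same_product:
            kind = "different userids, same product"
        elif same_user:
            kind = "same userid, different products"
        else:
            kind = "different userids, different products"
        found.append((i, j, kind))
    return found


sample = [
    ("u1", "p1", "Great phone, battery lasts all day"),
    ("u2", "p1", "Great phone, battery lasts all day"),
    ("u1", "p2", "Great phone, battery lasts all day"),
    ("u3", "p3", "The screen cracked after a week"),
]
for pair in suspicious_duplicates(sample):
    print(pair)
```

The pairwise comparison is quadratic in the number of reviews, which is acceptable for a sketch but would need indexing or hashing on a large review collection.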

Review Quality
The quality, usefulness, helpfulness, or utility of a review is important to take into account in SA. A review may not actually be spam, but neither is it helpful. For example, a review of a product such as the iPhone 7 may target the brand instead, stating that Apple is a good brand even though the evaluated product is not good. Greeting comments are another example, such as 'صباح الخير والنور.' /ṣbạḥ ạlkẖyr wạlnwr/; 'good morning'. Posted as a comment on an organization's Facebook post, it appears to bear a positive sentiment, but it is not helpful in evaluating the sentiment toward the organization, since a greeting may be followed by a question or a complaint about the services provided. Kim et al. (2006) proposed an algorithm for automatically assessing helpfulness and ranking reviews according to helpfulness using an SVM regression system. Ghose and Ipeirotis (2007) found that reviews that include a mixture of subjective and objective elements are considered more informative or helpful by users.

Conclusion
Sentiment analysis has been used in various applications in public and customer opinion studies, such as in the social, news and commerce domains. The accuracy of the analysis has a direct effect upon the decision-making capacity of businesses and governments. Therefore, the need for efficient sentiment analysis systems is on the rise. Even though the use of the Arabic language on the Internet is on the rise, there are limited efforts in Arabic sentiment analysis and in building the necessary resources, namely lexica and corpora. The work on building Arabic polarity lexica often relies on the available English lexicons, which may be affected by cultural differences. Also, composing the lexicon from adjectives and neglecting nouns and other POSs is not enough. Most Arabic corpora are unpublished and the available ones still need more effort to cover the multiple dialects and the different domains. There are few attempts to use fuzzy logic to raise the accuracy of sentiment analysis. In spite of the mentioned efforts, sentiment analysis in the Arabic language still has many unsolved issues. Table 3 below shows some of the related works to highlight the different approaches and methods used to address the dimensions of the low accuracy problem in sentiment analysis. These include Sources/Genres, Domains, Dialects and Linguistic Issues. The table also shows how they dealt with the Arabic-specific challenges and lists their outputs, either an annotated corpus or a sentiment lexicon. From the table, it is clear that dealing with the low accuracy of sentiment analysis requires taking into account the different dimensions that affect the polarity strength and direction. Domain dependency is still considered an open issue because the previous works are domain specific. Moreover, dialectal Arabic is still not covered and needs to be addressed, especially across different domains.
In this study, the issues that cause the low accuracy problem in sentiment analysis are discussed based on two main components: Arabic-specific challenges and general linguistic issues. The Arabic-specific challenges are divided into three main parts: morphological complexity, limited resources and dialects. The general linguistic issues include polarity fuzziness, implicit sentiment, sarcasm, polarity strength, spam, review quality issues and domain dependence.