Social Media Sentiment Analysis: The Hajj Tweets Case Study

Corresponding Author: Mohammad Ashraf Ottom, Department of Information Systems, Faculty of Information Technology and Computer Sciences, Yarmouk University, Irbid, Jordan. Email: ottom.ma@yu.edu.jo

Abstract: About forty-five percent of the world's population uses social networks, which makes these platforms a natural place to look for people's opinions and feelings on various topics. Companies that offer services and products to customers follow such opinions to guide future improvement. This has motivated serious work on analyzing people's views across social platforms and on developing better ways to perform that analysis. In this study, we look for the most accurate sentiment analysis method for a set of tweets related to Hajj, one of the most important rituals performed by Muslims, where the organizations responsible for the pilgrimage season seek to run the season as well as possible every year. We used Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) as supervised algorithms for the machine-learning approach and the TextBlob analyzer for the lexicon-based approach. The findings show that machine learning techniques performed better than the lexicon-based approach in classifying and analyzing Hajj-related tweets. Despite the limited size of the available Hajj tweet corpus, SVM reached the best accuracy, 84%.


Introduction
Hajj is an Islamic term meaning pilgrimage. It is an annual Islamic pilgrimage to Mecca in the Arabian Peninsula and a mandatory act of worship, required once in a lifetime of all Muslims who are financially and physically capable. Every Islamic calendar year, in the month of Du-Al-Hijjah, millions of Muslims visit Mecca to perform Hajj. With today's advances in communication and technology, people can use social media platforms such as Twitter to share their opinions and experiences during this journey. Data mining techniques, especially opinion mining tools, can be used to extract knowledge and patterns about people's overall experiences and opinions of the services and facilities in Mecca, which official authorities can later use to improve the experience and services in the following years (Clingingsmith et al., 2009; Peters, 1996).
Technological development and emergence of many social networking platforms have given people the opportunity to express themselves and their interests and to share their thoughts and feelings on different topics. Many service organizations and businesses are interested in following customer feedback and customer satisfaction or dissatisfaction with the products and services offered to them in order to make better decisions about improvement. Social networking platforms may be one of the best choices for such companies and organizations.
According to the latest statistics published by the social media management platform "Hootsuite", about 3.48 billion people are now using social media; in other words, 45% of the world's population uses social networks. These figures cover various social media platforms, including Twitter, which has 321 million monthly active users. Twitter offers a microblogging service that allows users to send "Tweets" to others, with a maximum of 280 characters per message, which makes Twitter an ideal choice for the research reported in this study (Khanna, 2018). Sentiment Analysis (SA) is the study of natural language to systematically extract knowledge and useful patterns for knowledge-based decisions. Sentiment analysis, also called opinion mining, is the process of determining which view (for example positive, negative or neutral) a text was written to express. One common application of this technology is to learn how individuals feel about a certain topic (Hussein, 2018). In this study, we aim to identify the most accurate of the main sentiment analysis methods on data collected from the last pilgrimage season.
The rest of the manuscript is structured as follows: a literature review, the proposed methodology, the experiments and their results, and finally the conclusions of this study and future work.

Literature Review
The literature describes many approaches to sentiment analysis. Overall, there are two main methods: the lexicon-based approach, which is presented in this study, and the machine-learning-based approach, which typically relies on supervised text classification techniques such as Naïve Bayes, K-Nearest Neighbor (KNN) and Support Vector Machines (SVM) (Alessia et al., 2015).

Naïve Bayes
The Naïve Bayes classifier is a probabilistic classifier based on applying Bayes' theorem, as written in Eqs. (1a) and (1b), where each attribute set A = {A1, A2, ..., An} consists of n attributes and C is the class label.
The assumption made here is that the attributes are independent given the class. That is, the presence of one particular attribute does not affect the others. Naïve Bayes works well on text categorization (Ahmad, 2013).
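As a hedged sketch, the standard textbook form of the two equations referred to as Eqs. (1a) and (1b) is given below. The notation (A, A_i, C, n) follows the paragraph above; the exact typesetting is an assumption, not a reproduction of the original equations. Eq. (1a) is Bayes' theorem and Eq. (1b) is the factorization that follows from the independence assumption.

```latex
% (1a) Bayes' theorem for class label C given attribute set A
P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)} \tag{1a}

% (1b) Naive independence assumption over the n attributes
P(A \mid C) = \prod_{i=1}^{n} P(A_i \mid C) \tag{1b}
```

A tweet is then assigned the class C that maximizes P(C) multiplied by the product in Eq. (1b), since P(A) is the same for every class.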

Support Vector Machine (SVM)
Another algorithm that can be used for text classification is a support vector machine. The main idea in this classifier is to find the hyperplane which separates the data as optimally as possible (Demidova et al., 2016;Ottom and Alshorman, 2019) as shown in Fig. 1.

K-Nearest Neighbor (KNN)
KNN does not build a model from the training data. To classify a test example A, the algorithm defines the K-neighborhood as the K nearest neighbors of A: it computes the distance between A and every example in the dataset, chooses the K examples that are nearest to A, and assigns A the most frequent class among those K nearest neighbors (Ashraf et al., 2013; Navlani, 2018), as shown in Fig. 2.
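The KNN procedure described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `knn_classify`, the Euclidean distance and the toy two-dimensional feature vectors are assumptions made for the example, not the configuration used in this study.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    train: list of (feature_vector, label) pairs (hypothetical toy format).
    query: feature vector with the same length as the training vectors.
    """
    # Distance from the query to every training example (Euclidean, assumed).
    by_distance = sorted(train, key=lambda example: math.dist(example[0], query))
    # Labels of the k closest examples.
    nearest_labels = [label for _, label in by_distance[:k]]
    # Most frequent label; ties go to the label encountered first (the nearer one).
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy usage with made-up points.
toy_train = [((0, 0), "negative"), ((0, 1), "negative"),
             ((5, 5), "positive"), ((6, 5), "positive"), ((5, 6), "positive")]
print(knn_classify(toy_train, (5, 4), k=3))
```

Because no model is built, all of the work happens at prediction time, which is why the cost of classifying one example grows with the size of the training set.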
Bagheri and Islam (2017) demonstrated an application of sentiment analysis, showing how to connect to Twitter and run sentiment analysis queries using Python and packages such as Tweepy and TextBlob. They then compared the sentiment analysis results of different queries covering movies, politics, fashion and fake news. Madhoushi et al. (2015) aimed to classify SA methods independently of level or task and to examine the major research problems raised in recent articles in the field. They found that ML methods are the most frequently used approaches. They also discussed open issues: some methods are still unsuitable in certain domains because of the small number of labeled data records available in some applications, and more SA research is needed for Arabic and other languages besides English, in order to deal with complicated sentences and words that require advanced sentiment analysis and parsing. Hasan et al. (2018) created a hybrid approach that combines three sentiment analyzers, SentiWordNet, TextBlob and Word Sequence Disambiguation (WSD), to analyze political opinions using machine learning methods. Alsaeedi and Khan (2019) explored various sentiment analysis techniques applied to Twitter data and their outcomes, using approaches such as bigram, n-gram and ensemble methods. The experiments obtained 80% accuracy using the n-gram and bigram approaches, whereas the ensemble method obtained about 85% classification accuracy.
Gautam and Yadav (2014) extracted a dataset from Twitter using the Twitter API and preprocessed it using ML libraries such as Scikit-learn in Python. SVM gave better accuracy, but its long execution time was the main drawback. Naive Bayes performed satisfactorily but did not exceed expectations, whereas Logistic Regression performed similarly to SVM while needing about the same execution time as Naive Bayes. Sarlan et al. (2014) showed the importance of sentiment analysis and its use in analyzing customer opinions and increasing competition in the business market. Their paper focused on the design of a sentiment analysis system that extracts a vast number of tweets, in order to provide competitive advantages to the business and increase business revenue using ML and sentiment analysis methods. Gupta et al. (2017) built a hybrid approach consisting of two machine learning algorithms, K-Nearest Neighbors (KNN) and Support Vector Machines (SVM). The experiments showed that the proposed hybrid approach achieved better results in terms of accuracy and F-measure: the hybrid approach reached an accuracy of about 76% and an F-measure of 71, compared with 67% for each of KNN and SVM separately.
As our contribution, we compare the results of the main sentiment analysis methods on a collection of Hajj-related tweets and determine the accuracy of each method.

Methodology
Our framework covers the work from the data collection phase to the results analysis phase. We compiled the Hajj-related tweets using the Twitter API. After retaining only the tweet data we needed, we filtered the tweets, kept the most relevant ones and removed retweets; polarity and subjectivity were then calculated using the TextBlob Python library.
We applied several ML methods, such as Naïve Bayes and the SVM classifier, to the training set using the Orange data mining tool to build the classification model. The model was evaluated on the testing dataset to obtain its classification accuracy, and the results of each analysis were then compared in terms of accuracy. The following subsections detail the techniques that support our sentiment analysis. A clear view of the sentiment analysis framework is illustrated in Fig. 3.

Dataset Collection
In this phase, we generated our dataset by collecting tweets through an API (Application Programming Interface), a tool that makes it easy to interact with computer programs and web services. Many web services provide APIs so that developers can interact with their services and access data programmatically. The following steps demonstrate the data collection process:
Step 1: Getting Twitter API keys: To access the Twitter Streaming API, we need to obtain the following information from Twitter: API key, API secret, Access token and Access token secret. We therefore followed Twitter's guidelines to obtain these keys and tokens.

Dataset Preprocessing
To analyze tweets well and to achieve the project goal, it was important to clean the tweets by removing anything that adds little meaning to the sentence, such as stop words, and any characters that may hinder the analysis, such as punctuation and special symbols. For data preprocessing, we performed the following steps using Python version 3.6:
• Read the CSV file that contains the data. While reading, we only keep the column that contains the tweet text and store the tweet text in a list. This requires importing the 'string' and 'csv' modules.
• Import the 'nltk' module and use it to download the English stop words, which are already collected in its corpus.
• Iterate over each tweet text in the list using a while loop and apply the cleaning commands to the tweets as follows:
  • Import the 're' module and use it to remove each "mention" and "URL" in the tweet.
  • Split the text into words, remove all punctuation from the tweet text and remove each retweet marker "RT" from the list.
  • Remove stop words from the text: a for loop passes over each word in each tweet and checks whether the word is a stop word. If it is not, the word is inserted into a new list that contains what we need from the tweet.
  • Lemmatization: replace words with their base form, for example "caring" becomes "care".
  • Join all the meaningful words again to obtain a new, clean tweet.
In the next step of our work, all the pre-processed tweets are used in the sentiment analysis process, so we compiled all the outputs into 'txt' files using 'stdout'. Table 1 shows some statistics about the data after preprocessing.
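The cleaning steps listed above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the script used in the study: the function name `clean_tweet`, the regular expression and the small hard-coded `STOP_WORDS` set are hypothetical stand-ins (the study used NLTK's English stop-word corpus), lowercasing is added for matching, and lemmatization is omitted so that the example needs only the standard library.

```python
import re
import string

# Tiny illustrative stop-word set (assumption); the study used NLTK's English corpus.
STOP_WORDS = {"a", "an", "the", "is", "was", "to", "of", "and", "in", "for", "at"}

# Matches @mentions and http/https URLs.
MENTION_OR_URL = re.compile(r"@\w+|https?://\S+")

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Return the tweet with mentions, URLs, punctuation, 'RT' and stop words removed."""
    text = MENTION_OR_URL.sub(" ", text)   # drop mentions and URLs
    text = text.translate(PUNCT_TABLE)     # drop punctuation
    tokens = text.lower().split()          # split into lowercase words
    kept = [t for t in tokens if t != "rt" and t not in STOP_WORDS]
    return " ".join(kept)                  # join the meaningful words again


print(clean_tweet("RT @user: The Hajj was amazing! https://t.co/abc"))
```

A lemmatization step, for example with NLTK's WordNet lemmatizer, would be applied to `kept` before the final join.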

Sentiment Extraction
There are two main approaches for extracting sentiment, the lexicon-based approach and the machine-learning-based approach. Their working mechanisms are summarized as follows:
• Lexicon-based approach: these methods use a predefined list of words in which each word is associated with a specific sentiment. The sentiments of the words in a text are then combined by summing, by averaging or by taking the majority sentiment. Lexicon methods follow a basic pattern:
  I. Preprocess each tweet.
  II. Initialize a total polarity score P equal to 0.
  III. For each token that is present in the dictionary:
     • If the token is positive, its contribution to P is positive (+).
     • If the token is negative, its contribution to P is negative (-).
     • If the token is neutral, its contribution to P is zero (0).
  IV. Look at the total polarity:
     • If P > 0, the text is positive.
     • If P < 0, the text is negative.
     • If P = 0, the text is neutral.
• Machine-learning-based approach: in this approach we built a labeled dataset, extracted features from it and then trained the algorithms on these features so that they predict the correct label as often as possible.
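The lexicon-based pattern (steps I to IV above) can be sketched as a short function. The dictionary `TOY_LEXICON`, its word scores and the function name `lexicon_sentiment` are made up for illustration; a real analyzer such as TextBlob ships a much larger lexicon with graded scores.

```python
# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
TOY_LEXICON = {
    "good": 1, "great": 1, "amazing": 1,
    "bad": -1, "crowded": -1, "terrible": -1,
}


def lexicon_sentiment(tokens, lexicon=TOY_LEXICON):
    """Sum per-token polarities and map the total to a sentiment label."""
    # Tokens missing from the lexicon are treated as neutral (contribute 0).
    total = sum(lexicon.get(token, 0) for token in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment(["hajj", "amazing", "crowded", "great"]))
```

Summation is only one of the combination rules mentioned above; averaging or a majority vote would change how the per-token scores are combined, while the final thresholding step stays the same.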

Features Extraction
Feature extraction was carried out using:
• TF-IDF (Term Frequency-Inverse Document Frequency): this technique aims to represent the number of times a given word appears in a document relative to the number of documents in the corpus that the word appears in.
• N-grams: an n-gram is a contiguous sequence of n terms from a given text, where an n-gram of size 1 is referred to as a unigram, an n-gram of size 2 is a bigram, an n-gram of size 3 is a trigram, and so on.
For our project, we conducted two experiments using these methods and extracted the results of each experiment separately. The experiments and their results are described in the following sections.
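The two feature types described above can be illustrated with a small standard-library sketch. The function names `ngrams` and `tf_idf` are hypothetical, and the weighting shown (relative term frequency multiplied by the logarithm of N over document frequency) is one common unsmoothed variant chosen as an assumption; tools such as Orange or scikit-learn apply their own smoothing and normalization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-term sequences from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Compute TF-IDF weights for a corpus given as a list of token lists.

    Returns one {term: weight} dictionary per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


tokens = ["hajj", "was", "amazing"]
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
print(tf_idf([["hajj", "good"], ["hajj", "crowded"]]))
```

In this variant a term that occurs in every document, such as "hajj" in the toy corpus, receives a weight of zero, which reflects the intuition that it does not help to tell the documents apart.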

Experiments
Experiment 1 (lexicon-based approach): In this experiment, we used the analyzer provided by the TextBlob library in Python, which analyzes the polarity and subjectivity of the tweet text and thereby determines whether it gives a negative, positive or neutral impression. Each pre-processed tweet is passed to the analyzer, the total polarity is calculated and an analysis of each tweet is produced. Figure 4 illustrates the details of this experiment.
The result of this experiment was promising: it achieved an accuracy of 0.621, predicting 182 of 293 tweets correctly.
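A minimal sketch of this experiment is shown below. TextBlob exposes a `sentiment` property with a `polarity` value in [-1, 1] and a `subjectivity` value in [0, 1]; the zero thresholds used to turn polarity into a label and the helper names `polarity_to_label` and `analyze_tweet` are assumptions for illustration, since the exact cut-offs used in the study are not stated.

```python
def polarity_to_label(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label (zero thresholds assumed)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def analyze_tweet(text):
    """Return (label, polarity, subjectivity) for one pre-processed tweet."""
    # Imported lazily so polarity_to_label works even without TextBlob installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return polarity_to_label(sentiment.polarity), sentiment.polarity, sentiment.subjectivity


# Accuracy as reported for this experiment: 182 correct predictions out of 293 tweets.
print(round(182 / 293, 3))
```

Calling `analyze_tweet` requires the third-party `textblob` package; accuracy is then the fraction of tweets whose predicted label matches the manually assigned one.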
Experiment 2 (Machine-learning-based Approach): In this method, we used a set of pre-processed tweets and fed them to the models to classify tweets as positive, negative or neutral based on model learning. The accuracy of these supervised algorithms and their ability to predict the correct label is then calculated as shown in Fig. 5.

Results
The dataset is divided into training and testing sets using 10-fold cross-validation. The project uses three algorithms for preparing and training the model: Naïve Bayes (NB), Support Vector Machines (SVM) and k-Nearest Neighbors (kNN). Each machine learning algorithm is evaluated using the well-known measurements in Eqs. 2-5 (Nahar, 2018; Nahar et al., 2019).
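As a rough sketch of how 10-fold cross-validation partitions a dataset, the generator below yields the train and test indices for each fold. The function name `k_fold_indices` is hypothetical and no shuffling or stratification is applied, which is a simplifying assumption; the Orange tool used in the study handles the splitting internally.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Every sample appears in exactly one test fold; fold sizes differ by at most 1.
    """
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Illustrative size only: 293 is the tweet count quoted for Experiment 1.
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(293, k=10), start=1):
    print(fold, len(train_idx), len(test_idx))
```

The model is trained and evaluated once per fold, and the reported measurements are typically averaged over the ten folds.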
Accuracy is simply the ratio of correctly classified observations to the total number of observations, as shown in Eq. 2. Accuracy, recall, precision and F1 score are measured for each experiment (lexicon-based approach and machine learning approach) and for each classification algorithm (Naive Bayes, Support Vector Machines and k Nearest Neighbors).
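For reference, the standard definitions of these four measurements are sketched below in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Eq. 2 is accuracy as stated above; the assignment of the numbers 3 to 5 to precision, recall and F1 is an assumption about the original ordering.

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}

\text{Precision} = \frac{TP}{TP + FP} \tag{3}

\text{Recall} = \frac{TP}{TP + FN} \tag{4}

F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}
```

With three classes (positive, negative and neutral), precision, recall and F1 are usually computed per class and then averaged.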
Tables 2-5 show the results of applying all classification algorithms to the dataset. Each classification algorithm uses n-grams, contiguous sequences of n items (unigram, bigram and trigram), together with Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting, where TF-IDF aims to represent the number of times a given word or phrase appears in a document relative to the number of documents in the corpus that the word appears in. Figure 6 shows the evaluation and comparison of the algorithms used in this study.

Discussion
After reviewing the results of the classification experiment, we found that SVM achieved the highest accuracy, about 0.84, followed by NB in second place with an accuracy of about 0.71 and then kNN with about 0.62. For the lexicon-based approach, the classification accuracy obtained using the Python package analyzer was 0.66, which is close to the highest accuracy achieved by the kNN classifier (0.620) and lower than the accuracy of both NB and SVM (0.715 and 0.84, respectively). We therefore concluded that the second experiment (machine-learning-based approach) performed better and gave more accurate results than the first experiment (lexicon-based approach). The results are shown in Fig. 7.

Conclusion and Future Work
Due to the rapid and wide adoption of social media, sentiment analysis has become an important topic of study. Emphasis should also be placed on developing the methods and techniques used in the analysis process. In this study, we carried out two experiments to determine the accuracy of the analysis methods using a set of Hajj-related tweets. We filtered these data and then analyzed them using the TextBlob analyzer and a collection of supervised algorithms. The results indicate that machine learning performed better and achieved greater accuracy than TextBlob: SVM reached an accuracy of about 0.84, while the accuracy reached by the analyzer did not exceed 0.660. For future work, one possibility is to create a model that integrates the two main methods of analysis and takes advantage of the characteristics of both, thereby obtaining higher accuracy in sentiment analysis, as well as collecting more tweets once Hajj resumes after the coronavirus pandemic ends.