RHALSA: Ranking Hotels using Aspect Level Sentiment Analysis

Opinions play a vital role in business and affect it to a large extent. Users by and large depend on the reviews of others before availing services. Online hotel booking is becoming popular because mobile applications on smartphones let users place bookings remotely and well in advance, with access to reviews on different aspects. In this work, we present a data mining algorithm that ranks hotels based on user reviews using aspect level sentiment analysis. The proposed Ranking Hotels using Aspect Level Sentiment Analysis (RHALSA) algorithm is evaluated through experiments on a Tripadvisor data set to demonstrate its adequacy.


Introduction
Human decision making depends largely on the opinions of others. Technological advancements have created a huge open platform for people to express their opinions freely. Useful knowledge can often be gathered from these opinions, which are generated by people all around the world. There are three types of opinion about any product or destination: positive, negative and neutral. Positive indicates that the opinion is good, negative indicates that it is bad, and neutral indicates that it is neither good nor bad. Sentiment analysis is one of the best practices for analyzing such opinions. Some opinions contain both positive and negative words, and hence cannot be categorized as positive or negative as a whole. Categorization becomes even more difficult when an opinion contains multiple sentences that differ in sentiment with respect to each aspect.
Often the words in opinions differ while the meaning remains the same, which makes them worth examining through sentiment analysis. Finding the orientation of a piece of text with respect to a topic or aspect is termed 'Sentiment Analysis'. Sentiment analysis and opinion mining are versatile, interdisciplinary problems of text mining and machine learning. The purpose of these approaches is to unearth the viewpoint of writers and to find similar words in the given input sentences.
Reputation is one of the key requirements of any business. Business owners invest heavily to maintain the reputation of their business. Even though business owners advertise and give offers on their products or facilities, users depend on the text opinions available about those products or facilities when making decisions. To maintain a good reputation, a business should score well in the comments or reviews from others who have already availed its services. The challenge here is to analyse and categorize the sentiment behind the words and sentences in the reviews with respect to the product or service. Sentiment analysis plays a vital role in categorizing the words in a given review, and it can be carried out for the whole document, for an input sentence, or with respect to predefined aspects. Among these, sentiment analysis with respect to predefined aspects achieves higher accuracy than the other two. Several approaches are available in sentiment analysis:
1. Subjective Lexicon: This approach uses a collection of words with weights/scores, which helps in categorizing words as positive, negative or neutral.
2. N-Gram modelling: It creates unigrams, bigrams, trigrams or a combination of these from the given input data set for categorization.
3. Machine Learning: This approach is used for prediction based on supervised and semi-supervised learning.
Sentiment classification differs from traditional text categorization, which makes it a challenging task. In traditional text categorization, documents are classified by topic, according to the requirements of the user or the application. In sentiment classification, by contrast, there are relatively few classes (e.g., "positive", "3 stars" or "5 stars") that generalize across many domains, products, facilities and users.
The classes in text categorization may be completely unrelated or may relate to one or more topics, whereas in sentiment classification the classes/labels are related, in some cases interrelated, and may represent opposing sentiments (e.g., "positive" or "negative"). In contrast to topic-based classification, sentiments are usually expressed in varied and contrasting ways, which makes them difficult to identify from any of a document's terms or sentences considered in isolation. Another difficulty in sentiment analysis is its context-sensitive and domain-dependent parameters. Elimination of stopwords benefits information retrieval by noticeably reducing both the processing time and the overall volume of input data.
The hotel business is one of the main domains in which opinions play a vital role and affect business to a large extent. Here, users depend largely on the reviews of others before availing food or services. As it has become common for people to check reviews before using the facilities or food of a hotel, there is a need to provide users with a proper and accurate rating of the hotel derived from the reviews of others. This can be achieved using a Natural Language Processing approach that analyzes every word in the reviews. Our work in this paper is to compute the score of hotels based on the reviews provided by users and then rank the hotels. Since the hotel location, timings, and terms and conditions can all be verified through any of the available applications, we consider Cleanliness and Service to be the important aspects for making a decision. These two aspects cannot be judged just by looking at the photos of rooms or food that hotels provide on their respective sites. The Cleanliness and Service aspects are used to generate the score of a hotel from the input reviews, and the Stanford CoreNLP sentiment approach is used for sentiment analysis to generate the score values.

Motivation
A large collection of annotated corpora is essential for Natural Language Processing and Sentiment Analysis tasks. Multiple-domain sentiment classification is a challenging task and has recently received the attention of researchers. Online booking of hotels is nowadays picking up and becoming popular. Smartphones make it possible to book hotels remotely and well in advance from various booking sites, which also offer hotel reviews on different aspects. These reviews are available in bulk, and knowledge needs to be extracted from them.

Contribution
The proposed ranking system takes hotel reviews and ranks the hotels according to their top reviews on various aspects.

Organization
The remainder of the paper is organized as follows. Section-2 gives a glimpse of the literature. The background work is summarized in Section-3. The problem is defined in Section-4, and the proposed system is discussed in detail in Section-5. The Ranking Hotels using Aspect Level Sentiment Analysis (RHALSA) algorithm is presented in Section-6. Simulation Results and Performance Analysis are discussed in Section-7. The entire work is summarized with Conclusions in Section-8.

Literature Survey
Alajmi et al. (2012) generated a general stop-list of 1377 words without any additions or extra emotions. The second list, a corpus-based stop-list, has 359 words. This list is generated from word occurrence: words that occur more than 25,000 times in the corpus are included. The final result showed that the first list performed better than the other two lists. Based on entropy calculation, Zheng and Gaowa (2010) provided a method for constructing a stopword list for the Mongolian language. To create the final stopword list, the initial list is combined with the Mongolian parts of speech. Rao and Ravichandran (2009) framed polarity detection as a bipolar classification of words. They used a graph with semi-supervised label propagation, in which each word is represented by a node whose label, positive or negative, determines its polarity. For computing trust, various statistical methods such as the Beta Reputation (Jøsang and Ismail, 2002), the Rating Aggregation algorithm (Wang et al., 2012) and Kalman Inference (Resnick et al., 2006) have been proposed. Mining feedback comments for trust evaluation and ranking sellers is proposed in (Zhang et al., 2014). To produce a ranking of the most trusted agents, a computational trust representation system has been proposed and evaluated in (Wierzbicki et al., 2013).
A graph-based reputation model has been proposed in (Yan et al., 2015) which provides the social relationship reputations of users. This is achieved by discarding "bad-mouthing" opinions and identified malicious providers. A unifying framework has been proposed in (Kim and Ahmad, 2013), whereas a general discussion on the impact of robustness against strategic manipulation on the usefulness of trust and reputation systems is presented in (Jøsang, 2012). With extensive experiments on the Epinions.com dataset, a computational trust model similar to real-life stereotypes has been proposed in (Liu et al., 2013). A survey of classifications has been presented in (Wahab et al., 2015), highlighting the use of reviews for computing reputation and trust. By describing both factors, an improved reputation system (Herbrich et al., 2013) is provided.
To deter trust fraud by raising its cost and to support the growth of small and medium-sized sellers, a dynamic time decay trust model has been proposed in (Zhang et al., 2013). The impact of the interactivity of electronic word of mouth systems and E-Quality on decision support satisfaction has been investigated in (Yoo et al., 2015). They have also developed hypotheses using three theories: the interactivity theory, the cognition-to-action loyalty framework and the E-Quality model. A trust model has been developed in (Wang et al., 2016) for the stickiness of group-buying websites.
Based on a hidden semi-Markov model, a reputation system has been developed in (Xiao and Dong, 2015). With respect to reputation scores, it has been found that a seller's reputation as represented by online review scores has no effect on listing price or on the likelihood of consumer buying (Ert et al., 2016). Student activity and learning outcomes in massive open online courses and forums have been investigated in (Coetzee et al., 2014). A study of the popular question answering website 'StackOverflow' has been presented in (Movshovitz-Attias et al., 2013). Brin and Page (2013) designed the PageRank algorithm, which ranks pages returned by a search engine. Phyu (2013) proposed a popularity and similarity based PageRank algorithm to predict web page access behaviour. This model for next page prediction is a more promising approach than Markov models. Schouten and Frasincar (2016) conducted a survey on aspect-level sentiment analysis and provided a breakdown based on the type of algorithm used. Concept-centric aspect-level sentiment analysis is identified as one of the most promising future research directions.
Mohammad AL-Smadi et al. (2016) have proposed a framework for Aspect-Based Sentiment Analysis (ABSA) of hotel reviews. Akhtara et al. (2017) developed a Hotel Recommender System in which reviews and metadata are crawled from a website and classified into predefined classes according to some of the common aspects.

Background Work
A Sentiment Treebank, which includes fine-grained sentiment labels, is introduced by Socher et al. (2013). They presented the Recursive Neural Tensor Network (RNTN), which outperforms previous state-of-the-art methods on several metrics. This model accurately captures the effects of negation and its scope at various tree levels for both positive and negative phrases.

Problem Definition
Given a data set of hotel reviews provided by users online, the problem is to design an appropriate data mining algorithm. The objective is:
• To rank the hotels based on the reviews provided by the users using Aspect Level Sentiment Analysis.

Proposed System
In order to rank the hotels based on the reviews given by the users, we analyze the review comments and extract keywords for the aspects considered. Cleanliness and Service are the two aspects considered in our analysis, as they are user independent and provide the most critical information about a hotel for making a decision and ranking the hotels. Other aspects, namely Location, Rate/Price, Food, Facilities etc., are user dependent, vary from person to person with their requirements, and are less relevant for ranking the hotels. The system architecture is shown in Fig. 1. The online review comments collected from various users are filtered to get aspect-wise review statements. Stopwords and special characters are eliminated from these sentences to get the keywords. The aspect-related sentiment scores for those keywords are generated and used to compute the average sentiment score for the hotel. Likewise, the average sentiment scores are computed for all the hotels considered for analysis. Based on these scores, the hotels are ranked.
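The filtering and keyword extraction steps described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the stopword list `STOPWORDS`, the cue words in `ASPECT_CUES`, and the function names are all assumptions introduced here, since the paper does not specify which stopword list or aspect filter it uses.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption: the paper
# does not say which list it uses).
STOPWORDS = {"the", "a", "an", "was", "were", "is", "are", "and", "of", "in", "to", "it"}

# Hypothetical cue words used to route a sentence to an aspect (assumption).
ASPECT_CUES = {
    "cleanliness": {"clean", "dirty", "spotless", "filthy", "dusty"},
    "service": {"service", "staff", "reception", "helpful", "rude"},
}


def split_sentences(review):
    """Split a review comment into sentences on ., ! and ?."""
    return [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]


def extract_keywords(sentence):
    """Lowercase, replace special characters with spaces, drop stopwords."""
    tokens = re.sub(r"[^a-z\s]", " ", sentence.lower()).split()
    return [t for t in tokens if t not in STOPWORDS]


def aspect_keywords(review):
    """Collect the keywords of every sentence that mentions an aspect cue."""
    result = {aspect: [] for aspect in ASPECT_CUES}
    for sentence in split_sentences(review):
        keywords = extract_keywords(sentence)
        for aspect, cues in ASPECT_CUES.items():
            if cues.intersection(keywords):
                result[aspect].extend(keywords)
    return result
```

For the made-up review "The room was spotless! Staff were rude, and slow.", `aspect_keywords` assigns `room` and `spotless` to Cleanliness and `staff`, `rude` and `slow` to Service. A real system would need a much richer aspect filter than a fixed cue-word set.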

Algorithm
The Ranking Hotels using Aspect Level Sentiment Analysis (RHALSA) algorithm runs on the data set to rank the hotels based on the user reviews. Stopwords and special characters are eliminated from the review comments to get subjective keywords. Aspect level sentiment scores are generated using Mod_RNTN, in which the average of the scores generated by RNTN (Socher et al., 2013) for all the keywords is computed. After computing the average sentiment scores of all the aspects for every hotel in the data set, the hotels are ranked in descending order of their Average Sentiment Scores.
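The averaging and ranking steps of the algorithm can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' Java code: per-keyword sentiment levels are assumed to have been produced already by an external scorer (the paper uses RNTN), the function name `rank_hotels` and the input layout are hypothetical, and skipping an aspect with no keywords in a review is a choice made here because the paper does not say how that case is handled.

```python
from collections import defaultdict
from statistics import mean


def rank_hotels(reviews):
    """Rank hotels by average sentiment score, highest first.

    reviews: iterable of (hotel, cleanliness_scores, service_scores), where
    each scores list holds the per-keyword sentiment levels of one review.
    """
    per_hotel = defaultdict(list)
    for hotel, c_scores, s_scores in reviews:
        # Average the keyword scores within each aspect; an aspect with no
        # keywords in this review is skipped (assumption).
        aspect_scores = [mean(scores) for scores in (c_scores, s_scores) if scores]
        if aspect_scores:
            # Score of this review: mean over the aspects it covers.
            per_hotel[hotel].append(mean(aspect_scores))
    # Hotel score: mean over the scores of all its reviews.
    averages = {hotel: mean(scores) for hotel, scores in per_hotel.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data, not taken from the Tripadvisor data set.
toy_reviews = [
    ("Hotel A", [3, 4], [3]),
    ("Hotel A", [1], [2, 2]),
    ("Hotel B", [4, 4], [3, 3]),
]
ranking = rank_hotels(toy_reviews)
```

On this toy input, Hotel B (average 3.5) is ranked ahead of Hotel A (average 2.375).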

Simulation Results and Performance Analysis
The proposed algorithm is implemented in Java on a system with an Intel Pentium i7 and 4GB RAM on the Windows 8 platform. The data is collected from Tripadvisor and consists of 514 review comments for 61 hotels of New Delhi. These comments are filtered to get 454 comments on the Cleanliness aspect and 407 comments on the Service aspect. The algorithm is run on the preprocessed data set, using Cleanliness and Service as the aspects for determining the sentiment scores of the reviews and ranking the hotels. The neutral class is ignored while analyzing the performance.

Sentiment Score Calculation
The Sentiment Score is calculated based on the Stanford CoreNLP Sentiment Levels and depicted as:
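A reconstruction of the score definitions, consistent with the description in the surrounding sections, can be sketched as follows. The symbols are assumptions introduced here and not the authors' notation: $s(k)$ is the sentiment level of keyword $k$, $K_a(r)$ is the set of keywords extracted for aspect $a$ from review comment $r$, and $R_h$ is the set of review comments of hotel $h$. Stanford CoreNLP assigns one of five sentiment levels, from 0 (very negative) through 2 (neutral) to 4 (very positive).

```latex
\[
s(k) \in \{0, 1, 2, 3, 4\}
\]
\[
\mathrm{Score}_a(r) = \frac{1}{|K_a(r)|} \sum_{k \in K_a(r)} s(k),
\qquad a \in \{\mathrm{Cleanliness}, \mathrm{Service}\}
\]
\[
\mathrm{HScore}(r) = \frac{\mathrm{Score}_{\mathrm{Cleanliness}}(r) + \mathrm{Score}_{\mathrm{Service}}(r)}{2}
\]
\[
\mathrm{SentimentScore}_{\mathrm{Hotel}}(h) = \frac{1}{|R_h|} \sum_{r \in R_h} \mathrm{HScore}(r)
\]
```

Under this reading, a hotel score above 2 lies on the positive side of the neutral level, which matches the threshold used in the result analysis.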

Result Analysis
The results of simulation on a sample data set of 20 review comments are shown in Table 1. The review comments of the respective hotels of New Delhi are categorized into comments on Cleanliness and comments on Service. The C-Score represents the result of sentiment analysis of the Cleanliness review comment, whereas the S-Score represents the result of sentiment analysis of the Service review comment, for a particular review comment on that hotel. The H-Score represents the Hotel Score for that particular review comment.
The graph in Fig. 2 shows the sentiment scores obtained for both the cleanliness and service aspects in the review comments. These scores are obtained by averaging the sentiment scores of each of the review comments for the respective aspects. The graph clearly indicates that opinions on the same hotel are mixed with respect to each of the aspects considered. This could be because individual tastes vary and depend upon personal interest.
The graph in Fig. 3 shows the average sentiment score of each hotel in the data set. This score is obtained by averaging the cleanliness and service scores of all the reviews of that hotel. From the graph we observe that very few hotels obtain positive or highly positive average scores, even when they receive a larger number of review comments.
The hotels whose SentimentScore_Hotel is greater than 2, indicating positivity in their reviews, are listed in Table 2. The hotels are ranked according to their SentimentScore_Hotel value. Out of 61 hotels in the data set, only 27 have a SentimentScore_Hotel value greater than 2, indicating overall positive comments in the reviews that they have received.
The graph in Fig. 4 shows only those hotels whose average sentiment scores are greater than 2, in ranked order.
After conducting several experiments and analyzing the 514 review comments contained in the data set for 61 hotels, the precision, recall, F1 measure and accuracy are computed to evaluate the experimental results. Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. F1 Measure is the harmonic mean of Precision and Recall. Accuracy is the ratio of correctly predicted observations to the total observations. These are computed as shown in Equations (4)-(7) respectively, where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives:
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{4} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{5} \]
\[ \mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6} \]
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{7} \]
The results are validated with these evaluation metrics and are shown in Table 3. The accuracy obtained for the proposed RHALSA algorithm is compared with that of the RNTN algorithm (Socher et al., 2013) and is shown in Table 4.
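The four evaluation measures defined above can be computed from confusion-matrix counts as in the following sketch. The function name `evaluate` and the example counts are hypothetical; the counts are made up for illustration and are not the paper's results. Returning 0.0 when a denominator is zero is a convention chosen here.

```python
def evaluate(tp, fp, fn, tn):
    """Return (precision, recall, f1, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, f1, accuracy


# Made-up counts for illustration only.
precision, recall, f1, accuracy = evaluate(tp=8, fp=2, fn=2, tn=8)
```

Because the neutral class is ignored in the performance analysis, only positive and negative predictions would enter these counts.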
A user who needs to check in to a hotel need not go through all the review comments of all the users who visited the hotel; instead, the user can run the proposed RHALSA algorithm to get the hotels ranked and then decide whether or not to go ahead with the intended hotel. The proposed RHALSA algorithm achieves an accuracy of 88.89%, compared to the 85.4% achieved by the existing RNTN algorithm (Socher et al., 2013), because the keywords are extracted in the preprocessing stage and the average scores of all the keywords are then computed.

Conclusion
The Ranking Hotels using Aspect Level Sentiment Analysis (RHALSA) algorithm is presented in this work. Based on the review comments provided by the users, keywords are extracted with respect to various aspects. Further, the sentiment is analyzed at the aspect level and the average scores are computed to rank the hotels. A user who needs to check in to a hotel need not go through all the review comments of all the users who visited the hotel; instead, the user can run the proposed RHALSA algorithm to get the hotels ranked and then decide whether or not to go ahead with the intended hotel. The proposed algorithm is evaluated through experiments on a Tripadvisor data set and achieves 88.89% accuracy, demonstrating its adequacy.
The presented RHALSA algorithm is limited to handling negative comments, but it can be extended to handle discourse relations, which may change the orientation of a sentence. This direction remains open for consideration in future work.