English Sentiment Classification using Only the Sentiment Lexicons with a JOHNSON Coefficient in a Parallel Network Environment

Abstract: Sentiment classification is significant in everyday life, for example in political activities, commodity production and commercial activities. In this survey, we propose a new model for Big Data sentiment classification. We use the sentiment lexicons of our basis English Sentiment Dictionary (bESD) to classify the 5,000,000 documents of our English testing data set, comprising 2,500,000 positive and 2,500,000 negative documents. We use no training data set in English and no one-dimensional or multi-dimensional vectors, in either a sequential environment or a distributed network system. We use a JOHNSON Coefficient (JC) through the Google search engine with the AND and OR operators to identify the sentiment values of the sentiment lexicons of the bESD. One term (a word or a phrase in English) is clustered into either the positive polarity or the negative polarity according to whichever it is closer to under the similarity measures of the JC; that is, the term is more similar to either the positive or the negative. We tested the proposed model in both a sequential environment and a distributed network system and achieved 87.56% accuracy on the testing data set. The execution time of the model in the parallel network environment is shorter than the execution time of the model in the sequential system. Our new model can classify the sentiment of millions of English documents based on the sentiment lexicons of the bESD in a parallel network environment. The proposed model does not depend on any special domain or any training stage. This survey uses similarity coefficients from the data mining field, and its results can be widely used in applications and research on English sentiment classification.


Introduction
Clustering data means processing a set of objects into classes of similar objects. A cluster is a set of data objects which are similar to each other and dissimilar to the objects in other clusters. The number of clusters can be fixed from experience or identified automatically as part of the clustering method. We have therefore studied this model in more detail. To increase the accuracy of the sentiment classification and to shorten its execution time, we did not transfer sentences into one-dimensional vectors based on VSM (Singh and Singh, 2015; Carrera-Trejo et al., 2015; Soucy and Mineau, 2015), or based on the sentiment lexicons of our basis English Sentiment Dictionary (bESD), in either the sequential system or the distributed system. Likewise, we did not transfer documents into multi-dimensional vectors based on VSM or on the sentiment lexicons of the bESD. Instead, we create the sentiment lexicons of the bESD, and the valences and sentiment polarities of these lexicons are calculated by using the JC through the Google search engine with the AND and OR operators. One term (one word or phrase in English) has the positive polarity if it is very close to the positive (the term is very similar to the positive), the negative polarity if it is very close to the negative (the term is very similar to the negative) and the neutral polarity if it is close to neither the positive nor the negative.
The term is very close to the positive if the similarity measure (using the JC) between this term and the positive polarity is greater than the similarity measure between this term and the negative polarity; the term is then clustered into the positive.
The term is very close to the negative if the similarity coefficient (using the JC) between this term and the positive polarity is less than the similarity coefficient between this term and the negative polarity; the term is then clustered into the negative.
The term is close to neither if the two similarity measures are equal; the term is then clustered into neither the positive nor the negative and is certainly of the neutral polarity.
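This clustering rule can be sketched in Python; `jc` here is a hypothetical callable standing in for the JC similarity computed through the Google search engine, and the stubbed values are invented for illustration only:

```python
def cluster_term(term, jc):
    """Cluster one term by comparing its JC similarity to each polarity.

    jc(term, polarity) is a hypothetical function returning the
    JOHNSON Coefficient similarity between the term and that polarity.
    """
    pos = jc(term, "positive")
    neg = jc(term, "negative")
    if pos > neg:
        return "positive"   # closer to the positive polarity
    if pos < neg:
        return "negative"   # closer to the negative polarity
    return "neutral"        # equal similarities: neither cluster

# Stubbed similarity values, invented for illustration only:
fake_jc = lambda t, p: {("good", "positive"): 0.8,
                        ("good", "negative"): 0.3}.get((t, p), 0.5)
print(cluster_term("good", fake_jc))  # → positive
```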
A sentence in English has the positive polarity if the number of its terms clustered into the positive is greater than the number clustered into the negative, the negative polarity if that number is smaller and the neutral polarity if the two numbers are equal. Similarly, a document in English has the positive polarity if the number of its sentences clustered into the positive is greater than the number clustered into the negative, the negative polarity if that number is smaller and the neutral polarity if the two numbers are equal.
We perform the proposed model as follows. First, we calculate the valences of the sentiment lexicons of the bESD by using the JC through the Google search engine with the AND and OR operators. For each sentence of one document of the testing data set, we split the sentence into meaningful terms (meaningful words or meaningful phrases) based on the bESD and calculate the sentiment score of each term from the bESD. A term belongs to the positive group if its valence is greater than 0, to the negative group if its valence is less than 0 and to the neutral group if its valence equals 0. A sentence is clustered into the positive group if the total of the valences of all its meaningful terms is greater than 0, into the negative group if this total is less than 0 and into the neutral group if this total equals 0. A document of the testing data set is clustered into the positive group if the number of its sentences clustered into the positive is greater than the number clustered into the negative, into the negative group if that number is smaller and into the neutral group if the two numbers are equal.
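A minimal single-machine sketch of these steps, assuming a toy valence table in place of the bESD and plain whitespace splitting in place of the bESD-based term segmentation:

```python
def classify_sentence(terms, valence):
    """Positive/negative/neutral by the sign of the summed term valences."""
    total = sum(valence.get(t, 0.0) for t in terms)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

def classify_document(sentences, valence):
    """Majority vote over the sentence polarities."""
    labels = [classify_sentence(s.split(), valence) for s in sentences]
    pos, neg = labels.count("positive"), labels.count("negative")
    if pos > neg:
        return "positive"
    if pos < neg:
        return "negative"
    return "neutral"

# Toy valence table standing in for the bESD:
valence = {"good": 1.0, "great": 1.5, "bad": -1.0, "boring": -0.8}
doc = ["the film is good", "the plot is boring", "the acting is great"]
print(classify_document(doc, valence))  # → positive (2 positive sentences, 1 negative)
```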
We perform all the above things in the sequential system firstly. To shorten execution time of the proposed model, we implement all the above things in the distributed environment secondly.
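The paper's distributed implementation runs in a parallel network environment; purely as a single-machine analogue, and not the authors' actual cluster setup, the document-level loop can be parallelized with Python's multiprocessing, because each document is classified independently:

```python
from multiprocessing import Pool

# Toy valence table standing in for the bESD:
VALENCE = {"good": 1.0, "bad": -1.0}

def classify_sentence(sentence):
    total = sum(VALENCE.get(t, 0.0) for t in sentence.split())
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

def classify_document(sentences):
    labels = [classify_sentence(s) for s in sentences]
    pos, neg = labels.count("positive"), labels.count("negative")
    return "positive" if pos > neg else "negative" if pos < neg else "neutral"

if __name__ == "__main__":
    docs = [["good movie"], ["bad plot", "bad acting"]] * 1000
    with Pool() as pool:                      # one worker per CPU core
        results = pool.map(classify_document, docs)
    print(results[:2])  # → ['positive', 'negative']
```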
Our model has many significant applications to many areas of research as well as commercial applications, as follows:

1. Many surveys and commercial applications can use the results of this work in a significant way
2. The JC is used in identifying the opinion scores of English verb phrases and words through the Google search on the internet
3. Many formulas are proposed in the paper
4. Many algorithms are built in the proposed model
5. This survey can certainly be applied to other languages easily
6. The results of this study can significantly be applied to the other types of words in English
7. Many crucial contributions are listed in the Future Work section
8. The algorithm of data mining is applicable to the semantic analysis of natural language processing
9. This study also proves that different fields of scientific research can be related in many ways
10. Millions of English documents are successfully processed for emotional analysis
11. The semantic classification is implemented in the parallel network environment

We also compare this novel model's results with the latest sentiment classification models in (Agarwal and Mittal, 2016a; 2016b; Canuto et al., 2016; Ahmed and Danti, 2016; Phu and Tuoi, 2014; Tran et al., 2014; Dat et al., 2017; Phu et al., 2017f; 2017g; 2017h).

This study contains six sections. Section 1 introduces the study; Section 2 discusses the related works about the JOHNSON Coefficient (JC), etc.; Section 3 is about the English data set; Section 4 presents the methodology of our proposed model; Section 5 presents the experiment; Section 6 provides the conclusion. The References section comprises all the reference documents and all tables are shown in the Appendices section.

Related Work
We summarize the researches which are related to our research. So far, we know that the Pointwise Mutual Information (PMI) equation and the Sentiment Orientation (SO) equation are used for determining the polarity of one word (or one phrase) and the strength of the sentiment orientation of this word (or this phrase). The Jaccard Measure (JM) is also used for calculating the polarity of one word and, in other research, the equations derived from this Jaccard measure are also used for calculating the strength of the sentiment orientation of this word. The PMI, Jaccard, Cosine, Ochiai, Tanimoto and Sorensen measures are similarity measures between two words; from those, we prove that the JOHNSON Coefficient (JC) can also be used for identifying the valence and the polarity of one English word (or one English phrase). Finally, we identify the sentiment values of English verb phrases based on the basis English sentiment lexicons of the basis English Sentiment Dictionary (bESD).
There are works related to the PMI measure in (Bai et al., 2014; Turney and Littman, 2002; Malouf and Mullen, 2017; Scheible, 2010; Jovanoski et al., 2015; Htait et al., 2016; Wan et al., 2009; Brooke et al., 2009; Jiang et al., 2015; Hernández-Ugalde et al., 2011; Ponomarenko et al., 2002; Meyer et al., 2004; Mladenović Drinić et al., 2008; Tamás et al., 2001). In the research (Bai et al., 2014), the authors generate several Norwegian sentiment lexicons by extracting sentiment information from two different types of Norwegian text corpus, namely a news corpus and discussion forums. The methodology is based on the Pointwise Mutual Information (PMI). The authors introduce a modification of the PMI that considers small "blocks" of the text instead of the text as a whole. The study in (Turney and Littman, 2002) introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora, etc.
Two studies related to the PMI measure and Jaccard measure are in (Feng et al., 2013;An and Hagiwara, 2014). In the survey (Feng et al., 2013), the authors empirically evaluate the performance of different corpora in sentiment similarity measurement, which is the fundamental task for word polarity classification. The research in (An and Hagiwara, 2014) proposes a new method to estimate impression of short sentences considering adjectives. In the proposed system, first, an input sentence is analyzed and preprocessed to obtain keywords. Next, adjectives are taken out from the data which is queried from Google N-gram corpus using keywords-based templates.
The works related to the Jaccard measure are in (Shikalgar and Dixit, 2014;Ji et al., 2015;Omar et al., 2013;Mao et al., 2014;Ren et al., 2014;Netzer et al., 2012;Ren et al., 2011). The survey in (Shikalgar and Dixit, 2014) investigates the problem of sentiment analysis of the online review. In the study (Ji et al., 2015), the authors are addressing the issue of spreading public concern about epidemics. Public concern about a communicable disease can be seen as a problem of its own, etc.
The surveys relating the similarity coefficients to the calculation of the valences of words are in (Phu et al., 2017a; 2017c; 2017d; 2017e).
There are the works related to the JOHNSON Coefficient (JC) in (Choi et al., 2010;Wilk et al., 2002;Tulloss, 1997;Dalirsefat et al., 2009;Wijaya et al., 2016;Duarte et al., 1999). The authors in (Choi et al., 2010) collected 76 binary similarity and distance measures used over the last century and reveal their correlations through the hierarchical clustering technique, etc.
There are works related to vector space modeling in (Singh and Singh, 2015; Carrera-Trejo et al., 2015; Soucy and Mineau, 2015). In the study (Singh and Singh, 2015), the authors examine the Vector Space Model, an information retrieval technique, and its variations. In the survey (Carrera-Trejo et al., 2015), the authors consider the multi-label text classification task and apply various feature sets. The authors consider a subset of multi-labeled files from the Reuters-21578 corpus. The authors use traditional tf-idf values of the features and tried both considering and ignoring stop words. The authors also tried several combinations of features, like bigrams and unigrams. The authors in (Soucy and Mineau, 2015) introduce a new weighting method based on statistical estimation of the importance of a word for a specific categorization problem. This method also has the benefit of making feature selection implicit, since useless features for the categorization problem considered get a very small weight.
The latest studies on sentiment classification are (Agarwal and Mittal, 2016a; 2016b; Canuto et al., 2016; Ahmed and Danti, 2016; Phu and Tuoi, 2014; Tran et al., 2014; Dat et al., 2017; Phu et al., 2016). In the research (Agarwal and Mittal, 2016a), the authors present their machine learning experiments with regard to sentiment analysis in blog, review and forum texts found on the World Wide Web and written in English, Dutch and French. The survey in (Agarwal and Mittal, 2016b) discusses an approach where an exposed stream of tweets from the Twitter micro-blogging site is preprocessed and classified based on their sentiments; in the sentiment classification system, the concept of opinion subjectivity has been accounted for. In another study, the authors present an opinion detection and organization subsystem, which has already been integrated into a larger question-answering system, etc.

Data Set
In Fig. 1, the testing data set includes 5,000,000 documents in the movie field, comprising 2,500,000 positive documents and 2,500,000 negative documents in English. All the documents in our English testing data set were automatically extracted from English Facebook, English websites and social networks; we then labeled them as positive or negative.

Methodology
This section comprises two parts: The first part is to create the sentiment lexicons in English in both a sequential environment and a distributed system in the sub-section (4.1). The second part is to use the sentiment lexicons with the JC to classify the documents of the testing data set into either the positive vector group or the negative vector group in both a sequential environment and a distributed system in the sub-section (4.2).
In the sub-section (4.1), the section includes three parts. The first sub-section of this section is to identify a sentiment value of one word (or one phrase) in English in the sub-section (4.1.1). The second part of this section is to create a basis English Sentiment Dictionary (bESD) in a sequential system in the sub-section (4.1.2). The third sub-section of this section is to create a basis English Sentiment Dictionary (bESD) in a parallel environment in the sub-section (4.1.3).

Fig. 1: Our English testing data set
In the sub-section (4.2), the section comprises two parts. The first part of this section is to use the sentiment-lexicons with the JC to classify the documents of the testing data set into either the positive vector group or the negative vector group in a sequential environment in the sub-section (4.2.1). The second part of this section is to use the sentiment-lexicons with the JC to classify the documents of the testing data set into either the positive vector group or the negative vector group in a distributed system in the sub-section (4.2.2).

Creating the Sentiment Lexicons in English
The section includes three parts: The first sub-section of this section is to identify a sentiment value of one word (or one phrase) in English in the sub-section (4.1.1). The second part of this section is to create a basis English Sentiment Dictionary (bESD) in a sequential system in the sub-section (4.1.2). The third sub-section of this section is to create a basis English Sentiment Dictionary (bESD) in a parallel environment in the sub-section (4.1.3).

Calculating a Valence of One Word (or One Phrase) in English
In this part, we calculate the valence and the polarity of one English word (or phrase) by using the JC through the Google search engine with the AND and OR operators, as shown in the diagram in Fig. 2.
In (Jiang et al., 2015; Tan and Zhang, 2007; Du et al., 2010; Zhang et al., 2010), the PMI equations are used in Chinese, not English, and Tibetan is also added in (Jiang et al., 2015). Regarding the search engine, the AltaVista search engine is used in (Du et al., 2010), and (Zhang et al., 2010) uses three search engines: the Google search engine, the Yahoo search engine and the Baidu search engine. The PMI equations are also used in Japanese with the Google search engine in (Wang and Araki, 2007). (Feng et al., 2013) and (An and Hagiwara, 2014) also use the PMI equations and the Jaccard equations with the Google search engine in English.
The Jaccard equations with the Google search engine in English are used in (Feng et al., 2013;An and Hagiwara, 2014;Ji et al., 2015). (Shikalgar and Dixit, 2014) and (Netzer et al., 2012) use the Jaccard equations in English. (Ren et al., 2014) and (Ren et al., 2011) use the Jaccard equations in Chinese. (Omar et al., 2013) uses the Jaccard equations in Arabic. The Jaccard equations with the Chinese search engine in Chinese are used in (Mao et al., 2014).
The authors in (Phu et al., 2017a) used the Ochiai Measure through the Google search engine with the AND and OR operators to calculate the sentiment values of the words in Vietnamese. The authors in (Phu et al., 2017b) used the Cosine Measure through the Google search engine with the AND and OR operators to identify the sentiment scores of the words in English. The authors in (Phu et al., 2017c) used the Sorensen Coefficient through the Google search engine with the AND and OR operators to calculate the sentiment values of the words in English. The authors in (Phu et al., 2017d) used the Jaccard Measure through the Google search engine with the AND and OR operators to calculate the sentiment values of the words in Vietnamese. The authors in (Phu et al., 2017e) used the Tanimoto Coefficient through the Google search engine with the AND and OR operators to identify the sentiment scores of the words in English.

With the above proofs, we have the following information: the PMI is used with AltaVista in Chinese and with Google in English, Chinese and Japanese; the Jaccard is used with Google in English, Chinese and Vietnamese; the Ochiai is used with Google in Vietnamese; the Cosine, Sorensen and Tanimoto are used with Google in English.
According to (Bai et al., 2014; Turney and Littman, 2002; Malouf and Mullen, 2017; Scheible, 2010; Jovanoski et al., 2015; Htait et al., 2016; Wan et al., 2009; Brooke et al., 2009; Jiang et al., 2015; Hernández-Ugalde et al., 2011; Ponomarenko et al., 2002; Meyer et al., 2004; Mladenović Drinić et al., 2008; Tamás et al., 2001), the PMI, Jaccard, Cosine, Ochiai, Sorensen, Tanimoto and JOHNSON Coefficient (JC) are all similarity measures between two words and they can perform the same functions with the same characteristics; so the JC can be used in calculating the valences of the words. In addition, we prove that the JC can be used in identifying the valence of an English word through the Google search with the AND and OR operators.
With the JOHNSON Coefficient (JC) in (Choi et al., 2010; Wilk et al., 2002; Tulloss, 1997; Dalirsefat et al., 2009; Wijaya et al., 2016; Duarte et al., 1999), we have the equation of the JC for two binary vectors a and b:

Equation 6: JC(a, b) = n11/(n11 + n10) + n11/(n11 + n01)

where n11 is the number of attributes present in both a and b, n10 the number present only in a and n01 the number present only in b. From Equations 1 to 6, we propose new equations of the JC to calculate the valence and the polarity of the English words (or the English phrases) through the Google search engine, as the equations below.

In Equation 6, when a has only one element, a is a word; when b has only one element, b is a word. Replacing a by w1 and b by w2 in Equation 6, we have Equation 7:

Equation 7: JC(w1, w2) = P(w1, w2)/(P(w1, w2) + P(w1, ¬w2)) + P(w1, w2)/(P(w1, w2) + P(¬w1, w2))

Equation 8: Valence(w) = SO_JC(w) = JC(w, positive_query) − JC(w, negative_query)

In Equation 7, replacing w1 by w and w2 by positive_query, we have Equation 9:

Equation 9: JC(w, positive_query) = P(w, positive_query)/(P(w, positive_query) + P(w, ¬positive_query)) + P(w, positive_query)/(P(w, positive_query) + P(¬w, positive_query))

In Equation 7, replacing w1 by w and w2 by negative_query, we have Equation 10:

Equation 10: JC(w, negative_query) = P(w, negative_query)/(P(w, negative_query) + P(w, ¬negative_query)) + P(w, negative_query)/(P(w, negative_query) + P(¬w, negative_query))

Here each P(·,·) is a number of returned results from the Google search engine, obtained through the Google Search API:

• P(w, negative_query): the number of returned results in the Google search for the keyword (w AND negative_query)
• P(¬w, negative_query): the number of returned results in the Google search for the keyword ((NOT w) AND negative_query)
• P(w, ¬negative_query): the number of returned results in the Google search for the keyword (w AND (NOT negative_query))
• P(¬w, ¬negative_query): the number of returned results in the Google search for the keyword ((NOT w) AND (NOT negative_query))

and similarly for positive_query.

Like Cosine, Ochiai, Sorensen, Tanimoto, PMI and Jaccard for calculating the valence (score) of a word, we identify the valence (score) of the English word w based on both the proximity of positive_query to w and the remoteness of positive_query from w, and on the proximity of negative_query to w and the remoteness of negative_query from w. The English word w is nearest to positive_query if JC(w, positive_query) equals 1 and farthest from positive_query if JC(w, positive_query) equals 0. The English word w belongs to positive_query, the positive group of the English words, if JC(w, positive_query) > 0 and JC(w, positive_query) ≤ 1. Likewise, w is nearest to negative_query if JC(w, negative_query) equals 1 and farthest from negative_query if JC(w, negative_query) equals 0, and w belongs to negative_query, the negative group of the English words, if JC(w, negative_query) > 0 and JC(w, negative_query) ≤ 1. So, the valence of the English word w is the value of JC(w, positive_query) subtracting the value of JC(w, negative_query) and Equation 8 is the equation identifying the valence of the English word w.
We have the following information about the JC: the polarity of the English word w is the positive polarity if SO_JC(w) > 0, the negative polarity if SO_JC(w) < 0 and the neutral polarity if SO_JC(w) = 0. In addition, the semantic value of the English word w is SO_JC(w).
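As a worked sketch only: the JC below is taken in the binary-measure form JC = a/(a+b) + a/(a+c) reported for the Johnson coefficient in the binary-similarity literature (an assumption, since the paper's printed equations are not fully legible here), the hit counts are invented for illustration and `jc`/`so_jc` are hypothetical helpers, not the authors' code:

```python
def jc(n_ab, n_a_notb, n_nota_b):
    """JOHNSON Coefficient from co-occurrence counts, assuming the
    binary-measure form JC = a/(a+b) + a/(a+c):
      n_ab     -- results for (w AND query)
      n_a_notb -- results for (w AND (NOT query))
      n_nota_b -- results for ((NOT w) AND query)
    """
    if n_ab == 0:
        return 0.0
    return n_ab / (n_ab + n_a_notb) + n_ab / (n_ab + n_nota_b)

def so_jc(pos_counts, neg_counts):
    """Valence of w: JC(w, positive_query) - JC(w, negative_query)."""
    return jc(*pos_counts) - jc(*neg_counts)

# Invented hit counts for an example word; real counts would come
# from the Google Search API as described in the text:
valence = so_jc(pos_counts=(900, 100, 300), neg_counts=(200, 800, 1000))
polarity = "positive" if valence > 0 else "negative" if valence < 0 else "neutral"
print(round(valence, 3), polarity)  # → 1.283 positive
```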
We calculate the valence and the polarity of the English word or phrase w using a training corpus of approximately one hundred billion English words, the subset of the English Web that is indexed by the Google search engine on the internet. In earlier work, AltaVista was chosen because it has a NEAR operator, which limits the search to documents that contain the words within ten words of one another, in either order. We use the Google search engine, which does not have a NEAR operator but can use the AND and OR operators; the result of calculating the valence of w with Google is similar to the result of calculating it by using AltaVista. However, AltaVista is no longer available.
In summary, by using Equations 8 to 10, we identify the valence and the polarity of one word (or one phrase) in English by using the JC through the Google search engine with the AND and OR operators.
The comparisons of our model's benefits and drawbacks with the studies related to the JOHNSON Coefficient (JC) in (Choi et al., 2010; Wilk et al., 2002; Tulloss, 1997; Dalirsefat et al., 2009; Wijaya et al., 2016; Duarte et al., 1999) are displayed in Table 2. The comparisons with the studies related to the PMI measure in (Bai et al., 2014; Turney and Littman, 2002; Malouf and Mullen, 2017; Scheible, 2010; Jovanoski et al., 2015; Htait et al., 2016; Wan et al., 2009; Brooke et al., 2009; Jiang et al., 2015; Hernández-Ugalde et al., 2011; Ponomarenko et al., 2002; Meyer et al., 2004; Mladenović Drinić et al., 2008; Tamás et al., 2001) can be summarized as follows (approach; advantages; disadvantages or future work):

Bai et al. (2014). Approach: constructing sentiment lexicons in Norwegian from a large text corpus. Advantages: in the authors' PMI computations they used a distance of 100 words from the seed word; their preliminary research showed that 100 gave a better result, although other lengths might generate better sentiment lexicons. Disadvantages: the authors need to investigate this more closely to find the optimal distance; another factor that has not been investigated much in the literature is the selection of seed words, and since the seed words are the basis for the PMI calculation, there might be a lot to gain by finding better ones; the authors would like to explore the impact that different approaches to seed word selection have on the performance of the developed sentiment lexicons.

Turney and Littman (2002). Approach: a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora.

Malouf and Mullen (2017). Approach: graph-based user classification for informal online political discourse. Advantages: the authors describe several experiments in identifying the political orientation of posters in an informal environment; their results indicate that the most promising approach is to augment text classification methods by exploiting information about how posters interact with each other. Disadvantages: there is still much left to investigate in terms of optimizing the linguistic analysis, beginning with spelling correction and working up to shallow parsing and co-reference identification; it will also be worthwhile to further investigate exploiting the sentiment values of phrases and clauses, taking cues from other methods.

Scheible (2010). Approach: a novel, graph-based approach using SimRank. Advantages: a novel approach to the translation of sentiment information that outperforms SO-PMI, an established method; in particular, SimRank outperforms SO-PMI for values of the threshold x in an interval that most likely leads to the correct separation of positive, neutral and negative adjectives. Future work: a further examination of the merits of its application for knowledge-sparse languages.

Jovanoski et al. (2015). Approach: sentiment analysis in Twitter for Macedonian. Advantages: the experimental results show an F1-score of 92.16, which is very strong and on par with the best results for English achieved in recent SemEval competitions. Future work: studying the impact of the raw corpus size, e.g., collecting only half a million tweets for creating the lexicons and analyzing/evaluating the system; the authors are interested not only in quantity but also in quality, i.e., in the quality of the individual words and phrases used as seeds.

Htait et al. (2016). Approach: using web search engines for English and Arabic unsupervised sentiment intensity prediction. Advantages: for the General English sub-task, the system has modest but interesting results; for the Mixed Polarity English sub-task, the system achieves second place; for the Arabic phrases sub-task, the system has very interesting results, since only the unsupervised method was applied. Disadvantages: although the results are encouraging, further investigation is required in both languages concerning the choice of the positive and negative words which, when associated to a phrase, make it more negative or more positive.

Wang and Araki (2007). Approach: the authors not only applied SO-PMI for Japanese but also modified it to analyze Japanese opinions more effectively.

Feng et al. (2013). Approach: the authors empirically evaluate the performance of different corpora in sentiment similarity measurement, which is the fundamental task for word polarity classification. Advantages: experiment results show that the Twitter data can achieve a much better performance than the Google, Web1T and Wikipedia based methods. Disadvantages: no mention.

An and Hagiwara (2014). Approach: adjective-based estimation of a short sentence's impression. Advantages: the adjectives are ranked and the top n adjectives are considered as the output of the system; the experiments were carried out and got fairly good results; for example, with the input "it is snowy", the results are white (0.70), light (0.49), cold (0.43), solid (0.38) and scenic (0.37). Future work: improving the tasks of keyword extraction and semantic similarity methods to make the proposed system work well with complex inputs.

Shikalgar and Dixit (2014). Approach: a Jaccard-index-based clustering algorithm for mining online reviews. Advantages: the problem of predicting sales performance using sentiment information mined from reviews is studied and a novel JIBCA algorithm is proposed and mathematically modeled; the outcome generates knowledge from mined data that can be useful for forecasting sales. Future work: using this framework, it can be extended to predicting sales performance in other domains like consumer electronics, mobile phones and computers, based on the user reviews posted on websites.

Ji et al. (2015). Approach: Twitter sentiment classification for measuring public health concerns. Advantages: based on the number of tweets classified as Personal Negative, the authors compute a Measure of Concern (MOC) and a timeline of the MOC; they attempt to correlate peaks of the MOC timeline to peaks of the News (Non-Personal) timeline; the best accuracy results are achieved using the two-step method with a Naïve Bayes classifier for the Epidemic domain (six data sets) and the Mental Health domain (three data sets). Disadvantages: no mention.

Omar et al. (2013). Approach: an ensemble of classification algorithms for subjectivity and sentiment analysis of Arabic customers' reviews. Advantages: the experimental results show that the ensemble of classifiers improves the classification effectiveness in terms of macro-F1 for both levels; the best results obtained from the subjectivity analysis and the sentiment classification in terms of macro-F1 are 97.13% and 90.95%, respectively. Disadvantages: no mention.

Mao et al. (2014). Approach: automatic construction of a financial semantic orientation lexicon from a large-scale Chinese news corpus. Advantages: a semantic orientation lexicon of positive and negative words is indispensable for sentiment analysis; however, many lexicons are manually created by a small number of human subjects, which is susceptible to high cost and bias; in this survey, the authors propose a novel idea to construct a financial semantic orientation lexicon from a large-scale Chinese news corpus automatically. Disadvantages: no mention.

Ren et al. (2014). Approach: sentiment classification in under-resourced languages using graph-based semi-supervised learning methods. Advantages: in particular, the authors found that choosing the initially labeled vertices in accordance with their degree and PageRank score can improve the performance; however, pruning unreliable edges will make things more difficult to predict; the authors believe that other people interested in this field can benefit from their empirical findings. Future work: the authors will attempt to use a sophisticated approach to induce better sentiment features, which they consider will improve the classification performance, especially in the book domain; they also plan to exploit a much larger amount of unlabeled data to fully take advantage of SSL algorithms.

- We use the sentiment lexicons with the JC to classify one document of the testing data set into either the positive polarity or the negative polarity in both the sequential environment and the distributed system.
-The advantages and disadvantages of this survey are shown in the Conclusion section.  (Choi et al., 2010;Wilk et al., 2002;Tulloss, 1997;Dalirsefat et al., 2009;Wijaya et al., 2016;Duarte et al., 1999  DNA regions with negative co occurrences between two strains are the Silkworm, Bombyx mori indeed identical, the use of coefficients such as Jaccard and Sorensen-Dice that do not include negative co-occurrences was imperative for closely related organisms. Wijaya et al. (2016) Finding an appropriate The selection of binary similarity and dissimilarity measures No mention equation to measure for multivariate analysis is data dependent. The proposed similarity between binary method can be used to find the most suitable binary similarity vectors: case studies on and dissimilarity equation wisely for a particular data. Our Indonesian and Japanese finding suggests that all four types of matching quantities in the herbal medicines Operational Taxonomic Unit (OTU) table are important to calculate the similarity and dissimilarity coefficients between herbal medicine formulas. Also, the binary similarity and dissimilarity measures that include the negative match quantity d achieve better capability to separate herbal medicine pairs compared to equations that exclude d. Duarte et al. (1999) Comparison of similarity The employment of different similarity coefficients caused No mention coefficients based on RAPD few alterations in cultivar classification, since correlations markers in the common been among genetic distances were larger than0.86. Nevertheless, the different similarity coefficients altered the projection efficiency in a two-dimensional space and formed different numbers of groups by Tocher's optimization procedure. Among these coefficients, Russel and Rao's was the most discordant and the Sorensen-Dice was considered the most adequate due to a higher projection efficiency in a two-dimensional space. 
Even though few structural changes were suggested in the most different groups, these coefficients altered some relationships between cultivars with high genetic similarity. Our work -We use the sentiment-lexicons with the JC to classify one document of the testing data set into either the positive polarity or the negative polarity in both the sequential environment and the distributed system.
-The advantages and disadvantages of this survey are shown in the Conclusion section.

Creating a basis English Sentiment Dictionary (bESD) in a Sequential Environment
According to (EDL, 2017; OED, 2017; CED, 2017a; LED, 2017; CED, 2017b; MMED, 2017), we have at least 55,000 English terms, including nouns, verbs, adjectives, etc. In this part, we calculate the valence and the polarity of the English words or phrases for our basis English Sentiment Dictionary (bESD) by using the JC in a sequential system, as shown in the diagram in Fig. 3.
We propose Algorithm 1 to perform this section. The main ideas of Algorithm 1 are as follows:
Input: the 55,000 English terms; the Google search engine
Output: a basis English sentiment dictionary (bESD)
Step 1: For each term in the 55,000 terms, do repeat:
Step 2: By using Equations 8 to 10 of calculating the valence of one word (or one phrase) in English in section (4.1.1), the sentiment score and the polarity of this term are identified. The valence and the polarity are calculated by using the JC through the Google search engine with the AND operator and the OR operator;
Step 3: Add this term into the basis English Sentiment Dictionary (bESD);
Step 4: End Repeat - End Step 1;
Step 5: Return bESD;
Our basis English Sentiment Dictionary (bESD) has more than 55,000 English words (or English phrases), and the bESD is stored in Microsoft SQL Server 2008 R2.
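Since Equations 8 to 10 are defined in section (4.1.1) and are not repeated here, the following Python sketch only illustrates the general shape of the computation: a Johnson-style coefficient built from hit counts, with a term's valence taken as its similarity to a positive seed word minus its similarity to a negative seed word. The exact formula, the seed words and the stubbed hit counts are illustrative assumptions rather than the paper's method (the model itself is programmed in Java, and the real counts come from the Google search engine with the AND and OR operators).

```python
# Hypothetical sketch of a JC-based valence. The Johnson coefficient for
# binary co-occurrence data is assumed here as JC = a/(a+b) + a/(a+c),
# where a = co-occurrences of t1 and t2, a+b = hits(t1), a+c = hits(t2).

def johnson_coefficient(hits_t1, hits_t2, hits_t1_and_t2):
    """Similarity of two terms from their (assumed) web hit counts."""
    a = hits_t1_and_t2
    b = max(hits_t1 - a, 0)   # t1 without t2
    c = max(hits_t2 - a, 0)   # t2 without t1
    if a == 0:
        return 0.0
    return a / (a + b) + a / (a + c)

def valence(term, hits):
    """Valence = JC(term, 'positive') - JC(term, 'negative').
    `hits(query)` would wrap Google searches; here it is any callable."""
    jc_pos = johnson_coefficient(hits(term), hits("positive"),
                                 hits(f"{term} AND positive"))
    jc_neg = johnson_coefficient(hits(term), hits("negative"),
                                 hits(f"{term} AND negative"))
    return jc_pos - jc_neg

# Toy hit counts standing in for Google results:
FAKE_HITS = {
    "excellent": 1000, "positive": 5000, "negative": 5000,
    "excellent AND positive": 800, "excellent AND negative": 100,
}
score = valence("excellent", lambda q: FAKE_HITS.get(q, 0))
polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

With these toy counts, the term leans clearly toward the positive seed, so it would be added to the bESD with a positive polarity.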

Creating a basis English Sentiment Dictionary (bESD) in a Distributed System
According to (EDL, 2017; OED, 2017; CED, 2017a; LED, 2017; CED, 2017b; MMED, 2017), we have at least 55,000 English terms, including nouns, verbs, adjectives, etc. In this part, we calculate the valence and the polarity of the English words or phrases for our basis English Sentiment Dictionary (bESD) by using the JC in a parallel network environment, as shown in the diagram in Fig. 4.
In Fig. 4, this section includes two phases: the Hadoop Map (M) phase and the Hadoop Reduce (R) phase. The input of the Hadoop Map phase is the 55,000 English terms in (EDL, 2017; OED, 2017; CED, 2017a; LED, 2017; CED, 2017b; MMED, 2017). The output of the Hadoop Map phase is one term whose sentiment score and polarity have been identified, and this output is the input of the Hadoop Reduce phase. The output of the Hadoop Reduce phase is the basis English Sentiment Dictionary (bESD).

Fig. 4: Overview of creating a basis English Sentiment Dictionary (bESD) in a distributed environment
We build Algorithm 2 to implement the Hadoop Map phase. The main ideas of Algorithm 2 are as follows:
Input: the 55,000 English terms; the Google search engine
Output: one term whose sentiment score and polarity have been identified
Step 1: For each term in the 55,000 terms, do repeat:
Step 2: By using Equations 8 to 10 of calculating the valence of one word (or one phrase) in English in section (4.1.1), the sentiment score and the polarity of this term are identified. The valence and the polarity are calculated by using the JC through the Google search engine with the AND operator and the OR operator;
Step 3: Return this term;
We propose Algorithm 3 to perform the Hadoop Reduce phase. The main ideas of Algorithm 3 are as follows:
Input: one term whose sentiment score and polarity have been identified (the output of the Hadoop Map phase)
Output: a basis English sentiment dictionary (bESD)
Step 1: Add this term into the basis English sentiment dictionary (bESD);
Step 2: Return bESD;
Our basis English sentiment dictionary (bESD) has more than 55,000 English words (or English phrases), and the bESD is stored in Microsoft SQL Server 2008 R2.

Using the Sentiment-Lexicons with the JC to Classify the Documents of the Testing Data Set into Either the Positive Polarity or the Negative Polarity
This section comprises two parts. The first part uses the sentiment lexicons with the JC to classify the documents of the testing data set into either the positive polarity or the negative polarity in a sequential environment, in sub-section (4.2.1). The second part does the same in a distributed system, in sub-section (4.2.2).

Using the Sentiment-Lexicons with the JC to Classify the Documents of the Testing Data Set into Either the Positive Polarity or the Negative Polarity in a Sequential Environment
In Fig. 5, we use the sentiment lexicons with the JC to classify the documents of the testing data set into either the positive polarity or the negative polarity in the sequential environment. This section is performed in the sequential system as follows. First, we create the sentiment lexicons of the basis English Sentiment Dictionary (bESD) based on the creation of the bESD in a sequential environment in (4.1.2). For each document of the testing data set, we split the document into n sentences; for each of the n sentences, we split the sentence into m meaningful terms based on the bESD and identify the sentiment score of each term from the bESD. The sentiment polarity of a sentence is based on the total of the valences of its terms: the sentence is clustered into the positive if the total of the sentiment values of its terms clustered into the positive is greater than the total of the sentiment scores of its terms clustered into the negative; into the negative if the positive total is less than the negative total; and into the neutral if the two totals are equal. The sentiment polarity of a document is based on the number of its sentences clustered into either the positive or the negative: the document is clustered into the positive if the number of positive sentences is greater than the number of negative sentences; into the negative if the number of positive sentences is less than the number of negative sentences; and into the neutral if the two numbers are equal.
We propose Algorithm 4 to cluster one sentence into either the positive or the negative in the sequential system. The main ideas of Algorithm 4 are as follows:
Input: one sentence
Output: the sentiment polarity (positive, negative, neutral)
Step 1: Split this sentence into m meaningful terms (meaningful words or meaningful phrases) based on the bESD;
Step 2: Set ANumberOfPositiveValences := 0 and ANumberOfNegativeValences := 0;
Step 3: For each term in the m terms, do repeat:
Step 4: Valence := get the valence of this term based on the bESD;
Step 5: If Valence is greater than 0 Then ANumberOfPositiveValences := ANumberOfPositiveValences + Valence;
Step 6: Else If Valence is less than 0 Then ANumberOfNegativeValences := ANumberOfNegativeValences + the absolute value of Valence;
Step 7: End Repeat - End Step 3;
Step 8: If ANumberOfPositiveValences is greater than ANumberOfNegativeValences Then Return positive;
Step 9: Else If ANumberOfPositiveValences is less than ANumberOfNegativeValences Then Return negative;
Step 10: Return neutral;
We propose Algorithm 5 to cluster one document into either the positive or the negative in the sequential environment. The main ideas of Algorithm 5 are as follows:
Input: one document
Output: the sentiment polarity (positive, negative, neutral)
Step 1: Split this document into n sentences;
Step 2: Set ANumberOfPositiveSentences := 0 and ANumberOfNegativeSentences := 0;
Step 3: For each sentence in the n sentences, do repeat:
Step 4: Polarity := Algorithm 4 with this sentence as its input;
Step 5: If Polarity is positive Then ANumberOfPositiveSentences := ANumberOfPositiveSentences + 1;
Step 6: Else If Polarity is negative Then ANumberOfNegativeSentences := ANumberOfNegativeSentences + 1;
Step 7: End Repeat - End Step 3;
Step 8: If ANumberOfPositiveSentences is greater than ANumberOfNegativeSentences Then Return positive;
Step 9: Else If ANumberOfPositiveSentences is less than ANumberOfNegativeSentences Then Return negative;
Step 10: Return neutral;
We propose Algorithm 6 to cluster all the documents of the testing data set into either the positive or the negative in the sequential system. The main ideas of Algorithm 6 are as follows:
Input: the testing data set
Output: the sentiment polarities (positive, negative, neutral) of the documents
Step 1: For each document in the documents of the testing data set, do repeat:
Step 2: Polarity := Algorithm 5 with this document as its input;
Step 3: End Repeat - End Step 1;
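Algorithms 4 to 6 can be sketched as follows. This is a Python sketch for clarity only (the model itself is programmed in Java), and the toy bESD valences and the naive splitting on whitespace and periods are illustrative assumptions standing in for the bESD-based term splitting.

```python
# Toy bESD: term -> valence. Real scores come from the JC via Google search.
TOY_BESD = {"good": 0.7, "great": 0.9, "bad": -0.6, "awful": -0.8}

def classify_sentence(sentence):
    """Algorithm 4: compare the totals of positive and negative valences."""
    pos_total, neg_total = 0.0, 0.0
    for term in sentence.lower().split():     # stand-in for bESD-based splitting
        valence = TOY_BESD.get(term, 0.0)
        if valence > 0:
            pos_total += valence
        elif valence < 0:
            neg_total += -valence             # accumulate the magnitude
    if pos_total > neg_total:
        return "positive"
    if pos_total < neg_total:
        return "negative"
    return "neutral"

def classify_document(document):
    """Algorithm 5: majority vote over the document's sentences."""
    labels = [classify_sentence(s) for s in document.split(".") if s.strip()]
    pos, neg = labels.count("positive"), labels.count("negative")
    if pos > neg:
        return "positive"
    if pos < neg:
        return "negative"
    return "neutral"

def classify_testing_set(documents):
    """Algorithm 6: classify every document of the testing data set."""
    return [classify_document(d) for d in documents]

results = classify_testing_set([
    "this is good. really great. a bad point",   # 2 positive vs 1 negative
    "awful service. bad food",                   # 2 negative sentences
])
```

The first toy document has more positive than negative sentences and the second has only negative sentences, matching the clustering rules stated above.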

Using the Sentiment-Lexicons with the JC to Classify the Documents of the Testing Data Set into Either the Positive Polarity or the Negative Polarity in a Parallel System
In Fig. 6, we use the sentiment-lexicons with the JC to classify the documents of the testing data set into either the positive polarity or the negative polarity in the distributed environment as follows.
This section is performed in the parallel system as follows. First, we create the sentiment lexicons of the basis English sentiment dictionary (bESD) based on the creation of the bESD in a distributed system in (4.1.3). For each document of the testing data set, we split the document into n sentences; for each of the n sentences, we split the sentence into m meaningful terms based on the bESD and identify the sentiment score of each term from the bESD. The sentiment polarity of a sentence is based on the total of the valences of its terms: the sentence is clustered into the positive if the total of the sentiment values of its terms clustered into the positive is greater than the total of the sentiment scores of its terms clustered into the negative; into the negative if the positive total is less than the negative total; and into the neutral if the two totals are equal. The sentiment polarity of a document is based on the number of its sentences clustered into either the positive or the negative: the document is clustered into the positive if the number of positive sentences is greater than the number of negative sentences; into the negative if the number of positive sentences is less than the number of negative sentences; and into the neutral if the two numbers are equal.
In Fig. 7, we propose Algorithm 7 and Algorithm 8 to cluster one sentence into either the positive or the negative in the parallel system. This stage includes two phases: the Hadoop Map phase and the Hadoop Reduce phase.
We use Algorithm 7 to perform the Hadoop Map phase of clustering one sentence into either the positive or the negative in the parallel system. The main ideas of Algorithm 7 are as follows:
Input: one sentence
Output: one term whose valence has been identified based on the bESD
Step 1: Input this sentence and the bESD into the Hadoop Map in the Cloudera system;
Step 2: Split this sentence into m meaningful terms (meaningful words or meaningful phrases) based on the bESD;
Step 3: For each term in the m terms, do repeat:
Step 4: Valence := get the valence of this term based on the bESD;
Step 5: Return this term; //the output of the Hadoop Map
We use Algorithm 8 to perform the Hadoop Reduce phase of clustering one sentence into either the positive or the negative in the parallel system. The main ideas of Algorithm 8 are as follows:
Input: one term whose valence has been identified based on the bESD (the output of the Hadoop Map)
Output: the sentiment polarity (positive, negative, neutral)
Step 1: Receive one term;
Step 2: If Valence is greater than 0 Then ANumberOfPositiveValences := ANumberOfPositiveValences + Valence;
Step 3: Else If Valence is less than 0 Then ANumberOfNegativeValences := ANumberOfNegativeValences + the absolute value of Valence;
Step 4: Repeat Steps 1 to 3 until all the terms of the sentence have been received;
Step 5: If ANumberOfPositiveValences is greater than ANumberOfNegativeValences Then Return positive;
Step 6: Else If ANumberOfPositiveValences is less than ANumberOfNegativeValences Then Return negative;
Step 7: Return neutral;
We propose Algorithm 9 to perform the Hadoop Map phase of clustering one document into either the positive or the negative in the distributed environment. The main ideas of Algorithm 9 are as follows:
Input: one document
Output: one sentence whose polarity has been identified
Step 1: Input this document into the Hadoop Map in the Cloudera system;
Step 2: Split this document into n sentences;
Step 3: For each sentence in the n sentences, do repeat:
Step 4: Polarity := get the polarity of this sentence based on the clustering of one sentence into either the positive or the negative in the parallel system in Fig. 7;
Step 5: Return this sentence; //the output of the Hadoop Map
We propose Algorithm 10 to perform the Hadoop Reduce phase of clustering one document into either the positive or the negative in the distributed environment. The main ideas of Algorithm 10 are as follows:
Input: one sentence whose polarity has been identified
Output: the sentiment polarity (positive, negative, neutral)
Step 1: Receive one sentence;
Step 2: If Polarity is positive Then ANumberOfPositiveSentences := ANumberOfPositiveSentences + 1;
Step 3: Else If Polarity is negative Then ANumberOfNegativeSentences := ANumberOfNegativeSentences + 1;
Step 4: After all the sentences have been received: If ANumberOfPositiveSentences is greater than ANumberOfNegativeSentences Then Return positive;
Step 5: Else If ANumberOfPositiveSentences is less than ANumberOfNegativeSentences Then Return negative;
Step 6: Return neutral;
We propose Algorithm 11 to implement the Hadoop Map phase of clustering all the documents of the testing data set into either the positive or the negative in the parallel system. The main ideas of Algorithm 11 are as follows:
Input: the testing data set
Output: one document whose polarity has been identified
Step 1: Input the documents of the testing data set into the Hadoop Map in the Cloudera system;
Step 2: For each document in the documents, do repeat:
Step 3: Polarity := get the polarity of this document based on the clustering of one document into either the positive or the negative in the distributed environment in Fig. 8;
Step 4: Return this document; //the output of the Hadoop Map
We propose Algorithm 12 to implement the Hadoop Reduce phase of clustering all the documents of the testing data set into either the positive or the negative in the parallel system. The main ideas of Algorithm 12 are as follows:
Input: one document whose polarity has been identified
Output: the results of the sentiment classification of the testing data set
Step 1: Receive one document;
Step 2: Add the result of the sentiment classification of this document into the results of the sentiment classification of the testing data set;
Step 3: Return the results of the sentiment classification of the testing data set;
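The Map and Reduce roles of the sentence-level stage (Algorithms 7 and 8) can be sketched in the same way, with plain Python functions standing in for the Hadoop phases. The toy bESD and the whitespace splitting are illustrative assumptions, not the Hadoop API or the real dictionary.

```python
# Sketch of the Map/Reduce split for clustering one sentence: the mapper
# emits (term, valence) pairs, the reducer totals them and decides polarity.

TOY_BESD = {"good": 0.7, "bad": -0.6, "awful": -0.8}

def map_sentence(sentence):
    """Hadoop Map role (Algorithm 7): emit (term, valence) for one sentence."""
    for term in sentence.lower().split():
        yield term, TOY_BESD.get(term, 0.0)

def reduce_sentence(term_valences):
    """Hadoop Reduce role (Algorithm 8): total the valences, return polarity."""
    pos_total, neg_total = 0.0, 0.0
    for _term, valence in term_valences:
        if valence > 0:
            pos_total += valence
        elif valence < 0:
            neg_total += -valence     # accumulate the magnitude
    if pos_total > neg_total:
        return "positive"
    if pos_total < neg_total:
        return "negative"
    return "neutral"

polarity = reduce_sentence(map_sentence("good food but awful bad service"))
```

The document-level stage (Algorithms 9 and 10) and the testing-set stage (Algorithms 11 and 12) repeat this same emit-then-aggregate pattern one level up, voting over sentence polarities and collecting document results.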

Experiment
We have measured an Accuracy (A) to calculate the accuracy of the results of the sentiment classification. The Java programming language is used to save the English testing data set, to implement our proposed model to classify the 5,000,000 documents of the testing data set, and to save the results of the sentiment classification.
The sequential environment in this research includes 1 node (1 server). The Java language is used in programming our model related to the sentiment lexicons with the JC. The configuration of the server in the sequential environment is: Intel® Server Board S1200V3RPS, Intel® Pentium® Processor G3220 (3M Cache, 3.00 GHz), 2 GB PC3-10600 ECC 1333 MHz LP Unbuffered DIMMs. The operating system of the server is Cloudera. We perform the proposed model related to the sentiment lexicons with the JC in the Cloudera parallel network environment; this Cloudera system includes 9 nodes (9 servers). The Java language is used in programming the application of the proposed model in the Cloudera system. Each server in the Cloudera system has the same configuration and operating system as the server above; all 9 nodes have the same configuration information.
In Table 3, we display the results of the documents in the testing data set.
The accuracy of our new model for the documents in the testing data set is shown in Table 4.
In Table 5, we present the average execution times of the classification of our new model for the documents in testing data set.

Results and Discussion
In this section, we show the results of this survey in Tables 3 to 5.
We show the results of the documents in the testing data set in Table 3.
The accuracy of the sentiment classification of the documents of the testing data set is presented in Table 4.
We display the average execution times of the classification of our novel model for the documents of the testing data set in Table 5.
In Table 3, we have the 4,378,000 documents of the correct classification of the testing data set, which comprises the 2,500,000 negative documents and the 2,500,000 positive documents. We also have the 622,000 documents of the incorrect classification of the testing data set. The documents of the correct classification comprise 2,188,746 negative documents and 2,189,254 positive documents. The documents of the incorrect classification include 311,254 negative documents and 310,746 positive documents.
In Table 4, we achieved 87.56% accuracy on the testing data set.
In Table 5, the average time of the sentiment classification using the sentiment lexicons with the JC in the sequential environment is 21,035,241 sec for the 5,000,000 English documents, which is greater than the average times in the Cloudera parallel network environment: 7,485,069 sec with 3 nodes and 3,627,584 sec with 6 nodes for the 5,000,000 English documents. The average time with 9 nodes, 2,359,471 sec for the 5,000,000 English documents, is the shortest time.
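The figures above can be checked with a short calculation: the accuracy follows from the correctly classified documents in Table 3, and the speed-up of the Cloudera cluster over the sequential run follows from the average times in Table 5 (all times in seconds for the 5,000,000 documents).

```python
# Accuracy (Table 4) from the correct-classification counts (Table 3):
correct, total = 4_378_000, 5_000_000
accuracy = 100 * correct / total          # percent

# Speed-up of the Cloudera cluster over the sequential run (Table 5):
sequential = 21_035_241
cluster = {3: 7_485_069, 6: 3_627_584, 9: 2_359_471}
speedup = {nodes: sequential / t for nodes, t in cluster.items()}
```

The speed-up grows close to linearly with the number of nodes, which is consistent with the claim that the parallel execution time is much shorter than the sequential one.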

Conclusion
Although our new model has been tested on our English data set, it can be applied to many other languages. In this study, our model has been tested on the 5,000,000 English documents of the testing data set, which is still a small data set; however, our model can be applied to larger data sets with millions of English documents in a short time.
In this study, we have proposed a new model to classify the sentiment of English documents using the sentiment lexicons with the JC and Hadoop Map (M)/Reduce (R) in the Cloudera parallel network environment. With our proposed new model, we have achieved 87.56% accuracy on the testing data set (Table 4). Until now, not many studies have shown that clustering methods can be used to classify data. Our research shows that clustering methods can classify data and, in particular, can be used to classify emotion in text.
The execution time of using the sentiment-lexicons with the JC in the Cloudera is dependent on the performance of the Cloudera parallel system and also dependent on the performance of each server on the Cloudera system.
The proposed model has many advantages and disadvantages. Its positives are as follows: it uses the sentiment lexicons with the JC to classify the semantics of English documents based on sentences; it can process millions of documents in a short time; it can be performed in distributed systems to shorten its execution time; and it can be applied to other languages. Its negatives are as follows: it has a low rate of accuracy, and it costs too much and takes too much time to implement.
To understand the scientific value of this research, we have compared our model's results with many studies in the tables below. In Table 6, we show the comparisons of our model's advantages and disadvantages with the works in (Singh and Singh, 2015; Carrera-Trejo et al., 2015; Soucy and Mineau, 2015). The comparisons of our model's positives and negatives with the latest sentiment classification models in (Agarwal and Mittal, 2016a; 2016b; Canuto et al., 2016; Ahmed and Danti, 2016; Phu and Tuoi, 2014; Tran et al., 2014; Dat et al., 2017; Phu et al., 2017f; 2017g; 2017h) are presented in Table 7.
Table 6: Comparisons of our model's advantages and disadvantages with the works in (Singh and Singh, 2015; Carrera-Trejo et al., 2015; Soucy and Mineau, 2015)
Singh and Singh (2015) examined the vector space model, an information retrieval technique and its variation. Advantages: in this study, the authors give an insider view of the working of the vector space model techniques used for efficient retrieval. What they sorted out for vector space modeling is that the model is easy to understand and cheaper to implement, considering the fact that the system should be cost effective (i.e., should follow the space/time constraint); it is also very popular. Disadvantages: each system has its own strengths and weaknesses; this system yields no theoretical findings; the weights associated with the vectors are very arbitrary; and this system is an independent system, thus requiring separate attention. Though it is a promising technique, the current level of success of the vector space model techniques used for information retrieval is not able to satisfy user needs and needs extensive attention.
Carrera-Trejo et al. (2015) used Latent Dirichlet Allocation (LDA) for multi-label text classification tasks, applying various feature sets, including several combinations of features like bi-grams and uni-grams. Advantages: the authors consider a subset of multi-labeled files of the Reuters-21578 corpus; they use traditional TF-IDF values of the features and tried both considering and ignoring stop words; they also experimented with adding LDA results into the vector space models as new features, and these last experiments obtained the best results. Disadvantages: no mention.
Soucy and Mineau (2015) used the K-Nearest Neighbors algorithm. Advantages: in this study, the authors introduce a new weighting method based on a statistical estimation of the importance of a word for a specific categorization problem. One benefit of this method is that it can make feature selection implicit, since useless features of the categorization problem considered get a very small weight. Extensive experiments reported in the work show that this new weighting method significantly improves the classification accuracy as measured on many categorization tasks. Disadvantages: despite positive results in some settings, GainRatio failed to show that supervised weighting methods are generally higher than unsupervised ones. The authors believe that ConfWeight is a promising supervised weighting technique that behaves gracefully both with and without feature selection; therefore, they advocate its use in further experiments.
Our work: we use the sentiment lexicons with the JC to classify one document of the testing data set into either the positive polarity or the negative polarity in both the sequential environment and the distributed system. The advantages and disadvantages of the proposed model are shown in the Conclusion section.
Table 7: Comparisons of our model's positives and negatives with the latest sentiment classification models in (Agarwal and Mittal, 2016a; 2016b; Canuto et al., 2016; Ahmed and Danti, 2016; Phu and Tuoi, 2014; Tran et al., 2014; Dat et al., 2017; Phu et al., 2017f; 2017g; 2017h)
Agarwal and Mittal (2016a) surveyed the machine learning approaches applied to sentiment analysis-based applications. Positives: the main emphasis of this survey is to discuss the research involved in applying machine learning methods, mostly for sentiment classification at the document level. Machine learning-based approaches work in the following phases, which are discussed in detail in this study for sentiment classification: (1) feature extraction, (2) feature weighting schemes, (3) feature selection and (4) machine-learning methods. This study also discusses the standard free benchmark datasets and evaluation methods for sentiment analysis. The authors conclude the research with a comparative study of some state-of-the-art methods for sentiment analysis and some possible future research directions in opinion mining and sentiment analysis. Negatives: no mention.
Agarwal and Mittal (2016b) proposed a semantic orientation-based approach for sentiment analysis. Positives: this approach initially mines sentiment-bearing terms from the unstructured text and further computes the polarity of the terms. Most of the sentiment-bearing terms are multi-word features unlike bag-of-words, e.g., "good movie," "nice cinematography," "nice actors," etc. The performance of semantic orientation-based approaches has been limited in the literature due to inadequate coverage of multi-word features. Negatives: no mention.
Canuto et al. (2016) exploited new sentiment-based meta-level features for effective sentiment analysis. Positives: experiments performed with a substantial number of datasets (nineteen) demonstrate that the effectiveness of the proposed sentiment-based meta-level features is not only superior to the traditional bag-of-words representation (by up to 16%), but also superior in most cases to state-of-the-art meta-level features previously proposed in the literature for text classification tasks that do not take into account any idiosyncrasies of sentiment analysis. The authors' proposal is also largely superior to the best lexicon-based methods as well as to supervised combinations of them. In fact, the proposed approach is the only one to produce the best results in all tested datasets in all scenarios. Negatives: a line of future research would be to explore the authors' meta features with other classification algorithms and feature selection techniques in different sentiment analysis tasks, such as scoring movies or products according to their related reviews.
Ahmed and Danti (2016) used rule-based machine learning algorithms. Positives: the proposed approach is tested by experimenting with online books and political reviews and demonstrates its efficacy through Kappa measures, which show a higher accuracy of 97.4% and a lower error rate. The weighted average of different accuracy measures like Precision, Recall and TP-Rate depicts a higher efficiency rate and a lower FP-Rate. Comparative experiments on various rule-based machine learning algorithms have been performed through a ten-fold cross validation training model for sentiment classification. Negatives: no mention.
Phu and Tuoi (2014) combined five dictionaries into a new one with 21,137 entries; the new dictionary has many verbs, adverbs, phrases and idioms that were not in the five dictionaries before. The study shows that the authors' proposed method, based on the combination of the Term-Counting method and the Enhanced Contextual Valence Shifters method, has improved the accuracy of sentiment classification: the combined method has an accuracy of 68.984% on the testing dataset and 69.224% on the training dataset. All of these methods are implemented to classify the reviews based on the new dictionary and the Internet Movie Database data set.
Tran et al. (2014) Naive

Author's Contributions
Vo Ngoc Phu: He conceived the original research idea, implemented the surveys and checked, revised and finalized the draft manuscripts.
Vo Thi Ngoc Tran: She built the data sets and wrote the draft manuscripts.