Research Article Open Access

A CURE Algorithm for Vietnamese Sentiment Classification in a Parallel Environment

Vo Ngoc Phu1, Vo Thi Ngoc Tran2 and Jack Max3
  • 1 Nguyen Tat Thanh University, Vietnam
  • 2 Vietnam National University, Vietnam
  • 3 Sumatra University, Thailand

Abstract

Solutions for processing big data are essential and beneficial for many fields of research and for commercial applications. This paper therefore proposes a new model for sentiment classification of big data sets in the Cloudera parallel network environment. Clustering Using Representatives (CURE), combined with Hadoop Map (M) / Reduce (R) in Cloudera, a parallel network system, was applied to a Vietnamese testing data set of 20,000 documents: 10,000 positive and 10,000 negative. On this data set, the model achieved a sentiment classification accuracy of 62.92%. Although the data set is small, the proposed model can process millions of Vietnamese documents, as well as data in other languages, while shortening execution time in the distributed environment.
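To make the idea in the abstract concrete, the following is a minimal sketch of how CURE-style representatives could drive a two-class assignment in a map/reduce shape. It is not the authors' implementation. The 2-D toy vectors, the function names, the number of representatives (3), the shrink factor (0.5), and the use of two pre-labeled groups are all assumptions made for illustration; the abstract does not specify the document representation or any of these parameters. The map and reduce steps are simulated in a single process rather than run on Hadoop.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def cure_representatives(points, num_reps=3, shrink=0.5):
    """Pick well-scattered points, then shrink them toward the centroid.

    num_reps and shrink are hypothetical values, not taken from the paper.
    """
    c = centroid(points)
    # First representative: the point farthest from the centroid.
    reps = [max(points, key=lambda p: dist(p, c))]
    while len(reps) < min(num_reps, len(points)):
        # Next representative: the point farthest from those already chosen.
        reps.append(max(points, key=lambda p: min(dist(p, r) for r in reps)))
    # Move each representative a fraction `shrink` of the way to the centroid.
    return [tuple(r_i + shrink * (c_i - r_i) for r_i, c_i in zip(r, c))
            for r in reps]


def classify(vector, reps_by_label):
    """Return the label whose nearest representative is closest to vector."""
    return min(reps_by_label,
               key=lambda label: min(dist(vector, r)
                                     for r in reps_by_label[label]))


def map_phase(doc_vectors, reps_by_label):
    """Map step (simulated): emit one (label, 1) pair per document."""
    for v in doc_vectors:
        yield classify(v, reps_by_label), 1


def reduce_phase(pairs):
    """Reduce step (simulated): sum the counts for each label."""
    counts = {}
    for label, n in pairs:
        counts[label] = counts.get(label, 0) + n
    return counts


# Toy data: hypothetical 2-D vectors standing in for document features.
positive = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
negative = [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)]
reps_by_label = {
    "positive": cure_representatives(positive),
    "negative": cure_representatives(negative),
}
unseen = [(1.5, 1.2), (0.0, 0.0), (8.5, 9.5)]
counts = reduce_phase(map_phase(unseen, reps_by_label))
print(counts)
```

Because each document is assigned independently once the representatives are fixed, the map step can in principle be split across many workers, which is the property a parallel environment would exploit.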

Journal of Computer Science
Volume 15 No. 10, 2019, 1355-1377

DOI: https://doi.org/10.3844/jcssp.2019.1355.1377

Submitted On: 29 January 2018
Published On: 19 April 2018

How to Cite: Phu, V. N., Ngoc Tran, V. T. & Max, J. (2019). A CURE Algorithm for Vietnamese Sentiment Classification in a Parallel Environment. Journal of Computer Science, 15(10), 1355-1377. https://doi.org/10.3844/jcssp.2019.1355.1377

Keywords

  • Sentiment Classification
  • Vietnamese Sentiment Classification
  • Vietnamese Sentence Sentiment Classification
  • Opinion Mining
  • Vietnamese Opinion Mining
  • Vietnamese Document Opinion Mining
  • Clustering Using Representatives
  • CURE
  • Cloudera
  • Parallel Environment
  • Parallel Network
  • Parallel Network Environment