Research Article Open Access

Mining of Social Media on Covid-19 Big Data Infodemic in Indonesia

Faisal Binsar1 and Tuga Mauritsius1
  • 1 Bina Nusantara University, Jakarta, Indonesia
Journal of Computer Science
Volume 16 No. 11, 2020, 1598-1609

DOI: https://doi.org/10.3844/jcssp.2020.1598.1609

Submitted On: 22 July 2020 Published On: 16 November 2020

How to Cite: Binsar, F. & Mauritsius, T. (2020). Mining of Social Media on Covid-19 Big Data Infodemic in Indonesia. Journal of Computer Science, 16(11), 1598-1609. https://doi.org/10.3844/jcssp.2020.1598.1609

Abstract

Covid-19 is an unprecedented disaster that remains difficult to contain. During the pandemic, reported case counts have increased exponentially. In this situation, the dissemination of messages and information is very important, and social media platforms have served as a communication channel of unprecedented speed. However, uncontrolled and irresponsible dissemination of information creates new problems that can be detrimental to many parties: an excess of information may trigger panic, fear, loss of hope and even paranoia. Correct and timely information, together with curative and preventive efforts to stop the disease, is therefore very important. This study presents a method for discovering public opinion by mining Twitter in the Indonesian context. We are particularly interested in people's stance toward the pandemic: some people may be fully aware of the threat, while others may be careless about what is going on. We assume that this stance influences people's obedience to the government's Covid-19 protocol, and that the stance can be inferred from the comments people post on social media. By scraping tweets posted on Twitter during March 2020 with the keywords Corona and COVID, we obtained 31,003 tweets. We manually classified the tweets into three stance classes: positive, negative and neutral. Predictive models were then derived using the Support Vector Machine, Random Forest and Naïve Bayes algorithms. The Random Forest-based model gives the highest accuracy at 89%, followed by Support Vector Machine at 87% and Naïve Bayes at 68%. The model can be used to classify future opinions, giving the government valuable information for making policies and taking steps to overcome the pandemic.
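
The classification step described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it is a from-scratch multinomial Naïve Bayes classifier (only one of the three algorithms named above), written with the Python standard library. The class name `NaiveBayesStance`, the helper functions, and the six English toy tweets are hypothetical and invented for illustration; the study's tweets are not reproduced here. The reading of "positive" as aware of the threat and "negative" as careless about it is also an assumption, since the abstract does not define the labels.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesStance:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Hypothetical toy tweets, invented for illustration only.
TRAIN = [
    ("stay home wear mask and wash hands", "positive"),
    ("please follow the health protocol and keep distance", "positive"),
    ("corona is a hoax no need for mask", "negative"),
    ("i ignore the lockdown covid is not dangerous", "negative"),
    ("government reports new covid cases today", "neutral"),
    ("corona case numbers updated this morning", "neutral"),
]

texts, labels = zip(*TRAIN)
model = NaiveBayesStance().fit(texts, labels)
print(model.predict("wear mask and keep distance"))
```

With real data, the labeled tweets would normally be split into training and held-out test sets so that `accuracy` is measured on tweets the model has not seen; the toy set here is far too small for that and serves only to show the mechanics.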

Keywords

  • Coronavirus Pandemic
  • Infodemic Social Media
  • Sentiment Analysis
  • Twitter Clustering
  • Data Science
  • Machine Learning