Research Article Open Access

Sentimental Analysis on Health-Related Information with Improving Model Performance using Machine Learning

Wael M.S. Yafooz1 and Abdullah Alsaeedi1
  • 1 Taibah University, Saudi Arabia
Journal of Computer Science
Volume 17 No. 2, 2021, 112-122


Submitted On: 5 January 2021 Published On: 22 February 2021

How to Cite: Yafooz, W. M. & Alsaeedi, A. (2021). Sentimental Analysis on Health-Related Information with Improving Model Performance using Machine Learning. Journal of Computer Science, 17(2), 112-122.


Social media platforms are widely used to exchange and share information and user experiences, resulting in the broad spread and viewing of personal experiences in many areas of life. Informative health-related videos on YouTube are therefore highly visible. Many users seek medical treatments and health-related information on social media, particularly YouTube, when searching for treatments for chronic illnesses. Sometimes, these sources contain misinformation that causes fatal effects on the users’ health. Many sentiment analysis and classification studies have been conducted on social media platforms to examine user posts and comments in many life science fields. However, no study has analyzed Arabic user comments on herbal treatments for people with diabetes. Therefore, this study proposes a model that detects the emotions and opinions of YouTube users on herbal treatment videos by analyzing user comments with machine learning classifiers. In addition, a new Arabic Dataset on Herbal Treatments for Diabetes (ADHTD), based on user comments from several YouTube videos, is introduced. This study examines the impact of four representation methods on ADHTD to show the performance of machine learning classifiers. These methods are the removal of repeated characters in Arabic dialect and of the character extension known as ‘TATAWEEL’ or ‘MAD’, stemming of Arabic words, removal of Arabic stop words, and N-grams of Arabic words. Experiments were conducted with the aforementioned methods to handle the imbalanced proposed dataset and to identify the best machine learning classifiers for Arabic dialect textual data. The model achieved higher accuracy, reaching 95%, on the dataset balanced with the Synthetic Minority Oversampling TEchnique (SMOTE) than on the imbalanced dataset.
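The text-representation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the rule of collapsing runs of three or more identical characters to a single one, and the tiny stop-word set are assumptions chosen for illustration. Stemming and oversampling are omitted because they depend on external libraries.

```python
import re

# Arabic elongation character (tatweel / kashida), U+0640.
TATWEEL = "\u0640"

# Hypothetical, deliberately tiny stop-word set for illustration only;
# a real list would be much longer.
STOP_WORDS = {"في", "من", "على", "و"}


def normalize(text):
    """Remove tatweel and collapse exaggerated character repetition."""
    text = text.replace(TATWEEL, "")
    # Assumption: a run of 3+ identical characters is dialectal emphasis,
    # so it is reduced to one character. Runs of exactly two are kept,
    # since doubled letters occur in ordinary words.
    return re.sub(r"(.)\1{2,}", r"\1", text)


def tokenize(text):
    """Whitespace-tokenize normalized text and drop stop words."""
    return [tok for tok in normalize(text).split() if tok not in STOP_WORDS]


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    comment = "العلاج جمييييل من الأعشـــاب"
    tokens = tokenize(comment)
    print(tokens)
    print(word_ngrams(tokens, 2))
```

The resulting unigrams and bigrams could then be turned into feature vectors for a classifier such as a support vector machine or logistic regression, the two named in the keywords.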




  • Sentiment Analysis
  • N-gram
  • Support Vector Machine
  • Logistic Regression