Sentimental Analysis on Health-Related Information with Improving Model Performance using Machine Learning

Wael M.S. Yafooz; Abdullah Alsaeedi

doi:10.3844/jcssp.2021.112.122

Research Article Open Access

Wael M.S. Yafooz¹ and Abdullah Alsaeedi¹

¹ Taibah University, Saudi Arabia

Abstract

Social media platforms are extensively used for exchanging and sharing information and user experiences, resulting in the widespread sharing and viewing of personal experiences in many fields of life. Consequently, informative health-related videos on YouTube are highly visible. Many users tend to obtain medical treatments and health-related information from social media, particularly from YouTube, when searching for chronic illness treatments. Sometimes, these sources contain misinformation that can have fatal effects on users’ health. Many sentiment analysis and classification studies have been conducted on social media platforms to study user posts and comments in many life science fields. However, no study has examined Arabic user comments that provide details on herbal treatments for people with diabetes. Therefore, this study proposes a model that detects the emotions/opinions of YouTube users on herbal treatment videos through an analysis of user comments using machine learning classifiers. In addition, a new Arabic Dataset on Herbal Treatments for Diabetes (ADHTD), which is based on user comments from several YouTube videos, is introduced. This study examines the impact of four representation methods on ADHTD to show the performance of machine learning classifiers. These methods are the removal of repeated characters in Arabic dialect and of the character extension known as ‘TATAWEEL’ or ‘MAD’, stemming of Arabic words, Arabic stop-word removal and N-grams with Arabic words. Experiments were conducted based on the aforementioned methods to handle the imbalanced proposed dataset and to identify the best machine learning classifiers for Arabic dialect textual data. The model achieved higher accuracy, reaching 95%, when the Synthetic Minority Oversampling Technique (SMOTE) was used to balance the dataset than on the imbalanced dataset.
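
Three of the text-cleaning steps named in the abstract (character-extension removal, repeated-character removal, stop-word removal) and word N-gram generation can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the three-word stop-word list, and the rule of collapsing runs of three or more identical Arabic letters to a single letter are all hypothetical choices, and stemming is omitted because it needs a dedicated Arabic stemmer.

```python
import re

# U+0640 is the Arabic elongation character (tatweel / kashida).
TATWEEL = "\u0640"

# Hypothetical, deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"في", "من", "على"}

# Assumption: a run of three or more identical Arabic letters is treated as
# emphasis and reduced to one letter. Legitimate doubled letters (e.g. the two
# lams in "الله") are left untouched because they form a run of only two.
REPEATED_LETTER = re.compile(r"([\u0621-\u064A])\1{2,}")


def strip_tatweel(text):
    """Remove every tatweel character from the text."""
    return text.replace(TATWEEL, "")


def collapse_repeats(text):
    """Reduce runs of 3+ identical Arabic letters to a single letter."""
    return REPEATED_LETTER.sub(r"\1", text)


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(comment, n=2):
    """Clean a comment, drop stop words, and build word n-grams."""
    text = collapse_repeats(strip_tatweel(comment))
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return tokens, word_ngrams(tokens, n)


if __name__ == "__main__":
    tokens, bigrams = preprocess("العـــلاج مفييييد من الأعشاب")
    print(tokens)
    print(bigrams)
```

Stripping the tatweel before collapsing repeats matters in this sketch, because the tatweel code point falls inside the Arabic letter range used by the regular expression.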

Journal of Computer Science

Volume 17 No. 2, 2021, 112-122

DOI: https://doi.org/10.3844/jcssp.2021.112.122

Submitted On: 5 January 2021 Published On: 22 February 2021

How to Cite: Yafooz, W. M. & Alsaeedi, A. (2021). Sentimental Analysis on Health-Related Information with Improving Model Performance using Machine Learning. Journal of Computer Science, 17(2), 112-122. https://doi.org/10.3844/jcssp.2021.112.122

Copyright: © 2021 Wael M.S. Yafooz and Abdullah Alsaeedi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Sentiment Analysis
N-gram
Support Vector Machine
Logistic Regression
SMOTE