TY - JOUR
AU - Santhi, B.
AU - Renuka, K.
PY - 2020
TI - Study and Analysis of Prediction Model for Heart Disease Data Using Machine Learning Techniques
JF - Journal of Computer Science
VL - 16
IS - 3
DO - 10.3844/jcssp.2020.344.354
UR - https://thescipub.com/abstract/jcssp.2020.344.354
AB - Heart disease is the leading cause of death across all population groups in developed countries and a major problem for emerging nations as well. The availability of doctors has not kept pace with the present demand for healthcare, so there is an urgent need for a support system to help save lives. Using modern ML frameworks and big data repositories, our aim is to design a machine learning model that predicts heart disease as early as possible, helps prioritize hospital consultations and improves accuracy. For this study, several analyses were carried out on the Cleveland heart disease data set of 303 patient records, using five different classifiers: Support Vector Machine (SVM), Random Forests, Ordinal Regression, Logistic Regression and Naïve Bayes. Feature selection using the chi-squared statistical test and careful tuning of hyperparameters raised the classification accuracy of the SVM with a radial basis function (RBF) kernel from 40% to 85%. By incorporating rules based on the observed statistical patterns, this was further improved to 95%. Separately, treating the task as a 5-class classification problem, the multi-class imbalance was addressed using suitable sampling techniques, which resulted in 96% accuracy on the 5-class data. We evaluated model performance using k-fold cross-validation and confusion matrices. This study shows that classification accuracy can be significantly improved by balancing the dataset through sampling and by properly tuning hyperparameters after feature selection.