Classification of Heart Disease Using Cluster Based DT Learning

According to the WHO, around 12 million people worldwide die of cardiovascular disease, and the absence of cardiac care centers in rural areas adds to this toll. A principal cause of coronary disease is smoking. Our Cluster-based Disease Diagnosis (CDD) approach applies ML classifiers to improve the prediction accuracy for cardiovascular disease. For this, we use the real-world Cleveland heart disease dataset from the UCI Machine Learning Repository. First, ML performance is evaluated using all features. The dataset is then split into class pairs according to its class distribution, and the significant features of each class pair are identified through entropy. With CDD, four significant features are identified out of thirteen. Using these four features improves ML performance compared with using all features: accuracy improves by 9.5% for the RF model, 7.2% for SVM and 2.3% for DT.


Introduction
In this study, we use the processed heart disease dataset from the Cleveland Clinic Foundation. One study reports that coronary conditions such as acute myocardial infarction (AMI), coronary heart disease (CHD), myocardial infarction (MI) and cardiovascular disease (CVD) kill one person in the USA at frequent, regular intervals. The term "cardiovascular disease" covers conditions affecting the arteries of the heart, which become narrowed by plaque build-up, reducing blood flow to the heart. This can cause chest pain or, eventually, a heart attack. Extracting useful information and uncovering hidden patterns and relationships in the Cleveland database requires combining different technologies, for example data mining with statistical analysis, machine learning and database technology. These technologies are used in many areas, including healthcare. Data mining methods can be used effectively in surgical procedures, medical tests and medication, in discovering relationships between clinical and diagnostic data, and in predicting diseases. Diagnostic decisions usually rest on physicians' observations and experience. The problem is that specialist expertise is not evenly available across subspecialties and is a scarce resource in some places.
We use all 14 attributes (13 predictors and the target) over 303 samples. The target class ranges from 0 to 4, where class 0 indicates the absence of heart disease and classes 1 to 4 indicate its presence. According to the Machine Learning (ML) models, the primary attributes of heart disease are cp (chest pain type), thal (normal, fixed defect, reversible defect), ca (number of major vessels), thalach (maximum heart rate) and num (the heart disease prediction attribute). Using correlation analysis, the attribute pair slope and old peak shows a correlation of 0.61. In this study, we apply the ML classifiers Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM) and Linear Model (LM). We then apply our cluster-based DT to measure its accuracy and error. Finally, we compare our system's results with those of the existing classifiers.
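The text reports a correlation of 0.61 between slope and old peak but does not name the coefficient. Assuming Pearson's correlation, a minimal sketch of the computation follows; the column values are hypothetical toy numbers, not the Cleveland data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)

# Hypothetical stand-ins for the slope and oldpeak columns (not real data).
slope = [1, 2, 2, 3, 1, 2]
oldpeak = [0.0, 1.4, 1.0, 3.2, 0.2, 2.0]
r = pearson(slope, oldpeak)
```

A value near +1 or -1 signals a strong linear relationship, which is why a pair such as slope and old peak may carry partly redundant information.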

Related Systems
This section reviews the related ML classifiers used in this study: decision tree, random forest, SVM and linear model.

A. Decision Tree
A decision tree uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs and utility (Meriem and Abdelaziz, 2019). It is one way to express an algorithm that contains only conditional control statements. Decision trees are commonly used in operations research, particularly in decision analysis, to help identify the strategy most likely to reach a goal, and they are also a popular tool in machine learning. The paths from root to leaf represent classification rules (Mokhtar et al., 2016). In decision analysis, a decision tree and the closely related influence diagram are used as a visual and analytical decision support tool, in which the expected values of competing alternatives are calculated (Dua and Karra, 2017). Decision trees, influence diagrams, utility functions and other decision analysis tools and methods are taught to undergraduate students in schools of business, health economics and public health, and are examples of operations research or management science methods. For the Cleveland training samples of a dataset D, the decision tree is built by splitting on features selected through entropy. The entropy has the following form:
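Assuming the standard Shannon entropy over the class proportions $p_c$ of D (the exact form used in the paper is not shown), the formula is

$$H(D) = -\sum_{c=0}^{4} p_c \log_2 p_c$$

and a split on attribute A is usually scored by the information gain $IG(D, A) = H(D) - \sum_{v} \frac{|D_v|}{|D|} H(D_v)$, where $D_v$ holds the samples with value v of A. A minimal sketch of both quantities:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting labels on a categorical feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

For example, a feature that perfectly separates labels `[0, 0, 1, 1]` recovers the full 1 bit of entropy as gain.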

B. Random Forest
This ensemble model works by building a collection of decision trees and outputting the class chosen by most of them (majority vote). Random decision forests correct for the tendency of individual decision trees to overfit their training set (Othman and Azahari, 2016; Liu et al., 2017).
For each tree, the uncertainty is estimated through the standard deviation:
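The standard-deviation formula itself is missing. Assuming it measures the spread of the T per-tree outputs $f_t(x)$ around their mean $\bar{f}(x)$, one common form is

$$\sigma(x) = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(f_t(x) - \bar{f}(x)\right)^2}, \qquad \bar{f}(x) = \frac{1}{T}\sum_{t=1}^{T} f_t(x).$$

The sketch below pairs it with the majority vote; the per-tree outputs are hypothetical, not taken from the paper.

```python
import statistics
from collections import Counter

def forest_vote(tree_predictions):
    """Majority vote over the class labels predicted by each tree."""
    return Counter(tree_predictions).most_common(1)[0][0]

def forest_uncertainty(tree_scores):
    """Population standard deviation (1/T form) of per-tree scores for one sample."""
    return statistics.pstdev(tree_scores)

# Hypothetical outputs of five trees for one patient (not from the paper).
labels = [1, 0, 1, 1, 2]
scores = [0.8, 0.4, 0.7, 0.9, 0.2]  # e.g. each tree's estimated probability of heart risk
print(forest_vote(labels))  # 1
```

A large spread means the trees disagree, so the forest's vote for that sample deserves less trust.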

C. SVM
In machine learning, support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Let the dataset be $\{(x_i, y_i)\}_{i=1}^{m}$, where $x_i \in \mathbb{R}^n$ represents the $i$-th vector and $y_i$ represents the target item. The linear SVM finds the optimal hyperplane $f(x) = w^{T}x + b$, where $w$ is an $n$-dimensional coefficient vector and $b$ is an offset. These are obtained by solving the following optimization problem:
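The optimization problem is not reproduced in the text. Assuming the standard soft-margin primal with binary labels $y_i \in \{-1, +1\}$ and a penalty parameter C (C does not appear in the paper), it reads

$$\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{m}\max\left(0,\ 1 - y_i\left(w^{T}x_i + b\right)\right).$$

A minimal sketch solving it by full-batch subgradient descent; the learning rate, epoch count and toy data are hypothetical choices.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Subgradient descent on the soft-margin hinge-loss objective.

    X: (m, n) array of samples; y: labels in {-1, +1}.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside or beyond the margin
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Sign of the decision function f(x) = w^T x + b."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)

X_toy = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y_toy = [1, 1, -1, -1]
w, b = train_linear_svm(X_toy, y_toy)
```

This is a binary SVM; the five-class Cleveland target would need a one-vs-rest or one-vs-one scheme, which the paper does not specify.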

D. Linear Model
A linear model is a statistical method that relates a response to one or more predictor attributes. It describes the strength of association between a dependent variable y and one or more independent variables Xi.
The model is expressed as follows:
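The model equation is missing from the text. Assuming the usual multiple linear regression with p predictors, coefficients $\beta_i$ and error term $\varepsilon$, it reads

$$y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i + \varepsilon.$$

A minimal ordinary least squares sketch with NumPy; how the paper maps the continuous output onto classes 0 to 4 is not stated, so the example is plain regression on toy data.

```python
import numpy as np

def fit_linear_model(X, y):
    """Ordinary least squares fit of y = b0 + sum_i b_i * X_i."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef  # coef[0] is the intercept b0, coef[1:] the slopes

X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 1 + 2x
coef = fit_linear_model(X, y)
```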

Clustering Based Disease Diagnosis
Our CDD method applies entropy to the class-pair distributions to identify the significant features. For this, the Cleveland heart samples are split into class-pair subsets. These class-based significant features improve prediction accuracy compared with feature importance computed over the whole sample. Using the class-pair significant features, the performance of the DT, RF, SVM and LM models is evaluated. Finally, the results obtained with the significant features and with all features are compared to show the improvement. The method proceeds through the following steps:
Step 6: Extract the decisional features with respect to the i-th and j-th clustered target classes.
Step 7: Extract the features shared by the class pair (i, j), that is, (i-th class ∩ j-th class) → significant features.
Step 8: Apply the ML classifiers using the significant features to evaluate their performance.
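Step 7 can be sketched as a set intersection per class pair. How the per-pair results are combined into the final feature list is not stated; taking their union is an assumption made here. Only the D01 sets come from the text, since the Table 11 values for the other pairs are not reproduced.

```python
def pair_significant_features(features_i, features_j):
    """Step 7: features shared by the decisional sets of classes i and j."""
    return set(features_i) & set(features_j)

def cdd_significant_features(pair_features):
    """Combine the significant features of all class pairs.

    pair_features maps a pair name such as "D01" to a tuple
    (decisional features of class i, decisional features of class j).
    Using the union across pairs is an assumption, not stated in the paper.
    """
    selected = set()
    for feats_i, feats_j in pair_features.values():
        selected |= pair_significant_features(feats_i, feats_j)
    return selected

# D01 sets as reported in the text: class 0 -> cp; class 1 -> cp, thal, ca.
d01 = pair_significant_features({"cp"}, {"cp", "thal", "ca"})
print(d01)  # {'cp'}
```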

Results
The experiments were run on an Intel i5 processor under Windows 10 and simulated in RStudio using ML packages. The raw data were split into 70% training, 15% testing and 15% validation. The Cleveland heart dataset in Table 1 contains 303 samples with 13 processed attributes and a target class c ranging from 0 to 4. Our CDD model uses these 13 fields to classify whether a patient is at heart risk. Class 0 indicates no heart risk, whereas classes 1 to 4 indicate heart risk; class 1 indicates the lowest severity and class 4 the highest.
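The paper does not say whether the 70/15/15 split was stratified or how fractional sizes were rounded. A minimal sketch of an unstratified split, assuming truncation and a hypothetical fixed seed, with the remainder sent to validation:

```python
import random

def split_70_15_15(n_samples, seed=42):
    """Shuffle sample indices and cut them into train/test/validation parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(0.70 * n_samples)
    n_test = int(0.15 * n_samples)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    val = idx[n_train + n_test:]  # leftover samples go to validation
    return train, test, val

train, test, val = split_70_15_15(303)
print(len(train), len(test), len(val))  # 212 45 46
```

With only 13 class-4 samples, a stratified split would be worth considering so that every part contains each class.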
The error matrices of the DT, RF, SVM and LM models are presented in Tables 2 to 5, respectively. The LM model outperforms the other models with an accuracy of 69%, whereas DT, SVM and RF achieve only 66.70%, 64.30% and 59.50%, respectively.
In the class-wise results, the SVM model predicts class 0 without error (Table 4). In class 1, the LM model performs comparatively well with 62.50% (Table 5); in class 2, all ML models misclassify every sample (Tables 2 to 5); in class 3, the SVM and LM models attain 66.70% error (Tables 4 and 5); and in class 4, the DT model attains no error (Table 2). We now apply our CDD approach to improve prediction accuracy through class-distribution-based significant features selected via entropy. The Cleveland heart samples have five classes, 0 to 4. As Table 6 shows, class 0 has far more samples than the others. Hence, class 0 is paired with each of the other classes to form the pairwise samples D01, D02, D03 and D04.
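The accuracy and class-wise error figures above can be read off an error (confusion) matrix. A minimal sketch, assuming rows hold the actual class and columns the predicted class; the 3×3 matrix is hypothetical, not one of Tables 2 to 5.

```python
def accuracy(cm):
    """Overall accuracy from a square error matrix.

    cm[r][c] counts samples whose actual class is r and predicted class is c.
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total

def class_error(cm, k):
    """Fraction of actual class-k samples that were misclassified."""
    actual = sum(cm[k])
    return 1 - cm[k][k] / actual if actual else float("nan")

# Hypothetical error matrix (rows: actual, columns: predicted).
cm = [[8, 1, 1],
      [2, 5, 3],
      [0, 2, 8]]
```

Here the diagonal holds 21 of 30 samples (accuracy 0.7), and half of the actual class-1 samples are misclassified.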
Class 0 contains 164 samples, class 1 contains 55, class 2 contains 36, class 3 contains 35 and class 4 contains 13. The DT performance is evaluated on these pairwise samples, and the results are given in Tables 7 to 10. The error rate decreases as class 0 is paired with higher-risk classes: for D01 the overall error rate is 20% (Table 7), whereas for D04 it is only 4.40% (Table 10).
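The pairwise subsets can be formed by filtering on the label. The sketch below uses the class counts reported above (they sum to the 303 samples) to derive each subset's size; `make_pair_subset` is a hypothetical helper name.

```python
# Class counts reported in the text for the Cleveland samples.
class_counts = {0: 164, 1: 55, 2: 36, 3: 35, 4: 13}

def make_pair_subset(samples, labels, i, j):
    """Keep only the (sample, label) pairs whose label is class i or class j."""
    return [(x, y) for x, y in zip(samples, labels) if y in (i, j)]

def pair_size(counts, i, j):
    """Number of samples in the pairwise subset D_ij."""
    return counts[i] + counts[j]

pairs = {f"D0{j}": pair_size(class_counts, 0, j) for j in range(1, 5)}
print(sum(class_counts.values()))  # 303
print(pairs)  # {'D01': 219, 'D02': 200, 'D03': 199, 'D04': 177}
```

For context, class 0 makes up 164 of the 177 samples in D04, so always predicting class 0 would already err on about 7.3% of that subset (13/177).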
From the D01 pairwise samples, the features are selected through entropy: cp is identified as the decisional feature for class 0, and cp, thal and ca for class 1. The significant features for the pairwise samples D02, D03 and D04 are given in Table 11.
From these class-wise Dij features, the interconnected features are extracted as (i-th class ∩ j-th class) → cp, ca, thal and old peak. Thus only four of the thirteen features are retained. The ML performance is then evaluated using these significant features. Compared with using all features, performance improves (Table 12): accuracy increases by 9.5% for the RF model, 7.2% for SVM and 2.3% for DT.

Conclusion
Identifying significant features contributes substantially to decision making. Significant features also play a vital role on resource-constrained devices, reducing the input size without compromising accuracy. In this paper, we defined a CDD approach that selects significant features using pairwise class distributions of the multi-class target together with entropy. We demonstrated that CDD addresses both accuracy improvement and feature selection. On the Cleveland samples, CDD selects only four of the thirteen features: cp, ca, thal and old peak. Moreover, with the CDD features, ML performance improves compared with using all features: accuracy increases by 9.50% for the RF model, 7.20% for SVM and 2.30% for DT.