Predicting Risk of Diabetes using a Model based on Multilayer Perceptron and Features Extraction

Corresponding Author: Francesca Fallucchi, Guglielmo Marconi University, 44 Plinio Street, 00193 Rome, Italy. Email: f.fallucchi@unimarconi.it

Abstract: Diabetes (diabetes mellitus) is a disease that emerges when a person has a high blood sugar level for a prolonged period. In healthcare, one of the most important topics is disease prevention, and this study addresses diabetes prevention. In particular, it aims to produce randomly generated datasets according to a rule named the Finnish Diabetes Risk Score and to test these datasets with a model based on a Multilayer Perceptron (MLP) and feature extraction, in order to determine the diabetes risk. A second layer of the model produces the prediction. This classification layer compares the features of a single unlabelled element against the features of all labelled elements, also considering similarities between risk levels. The health rule considers daily lifestyle and health parameters. We use randomly generated datasets to avoid privacy problems, to work with equally distributed data so that the behavior of the applied model is easier to control, and to offer datasets that simplify the comparison of different models. Moreover, we propose an initial hypothesis for testing the explainability of the model in terms of our datasets (input parameters, corresponding to the health rule parameters), defining a method initially based on Relevance Propagation, Deep Taylor Decomposition and the distribution of the features of the testing elements. We obtain random datasets that are equally distributed with respect to the possible risk levels and whose input attributes have a mean near 0.5 (values are normalized). We define an MLP with no underfitting or overfitting problems. All accuracy values for the overall model are greater than 0.939 (in our scenario, the definition of accuracy also considers class similarity because the risk levels are ordered), with the best result over 0.96 for 1500 labelled elements as training dataset.


Introduction
Disease prevention is one of the topics of interest in healthcare. Diabetes mellitus is a chronic, lifelong metabolic disorder that occurs either when the pancreas does not secrete enough insulin (type 1 diabetes) or when the body's cells do not respond to insulin (type 2 diabetes); in both cases the result is a high level of glucose in the blood. In this study we are interested in diabetes prevention and, in particular, in a module that identifies a person's risk of type 2 diabetes. Prevention is an important issue considering the significant human, economic and social costs of the disease (Perveen et al., 2019). Our work aims to define random datasets and to test them using a model based on a Multilayer Perceptron (MLP) and feature extraction, in order to determine the diabetes risk according to daily lifestyle and health parameters. These parameters are Body Mass Index (BMI), age, waist circumference, use of blood pressure medication, history of high blood glucose, physical activity, consumption of vegetables/fruits/berries and family history of diabetes. We choose a model based on an MLP and a similarity-based classifier in order to try to improve the accuracy through MLP feature extraction and some specific choices in the classification layer (similarity between elements that also considers class similarities). There are different works on this specific issue for diabetes (e.g., Xiong et al. (2019), Chandrakar et al. (2016)). We want to contribute another possible model with these characteristics: a high level of prediction quality, initial training and testing with randomly generated data, and support for future explainability in terms of input attributes, so as to add our solution to recent studies in an analogous context (e.g., Kopitar et al. (2019)). With randomly generated data we have no privacy problems. With real data we would need privacy consents, and in any case we could not rely on open data, which is generally very useful (e.g., Fallucchi et al. (2018)).
We use datasets randomly generated according to a healthcare rule named the Finnish Diabetes Risk Score (FINDRISC), with the possibility of improving the model later with real data and more features. Briefly, our contribution defines a diabetes prevention model, produces testing datasets and sets out a future approach for explaining predictions in terms of input attributes (the features of a person). For our implementation, we use Colaboratory (footnote 1) as the environment to execute our code, selecting Python™ 3 and a Tensor Processing Unit (TPU) as runtime. We use an MLP component for feature extraction, with a classification component based on the Cosine similarity between elements and on the similarities between diabetes risk levels. We evaluate the quality of the overall model using a definition of accuracy that considers the similarity between risk levels. We outline a future implementation for the explainability of our model in terms of input attributes, easily understandable by a human expert such as a Medical Doctor (MD), starting from considerations related to Layer-Wise Relevance Propagation (LRP) and Deep Taylor Decomposition (DTD) (e.g., Bach et al. (2015), Montavon et al. (2017)). The next sections are organized as follows. The Related Work section reports some useful articles about Machine Learning (ML) techniques used in the general context of diabetes. The Rule for Diabetes Risk section presents FINDRISC together with the derived algorithm used to create the training and testing datasets for our model. The Methods section describes our general solution. The Accuracy section describes the accuracy definition used in our context, which considers the similarity between risk levels. The Architecture section describes the details of the prediction model (MLP and classification component). The Experimental Results section reports the results of our tests from the point of view of accuracy. The Tables and Figures section presents the results outlined in the paper in graphical and tabular form. The Explainability section discusses a hypothetical solution for our model. In the Discussion and Future Work section, we summarize the achieved results and our future developments.

Related Work
Khanam et al. (2021) use the Pima Indian Diabetes (PID) dataset to test seven ML algorithms for diabetes prediction, also using the Waikato Environment for Knowledge Analysis (WEKA) tool. The best results are obtained with Logistic Regression (LR) and Support Vector Machine (SVM). They also implement a Neural Network (NN) with two hidden layers that provides 88.6% accuracy. Tigga et al. (2020) study diabetes risk based on lifestyle and family background. They manage 952 instances collected through a questionnaire about health, lifestyle and family background, and study the behavior of different ML algorithms applied to both this new dataset and the PID dataset. The most accurate performance is obtained with the Random Forest (RF) classifier. Hasan et al. (2020) use ML models trained on the PID dataset. They propose a solution based on pre-processing, K-fold Cross-Validation (KCV) and grid search for hyper-parameters in order to select the best model among different possibilities. In future work they intend to apply their approach in other medical contexts to verify its generality. Contreras et al. (2018) review Artificial Intelligence (AI) techniques for diabetes, considering 141 articles. They study AI techniques with respect to three kinds of problems: learning from knowledge, exploration and discovery of knowledge, and reasoning from knowledge. For the first problem, the solutions they consider include SVM, RF, Evolutionary Algorithm (EA), Deep Learning (DL), Naïve Bayes (NB), Decision Tree (DT) and regression algorithms. They use these categories: blood glucose control strategies; blood glucose prediction; detection of adverse glycemic events; insulin bolus calculators and advisory systems; risk and patient personalization; detection of meals, exercise and faults; lifestyle and daily-life support in diabetes management. Swapna et al. (2018) present a methodological study to classify diabetic and normal Heart Rate Variability (HRV) signals using DL. HRV signals are derived from Electrocardiogram (ECG) signals. The architecture comprises a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and an SVM that performs the final classification starting from the features extracted by the CNN and LSTM components. The SVM kernel is a Radial Basis Function (RBF). They run their tests on a Graphics Processing Unit (GPU) with TensorFlow, Keras and scikit-learn. The solution is helpful for diabetes diagnosis using ECG signals, and its accuracy is 95.7%. Miotto et al. (2016) produce "deep patient", a framework for modelling patients through features automatically extracted from an Electronic Health Record (EHR) dataset with DL techniques. The data come from the Mount Sinai data warehouse. They process EHRs using a Deep Neural Network (DNN) based on a Stacked Denoising Autoencoder (SDA). The solution is useful for different predictions, including diabetes. Sisodia et al. (2018) use DT, SVM and NB to predict early-stage diabetes, with a dataset from the University of California, Irvine (UCI) repository. The performance measures are precision, accuracy, F-measure, recall and the Receiver Operating Characteristic (ROC) curve. They test the solution using the WEKA tool. The best accuracy is obtained with the NB algorithm. Kavakiotis et al. (2017) review ML and data mining solutions for diabetes, considering diabetic complications, genetic background and environment, healthcare and management, and prediction and diagnosis. In this context, 85% of the ML algorithms are supervised and 15% are unsupervised. SVM is the most used algorithm and the one with the best results. Mercaldo et al. (2017) work on a method for classifying patients with diabetes using the Hoeffding Tree (HT) algorithm, also known as the Very Fast Decision Tree (VFDT) algorithm. They want to classify diabetes patients using the minimum number of features. They consider the number of times pregnant, plasma glucose concentration at 2 hours in an oral glucose tolerance test, triceps skin fold thickness, diastolic blood pressure, 2-hour serum insulin, BMI, diabetes pedigree function and age. They use data from the UCI repository and verify whether the selected features are significant in determining whether a patient is diabetic. They use these classification algorithms: J48, MLP, HT, JRip, Bayesian Network (BN) and RF. Classification uses the WEKA tool and obtains a precision of 0.757 and a recall of 0.762. Erdem et al. (2012) introduce the Graph Transduction Game (GTG) for graph transduction. In their tests, they also use the diabetes dataset from UCI to compare the behavior of GTG with other methods. In Table 1, we summarize the related work presented in this section.

Footnote 1: https://colab.research.google.com/notebooks/welcome.ipynb

Rule for Diabetes Risk
To generate random data useful for testing our model, we consider a healthcare rule; in particular, we chose FINDRISC (see https://www.mdcalc.com/findrisc-finnish-diabetes-risk-score). For the initial reference for this rule, see Lindström et al. (2003). For the validation of the rule, see Makrilakis et al. (2011) and Zhang et al. (2014). For clinical practice guidelines, see Pottie et al. (2012). Other useful articles about FINDRISC include Lindström et al. (2010) and Noble et al. (2011). The rule aims to identify high-risk individuals without laboratory tests. Starting from attributes related to the healthcare data and lifestyle of a person, the rule calculates the diabetes risk. We consider all five risk levels with respect to the score: very low (0-3), low (4-8), moderate (9-12), high (13-20) and very high (21-26). The attributes are the following: BMI (weight (kg)/height squared (m²)) (B), age in years (A), waist circumference (W), differentiated by gender (G), use of blood pressure medication (U), history of high blood glucose (H), physical activity expressed in hours/week (P), daily consumption of vegetables, fruits or berries (D) and family history of diabetes (F). An algorithm derived from the rule calculates the score from these attributes. We generate datasets equally balanced with respect to the possible risk levels. We produce random data normalized to [0,1] and corresponding to the different input attributes used by the rule. These attributes are the input for our model, while the calculated risk value is the correct prediction for our model. The risk value is used in the training and validation sets. It is also used to determine the quality of our model during the experimentation with a testing dataset.
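The score computation and the mapping to the five risk levels can be sketched as follows. This is a minimal illustration that assumes the commonly published FINDRISC point values; the function names, the boolean flags and the 0/1/2 encoding of family history are hypothetical and are not taken from this study. In particular, the physically_active flag stands in for the hours/week attribute P, and the threshold that turns hours into the flag is left to the caller.

```python
def findrisc_score(age, bmi, waist_cm, is_male, on_bp_medication,
                   high_glucose_history, physically_active,
                   eats_vegetables_daily, family_history):
    """Return a FINDRISC score in the range 0-26.

    Assumptions: age in whole years, and family_history encoded as
    0 = none, 1 = grandparent/aunt/uncle/cousin, 2 = parent/sibling/child.
    """
    score = 0
    # Age: under 45 -> 0, 45-54 -> 2, 55-64 -> 3, over 64 -> 4
    if age > 64:
        score += 4
    elif age >= 55:
        score += 3
    elif age >= 45:
        score += 2
    # BMI: under 25 -> 0, 25-30 -> 1, over 30 -> 3
    if bmi > 30:
        score += 3
    elif bmi >= 25:
        score += 1
    # Waist circumference, with thresholds that depend on gender
    low, high = (94, 102) if is_male else (80, 88)
    if waist_cm > high:
        score += 4
    elif waist_cm >= low:
        score += 3
    score += 2 if on_bp_medication else 0
    score += 5 if high_glucose_history else 0
    score += 0 if physically_active else 2
    score += 0 if eats_vegetables_daily else 1
    score += {0: 0, 1: 3, 2: 5}[family_history]
    return score


def risk_level(score):
    """Map a score to the five levels used in this section:
    0 = very low (0-3), 1 = low (4-8), 2 = moderate (9-12),
    3 = high (13-20), 4 = very high (21-26)."""
    if score < 0:
        raise ValueError("score must be non-negative")
    for level, upper in enumerate((3, 8, 12, 20, 26)):
        if score <= upper:
            return level
    raise ValueError("score must not exceed 26")
```

For example, a 50-year-old woman with a BMI of 27, a waist of 85 cm and a second-degree relative with diabetes, and no other risk factor, scores 2 + 1 + 3 + 3 = 9, which falls in the moderate level.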

Methods
After evaluating some useful papers in the context of healthcare predictions, with particular interest in diabetes, we set up our solution according to these steps:
1) Choice of a rule to produce significant random datasets
2) Choice of an MLP for feature extraction
3) Definition of a classification component
4) Test of the solution using the same testing dataset, different training datasets and different numbers of extracted features
For the first step, we chose FINDRISC according to the references already cited for this rule. For the MLP, we followed three directions: running initial and general tests, considering MLP models described in the literature and using GridSearchCV from scikit-learn. For the last direction, we considered the MLPClassifier of scikit-learn (with 200 as the maximum number of iterations and a dataset of 1000 elements) and the following parameters: hidden_layer_sizes (three hidden layers with 128/256/32, 256/512/32 or 512/1024/32 neurons), learning_rate_init (0.01 or 0.1), validation_fraction (0.1 or 0.2) and batch_size (50 or 100). See Table 2 for the results of interest. We combined the considerations that emerged from these three directions to choose our MLP and its parameters (see the Architecture section for details). In general, our model first extracts features with the MLP and then produces the prediction with the classification component. At the end of our work, we also define a theoretical hypothesis for explainability with respect to the input values. In Fig. 1, we briefly summarize our method.
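The parameter search described above can be sketched as follows. The grid values and max_iter=200 come from the text; the helper names, cv=3 and early_stopping=True are assumptions made for this illustration (in scikit-learn, validation_fraction only takes effect when early stopping is enabled), so the sketch is not necessarily the configuration used in this study.

```python
from itertools import product

# Parameter values taken from the Methods text.
PARAM_GRID = {
    "hidden_layer_sizes": [(128, 256, 32), (256, 512, 32), (512, 1024, 32)],
    "learning_rate_init": [0.01, 0.1],
    "validation_fraction": [0.1, 0.2],
    "batch_size": [50, 100],
}


def candidate_settings(grid):
    """List every parameter combination that the grid search would try."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


def run_grid_search(X, y, grid=PARAM_GRID):
    """Run the search with scikit-learn and return the best setting and score.

    scikit-learn is imported here so that the grid itself can be inspected
    without the library installed.
    """
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier

    # early_stopping=True and cv=3 are assumptions of this sketch.
    base = MLPClassifier(max_iter=200, early_stopping=True)
    search = GridSearchCV(base, grid, cv=3)
    search.fit(X, y)
    return search.best_params_, search.best_score_
```

The grid contains 3 x 2 x 2 x 2 = 24 candidate settings, so with three folds the search trains 72 models plus one final refit.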

Accuracy
For the proposed overall model, we consider an accuracy definition (accuracy1) that takes the similarity between risk levels into account. This definition of accuracy is significant because of the order implicitly defined among the five levels of diabetes risk.
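As an illustration of an accuracy that respects ordered risk levels, the following sketch gives full credit to an exact match and partial credit that decreases linearly with the distance between the predicted and the true level. The function name and the linear credit are assumptions chosen for this example and are not necessarily the definition used in this study.

```python
def ordinal_accuracy(true_levels, predicted_levels, num_levels=5):
    """Average credit per prediction, where credit is 1 for an exact match
    and falls linearly to 0 at the maximum possible distance between levels.
    Levels are integers in the range 0..num_levels-1."""
    if len(true_levels) != len(predicted_levels):
        raise ValueError("both sequences must have the same length")
    if not true_levels:
        raise ValueError("at least one prediction is required")
    max_distance = num_levels - 1
    credit = sum(1.0 - abs(t - p) / max_distance
                 for t, p in zip(true_levels, predicted_levels))
    return credit / len(true_levels)
```

With this choice, predicting "high" for a "very high" element costs a quarter of a point, while the classical accuracy would count it as a complete miss.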

Architecture
In Fig. 2

S = [0.0 for h in range(0, m)]
for x = 0 to numberTrainElems-1:
    S[L[x]] += G[l]
S = S/sum(S)    (4)
where L[x] is the (right) risk associated to training element x and m=5 (number of risk levels).
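The accumulation in (4) can be sketched in Python under one explicit assumption: the weight added for training element x is taken to be the Cosine similarity between the extracted features of the element to classify and those of x. The function names are hypothetical, negative similarities are clamped to zero so that the normalized scores stay non-negative, and the additional use of similarities between risk levels mentioned in the Introduction is not modelled here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_scores(test_features, train_features, train_levels, m=5):
    """Accumulate one similarity weight per training element into the score
    of its risk level, then normalize the m scores so that they sum to 1."""
    scores = [0.0] * m
    for features, level in zip(train_features, train_levels):
        # Assumption of this sketch: clamp negative similarities to zero.
        scores[level] += max(0.0, cosine_similarity(test_features, features))
    total = sum(scores)
    if total == 0.0:
        return [1.0 / m] * m  # no evidence for any level
    return [s / total for s in scores]


def predict_level(test_features, train_features, train_levels, m=5):
    """Return the risk level with the highest normalized score."""
    scores = class_scores(test_features, train_features, train_levels, m)
    return max(range(m), key=scores.__getitem__)
```

A call such as predict_level([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0, 4]) returns level 0, because only the first training element points in the same direction as the element to classify.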

Experimental Results
From Fig. 3 to 11, we present the accuracy results (in terms of the classical definition) for the MLP (training and validation) in the different scenarios, over 1000 epochs of training. The testing dataset always has 1750 elements, while the training datasets have 1000, 1250 and 1500 elements. The training datasets are also used for validation during training, according to the validation split parameter. The number of extracted features is 8, 16 or 32. As we can see, the training accuracy curve is usually quite similar to, and slightly better than, the validation accuracy curve. This suggests that there are no significant overfitting or underfitting problems. Fig. 12 shows the accuracy1 results (the definition that considers the similarities between classes/risk levels) for the whole model. Each curve shows accuracy1 as the number of training elements varies (1000, 1250 and 1500) for a particular number of extracted features and, as usual, with the number of testing elements set to 1750. The behaviors are similar, but the best result is achieved with 8 extracted features and 1500 training elements. In Fig. 13 and 14, we present the histograms of the data distribution for the generated datasets (testing dataset of 1750 elements and training datasets of 1000, 1250 and 1500 elements). The distributions are expressed in terms of mean (average) and standard deviation for each input attribute, normalized to [0,1]. In Table 3, we present the model execution times.
We can summarize the results of our tests as follows:
• Generation of random datasets equally distributed with respect to the possible risk levels and with a mean near 0.5 for the different input attributes (the standard deviation is more variable)
• Training of the MLP (first layer of the model) with a training accuracy curve slightly better than the validation accuracy curve, with no underfitting or overfitting problems
• All accuracy1 values for the model are greater than 0.939, with the best results over 0.96 for 1500 labelled elements (training dataset) and 8 extracted features
(*) Attribute matching for Fig. 13 and 14: 0-gender, 1-age, 2-BMI, 3-waist circumference, 4-use of blood pressure medication, 5-history of high blood glucose, 6-physical activity, 7-consumption of vegetables/fruits/berries, 8-family history of diabetes.

Explainability
In ML, we are interested in good predictions, but another important issue is explaining why a model produces them (e.g., Tjoa et al. (2020) and Holzinger et al. (2019)). Therefore, in our work we are interested in understanding a prediction in terms of the input feature data. A user could then understand which initial data significantly affect the prediction, discover new relations between input data and prediction and obtain help in validating the results. This is particularly important in sensitive contexts such as healthcare, where validation of the results by a human expert (e.g., an MD) is fundamental. Starting from the research on LRP and DTD (Bach et al. (2015), Montavon et al. (2017) and Samek et al. (2019)), we define a rule to show the relevance of a single input element to a particular feature extracted by the MLP. Relevance relates to the weights of the edges of the trained MLP, without biases. We also consider the weight of the different features for the classification component, using the training data (known predictions for the elements) to weight the standard deviation of a feature. The idea is that a feature with high variation is more useful in establishing the distance between elements (if we also consider a normalization) and is therefore more important for the classification. The framework under study and implementation for the explainability of our model is based on forward recursive definitions.
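For orientation, the basic relevance propagation rule from the cited LRP literature can be sketched for a small bias-free ReLU MLP. This is the generic rule with a small stabilizer, shown only to make the idea of input relevance for one extracted feature concrete; it is not the rule under study in this work, and the function names and the nested-list weight layout are assumptions of the sketch.

```python
def forward(x, weights):
    """Activations of every layer of a bias-free ReLU MLP.
    weights[k][i][j] is the edge from unit i of layer k to unit j of layer k+1."""
    activations = [list(x)]
    for W in weights:
        prev = activations[-1]
        z = [sum(prev[i] * W[i][j] for i in range(len(prev)))
             for j in range(len(W[0]))]
        activations.append([max(0.0, v) for v in z])
    return activations


def lrp(x, weights, feature_index, eps=1e-9):
    """Relevance of each input attribute for one extracted feature.
    The relevance of the chosen feature (its activation) is redistributed
    layer by layer in proportion to the contributions a_i * w_ij."""
    activations = forward(x, weights)
    relevance = [0.0] * len(activations[-1])
    relevance[feature_index] = activations[-1][feature_index]
    for W, a in zip(reversed(weights), reversed(activations[:-1])):
        n_out = len(W[0])
        z = [sum(a[i] * W[i][j] for i in range(len(a))) for j in range(n_out)]
        # eps keeps the division stable when a pre-activation is close to zero
        s = [relevance[j] / (z[j] + (eps if z[j] >= 0 else -eps))
             for j in range(n_out)]
        relevance = [a[i] * sum(W[i][j] * s[j] for j in range(n_out))
                     for i in range(len(a))]
    return relevance
```

In this formulation the relevances of the inputs sum (up to the stabilizer) to the activation of the selected feature, which makes the values easy to read as shares of that feature.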

Discussion and Future Work
Diabetes is one of the most important diseases. It is very important for an MD to detect a disease in its early stages, since predicting the disease at the beginning can save a person's life. Preventing a disease is another important issue that can be very useful from the healthcare, social and economic points of view. Generally, DL models can predict diseases with interesting results. Our model is a contribution towards providing accurate results in the context of preventive medicine. In particular, it aims to predict a person's risk of developing diabetes, so that an MD can suggest a better lifestyle, more health checks and laboratory tests. We trained our diabetes prevention model using randomly generated datasets produced in this study according to FINDRISC. The same holds for the testing dataset. This is important because it allows us to start with a significant core dataset and without any privacy problems related to real data and consents. Of course, the model could become more useful (e.g., going beyond FINDRISC) when it is retrained with real data added during actual usage and when it is expanded with a higher number of input attributes. The level of accuracy1 of the classification component is mainly due to the quality of the feature values extracted by the MLP. We defined an MLP with no underfitting or overfitting problems. All accuracy values for the overall model (the definition of accuracy also considers class similarity) are greater than 0.939, with the best results over 0.96 for 1500 labelled elements (training dataset). Another initial result of our research is the definition of a method to explain our model in terms of input attributes. In future work, we want to analyze our model further in order to define a function (see the Explainability section) that combines the behavior of the MLP with the variability of a feature in the classification component for the training dataset, before implementing and testing the explainability.

Funding Information
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author's Contributions
Francesca Fallucchi: Designed the research plan and organized the study; coordinated the data-analysis and contributed to the writing of the manuscript.
Alessandro Cabroni: Designed the research plan and organized the study; coordinated the data-analysis and contributed to the writing of the manuscript; participated in all the experiments.