TY - JOUR
AU - Fallucchi, Francesca
AU - Cabroni, Alessandro
PY - 2021
TI - Predicting Risk of Diabetes using a Model based on Multilayer Perceptron and Features Extraction
JF - Journal of Computer Science
VL - 17
IS - 9
DO - 10.3844/jcssp.2021.748.761
UR - https://thescipub.com/abstract/jcssp.2021.748.761
AB - Diabetes (diabetes mellitus) is a disease that emerges when a person has a high blood sugar level for a prolonged period. In the healthcare context, one of the most important topics is disease prevention. This study relates to diabetes prevention. In particular, it aims to produce randomly generated datasets according to a rule named the Finnish Diabetes Risk Score and to test these datasets with a model based on a Multilayer Perceptron and feature extraction, to determine the diabetes risk. A second layer of the model produces the prediction. This classification layer is based on comparing each unlabeled element (its features) against all labelled elements (their features), also considering risk-level similarities. The health rule considers daily lifestyle and health parameters. We define randomly generated datasets to avoid privacy problems, to manage equally distributed data so as to better control the behavior of the applied model, and to propose datasets that simplify comparing the behavior of different models. Moreover, in this study we propose an initial hypothesis to test the explainability of the model in terms of our datasets (input parameters corresponding to the health rule parameters), defining a method initially based on Relevance Propagation, Deep Taylor Decomposition and testing the distribution of element features. In this study, we obtain the generation of random datasets equally distributed with respect to the possible risk levels and with mean values near 0.5 for the different input attributes (note that we manage normalized values). We define an MLP with no underfitting or overfitting problems. All accuracy values for the overall model (in our scenario, the definition of accuracy also considers class similarity because of the ordered risk levels) are greater than 0.939, with the best result over 0.96 for a training dataset of 1500 labelled elements.