Using Rule-Based Reasoning and Object-Oriented Methodologies to Diagnose Diabetes

Problem statement: Diabetes mellitus is one of the most prevalent diseases worldwide, an epidemic with a rising number of disabilities, complications and deaths. An early diagnosis helps patients and medical practitioners to reduce the burden of diabetes. Approach: In this research, we propose a framework for a system that uses rule-based reasoning and object-oriented methodologies to diagnose both Type 1 and Type 2 diabetes. Results: Extensive literature reviews were carried out and questionnaires were distributed to medical practitioners to build the knowledge base. This knowledge base stores the rules needed to perform a diagnosis. Conclusion: This study presents only the proposed framework and not the system itself. We believe that implementing this system in future can bring substantial improvements to medical practitioners and diabetics alike.


INTRODUCTION
The medical domain is vast and encompasses a large body of expertise, knowledge and information. Gathering information and sharing expertise and knowledge has long been difficult for medical practitioners, and it becomes a real challenge when knowledge is not easily accessible. To overcome this problem, many researchers have used Information and Communication Technology (ICT) to manage knowledge, working hand in hand with domain experts. The knowledge collected and shared can be used for the prediction, prognosis and diagnosis of diseases. Chronic diseases may require tedious tests to confirm the signs and symptoms, which may take from a few hours to a few days. In such cases, a medical aid such as a diagnostic system helps medical practitioners to make an early diagnosis and proceed with preliminary treatment.
One such common chronic disease with a high prevalence worldwide is diabetes mellitus, known simply as diabetes. It is characterized by high blood glucose that results from defects in insulin secretion, insulin action, or both. The number of people with diabetes in Malaysia was approximately 1.5 million in 2006 and is estimated to reach 2.3 million in 2030 (Letchumanan et al., 2006; Mastura et al., 2008). Healthcare for a patient with diabetes is far more expensive than for one without. However, with proper care and management, this disease can be controlled. In Malaysia, the cost was estimated at about MYR 14.5 billion (USD 4.75 billion) per year for the 60,000 diabetes patients registered with the Ministry of Health (Tan, 2009).
There are two main types of diabetes, namely Type 1 and Type 2. Type 1 patients are insulin-dependent whereas Type 2 patients are non-insulin-dependent. Another type of diabetes is Type 3, also known as Gestational Diabetes Mellitus (GDM), a temporary condition that occurs during pregnancy due to hormonal changes (Leigh, 2010; Takahashi et al., 2008). The symptoms of Type 1 diabetes include polyuria (excessive urination, especially at night), polydipsia (excessive thirst), extreme hunger and weight loss, among others. Type 2 patients normally have the Type 1 symptoms together with additional infections, blurred vision, slow healing of bruises and recurring conditions such as skin and gum rash (Leigh, 2010; Malaysia Diabetes Association, 2006). Blood tests such as HbA1c are generally performed on the patient's blood sample, together with measurements of the fasting plasma glucose level (taken after the patient has fasted for a day) and the two-hour postprandial glucose level (taken two hours after glucose consumption). Common treatments followed by diabetes patients are controlling diet, exercising, taking medicine or insulin on schedule and maintaining an ideal body weight.
There is not yet a proper diagnostic system for diabetes in Malaysia, and diagnosis often depends on laboratory test results, which takes considerable time before patients can start their medication. In the present study, we propose a framework that integrates Rule-Based Reasoning (RBR) and Object-Oriented (OO) approaches to diagnose both Type 1 and Type 2 diabetes. The framework was developed by conducting extensive literature reviews on existing disease diagnosis techniques and comparing their advantages and disadvantages.
The rest of the paper is organized as follows: the next section extensively discusses the various techniques used for diagnosis, followed by explanations of the proposed framework and future work. The study is then concluded.
In RBR, rules are constructed from the collected information to represent the knowledge base. Rules are essentially patterns that represent the knowledge (Liao, 2004). An inference engine then performs inferences by chaining through the rules recursively. RBR is popular because the rules can be easily constructed, debugged and maintained (Lin et al., 2003). Examples of diagnostic systems based on RBR are MYCIN, which diagnoses blood and nervous system infections (Pandey and Mishra, 2009), and ESEDED, which diagnoses eye diseases in Malaysia, namely cataract, glaucoma, conjunctivitis, dry eye syndrome and keratitis (Ibrahim et al., 2001).
Case-Based Reasoning (CBR) is a type of Knowledge-Based System (KBS) that solves a new case by retrieving similar previous cases. In the medical domain, solutions, medical histories, past experiences, human expertise and knowledge are stored in databases that are used to solve new cases as they arise (Liao, 2004). One of the earliest examples of a CBR system is CASEY, which diagnoses heart failure. The system searches for similar cases in the knowledge base, looks for evidence of differences between the two cases and finally transfers the diagnosis to the current case (Schmidt and Gierl, 2001). Other notable CBR systems are the Intelligent Patient Knowledge Management System, a mobile application that lets physicians access a patient's information during consultation (Wilson et al., 2006), and an intelligent system to diagnose liver diseases in Taiwan (Lin, 2009). In CBR, the knowledge on existing cases and treatments is shared among medical practitioners for future similar diagnoses, and retaining the cases helps to increase accuracy. However, this is also a disadvantage, as there may be redundancy and the growth of the case base may degrade the performance of the system (Long, 2001; Schmidt and Gierl, 2001).
The last variant of KBS is Model-Based Reasoning (MBR), a technique that uses models to represent knowledge for the purpose of observation, prediction and evaluation (Davis and Hamscher, 1998). One example of an MBR system is YAQ, a respiratory distress syndrome diagnosis system (Pandey and Mishra, 2009). This system was developed using a descriptive language that eases the diagnosis of clinical states and conditions (Uckun et al., 1993). MBR is a good technique for modeling unexpected cases or diagnoses, but the reasoning is only as good as the model itself, and developers may need to build many models that must be cross-referenced and related to each other (Lee, 2000).
As for the ICM methods, the Artificial Neural Network (ANN) is a successful technique used for pattern matching, classification and clustering because it is considered to be highly accurate in data prediction (Papik et al., 1998). An example of the ANN technique applied in the medical domain is the Hacettepe System, which consists of two ANN models developed to reduce perinatal morbidity and mortality. One model is used to diagnose genetic disorders and the other is used to diagnose mid-pregnancy fetal health (Beksac et al., 1995). ANN can be used to represent complex parameter interactions and multiple variables (Mobley et al., 2000; Mobley et al., 2005); however, prediction can be difficult if the data are to be considered thoroughly, as they are generally hidden (Long, 2001; Pandey and Mishra, 2009). Another system using neural network training is the skin disease diagnosis system, which was used to evaluate a skin texture recognition algorithm that differentiates between healthy and non-healthy skin (Abbadi et al., 2010).
Fuzzy Logic (FL) uses linguistic variables to define the system's knowledge base as a collection of fuzzy IF-THEN rules (Vitez et al., 1996). DoctorMoon is a popular system based on FL that is capable of diagnosing pulmonary tuberculosis, lung abscess, lung cancer, asthma, pneumonia and bronchiectasis. FL systems are simple and easy to design; however, they are not easy to maintain when a large number of rules or conditions are changed, as this affects the overall performance of the system (Vitez et al., 1996).
Finally, the Genetic Algorithm (GA) is a search method based on the principles of natural selection and population genetics. It is typically an iterative procedure that generates and mutates new sample points in the search space (Whitley, 1994) using two genetic operators, namely crossover and mutation. GA was successfully used by Podgorelec et al. (1999) to diagnose mitral valve prolapse and by Vinterbo and Ohno-Machado (2000) to diagnose multiple disorders among patients. GA solves optimization problems and, owing to its flexibility, can be applied to many different problems, but it is not a straightforward algorithm and is therefore computationally demanding (Moorkamp, 2005).
Other popular approaches include data mining and agents. Data mining is one way of cleaning data and extracting hidden, significant patterns from it (Lin, 2009). Data mining methods in the medical domain deploy different techniques for the diagnosis of various diseases. Examples are the Classification And Regression Tree (CART), which was used to examine blood donors using a blood transfusion dataset (Santhanam and Sundaram, 2010), and the Smooth Support Vector Machine (SSVM), which was proposed for classification problems to produce better performance in classifying diabetes diagnoses (Purnami et al., 2009). Agents, on the other hand, are software and/or hardware that are capable of acting exactingly in order to accomplish tasks on behalf of their users (Jennings and Wooldridge, 1998). Agents were used in the National electronic Library for Health (NeLH), a governmental project in the United Kingdom, to offer documents over the internet for retrieving medical information (Isern et al., 2010). Agents help to improve performance and to overcome resource limitations. As agents are still new, they are not easily extended to other domains in the medical industry (Isern et al., 2010).
Work integrating KBS and ICM has also been carried out to take advantage of each technique's strengths and create a better system. One such system is the web-based knowledge management and decision support system for Type 1 diabetes patients, developed using CBR and RBR techniques (Montani and Bellazzi, 1999). A forward-chaining mechanism was used to identify the metabolic condition, generate a suggestion for each identified problem and select the most suitable suggestion. The CBR technique then identifies the closest case for diagnosis (Montani and Bellazzi, 1999). In 2003, the researchers included Multi-Modal Reasoning (MMR) in the system design, which is capable of extracting explicit and implicit data from the knowledge domain (Papik et al., 1998). By integrating the three methodologies, they were able to overcome the limitations of each methodology implemented separately. Unfortunately, this system can only diagnose Type 1 diabetes mellitus and cannot explain its reasoning process. Other integrated examples include Pena-Reyes and Sipper (1999), who used FL-GA approaches to diagnose Wisconsin breast cancer, and Pandey and Mishra (2009), who integrated CBR, RBR and ANN to detect and interpret electromyography-based diseases, such as neuromuscular diseases.
Object-Oriented (OO) methodology has been widely used in diagnostic systems as well, especially to represent knowledge (Lin et al., 2003) and learn knowledge (Geymayr and Ebecken, 1995), as it enables data and real-world concepts to be modeled in a more natural way (Diaz, 1996). An intelligent tele-diabetes system, WEBDIACIN, was modeled using OO methodology (Devamalar et al., 2008). This tele-diabetes system was developed to deliver health care services to individuals or communities through the internet and was visualized using class diagrams, state charts and activity diagrams. This modeling was done at an early stage to allow accurate estimation and tracing back to the original requirements once the system is in place. Other examples are DRAMA (Lin et al., 2003), a system developed in Taiwan to contain and process knowledge, and the Electronic Nursing Record System (ENRS), used for clinical, reference and administrative purposes in Bundang Seoul National University Hospital (Park et al., 2007).
The advantages and disadvantages of each of the main techniques discussed are summarized in Table 1. Based on the literature reviews, many approaches can be considered to design and develop a diagnostic system. For our system, we intend to implement RBR and OO methodologies. RBR makes the rules in the system easy to construct, debug and maintain. OO methodology helps to structure the whole system in a conceptual view, which will ease the programming implementation.

MATERIALS AND METHODS
Figure 1 depicts the overall framework for the proposed system. The core components in this framework are the knowledge base (database layer) and the reasoning module (application layer), that is, the inference engine. The knowledge base is gathered from the explicit knowledge of domain experts and structured into rules. The inference engine accesses the knowledge base, and all the rules that match the patient's symptoms contribute to the reasoning process and then to the final conclusion. These sign-symptoms are entered by the users via the console (user interface layer). Figure 1 also shows the three main functionalities of the system: to diagnose Type 1 and Type 2 diabetes, to explain why a certain diagnosis is made and to suggest treatment or drug prescriptions. The targeted users are physicians and patients.
Knowledge base: As mentioned earlier, the knowledge base contains the information used to make decisions. This information represents expertise gained from top experts in the field, in the form of facts and rules. Generally, rules consist of IF-THEN statements, where a given set of conditions leads to a specified set of results. An example of the general syntax is depicted below:
If x is A and y is B then z is C
In this study, the knowledge base was built after conducting preliminary studies for approximately a month to identify the sign-symptoms required to diagnose Type 1 and Type 2 diabetes. Extensive literature reviews of medical journals, reports and other printed materials were conducted to acquire the necessary knowledge on the diagnosis process and to determine the important sign-symptoms. These were then used to prepare a questionnaire for the medical practitioners. The preliminary questionnaire was distributed to a medical practitioner, who recommended that some questions be rephrased for better clarity.
The final questionnaire consisted of twelve questions covering the diabetes diagnosis process, crucial symptoms and the treatments for diabetes patients. The questionnaires were distributed to two medical practitioners in Seremban, Malaysia and collected after a week. The findings from these domain experts were then used to build the rules for the knowledge base in the system.
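As a minimal sketch of how an IF-THEN rule of the general form given above might be stored in a knowledge base, the Python fragment below models a rule as a set of conditions plus a conclusion. The class name `Rule`, its fields and the `matches` method are illustrative assumptions, not part of the proposed system, and the example rule simply mirrors the generic syntax rather than any clinical rule from the questionnaire.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One IF-THEN rule: if every condition holds, the conclusion holds."""
    name: str
    conditions: frozenset  # facts that must all be present
    conclusion: str        # fact asserted when the rule applies

    def matches(self, facts):
        # True when every condition is among the known facts
        return self.conditions <= set(facts)


# Hypothetical placeholder rule mirroring "If x is A and y is B then z is C"
example_rule = Rule(
    name="R1",
    conditions=frozenset({"x is A", "y is B"}),
    conclusion="z is C",
)

print(example_rule.matches({"x is A", "y is B"}))  # True
print(example_rule.matches({"x is A"}))            # False
```

Keeping conditions as a set makes the rule independent of the order in which sign-symptoms are entered; an actual knowledge base would replace the placeholder facts with the sign-symptoms identified by the domain experts.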

Rules:
The rules required for diagnosing Type 1 and Type 2 were formally represented using a decision tree, as depicted in Fig. 2. An example of a rule based on Fig. 2

RESULTS AND DISCUSSION
The proposed system will pose the appropriate questions for each type of diabetes via the user application layer (i.e., the console).

Inference engine:
The inference engine is responsible for seeking information, forming relationships from the knowledge base and providing solutions. It determines which rules to apply to a given question, and in what order, by using information in the knowledge base. We intend to use the forward-chaining method, where the engine starts from the sign-symptoms entered by the user and fires each rule whose premises are all confirmed to be true; the conclusions of fired rules may themselves serve as premises of other rules.
Explanation sub-system: Another feature of the proposed system is its ability to explain its diagnosis, advice or recommendations and to justify why a certain action was recommended. To do so, the sub-system examines the system's own reasoning process.
Object-oriented approach: The proposed system will be built using the OO approach, fully utilizing the benefits of objects and inheritance. Inheritance allows objects to be developed from existing objects by specifying how the new objects differ from the originals. Figure 3 shows the inheritance between the various objects related to diagnosing Type 1 and Type 2 diabetes.
Diabetes serves as the parent class, consisting of the common attributes and operations (methods) needed to diagnose both types of diabetes. Two sub-classes, namely Type 1 and Type 2, are derived from the parent class. As the names imply, Type 1 consists of all the attributes needed to diagnose Type 1 diabetes, whereas Type 2 is for diagnosing Type 2 diabetes.
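The following is a small, hedged sketch of a forward-chaining loop in the usual data-driven sense, together with a trace that an explanation sub-system could read back. The function names `forward_chain` and `explain`, the tuple layout of the rules and the facts `symptom_a`, `finding_x` and so on are all hypothetical placeholders; they are not the rules of the proposed knowledge base and carry no clinical meaning.

```python
def forward_chain(rules, initial_facts):
    """Fire rules until no new fact can be derived.

    rules: list of (name, premises, conclusion) tuples.
    Returns the final set of facts and a trace of the rules that fired.
    """
    facts = set(initial_facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            # A rule fires when all its premises are known facts
            # and its conclusion has not been derived yet
            if conclusion not in facts and set(premises) <= facts:
                facts.add(conclusion)
                trace.append((name, premises, conclusion))
                changed = True
    return facts, trace


def explain(trace):
    """Render each fired rule as a readable IF-THEN line."""
    return [
        f"{name}: IF {' AND '.join(premises)} THEN {conclusion}"
        for name, premises, conclusion in trace
    ]


# Hypothetical placeholder rules; R2 depends on the conclusion of R1
RULES = [
    ("R1", ("symptom_a", "symptom_b"), "finding_x"),
    ("R2", ("finding_x", "symptom_c"), "diagnosis_y"),
]

facts, trace = forward_chain(RULES, {"symptom_a", "symptom_b", "symptom_c"})
for line in explain(trace):
    print(line)
```

Recording each fired rule as it happens is one simple way to let the system justify a conclusion afterwards; other designs, such as storing the justification on each derived fact, would serve the same purpose.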
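The parent and sub-class relationship described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the class names `Diabetes`, `Type1` and `Type2` follow the text, but the attributes `common_symptoms` and `extra_symptoms`, the methods and the symptom names are hypothetical placeholders rather than the attributes shown in Fig. 3.

```python
class Diabetes:
    """Parent class: attributes and operations shared by both types."""
    label = "Diabetes"
    # Hypothetical placeholders; real sign-symptoms come from the knowledge base
    common_symptoms = frozenset({"common_symptom_1", "common_symptom_2"})
    extra_symptoms = frozenset()

    def required_symptoms(self):
        # Shared symptoms plus whatever the sub-class adds
        return self.common_symptoms | self.extra_symptoms

    def matches(self, reported):
        # True when every required symptom was reported
        return self.required_symptoms() <= set(reported)


class Type1(Diabetes):
    """Sub-class: specifies only how it differs from the parent."""
    label = "Type 1"
    extra_symptoms = frozenset({"type1_specific_symptom"})


class Type2(Diabetes):
    """Sub-class: specifies only how it differs from the parent."""
    label = "Type 2"
    extra_symptoms = frozenset({"type2_specific_symptom"})


reported = {"common_symptom_1", "common_symptom_2", "type1_specific_symptom"}
for candidate in (Type1(), Type2()):
    print(candidate.label, candidate.matches(reported))
```

Because the matching logic lives in the parent class, each sub-class only declares its additional attributes, which is the benefit of inheritance that the text refers to.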

CONCLUSION
The diagnosis of diabetes can be improved by incorporating advances in information technology. With that in mind, we proposed a framework to design and develop a diagnostic system that is capable of diagnosing both Type 1 and Type 2 diabetes based on the sign-symptoms provided. The system will be developed using Rule-Based Reasoning and Object-Oriented (RBR-OO) methodologies. The knowledge base required for the system has been built by carrying out literature reviews and questionnaire surveys. In the next phase, the rules from the knowledge base will be used to develop the diagnostic system. The users will input the sign-symptoms required to perform a diagnosis, and the inference engine will fire the appropriate rules to reach a conclusion. The final system will also be tested by medical practitioners and by other users (diabetic and non-diabetic).