ONCODIAG Select and Test (ST) Algorithm: An Approximate Clinical Reasoning Model for Diagnosing and Monitoring Breast Cancer

Introduction
Clinical reasoning, which is often associated with machine reasoning, is also considered a branch of artificial intelligence (Kishan et al., 2012). Human clinical reasoning is a routine and delicate task that requires clinicians to reason out a likely diagnosis based on facts provided by the patient and/or other sources. Usually, while older clinicians rely on experience built over years to deal with the challenge of clinical reasoning, younger ones are constrained to the large medical literature and the theory taught in medical school. In fact, clinical teachers are often unfamiliar with the complex body of literature and tend to rely mainly on their practical knowledge as clinicians (Audétat, 2011). While a blend of experience and medical literature or theory may help to sustain medical diagnoses based on human reasoning, the need for a supportive role for machines through machine reasoning is becoming glaring. Aiding clinical diagnosis through machine reasoning is hampered by the unavailability of optimal and accurate reasoning models and of a formalism for approximate reasoning techniques. To tackle the problem of inaccurate models, designing models that can bridge the gap between theory and the reality of clinical practice will help clinicians gain a better understanding of problems (Coiera, 2003). Applications of such models are known as Clinical Decision Support Systems (CDSS). Their roles include information management, diagnosis and patient-specific consultation (Kawamoto et al., 2005), and they have improved clinical practice in 69% of trials (Shortliffe, 1987). These models can be categorized into fuzzy logic, statistical, data-driven, mathematical, rule-based, inference, case-based and knowledge-based problem-solving models, as well as structural (symbolic), probabilistic, statistical pattern classifier, production and prognostic models (Abu-Hanna and Lucas, 2001).
This paper improves on an inference model known as Select and Test (ST) (Fernando and Henskens, 2016a; Ramoni and Stefanelli, 1992), which, like clinical diagnostic reasoning models in general, is characterized by dual processing (Monteiro and Norman, 2013).
Approximate clinical reasoning is necessitated by the uncertainty and missing data that usually characterize medical reasoning, clinical data and diagnostic procedures. Different techniques, such as probability theory and multivalued logic, have been used for approximate reasoning. These techniques are further classified into formalisms such as fuzzy logic or argumentation systems, probabilistic reasoning, belief functions (Dempster-Shafer theory), possibility theory, certainty factors, Cohen's inductive probabilities and Bayes' theorem. Most of these techniques are limited by their particular approach to approximate reasoning. Fuzzy logic is limited by its implication operators, which do not sufficiently model the medical diagnostic procedure. Belief functions (Dempster-Shafer theory) and Cohen's inductive probabilities are limited by their inability to capture the positive and negative relationships that exist between manifesting symptoms and a diagnosis. The certainty factor tackles this limitation but is unable to describe the clinical diagnostic procedure in depth. Bayesian networks are a powerful and sound formalism that allows reasoning under uncertainty (Bayesian networks); however, such models are limited by their probabilistic disposition (assumption of conditional dependence and approximation of probabilities to one).
The focus of this paper is to bridge the gap between clinical reasoning solutions designed from the perspective of approximate reasoning and those that adopted the design of reasoning models. The approach adopted is, first, to design an inference model that illustrates the flow of the proposed clinical reasoning procedure. Secondly, algorithms were written to further describe the applicability of the proposed inference model. Thirdly, the proposed inference model was described using mathematical notation to show the relevance and possibility of achieving approximate reasoning under the constraints associated with clinical reasoning. This paper also includes a monitoring module and an opening for relating the proposed model to an ontology knowledge base. The remaining part of the paper is organized as follows. Related literature is reviewed and its limitations highlighted. This is followed by the improved ST inference model and the approximate clinical reasoning formalism. Sections on the monitoring module and the proposed improved ST algorithm are then presented, followed by the implementation section. Finally, the results, evaluation and discussion sections are outlined. We conclude the paper by restating its aim and our results.

Related Work
The use of multiple models or processes, as proposed in this paper, for managing the clinical reasoning process is already gaining research interest. The research in (Hosseinzadeh and Hosseini, 2017) identified six models of clinical reasoning: the hypothetico-deductive model, pattern recognition, a dual-process diagnostic reasoning model, a pathway for clinical reasoning, an integrative model of clinical reasoning and a model of diagnostic reasoning strategies in primary care. They observed that only one model had specifically focused on general practitioners' reasoning and suggested that there is a need for a model of clinical reasoning that includes specific features for scaling the difficulties of clinical reasoning. However, going by the models they reviewed, one model may not be sufficient to meet this need, hence the necessity for intelligent hybridization of models. In CDR (2018), the authors presented a dual-process model aimed at curtailing the limitation of the one-process models discussed in (Hosseinzadeh and Hosseini, 2017). The dual-process model consists of Type 1 (intuitive) processes, which are fast and used by experts most of the time, and Type 2 (rational) processes, which are slower, deliberate and more reliable and focus more on hypothetico-deductive clinical reasoning. They further stated that repetitive operation of Type 2 leads to Type 1, and that Type 2 processing can override Type 1 and vice versa. This was further reinforced by (Croskerry, 2009), who stated that dual-process theory had emerged as the predominant approach, positing two systems of decision making, System 1 (heuristic, intuitive) and System 2 (systematic, analytical). The author proposes a schematic model that uses the theory to develop a universal approach to clinical decision making. Properties of the model explain many of the observed characteristics of physicians' performance.
A similar argument was made in (Babette et al., 2013), where the authors opined that systems medicine, as a specialized aspect of systems biology, combines in an interdisciplinary approach all the expertise necessary to decipher the human body in all its complexity. The new initiative aims to integrate molecular, anatomical, physiological, environmental and lifestyle data in a predictive model approach called the 'virtual patient'. The benefit of this multiple-process model is that it will allow the clinician to predict and anticipate the optimal treatment for the individual patient. Application of the virtual patient model will allow truly personalized medicine. In a separate study, Daniel et al. (2016) hybridized models to form a cognitive engineering technique known as work domain analysis, which was implemented to provide a framework for uncovering the relationship between diagnosis, complex health systems and theories of cognition and reasoning. The resulting model of diagnosis provides a comprehensive, novel perspective on the diagnostic process that offers a new foundation for formulating empirical inquiries about diagnosis and provides new avenues for the design and development of health information technologies, assessment strategies and diagnosis-centered simulation paradigms. Stausberg and Person (1999) presented a model-based approach to diagnostic reasoning in medicine. A process model is defined on the levels of static elements, dynamic elements and reasoning control. Static elements (facts, hypotheses and different types of disease knowledge) are identified, and variations relevant to hypothesis generation are described. Dynamic elements correspond to actions, which in turn modify static elements but are also controlled and started by the expressions of the static elements. The presented model could serve as a basis for implementation in a model-based and process-oriented decision-support system.
While the work in (Stausberg and Person, 1999) might appear to demonstrate a form of multiple-process model, we observed that it does not. Hence, this paper is focused on improving the multi-inference clinical diagnostic model named Select and Test (ST).
Building medical reasoning algorithms also requires a supportive knowledge base to aid the reasoning process. Formalizing such a knowledge base in structured, machine-meaningful ontology formalisms (Simple Knowledge Organization System [SKOS]) remains a research interest in artificial intelligence. Oftentimes, information that may aid reasoning is siloed in different databases in an unstructured format. One relevant knowledge silo is social media, where users (and even patients) openly discuss relevant issues. Practical approaches such as (Mike et al., 2017; 2018) have used Patient Authored Text together with statistical and linguistic methods to build a French Consumer Health Vocabulary on breast cancer to aid medical diagnostic reasoning. Similarly, the authors in (Alsane et al., 2018) formulated an approach for mining medical knowledge from existing studies and representing it in a formalism that assists the use of Clinical Decision Support Systems (CDSS) in in-flight medical emergency management.
The fuzzy nature of reasoning and knowledge representation in medicine has raised the need for more research into the development of approximate reasoning algorithms. For example, Rakus-Andersson (2009) argued that if the biological index of a patient has risen to a risk level, there may be no need for surgery. Hence, the author proposed a means of evaluating such a biological index by incorporating fuzzy sets into the approximate reasoning algorithms. This, it was argued, will detect the levels of a patient's clinical symptoms, pathologically heightened levels that indicate the presence of a disease from which recovery by surgery is possible. Similarly, a fuzzy-like approximate medical reasoning algorithm based on the logical inferences of (Ramoni and Stefanelli, 1992) has been presented. That algorithm used the abduction, deduction and induction logical inferences to create an inference model. Though the inference model was argued to achieve a desirable accuracy in clinical reasoning, it is limited by its approach to approximate reasoning and knowledge representation. This paper therefore seeks to improve on that inference model.

The Improved Select and Test (ST) Inference Model
In this section, the improved ST model, based on the models of (Oyelade et al., 2017a), and the supportive models of the intelligent personal agents and the monitoring agents are discussed. A model typical of the application of the resulting improved ST model is also discussed. Fernando and Henskens (2016b) designed the ST model, which consists of three inferences (abduction, deduction and induction) and an input collection sub-module (abstraction). In this paper, we build on that model by enhancing its reasoning process through semantic web inference-making tools and models. Furthermore, a monitoring agent was built into the modified model, and the redesigned model supports portable knowledge representation in ontology form. This model is illustrated in Fig. 1 and consists of the following major components: abstraction, abduction, deduction, induction, the monitoring agent and the ontology knowledge base.

Abstraction
This sub-module is an input model that interfaces between the patient/user and the system. Whatever we can classify as symptoms, patient profile and manifestations are elicited, abstracted and stored as input source. The output of this module serves as input to the abduction module.

Abduction
The overall aim of this logical inference is to obtain all the diagnoses related to the reported symptoms. These differential diagnoses are generated by finding all diagnoses related to each symptom in the set of symptoms found. The output of this module becomes the input to the deduction sub-module.
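As an illustration only, the abduction step described above can be sketched in Python as a lookup from each reported symptom to the diagnoses associated with it. The knowledge base `SYMPTOM_TO_DIAGNOSES`, its entries and the function name `abduce` are hypothetical placeholders; they are not the ontology or the algorithm used in this paper.

```python
# Hypothetical toy knowledge base: symptom -> diagnoses it may indicate.
# The entries are illustrative placeholders, not clinical facts from the paper.
SYMPTOM_TO_DIAGNOSES = {
    "nipple discharge": {"breast cancer", "mastitis"},
    "breast lump": {"breast cancer", "fibroadenoma"},
    "fever": {"mastitis"},
}


def abduce(symptoms_found):
    """Return every diagnosis related to at least one reported symptom."""
    likely = set()
    for symptom in symptoms_found:
        # Symptoms missing from the knowledge base add no candidates.
        likely |= SYMPTOM_TO_DIAGNOSES.get(symptom, set())
    return likely
```

Under these assumptions, `abduce(["breast lump", "fever"])` returns the union of the diagnoses linked to either symptom, which is the set of differential diagnoses handed to deduction.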

Deduction
For each likely diagnosis from the previous stage, all the expected symptoms of such diagnosis are drawn out based on the result of a logical inference process. The output of this module is sent as input to the abduction module, while receiving more inputs from the abstraction module.
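A minimal sketch of the deduction step, under the assumption that the knowledge base can be read as a mapping from each diagnosis to its expected symptoms. The dictionary `DIAGNOSIS_TO_SYMPTOMS`, its entries and the function `deduce` are hypothetical; filtering out symptoms that were already elicited is also an assumption made for the illustration.

```python
# Hypothetical toy knowledge base: diagnosis -> symptoms it is expected to cause.
# The entries are illustrative placeholders, not clinical facts from the paper.
DIAGNOSIS_TO_SYMPTOMS = {
    "breast cancer": {"breast lump", "nipple discharge", "skin dimpling"},
    "mastitis": {"fever", "nipple discharge", "breast pain"},
}


def deduce(likely_diagnoses, already_elicited):
    """Return expected symptoms of the likely diagnoses not yet asked about."""
    expected = set()
    for diagnosis in likely_diagnoses:
        expected |= DIAGNOSIS_TO_SYMPTOMS.get(diagnosis, set())
    # Assumed refinement: only symptoms not yet elicited are passed on.
    return expected - set(already_elicited)
```

The returned set is what the abstraction sub-module would then try to confirm or rule out with the patient.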

Induction
Abduction, deduction and abstraction form a cyclic pattern that carries out the process of clinical findings as it performs differential diagnosis. The result of the cyclic refinement of likely diagnoses is stored. Inference by induction aims to match the criteria of each diagnosis already elicited for consideration against the corresponding criteria in standards or clinical protocols. This enables the reasoning process to isolate the diagnoses most likely to be present in the patient, based on the manifestations and symptoms presented, a process called clinical decision making. It is at the induction sub-module that data gathered by the monitoring module is used to aid clinical decision making.

Monitoring Agent
The monitoring module is encased within the ST model. It monitors some activities/events in the patient and logs information gathered to support the task of the induction sub-module.

Diagnoses and Symptoms Mapping
While the knowledge base of (Fernando and Henskens, 2016b) was modeled in tabular form, that of (Oyelade et al., 2017b) was modeled as a bipartite graph. This paper promotes the enrichment of inference making during the clinical reasoning process. Hence, an ontology approach to knowledge representation was built into the model in Fig. 1, which is an improvement on the original model presented in (Fernando and Henskens, 2016b). Similarly, Fig. 2 is a block diagram representing Fig. 1; it gives an outlined description and the flow of the proposed ST model.

Formalism for Clinical Reasoning for the Improved ST Algorithm
In this section, we present the formalism for approximating the clinical reasoning process of the improved Select and Test (ST) algorithm. Our approach to approximate reasoning improves on earlier work. Note that a relation differs from a functional relation in its mapping strategy: while the latter maps each element of one set (D) onto exactly one element of the other (S), the former permits multiple mappings onto the other set. Contrary to earlier work, which simply modelled a functional relationship, we introduce a relation between sets D and S, which denote the sets of diagnoses and symptoms respectively. Let us assume we have a set of diagnoses D = {d1,…dn} and a set of symptoms S = {s1,…sm}. The clinical relationship that usually exists between D and S is that of dependent entities in D relating to independent entities in S. Recall that an element of S may also relate to another di even though it relates to some dk. Hence, we term the elements of D dependent variables and the elements of S independent variables. Note that only a composition of some sx will confirm the existence/occurrence of a particular dy. Furthermore, we introduce another set A, which describes the likely attributes of each sx in S that relate to a particular dy in D. Consider a diagnosis dy (breast cancer) and a symptom sx (nipple discharge). The likely attributes of sx relating to dy are the frequency, duration, associated manifestations and commencement of nipple discharge. Figure 3a illustrates the relation between sets D and S, and Fig. 3b captures the mapping of elements in S to A. Considering the foregoing, there exists a relation between D and S, which we simply call a relation rather than a functional relationship as claimed in earlier work. The justification for this renaming follows from the definitions of the two concepts as they apply to clinical reasoning.
A functional relationship is considered to hold between D and S if each dy maps to only one sx, while a relation allows dy to map onto several sx. Clinically speaking, if we have the sets of all known symptoms and diagnoses, we will not obtain a functional relationship between D and S, but simply a relation. However, the conjugation of all known symptoms of a diagnosis in D into a single element of a set KS results in a functional relationship between D and KS. Figures 3c and 3d illustrate the total relation D ↦ S and the partial function D ⇸ KS. The preceding paragraph described the set comprehension of the elements that constitute sets D, S and A. However, it is clinically established that D, S and A can be described in terms of degree of occurrence or manifestation. For example, consider any arbitrary dx in D, say breast cancer. Such a dx can be described as an ailment that has advanced to a particular criticality (just as the advancement of breast cancer is usually described in stages, e.g., Stage 1, Stage 2a and so on). Similarly, it is necessary to quantify any symptom sy observed as an effect of the cause dx. Hence, we describe the degree/quantity of dx and sy by Q(dx) and Q(sy) respectively. Furthermore, we denote by Q(dx|sy) the relationship between the quantity Q(sy) and the quantity Q(dx) that it results in. The attributes of sy may likewise be quantified, denoted by Q(sy | [a1 … an]). The outcome of Q with respect to any of its arguments is bounded from 0 to 1, i.e., it lies in [0, 1]. In the absence of sy, that is, if Q(sy) = 0, there can be no Q(dx|sy) or Q(sy | [a1 … an]). This necessitates Equation 2, which requires the presence of sy for the diagnosis of dx to be reinforced. Thus dx can be ruled out or confirmed: if Equation 2 evaluates to 0, dx is completely ruled out, while its evaluation to 1 confirms the diagnosis.
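The distinction between the relation D ↦ S and the function D ⇸ KS can be illustrated with a short Python sketch. The pair set `diag_r`, the labels `d1`, `d2`, `s1`, `s2` and the helper names `is_functional` and `conjugate` are hypothetical and introduced only for this illustration.

```python
from collections import defaultdict


def is_functional(pairs):
    """True if every left element maps to exactly one right element."""
    images = defaultdict(set)
    for left, right in pairs:
        images[left].add(right)
    return all(len(rights) == 1 for rights in images.values())


def conjugate(pairs):
    """Group all symptoms of each diagnosis into one composite element ks."""
    images = defaultdict(set)
    for diagnosis, symptom in pairs:
        images[diagnosis].add(symptom)
    return {(d, frozenset(symptoms)) for d, symptoms in images.items()}


# Hypothetical relation diagR: d1 relates to two symptoms, d2 to one.
diag_r = {("d1", "s1"), ("d1", "s2"), ("d2", "s2")}
# Conjugating the symptoms of each diagnosis yields a functional mapping diagF.
diag_f = conjugate(diag_r)
```

Here `diag_r` is only a relation because `d1` maps to both `s1` and `s2`, whereas `diag_f` is functional because each diagnosis maps to a single conjugated element of KS.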
We observe that even if some sy can evaluate P(…) to 1, Equation 3 must still hold; Equation 3 approximates the clinical reasoning for confirming dx. We define a relation diagR: D ↦ S or a functional relation diagF: D ⇸ KS, where a ksy in KS is an element formed by the conjugation/agglomeration of the sy that relate to a dx in D. Equation 4 presents a mathematical function based on diagF, given some observed symptoms s1…sn conjugated in S. Note that wy denotes the weight of sy in dx, while cy denotes the observed severity of sy. While the severity cy may be quantified by a clinically acceptable scaling system in the range [0, 1], the weight wy is given by Equation 6, where BI(dx) is the biological impact of sy with respect to dx, that is, the degree to which sy aids the existence or spread of dx. Considering Equations 1-6 and the definitions made in this section, the function diagF(S) can further be defined by Equation 7. In conclusion, observe that not every sy in S has a positive functional relationship with the dx concluded in rule 3. Those sy with a negative functional relationship with dx are implicitly excluded, while those with a positive functional relationship are absorbed in rule 3.
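To make the roles of the weight wy and the severity cy concrete, the following Python sketch scores one diagnosis from observed symptoms. The combination rule used here, a severity-weighted sum normalised by the total weight, is an assumption made purely for illustration; it is not the paper's Equation 4 or Equation 7. The function name `diag_score` and the sample weights are likewise hypothetical.

```python
def diag_score(observed, weights):
    """Assumed scoring rule: weighted severities divided by the total weight.

    observed: symptom -> severity c_y in [0, 1]; absent symptoms are omitted.
    weights:  symptom -> non-negative weight w_y for a single diagnosis d_x.
    """
    for symptom, severity in observed.items():
        if not 0.0 <= severity <= 1.0:
            raise ValueError(f"severity of {symptom!r} must lie in [0, 1]")
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0  # no symptom carries weight, so nothing supports d_x
    # A symptom that was not observed contributes severity 0.
    score = sum(w * observed.get(s, 0.0) for s, w in weights.items())
    return score / total_weight
```

With this assumed rule the score stays in [0, 1]: it is 0 when no weighted symptom is observed, which matches dx being ruled out, and 1 when every weighted symptom is present at full severity.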

Patient Monitoring module
Recall that the aim of this paper is to improve the ST model through redesign of the model and the addition of personalized diagnosis and monitoring. Therefore, this section presents two critical components for monitoring patients and gathering information to aid the personalization of the diagnostic process when the expert system is applied to a patient. First, we present the monitoring model, targeted at capturing relevant events related to patients' lifestyle, in Fig. 4, which has the knowledge representation schema to the right and the monitoring process to the left. Monitoring in this context means a procedure for making findings that could augment the plausibility of the diagnostic process of the ST model. Four sub-processes, namely the event monitor, event selector, data gathering and data formalizer, control the monitoring process proposed in this paper. The output data/knowledge of the four sub-processes is represented in Spatial-Temporal-Thematic (STT) format in an ontology file. The event monitor receives information from the intelligent personal agent (discussed later) and sends it to the event selector, which identifies the desired characteristics of the information. The data gathering and reasoning faculty conceptualizes the required data based on the event being captured. The last component then formalizes the data in an STT representation.
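The selector and formalizer stages of the monitoring process might be sketched in Python as follows. The record layout `SttRecord`, the raw event keys `place`, `time` and `theme`, and the function names are hypothetical; the paper stores STT data in an ontology file, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SttRecord:
    """One monitored event in a Spatial-Temporal-Thematic layout (hypothetical)."""
    spatial: str        # where the event happened
    temporal: datetime  # when the event happened
    thematic: str       # what the event is about


def select_event(event, wanted_themes):
    """Event selector: keep only events whose theme is of interest."""
    return event if event["theme"] in wanted_themes else None


def formalize(event):
    """Data formalizer: turn a raw event dictionary into an STT record."""
    return SttRecord(
        spatial=event["place"],
        temporal=datetime.fromisoformat(event["time"]),
        thematic=event["theme"],
    )
```

A raw event such as `{"place": "home", "time": "2019-01-05T08:30:00", "theme": "diet"}` would pass `select_event` when `"diet"` is among the wanted themes and then be converted by `formalize` into a record that the induction sub-module could consult.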

ONCODIAG ST Algorithms
In this section, an improved version of the algorithm in (Oyelade et al., 2018) is presented. Algorithm 1 demonstrates the models in Fig. 4; the algorithm of the proposed model is captured in Algorithms 2 through 5. The ONCObc-ST algorithm in Algorithm 6 combines the algorithms described in Algorithms 2-5 and also lists the expected input and output. Within the body of the algorithm are sets initially declared as empty. Before the calls to the Abduction(), Deduction() and Abstraction() sub-modules, the SymptomsAlreadyElicited and SymptomsFound sets are populated from the PresentingSymptoms set, which itself derives its elements through the input model in (Bayesian networks).
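The overall control flow described above can be sketched in Python as a loop over abduction, deduction and abstraction, followed by induction. This is a simplified illustration and not Algorithm 6: the toy knowledge base `KB`, the callback `ask` that stands in for abstraction, the stopping condition and the induction rule (keep a diagnosis only if all of its expected symptoms were found) are all assumptions.

```python
# Hypothetical toy knowledge base: diagnosis -> expected symptoms.
KB = {
    "d1": {"s1", "s2"},
    "d2": {"s2", "s3"},
}


def abduction(symptoms_found):
    """Diagnoses sharing at least one symptom with those found."""
    return {d for d, expected in KB.items() if expected & symptoms_found}


def deduction(diagnoses):
    """All symptoms expected under the candidate diagnoses."""
    return set().union(*(KB[d] for d in diagnoses))


def select_and_test(presenting_symptoms, ask):
    """Cycle abduction, deduction and abstraction until nothing is left to ask.

    ask(symptom) -> bool stands in for the abstraction step (patient input).
    """
    symptoms_found = set(presenting_symptoms)
    already_elicited = set(presenting_symptoms)
    while True:
        candidates = abduction(symptoms_found)
        to_ask = deduction(candidates) - already_elicited
        if not to_ask:
            break
        already_elicited |= to_ask
        symptoms_found |= {s for s in to_ask if ask(s)}
    # Assumed induction rule: every expected symptom must have been found.
    return {d for d in candidates if KB[d] <= symptoms_found}
```

The loop terminates because each pass adds at least one new symptom to `already_elicited`, and the knowledge base holds finitely many symptoms.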

Implementation of the Inference Model for Clinical Reasoning
In this section, we used the breast cancer knowledge base developed in (Oyelade et al., 2017b; 2018) to implement the algorithms described in section 6. We first present the criteria for deciding the criticality (disease staging) of the diagnosis (breast cancer) based on Fig. 2 (induction section) and Algorithm 5. This criterion is computed using Equation 8 and described by Table 1. For each parameter in Table 1, the patient is expected to enter a likelihood value, or Symptom Weight, describing the approximate degree of that parameter as felt by the patient. Algorithm 6 then compares the patient's summed Symptom Weight with the summed Symptom Weight for each stage in Table 1 and computes the criticality or staging of the diagnosis. The Java programming language was used to implement Algorithm 6, as shown by the code snippet in Fig. 5 and the application interface in Fig. 6.
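The staging comparison can be illustrated with a short Python sketch (the paper's own implementation is in Java). The threshold values in `STAGE_THRESHOLDS` are invented for the example and do not come from Table 1, and the rule of picking the highest stage whose threshold is reached is an assumption about how the comparison might be carried out.

```python
# Hypothetical thresholds on the summed symptom weight, in ascending order.
# The actual per-stage values belong to Table 1 and are not reproduced here.
STAGE_THRESHOLDS = [("Stage 1", 1.0), ("Stage 2", 2.0), ("Stage 3", 3.0)]


def stage_for(symptom_weights):
    """Return the highest stage whose threshold the summed weight reaches."""
    total = sum(symptom_weights)
    reached = None
    for stage, threshold in STAGE_THRESHOLDS:
        if total >= threshold:
            reached = stage  # later (higher) stages overwrite earlier ones
    return reached
```

For example, symptom weights of 0.5, 0.5, 0.5 and 0.6 sum to 2.1 and map to the second hypothetical stage, while a sum below every threshold yields no stage at all.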

Results and Discussion
This research considered it appropriate to employ the medical diagnostic metrics stated by Šimundić (2008). The author stated that the diagnostic accuracy of any diagnostic test answers the following question: how well does this test discriminate between two conditions of interest (health and disease)? It is this discriminative ability that this paper measures, using the counts of True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) assessments. The area under the ROC curve (AUC): the false positive rate (FPR) and the true positive rate (TPR) together yield a coordinate (x, y), where TPR = TP/(TP+FN) and FPR = FP/(FP+TN). Hence a ROC point is given by (FPR, TPR). Consider two tests/diagnoses X and Y with false positive rates (FPR) of 0.95 and 0.88 respectively. If the clinically relevant region of the ROC curve is at low FPRs, then diagnosis Y is preferable to diagnosis X even though the ROC area of X is greater than that of Y.
Sensitivity, specificity and accuracy are described in terms of the number of true positive assessments (TP), true negative assessments (TN), false negative assessments (FN) and false positive assessments (FP). A good diagnostic test has LR+ > 10 and LR- < 0.1. By the standard for assessing AUC values with respect to diagnostic accuracy, values of 0.9-1.0 are judged excellent, 0.8-0.9 very good, 0.7-0.8 good, 0.6-0.7 sufficient and 0.5-0.6 bad, and anything less than 0.5 indicates a diagnosis that is not useful. Data collected from the Ahmadu Bello University Teaching Hospital (ABUTH), as shown in Table 5, was used to compare the improved ST algorithm (ONCODIAG) with the existing ST algorithm to obtain the result in Table 2. Similarly, in Table 3, the performance of our proposed algorithm is compared on the Wisconsin Breast Cancer Database (WBCD) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets.

100    56    44    67    33
200    116   84    139   61
300    163   137   191   109
400    229   171   266   134
500    303   197   343   157
600    380   220   424   176
699    458   241   513   186

The Area Under Curve (AUC) in Figure 7 shows that the point falls close to the top-left section of the curve, which is an indication that the performance of the algorithm is good. Also, the sensitivity and specificity values of 1 and 0.54 indicate the ability of the improved ST model to detect the presence of disease (breast cancer) and to rule it out. Based on Table 4, the WBCD dataset has 699 instances, and our algorithm achieved an accuracy of 86%, with sensitivity and specificity of 0.81 and 0.89 respectively.
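For reference, the measures used in this section can be computed from the four confusion-matrix counts as in the Python sketch below. The formulas are the standard definitions; the function names and the handling of zero denominators are choices made for this illustration, and the closed lower bound of each AUC band is an assumption because the bands quoted above share their end points.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Standard accuracy measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (TPR)
    specificity = tn / (tn + fp)   # true negative rate
    fpr = fp / (fp + tn)           # false positive rate, 1 - specificity
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Likelihood ratios are reported as infinite when the denominator is zero.
    lr_pos = sensitivity / fpr if fpr > 0 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "roc_point": (fpr, sensitivity),  # (FPR, TPR) coordinate
    }


def auc_grade(auc):
    """Map an AUC value to the verbal bands quoted in the text."""
    bands = [(0.9, "excellent"), (0.8, "very good"), (0.7, "good"),
             (0.6, "sufficient"), (0.5, "bad")]
    for lower, label in bands:
        if auc >= lower:
            return label
    return "not useful"
```

As a made-up example, counts of TP = 80, TN = 90, FP = 10 and FN = 20 give a sensitivity of 0.8, a specificity of 0.9 and an accuracy of 0.85; these counts are illustrative and are not taken from the ABUTH or Wisconsin data.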

Evaluation of the Proposed Algorithm and Discussion
In this section, we evaluate the results shown in the last section and discuss them in comparison with similar literature. The evaluation focuses on the accuracy measure of the related systems. We have chosen to compare the system proposed in this paper with machine learning, machine reasoning and knowledge representation techniques. Our comparison in this section is based on the result of the comparison in Table 3 using the Wisconsin repository.
In Table 6, eight (8) related works were compared with the proposed algorithm (ONCObc-ST). The works of Mike et al. (2017; 2018) and Alsane et al. (2018) focused on the formalization of a knowledge base for medical diagnosis rather than on diagnosis itself, so an accuracy measure was not necessary for them. Those of (Aloraini, 2012; Agarap, 2018), which are Machine Learning (ML) based techniques, have accuracies of 95.6%, 90.0% and 97.36% respectively. Furthermore, the work of (Alharbi and Tchier, 2016), a fuzzy-genetic algorithm, attained an accuracy of 97.33%. However, fuzzy-genetic algorithms are limited by the difficulty genetic algorithms have in guaranteeing optimality, and their solutions weaken as the size of the problem increases. Similarly, ML algorithms may suffer from the problem of acquiring relevant knowledge or data, which usually affects their performance. This is in contrast to machine reasoning (MR), which adapts more flexibly even in big data investigations. We conclude that though our proposed algorithm (ONCObc-ST) attained an accuracy of 88.72%, its technique was rigorously verified and is proven to be acceptable. Furthermore, semantic reasoning (which is the approach of this paper) permits the representation of knowledge in a very deep and meaningfully structured form, which yields high inference power. In addition, the semantic reasoning approach excels in provability, using formal logic proofs to explain the result obtained by the system. Lastly, a domain like medicine requires deep knowledge representation with complex rules, which semantic reasoning appropriately provides.

Conclusion
In conclusion, this paper presents an improved Select and Test (ST) model for performing clinical reasoning. In addition, an algorithm for the improved model, based on three logical inferences (abduction, deduction and induction), was outlined and discussed. The formalism for carrying out approximate clinical reasoning was modeled and discussed using mathematical concepts such as relations and functions. The ST model we designed, which was augmented by a block diagram of the formalism for approximate reasoning, was meant to illustrate our proposal of using logical inferences in clinical problem solving. Furthermore, this paper hinted at the possibility of these logical inference methods interacting with a knowledge base that is modeled as an ontology file. Moreover, we have argued that clinical data and its reasoning process are characterized by some form of uncertainty or missing data, which necessitates a formalized approximation method for reasoning with data. We addressed this challenge by using our formalism to show that our logical inference methods are relevant in handling such peculiarities of clinical problem solving. We have also added a monitoring module to the improved model. Finally, the model and its associated algorithms were implemented and applied as a case study to breast cancer diagnosis. Results showed that the improved ST model performed better than the existing ST model. We note, however, that the model presented in this paper is limited by the lack of an explanation facility, a means of justifying how and why clinical reasoning solutions were executed, which would increase confidence in and acceptability of the diagnostic result.
In future, we plan to employ machine learning to carry out feature extraction from mammograms. The current approach presented in this paper depends on radiologists to study the image and pass on the necessary input. However, we intend to completely automate the reading of all images: MRI, mammograms, tomosyntheses and ultrasounds.