ONCObc-ST: An Improved Clinical Reasoning Algorithm Based on Select and Test (ST) Algorithm for Diagnosing Breast Cancer

Abstract: The need for an accurate reasoning algorithm is driven by the sensitivity of the domains, such as medicine, in which such algorithms are applied. Most reasoning algorithms for medical diagnosis are limited either by their techniques or by their accuracy and efficiency. Even the Select and Test (ST) algorithm, which is considered a more approximate reasoning algorithm, is limited by its use of a bipartite graph for modeling domain knowledge and by its use of orthogonal vector projection for estimating the likelihood of a diagnosis at the clinical decision stage (induction). The bipartite graph knowledge base lacks n-ary predicates on concepts, while orthogonal vector projection makes the inference process computationally expensive. The aim of this paper is to enhance the ST algorithm for improved performance and accuracy. First, we propose the use of ontologies and semantic web based rules for knowledge representation so as to support inference making. Three further improvements were then added to the ST algorithm to improve its approximation. Secondly, we designed an inference making procedure that interacts with this knowledge base. Thirdly, we modeled Hill's Criteria of Causation into the clinical decision stage of ST to overcome the limitation of orthogonal vector projection. Lastly, the improved ST algorithm was represented and described largely using set and mathematical notations (though implemented with linked lists and queues). The improved ST algorithm achieved a sensitivity of 0.81 and 0.82 and a specificity of 0.89 and 1.0 on the Wisconsin Breast Cancer Database (WBCD) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets respectively. In addition, the accuracy obtained from the proposed algorithm was 86.0% and 88.72% on the WBCD and WDBC datasets respectively.
This enhancement in accuracy came at the cost of slower execution, caused by the reasoning process and the ontology parsing task added to the enhanced system. Nevertheless, the accuracy and inference power of the resulting system improved.


Introduction
Reasoning applies logic (rule application) within a given algorithm to arrive at a goal or desired end, and inference making proceeds as the reasoning is carried out. Chakraborty (2012) defined reasoning as the act of deriving a conclusion from certain premises using a given methodology; as a process of thinking; and as logically arguing. Similarly, to reason is to draw inferences appropriate to a given situation (Copeland, 2014). There are different methods of reasoning, which include deductive, inductive, abductive, analogical, formal, procedural numeric, generalization, abstraction and meta-level reasoning. Reasoning algorithms continue to play significant roles; however, they are limited in capability and functionality (Chowdhury and Sadek, 2012). Clinical reasoning, which is a form of reasoning, is the process of reasoning through clinical findings over the symptoms or manifestations presented by a patient, with the aim of making a clinical decision that identifies an appropriate diagnosis. This reasoning process consists of the two stages of the clinical diagnosis process (clinical findings and decision making). Omitting either stage from the design of a clinical reasoning algorithm can adversely affect the approximation and accuracy of the algorithm (Fernando and Henskens, 2013; Ramoni et al., 1992). Furthermore, subsequent work improved on (Fernando and Henskens, 2013) by using a bipartite graph to model domain knowledge and by making inferences through orthogonal vector projection to estimate the likelihood of a diagnosis in the clinical decision making process.
The major contribution of this paper is an enhanced medical reasoning algorithm based on the Select and Test (ST) algorithm, with improved performance and accuracy. The first step taken to achieve the enhancement was the use of ontologies and semantic web based rule languages for knowledge representation. Three further improvements were then added to the ST algorithm to improve its approximation. Secondly, we designed an inference making procedure that interacts with this knowledge base. Thirdly, we modeled Hill's Criteria of Causation into the clinical decision stage of ST to overcome the limitation of orthogonal vector projection. Fourthly, the improved ST algorithm was represented and described largely using set and mathematical notations (though implemented with linked lists and queues). The rest of the paper is organized as follows. Sections II and III present an overview of clinical reasoning and related literature on clinical reasoning algorithms respectively. Section IV presents a theoretical approach for defining the ONCObc-ST algorithm, and Section V presents the algorithm itself. Sections VI and VII present the knowledge representation and reasoning pattern of ONCObc-ST and its implementation. Section VIII evaluates and discusses the results of the experiment. Section IX concludes the paper.

An Overview of Clinical Reasoning
Clinical reasoning algorithms may be categorized into three groups: probabilistic, model based and rule based. Some of these algorithms are listed and described briefly in this section. Scheme inductive reasoning is based on adding characteristics of a syndrome to narrow the list of potential diagnoses (Anderson, 2006). In scheme inductive reasoning, schemes are drawn to resemble road maps. They help clinicians break information down into chunks, store the chunks in memory and retrieve them later for problem solving tasks. Another example is pattern recognition, which is used in machine learning for assigning outputs to inputs according to a given algorithm (Umoh et al., 2012). Similarly, hypothetico-deductive reasoning involves the self-reflective and informed clinical decision making process of generating and testing hypotheses in association with the patient's presenting symptoms and signs (Kumar et al., 2013). Forward chaining systems involve writing rules to manage sub-goals, whereas backward chaining systems manage sub-goals automatically (Sharma et al., 2012). Forward reasoning is efficient and fast, while backward reasoning can be employed to resolve the conflict between two competing hypotheses. A combination of the two reasoning methods, backward and forward, with increased experience leads to increased coordination of hypothesis and evidence (Hardin, 2002). Fuzzy logic based clinical reasoning algorithms use linguistic variables to represent operating parameters in order to apply a more human-like way of thinking (Torshabi et al., 2013). However, the performance of fuzzy logic is limited by data clustering for membership function generation and by its processing model for diagnostic reasoning (Stausberg and Person, 1999). Furthermore, the ST algorithm described earlier is considered to be the most approximate medical reasoning algorithm (Fernando and Henskens, 2013).
Other clinical reasoning algorithms include the following. Parsimonious Covering Theory (PCT) works on the basis of associating a disorder with a set of manifestations. It uses two finite sets (disorders and manifestations) to define the scope of diagnostic problems (Wainer and de Melo Rezende, 1997). The Certainty Factor (CF) model is used for managing uncertainty in a rule-based system (Heckerman, 1990), and certainty factors can be interpreted as measures of change in belief within the theory of probability (Heckerman and Shortliffe, 1992). Bayesian networks, which take a probabilistic approach, are directed acyclic graphs consisting of nodes (circles), which represent random variables, and arcs (arrows), which represent probabilistic relationships among these variables (Gadewadikar et al., 2010). They help in dealing with uncertainties.

Related Works
In this section, we review some related works on reasoning algorithms. Although these reasoning algorithms have different domains of application, this paper compares their efficiency by concentrating on their reasoning structures. Fernando and Henskens (2013) described an approach for clinical diagnostic reasoning based on the ST algorithm model, which was earlier introduced in (Shortliffe and Fagan, 1985). The authors argued that most of the algorithms discussed in the previous section lack accuracy in their diagnostic approximation results, and they showed that their approach of using the ST algorithm in medical diagnostic reasoning yields an approximate reasoning model. The authors of (Croskerry and Nimmo, 2011) presented two models for performing clinical diagnosis: intuitive and analytical reasoning, based on dual process theory. Their decision making model combines the two by seeking to recognize patient input as a pattern. If the pattern is recognized, the intuitive mode is used; when the pattern is not recognized, the slower analytic mode is used. In addition, one of the authors of (Croskerry and Nimmo, 2011) showed that dual process theory, based on the intuitive and analytical models, can be used to explain how diagnostic errors occur (Croskerry, 2009). Chapman et al. (2006) presented a decision making algorithm for clinical diagnosis. The algorithm uses the two stages of performing diagnosis: medical inquiry and clinical decision making. This emergency medicine based algorithm also makes provision for identifying likely errors at the medical inquiry and clinical decision making stages.
Research on application development for the diagnosis and monitoring of health issues in patients was carried out by (Yinyeh and Alhassan, 2015). The authors developed a simple medical expert system that can diagnose common ailments and also provide medical professionals with information about diseases. Fuzzy systems have also been developed to provide solutions in medical diagnostic reasoning, as in the case of (Awotunde et al., 2014). The authors developed a diagnostic system using fuzzy logic. They achieved this through the formulation of three mathematical models, with the assistance of medical professionals. In addition, they created a fuzzy rule base for diagnosing malaria, which was the aim of the fuzzy diagnostic application.
The steps in a clinical reasoning model consist of: cue acquisition; cue clustering; cue interpretation; generating multiple hypotheses; focused cue acquisition; ruling hypotheses in and out; making a diagnosis; evaluating treatment options relevant to the diagnosis; prescribing and/or implementing a treatment plan; and evaluating treatment outcomes (Jefford et al., 2011). Parsimonious covering theory, which seeks to carry out medical diagnosis based on the relationship that exists between a finite set of disorders (causes) and manifestations (effects), is another medical reasoning algorithm. It was extended by (Wainer and de Melo Rezende, 1997) to accommodate the association between the times at which manifestations appear and their causes. The authors used this improved model to diagnose food-borne diseases.
Machine Learning (ML) algorithms have also played an impactful role in sustaining high performance and accuracy of medical diagnoses in cases like breast cancer. Approaches such as comparing the performance of classification algorithms (Bayesian Network, Naïve Bayes, Decision trees J4.8, ADTree and Multi-layer Neural Network) for the prediction of ailments like breast cancer (Aloraini, 2012) have been used to enhance accuracy. Similarly, (Agarap, 2018) compared another combination of ML algorithms, namely GRU-SVM, Linear Regression, Multilayer Perceptron (MLP), Nearest Neighbor (NN) search, Softmax Regression and Support Vector Machine (SVM), to raise the accuracy of medical diagnoses. A different approach was a technique for improving and pruning diagnosis rules (Setiono, 2000), while others (Andres et al., 1999; Nabil et al., 2008; Alharbi and Tchier, 2016) employed fuzzy models and genetic algorithms. This hybridization of fuzzy models and genetic algorithms is usually aimed at attaining high classification performance, yielding simple expert systems with few rules. However, the argument of this paper is that the performance of such approaches is only apparent, considering the limitations of the underlying techniques. Our criticism is based on the weaknesses of the techniques used. We therefore present systematic multiple inference models, with rule sets developed with medical experts, for improved performance and accuracy of breast cancer diagnoses.

Theoretical Description of the Proposed Select and Test (ST) Algorithm
The ST model describes a cyclic process which uses the logical inferences of abduction, deduction and induction in carrying out medical diagnostic reasoning. By convention, two stages of reasoning are considered when performing medical diagnoses: clinical findings and clinical decision making. The design of the ST algorithm is based on these two reasoning processes, which gives the algorithm its desired approximation capability (Fernando and Henskens, 2013). In this section, a theoretical approach is used to describe the proposed ST algorithm. Furthermore, a model for abstracting a patient's inputs (symptoms) by mapping the patient's terms to acceptable medical terms is also described in theory (it is treated exhaustively in (Oyelade et al., 2017a)).

A. Mapping Patient's Input to Medical Knowledge
The formalized input model in (Oyelade et al., 2017a) first collects the raw input (r), which is the patient's description of manifesting symptoms in his/her own vocabulary, during a clerking session with the patient. Taking advantage of Python's capability in natural language processing, r is tokenized and lemmatized, resulting in an array of terms (t). The synonyms and hyponyms of t are then compiled using Python libraries and stored as the patient's supposed input collection (c). Finally, c is syntactically and semantically matched against a breast cancer lexicon KBl (Oyelade et al., 2017b) to produce the list of acceptable medical tokens (inputs) oncoTokens, which are words associated with breast cancer diagnoses and treatment. This pipeline is the mathematical model referred to as Equation (1).
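The mapping from r to oncoTokens can be illustrated with a minimal sketch. The lemma table, synonym table and lexicon below are hypothetical toy stand-ins for the Python NLP libraries and the lexicon KBl used in (Oyelade et al., 2017a, 2017b), and only exact (syntactic) matching is shown; semantic matching is not attempted here.

```python
import re

# Hypothetical toy resources standing in for the NLP libraries and lexicon KBl.
LEMMAS = {"lumps": "lump", "pains": "pain"}
SYNONYMS = {"lump": {"mass", "nodule"}, "pain": {"ache", "soreness"}}
LEXICON = {"mass", "pain", "swelling"}


def tokenize(raw):
    """Split the raw input r into lowercase word tokens."""
    return re.findall(r"[a-z]+", raw.lower())


def lemmatize(tokens):
    """Map each token to its lemma, giving the term array t."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


def expand(terms):
    """Add known synonyms of each term, giving the collection c."""
    collection = set(terms)
    for term in terms:
        collection |= SYNONYMS.get(term, set())
    return collection


def onco_tokens(raw):
    """Keep only the members of c that appear in the lexicon."""
    c = expand(lemmatize(tokenize(raw)))
    return sorted(c & LEXICON)


print(onco_tokens("I feel lumps and pains in my breast"))
```

In this toy run, "lumps" reaches the lexicon only through its synonym "mass", which is the role the synonym and hyponym expansion plays in the model.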

B. ST Inference Structures
The reasoning structures of ONCObc-ST consist of four levels of sub-modules: abstraction, abduction, deduction and induction. The abstraction level conceptualizes the collection of oncoTokens into the ST algorithm. The items in oncoTokens are assigned to the sets Presenting Symptoms (PS) and Symptoms Found (SF), as modeled by Equation (2). In Equation (3), the abduction inference generates a set D of differential diagnoses by listing each d i that is associated with each s in SF. Here the knowledge base and rule set of abduction are denoted KB a and R a , and the process of invoking the rules is denoted by the function apply(). Furthermore, all the symptoms (combined in set M) of each d in the set D of differential diagnoses are generated in the deduction inference process, which is captured in Equation (4). Equations (3), (4) and (2) are executed in a cyclic pattern. Finally, the induction inference process determines all diagnoses (ld) from the set D with likelihood values higher than or equal to a given threshold, as captured in Equation (5). While the abstraction, abduction and deduction stages are redesigned to handle the clinical findings procedure in a cyclic pattern, the induction layer handles the clinical decision making process. All four are further detailed in section V.
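One plausible reading of the four inference steps, written only from the verbal definitions in this subsection, is sketched below. The use of unions over apply(), the likelihood function \ell and the threshold symbol \tau are assumptions made for illustration, not the authors' notation.

```latex
\begin{align}
PS &= SF = oncoTokens \tag{2}\\
D  &= \bigcup_{s \in SF} \mathrm{apply}(R_a, KB_a, s) \tag{3}\\
M  &= \bigcup_{d \in D} \mathrm{apply}(R_d, KB_d, d) \tag{4}\\
ld &= \{\, d \in D \mid \ell(d) \ge \tau \,\},
      \qquad \ell(d) = \mathrm{apply}(R_i, KB_i, d) \tag{5}
\end{align}
```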

C. Knowledge Representation in SEM-ST
Four different knowledge bases, KBl, KB a , KB d and KB i , were mentioned between Equations (1) and (5). The rule sets R l , R a , R d and R i were correspondingly applied to their respective knowledge bases. The design of the knowledge bases and the rules, and their application, are further detailed in section VI.

V. ONCObc-ST Algorithm
The last section presented an abstraction of the improvement to the ST algorithm that this paper proposes. In this section, the process of inference making in medical diagnostic reasoning is detailed using an algorithmic approach.

A. Abstraction
Abstraction is the process of mapping descriptive terms that are understood by patients onto the well-defined symptom entities used in the knowledge base. In Algorithm 1, the manifestations and symptoms felt by the patient are collected. In practice, these could be sourced from the patient's file, through interaction with the patient, from family history and by other means. However, this paper depends on human medical experts in the domain under consideration to help map the information elicited through a model designed by these authors in (Oyelade et al., 2017a). The result of this mapping forms the elements of the set PatientProfile, which serves as a major input into ONCObc-ST. Each symptom s in SymptomsToBeElicited is then checked against the set PatientProfile to ensure it was elicited from the patient before it is added to another set, SymptomFound. Thereafter, s is removed from SymptomsToBeElicited and, now considered treated, is added to SymptomsAlreadyElicited.
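The set bookkeeping described for Algorithm 1 can be sketched as follows. This is an illustrative reading of the prose, not the authors' Algorithm 1; the function name and the use of Python sets (rather than linked lists and queues) are assumptions.

```python
def abstraction(symptoms_to_be_elicited, patient_profile,
                symptom_found, symptoms_already_elicited):
    """Move every pending symptom to the 'already elicited' set and
    record those that the patient profile confirms.

    All four arguments are sets; the last three sets named in the
    paper are updated in place.
    """
    # Iterate over a copy because the pending set shrinks as we go.
    for s in list(symptoms_to_be_elicited):
        if s in patient_profile:
            symptom_found.add(s)
        symptoms_to_be_elicited.discard(s)
        symptoms_already_elicited.add(s)
    return symptom_found


pending = {"lump", "fever"}
found, done = set(), set()
abstraction(pending, {"lump"}, found, done)
print(found, pending, done)
```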

B. Abduction
The set SymptomFound populated in Algorithm 1 is an input into Algorithm 2. The rule set and ontology developed by these authors in (Oyelade et al., 2017b) are also provided to Algorithm 2. The overall aim of this logical inference is to determine all likely diagnoses related to the reported symptoms. These differential diagnoses are generated by finding all diagnoses related to each s in SymptomFound. During this process, the algorithm checks whether s is related to a diagnosis d by a predicate P in the ontology; then, using the rules and the diagnostic rule engine (in practice, JESS (2015)), it checks the possibility of s causing d. If that causal property holds, d is added to a set called diagnoses. Furthermore, each differential diagnosis in the set diagnoses is checked: if its probability of causation is greater than or equal to a specified threshold, the diagnosis d is added to the set DiagnosesToBeElicited.
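A minimal sketch of this abduction step is given below. The ontology is reduced to a hypothetical dictionary mapping each symptom to P(d|s) values, the JESS rule check is replaced by a caller-supplied predicate rule_holds, and taking the maximum P(d|s) over the found symptoms as the probability of causation is an assumption made only for illustration.

```python
def abduction(symptom_found, abduction_kb, rule_holds, threshold):
    """Return the diagnoses whose probability of causation meets the threshold.

    abduction_kb maps symptom -> {diagnosis: P(d|s)} (toy stand-in for the ontology).
    rule_holds(s, d) stands in for the rule-engine check that s may cause d.
    """
    best = {}  # diagnosis -> highest P(d|s) seen so far (assumed aggregation)
    for s in symptom_found:
        for d, p in abduction_kb.get(s, {}).items():
            if rule_holds(s, d):
                best[d] = max(best.get(d, 0.0), p)
    return {d for d, p in best.items() if p >= threshold}


toy_kb = {
    "lump": {"breast_cancer": 0.8, "cyst": 0.4},
    "nipple_discharge": {"breast_cancer": 0.7},
}
print(abduction({"lump"}, toy_kb, lambda s, d: True, 0.5))
```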

C. Deduction
For each likely diagnosis from the previous stage, all the expected symptoms of that diagnosis are drawn out using a logical inference method. In Algorithm 3, modeling the deduction inference process requires that both its rule set and its ontology be provided. For each d in the differential diagnoses stored in DiagnosesToBeElicited, if an association with some symptoms can be deduced, every symptom known to relate to d is retrieved and stored in the set symptoms. Afterward, each s in symptoms is added to a collective set SymptomsToBeElicited. In practice, the deductive reasoning process is aided by Pellet (Sirin et al., 2007).
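The deduction step can be sketched as a union of expected symptoms over the differential diagnoses. The dictionary deduction_kb is a hypothetical stand-in for the ontology and the Pellet reasoner; it is not the authors' data structure.

```python
def deduction(diagnoses_to_be_elicited, deduction_kb):
    """Collect every symptom known to relate to each differential diagnosis.

    deduction_kb maps diagnosis -> set of expected symptoms (toy stand-in).
    """
    symptoms_to_be_elicited = set()
    for d in diagnoses_to_be_elicited:
        symptoms = deduction_kb.get(d, set())
        symptoms_to_be_elicited |= symptoms
    return symptoms_to_be_elicited


toy_kb = {
    "breast_cancer": {"lump", "skin_dimpling"},
    "cyst": {"lump", "tenderness"},
}
print(sorted(deduction({"breast_cancer", "cyst"}, toy_kb)))
```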

D. Induction
Abduction, deduction and abstraction form a cyclic pattern which performs the process of clinical findings; that is, it carries out differential diagnosis. The result of the cyclic refinement of likely diagnoses is stored in a set DiagnosesAlreadyElicited. The aim of inference by induction is to match the criteria met by each diagnosis in DiagnosesAlreadyElicited against the corresponding criteria defined by standards or clinical protocols. This enables the reasoning process to isolate the diagnoses most likely to exist in the patient, based on the manifestations and symptoms presented, a process called clinical decision making. The criticality of each d in DiagnosesAlreadyElicited is computed as the weight of the symptoms presented by the patient against the weight of all symptoms known to be associated with d. This criticality models a form of staging of the disease or diagnosis d. For the criteria computation described above, this paper used Hill's Criteria of Causation (Lucas and McMichael, 2005) to standardize the process of checking the criteria of a diagnosis at the induction stage. The criteria are defined and applied in Algorithm 4. An implication rule determines how the induction stage concludes that a diagnosis has met the acceptable criteria. The parameters of the rule are drawn from the inputs of the three background processes shown in Table 1. In the rule, temporal associates the time between exposure to the causal effect and the manifestations; bio is a variable that points to the biological authentication of the disease; and consistency holds the degree to which the findings for the disease being diagnosed correlate with epidemiological knowledge. Finally, coherence is the value that a medical expert submits as authentication to the reasoning algorithm when presented with the outcome of the reasoning process.
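The criticality ratio and a criteria check over the four named parameters can be sketched as follows. Treating the implication rule as a conjunction of the four criteria, representing consistency as a number in [0, 1] and the 0.5 cut-off are all assumptions made for illustration; the paper's actual rule and Algorithm 4 may differ.

```python
from dataclasses import dataclass


@dataclass
class HillEvidence:
    temporal: bool      # exposure precedes the manifestations
    bio: bool           # biological authentication of the disease
    consistency: float  # agreement with epidemiological knowledge, assumed in [0, 1]
    coherence: bool     # medical expert authenticates the outcome


def criticality(presented, symptom_weights):
    """Weight of presented symptoms over the weight of all symptoms of d."""
    total = sum(symptom_weights.values())
    present = sum(w for s, w in symptom_weights.items() if s in presented)
    return present / total if total else 0.0


def meets_criteria(ev, min_consistency=0.5):
    """Assumed rule: every criterion must hold (hypothetical conjunction)."""
    return (ev.temporal and ev.bio and ev.coherence
            and ev.consistency >= min_consistency)


def induction(candidates, presented, weights, evidence):
    """Map each accepted diagnosis to its criticality."""
    return {d: criticality(presented, weights[d])
            for d in candidates if meets_criteria(evidence[d])}


weights = {"breast_cancer": {"lump": 3, "nipple_discharge": 1}}
evidence = {"breast_cancer": HillEvidence(True, True, 0.9, True)}
print(induction({"breast_cancer"}, {"lump"}, weights, evidence))
```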

E. The Complete ONCObc-ST Algorithm
The ONCObc-ST algorithm combines the four algorithms described above to carry out the medical diagnostic task. The algorithm is designed using the brute force (exhaustive search) technique, and its specification takes the form of pseudocode and mathematical notation. Algorithm 5 lists the expected inputs to the algorithm and its prospective outputs. Within the body of the algorithm, the sets are initially declared empty. Before the calls to the Abduction(), Deduction() and Abstraction() sub-modules, the SymptomsAlreadyElicited and SymptomsFound sets are populated from the PresentingSymptoms set, which itself derives its elements from the input model in (Oyelade et al., 2017a).
Algorithm 5: Modified combined ST Algorithm
Inputs:
   A special graph of ontologies P(s j |s j ) and P(d 1 |s j ): AbductionKB
   A special graph of ontologies P(s j |s j ) and P(d 1 |s j ): AbductionKB
   A special graph of ontologies P(s j |s j ) and P(d 1 |s j ): AbductionKB
   A presenting symptoms set: PresentingSymptoms
   Profile of the patient: PatientProfile
   Threshold for symptoms: t s
   Threshold for diagnoses: t c
   A set of rules for reasoning at abduction: AbductionRuleSet
   A set of rules for reasoning at deduction: DeductionRuleSet
   A set of rules for reasoning at induction: InductionRuleSet
Output:
   Set of likely diagnoses: DiagnosesIncluded
   Set of diagnoses excluded: DiagnosesExcluded
   Set of symptoms that were found in patient: SymptomsFound
   Set of symptoms that were not found in patient: SymptomsNotFound
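The way the inputs and outputs listed above fit into the abduction, deduction and abstraction cycle can be sketched as below. This is an illustrative reconstruction from the prose, not the authors' Algorithm 5: the knowledge bases are hypothetical dictionaries, the rule sets are collapsed into a caller-supplied accept function for the induction stage, and the symptom threshold t s is omitted.

```python
def run_st(presenting, patient_profile, abduction_kb, deduction_kb, t_c, accept):
    """Cycle abduction -> deduction -> abstraction until no new symptom
    remains to be elicited, then apply the induction decision.

    abduction_kb: symptom -> {diagnosis: P(d|s)}   (toy stand-in)
    deduction_kb: diagnosis -> set of symptoms      (toy stand-in)
    accept(d, found): induction decision, supplied by the caller
    """
    found = set(presenting)            # SymptomsFound
    already = set(presenting)          # SymptomsAlreadyElicited
    not_found = set()                  # SymptomsNotFound
    candidates = set()                 # differential diagnoses seen so far
    while True:
        # Abduction: diagnoses linked to found symptoms above threshold t_c.
        differential = {d for s in found
                        for d, p in abduction_kb.get(s, {}).items() if p >= t_c}
        candidates |= differential
        # Deduction: expected symptoms not yet elicited.
        to_elicit = set()
        for d in differential:
            to_elicit |= deduction_kb.get(d, set())
        to_elicit -= already
        if not to_elicit:
            break  # terminates: 'already' only grows and symptoms are finite
        # Abstraction: check each pending symptom against the patient profile.
        for s in to_elicit:
            (found if s in patient_profile else not_found).add(s)
            already.add(s)
    included = {d for d in candidates if accept(d, found)}
    return included, candidates - included, found, not_found


abd = {"lump": {"breast_cancer": 0.8, "cyst": 0.6},
       "nipple_discharge": {"breast_cancer": 0.7}}
ded = {"breast_cancer": {"lump", "nipple_discharge", "skin_dimpling"},
       "cyst": {"lump", "tenderness"}}
# Hypothetical induction rule: accept when at least 60% of expected symptoms are found.
coverage = lambda d, found: len(ded[d] & found) / len(ded[d]) >= 0.6
print(run_st({"lump"}, {"lump", "nipple_discharge"}, abd, ded, 0.5, coverage))
```

In this toy run the cycle stops after two passes, because the second deduction yields no symptom that has not already been elicited.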

Knowledge Representation for ONCObc-ST
Recall that in section IV, subsections B and C, we modeled and abstracted some key information about the knowledge base for the proposed ONCObc-ST algorithm. The knowledge representation is explained further in this section; the details of both the implementation and the content of the knowledge base are in (Oyelade et al., 2017b).
The concept of a knowledge base as used in this paper consists of both the body of knowledge in a particular field of medicine (oncology, specifically breast cancer) and the procedures or guidelines for diagnosing. These two notions are henceforth referred to as relations of concepts (data or entities) in ontology form, and rules, respectively. Figure 1 illustrates this view of the knowledge representation. In Oyelade et al. (2017b), the related concepts were implemented as OWL files, and the protocols/rules were modeled using two semantic web based rule languages (SWRL and JESS) in a separate paper (Oyelade and Adewuyi, 2018).

Result of diagnosis for participating instances, WBCS dataset (first column: number of instances):
100   56   44   67   33
200   116  84   139  61
300   163  137  191  109
400   229  171  266  134
500   303  197  343  157
600   380  220  424  176
699   458  241  513  186

Let the knowledge representation in Fig. 1 be denoted by the tuple in Equation (6). Also, let OntoOnco and RuleOnco be sets whose elements are themselves sets of other items, as shown in Equations (7) and (8). Note that KBl, KB a , KB d and KB i are the knowledge bases of the breast cancer lexicon, abduction, deduction and induction inferences respectively. Furthermore, R l , R a , R d and R i are the rule sets for matching input with the lexicon and for the abduction, deduction and induction inferences respectively. Recall that KB a , KB d and KB i were read into Algorithm 5 and used within its sub-modules. The same holds for R a , R d and R i . The ontologies represented by KB a , KB d and KB i are representations, or formal namings, of the relations or properties P of concepts, data or entities (like diagnoses and symptoms) that make up the domain of breast cancer knowledge. Therefore, KB a , KB d and KB i are simply listings of P(d|s) or P(s|d), depending on the relation, where d and s represent elements from the diseases/diagnoses and the symptoms respectively.
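Read from the definitions in this paragraph, Equations (6) to (8) plausibly take the following form. The tuple name KR is an assumption introduced here for illustration; the set memberships follow the knowledge bases and rule sets named in the text.

```latex
\begin{align}
KR &= \langle OntoOnco,\; RuleOnco \rangle \tag{6}\\
OntoOnco &= \{ KB_l,\; KB_a,\; KB_d,\; KB_i \} \tag{7}\\
RuleOnco &= \{ R_l,\; R_a,\; R_d,\; R_i \} \tag{8}
\end{align}
```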
Finally, we modeled each ontology file in OntoOnco using the Protégé editor. R l , R d and R i were modeled with the Semantic Web Rule Language (SWRL), while R a was modeled with the JESS rule language (JESS-RL). The rules written in SWRL were applied using Pellet, while those written in JESS-RL were applied using the Java Expert System Shell (JESS).

Implementation and Result Presentation
Algorithm 5, described in section V, was implemented in the Java programming language, as shown in Fig. 2. The datasets used, which are discussed further in the next section, consist of the Wisconsin datasets WBCS, WDBC and WPCS. We mentioned earlier a formalized input model, designed by these authors in (Oyelade et al., 2017a), that is capable of serializing a patient's input into acceptable medical tokens. However, when testing this implementation of the proposed Algorithm 5, the benchmark Wisconsin datasets were used. These datasets were serialized and passed as input into Algorithm 5. The output section of Fig. 2 shows the criticality of the resulting diagnosis. The breast cancer diagnosis was staged at Stage 3A. Finally, the implemented algorithm also suggests to the patient the action to take for treatment. The output in Fig. 2 suggested that the patient undergo surgery and radiotherapy with chemotherapy.
Fig. 2: Implementation of the improved ONCObc-ST algorithm
Careful consideration was given to the selection of metrics for quantifying the accuracy of the diagnostic process in this paper. These metrics were chosen based on the standards adhered to when measuring diagnostic accuracy; moreover, some of these parameters have also been used in related work to measure the accuracy of the algorithms concerned. In (Šimundić, 2008), the author stated that the diagnostic accuracy of any diagnostic procedure or test answers the following question: how well does this test discriminate between two conditions of interest (health and disease)? It is this discriminative ability that this paper quantifies by measuring diagnostic accuracy. A good diagnostic test has a positive likelihood ratio LR+ > 10 and a negative likelihood ratio LR- < 0.1.
By the standard for assessing values of AUC with respect to diagnostic accuracy, values between 0.9 and 1.0 are judged to be excellent, 0.8-0.9 very good, 0.7-0.8 good, 0.6-0.7 sufficient and 0.5-0.6 bad, while anything less than 0.5 indicates a diagnostic test that is not useful. Note that TPR means true positive rate and FPR means false positive rate.
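The standard definitions of the metrics discussed above can be computed from the four confusion counts, as in the following sketch. The function name and the example counts are hypothetical and are not taken from the paper's tables.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Compute standard diagnostic accuracy measures from confusion counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (TPR)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn)                    # false positive rate (FPR)
    fnr = fn / (fn + tp)                    # false negative rate
    # Likelihood ratios; guard against division by zero.
    lr_pos = sensitivity / fpr if fpr else float("inf")
    lr_neg = fnr / specificity if specificity else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "LR+": lr_pos,
        "LR-": lr_neg,
    }


# Hypothetical counts, for illustration only.
for name, value in diagnostic_metrics(tp=81, tn=90, fp=10, fn=19).items():
    print(f"{name}: {value:.3f}")
```

With a specificity of 1.0 the false positive rate is zero, so LR+ is reported as infinite rather than raising an error.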
In this paper, the definitions of TP, TN, FP and FN are as follows:
a. TP: What was found during elicitation from the patient (denoted by the variable symtomsFound) and proven present by the inference process (denoted by the variable inferedSymptoms)
b. TN: What was found absent during elicitation from the patient (denoted by the variable symtomsFound) and proven absent or unavailable by the inference process (denoted by the variable inferedSymptoms)
c. FP: What was found present during elicitation from the patient (denoted by the variable symtomsFound) and proven absent or missing by the inference process (denoted by the variable inferedSymptoms)
d. FN: What was found missing or absent during elicitation from the patient (denoted by the variable symtomsFound) and proven present by the inference process (denoted by the variable inferedSymptoms)
Let the user input be denoted by UI, the inferred knowledge by KI and the entire knowledge base by KB. The sets corresponding to definitions (a)-(d) are then obtained from UI, KI and KB. The WBCS dataset has 699 instances and WDBC has 559 instances, totaling 1258 instances used as participants in the testing. For clarity, these participants are categorized according to their datasets. Tables 1 and 2 show the result of diagnosis for the participating instances in each category.

Result of diagnosis for participating instances, WDBC dataset (first column: number of instances):
100   35   65   61   39
200   96   104  118  82
300   154  146  171  129
400   227  173  219  181
500   305  195  259  241
559   353  206  290  269

Comparison with related works:
Author                      Technique                                             Category                 Accuracy
Agarap (2018)               GRU-SVM, R, MLP, NN, SR and SVM                       Machine learning (ML)    >90.0%
Nabil et al. (2008)         Artificial immune genetic algorithm with fuzzy rule   Machine reasoning (MR)   97.36%
Alharbi and Tchier (2016)   Fuzzy-Genetic Algorithm Method                        Machine reasoning (MR)   97.33%
Setiono (2000)              Neuro-rule ANN                                        Machine reasoning (MR)   97.97%
Andres et al. (1999)        Fuzzy-genetic algorithm approach                      Machine reasoning (MR)   97.80%
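Definitions (a)-(d) given earlier in this section can be read as set operations over UI, KI and KB, as in the sketch below. This is one interpretation of the verbal definitions, offered as an assumption, since the paper's own set expressions are not reproduced here; the symptom names are hypothetical.

```python
def confusion_sets(ui, ki, kb):
    """Derive TP, TN, FP and FN as sets from user input (UI),
    inferred knowledge (KI) and the whole knowledge base (KB)."""
    tp = ui & ki         # elicited and confirmed by inference
    tn = kb - (ui | ki)  # neither elicited nor inferred
    fp = ui - ki         # elicited but not confirmed by inference
    fn = ki - ui         # inferred but not elicited
    return tp, tn, fp, fn


kb = {"lump", "pain", "swelling", "fever", "rash"}
tp, tn, fp, fn = confusion_sets({"lump", "pain"}, {"pain", "swelling"}, kb)
print(sorted(tp), sorted(tn), sorted(fp), sorted(fn))
```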
Furthermore, we computed and listed in Table 4 the values of the three metrics (sensitivity, specificity and accuracy) for determining medical diagnostic accuracy.
Note that the WBCS and WDBC datasets comprise instances derived from the cases of patients who had already been diagnosed, hence the computed accuracy of 100% in the three categories of datasets. On the other hand, in Table 3, the sensitivity of 0.81 and 0.82 obtained with the WBCS and WDBC datasets respectively indicates that ONCObc-ST has a good probability of returning a positive test result for subjects with the disease, and so has the potential to recognize subjects with breast cancer. Similarly, the specificity of 0.89 and 1 for WBCS and WDBC respectively represents the probability that ONCObc-ST returns a negative test result for a subject without breast cancer, and shows that the improved algorithm is able to recognize subjects without breast cancer. In addition, the values obtained for accuracy imply that the diagnoses obtained from the proposed algorithm are good. These metrics therefore suggest that the improved ST algorithm, ONCObc-ST, merits consideration as a medical diagnostic reasoning algorithm and for the implementation of medical expert systems.
The assumptions we made while translating and formatting the datasets to serve as input to the improved ST algorithm might have constrained how correctly the numeric values in the datasets were mapped to the symptoms needed by the algorithm. However, the assumptions made during this data mapping were based on the breast cancer clinical protocols consulted, which supports the credibility of the procedure. We observed that the accuracy of 88.72% achieved by the improved ST algorithm makes it a candidate for adoption in medical diagnostic procedures.

Evaluation and Discussion
In section VII, we reported the performance of the proposed ONCObc-ST algorithm using selected metrics. The best performance of the algorithm was an accuracy of 88.72%, with good results in terms of sensitivity and specificity. Here we compare the proposed algorithm with similar algorithms that are based on intelligent systems (machines). Algorithms relating to machine intelligence may be grouped into Machine Learning (ML) and Machine Reasoning (MR). While ML approaches need a suitable algorithm together with training and input datasets, MR approaches are strongly knowledge based. In addition, ML algorithms diagnose diseases through prediction after learning from training datasets and creating models, while MR algorithms diagnose purely through a reasoning process, most often with the aid of some form of rules. Although the outputs of the two techniques may appear to be the same, they differ in algorithm complexity, problem solving approach and the production of acceptable explanation systems. The algorithm proposed in this paper is classified as an MR algorithm. As stated earlier, its reasoning capability is driven by two inference algorithms (abduction and deduction) and a generalization/decision making algorithm (induction). These three algorithms ensure that an acceptable explanation system (to justify the reason for its output) can be generated. In addition, the problem solving approach of the proposed algorithm arrives at its conclusions systematically. Hence, our approach is more reliable and correct (even sound and complete) than the ML approaches listed in Table 4.
In Table 4, six (6) related works are compared with the proposed algorithm (ONCObc-ST). The works of Aloraini (2012), Agarap (2018) and Nabil et al. (2008), which are ML-based techniques, have accuracies of 95.6%, 90.0% and 97.36% respectively. The argument presented in the preceding paragraph may explain why the accuracy of ONCObc-ST is 88.72%. Furthermore, the works of Alharbi and Tchier (2016) and Andres et al. (1999), which are both fuzzy-genetic algorithms, attained accuracies of 97.33% and 97.80% respectively. Also, Setiono (2000), which is a neuro-rule ANN algorithm, attained 97.97% accuracy. However, Ernest et al. (2014) clearly stated that fuzzy-genetic algorithms are limited by the difficulty genetic algorithms have in guaranteeing optimality, and that the solution weakens as the size of the problem increases. Similarly, Wuest et al. (2016) revealed that ML algorithms may suffer from the problem of acquiring relevant knowledge or data, which usually affects their performance. This is in contrast to MR, which adapts more flexibly even in big data investigations (Ganeshan, 2018). Therefore, we conclude that although ONCObc-ST attained an accuracy of 88.72%, its technique was rigorously verified and has proven acceptable. Furthermore, semantic reasoning (which is the approach of this paper) permits knowledge to be represented in a deep and meaningfully structured form, which yields high inference power. In addition, the semantic reasoning approach excels in provability, using formal logic proofs to explain the result obtained by the system. Lastly, a domain such as medicine requires deep knowledge representation with complex rules, which semantic reasoning handles appropriately. In conclusion, though the accuracy of 88.72% is good, further refinements based on the limitations and future work highlighted in Section VIII may raise the accuracy to an excellent level.
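The size of the accuracy gap discussed above can be made explicit with simple arithmetic. The following minimal Python sketch uses only the accuracy figures quoted in this section; the dictionary layout and the helper name `gap_to` are hypothetical and serve only as an illustration.

```python
# Accuracies (in percent) as quoted in the text of this section.
reported_accuracy = {
    "Aloraini (2012)": 95.6,
    "Agarap (2018)": 90.0,
    "Nabil et al. (2008)": 97.36,
    "Alharbi and Tchier (2016)": 97.33,
    "Andres et al. (1999)": 97.80,
    "Setiono (2000)": 97.97,
    "ONCObc-ST": 88.72,
}


def gap_to(reference: str, table: dict) -> dict:
    """Percentage-point difference of every other entry to the reference."""
    base = table[reference]
    return {
        name: round(acc - base, 2)
        for name, acc in table.items()
        if name != reference
    }


gaps = gap_to("ONCObc-ST", reported_accuracy)
# List the compared works from the smallest to the largest gap.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: +{gap:.2f} percentage points")
```

Under these figures, the gap ranges from 1.28 percentage points (Agarap, 2018) to 9.25 percentage points (Setiono, 2000).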

Conclusion
In this paper, we have presented an enhanced and more approximate medical reasoning algorithm named ONCObc-ST, which is an improvement of the ST algorithm. The improvement was achieved through the design of an ontology-based knowledge representation that assists logical reasoning. In addition, the ST algorithm was modified at the induction sub-module to reason effectively through the incorporation of Hill's Criteria of Causation and the provision of support for some inference-making processes. The results show that ONCObc-ST demonstrates good performance in terms of accuracy or approximation: sensitivity of 0.81 and 0.82, specificity of 0.89 and 1, and accuracy of 86% and 88.72% for the Wisconsin Breast Cancer datasets WBCD and WDBC respectively. Note that while this paper applied the improved algorithm to diagnosing breast cancer, future work may consider applying it to other ailments, or even to a collection of similar ailments, and may further consider hybridizing the rule-based system in this paper with moderately case-based features. Furthermore, the present work omits an explanation facility for analyzing the basis and pattern of the problem-solving approach and for convincing patients of the acceptability of the diagnosis result. Finally, machine learning models may be adopted for filtering, classifying and cleaning datasets before they are served as inputs.