A Comparative Analysis of Software Engineering with Knowledge Engineering



INTRODUCTION
Planning, monitoring and quality control: Like any human-intensive production or engineering activity, software development needs reliable techniques to plan resource expenditures and to monitor, assess and control product quality. More precisely, project expenditures need to be predicted and significant deviations need to be monitored. This requires accurate prediction models, as well as heuristics to detect significant deviations and take remedial action. With respect to prediction, a number of machine learning techniques have been shown to be useful. Examples are decision trees (Briand and Wuest, 2002) and rough sets (Harman and Jones, 2001).
The main advantages of these techniques can be described as follows:
• They can easily handle qualitative, categorical data, which are common in software engineering.
• They produce models that are easier to interpret, which is important in our case, as we would like to understand which factors affect software development productivity and quality.
• They enable the discovery of certain structures in data sets, e.g., variable interactions in decision and regression trees.
Computational intelligence, with techniques such as neural networks (Keung et al., 2004), can also play a role. Neural networks are good at building complex, non-linear prediction models, and they do not require any assumption about the functional form of the relationships between the predictors and the variable to be predicted. However, using them can be tedious (e.g., the training phase), and the resulting models are difficult to interpret, because the type and form of the relationships between variables cannot easily be deduced from a trained network.
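As a minimal sketch of why tree-based models handle categorical data and remain interpretable, the Python fragment below chooses the first split of a regression tree by trying every binary test of the form `attribute == category` and keeping the one that minimizes the within-group sum of squared errors (SSE) of effort. The project records, attribute names and effort values are hypothetical, and a full tree would recurse on each branch with a stopping rule, which is omitted here.

```python
"""First split of a regression tree over categorical predictors (illustrative sketch)."""
from statistics import mean

# Hypothetical project records: two categorical attributes and effort in person-months.
PROJECTS = [
    {"domain": "embedded", "team_experience": "low", "effort": 120.0},
    {"domain": "embedded", "team_experience": "high", "effort": 80.0},
    {"domain": "business", "team_experience": "low", "effort": 60.0},
    {"domain": "business", "team_experience": "high", "effort": 40.0},
    {"domain": "embedded", "team_experience": "low", "effort": 110.0},
    {"domain": "business", "team_experience": "high", "effort": 45.0},
]


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, attributes, target="effort"):
    """Return (attribute, category, total_sse) for the binary test
    'attribute == category' that minimises the within-group SSE of the target."""
    best = None
    for attr in attributes:
        for cat in sorted({r[attr] for r in rows}):
            inside = [r[target] for r in rows if r[attr] == cat]
            outside = [r[target] for r in rows if r[attr] != cat]
            if not inside or not outside:
                continue  # a useful split must produce two non-empty groups
            total = sse(inside) + sse(outside)
            if best is None or total < best[2]:
                best = (attr, cat, total)
    return best


if __name__ == "__main__":
    attr, cat, total = best_split(PROJECTS, ["domain", "team_experience"])
    print(f"split on {attr} == {cat!r} (SSE {total:.1f})")
```

The chosen split reads directly as a rule (business projects versus the rest), which is the kind of interpretability argued for above.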
Fuzzy set theory has also been used in software engineering prediction models. The main motivation is that, as mentioned above, the data on which prediction models rely can be qualitative and subjective in nature (e.g., the Team Cohesion cost driver in COCOMO II (Chulani et al., 1999)). Fuzzy sets were designed to deal with linguistic uncertainty and can help model the uncertainty associated with subjective model parameters and input data elicited from expert opinion. In other words, when users of a prediction model have to provide qualitative inputs (e.g., categories), fuzzy set theory allows them to assign different degrees of membership to the various categories, thus reflecting their uncertainty about the model inputs. Such uncertainty must, however, also be accounted for in the model outputs.
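The fuzzy treatment of a subjective input can be sketched as follows, assuming triangular membership functions on a 1-5 rating scale. The set boundaries and the per-category effort multipliers are hypothetical illustrations, not COCOMO II's calibrated values. The rating is mapped to degrees of membership in several categories, a point value is obtained by a membership-weighted average, and the range of multipliers with non-zero membership is also returned so that the input uncertainty is carried into the output.

```python
"""Fuzzy interpretation of a subjective cost-driver rating (illustrative sketch)."""

# Hypothetical fuzzy sets on a 1-5 rating scale, as (left foot, peak, right foot).
FUZZY_SETS = {
    "low": (1.0, 1.0, 3.0),
    "nominal": (2.0, 3.0, 4.0),
    "high": (3.0, 5.0, 5.0),
}

# Hypothetical effort multipliers per category (NOT the calibrated COCOMO II values).
MULTIPLIERS = {"low": 1.10, "nominal": 1.00, "high": 0.90}


def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b.
    a == b or b == c yields a left or right 'shoulder' without dividing by zero."""
    if x == b:
        return 1.0
    if x < b:
        return 0.0 if x <= a else (x - a) / (b - a)
    return 0.0 if x >= c else (c - x) / (c - b)


def memberships(rating):
    """Degree of membership of the rating in every category."""
    return {name: triangular(rating, *abc) for name, abc in FUZZY_SETS.items()}


def fuzzy_multiplier(rating):
    """Membership-weighted multiplier plus the (min, max) range of the
    multipliers whose categories have non-zero membership."""
    mu = memberships(rating)
    total = sum(mu.values())
    if total == 0:
        raise ValueError("rating lies outside every fuzzy set")
    point = sum(mu[c] * MULTIPLIERS[c] for c in mu) / total
    active = [MULTIPLIERS[c] for c in mu if mu[c] > 0]
    return point, (min(active), max(active))


if __name__ == "__main__":
    print(memberships(2.5))
    print(fuzzy_multiplier(2.5))
```

For a rating of 2.5, the input belongs partly to "low" (0.25) and partly to "nominal" (0.5), giving a point multiplier of about 1.03 within a range of 1.00 to 1.10.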
Another interesting strategy that has been used in the context of quality and cost prediction models is Case-Based Reasoning (CBR) (Vijay and Manoharan, 2009; Khoshgoftaar et al., 1995). The basic principle of CBR is to define a similarity function or measure and use it to retrieve similar past projects, whose cost or quality data are then reused as a basis for prediction. This requires, however, that a similarity function be defined beforehand, whereas in software engineering we are very often trying to uncover trends from data and are not in a position to define such a function. With respect to cost estimation, results have so far been rather disappointing (Briand and Wieczorek, 2001), which very likely stems from the difficulty of defining an appropriate similarity function.
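A minimal analogy-based estimator makes the dependence on the similarity function explicit. In this sketch, which assumes purely numeric project features, similarity is defined as 1 / (1 + d), where d is the Euclidean distance after scaling each feature by its range; the k most similar past projects are retrieved and their mean effort is reused. The features, cases and choice of similarity function are hypothetical.

```python
"""Analogy-based (CBR-style) effort estimation (illustrative sketch)."""
import math

# Hypothetical past projects: numeric features and actual effort (person-months).
CASES = [
    ({"size_kloc": 10.0, "team_size": 4.0}, 24.0),
    ({"size_kloc": 12.0, "team_size": 5.0}, 30.0),
    ({"size_kloc": 50.0, "team_size": 12.0}, 150.0),
    ({"size_kloc": 45.0, "team_size": 10.0}, 130.0),
]
FEATURES = ["size_kloc", "team_size"]


def feature_ranges(cases):
    """Range of each feature over the case base, used to normalise distances."""
    ranges = {}
    for f in FEATURES:
        values = [features[f] for features, _ in cases]
        ranges[f] = (max(values) - min(values)) or 1.0  # guard against a zero range
    return ranges


def similarity(a, b, ranges):
    """Assumed similarity function: 1 / (1 + range-normalised Euclidean distance)."""
    d = math.sqrt(sum(((a[f] - b[f]) / ranges[f]) ** 2 for f in FEATURES))
    return 1.0 / (1.0 + d)


def estimate(target, cases, k=2):
    """Retrieve the k most similar cases and reuse the mean of their efforts."""
    ranges = feature_ranges(cases)
    ranked = sorted(cases, key=lambda case: similarity(target, case[0], ranges), reverse=True)
    return sum(effort for _, effort in ranked[:k]) / k


if __name__ == "__main__":
    new_project = {"size_kloc": 11.0, "team_size": 4.5}
    print(estimate(new_project, CASES))
```

Replacing `similarity` (or the feature scaling) can change which projects are retrieved and hence the estimate, which is why defining it appropriately is the crux of the approach.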
We have seen that many models (e.g., cost models) cannot, due to practical constraints, be built solely from data (Briand et al., 1998). Eliciting expert opinion and modeling expert knowledge is therefore sometimes key to developing prediction systems. Ideally, software engineering prediction models should combine expert opinion and project data. For example, the COCOMO II model (Chulani et al., 1999) is based in part on expert opinion. An important question is then how to integrate expert opinion and project data into common models. Techniques such as Bayesian analysis (Chulani et al., 1999) and expert opinion elicitation combined with Monte Carlo simulation (Briand et al., 1998) have been used in the area of cost estimation. The latter technique has also been used for technology evaluation in the context of inspections (Briand et al., 2000a). We are in the process of developing a hybrid technique that combines concepts from both software engineering and knowledge engineering (Vijay and Manoharan, 2009; Keung et al., 2004).
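One simple way to integrate an expert estimate with a data-based estimate is the textbook normal-normal Bayesian update with known variances, sketched below. This illustrates the general idea only; it is not the exact procedure used to calibrate COCOMO II, and the example values are hypothetical. Each source is weighted by its precision (the inverse of its variance), so the more certain source dominates.

```python
"""Combining an expert estimate with a data-based estimate (normal-normal Bayesian sketch)."""


def combine(prior_mean, prior_var, data_mean, data_var):
    """Posterior mean and variance for a normal prior (expert opinion) and a
    normal likelihood (estimate from project data), both with known variances.
    Each source is weighted by its precision, i.e. 1 / variance."""
    if prior_var <= 0 or data_var <= 0:
        raise ValueError("variances must be positive")
    prior_precision = 1.0 / prior_var
    data_precision = 1.0 / data_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var


if __name__ == "__main__":
    # Hypothetical values for a single cost-driver multiplier.
    expert = (1.20, 0.01)  # expert's estimate and its variance
    data = (1.50, 0.04)    # estimate from regression on project data and its variance
    print(combine(*expert, *data))
```

With an expert prior of 1.20 (variance 0.01) and a data estimate of 1.50 (variance 0.04), the posterior mean is 1.26 with variance 0.008: the expert's tighter estimate carries four times the weight of the data.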

Software learning organizations:
Within an organization, experience and knowledge acquired on past software projects can be used to improve practices on future projects. For example, it may be important to know whether a requirements engineering technique has worked well on past projects, what its benefits and challenges were, and what the project participants felt should be done to improve the way it was used or automated. The main reason is that, in software engineering, it is difficult to know a priori whether a given technique or method will fit well with the problems at hand and with existing practices. Corporate learning based on experience then becomes key to the effective adoption of new practices and to productivity and quality improvement.
However, to achieve such an objective, best practices, lessons learned, models and data need to be made accessible and reusable across an organization. Different issues have to be addressed to make this possible:
• Technical issues: Data and documents need to be stored and retrieved efficiently. Knowledge bases need to be designed, maintained and connected to the company intranet for corporate-wide accessibility. Security issues then arise, as some of the information may be confidential.
• Organizational issues: Such knowledge bases need to be fed by projects. Data, information and documents need to be provided in a consistent form, based on an agreed-upon structure and content. The information provided must be precise, accurate and complete. This requires a certain organizational discipline, with procedures that are defined and enforced.
• Cognitive issues: Users accessing such knowledge bases may be faced with tremendous amounts of information, most of it irrelevant to the problem at hand. It is therefore important to reduce the cognitive load on users by allowing them to retrieve relevant information efficiently and precisely.
In this context, the design and maintenance of corporate-wide knowledge bases becomes a key issue to address.
Well-known and mature technologies exist to address the technical issues related to the design and maintenance of knowledge bases. Organizational issues have been addressed by the Quality Improvement Paradigm (Basili and Caldiera, 1995) and the Experience Factory Model. The Quality Improvement Paradigm (QIP) provides steps and guidelines on how an organization can improve itself based on project experience. The Experience Factory Model describes the corporate infrastructure that needs to be put in place to support the QIP.
Cognitive issues can be addressed by using techniques such as Case-Based Reasoning (CBR) (Vijay and Manoharan, 2009; Gresse et al., 2001) to retrieve relevant pieces of information from a knowledge base. For example, similar past projects can be retrieved based on a description of a new project, and relevant lessons learned on various technologies and process issues, e.g., the use of inspections, can be retrieved. In this case, a similarity measure between projects would need to be defined, which in practice would probably require expert opinion. Furthermore, incomplete data (e.g., project descriptions), the use of categorical variables and taxonomies (e.g., project types) and the use of various measurement scales are additional issues to address in defining similarity.
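The issues listed above (incomplete descriptions, categorical attributes, mixed scales) can be handled, in a simplified way, by a Gower-style similarity: each attribute contributes a score in [0, 1], numeric attributes are scaled by an assumed range, categorical attributes score 1 on an exact match, and attributes missing from either record are skipped. The project base, attribute names and ranges below are hypothetical; in practice, the ranges and any attribute weights would likely have to come from expert opinion.

```python
"""Gower-style similarity for retrieving past projects (illustrative sketch)."""

# Hypothetical project base; None (or an absent key) marks a missing value.
PROJECT_BASE = [
    {"id": "P1", "description": {"type": "web", "size_kloc": 20.0, "uses_inspections": True}},
    {"id": "P2", "description": {"type": "embedded", "size_kloc": 80.0, "uses_inspections": None}},
    {"id": "P3", "description": {"type": "web", "size_kloc": 90.0}},
]

# Assumed value ranges for numeric attributes; every other attribute is treated as categorical.
NUMERIC_RANGES = {"size_kloc": 100.0}


def gower_similarity(a, b, numeric_ranges=NUMERIC_RANGES):
    """Average per-attribute similarity over attributes present in both records:
    numeric -> 1 - |x - y| / range, categorical -> 1 on exact match else 0."""
    scores = []
    for attr in set(a) | set(b):
        x, y = a.get(attr), b.get(attr)
        if x is None or y is None:
            continue  # incomplete description: skip rather than guess
        if attr in numeric_ranges:
            scores.append(1.0 - abs(x - y) / numeric_ranges[attr])
        else:
            scores.append(1.0 if x == y else 0.0)
    if not scores:
        raise ValueError("no attribute is present in both records")
    return sum(scores) / len(scores)


def retrieve(query, base=PROJECT_BASE, k=1):
    """Return the k stored projects whose descriptions are most similar to the query."""
    return sorted(
        base,
        key=lambda p: gower_similarity(query, p["description"]),
        reverse=True,
    )[:k]


if __name__ == "__main__":
    query = {"type": "web", "size_kloc": 25.0, "uses_inspections": None}
    for project in retrieve(query, k=2):
        print(project["id"], round(gower_similarity(query, project["description"]), 3))
```

A taxonomy of project types would call for partial credit between related categories rather than the all-or-nothing match used here.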
Numerous complex decisions have to be made during software development and maintenance. For example, one may want to decide in what order the components of a system should be developed and integrated, whether a given document needs further inspection before being approved (Briand et al., 2000b) and used in the next phases of development, or whether an inspection technique at a given stage of development is beneficial (Briand et al., 2000a). Such decisions are usually not trivial: they typically involve a certain level of risk, and substantial resources are at stake. Some of these decision problems can be reformulated as optimization problems. For example, the integration order problem above can be reformulated as a combinatorial optimization problem, and techniques such as genetic algorithms or simulated annealing can help find near-optimal solutions. The advantage of such metaheuristic techniques (Vijay and Manoharan, 2009), as they are referred to, is their flexibility. The objective function to be minimized often has to be tailored to specific situations, and metaheuristics, as opposed to mathematical optimization techniques, enable such tailoring without changes to the optimization algorithms and their automation. Furthermore, metaheuristics allow us to solve complex, non-linear optimization problems that are not always addressable by conventional mathematical optimization techniques (Vijay and Manoharan, 2009). Their drawback, though, is that there is no absolute guarantee that they will provide near-optimal solutions. Only case studies and experimentation can help us determine whether they are adequate for a problem and under which conditions. Not all decisions can be formulated as optimization problems, however. In some cases, the parameters that strongly influence a decision's outcome are not known or can only be estimated with a certain level of uncertainty.
The inspection cost-benefit evaluation mentioned above (Briand et al., 2000a) is an example of a decision whose key parameters are uncertain. In general, to decide whether to adopt a technology, one usually needs to formulate a cost-benefit model and possibly perform simulations to account for the multiple sources of uncertainty in the model inputs and parameters (Briand et al., 2000a). In practice, however, even with carefully chosen simplifying assumptions, such models depend on parameters that are not only unknown but also specific to a particular development environment, and for which no data can be collected. Fortunately, there is a large body of literature on expert estimation, which has been used, for example, in the nuclear industry to build risk models. The reported techniques have been shown, under certain conditions, to be very useful for estimating unknown model parameters.
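The combination of expert estimation and simulation described above can be sketched as a Monte Carlo run over a toy inspection cost-benefit model, where net benefit is defects found times rework hours saved per defect, minus inspection hours. Each uncertain parameter is assumed to be elicited from experts as a (minimum, most likely, maximum) triple and sampled from a triangular distribution. The model and all numbers are hypothetical, and defect counts are treated as continuous for simplicity; the output is a distribution of net benefit rather than a single figure.

```python
"""Monte Carlo evaluation of a toy inspection cost-benefit model (illustrative sketch)."""
import random
import statistics

# Hypothetical expert-elicited parameters as (minimum, most likely, maximum).
DEFECTS_FOUND = (10.0, 25.0, 40.0)          # defects detected by the inspection
HOURS_SAVED_PER_DEFECT = (2.0, 5.0, 12.0)   # later rework avoided per defect found
INSPECTION_HOURS = (40.0, 60.0, 100.0)      # effort spent on the inspection itself


def draw(rng, triple):
    """Sample a triangular distribution given as (minimum, mode, maximum)."""
    low, mode, high = triple
    return rng.triangular(low, high, mode)  # note the (low, high, mode) argument order


def simulate(n=10_000, seed=1):
    """Sample the net benefit (hours saved minus hours spent) n times."""
    rng = random.Random(seed)
    return [
        draw(rng, DEFECTS_FOUND) * draw(rng, HOURS_SAVED_PER_DEFECT)
        - draw(rng, INSPECTION_HOURS)
        for _ in range(n)
    ]


if __name__ == "__main__":
    net = simulate()
    print(f"mean net benefit: {statistics.mean(net):.1f} hours")
    print(f"P(net benefit > 0): {sum(x > 0 for x in net) / len(net):.2f}")
```

The probability of a positive net benefit is arguably more useful for the adoption decision than the mean alone, since it reflects the spread of the expert-elicited inputs.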
Automation: Many activities in software engineering need to be automated to make methods and techniques economically viable. One good example is the generation of test data. In most cases, whether we refer to unit, integration or functional testing, test strategies are defined based on coverage criteria, e.g., covering all control flow edges in a procedure. As a result, in many situations, generating appropriate test cases consists of finding test data that satisfy a set of logical constraints, e.g., the conditions that determine the flow of control in a procedure. Done manually, this exercise is very tedious and error-prone.
Fortunately, a number of research articles (Vijay and Manoharan, 2009; Pedrycz and Peters, 1998) have shown that metaheuristic techniques can also be used in this context. For example, an objective function can be derived from the constraints and used in a genetic algorithm to drive convergence towards acceptable input data. Initial results suggest this is feasible, but more empirical investigation is needed to determine the best ways to use these techniques and to assess their limitations in addressing software engineering issues. Although many techniques are available and have been experimented with, software engineering problems provide new contexts in which to use them.
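The idea of deriving an objective function from branch constraints can be sketched with a small genetic algorithm. The unit under test, `classify`, and the input domain are hypothetical; the target branch requires `x > 2 * y` and `x + y == 30`. The fitness used here is a simplified branch distance (how far each condition is from being satisfied, summed), which is zero exactly when the input reaches the target branch. Truncation selection, exchange of the two input values as crossover, and small clamped mutations are illustrative choices, not a prescribed algorithm.

```python
"""Genetic search for test data covering a target branch (illustrative sketch)."""
import random

LOW, HIGH = -100, 100  # assumed input domain for both parameters


def classify(x, y):
    """Hypothetical unit under test."""
    if x > 2 * y:
        if x + y == 30:
            return "target"  # the branch the test data must reach
        return "a"
    return "b"


def branch_distance(x, y):
    """Simplified fitness: 0 iff both conditions guarding the target branch
    hold; otherwise the summed distance from satisfying each condition."""
    d_first = 0 if x > 2 * y else (2 * y - x) + 1
    d_second = abs((x + y) - 30)
    return d_first + d_second


def clamp(v):
    """Keep a mutated value inside the input domain."""
    return min(HIGH, max(LOW, v))


def evolve(pop_size=50, generations=500, seed=3):
    """Return an (x, y) input reaching the target branch, or None if none is found."""
    rng = random.Random(seed)
    pop = [(rng.randint(LOW, HIGH), rng.randint(LOW, HIGH)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: branch_distance(*ind))
        if branch_distance(*pop[0]) == 0:
            return pop[0]
        parents = pop[: pop_size // 2]  # truncation selection keeps the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            (x1, _), (_, y2) = rng.sample(parents, 2)
            x, y = x1, y2  # crossover: x from one parent, y from the other
            if rng.random() < 0.5:  # mutation: small step on one input, kept in range
                x = clamp(x + rng.randint(-5, 5))
            else:
                y = clamp(y + rng.randint(-5, 5))
            children.append((x, y))
        pop = parents + children
    pop.sort(key=lambda ind: branch_distance(*ind))
    return pop[0] if branch_distance(*pop[0]) == 0 else None


if __name__ == "__main__":
    found = evolve()
    print(found, classify(*found) if found else "no input found")
```

`evolve()` returns an input pair that reaches the target branch, or `None` if the generation budget runs out; a richer fitness would also account for how many enclosing conditions are already satisfied.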

CONCLUSION
From the discussions above, we have seen that a wealth of knowledge engineering, artificial and computational intelligence techniques can be used to address a number of important software engineering issues. Though we have focused on techniques and problems on which we already have experience, it is clear that this study only scratches the surface. The potential for cost-effective applications in software engineering is enormous.
As might be expected, most of the techniques discussed here are based on heuristics. This implies that they can only be validated through experimentation and case studies, and that they need to be investigated for each problem to be addressed and under realistic conditions. Only then can we determine whether they are applicable and economically viable, and under which conditions.
It is therefore important not to fall into the trap of blindly applying knowledge engineering techniques to arbitrary software engineering problems. The well-known "hammer and nail" dilemma should be avoided, as it could lead to a substantial waste of effort and undermine the perception that knowledge engineering has an important role to play in software development. The knowledge engineering community needs to make a conscious effort to understand the reality of software engineering challenges and technologies. Similarly, software engineers need to become familiar with the latest developments in computational intelligence, knowledge engineering, machine learning and hybrid estimation techniques.