Optimizing Software Effort Estimation Models Using Back-Propagation Versus Radial Base Function Networks

Abstract: Software development effort estimation has become a vital tool for researchers in many fields. Software estimation is used in controlling, organizing and delivering projects within the required time and cost, avoiding the financial penalties caused by delays and other unforeseen circumstances. Good project cost estimation leads to project success and reduces the risk of project failure. In this paper, two neural network models are compared: the back-propagation algorithm and the radial basis function network. The comparison identifies the model that best reduces schedule-related project risks and increases profit by delivering the required project on time. The two models are applied to 60 samples of a NASA public dataset, divided into 45 samples for training and 15 for testing. From the results obtained, the back-propagation neural network performs better than the radial basis function network in both training and testing, so the back-propagation algorithm can be recommended as a useful tool for software effort and cost estimation.


Introduction
Building and estimating successful software is an important task that has attracted many software developers (Boraso et al., 1996; Dolado, 2011). Bidding, budgeting and planning are important factors that affect project success. Defining these factors accurately determines the project's size, schedule, effort, complexity and required tools, and helps avoid the sudden, unexpected events that may occur during the project and cause losses. Good software estimation gives accurate feedback about project progress, which allows better resource allocation and utilization (Boehm, 1981).
At the Software Technology Conference held in 1998, Dr. Patricia Sanders, Director of Test Systems Engineering and Evaluation at OUSD, stated that 40% of the DoD's software development costs were spent on reworking software, amounting to an annual loss of $18 billion in the year 2000. She added that only 16% of developed software finished on time and within budget.
Effort estimation was initially driven by the Developed Lines of Code (DLOC) measure, which counts the program's instructions and statements. The corresponding model was built on 63 software projects, and its core function is to determine the arithmetical relationship between three important variables: software development time, human effort in work-months and maintenance effort (Kemerer, 1987).
The Constructive Cost Model (COCOMO), developed by Boehm (1981; Boehm et al., 1995), is considered one of the most important and popular models used to estimate software effort. Numerous techniques have been used by researchers to build efficient estimation model structures for the software cost estimation problem. Artificial neural networks with different architectures proved their solidity and efficiency in this field (Shepperd and Schofield, 1997); moreover, fuzzy logic was used by (Kumar et al., 1994; Kaushik et al., 2012), and evolutionary algorithms such as genetic algorithms and genetic programming were also widely applied to this type of problem.
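As an illustration of such parametric models, the basic COCOMO equation estimates effort as a power function of program size. The sketch below uses the published organic-mode coefficients (a = 2.4, b = 1.05) purely as an illustrative assumption; the coefficients differ by project mode, and the intermediate model multiplies this estimate by cost-driver factors.

```python
def basic_cocomo_effort(kloc: float, a: float = 2.4, b: float = 1.05) -> float:
    """Basic COCOMO: effort in person-months as a power function of size.

    The defaults are the organic-mode coefficients; semi-detached and
    embedded projects use different (a, b) pairs.
    """
    return a * kloc ** b

# A hypothetical 10 KLOC organic-mode project:
effort = basic_cocomo_effort(10.0)
```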
This paper applies an artificial neural network with the back-propagation algorithm and compares it to the radial basis function network. The comparison contributes to selecting the best neural network model for solving the software effort estimation problem. Artificial Neural Networks (ANN) work in a way similar to the human biological neural system, comparable to how the brain operates and processes information (Negnevitsky, 2005). The brain consists of a large number of small, fully interconnected cells that process data. Likewise, an ANN consists of a large number of strongly interconnected cells called neurons, all working together in a systematic manner to solve a specific problem and learning by example in the way biological systems do. Learning in neural networks means readjusting the synaptic weights between the connected neurons until the optimal solution is reached. In 1943, the first artificial neuron was introduced by the neurophysiologist Warren McCulloch and the logician Walter Pitts. The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 presents the back-propagation learning algorithm. Section 4 presents the radial basis function. Section 5 describes the constructed models. Section 6 discusses the experimental results. Finally, Section 7 presents the conclusion and future work.

Literature Review
Soft computing techniques are now used in many research fields. They were developed by (Zadeh, 1994) and cover different algorithm architectures such as fuzzy logic, neurocomputing (e.g., neural networks) and probabilistic reasoning. Later, the field was extended to include newer techniques such as genetic algorithms, genetic programming and swarm intelligence. All these techniques have played a vital role in developing and improving the research area (Kaushik et al., 2012; Huang et al., 2003). Soft computing techniques have also been applied to software cost estimation problems. The authors (Feng et al., 2010) implemented a genetic programming algorithm to optimize the performance of a back-propagation neural network and reduce the construction cost for software estimation. The authors (Shepperd and Schofield, 1997) also used neural networks to improve cost estimation models. Fuzzy logic with different techniques was likewise applied to the well-known COCOMO model (Kaushik et al., 2012). The fuzzy logic Takagi-Sugeno model was used to investigate how rules can contribute to solving the software effort estimation problem (Sheta and Aljahdali, 2013). Moreover, the author (Sheta, 2006) used genetic programming to estimate the COCOMO model parameters for NASA software projects. The authors (Ghatasheh et al., 2015) used the firefly algorithm to optimize software effort estimation models. Neural networks were also strongly represented in solving the software cost estimation problem, as presented by (Singh et al., 2011).
The authors (Oliveira et al., 2010) used a hybrid method for parameter selection and model optimization to clarify the impact of using GA in feature selection and effort estimation. The authors (Sehra et al., 2011) used soft computing techniques for software project effort estimation, where NN, FL and GP were used to estimate project efforts. The interest and motivation for addressing this type of problem in this paper stem from the real, long-standing importance of the software cost estimation problem, as shown in the related works above.

Back-Propagation Learning Algorithm
An ANN with the back-propagation algorithm is considered one of the most important learning algorithms in use to date. Back-propagation (BP) was introduced by David Rumelhart, Geoffrey Hinton and Ronald Williams in 1986. It is considered the workhorse of learning in neural networks. The working mechanism of the back-propagation neural network is based on learning by example: the user gives the network examples of the desired output and the network adjusts its weights accordingly; when training is complete, the output is estimated against the desired one, called the target output, for a particular input. The back-propagation network continues to prove its efficiency in a variety of applications, solving serious real-life problems in finance, cancer recognition (Braik and Sheta, 2011), science, forecasting (Baareh et al., 2006; Sheta et al., 2015; 2018), feature extraction (Al-Batah et al., 2010), classification (Sheta et al., 2007; Hongjun et al., 1996; El-Sayyad et al., 2015), face recognition (Radha and Nallammal, 2011), fingerprint recognition (Al-Najjar and Sheta, 2008), etc. The back-propagation neural network is used in this paper to solve the software cost estimation problem. An ANN is mostly formed of three layers: input, hidden and output. The weighted sum of the input neurons forms the argument of a nonlinear activation (i.e., sigmoid) function (Baareh et al., 2006). Let x_1(p), x_2(p), ..., x_n(p) be the inputs of the network and y_1(p), y_2(p), ..., y_n(p) the required outputs, where p is the iteration number. The back-propagation algorithm proceeds as follows (El-Sayyad et al., 2015):

1. Equation 1 gives the output of hidden-layer neuron j:

   y_j(p) = sigmoid[ Σ_{i=1}^{n} x_i(p) · w_ij(p) − θ_j ]

   where n is the number of input neurons, j indexes the hidden layer, w_ij are the weights mapping the input layer to the hidden layer (and likewise w_jk map the hidden layer to the output layer) and θ_j is a threshold value.

2. Equation 2 is the sigmoid activation function:

   sigmoid(x) = 1 / (1 + e^(−x))

3. Equation 3 gives the output of output-layer neuron k:

   y_k(p) = sigmoid[ Σ_{j=1}^{m} y_j(p) · w_jk(p) − θ_k ]

   where m is the number of inputs of neuron k in the output layer.

4. Equation 4 gives the error gradient of the output layer:

   δ_k(p) = y_k(p) · [1 − y_k(p)] · e_k(p)

   where e_k(p) = y_{d,k}(p) − y_k(p) is the output-layer error (Eq. 5) and y_{d,k}(p) is the desired output.

5. Equation 6 gives the weight corrections:

   Δw_jk(p) = α · y_j(p) · δ_k(p)

   where α is the learning rate.

6. Readjust the output-layer weights using Eq. 7:

   w_jk(p + 1) = w_jk(p) + Δw_jk(p)

7. Equations 8 and 9 give the error gradient and weight corrections for the hidden layer:

   δ_j(p) = y_j(p) · [1 − y_j(p)] · Σ_k δ_k(p) · w_jk(p),   Δw_ij(p) = α · x_i(p) · δ_j(p)

8. Equation 10 gives the readjusted hidden-layer weights:

   w_ij(p + 1) = w_ij(p) + Δw_ij(p)

The structure of the back-propagation neural network is shown in Fig. 1.
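The update steps above can be sketched in code. The following is a minimal illustration of one back-propagation pass for a network with a single hidden layer; the layer sizes, initial weights and learning rate alpha are arbitrary assumptions, thresholds are omitted for brevity, and this is not the paper's exact configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # Eq. 2: sigmoid activation

def train_step(x, y_d, w_ih, w_ho, alpha=0.1):
    """One forward/backward pass for a single sample x with target y_d.

    Updates w_ih (input->hidden) and w_ho (hidden->output) in place and
    returns the squared output error before the update.
    """
    y_h = sigmoid(x @ w_ih)                        # Eq. 1: hidden outputs
    y_o = sigmoid(y_h @ w_ho)                      # Eq. 3: network outputs
    e = y_d - y_o                                  # Eq. 5: output error
    delta_o = y_o * (1 - y_o) * e                  # Eq. 4: output gradient
    delta_h = y_h * (1 - y_h) * (w_ho @ delta_o)   # hidden-layer gradient
    w_ho += alpha * np.outer(y_h, delta_o)         # Eqs. 6-7: update weights
    w_ih += alpha * np.outer(x, delta_h)           # Eqs. 9-10: update weights
    return float(np.sum(e ** 2))
```

Repeating `train_step` over the training samples until the error stops improving corresponds to the iterative learning-by-example procedure described above.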

Radial Basis Function (RBF) Networks
Radial Basis Function (RBF) networks derive from the theory of function approximation; they learn very fast and interpolate very well (Harikumar and Vijayakumar, 2013). The constructed RBF network is a feed-forward network with input, hidden and output layers. The input layer reads the inputs into the network for processing, the hidden nodes evaluate a series of radial basis functions (e.g., Gaussian functions) and the output nodes compute linear summation functions. When the network process starts, the weights of both the input-hidden and hidden-output layers are calculated. Given a data set of N points in a multi-dimensional space, every D-dimensional input vector x^p must be related to its corresponding target output t^p. The purpose is to find a function f(x) such that

   f(x^p) = t^p,   p = 1, ..., N

The weights between the hidden and output layers are determined using Equation 11:

   Σ_{j=1}^{N} w_j · φ_j(x^p) = t^p

where φ_j denotes the j-th radial basis function. Defining the vectors w = (w_1, ..., w_N)^T and t = (t^1, ..., t^N)^T and the matrix Φ with entries Φ_pj = φ_j(x^p), this can be written in matrix form as Φw = t, which finally gives w = Φ^(−1)t. This operation is shown in Fig. 2.
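The interpolation view above can be sketched as follows. As illustrative assumptions, the Gaussian basis functions are centred on the training points themselves and share a fixed width sigma; the output weights then solve the linear system Φw = t.

```python
import numpy as np

def rbf_fit(X, t, sigma=1.0):
    """Solve for output weights so that f(x_p) = t_p on the training set."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    Phi = np.exp(-d2 / (2 * sigma ** 2))                 # Gaussian design matrix
    return np.linalg.solve(Phi, t)                       # w = Phi^{-1} t

def rbf_predict(X_train, w, x, sigma=1.0):
    """Evaluate f(x) = sum_j w_j * phi_j(x) for one query point x."""
    d2 = ((x - X_train) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ w
```

Because the Gaussian kernel matrix on distinct points is positive definite, the system is solvable and the fitted network reproduces every training target exactly.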

Constructive Models
Two neural network algorithms were used in this paper: the back-propagation algorithm compared to the radial basis function network. The constructed back-propagation network, shown in Fig. 3, consists of an input layer, two hidden layers and one output layer. The input layer consists of three inputs, namely product complexity (CPLX), programmer capability (PCAP) and thousands of source lines of code (KSLOC); the first hidden layer consists of twenty fully interconnected neurons, the second hidden layer consists of ten fully interconnected neurons and the output layer consists of one output, the measured effort.
The radial basis function network consists of three layers: an input layer, one hidden layer and one output layer. The input layer consists of the three inputs mentioned above (CPLX, PCAP and KSLOC), the hidden layer consists of four fully interconnected neurons and the output layer produces one output, the estimated effort, as shown in Fig. 4.
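The two architectures can be summarized by their weight-matrix shapes. The sketch below is purely illustrative: the random initial values are placeholders, not the trained parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Back-propagation network: 3 inputs -> 20 neurons -> 10 neurons -> 1 output
bp_weights = [
    rng.normal(size=(3, 20)),   # input layer  -> first hidden layer
    rng.normal(size=(20, 10)),  # first hidden -> second hidden layer
    rng.normal(size=(10, 1)),   # second hidden -> output (estimated effort)
]

# RBF network: 3 inputs -> 4 Gaussian hidden units -> 1 output
rbf_centres = rng.normal(size=(4, 3))  # one centre per hidden unit, in input space
rbf_weights = rng.normal(size=(4, 1))  # hidden -> output (estimated effort)
```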

Data Collection
In this paper, the NASA public dataset is used. It consists of data from 60 projects (Singh and Sahoo, 2011) described by 17 attributes, but only four attributes are considered here: three inputs, product complexity (CPLX), programmer capability (PCAP) and thousands of source lines of code (KLOC), and one output, the effort, as shown in Table 1. The 60 samples are divided into 45 for training and 15 for testing. The experiment is implemented using the MATLAB neural network toolbox.
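The 45/15 split can be sketched as below. The placeholder array stands in for the real NASA table, since the paper loads the data into the MATLAB toolbox rather than Python, and whether the split is sequential or shuffled is an assumption here.

```python
import numpy as np

def split_nasa(data, n_train=45):
    """Split the 60 NASA project rows into training and testing sets."""
    return data[:n_train], data[n_train:]

# Placeholder for the real dataset: 60 rows x 4 attributes (CPLX, PCAP, KLOC, effort)
data = np.zeros((60, 4))
train, test = split_nasa(data)
```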

Evaluation Criteria
In this paper, different evaluation criteria are used to measure and compare the error between actual and estimated efforts, including the correlation coefficient (R):

   R = Σ_{i=1}^{n} (y_i − ȳ)(ŷ_i − ŷ̄) / sqrt[ Σ_{i=1}^{n} (y_i − ȳ)² · Σ_{i=1}^{n} (ŷ_i − ŷ̄)² ]

where y and ŷ are the actual and estimated efforts and n is the number of measurements used in the experiment.

Back-propagation performance: As mentioned before, the constructed back-propagation neural network has three inputs, the Product Complexity (CPLX), Programmer Capability (PCAP) and thousands of source lines of code (KLOC), two hidden layers of 20 and 10 neurons respectively, and one output layer. The correlation coefficient graph is shown in Fig. 5. The training and testing performance of the actual and estimated back-propagation efforts is shown in Fig. 6 and 7. The statistical results of the error estimation functions for the back-propagation neural network in the training and testing cases are shown in Tables 2 and 3. According to the plotted figures and the error evaluation criteria, the performance of the back-propagation neural network in both training and testing is clearly better than that of the radial basis function network.
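The correlation coefficient between actual and estimated efforts can be computed directly from Pearson's formula:

```python
import numpy as np

def correlation_coefficient(y, y_hat):
    """Pearson correlation R between actual efforts y and estimates y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    yc, hc = y - y.mean(), y_hat - y_hat.mean()       # centre both series
    return float((yc * hc).sum() / np.sqrt((yc ** 2).sum() * (hc ** 2).sum()))
```

An R close to 1 indicates that the estimated efforts track the actual efforts closely, which is how the figures referenced above are read.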
Radial Basis Function performance: The constructed radial basis function network consists of an input layer with three inputs, a hidden layer with four interconnected neurons and one output layer. The RBF correlation coefficient diagram is shown in Fig. 8. The training and testing performance of the actual and estimated radial basis function network is shown in Fig. 9 and 10. The statistical results of the error estimation functions for the training and testing cases of the radial basis function network are shown in Tables 4 and 5.
The performance of the radial basis function network was satisfactory, but not as good as that of the back-propagation neural network, according to the plotted graphs and the error evaluation criteria.

Conclusion
This paper contributes to solving the software effort and cost estimation problem by comparing two well-known neural network models. The proposed models help project managers plan, manage and avoid the risks resulting from unexpected problems and delays that may occur during the project period. A comparison between the two models, the back-propagation (BP) algorithm and the Radial Basis Function (RBF) network, was presented. For both models, training and testing cases were developed, figures showing the actual and estimated efforts were plotted and the error measurements were calculated. According to the results obtained, the performance of the back-propagation neural network in both training and testing is better than that of the radial basis function network, so the back-propagation algorithm proves, once again, its efficiency in dealing with this type of problem. Our future plan is to extend this work to cover other applications of soft computing techniques.