OPTIMIZATION OF TEST CASES BY PRIORITIZATION

Regression testing verifies that modifications made to a program do not affect the other parts of the software. It takes place in the maintenance phase and accounts for 80% of the maintenance cost, so optimizing regression testing is one of the prime concerns of software testers. Here we take advantage of the test case information available from regression testing and prioritize the test cases by the number of modified lines each one covers. The test case that covers the most modified lines has the highest priority and is executed first; the one that covers the fewest has the lowest priority and is executed last, provided the deadline has not been reached. Thus, even if testing is not finished, the maximum number of modified lines will have been covered. The prioritization is done using a genetic algorithm, which takes the test case information from regression testing as input and produces a sequence of test cases to execute such that the maximum number of modified lines is covered.


INTRODUCTION
Software testing requires resources and consumes 30-50% of the total cost of development. Under resource constraints it is impractical to repeatedly test the software by executing the complete set of test cases (Zhong, 2008). For this reason, researchers have considered various methods for reducing the cost of regression testing, including test suite minimization and regression test selection. Test suite minimization techniques lower cost by reducing a test suite to a minimal subset that maintains coverage equivalent to that of the original test suite with respect to a particular test adequacy criterion. Regression test selection reduces the cost of regression testing by selecting an appropriate subset of the existing test suite based on information about the program and its modified version (Jacob and Ravi, 2013a). Both test suite minimization and regression test selection, however, can have drawbacks (Smith, 2009). For example, although some empirical evidence indicates that, in certain cases, there is little or no loss in the ability of a minimized test suite to reveal faults in comparison with its non-minimized original, other empirical evidence shows that the fault detection capabilities of test suites can be severely compromised by minimization (Sampath, 2008). Because test case prioritization techniques do not themselves discard test cases, they avoid the drawbacks that can occur when regression test selection and test suite minimization discard test cases (Islam, 2012). Alternatively, in cases where discarding test cases is acceptable, test case prioritization can be used in conjunction with regression test selection or test suite minimization to prioritize the test cases in the selected or minimized test suite (Kapfhammer, 2007).
Huang (2010) proposed a cost-cognizant test case prioritization technique based on historical records and a genetic algorithm, and ran a controlled experiment to evaluate the effectiveness of the proposed technique. This technique, however, does not take test case similarity into account. Sabharwal (2011) proposed a technique for prioritizing test case scenarios derived from activity diagrams using the concept of a basic information flow metric and a genetic algorithm. Sabharwal (2011) also generated prioritized test cases in static testing using a genetic algorithm, applying a similar approach to prioritize test case scenarios derived from source code. Andrews and Sasikala (2012) applied a genetic algorithm to randomized unit testing to find the most suitable test cases. Mohsen FallahRad applied common genetic and bacteriological algorithms to optimize test data in mutation testing.
Science Publications JCS

PROBLEM DEFINITION
The test case prioritization problem is defined as follows. Given T, a test suite; PT, the set of permutations of T; and f, a function from PT to the real numbers, our aim is to find T' ∈ PT such that:

$(\forall T'' \in PT)\,(T'' \neq T')\;[\,f(T') \geq f(T'')\,]$

Here PT represents the set of all possible prioritizations (orderings) of T, and f is a function that, applied to any such ordering, yields an award value for that ordering. For simplicity and without loss of generality, the definition assumes that higher award values are preferable to lower ones. To measure the success of a prioritization technique in meeting its goal, we must describe the goal quantitatively; for example, we might prioritize test cases in terms of their increasing cost-per-coverage of features listed in a requirements specification. Depending upon the choice of f, the test case prioritization problem may be intractable. It is also possible to integrate test case prioritization with regression test selection or test suite minimization techniques (Jacob and Ravi, 2013b). We restrict our attention to general test case prioritization applied to regression testing, independent of regression test selection and test suite minimization (Canessane and Srinivasan, 2013; Andrews and Sasikala, 2012).
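As a minimal sketch of this definition, the Python snippet below enumerates every ordering of a tiny suite and returns one that maximizes f. The test names, the coverage counts and the award function are hypothetical, chosen only for illustration, and exhaustive enumeration is feasible only for very small suites.

```python
from itertools import permutations

def best_ordering(tests, f):
    """Return an ordering T' such that f(T') >= f(T'') for every ordering T''."""
    return max(permutations(tests), key=f)

# Hypothetical data: number of modified lines covered by each test.
covered = {"A": 1, "B": 3, "C": 2}

def award(ordering):
    """Hypothetical award function f: earlier positions carry larger weights."""
    n = len(ordering)
    return sum((n - i) * covered[t] for i, t in enumerate(ordering))

best = best_ordering(list(covered), award)
print(best, award(best))
```

With three tests there are only 3! = 6 orderings to score; the search space grows factorially with the suite size, which is why a heuristic search is attractive.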

GENETIC ALGORITHM
Genetic Algorithms (GAs) are search methods based on the principles of natural selection and genetics. GAs encode the decision variables of a search problem into finite-length strings over an alphabet of a certain cardinality.
The strings, which are candidate solutions to the search problem, are referred to as chromosomes, the symbols of the alphabet are referred to as genes and the values of genes are called alleles (Sabharwal, 2011). Unlike traditional search methods, genetic algorithms rely on a population of candidate solutions. Once the problem is encoded in a chromosomal manner and a fitness measure for discriminating good solutions from bad ones has been chosen, we can start to evolve solutions to the search problem using the following steps.

Initialization
The initial population of candidate solutions is usually generated randomly across the search space.

Evaluation
Once the population is initialized the fitness values of the candidate solutions are evaluated.

Selection
Selection allocates more copies of those solutions with higher fitness values and imposes the survival-of-the-fittest mechanism on the candidate solutions.

Recombination
Recombination combines parts of two or more parental solutions to create new, possibly better solutions (i.e., offspring).

Mutation
While recombination operates on two or more parental chromosomes, mutation locally but randomly modifies a solution.

Replacement
The offspring population created by selection, recombination and mutation replaces the original parental population. The steps from evaluation to replacement are repeated until a terminating condition is met.

Generate Population
Initially, the population is randomly selected and encoded. Each chromosome represents a possible solution to the problem.

Evaluate the Fitness
The fitness of a chromosome is defined by the objective function, which generates a real number from the input chromosome. Based on this number, two or more chromosomes can be compared.

Apply Selection
In general, selection depends on the fitness value of the chromosome. The chromosome with the higher or the lower value is selected, depending on the problem definition.

Apply Crossover and Mutation
Parents are chosen and randomly combined. This technique for generating new chromosomes is called crossover.
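The cycle described in this section can be sketched as a generic loop. The version below is an illustrative Python skeleton and not the authors' implementation: the function name `evolve` and its parameters are hypothetical, the operators are passed in as functions, fitness is assumed to be non-negative with a positive total, and higher fitness is assumed to be better.

```python
import random

def evolve(population, fitness, crossover, mutate, generations, rng):
    """Run the evaluate/select/recombine/mutate/replace cycle a fixed number of times."""
    for _ in range(generations):
        scores = [fitness(c) for c in population]                # evaluation
        parents = rng.choices(population, weights=scores,
                              k=len(population))                 # fitness-proportional selection
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):           # recombination, pair by pair
            offspring.extend(crossover(a, b, rng))
        if len(offspring) < len(population):                     # odd population size
            offspring.append(parents[-1])
        population = [mutate(c, rng) for c in offspring]         # mutation, then replacement
    return population

# Usage with placeholder operators that leave chromosomes unchanged.
demo = evolve([[1, 2, 3], [3, 2, 1], [2, 1, 3], [1, 3, 2], [2, 3, 1]],
              fitness=lambda c: 1.0,
              crossover=lambda a, b, rng: (a, b),
              mutate=lambda c, rng: c,
              generations=3,
              rng=random.Random(0))
```

A fixed generation count is used as the terminating condition here, matching the fixed number of iterations used later in the paper.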

TEST CASE OPTIMIZATION USING GA
Suppose a program P has a test suite T. If a modification is made to P, yielding a modified program P', then to test P' one can generate a prioritized sequence of test cases from T on the basis of the lines of code modified (Binkley and College, 1997).

Fitness Function
The following fitness function will be used.
Fitness value (F) = Σ {order × (number of modified lines covered by the test case)}, summed over the test cases in the chromosome.
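A minimal Python sketch of this fitness function follows. It assumes, as stated in the GA Initialization section, that the test case executed first has order n and that the order decreases by one for each later position. The counts in `NML` are the numbers of modified lines listed for T1 to T12 in the paper's worked example; the function and variable names are illustrative.

```python
# Number of modified lines covered by each test case (from the worked example).
NML = {"T1": 2, "T2": 4, "T3": 1, "T4": 3, "T5": 2, "T6": 2,
       "T7": 5, "T8": 2, "T9": 4, "T10": 1, "T11": 0, "T12": 2}

def fitness(chromosome, nml=NML):
    """Sum of order * modified-lines-covered, with order n for the first test."""
    n = len(chromosome)
    return sum((n - pos) * nml[test] for pos, test in enumerate(chromosome))

identity = [f"T{i}" for i in range(1, 13)]                      # T1, T2, ..., T12
by_coverage = sorted(identity, key=lambda t: NML[t], reverse=True)

print(fitness(identity))     # 196
print(fitness(by_coverage))  # 236
```

Under these assumptions, ordering the tests by descending coverage gives the largest value the function can take for this data (236), which puts the average fitness values reported later in the paper in context.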

Crossover
Here one can use one-point crossover with crossover probability Pc = 0.33.
Crossover probability = fitness of the chromosome / Σ fitness of all chromosomes.

Mutation
Here we use a mutation probability Pm = 0.2, which means that 20% of the genes within a chromosome will be mutated. Table 1 shows which test case covers which lines of code. This is helpful later: once the modified lines are known, we can compare them with this information and determine which test case covers the most modified lines of code (Sastry, 2007). Assume that lines 5, 8, 10, 15, 20, 23, 28 and 35 are modified; the modified lines of code covered by each test case are shown in Table 2. It also shows the test cases that cover no modified lines at all, even though they cover other lines. We restrict prioritization to the number of modified lines each test case covers, as shown in Table 3.
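The counting step can be sketched in Python as a set intersection between each test case's covered lines and the modified lines. The modified line numbers are those assumed in the text; the per-test coverage sets and the test names TA, TB and TC are hypothetical stand-ins, since the contents of Table 1 are not reproduced here.

```python
# Modified lines assumed in the text.
MODIFIED = {5, 8, 10, 15, 20, 23, 28, 35}

# Hypothetical coverage data: test case -> set of line numbers it executes.
coverage = {
    "TA": {1, 2, 5, 8},
    "TB": {3, 4, 6},
    "TC": {10, 15, 20, 40},
}

def modified_lines_covered(coverage, modified):
    """Map each test case to the number of modified lines it covers."""
    return {test: len(lines & modified) for test, lines in coverage.items()}

counts = modified_lines_covered(coverage, MODIFIED)
# Test cases that cover lines but none of the modified ones get a count of 0.
uncovered = [test for test, count in counts.items() if count == 0]
print(counts, uncovered)
```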
Now we apply the genetic algorithm to this data. We generate random numbers without repetition; each such pattern of random numbers represents a chromosome, giving chromosomes e1, e2, … and so on. We then find the fitness of each chromosome, compute its probability, perform selection and decide which chromosomes are taken into the population. The first random number selects chromosome 1, which is represented as:

Test case                 T1  T2  T3  T4  T5  T6  T7  T8  T9  T10  T11  T12
Number of modified lines   2   4   1   3   2   2   5   2   4    1    0    2

because the selected random number lies between 0 and 0.342. The second random number selects chromosome 2, because it lies between 0.342 and 0.671. The third random number again selects chromosome 1, because it lies between 0 and 0.342. These chromosomes form the mating pool. We then apply one-point crossover to the selected chromosomes to generate new offspring. If the crossover probability is 0.3, we select two chromosomes from the offspring and one from the parents, based on the fitness function value. This process is repeated for a fixed number of iterations; after repeating the procedure multiple times we obtain a nearly optimal solution, as shown in Table 4.

GA Initialization
In this module the sample population is initialized. It is generated randomly. A population is a collection of chromosomes, and each chromosome consists of genes.
Here order is the priority of the test case: if the test case is to be executed first, its order is n, where n is the number of test cases. NML is the number of modified lines. E1, E2, … are the chromosomes. To generate each random pattern we use rand() from the C standard library (stdlib). If K is the random number generated, it must satisfy K ≤ N, and no number may repeat. The total number of possible orderings is N × (N-1) × (N-2) × (N-3) × … × 1 = N!, which is very large when N is large; the genetic algorithm greatly reduces the cost of searching these possibilities.
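The initialization step can be sketched as follows. The paper uses C's rand(); this illustrative Python version instead draws each chromosome with `random.Random.sample`, which returns the test cases in random order without repetition. The function names, the population size of five and the seed are assumptions made for the example.

```python
import math
import random

def random_chromosome(tests, rng):
    """Return a random ordering of the test cases with no repetition."""
    return rng.sample(tests, k=len(tests))

def init_population(tests, size, rng):
    """Build a population of independently generated random orderings."""
    return [random_chromosome(tests, rng) for _ in range(size)]

tests = [f"T{i}" for i in range(1, 13)]
population = init_population(tests, size=5, rng=random.Random(42))

# Size of the search space for 12 test cases: 12! orderings.
print(math.factorial(len(tests)))  # 479001600
```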

GA Evaluation
Once the population is initialized, the fitness values of the candidate solutions are evaluated. This is where we attempt to identify the most successful members of the population, and typically we accomplish this using a fitness function (Guillaumier, 2003):

$F = \sum_{i=1}^{n} \text{order}_i \times \text{NML}_i$
Taking the order and the number of modified lines of each test case in the pattern held by a chromosome gives the fitness value of that chromosome. For instance, in the first chromosome e1, test case 5 is scheduled to be executed first and test case 4 second. For the first test case we take the value 5 and use it to index the matrix, which gives the order and the number of modified lines of that test case in columns one and two. The product of the order and the number of modified lines for test case 5 is 8 × 6 = 48; for test case 4 it is 63. We then add 48 + 63, and the process continues to the end of the test cases, which gives the fitness of chromosome e1. We calculate the fitness in this way for e1-e5.

GA Selection
In the selection process we typically call the fitness function to identify the individuals used to create the next generation. We calculate the probability and the cumulative probability of each chromosome in the population by the formulas:

$p_i = \dfrac{F_i}{\sum_{j} F_j}, \qquad q_i = \sum_{k=1}^{i} p_k$

where $F_i$ is the fitness of chromosome i. After finding the cumulative probabilities, we use the roulette wheel technique to find the parents, so that the crossover and mutation operations can be performed.
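Roulette wheel selection over cumulative probabilities can be sketched as below. The fitness values are hypothetical and chosen so that the arithmetic is exact; the function names are illustrative. A random number r in [0, 1) selects the chromosome whose cumulative range contains it, in the same way that the ranges 0-0.342 and 0.342-0.671 are used in the worked example.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def cumulative_probabilities(fitnesses):
    """Selection probability F_i / sum(F), accumulated into running totals."""
    total = sum(fitnesses)
    return list(accumulate(f / total for f in fitnesses))

def roulette_pick(cumulative, r):
    """Index of the chromosome whose cumulative range contains r (0 <= r < 1)."""
    # The min() guards against rounding leaving the last total slightly below 1.
    return min(bisect_right(cumulative, r), len(cumulative) - 1)

fitnesses = [1, 1, 2]                        # hypothetical fitness values
cum = cumulative_probabilities(fitnesses)    # [0.25, 0.5, 1.0]

rng = random.Random(7)
mating_pool = [roulette_pick(cum, rng.random()) for _ in range(3)]
print(cum, mating_pool)
```

Fitter chromosomes occupy wider ranges and are therefore picked more often, and the same chromosome can enter the mating pool more than once.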

GA Crossover
Recombination combines parts of two or more parental solutions to create new, possibly better solutions. Consider that the following two chromosomes (E2 and E4) were selected as the fittest among the five chromosomes. The execution sequences of these two chromosomes are:

E2  T8  T7  T3  T5   T1   T12  T4  T11  T6  T10  T2  T9
E4  T3  T4  T1  T10  T12  T11  T9  T8   T7  T2   T5  T6

In one-point crossover, one generates a random number smaller than the number of test cases and takes that number as the crossover point. After crossover the chromosomes become:

E2  T8  T7  T3  T5   T1   T12  T4  T11  T7  T2   T5  T6
E4  T3  T4  T1  T10  T12  T11  T9  T8   T6  T10  T2  T9
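The exchange shown above can be reproduced with a short Python sketch. The crossover point of 8 is not stated in the text; it is inferred here from the listed offspring, which keep their first eight genes and swap the remaining four. The function names are illustrative.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two chromosomes after the given crossover point."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def is_permutation(chromosome, tests):
    """True if every test case appears exactly once in the chromosome."""
    return sorted(chromosome) == sorted(tests)

E2 = "T8 T7 T3 T5 T1 T12 T4 T11 T6 T10 T2 T9".split()
E4 = "T3 T4 T1 T10 T12 T11 T9 T8 T7 T2 T5 T6".split()

child_e2, child_e4 = one_point_crossover(E2, E4, point=8)
print(" ".join(child_e2))
print(" ".join(child_e4))
print(is_permutation(child_e2, E2), is_permutation(child_e4, E4))
```

Plain one-point crossover on orderings can repeat some test cases and drop others, as happens with T7 and T5 in the first offspring here. If every test case must run exactly once, a repair step or a permutation-preserving operator would be needed; the text does not describe one.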

GA Mutation
While recombination operates on two or more parental chromosomes, mutation locally but randomly modifies a solution. Consider the chromosomes on which crossover has already been performed, and suppose the mutation probability is 0.16. One then generates two random numbers and changes the structure at those positions. If 3 and 8 are the numbers generated, the genes at index 3 and index 8 are swapped as the mutation, and the chromosomes become:

E2  T8  T7  T11  T5   T1   T12  T4  T3  T7  T2   T5  T6
E4  T3  T4  T8   T10  T12  T11  T9  T1  T6  T10  T2  T9

Mutation is believed to improve fitness if it is done once in a certain number of iterations rather than in every iteration.
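The swap described above can be sketched as follows, using 1-based positions to match the text. The input is the first offspring from the crossover example; the function name is illustrative.

```python
def swap_mutation(chromosome, i, j):
    """Return a copy with the genes at 1-based positions i and j exchanged."""
    mutated = list(chromosome)
    mutated[i - 1], mutated[j - 1] = mutated[j - 1], mutated[i - 1]
    return mutated

after_crossover = "T8 T7 T3 T5 T1 T12 T4 T11 T7 T2 T5 T6".split()
mutated = swap_mutation(after_crossover, 3, 8)
print(" ".join(mutated))
```

Because a swap only exchanges two positions, it never changes which test cases a chromosome contains, only their order.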

Evaluation Operation
Test info is an array that stores all the necessary information about the test cases and represents the chromosomes. Fitness is a variable that stores the fitness value of a chromosome. Fitar is an array that stores the fitness value of each chromosome.

PERFORMANCE ANALYSIS
For performance analysis we take some random chromosomes, apply the fitness function and check the average fitness of the chromosomes. In the beginning, that is, in the first generation shown in Table 5, the average fitness value of the chromosomes is very poor. To improve the average fitness we use the genetic algorithm, whose main postulate is "survival of the fittest"; the algorithm mimics nature and produces a near-optimal solution. Among the many operations available in a genetic algorithm, crossover and mutation are the two that are implemented, and together they produce a fairly good outcome. The fitness values of the resulting chromosomes are shown in Table 6.

The average fitness value of the chromosomes comes out to be 190.6. Using these fitness values, we search for the two best parents and perform crossover a fixed number of times; for instance, with five iterations we get the following output. The fitness values of the chromosomes are shown in Table 7.

CONCLUSION
Here the genetic algorithm is applied to the test cases together with their execution history. We used a fitness function that gives a higher value if a test case covers more lines of code, and a test case with a higher fitness value is given higher priority in the ordered sequence. When the genetic algorithm is applied a large number of times, we get a nearly optimal solution. The input given to the genetic algorithm is a set of chromosomes, and each chromosome is a set of test cases with their execution history. Below is an instance of a chromosome:

T1→T2→T3→T4→T5→T6→T7→T8→T9→T10→T11→T12
We consider a random execution sequence produced by the random number generator function available in the C standard library (stdlib); each sequence so generated becomes one chromosome. We use five chromosomes, compute the fitness of each chromosome and then find the average fitness value. In the first generation the average fitness value comes out to be 190.6. We use five iterations as a fixed terminating condition; after the fifth iteration the average fitness value of the population becomes 205.6, much better than in the first generation.

This means that the final population has a set of chromosomes whose execution sequences are close to the optimal solution, as shown in Fig. 1. We considered an arbitrary terminating value; we can perform analysis on benchmark problems and derive a terminating criterion that gives the smallest number of iterations guaranteeing a near-optimal solution.