Feature Subset Selection for Hot Method Prediction using Genetic Algorithm wrapped with Support Vector Machines

Problem statement: All compilers have simple profiling-based heuristics to identify and predict program hot methods and to make optimization decisions. The major challenge in profile-based optimization is the overhead it incurs. The aim of this work is to perform feature subset selection using Genetic Algorithms (GA) to improve and refine the machine-learnt static hot method prediction technique and to compare the performance of the new models against the simple heuristics. Approach: The relevant features for training the predictive models are extracted from an initial set of ninety randomly selected static program features, with the help of a GA wrapped with a predictive model built using the Support Vector Machine (SVM), a Machine Learning (ML) algorithm. Results: The GA-generated feature subsets, containing thirty and twenty-nine features respectively for the two predictive models, when tested on MiBench predict Long Running Hot Methods (LRHM) and Frequently Called Hot Methods (FCHM) with respective accuracies of 71% and 80%, an increase of 19% and 22%. Further, inlining of the predicted LRHM and FCHM improves program performance by 3% and 5%, as against 4% and 6% with the default heuristics of the Low Level Virtual Machine (LLVM). When Intra-Procedural Optimizations (IPO) are performed on the predicted hot methods, this system offers a performance improvement of 5% and 4%, as against 0% and 3% by the LLVM default heuristics, on LRHM and FCHM respectively. However, we observe an improvement of 36% in certain individual programs. Conclusion: Overall, the results indicate that feature reduction derived from the GA wrapped with SVM improves the hot method prediction accuracy and that the technique of hot method prediction based optimization is potentially useful.


INTRODUCTION
Compiler optimizations are most effective when targeted at the hot methods of the input program. Method hotness, determined by execution time and call frequency, is still detected and predicted by profiling in both dynamic and static optimization systems. Although profiling is accurate, it incurs considerable overhead, which impedes program speed. The need to improve the accuracy of the hot method predictive models calls for a focus on feature subset selection, since feature selection greatly influences the performance of machine-learnt predictive models. The main aim of this work is to implement the machine-learnt static hot method prediction technique using feature subsets derived by Genetic Algorithms (GA). By hot methods, we mean the long running and frequently called program segments that form the vital targets for various compiler optimization techniques (Sandra and Valli, 0000).
The relevant features are extracted from an initial set of ninety randomly selected static program features, using a GA (Koza, 1990) wrapped with the predictive model based on the Support Vector Machine (SVM) (Vapnik, 1997), an ML algorithm. The genetic algorithm is used in this work because it has proven to be an effective search tool. The evaluation of features is based on the feedback obtained from the predictive models. Hence, the time required to converge on the final feature subset depends on the number of generations chosen in the GA.
The model's ability to achieve performance improvement is investigated by optimizing the predicted hot methods offline. The optimizations applied are method inlining and Intra-Procedural Optimizations (IPO) like constant propagation and loop unrolling. The impact of optimizing the predicted hot methods on program performance is evaluated on UTDSP and MiBench benchmark programs. The results obtained are compared against LLVM's default optimization heuristics.

Related work:
The application areas of the GA span a wide spectrum of problem-solving domains, such as supply chain management (Radhakrishnan et al., 2009), the input allocation problem (Madan et al., 2010) and various feature selection problems. Several researchers (Vafaie and Jong, 1992; Kohavi and John, 1997; Yang and Honavar, 1998; Pernkopf and O'Leary, 2001; Fröhlich and Chapelle, 2003; Yu and Cho, 2006; Huang and Wang, 2006; Faraoun and Rabhi, 2007; Rajavarman et al., 2007; Ramirez and Puiggros, 2007; Xia et al., 2009) have employed the genetic algorithm as a search tool for feature subset selection. All these investigations have confirmed that the GA-generated feature subsets perform better than the initial universal set or the full feature set.
The GA is particularly useful when the search space is large. Hence, the algorithm has found wide application in compiler research. Cooper et al. (1999) have used the GA to find the best optimization sequence that reduces code size. Cavazos and O'Boyle (2005) have used the GA to tune dynamic compiler inlining heuristics. Li et al. (2008) and Zhuo et al. (2008) have used GA-based feature subset selection for optimizing the SVM parameters. In our present work on hot method prediction, the GA is used only as a feature selection algorithm and the prediction models use the default SVM parameters. Sandra and Valli (2008a; 2008b) have developed a basic hot method prediction model to predict the call frequency, whereas in this work two predictive models are built (Sandra and Valli, 0000): one based on call frequency and another on the time spent in a method. In a previous work on hot method prediction (Sandra and Valli, 0000), the authors deal with the construction of an effective feature set from a full set of ninety randomly chosen static features using a 'knock-out' algorithm. Their model for the long running hot methods, guided by twenty-nine static features, provides 68% prediction accuracy and the one for the frequently called hot methods yields 61% prediction accuracy on UTDSP and MiBench benchmark programs when trained with ten features.
The GA has been used in compiler-based feature generation problems (Leather et al., 2009), where each feature is a sentence in a grammar, for the purpose of loop unrolling optimization. In the present study, we use the GA for selecting features specific to the prediction of hot methods and then apply inlining and intra-procedural optimizations to evaluate the effects of prediction. Stephenson et al. (2003) have used the GA to automatically search the solution space of the priority function, while we have used it for feature reduction.

MATERIALS AND METHODS
In the construction of the predictive models, an initial set of ninety static program features (Sandra and Valli, 0000) has been used, for training and testing the classifiers. The SVM classifier is trained offline with the training dataset taken from the UTDSP and MiBench benchmark suites to predict hot methods of a new untrained program. Selecting the most relevant feature for a particular learning problem is a key challenge because any inappropriate feature included in the final set is bound to misguide the predictive models. Feature reduction from the full feature set is performed using the standard GA.
Introduction to the genetic algorithm: The genetic algorithm (Koza, 1990) is a powerful problem solving strategy that is widely used as a search tool for feature subset selection in any ML based classification problem. It works on the central evolutionary principle of the "survival of the fittest". Evolution is a population phenomenon and its forces operate on the individual's phenotypes that are manifestations of their genetic makeup called genotype. Based on their contribution to the individual's reproductive fitness they are either preserved or rejected during the selection process. The adaptive value of the phenotypes is influenced by the random variations introduced by the regular gene recombination effected by crossovers in chromosome segments and to a small extent the gene alterations introduced by point mutations. Individuals that are adaptively superior to others are selected to be the parents of the next generation. Over many generations of such progressive adaptation operating under selection pressure, a population that is far superior to the initial one appears. A GA is a programming technique that mimics this evolutionary mechanism to evaluate and select the best digital individual from among a pool of randomly generated candidates or solutions.
The crossover operator generates new offspring from parents by interchanging the genes between the parents at the crossover point. Crossovers can be single point, two point or homologue. Mutation alters the genes at random points to generate new offspring. Figure 1 shows these genetic operations.
Genetic algorithm in feature subset selection: An initial population of solutions is generated wherein each individual is represented by a chromosome. The feature vector consisting of a set of ninety randomly selected static program features is represented as a bitstring in the chromosome. Every bit in the chromosome is either '0' or '1'. The f_i in Fig. 2 represents the mask value of the i-th feature. A '1' includes the i-th feature and a '0' excludes it. The number of '1's in the bitstring represents the number of features selected. These bitstrings constitute the genotype, which must be converted to its phenotype to evaluate its fitness value. The phenotype of the genes in the chromosome is the feature value extracted from each method in a benchmark program. For each individual in the initial population, the genotype of the chromosome is translated into its phenotype. That is, the feature vectors for the training data set are created by extracting the individual feature values of each method from the set of training benchmark programs. The testing data set is also constructed using the same chromosome from each method in the test benchmark programs.
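The bitstring encoding and the two genetic operators described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of single-point crossover and the per-bit mutation rate are assumptions made for the example; only the chromosome length of ninety comes from the text.

```python
import random

CHROMO_LEN = 90  # size of the full static feature set, as stated in the text


def random_chromosome(rng):
    """Genotype: one mask bit per feature (1 = include, 0 = exclude)."""
    return [rng.randint(0, 1) for _ in range(CHROMO_LEN)]


def apply_mask(chromosome, feature_values):
    """Phenotype: keep only the feature values whose mask bit is 1."""
    return [v for bit, v in zip(chromosome, feature_values) if bit == 1]


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at a randomly chosen crossover point."""
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate` (assumed scheme)."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]
```

With this encoding, `sum(chromosome)` gives the number of selected features, and `apply_mask` would be applied to the feature values of every method in both the training and the test programs.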
A predictive model is constructed based on the ML-based SVM algorithm, to predict hot methods. The prediction accuracy obtained is fed to the GA to evaluate the fitness of the chromosome. The GA uses a metric called a fitness function that evaluates the fitness of the bit string. The fitness criterion is designed on the basis of the hot method prediction accuracy and the number of features and is calculated using the formula given in Eq. 1. The w_a in Eq. 1 represents the weight associated with the prediction accuracy and w_n is the weight associated with the number of features. Those individuals which exhibit a higher fitness value than others are passed on to the next generation. In our search strategy, a high fitness value is attributed to the chromosome when the prediction accuracy is high with a small number of features. The weights are varied across different runs and finally set to 50% for both w_a and w_n. CHROMO_LEN is the total number of static features used in this work, i.e., ninety. The f_i is the feature vector in bitstring form as given in Fig. 2.
Thus, the GA principle of the "survival of the fittest" is applied to retain the individuals with a high fitness value as the "elitism" of the population to constitute the next generation. The fittest chromosomes, representing the evolving individuals, survive the selection procedure. Two terminating conditions are used to stop the evolution process. One of them is the maximum fitness value and the other is the number of generations. If a terminating condition is reached, the evolution process stops and the individual with the highest fitness value is returned as the best solution. Otherwise, the evolution process continues with the two genetic operators, namely, crossover and mutation. The two individuals with the highest fitness values are chosen for the crossover operation to produce two offspring. Crossover points are chosen randomly. Mutation is also decided randomly and is applied to the two offspring in the new generation. The process is repeated for the new generations of the population. The procedure for feature subset selection using GA wrapped with SVM is given in Fig. 3.
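The exact form of Eq. 1 is not reproduced in this text, so the following is only one plausible shape for such a fitness function, given as an assumption: a weighted sum that rewards high prediction accuracy and a small number of selected features. The function name and the normalisation by the chromosome length are illustrative choices, not the authors' formula.

```python
def fitness(accuracy, chromosome, w_a=0.5, w_n=0.5):
    """Assumed weighted fitness, NOT the paper's Eq. 1.

    accuracy   -- prediction accuracy of the model trained on the subset, in [0, 1]
    chromosome -- list of 0/1 mask bits, one per feature
    w_a, w_n   -- weights for accuracy and for the feature-count term
    """
    n_selected = sum(chromosome)
    # Fraction of features left out: larger when the subset is smaller.
    sparsity = 1.0 - n_selected / len(chromosome)
    return w_a * accuracy + w_n * sparsity
```

Under this assumed form, two subsets with equal accuracy are ranked by size, and two subsets of equal size are ranked by accuracy, which matches the behaviour described in the text.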
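The evolution loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: `evaluate` stands for the wrapped step that would train and test the SVM on the masked features, and `toy_fitness` is a hypothetical stand-in used only to make the example runnable. The population size, mutation rate and replacement of the two weakest individuals are likewise assumptions, not parameters from Table 1.

```python
import random


def evolve(evaluate, chromo_len, pop_size, max_generations,
           target_fitness, mutation_rate, rng):
    """Return the fittest bitstring found; `evaluate` maps a bitstring to a fitness."""
    population = [[rng.randint(0, 1) for _ in range(chromo_len)]
                  for _ in range(pop_size)]
    for _ in range(max_generations):
        # Score each individual once; in the wrapped setting each call would
        # train and test a classifier, so this is the expensive step.
        scored = sorted(((evaluate(c), c) for c in population),
                        key=lambda pair: pair[0], reverse=True)
        ranked = [c for _, c in scored]
        if scored[0][0] >= target_fitness:  # terminating condition: fitness reached
            return ranked[0]
        # Crossover between the two fittest individuals at a random point.
        point = rng.randint(1, chromo_len - 1)
        a, b = ranked[0], ranked[1]
        children = [a[:point] + b[point:], b[:point] + a[point:]]
        # Random point mutation applied to the two offspring.
        children = [[1 - bit if rng.random() < mutation_rate else bit
                     for bit in child] for child in children]
        # Elitism: the fittest individuals survive, the two weakest are replaced.
        population = ranked[:pop_size - 2] + children
    # Terminating condition: generation limit reached.
    return max(population, key=evaluate)


def toy_fitness(chromosome):
    """Hypothetical stand-in for the SVM-based fitness (first five bits are 'relevant')."""
    relevant = sum(chromosome[:5]) / 5.0
    sparsity = 1.0 - sum(chromosome) / len(chromosome)
    return 0.5 * relevant + 0.5 * sparsity


best = evolve(toy_fitness, chromo_len=20, pop_size=10, max_generations=50,
              target_fitness=0.95, mutation_rate=0.05, rng=random.Random(1))
```

Because the fittest individuals are always carried over, the best fitness in the population never decreases from one generation to the next.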

Evaluation of hot method prediction accuracy:
To test and evaluate the accuracy of our prediction models, we use the standard 'leave-one-out' methodology on a subset of programs from the MiBench (Guthaus et al., 2001) and UTDSP (Lee, 1998) benchmark suites that compile successfully in the Low Level Virtual Machine (LLVM) (Lattner and Adve, 2004).
We use the following evaluation metrics (Sandra and Valli, 0000) to measure the performance of the predictive models. Total Prediction Accuracy (TPA) is the ratio of the number of correct predictions of both the hot and cold methods to the total number of methods in the program. The Hot Method Prediction Accuracy (HMPA) is defined as the ratio of the number of predicted hot methods to the total number of methods that are actually hot. Biased Hot Method Prediction Accuracy (BHMPA) is the ratio of the number of predicted hot methods to the total number of methods that are actually hot. Bias can be either 'hot' or 'cold': the ML-based optimizing compiler optimizes all the methods in the program when the model is 'hot biased' and optimizes nothing when the model is 'cold biased'. The bias factor is calculated using Eq. 2. To ascertain that the prediction values are accurate, we eliminate the bias using Eq. 3. A Hot Method Threshold (HMT) is set to find the actual number of methods that are hot in a program, and an HMT of 50% is arbitrarily fixed for the evaluation of the predictive models. In the case of the Long Running Hot Methods (LRHM) predictive model, the 'gprof' tool is used to determine the execution time of each method, while profiling is used to find the call frequency of the methods for the Frequently Called Hot Methods (FCHM) predictive model during the training phase. The top 50% of the methods in both the models are designated as hot and assigned the label (+1). The remaining methods are cold and are labeled (-1).
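A leave-one-out evaluation over benchmark programs can be sketched as below. The sketch assumes that one whole program is held out per round, which is how the text describes predicting hot methods of a new, untrained program; the helper names and the `train_and_score` callback are hypothetical.

```python
def leave_one_out_splits(programs):
    """Yield (training_programs, held_out_program) pairs, one per program."""
    for i, held_out in enumerate(programs):
        training = programs[:i] + programs[i + 1:]
        yield training, held_out


def average_score(programs, train_and_score):
    """Average the score returned by `train_and_score(training, held_out)`."""
    scores = [train_and_score(training, held_out)
              for training, held_out in leave_one_out_splits(programs)]
    return sum(scores) / len(scores)


# Placeholder program names, used only for illustration.
splits = list(leave_one_out_splits(["prog_a", "prog_b", "prog_c"]))
```

In the wrapped setting, `train_and_score` would train the classifier on the methods of the training programs and return the prediction accuracy on the held-out program.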
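The labeling step and two of the metrics above can be sketched as follows. Several details are assumptions: HMPA is read here as the fraction of actually hot methods that the model also predicts as hot, the hot-method count is rounded down for odd-sized programs, and the function and variable names are hypothetical. The bias correction of Eq. 2 and Eq. 3 is not sketched because those equations are not reproduced in this text.

```python
def label_hot_methods(measurements, hmt=0.5):
    """Label the top `hmt` fraction of methods +1 (hot) and the rest -1 (cold).

    measurements -- dict mapping method name to its profiled value
                    (execution time for LRHM, call count for FCHM)
    """
    ranked = sorted(measurements, key=measurements.get, reverse=True)
    n_hot = int(len(ranked) * hmt)  # rounding down is an assumption
    return {m: (1 if i < n_hot else -1) for i, m in enumerate(ranked)}


def total_prediction_accuracy(actual, predicted):
    """TPA: correct predictions (hot and cold) over all methods."""
    correct = sum(1 for m in actual if predicted[m] == actual[m])
    return correct / len(actual)


def hot_method_prediction_accuracy(actual, predicted):
    """HMPA (assumed reading): hot methods predicted hot over all actually hot methods."""
    hot = [m for m in actual if actual[m] == 1]
    if not hot:
        return 0.0
    hits = sum(1 for m in hot if predicted[m] == 1)
    return hits / len(hot)


# Made-up numbers for illustration only.
actual = label_hot_methods({"a": 10.0, "b": 5.0, "c": 1.0, "d": 0.0})
predicted = {"a": 1, "b": -1, "c": -1, "d": 1}
```

Here `actual` marks methods `a` and `b` as hot; the prediction gets `a` and `c` right, so both TPA and HMPA evaluate to one half under the assumed definitions.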

Optimization effects:
Our goal is to show that the ML-based hot method prediction technique could be a viable alternative to the simple heuristics for deciding when to apply inlining and Intra-Procedural Optimizations (IPO) to a new program. To assess the impact of the new approach on program performance, we compare the execution time of programs whose predicted hot methods are selectively optimized against that of programs optimized with LLVM's default optimization heuristics.

RESULTS
Genetic Algorithm Derived Feature Subsets: Based on the parameters given in Table 1, the GA is used to derive the bit string. The bit string is interpreted as a feature subset vector. If bit position i is '1', then the i-th feature is chosen. All the features whose corresponding bit string positions are '0' are not included in the feature subset.
The fitness function uses the predictive model built by SVM to calculate the prediction accuracy of the feature subset. For each derived bit string, a predictive model is built and the Hot Method Prediction Accuracy (HMPA) is calculated on each UTDSP benchmark program using the standard 'leave-one-out' method. The fitness function is the average HMPA of all the UTDSP benchmark programs. The number of features, i.e., the number of '1's in the bit string, is also used in the fitness function. Figures 4 and 5 represent the derived feature vectors for the LRHM and the FCHM predictive models with their respective thirty and twenty-nine features. It is found that twenty-one features are common to the two predictive models and only nine and eight features are unique to the LRHM and FCHM predictive models respectively. Table 2 gives the static feature subsets for the two predictive models generated by GA.

Hot method prediction:
The GA-derived feature subsets of thirty and twenty-nine features are used in building the LRHM and FCHM predictive models. Table 3 presents the HMPA obtained from the UTDSP and the MiBench benchmark programs. The two benchmark suites are limited to programs that compile successfully in the LLVM compiler infrastructure.

Results of optimization:
The effects of inlining and IPO using constant propagation on the predicted LRHM and FCHM are evaluated on the MiBench and UTDSP benchmark suites and the results are presented in Table 4. The IPO performed on the predicted LRHM and FCHM achieves an overall improvement of 5% and 4% respectively, as against 0% and 3% using LLVM's default set of optimization heuristics. However, inlining of LRHM and FCHM achieves a 3% and 5% improvement in program execution speed, as against 4% and 6% seen with LLVM's default heuristics. Despite this small decrease in the case of inlining, certain individual programs such as 'latnrm' and 'susan' benefit. For instance, Table 4 shows that 'latnrm' has a speedup of 13% when its LRHM are inlined and a speedup of 6% when its FCHM are inlined.

DISCUSSION
In a previous work (Sandra and Valli, 0000), the authors have demonstrated that the predictive models for LRHM and FCHM trained with the full set of ninety features are capable of achieving prediction accuracies of 79% and 38% on the UTDSP benchmark suite and 52% and 58% on MiBench. In the present study, with GA-generated feature subsets of thirty and twenty-nine features used to train the LRHM and FCHM models, the accuracies are 81% and 71% respectively on UTDSP and 71% and 80% on MiBench. This is an improvement of 8% and 29% in predicting LRHM and FCHM, respectively, over the models trained with the full feature set. In another approach (Sandra and Valli, 0000), where a 'knock-out' algorithm is implemented in order to eliminate irrelevant features, the LRHM and FCHM models have accuracies of 68% and 61%. It is evident that the GA-derived predictive models provide an improvement in prediction over the other models.

CONCLUSION
This study describes the derivation of feature subsets using the GA wrapped with the ML-based algorithm SVM, to maximize the accuracy of the prediction of long running and frequently called hot methods, leading to optimization and improvement in program performance. The GA evaluates the fitness of the feature subsets on UTDSP benchmark programs using the SVM algorithm. The GA-generated feature subsets, containing thirty and twenty-nine features respectively for the two predictive models, when tested on MiBench predict the LRHM and FCHM with respective prediction accuracies of 71% and 80%. On the UTDSP benchmark suite, the LRHM and FCHM predictive models achieve 81% and 71%. These observations indicate that the GA-based approach in hot method prediction yields comparable results.
Future work in this GA based approach would focus on incorporating SVM parameters in the GA bit string coding.