Modern Metaheuristic with Multi-Objective Formulation for the Variable Selection Problem

: The development of efficient algorithms for variable selection becomes important to deal with large and complex datasets. Most works in quantitative chemical analysis have used Genetic Algorithms (GAs) as a reference method to select variables. On the other hand, new advances in metaheuristic techniques provide novel possibilities in this task Moreover, the application of Multi-Objective Optimization (MOO) may significantly contribute to efficiently construct an accurate model in the context of multivariate calibration. MOO has showed itself as an efficiently and successful tool to dealing with conflicting objective-functions. For instance, the use of MOO may be considered as a good choice to treat the reducing of prediction error and the number of selected variables in a calibration model. In this paper, we present a modern metaheuristic implementation called Multi-Objective Firefly Algorithm (MOFA) for variable selection in multivariate calibration models. The goal is to propose an optimization to reduce the prediction error of the property of interest in the analysed sample as well as reducing the number of selected variables. However, the outcomes are remarkably promising compared with the previous work. Based on the results obtained, it is possible to demonstrate that our proposal is a viable alternative in order to deal with such conflicting objectives. Additionally, we compare MOFA with a traditional GA implementation and show that MOFA is more efficient for the variable selection problem.


Introduction
The variable selection problem arises when one requires to model the relationship between a variable of interest and a set of potential explanatory variables (or predictors). It has become the focus of many research in areas of application with large datasets as chemometrics, where devices such as spectrophotometers have generated thousands of variables for just one sample (Beebe et al., 1998). To solve this problem, the use of selection methods it is necessary to select variables that yield the best prediction. In this sense, the development of efficient algorithms for variable selection becomes important in order to handle large and complex datasets (Paula et al., 2014;2016).
Most works in quantitative chemical analysis have used Genetic Algorihms (GAs) as a reference to select variables (Niazi and Leardi, 2012;Ferrand et al., 2011;Cong et al., 2013;Yun et al., 2014;Sarkhosh et al., 2014;Wang et al. 2015). As reviewed by Niazi and Leardi (2012), in the last decades GAs have been even more frequently used to solve different kinds of problems in chemistry data. For instance, Ferrand et al. (2011) used a GA combined with a Partial Least Squares (PLS) regression to produce models with a reduced number of wavelengths and a better accuracy. The authors showed that the number of wavelengths considered was reduced substantially by four and accuracy was increased on average by fifteen percent. Cong et al. (2013) proposed a variable selection method that combines a GA with PLS to select proper descriptor subset for Structure-Activity Relationship (QSAR) modeling in a linear model. Their outcomes demonstrated that it was possible to gain satisfactory prediction results and can be extended to other QSAR studies. Yun et al. (2014) presented a modified GA with PLS (GA-PLS) for variable selection in multivariate calibration. Based on their results, the authors showed that GA-PLS was able to perform an improvement on variable selection compared to the original GA-PLS. Finally, Sarkhosh et al. (2014) proposed an application of GAs for pixel selection in multivariate image analysis for a QSAR study of trypanocidal activity for quinone compounds and design new quinone compounds. They investigated the pixel selection effect by genetic algorithm application for PLS model. The resulted model showed a high prediction ability with low error values and the proposed QSAR model with GA-PLS was used for structure modification and their activity predicted. On the other hand, despite the success of GAs in these applications, new studies have demonstrated that the use of bio-inspired metaheuristics such as Firefly Algorithm (FA) may be an efficient optimization technique (Yang, 2009;Paula et al., 2014;Paula et al., 2014). For instance, Yang (2010) proposed and used the FA for solving multimodal optimization applications. The author compared FA with Particle Swarm Optimization (PSO) and GA demonstrating superior performance of FA. According to Yang (2013), FA has two major advantages over other metaheuristics: (1) automatical subdivision: this means that the whole population can automatically subdivide into subgroups and each group can swarm around each mode or local optima; and (2) the ability of dealing with multimodality: that is, the subdivision allows the fireflies to be able to find all optima simultaneously if the population size is sufficiently higher than the number of modes. Thus, theses significant characteristics can be exploited to deal with complex search problems such as variable selection. Goodarzi and dos Santos Coelho (2014) presented a FA as a feature selection approach of Near Infrared (NIR) spectral information. Based on the results obtained, authors demonstrated that FA-PLS can improve prediction results in comparison to when only a PLS model was built using all wavelengths. However, in their approach a single-objective function was adopted in order to minimize the Root Mean Squared Error (RMSE). Finally, Paula et al. (2014) proposed a Graphics Processing Unit (GPU)-based FA with multi-objective formulation for variable selection in multivariate calibration problems. Their results showed that FA is a more suitable choice and a relevant contribution for the variable selection problem. Notwithstanding, authors applied a simple multi objective strategy in FA and used a relatively large number of fireflies.
Often in multivariate calibration, it is performed a multi-objective analysis in the outcomes yielded by the variable selection algorithms. In such analysis, it is assessed the prediction error of the property of interest as well as the number of selected variables (Galvao et al., 2011;Sofacles et al., 2012). Nevertheless, both objectives were not used in the proposal of Goodarzi and dos Santos Coelho (2014). Moreover, the strategy applied by Paula et al. (2014) does not provide the Pareto optimal front describing the relationship between both objectives. In general, multi-objective analysis is usually performed after the variable selection. In this context, the application of Multi-Objective Optimization (MOO) in metaheuristics can significantly contribute to efficiently construct an accurate model in multivariate calibration (Wang et al., 2015;Tan et al. 2014). Furthermore, MOO may be an efficient tool to deal with conflicting objective functions such as reducing the prediction error value and the number of selected variables. Therefore, this paper proposes an enhanced implementation of a Multi-Objective Firefly Algorithm (MOFA) for variable selection in multivariate calibration models using Multiple Linear Regression (MLR).
It is important to highlight that a previous study was published in Paula et al. (2015).
However, such paper provides only a simple MOFA implementation as well as a naive comparison against a standard GA. In this work, we aim to show how to adapt this metaheuristic, initially proposed for the continuous domain, into a binary problem (variable selection). Additionally, MOFA is capable of outperforming a traditional GA both in mono-objective as in multiobjective formulation. Based on the results obtained, it is possible to demonstrate that MOFA is indeed a more efficient choice for the variable selection problem.
The remainder of this paper is organized as follows. Section Firefly Algorithm depicts the original Firefly Algorithm. Section Proposal presents our proposed algorithm. The material and methods used to obtain results are described in Section Experimental. The results are discussed in Section Results. Finally, Section Conclusion shows the conclusion of the paper.

Firefly Algorithm
Nature-inspired metaheuristics have been a powerful tool in solving various types of problems (Yang, 2008;2009). FA is a recently developed optimization algorithm proposed by Yang (2009). It is based on the behaviour of the flashing characteristics of fireflies. A pseudocode for the original FA can be seen in the Algorithm 1.
In the original algorithm, there are two important issues to be treated: (i) the variation of light intensity; and (ii) the attractiveness formulation. The attractiveness of a firefly is determined by its brightness or light intensity, which is associated with the encoded objective function (Yang, 2009). The brightness I of a firefly at a particular location x can be chosen as I(x)⇒ f(x). The light intensity I(r) varies with the distance r monotonically and exponentially as shown by Equation (1): where, I o is the original light intensity and γ is the light absorption coefficient.
As a firefly's attractiveness is proportional to the light intensity seen by adjacent fireflies, one can define the attractiveness w of a firefly by: where, ω o is the attractiveness at r = 0.
The distance between any two fireflies is calculated using Cartesian distance in Equation (3): According to Yang (2013), a firefly i is attracted to a brighter firefly j and its movement is determined by: for j = 1: n 7.
Light intensity Calculate the attractiveness between i and j which varies with distance r via exp[−γr] 10.
Move firefly i towards j in all d dimensions according to the attractiveness between i and j 11. end if 12. Evaluate the new fireflies and update light intensities 13.
end for j 14. end for i 15. Rank the fireflies and find the current best 16. end while 17. Postprocess results Proposal Paula et al. (2014a) demonstrated that FA can be used for variable selection to solve multivariate calibration problems. The original formulation of FA uses the evaluation of a single objective and does not exploit additional features such as multi-objective optimization. However, previous works have showed that multi-objective algorithms can use fewer variables with a less prediction error . Thus, this paper presents a Multi-Objective Firefly Algorithm (MOFA) for variable selection in multivariate calibration. Section Codification: A numerical example presents a numerical example in order to describe our strategy applied in the FA to adapt it for variable selection. Section Multi-Objective Optimization explains the multi-objective optimization strategy used in our proposed algorithm.

Codification: A Numerical Example
Let us consider a short variable selection problem with five variables available and only three fireflies. Initially, the fireflies (F 1 , F 2 and F 3 ) are uniformly distributed random numbers in the range [0, 1]: The variable selection problem may be considered as a binary problem. Therefore, each firefly must be encoded. In our algorithm, each variable information greater than 0.7 was encoded to 1. This means that such variable will be used in the regression model. The others one that are less or equal to 0.7 was encoded to 0, meaning that the variable will not be used: Soon after coding, each firefly is evaluated using Equation (5): where, X is the matrix of samples and independent variables, y is the vector of dependent variables and β is the vector of regression coefficients.
In Equation (5), only the columns of X indicated by encoded fireflies are used in the regression model. The outcomes obtained by calculating Equation (5) represents the brightness for each firefly. Then, a firefly i is moved towards a firefly j always when the light intensity of firefly j is greater than light intensity of firefly i (In case of MOO, a firefly i is moved towards firefly j when the error prediction and selected variables obtained by firefly j are lower than those obtained by firefly i). For this purpose, the distance between fireflies must be calculated using Equation (3). With the distance between fireflies, one can calculate the attractiveness using Equation (2). As a result, all fireflies and encoded fireflies are updated.
Iterations are repeated until all solutions have been updated. The updates allow solutions moving towards to the current optimal solution. Solution that produces the best fitness (the lowest RMSEV (In this paper, RMSEV means Root Mean Squared Error of prediction on the Validation set) or the lowest number of selected variables) may be choosen by decision maker as the global best solution.

Multi-Objective Optimization
A Multi-objective Optimization Problem (MOP) deals with more than one objective function. MOP has a number of objective functions which are to be minimized or maximized (Deb, 2001). In this sense, a MOP may be described in its general form: where, Ω is the decision space; F: Ω→ R m consists of m real-valued objective functions; and R m is called the Objective space. The attainable objective set is defined as the set {F(x)|∈ Ω}.
In general, there is no point (or vector) in Ω that is capable of maximize (or minimize) all the objectives simultaneously. Thus, it becomes necessary to balance them. The best tradeoffs among the goals can be defined in terms of Pareto optimality (Zhang and Li, 2007).
In mathematical terms, let u, v∈R m be two random vectors. In case of minimization, vector u is said to dominate v if and only if (All the inequalities should be reversed if the goal is to maximize the objectives in Equation (6)): 1. u i ≤v i for every i ∈{1,..,m} 2. u j ≤v j for at least one index j ∈{1,..,m} A point x * ∈ Ω is Pareto-optimal to Equation (6) if there is no point x∈Ω such that F(x) dominates F (x * ). In this sense, a feasible solution x 1 ∈Ω is said to dominate another solution x 2 ∈Ω if and only if: In the multi-objective formulation of FA (MOFA), the choice of current best solution is based on these two steps. A solution x 1 ∈Ω is called Pareto-optimal if there does not exist another solution that dominates it. Among non-dominated solutions, it is applied a multi-objective decision maker method described by Lucena et al. (2013) to choose the current best. Algorithm 2 shows a pseudocode for the proposed MOFA. In line 10 of Algorithm 2, a firefly i dominates another firefly j when its prediction error value and number of selected variables are lower.
Move firefly j towards firefly i using Equation (4)  The number of variables can be treated as a problem constraint. In this case, the algorithm would minimize only the prediction error of the model, as proposed by Goodarzi and dos Santos Coelho (2014). Consequently, the number of variables to be selected should be informed by user which would depend on a prior knowledge about the database and a fortuitous number over a range of the ideal number of variables. On the other hand, the advantage of MOO consists on the fact that the algorithm can optimize this number as a free parameter which is independent of prior knowledge.
In the multi-objective optimization, the algorithms must search solutions with a maximum spread as possible to explore the search space considering the objectives of the problem. In the end, a set of solutions are provided, generally they are non-dominated, that is, does not exist another feasible solution better than the current one in some objective function without worsening other objective function. Our MOFA yields a set of solutions, that explored the search space, in its final population. However, in practical terms, an analyst probably should choose at least one solution to be used. In this sense, we proposed a decision maker to help in this task. The final choice from a multiobjective optimization is an open problem because it depends strongly of the problem and the objectives considered. As far as we know, there is no proposals considering multi-objective optimization for the variable selection problem in chemometrics neither a decision maker to choose a final solution considering the aspects of the application.
In order to help choosing a solution within this set, we used the Wilcoxon signed-rank Ramsey et al. (1993) as a decision maker . Wilcoxon Signed-Rank is a nonparametric hypotheses test used when comparing two related samples to evaluate if the rank of the population means are different. Instead of choosing a solution from one of the extremes of the Pareto front, this test can be used to choose the best solution on an optimized manner. Moreover, it can be used as an alternative to the Paired t test for small dependent samples when the population cannot be assumed as a normal distribution (Ramsey et al., 1993). Algorithm 3 describes the decision maker. The test is applied on the residuals values calculates over the validation dataset. It is note-worthy that the first condition in line 7 of Algorithm 3 (h = 0) refers to a value returned by the Matlab built-in function (wilcoxonSignedRank). Furthermore, the second condition (N j <N best ) indicates the number of selected variables by firefly j and best, respectively.
Although proposing a method that in the end makes a choice primarily based on the prediction error, the number of options have solutions that have been optimized in both parameters (RMSEV and number of variables).
Algorithm 3: Decision Maker 1. Parameters: Pop s×m . 2. best ← index of firefly that has the lowest RMSEV 3. e 1 ← Residuals calculated in the validation dataset using firefly 1 4. for j = 2 to s 5. e j ← Residuals calculated in the validation dataset using firefly j 6. h ← wilcoxonSignedRank (e 1 , e j ) 7. if h = 0 and N j < N best 8. e 1 ← e j 9.
best ← j 10. end if 11. end for 12. Return firefly best

Data Set
The real dataset employed in this work consists of whole grain wheat samples, obtained from vegetal material from occidental Canadian producers. The standard data were determined at the Grain Research Laboratory as in works of Paula et al. (2014b) and Soares et al. (2010;2013). The data set for the multivariate calibration study consists of 1090 Near-Infrared (NIR) spectra of whole-kernel wheat samples, which were used as shoot-out data in the 2008 International Diffuse Reflectance Conference http://www.idrcchambersburg.org/shootout.html).
The Kennard and Stone (1969) algorithm was applied to the resulting spectra to divide the samples into three sets: calibration, validation and prediction. Calibration set contained 389 samples (Each sample in calibration, validation and prediction sets contains 690 wavelengths) and was used to calculate the regression coefficients. The validation and prediction sets contained 193 samples both. The validation set was employed to guide the variable selection in FA, GA, MOFA and NSGA-II. The prediction set was only employed in the final performance assessment of the resulting MLR models. In this paper, we used two different terms: RMSEV and RMSEP. The former indicates that the validation set was used in the error assessment and the latter indicates the prediction set.

Metrics
As shown in Equation (7), predictive ability of MLR models comparing predictions with reference values for a test set from the squared deviations can be calculated by Root Mean Squared Error of Prediction: where, y is the reference value of the property of interest, N is the number of observations and 1 2ˆˆ{ , ,..., } T N y y y y = is the estimated value calculated as: Another criteria that may be used to determine the predictive ability of MLR models is the Mean Absolute Percentage Error (MAPE) Hibon and Makridakis (1995). MAPE is a relative measure to express errors as a percentage of the actual data defined as:

Setup of the Algorithms
The proposed MOFA was implemented using α = 0.2, γ = 1 and ω 0 = 0.97 as proposed by (Yang, 2010). The number of fireflies was defined empirically as 100 and the number of generations as 300. A presents the convergence analysis for MOFA implementation. In the mono-objective formulation the fitness is the Root Mean Square Error of Validation (RMSEV) and in the MOFA the objectives are (1) the RMSEV and (2) the number of selected variables.
We have used for comparison the Non-dominated Sorting Genetic Algorithm (NSGA-II), in particular, the same implementation of Lucena et al. (2013). The main difference between NSGA-II and a simple GA is how the selection operator is applied. This operator is divided into two processes: (i) Fast Non-dominated Sorting; and (ii) Crowding Distance. Table 1 presents the configuration for NSGA-II and GA. For GA the fitness is the Root Mean Square Error of Validation (RMSEV) and for NSGA-II the objectives are (1) the RMSEV and (2) the number of selected variables. The number of maximum number of generations for NSGA-II was defined using convergence analysis presented in the A.
In the Partial Least Square (PLS) study, the calibration and validation sets were joined into a single modeling set, which was used in the leave-one-out crossvalidation procedure. The number of latent variables was selected on the basis of the cross-validation error by using the F-test criterion of Haaland and Thomas (1988) with α= 0.25. The prediction set was only employed in the final evaluation of the PLS model.
All calculations were carried out by using a desktop computer with an Intel Core i7 2600 (3.40 GHz), 8 GB of RAM memory and Windows 7 Professional. The Matlab 8.1.0.604 (R2013a) software platform was employed throughout. Regarding the outcomes, it is important to note that all of them were obtained by averaging twenty executions.

Multi-Objective Evaluation
According with literature Jiang et al. (2015;Azevedo et al., 2011;Auger et al., 2012), in the multiobjective optimization, two key issues are important: first, a solution that is better than another solution in all objectives should be preferred over the latter. Second, the diversity of solutions should be supported. The hypervolume metric offers one possibility to achieve the two aspects (Jiang et al., 2015).
The hypervolume indicator corresponds to the integral of a weight function over the set of objective vectors that are weakly dominated by a solution set and in addition weakly dominate the reference point (Auger et al., 2012). A reference point was set in 150 and 15 for number of variables and RMSEP, respectively.

Mono-Objective Formulation
For comparison of mono-objective implementations, both algorithms GA and FA use the same randomly generated initial solutions in 30 trials. The results are showed in Table 2. One can see that FA generates a model with better generalization (in average) when compared with GA and PLS. However, the number of variables is still large.

Multi-Objective Formulation
As proposed, the number of variables can be optimized in the same time as error of prediction. However, since the optimal Pareto front is unknown, the best way to proceed the evaluation of multi-objective optimization is to test the proposed approach on the same problem against other established Multi-Objective algorithm, in this case, the NSGA-II. Table 3 presents the hypervolume obtained for 30 trials for each algorithm. In this case, each trial correspond to one set of randomly initial solutions. The average hypervolume of MOFA was approximately 7.88% better than NSGA-II. The maximum value (1274) were obtained by MOFA while the minimum value (1120) was obtained by NSGAII. From this result we can conclude that MOFA covers in a better way the multi-objective search space of the problem.      Table 4 presents the results for MOFA implementation. The multi-objective formulation in the FA optimized both RMSEV and number of variables, therefore MOFA improved the results compared to mono-objective formulation. When compared with NSGA-II implementation, MOFA presents the lowest RMSEV as well as the lowest number of selected variables. The best solution obtained by MOFA has 0.05 of error using only 37 variables.

Conclusion
The use of Firefly Agorithm (FA) has been widely used to solve several types of optimization problems. However, it has not been commonly used for variable selection in multivariate calibration model. Moreover, the application of Multi-Objective Optimization in FA has demonstrated that it is possible to achieve viable outcomes when conflicting-objective functions are present in the problem. In this context, this paper proposed an enhanced implementation of the Multi-Objective Firefly Algorithm (MOFA) for variable selection involving NIR spectrometric analysis of wheat samples. The objective was to propose an optimization procedure to reduce the prediction error value of the property of interest as well as reducing the number of selected variables. Additionally, we presented a comparison between our proposed MOFA and a traditional genetic algorithm called NSGA-II. Based on the results obtained, it was possible to demonstrate that MOFA can be indeed a better solution for obtaining a calibration model with an adequate prediction ability and a reduced number of variables.
Future works may present the use of other bioinspired metaheuristics such as Bat Algorithm for variable selection in multivariate calibration. Furthermore, a comparison between MOFA and other metaheuristics may be performed. It is worth mentioning that the choice of a set of non-dominated solutions is an open problem in multi-objective optimization and other options may be presented in contrast to what we proposed in this paper.