A Hybridization of Enhanced Artificial Bee Colony-Least Squares Support Vector Machines for Price Forecasting

Problem statement: The performance of Least Squares Support Vector Machines (LSSVM) relies heavily on the values of its regularization parameter, γ, and kernel parameter, σ². Manual selection of these values is not an appropriate solution, since it is to some extent a blind search. It is also time consuming and unsystematic, which consequently affects the generalization performance of LSSVM. Approach: This study presents an enhanced Artificial Bee Colony (ABC) algorithm to automatically optimize the hyper-parameters of interest. The enhancement involves modifications that improve the exploitation activity of the bees during the search and prevent premature convergence. The prediction itself is then carried out by LSSVM. Results and Conclusion: Empirical results indicate that the proposed technique performs satisfactorily, producing better prediction accuracy than standard ABC-LSSVM and a Back Propagation Neural Network.


INTRODUCTION
Time series prediction deals with uncertainty. It is essential in various fields, which has led researchers to study and present numerous approaches for predicting what will happen in the future. The tendency to seek better prediction tools is inevitable for several reasons: to allow proper plans to be made for potential developments, to reduce unwanted risk that may cause serious loss, whether material or otherwise, to reduce cost and, in particular fields, to provide better customer service (Makridakis and Hibon, 2000).
In finance, many prediction tools have long been in use, especially conventional statistical ones. One of the most commonly applied is the Box-Jenkins method, technically known as Autoregressive Integrated Moving Average (ARIMA) (Palit and Popovic, 2005). The major shortcoming of ARIMA is its prior assumption that the model is linear, which is clearly not suitable for the commodity prices of interest. Such data are highly nonlinear and complex (Liu, 2009), and their mean and variance can change over time (Kumar and Thenmozhi, 2007). Another commonly used approach is the exponential smoothing method (Palit and Popovic, 2005). Nonetheless, this approach faces a similar problem to ARIMA in that it has difficulty dealing with nonlinear features in time series (Lai et al., 2006).
The difficulties and shortcomings of those techniques have paved the way for many researchers to suggest a Computational Intelligence (CI) based prediction tool, the Artificial Neural Network (ANN), to overcome the complexity of choosing an appropriate prediction approach. ANN is one of the best known CI techniques for prediction. With ANN, the analyst is not required to specify the form of the model (White, 1994). The technique has also enjoyed considerable success in prediction: a good number of studies on it have been published and the research is ongoing (Jammazi and Aloui, 2012).
However, there are several limitations in developing an ANN model. The adoption of Empirical Risk Minimization (ERM) in ANN leads to numerical instabilities and poor generalization performance (Xiang and Jiang, 2009). The need to determine a large number of control parameters worsens the situation and makes the model more complicated to apply (Kumar and Thenmozhi, 2007). In addition, because it employs a gradient descent based training method, ANN is exposed to the over-fitting problem (Ying and Hua, 2008). All of these have restricted its further application.
In contrast to ANN, a method based on Statistical Learning Theory, namely Support Vector Machines (SVM), has been developed as a robust tool for regression and classification. This technique was pioneered by Vapnik (1995) and adopts Structural Risk Minimization (SRM). As opposed to ERM, this scheme seeks to minimize an upper bound of the generalization error rather than only minimizing the training error.
Since SVM is based on Quadratic Programming (QP), which imposes a computational burden, a variant of SVM emerged, namely Least Squares Support Vector Machines (LSSVM) (Suykens et al., 2002). LSSVM instead requires solving a set of linear equations, which has made it popular among academics from different fields.
Many approaches to tuning the parameters of LSSVM have been presented, regardless of the application domain. To date, Evolutionary Computation (EC) algorithms, which include Evolutionary Algorithms (EA) and Swarm Intelligence (SI), appear to be the most promising. Compared to manual selection of the parameter values of interest, hybridizing LSSVM with these optimization techniques is more reliable and systematic. Before the emergence of SI, EA techniques first gained interest among academics. The most prominent technique in the group was the Genetic Algorithm (GA), and the hybridization of GA and LSSVM has been carried out extensively by researchers from different areas (Sun and Zhang, 2008; Yu et al., 2009; Mustafa et al., 2011). This makes GA more dominant than other EA techniques, such as Evolutionary Programming (EP) and Genetic Programming (GP) (Karaboga et al., 2012).
On the other hand, there has been an avalanche of studies since the emergence of SI techniques, and the hybridization of SI with LSSVM has become increasingly popular during the last decade. SI techniques mimic the intelligent behavior of swarms of social insects, flocks of birds or schools of fish (Blum and Li, 2008). Among them, Particle Swarm Optimization (PSO) has been the researchers' favored choice due to its uncomplicated concept and ease of use (Park et al., 2010; Ping and Jian, 2009). A PSO-LSSVM hybrid for chaotic time series prediction was proposed by Ping and Jian (2009). The proposed method, which was tested on several important chaotic time series, proved feasible and effective by outperforming ANFIS and Fuzzy Cluster. In related work, PSO-LSSVM was applied to water quality prediction by Xiang and Jiang (2009). Compared with several single prediction models, the proposed technique proved superior by yielding the lowest Mean Absolute Percentage Error (MAPE). In another field, mid-long term load prediction was presented by Niu et al. (2009) utilizing an improved PSO-LSSVM. Unlike the two studies above, it modifies the inertia weight, which leads the algorithm to search faster.
Another SI approach, namely the Artificial Fish Swarm Algorithm (AFSA), was presented in Chen et al. (2008) for predicting electricity load in China. With PSO-LSSVM as the competitor in the study, AFSA-LSSVM proved its capability to escape from local minima by producing better predictions. Meanwhile, a hybridization of Ant Colony Optimization (ACO) and LSSVM for predicting share prices can be seen in Fang and Bai (2009).
On the other hand, a relatively new optimization technique, namely Artificial Bee Colony (ABC) (Karaboga and Akay, 2005), has captured much attention from academia since its invention. Besides its simplicity and ease of use, it also comes with modest requirements and fewer control parameters to be tuned (Gao et al., 2012). Interestingly, Karaboga's approach has been shown to be competitive with other SI techniques. The remarkable performance of ABC can be seen in many studies regardless of area, including the hybridization of ABC-LSSVM for gold price prediction (Yusof and Mustaffa, 2011).
In this study, several modifications are introduced to enhance the capability of ABC. Firstly, the Lévy Probability Distribution (LPD) is integrated to improve the exploitation process of the bees; a mutation approach is then introduced to prevent premature convergence. The proposed prediction model is tested on correlated energy fuel price time series, with the price of propane as the prediction target.
A brief review on support vector machines and least squares support vector machines: The nonlinear technique of Vapnik, known as Support Vector Machines (SVM) (Vapnik, 1995), was first introduced in 1995. It was developed based on Statistical Learning Theory. The elegance of SVM is due to its use of kernels, often called the kernel trick (Suykens et al., 2002). It offers the capability to map nonlinear input data into a high dimensional feature space in which the problem is essentially linear (Fig. 1). From the ANN point of view, this feature space is known as a high dimensional hidden layer (Suykens et al., 2002). Here, the optimization process can be carried out as in the linear case (Sapankevych and Sankar, 2009). With the adoption of SRM, the main objective, following Vapnik-Chervonenkis (VC) theory, is to minimize an upper bound of the generalization error, which consequently leads to the superiority of SVM over ANN (Kumar and Thenmozhi, 2007). However, due to the use of QP solvers in SVM, a computational burden is inevitable (Yu et al., 2009). For that reason, a reformulation of SVM by Suykens et al. (2002), namely LSSVM, was introduced to fill the gap. This technique is reported to consume less computational effort on large-scale problems than standard SVM. As a modified version of Vapnik's technique, LSSVM appears to outperform SVM in many regression problems (Tarhouni et al., 2011). While maintaining the advantages of its original form, LSSVM has some user-friendly properties regarding implementation and computational time during training (Valyon and Horvath, 2007). Thus, LSSVM is able to optimize more precisely due to its short computation time (Wang and Hu, 2005). A brief introduction to LSSVM is presented here, while a detailed description can be found in Suykens et al. (2002).
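The standard framework for LSSVM is based on the primal-dual formulation. Given the dataset $\{x_i, y_i\}_{i=1}^{N}$, the aim is to estimate a model of the form (Suykens et al., 2002) (Eq. 1):

$$y(x) = w^T \varphi(x) + b \tag{1}$$

where $x \in \mathbb{R}^n$, $y \in \mathbb{R}$ and $\varphi(\cdot): \mathbb{R}^n \rightarrow \mathbb{R}^{n_h}$ is a mapping to a high dimensional feature space. The following optimization problem is formulated (Suykens et al., 2002):

$$\min_{w,b,e} J(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \tag{2}$$

Subject to:

$$y_i = w^T \varphi(x_i) + b + e_i, \quad i = 1, \ldots, N$$

With the application of Mercer's theorem (Vapnik, 1995) for the kernel matrix Ω as $\Omega_{ij} = K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$, $i, j = 1, \ldots, N$, it is not required to compute the nonlinear mapping φ(.) explicitly, as this is done implicitly through the use of positive definite kernel functions K. From the Lagrangian function (Eq. 3):

$$L(w, b, e; \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \left( w^T \varphi(x_i) + b + e_i - y_i \right) \tag{3}$$

where $\alpha_i \in \mathbb{R}$ are Lagrange multipliers. Differentiating (3) with respect to w, b, $e_i$ and $\alpha_i$, the conditions for optimality can be described as follows (Eq. 4):

$$\begin{aligned}
\frac{\partial L}{\partial w} = 0 &\;\rightarrow\; w = \sum_{i=1}^{N} \alpha_i \varphi(x_i) \\
\frac{\partial L}{\partial b} = 0 &\;\rightarrow\; \sum_{i=1}^{N} \alpha_i = 0 \\
\frac{\partial L}{\partial e_i} = 0 &\;\rightarrow\; \alpha_i = \gamma e_i \\
\frac{\partial L}{\partial \alpha_i} = 0 &\;\rightarrow\; y_i = w^T \varphi(x_i) + b + e_i
\end{aligned} \tag{4}$$

By elimination of w and $e_i$, the following linear system is obtained (Eq. 5):

$$\begin{bmatrix} 0 & 1_N^T \\ 1_N & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \tag{5}$$

with $y = [y_1, \ldots, y_N]^T$, $\alpha = [\alpha_1, \ldots, \alpha_N]^T$ and $1_N = [1, \ldots, 1]^T$. The resulting LSSVM model in dual space becomes (Eq. 6):

$$y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b \tag{6}$$

Usually, the training of the LSSVM model involves an optimal selection of the regularization parameter γ and the kernel parameter σ². Several kernel functions, viz. the Gaussian Radial Basis Function (RBF) kernel, the linear kernel and the quadratic kernel, are available. For this project, the RBF kernel is used, which is expressed as (Eq. 7):

$$K(x, x_i) = \exp\left( -\frac{\| x - x_i \|^2}{2\sigma^2} \right) \tag{7}$$

where σ² is a tuning parameter associated with the RBF function. The other tuning parameter is γ, which appears in Eq. 2.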
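To make the dual formulation concrete, the following minimal Python sketch builds the linear system of Eq. 5 and evaluates the dual model of Eq. 6. It is an illustration only, not the LSSVMlab implementation used in the study: the function names, the toy data and the small Gaussian-elimination solver are all assumptions, and the RBF kernel is assumed to take the form exp(-||x - x_i||² / (2σ²)).

```python
import math


def rbf_kernel(a, b, sigma2):
    """RBF kernel, assumed form: exp(-||a - b||^2 / (2 * sigma2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma2))


def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def lssvm_train(X, y, gamma, sigma2):
    """Build and solve the (N + 1) x (N + 1) system of Eq. 5; return (b, alpha)."""
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = 1.0  # top row: vector of ones (transposed)
        A[i + 1][0] = 1.0  # left column: vector of ones
        for j in range(n):
            A[i + 1][j + 1] = rbf_kernel(X[i], X[j], sigma2)
        A[i + 1][i + 1] += 1.0 / gamma  # kernel matrix plus I / gamma
    solution = solve_linear_system(A, [0.0] + list(y))
    return solution[0], solution[1:]


def lssvm_predict(x, X, b, alpha, sigma2):
    """Dual-space model of Eq. 6."""
    return sum(a * rbf_kernel(x, xi, sigma2) for a, xi in zip(alpha, X)) + b


# Toy data (hypothetical): y = x^2 sampled on a small grid.
X_train = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
y_train = [4.0, 1.0, 0.0, 1.0, 4.0]
b, alpha = lssvm_train(X_train, y_train, gamma=1000.0, sigma2=1.0)
fitted = [lssvm_predict(x, X_train, b, alpha, sigma2=1.0) for x in X_train]
```

Two properties of the optimality conditions can be checked on the result: the multipliers sum to zero, and each training residual equals the corresponding multiplier divided by γ. A larger γ therefore forces a closer fit to the training data.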
Artificial bee colony: To obtain optimized hyper-parameters for LSSVM, a new SI population-based meta-heuristic approach, namely ABC, is combined with LSSVM. The ABC algorithm was first introduced by Karaboga (2005) for real parameter optimization. It was inspired by the foraging behavior of honey bee swarms. The technique has been shown to have an uncomplicated structure and to be easy to apply with modest requirements. In the standard ABC algorithm, the colony of artificial bees is classified into three groups: employed bees, which are associated with specific food sources; onlooker bees, which watch the dance of the employed bees within the hive in order to choose a food source; and scout bees, which search randomly for food sources. Onlookers and scouts are also known as unemployed bees (Karaboga et al., 2012).
Half of the colony consists of employed bees and the other half of onlooker bees. The number of food sources (nectar sources) equals the number of employed bees, which means that one employed bee is responsible for a single nectar source. The objective of the whole colony is to maximize the amount of nectar. The ABC algorithm combines local and global search, which leads to a better balance between exploitation and exploration by the bees (Dongli et al., 2011). The duty of the employed bees is to search for food sources (solutions). The amount of nectar (the quality or fitness value of the solution) is then calculated, and the information obtained is shared with the onlooker bees waiting in the hive (dance area). The onlooker bees decide which nectar source to exploit depending on the information shared by the employed bees. For this purpose, the onlooker bees watch various dances before choosing a food source position; the duration of an employed bee's dance is proportional to the nectar content (fitness value) of the food source it is currently exploiting. The onlooker bees also determine the source to be abandoned and reassign its employed bee as a scout bee. The task of the scout bees is to find new valuable food sources. They search the space near the hive randomly.
In the ABC algorithm, suppose the solution space of the problem is D-dimensional, where D is the number of parameters to be optimized. The fitness value of the randomly chosen site is formulated as follows (Eq. 8):

$$fit_i = \frac{1}{1 + f_i} \tag{8}$$

where $f_i$ is the objective function value of solution i. The numbers of employed bees and onlooker bees are both SN, which is equal to the number of food sources. One employed bee is assigned to each food source position. Each employed bee, the total number of which equals half of the colony, obtains a new source according to Eq. 9:

$$x'_{ij} = x_{ij} + \phi_{ij} (x_{ij} - x_{kj}) \tag{9}$$

Where:
i = 1, 2, …, SN
j = 1, 2, …, D
$\phi_{ij}$ = a random real number within the range [-1, 1]
k = a randomly selected index number in the colony

After producing the new solution $v'_i = \{x'_{i1}, x'_{i2}, \ldots, x'_{iD}\}$, it is compared to the original solution $v_i = \{x_{i1}, x_{i2}, \ldots, x_{iD}\}$. If the new solution is better than the previous one, the bee memorizes the new solution; otherwise she memorizes the former solution. The onlooker bee selects a food source to exploit with the probability (Eq. 10):

$$P_i = \frac{fit_i}{\sum_{j=1}^{SN} fit_j} \tag{10}$$

where $fit_i$ is the fitness of the solution $v_i$ and SN is the number of food source positions. The onlooker bee then searches for a new solution in the selected food source site by Eq. 9, in the same way as the employed bees. After all the employed bees have exploited a new solution and the onlooker bees have been allocated a food source, a source whose fitness has not improved for a given number of steps (denoted by limit) is abandoned, and the employed bee associated with it becomes a scout and makes a random search by Eq. 11:

$$x_{ij} = x_j^{min} + rand(0, 1)\,(x_j^{max} - x_j^{min}) \tag{11}$$

where $x_j^{min}$ and $x_j^{max}$ are the lower and upper bounds of parameter j. The main steps of the algorithm are as follows:

1. Initialize the food source positions (population)
2. Each employed bee is assigned to its food source
3. Each onlooker bee selects a source based on the quality of its solution, produces a new food source in the selected food source site and exploits the better source
4. Decide the source to be cast aside and assign its employed bee as a scout for discovering new food sources
5. Memorize the best food source found so far
6. If the requirements are met, output the best solution; otherwise repeat steps 2-5 until the stopping criterion is met
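The steps of standard ABC described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab code: the function name `abc_minimize`, its default settings and the `sphere` test function are hypothetical, the fitness is assumed to be 1 / (1 + cost) for non-negative cost values, and out-of-range candidates are simply shifted onto the boundary.

```python
import random


def abc_minimize(cost, bounds, sn=10, limit=20, max_cycles=300, seed=0):
    """Standard ABC: sn food sources, sn employed bees and sn onlooker bees."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        # Scout search: uniform random position inside the bounds.
        return [lo + rng.random() * (hi - lo) for lo, hi in bounds]

    def fitness(value):
        # Assumed fitness form for non-negative cost values.
        return 1.0 / (1.0 + value)

    def try_neighbour(i):
        # Neighbour search: perturb one dimension j using another source k.
        k = rng.choice([s for s in range(sn) if s != i])
        j = rng.randrange(dim)
        phi = rng.uniform(-1.0, 1.0)
        candidate = sources[i][:]
        candidate[j] += phi * (sources[i][j] - sources[k][j])
        lo, hi = bounds[j]
        candidate[j] = min(max(candidate[j], lo), hi)  # shift onto the boundary
        value = cost(candidate)
        if value < costs[i]:  # greedy selection: keep the better solution
            sources[i], costs[i], trials[i] = candidate, value, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(sn)]
    costs = [cost(s) for s in sources]
    trials = [0] * sn
    best_value = min(costs)
    best_source = sources[costs.index(best_value)][:]

    for _cycle in range(max_cycles):
        for i in range(sn):  # employed bee phase
            try_neighbour(i)
        fits = [fitness(c) for c in costs]
        total = sum(fits)
        probs = [f / total for f in fits]  # selection probabilities
        for _bee in range(sn):  # onlooker bee phase (roulette-wheel selection)
            i = rng.choices(range(sn), weights=probs)[0]
            try_neighbour(i)
        worst = max(range(sn), key=lambda s: trials[s])
        if trials[worst] > limit:  # scout bee phase: abandon a stagnant source
            sources[worst] = random_source()
            costs[worst] = cost(sources[worst])
            trials[worst] = 0
        current = min(costs)
        if current < best_value:  # memorize the best source found so far
            best_value = current
            best_source = sources[costs.index(current)][:]
    return best_source, best_value


def sphere(x):
    """Hypothetical test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = abc_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In the setting of this study, the cost function would instead train LSSVM with a candidate (γ, σ²) pair and return a validation error; that wiring is omitted here.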

MATERIALS AND METHODS
Materials and methods under study include the research data and data preparation, the correlation coefficients among the time series employed, data proportion, data normalization, experiment setup, the performance evaluation metrics and finally the enhanced ABC.

Research data and data preparation: In this study, the data sets consist of four highly correlated energy fuel price time series, namely Crude Oil (CL), Heating Oil (HO), Gasoline (HU) and Propane (PN). All of the time series are daily, spanning from December 1997 to November 2002, the same period as employed in Malliaris and Malliaris (2008).
Correlation: To determine the correlation coefficient between the said time series, the Pearson Product Moment Coefficient (Lomax, 2007) was utilized. In this study, a coefficient of r = 0.8-1.0 is regarded as indicating high correlation (Farid, 2010). Table 1 shows the correlation between the time series of interest.
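As a brief illustration of how such a coefficient is computed, the sketch below implements the standard Pearson product moment formula in plain Python and applies the r ≥ 0.8 threshold mentioned above. The function names and the two short price series are hypothetical and are not taken from the data set of the study.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def is_highly_correlated(r, threshold=0.8):
    """Threshold for 'high correlation' as used in the text (r = 0.8-1.0)."""
    return r >= threshold


# Hypothetical daily closing prices for two fuels (illustrative values only).
series_a = [18.2, 18.9, 19.4, 19.1, 20.3, 21.0]
series_b = [0.48, 0.50, 0.52, 0.51, 0.55, 0.57]
r_ab = pearson_r(series_a, series_b)
high = is_highly_correlated(r_ab)
```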

Data proportion:
Prior to conducting the experiments, the time series is split into three different sets, namely the training, validation and testing sets. The prediction model is built using the training set, while the validation set is used to assess the performance of the model. Finally, the testing set is used to assess the expected performance of the model (Williams, 2011). The proportion of each set is tabulated in Table 2.

Data normalization: The importance of data normalization has been emphasized in many studies (Shalabi et al., 2006; Kotsiantis et al., 2006). In this study, min-max normalization is used (Eq. 12):

$$v' = \frac{v - min_a}{max_a - min_a} (new\_max_a - new\_min_a) + new\_min_a \tag{12}$$

Min-max normalization maps a value v of attribute A to v' in the range [new_min_a, new_max_a] by solving the equation above, where min_a and max_a are the minimum and maximum values of A. The data tabulated in Tables 3 and 4 show samples of the original data used in this work and the normalized data. Table 5 shows the variables assigned to the features involved. The input arrangement employed in the study includes the four daily closing prices of the time series of interest and three derived inputs, namely the percent change in daily closing price from the previous day, the standard deviation over the previous 5 trading days and the standard deviation over the previous 21 trading days. For the output, the prediction is of propane's price from day 21 onwards (PN21owd). The purpose of including the derived variables is to help the model discover an underlying relationship that is constant over time (Malliaris and Malliaris, 2008).
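The min-max mapping described above can be sketched in a few lines of Python. The function name, the default target range of [0, 1] and the sample prices are assumptions made for illustration; the study does not state the target range in the text.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Map each value v of an attribute to v' in [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    if span == 0:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - old_min) / span * (new_max - new_min) + new_min for v in values]


# Hypothetical closing prices (illustrative values only).
prices = [10.0, 12.5, 15.0, 20.0]
normalized = min_max_normalize(prices)
```

The smallest input always maps to `new_min` and the largest to `new_max`, so the minimum and maximum should be taken from the training data and reused when new observations are normalized.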

Performance evaluation metric:
In this study, two indexes were utilized to measure the performance of the prediction model. The first metric is the Mean Absolute Percentage Error (MAPE), one of the most common evaluation metrics applied in prediction (Klimberg et al., 2010), while the other is Prediction Accuracy (PA). The definitions of both metrics are as follows (Eq. 13 and 14):

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - p_i}{y_i} \right| \tag{13}$$

$$PA = 100\% - (MAPE \times 100) \tag{14}$$

Where:
i = 1, 2, …, n
y = Actual value
p = Prediction value

Enhanced artificial bee colony: In this study, the standard ABC-LSSVM is improved by introducing two modifications, namely Lévy Mutation and Simple Mutation.
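The two evaluation metrics, MAPE and PA, can be illustrated with the short Python sketch below. It assumes that MAPE is the mean of the absolute relative errors expressed as a fraction and that PA is 100% minus MAPE expressed in percent; the function names and the sample values are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    errors = [abs((y - p) / y) for y, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def prediction_accuracy(actual, predicted):
    """Prediction accuracy in percent: 100% minus MAPE expressed in percent."""
    return 100.0 - mape(actual, predicted) * 100.0


# Hypothetical actual and predicted prices (illustrative values only).
actual_prices = [100.0, 200.0]
predicted_prices = [90.0, 220.0]
error = mape(actual_prices, predicted_prices)
accuracy = prediction_accuracy(actual_prices, predicted_prices)
```

Both predictions in the example are off by 10% of the actual value, so the MAPE is 0.1 and the PA is 90%. Because the error is divided by the actual value, the metric is undefined when an actual value is zero.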

Levy mutation to enhance bee's exploitation process:
The first modification applies the Lévy Probability Distribution (LPD). The purpose is to enhance the exploitation process of the bees in standard ABC. This approach, termed Lévy Mutation ABC (lvABC), is used in generating new food sources, which involves the employed and onlooker bee phases. lvABC is based on the LPD, which was first introduced by Levy (1954) (Eq. 15):

$$L_{\alpha,\theta}(y) = \frac{1}{\pi} \int_0^{\infty} e^{-\theta q^{\alpha}} \cos(qy)\, dq \tag{15}$$

From (15), the distribution is symmetric with respect to y = 0 and has two parameters, α and θ. α controls the shape of the distribution, requiring 0 < α < 2, while θ is the scaling factor, satisfying θ > 0. LPD possesses a power law in the tail region that is characteristic of the fractal structure of nature. Since the analytic form of Eq. 15 is unknown for general α, an algorithm for generating Lévy random numbers is commonly utilized, such as the one suggested in Lee and Yao (2004) or the algorithm by McCulloch (1996) from Ohio State University.
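As an illustration of how Lévy-distributed random numbers can be generated in practice, the sketch below uses the simplified form of Mantegna's algorithm that is widely used in swarm-based optimizers. This choice is an assumption made for illustration: it is not necessarily the generator used in the study, it is only an approximation intended for α between roughly 0.3 and 1.99, and the function names are hypothetical.

```python
import math
import random


def mantegna_sigma(alpha):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + alpha) * math.sin(math.pi * alpha / 2.0)
    den = math.gamma((1.0 + alpha) / 2.0) * alpha * 2.0 ** ((alpha - 1.0) / 2.0)
    return (num / den) ** (1.0 / alpha)


def levy_step(alpha, rng):
    """Draw one approximately Levy-stable step, symmetric about zero."""
    u = rng.gauss(0.0, mantegna_sigma(alpha))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / alpha)


rng = random.Random(42)
steps = [levy_step(1.5, rng) for _ in range(1000)]
```

Most draws are small, but the heavy tail occasionally produces a very large step, which is the property that a Lévy-based mutation relies on to reach distant regions of the search space.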
θ can be set to θ = 1 without loss of generality. To see this, rescale y to y' = by with some constant b. Then, from (15), the following relation can be obtained (Eq. 16):

$$L_{\alpha,\theta}(y') = \frac{1}{b} L_{\alpha,\theta'}(y) \tag{16}$$

where θ' = θb^{-α}. In particular, by setting θ' = 1, Eq. 16 becomes (Eq. 17):

$$L_{\alpha,\theta}(y') = \theta^{-1/\alpha} L_{\alpha,1}\left(\theta^{-1/\alpha} y'\right) \tag{17}$$

implying that θ is nothing but an overall scaling factor. Thus, from the distribution with θ = 1, the distribution for any other θ can be obtained (Lee and Yao, 2004).
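For readers who want the intermediate step, the scaling relation follows from a change of variables in the integral. The derivation below is a sketch that assumes the LPD has the usual integral form $L_{\alpha,\theta}(y) = \frac{1}{\pi}\int_0^\infty e^{-\theta q^\alpha}\cos(qy)\,dq$ and that b > 0; the integration variable u is introduced only for this derivation.

```latex
\begin{align*}
L_{\alpha,\theta}(by)
  &= \frac{1}{\pi}\int_{0}^{\infty} e^{-\theta q^{\alpha}} \cos(q\,b\,y)\,dq \\
  &= \frac{1}{b}\cdot\frac{1}{\pi}\int_{0}^{\infty}
     e^{-\theta b^{-\alpha} u^{\alpha}} \cos(u\,y)\,du
     \qquad (u = bq,\; b > 0) \\
  &= \frac{1}{b}\,L_{\alpha,\theta'}(y),
     \qquad \theta' = \theta b^{-\alpha}.
\end{align*}
% Choosing b = \theta^{1/\alpha} gives \theta' = 1, so with y' = by:
\[
L_{\alpha,\theta}(y') = \theta^{-1/\alpha}\,
  L_{\alpha,1}\!\left(\theta^{-1/\alpha}\, y'\right).
\]
```

In practical terms, a random number drawn with θ = 1 can be turned into a draw for any other θ by multiplying it by θ^{1/α}.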
In standard ABC, the exploitation and exploration carried out by the employed and onlooker bees rely on the same equation, Eq. 9. In the proposed approach, however, Eq. 18 is introduced for the employed bee phase in place of Eq. 9, while Eq. 19 replaces Eq. 9 in the onlooker bee phase. Equations 18 and 19 were designed through an experimental approach. The main objective is to emphasize the exploitation process at the strong (promising) solutions. In this study, all parameters involved are automatically tuned by eABC-LSSVM.

Simple mutation to prevent premature convergence:
Secondly, to prevent premature convergence, another modification applies a mutation approach, termed Simple Mutation ABC (smABC). This modification is involved in determining the values of the parameters of interest. In ABC-LSSVM (Yusof and Mustaffa, 2011), if a generated parameter value is out of the boundaries, it is automatically shifted onto the boundary. In smABC-LSSVM, by contrast, when this situation occurs a mutation strategy is applied instead of forcing the parameter value to the boundary. The operation multiplies a generated random number by the range of the boundary that has been determined. In this study, the boundaries are set to the range [1, 1000] (Eq. 20):

$$new\_param = (ub - lb) \times rand\_num \tag{20}$$

Where:
new_param = New parameter
rand_num = Random number
ub = Upper bound
lb = Lower bound

The purpose of this modification is to prevent the model from being trapped in local minima, which may cause premature convergence. This might happen if the area explored by the model does not contain the global minimum. By applying the mutation strategy, the model is induced to explore other areas in order to look for the global minimum rather than local minima (Haupt and Haupt, 1998). Figure 2 shows a simplified form of the proposed prediction model, while the flow of the eABC algorithm with the modifications made is illustrated in Fig. 3. From Fig. 3, it can be seen that both modifications, namely smABC-LSSVM and lvABC-LSSVM, are involved in both the employed and onlooker bee phases. For smABC-LSSVM, the modification is applied after a new food solution is produced, while lvABC-LSSVM is involved in producing the new food solution.
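The difference between the two boundary-handling strategies can be illustrated with the following Python sketch. The function names are hypothetical, and the random number is assumed to be uniform in [0, 1), which the text does not state explicitly.

```python
import random


def clamp_to_boundary(value, lb, ub):
    """Boundary handling described for ABC-LSSVM: shift onto the boundary."""
    return min(max(value, lb), ub)


def simple_mutation(value, lb, ub, rng):
    """smABC handling: replace an out-of-range value with (ub - lb) * rand_num."""
    if lb <= value <= ub:
        return value  # in-range values are left unchanged
    return (ub - lb) * rng.random()  # rand_num assumed uniform in [0, 1)


rng = random.Random(1)
lower, upper = 1.0, 1000.0  # boundaries used in the study
out_of_range = 1500.0
clamped = clamp_to_boundary(out_of_range, lower, upper)
mutated = simple_mutation(out_of_range, lower, upper, rng)
```

Clamping always returns 1000 for this input, so repeated violations pile candidates up on the boundary, whereas the mutation scatters them across the range. Note that the formula as written contains no added lb term, so under the uniform assumption a draw can occasionally fall slightly below lb; adding lb to the result would keep every draw inside [lb, ub).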

RESULTS
The proposed method was implemented as eABC-LSSVM in the Matlab environment utilizing the LSSVMlab Toolbox, which can be obtained from Pelkmans et al. (2000). Prior to that, the properties of ABC and of the competitor chosen in the study, namely the Back Propagation Neural Network (BPNN), were determined, as tabulated in Table 6, while the empirical results are shown in Table 7. The prediction accuracy yielded by ABC-LSSVM was 87.4138%, achieved using γ = 135.4851 and σ² = 995.0188. Increments in prediction accuracy of 0.1325% and 0.2105% were recorded by smABC-LSSVM and lvABC-LSSVM respectively. With the combination of both approaches, eABC-LSSVM is able to produce a higher prediction accuracy of 87.9401%, a difference of 0.5263% from the result obtained by standard ABC-LSSVM. BPNN was outperformed by eABC-LSSVM by 3.0767% in prediction accuracy. The visual result of the experiment is illustrated in Fig. 4, where the blue round marks represent the actual values, the red solid line represents eABC-LSSVM and the green dotted and dashed lines represent ABC-LSSVM and BPNN respectively. To test the significance of the modifications made, a T-Test was executed using SPSS (Hinton, 2004). As shown in Table 8, the significance level between pre-modification (ABC-LSSVM) and post-modification (eABC-LSSVM) was 0.000 (see last column). This indicates that there is a significant difference in prediction accuracy when eABC-LSSVM is utilized.
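The reported differences can be cross-checked with a few lines of arithmetic. The sketch below assumes that the quoted increments are absolute differences in PA (percentage points); the accuracies for smABC-LSSVM, lvABC-LSSVM and BPNN are implied by those differences rather than quoted from Table 7.

```python
baseline_pa = 87.4138  # ABC-LSSVM prediction accuracy reported in the text (%)

# Increments over the baseline reported in the text (percentage points).
gains = {"smABC-LSSVM": 0.1325, "lvABC-LSSVM": 0.2105, "eABC-LSSVM": 0.5263}

# Accuracies implied by the reported increments.
implied_pa = {name: round(baseline_pa + gain, 4) for name, gain in gains.items()}

# BPNN accuracy implied by the reported 3.0767 gap to eABC-LSSVM.
bpnn_pa = round(implied_pa["eABC-LSSVM"] - 3.0767, 4)

# Sum of the two individual gains, for comparison with the combined gain.
sum_of_individual_gains = round(gains["smABC-LSSVM"] + gains["lvABC-LSSVM"], 4)
```

The implied eABC-LSSVM accuracy agrees with the reported 87.9401%, and the combined gain of 0.5263 is larger than the sum of the two individual gains (0.3430), which is consistent with the observation that the two modifications work better together than separately.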

DISCUSSION
The performance of the proposed technique was evaluated based on MAPE, which is reflected in the PA obtained. The empirical results show a positive increment in PA from standard ABC-LSSVM to eABC-LSSVM. Although the increments in PA produced by smABC-LSSVM and lvABC-LSSVM individually are quite small, a better PA is obtained when both techniques are combined, as shown in Table 7. This shows that combining several mutation approaches at different stages can lead to better performance of the optimization technique, which finally yields a good prediction result. By adopting mutation based on LPD, the search exploitation of the bees is enhanced due to a wider search region. This implies that large mutation steps are quite likely, which reduces the chance of revisiting the same food source. On the other hand, the integration of Simple Mutation in the decision making process induces the bees to explore further instead of going to the boundary of the search space. This finally leads to better convergence, as the maximum and minimum values of the boundary are not necessarily optimal (Lendasse et al., 2005).

CONCLUSION
This study reports empirical results that examine the feasibility of the proposed approach, eABC, which combines the modifications in lvABC and smABC, when hybridized with LSSVM for predicting propane's price. The heavy dependence of the algorithm on a single equation for the exploitation process has been addressed by introducing lvABC. The fast convergence of standard ABC, although an advantage, is also not taken for granted, as it may increase the risk of premature convergence. With respect to both matters, appropriate actions have been taken. With modest requirements, the proposed technique showed satisfactory performance in predicting the time series of interest, producing better prediction accuracy than the competitors chosen in the study.