A NOVEL APPROACH BASED ON GENETIC FUZZY CLUSTERING AND ADAPTIVE NEURAL NETWORKS FOR SALES FORECASTING

This article proposes a new hybrid sales forecasting system based on genetic fuzzy clustering and Back-Propagation (BP) Neural Networks with an adaptive learning rate (GFCBPN). The proposed architecture consists of three stages: (1) using Winter's Exponential Smoothing and Fuzzy C-Means clustering, all normalized data records are categorized into k clusters; (2) using an adapted Genetic Fuzzy System (MCGFS), the fuzzy rules of membership levels to each cluster are extracted; (3) each cluster is fed into parallel BP networks whose learning rate is adapted to the cluster membership level of the training data records. Compared with previous research that uses hard clustering, this research uses fuzzy clustering, which can increase the number of elements in each cluster and consequently improve the accuracy of the proposed forecasting system. The Printed Circuit Board (PCB) industry is used as a case study to evaluate the precision of the proposed system. Experimental results show that the proposed model outperforms previous and traditional approaches; it is therefore a very promising method for financial forecasting.


INTRODUCTION
Reliable sales prediction has become a vital task in business decision making, and companies that use an accurate sales forecasting system gain important benefits. Sales forecasting is both necessary and difficult. It is necessary because it is the starting point of many business management tools: production schedules, finance, marketing plans, budgeting, and promotion and advertising plans. It is difficult because, regardless of the quality of the methods adopted, predicting the future with certainty is out of reach: the parameters involved are numerous, complex and often unquantifiable.

Science Publications
JCS

[...] networks (BPN) in order to forecast safety stock. Zhang and Huang (2010) used BPN for sales forecasting based on an ERP system and found that BPN can serve as an accurate sales forecasting system.
Traditional back-propagation networks converge very slowly because their convergence depends on the choice of the learning rate parameter. However, experimental results (Iranmanesh and Mahdavi, 2009) showed that using an adaptive learning rate during training can lead to much better results than the traditional neural network model (BPN).
Many papers indicate that systems hybridizing fuzzy logic and neural networks can perform more accurately than conventional statistical methods and single ANNs. Kuo and Xue (1999) proposed a Fuzzy Neural Network (FNN) as a model for sales forecasting, using fuzzy logic to extract the experts' fuzzy knowledge. Chen (2003) used a model for wafer fab prediction based on a Fuzzy Back Propagation Network (FBPN). The system was constructed to incorporate production control experts' judgments to enhance the performance of an existing crisp back propagation network, and the results showed that the FBPN performed better than the BPN. Efendigil et al. (2009) used a forecasting system based on Artificial Neural Networks (ANNs) and Adaptive Network-Based Fuzzy Inference Systems (ANFIS) to predict fuzzy demand with incomplete information.

Preprint submitted to Elsevier, 24 December 2012
Oh and Han (2001) proposed a Hybrid Intelligent Clustering Forecasting System based on change point detection and artificial neural networks. The basic concept of the model is to obtain significant intervals by change point detection. They found that the proposed model is more accurate and converges better than the traditional neural network model (BPN).
Recently, some researchers have shown that hybridizing fuzzy logic and GAs, which leads to Genetic Fuzzy Systems (GFSs) (Cordon et al., 2001), gives more accurate and efficient results than traditional intelligent systems. Orriols-Puig et al. (2009) and [...] applied GFSs to various management problems and obtained good results.
This article proposes a new hybrid system for sales forecasting in the Printed Circuit Board (PCB) industry, based on genetic fuzzy clustering using a Genetic Fuzzy System (GFS) (Cordon and Herrera, 1997) and Back-Propagation Neural Networks with an adaptive learning rate (GFCBPN).
The data used in this study come from a sales forecasting case study of the Printed Circuit Board (PCB) industry in Taiwan, which other authors have frequently used. The training samples cover January 1999 to December 2002 (48 months), and the testing samples cover January 2003 to December 2003 (12 months).

PCB Sales Forecasting
Due to the important role of the Printed Circuit Board (PCB) industry in Taiwan's economy, several studies in the literature have used PCB sales forecasting as a case study (Table 1). Chang and Lai (2005) used Back Propagation Neural Networks trained by a genetic algorithm (ENN) to estimate the production demand for PCBs; their experimental results show that ENN performs better than BPN. [...] used a Fuzzy Back Propagation Network (FBPN) for sales forecasting: the opinions of sales managers about the importance of each input were converted into pre-specified fuzzy numbers and integrated into the proposed system. They concluded that the FBPN approach outperforms traditional methods such as Grey Forecasting, Multiple Regression Analysis and back propagation networks. [...] proposed a fusion of SOM, ANNs, GAs and FRBS for PCB sales forecasting and found that the model performed better than previous methods proposed for PCB sales forecasting. Chang et al. (2007) developed a Weighted Evolving Fuzzy Neural Network (WEFuNN) model for PCB sales forecasting, based on a combination of key sales factors selected using Grey Relational Analysis (GRA); the experimental results showed that this hybrid system is better than previous hybrid models. Chang and Liu (2008) developed a hybrid model (FCBR) based on the fusion of Case-Based Reasoning (CBR) and fuzzy multicriteria decision making; the experimental results showed that FCBR is superior to traditional statistical models and BPN. Chang et al. (2009) combined K-means clustering and a Fuzzy Neural Network (FNN) to estimate future PCB sales: K-means was used to divide the data into clusters, each of which was fed into an independent FNN model. The experimental results show that this approach outperforms other forecasting models such as BPN, ANFIS and FNN.

Esmaeil et al. (2011) proposed a novel sales forecasting approach integrating Genetic Fuzzy Systems (GFS) and data clustering to construct a sales forecasting expert system. They used GFS to extract the whole knowledge base of the fuzzy system for sales forecasting problems, and their experimental results show that the approach outperforms previous approaches.

Development of the GFCBPN Model
The proposed architecture consists of three stages, as shown in Fig. 1: (1) all normalized data records are categorized into K clusters using the fuzzy c-means model; (2) the fuzzy distances from all data records ($x_i$) to the different cluster centers ($c_j$) found by fuzzy c-means are fed into independent Membership Cluster Genetic Fuzzy Systems (MCGFS); (3) for each cluster, a parallel BP network is trained with a learning rate adapted to the cluster membership level of each record of the training data set.

Data Preprocessing Stage
Historical data of an electronics company in Taiwan are used to choose the key variables ($K_1$, $K_2$, $K_3$) (Table 2) to be considered in the forecasting model. The monthly sales amount of Printed Circuit Boards (PCB), which has served as the case in several other studies, is the quantity to be forecast.

Winter's Exponential Smoothing
To take the effects of seasonality and trend into account, Winter's Exponential Smoothing is used to preliminarily forecast the quantity of PCB production. For time series data, Winter's Exponential Smoothing is used to preprocess all the historical data and to predict the production demand (Fig. 2); this prediction is entered into the proposed hybrid model as the input variable $K_4$ (Table 2). Following previous research, we set α = 0.1, β = 0.1 and γ = 0.9.
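A minimal sketch of this preprocessing step follows. It assumes the multiplicative form of Winter's (Holt-Winters) method, a 12-month season and a simple initialization from the first two years of data; the text does not state which form or initialization was used. The smoothing constants default to the values above, and the function name `winters_one_step` is hypothetical.

```python
def winters_one_step(series, season=12, alpha=0.1, beta=0.1, gamma=0.9):
    """One-step-ahead forecasts from multiplicative Holt-Winters smoothing.

    Hypothetical helper (assumed multiplicative form and initialization).
    forecasts[t - season] predicts series[t] from data up to t - 1; the first
    season is used only for initialization. Values must be positive.
    """
    if len(series) < 2 * season:
        raise ValueError("need at least two full seasons of data")
    first, second = series[:season], series[season:2 * season]
    level = sum(first) / season
    # Initial trend: average per-period change between the first two seasons.
    trend = (sum(second) - sum(first)) / season ** 2
    # Initial seasonal indices: each month relative to the first-season mean.
    seasonal = [x / level for x in first]

    forecasts = []
    for t in range(season, len(series)):
        s = seasonal[t % season]                 # index from one season ago
        forecasts.append((level + trend) * s)    # forecast made before seeing y_t
        y = series[t]
        prev_level = level
        level = alpha * y / s + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season] = gamma * y / level + (1 - gamma) * s
    return forecasts
```

Applied to the monthly PCB series, the returned forecasts would play the role of the $K_4$ input; because the first season is consumed by initialization, forecasts start at the 13th month.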

Data Normalization
The input values ($K_1$, $K_2$, $K_3$, $K_4$) are scaled into the interval [0.1, 0.9] to suit the neural networks. The normalization (Equation 1) is:

$$N_i = 0.1 + 0.8\,\frac{K_i - \min(K_i)}{\max(K_i) - \min(K_i)} \qquad (1)$$

where $K_i$ denotes a key variable, $N_i$ the normalized input (Table 2), and $\max(K_i)$ and $\min(K_i)$ the maximum and minimum of the key variable, respectively.
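The scaling into [0.1, 0.9] can be sketched as the usual min-max mapping $N_i = 0.1 + 0.8\,(K_i - \min K_i)/(\max K_i - \min K_i)$, which is an assumption consistent with the stated range. Both helper names are hypothetical, and the inverse mapping `denormalize` is an added assumption so that network outputs can be turned back into sales amounts.

```python
def normalize(values, low=0.1, high=0.9):
    """Hypothetical helper: min-max scale one key variable into [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("a constant variable cannot be min-max scaled")
    return [low + (high - low) * (v - vmin) / (vmax - vmin) for v in values]


def denormalize(n, vmin, vmax, low=0.1, high=0.9):
    """Hypothetical inverse of normalize: map [low, high] back to original units."""
    return vmin + (n - low) * (vmax - vmin) / (high - low)
```

For example, `normalize([0.0, 5.0, 10.0])` gives approximately `[0.1, 0.5, 0.9]`, and `denormalize(0.5, 0.0, 10.0)` returns approximately `5.0`.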

Fuzzy C-Means Clustering
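In hard clustering, data are divided into distinct clusters, and each data element belongs to exactly one cluster. In Fuzzy C-Means (FCM), developed by Dunn (1973) and improved by Bezdek (1981), data elements can belong to more than one cluster, and each element is associated with a set of membership levels. FCM is based on minimization of the following objective function:

$$J_m = \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^{m}\,\lVert x_i - c_j \rVert^2, \qquad m > 1$$

where $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, $x_i$ is the $i$-th measured data record, $c_j$ is the center of the $j$-th cluster, $N$ is the number of records and $C$ is the number of clusters. The algorithm is composed of the following steps:
Step 1: Initialize the membership matrix $U = [u_{ij}]$ as $U^{(0)}$.
Step 2: At step $k$, calculate the cluster centers $c_j$ from $U^{(k)}$:

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$

Step 3: For each point, update its coefficients of being in the clusters ($U^{(k)} \to U^{(k+1)}$):

$$u_{ij} = \frac{1}{\sum_{l=1}^{C}\left(\frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert}\right)^{2/(m-1)}}$$

Step 4: If $\lVert U^{(k+1)} - U^{(k)} \rVert < \varepsilon$, with $0 < \varepsilon < 1$, then STOP; otherwise return to Step 2.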
This procedure converges to a local minimum or a saddle point of $J_m$. According to Bezdek (1981), a suitable combination of the two parameters m and ε is m = 2 and ε = 0.5.
Using fuzzy c-means, Table 3 shows that four clusters give the best result among the different numbers of clusters tested.
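The four FCM steps can be sketched with NumPy as below. This is a minimal illustration, not the authors' implementation: the function name, the random initialization, the iteration cap, the zero-distance guard and the default tolerance (1e-5, smaller than the ε = 0.5 quoted above, which can be passed instead) are all assumptions.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Hypothetical FCM sketch; returns (centers, U), U[i, j] = membership of x_i in j."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: random initial membership matrix U(0); each row sums to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2: centers as membership-weighted means with weights u_ij^m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 3: u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1)).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                     # guard against zero distances
        ratio = d[:, :, None] / d[:, None, :]     # ratio[i, j, l] = d_ij / d_il
        U_new = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=2)
        # Step 4: stop once the membership matrix has stabilized.
        converged = np.linalg.norm(U_new - U) < eps
        U = U_new
        if converged:
            break
    return centers, U
```

In the proposed system, `X` would hold the normalized records ($N_1$, ..., $N_4$) and `c = 4`, the cluster count selected in Table 3.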

Extract the Fuzzy IF-THEN Rules of Membership Levels to Each Cluster
The degree of membership to a cluster is inversely related to the distance from the data record to the cluster center. Thus, the degree of membership depends strongly on the position of the cluster center, which moves each time a new data record is added. To remove this dependency, we use a kind of Genetic Fuzzy System (MCGFS) to extract fuzzy rules that define the distance between the data records and each cluster center. When a new data record is added, these rules can then be used to estimate its distances to the cluster centers, and hence its degree of membership to each cluster.
In recent years, fuzzy systems, used for many complex problems, have become a popular research topic. The fundamental idea of fuzzy systems is to store the available knowledge in the form of fuzzy linguistic IF-THEN rules (Fig. 3). Their knowledge base is composed of the Rule Base (RB), the collection of rules in their symbolic form, and the Data Base (DB), which contains the linguistic term sets and the membership functions defining their meanings (Casillas et al., 2004).
However, it is difficult for human experts to express their knowledge in the form of fuzzy IF-THEN rules.
To cope with this problem, several approaches for automatically extracting fuzzy rules from historical data have been proposed. In this sense, intelligent systems such as Genetic Algorithms (GA) (Casillas et al., 2004; Cordon and Herrera, 1997) or Particle Swarm Optimization (PSO) (Esmin, 2007) have proven to be powerful tools for tasks such as generating fuzzy IF-THEN rules. These approaches are given the general name of Genetic Fuzzy Systems (GFS) (Cordon and Herrera, 1997).
This stage uses a new kind of Genetic Fuzzy System (GFS), called the Membership Cluster Genetic Fuzzy System (MCGFS), to extract fuzzy IF-THEN rules for the degree of belonging (membership levels) to clusters. For each cluster, MCGFS returns the two best fuzzy linguistic rules describing the distance between data records and the cluster center. The proposed MCGFS consists of two general steps (Fig. 4).
Step 1: Genetic learning of the Rule Base of the Fuzzy Rule-Based System (FRBS). For each cluster, a Genetic Algorithm (GA) extracts the two best fuzzy Rule Bases (RB) of the distance between the training records and the cluster center; the derived RBs are the best individuals of the last population.
Step 2: Tuning the Data Base of the fuzzy system with Particle Swarm Optimization (PSO). This process modifies the shapes of the membership functions of the two preliminary best fuzzy rule bases (RB) to improve the precision of the results.

Genetic Rule Base Learning Process for FRBS (GA)
The main aim of this substage is to extract, for each cluster k, the two best fuzzy Rule Bases (RB) of the distances between the training records $x_i$ and the cluster center $c_k$. The Data Base (DB) considered consists of uniform fuzzy partitions, with triangular membership functions crossing at height 0.5, for the linguistic terms of the input and output variables. Each variable takes one of four linguistic terms, each encoded by a corresponding binary number: none (00), small (01), medium (10) and large (11). For each cluster, all distances from the cluster center $c_k$ to the data records $x_i$ are fed into an independent GFS model for that cluster, which extracts the fuzzy rule bases. For each cluster, the derived Rule Base (RB) consists of the two best chromosomes of the last population. This stage consists of the following steps.
Step 1: Coding. Each pair of genes encodes the triangular membership function (linguistic term) of an input or output variable, and a chromosome is constructed from a series of such genes. Fig. 5 shows a sample chromosome coding a fuzzy rule base with four inputs and one output variable, the fuzzy distance from training record $x_i$ to the k-th cluster.
Step 2: Generating the initial values.
Initial binary values of linguistic variables (chromosomes) are randomly generated. The initial chromosomes generated form the first population (N pop ).
Step 3: Calculating the fitness values.
In this step, the Mean Squared Error (MSE) is used as the objective function to evaluate the deviation on the training data:

$$MSE(C_j^k) = \frac{1}{N}\sum_{i=1}^{N}\left(Dist_i^k - Out_i^k\right)^2$$

where $Dist_i^k$ is the actual distance between the $i$-th training element $x_i$ and the $k$-th cluster center, $Out_i^k$, obtained from the FRBS using the RB coded in the $j$-th chromosome ($C_j^k$), is the output distance between $x_i$ and the $k$-th cluster center, and N is the number of training data.
Step 4: Reproduction and selection.
Roulette wheel selection (Goldberg, 1989) is applied in this step. The two best individuals of each generation are copied unchanged into the next generation. A binary tournament selection scheme is used to generate the remaining $(N_{pop} - 2)$ individuals: two individuals are selected at random, and the better of the two becomes a parent.
Step 5: Crossover.
After the GA parameters are set, the two-point crossover method (Goldberg, 1989) is applied in this step.
Step 6: Mutation.
The one-point mutation method (Goldberg, 1989) is then applied in this step.
The new population generated by the previous steps replaces the old population.
If the number of generations equals the maximum number of generations, stop; otherwise, go to Step 3.
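The coding and the GA steps above can be sketched as follows. The sketch assumes that each chromosome is a 10-bit string encoding one rule (two bits for each of the four inputs and for the output distance) and that four uniform triangular terms cover a normalized universe [0, 1]. The fitness is passed in as a callable to be minimized; in MCGFS it would be the MSE of the rule's output distances, but the FRBS inference used to compute that output is not specified here. Selection uses binary tournament with two elites as in Step 4; the crossover and mutation rates are assumed values, and all names are hypothetical.

```python
import random

# Hypothetical coding: 2 bits per variable, 4 inputs + 1 output distance = 10 bits.
TERMS = {"00": "none", "01": "small", "10": "medium", "11": "large"}
VARIABLES = ["X1", "X2", "X3", "X4", "Dist"]


def decode_rule(chromosome):
    """Decode a 10-bit string into one IF-THEN rule: variable -> linguistic term."""
    if len(chromosome) != 2 * len(VARIABLES) or set(chromosome) - {"0", "1"}:
        raise ValueError("expected a 10-character binary string")
    pairs = [chromosome[i:i + 2] for i in range(0, len(chromosome), 2)]
    return dict(zip(VARIABLES, (TERMS[p] for p in pairs)))


def uniform_partition(n_terms=4, lo=0.0, hi=1.0):
    """Triangles (a, b, c) of a uniform partition; neighbours cross at height 0.5."""
    step = (hi - lo) / (n_terms - 1)
    return [(lo + (k - 1) * step, lo + k * step, lo + (k + 1) * step)
            for k in range(n_terms)]


def triangle(x, a, b, c):
    """Triangular membership degree of x."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def run_ga(fitness, n_bits=10, pop_size=20, generations=50,
           p_cross=0.8, p_mut=0.1, seed=0):
    """Hypothetical GA: minimize `fitness` over bit strings, return the two best."""
    rng = random.Random(seed)
    # Step 2: random initial population.
    pop = ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament: draw two individuals, keep the one with lower error.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(generations):
        # Step 3: evaluate; Step 4: copy the two best unchanged (elitism).
        scored = sorted((fitness(ind), ind) for ind in pop)
        children = [ind for _, ind in scored[:2]]
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < p_cross:
                # Step 5: two-point crossover, swap the segment between cuts i and j.
                i, j = sorted(rng.sample(range(1, n_bits), 2))
                p1, p2 = p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]
            for child in (p1, p2):
                if rng.random() < p_mut:
                    # Step 6: one-point mutation, flip a single random bit.
                    k = rng.randrange(n_bits)
                    child = child[:k] + ("1" if child[k] == "0" else "0") + child[k + 1:]
                children.append(child)
        pop = children[:pop_size]          # the new population replaces the old one
    scored = sorted((fitness(ind), ind) for ind in pop)
    return [ind for _, ind in scored[:2]]
```

Because the two elites are always carried over, the best error never increases from one generation to the next, which is why the last population's two best chromosomes can be taken directly as the derived RB.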

Tuning Process of Fuzzy Rule Bases (PSO)
To improve the precision of the two best fuzzy rule bases of each cluster returned by the above generation method, this substage applies a tuning process to the fuzzy rule bases, similar to the genetic tuning process proposed by Cordon and Herrera (1997). The proposed tuning process uses Particle Swarm Optimization (PSO) to modify the shapes of the membership functions of the two preliminary best Rule Bases (RB) of each cluster.
Particle Swarm Optimization (PSO) is a population-based optimization method that searches for the optimal solution using a population of particles (Eberhart and Kennedy, 1995). Each particle of the swarm is a candidate solution in the solution space. PSO was originally developed through the simulation of bird flocking. PSO can converge faster than a Genetic Algorithm (GA) because of its balance between exploration and exploitation of the search space (Sivanandam and Visalakshi, 2009).
For the fuzzy Rule Base (RB) of each cluster, we apply PSO to adjust the parameters (shapes) of the membership functions and thus improve the accuracy of the estimated distances between the training records and the cluster center. The proposed tuning process is as follows:
Step 1: Defining the search space.
The PSO algorithm works with a population (called a swarm) of candidate solutions (called particles). In our case, the search space is the set of all possible triples of values defining the triangular membership functions. The dimensionality of the search space is 15 (three parameters for each of the four input variables and the output variable). Each particle is represented by:

$$P_i = \left[a_i^1, b_i^1, c_i^1, \ldots, a_i^4, b_i^4, c_i^4, o_{1,i}^j, o_{2,i}^j, o_{3,i}^j\right]$$

where $[a_i^k, b_i^k, c_i^k]$ are the three parameters that define the triangular membership function of the $k$-th input variable $X_k$, and $[o_{1,i}^j, o_{2,i}^j, o_{3,i}^j]$ are the three parameters that define the triangular membership function of the output, the fuzzy distance between cluster center $c_j$ and the normalized record $X = (X_1, X_2, X_3, X_4)$.
Step 2: Generating the initial population.
Initialize the positions of the particles.
Step 3: Calculate the fitness value of each particle. The fitness value of each particle is the MSE over the training data set:

$$MSE(P_i) = \frac{1}{N}\sum_{l=1}^{N}\left(Dist_l^k - Out_l^k\right)^2$$

where $Dist_l^k$ is the actual distance between the $l$-th training element $x_l$ and the $k$-th cluster center $c_k$, $Out_l^k$, obtained from the fuzzy rule coded in particle $P_i$, is the output distance between $x_l$ and $c_k$, and N is the number of training data.
Step 4: Assign the best particle's $P_i^{best}$ value to $g^{best}$. Compare each particle's fitness evaluation with its $P_i^{best}$; if the current value is better, set $P_i^{best}$ to the current position $P_i$. Compare the population's fitness evaluation with the population's overall previous best ($g^{best}$); if the current value is better, reset $g^{best}$ to the current particle's position.
Step 5: Calculate velocity for each particle.
The velocity of each particle $P_i$ for the next generation t+1 is updated as:

$$V_i(t+1) = V_i(t) + C_1\,\mathrm{rand}()\,\left(P_i^{best} - P_i(t)\right) + C_2\,\mathrm{rand}()\,\left(g^{best} - P_i(t)\right)$$

where the constants $C_1$ and $C_2$ are the weights of the stochastic acceleration terms that pull each particle i towards the $P_i^{best}$ and $g^{best}$ positions. In this research, we set $C_1 = 2$ and $C_2 = 2$.
The particles are moved to their new positions according to:

$$P_i(t+1) = P_i(t) + V_i(t+1)$$

Steps 3 to 5 are then repeated until a termination criterion is met (a given number of iterations performed or adequate fitness reached).
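A minimal NumPy sketch of this PSO loop follows. It applies the velocity update $V_i(t+1) = V_i(t) + C_1 r_1 (P_i^{best} - P_i) + C_2 r_2 (g^{best} - P_i)$ and the position update $P_i(t+1) = P_i(t) + V_i(t+1)$ with $C_1 = C_2 = 2$. The function name, the velocity clamp `v_max`, the box bounds and the zero initial velocities are assumptions; without a clamp or an inertia weight, this form of the update tends to let velocities grow without bound. In MCGFS, each 15-dimensional position would hold the triangle parameters of one rule and `objective` would return its MSE.

```python
import numpy as np


def pso_minimize(objective, lower, upper, n_particles=20, iterations=100,
                 c1=2.0, c2=2.0, v_max=None, seed=0):
    """Hypothetical PSO sketch: minimize `objective` in [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    if v_max is None:
        v_max = 0.2 * (upper - lower)          # assumed velocity clamp
    # Step 2: random initial positions in the search space, zero velocities.
    pos = lower + rng.random((n_particles, dim)) * (upper - lower)
    vel = np.zeros((n_particles, dim))
    # Steps 3-4: fitness values, personal bests and global best.
    fit = np.array([objective(p) for p in pos])
    p_best, p_best_fit = pos.copy(), fit.copy()
    g = int(np.argmin(fit))
    g_best, g_best_fit = pos[g].copy(), fit[g]
    for _ in range(iterations):
        # Step 5: cognitive pull towards p_best, social pull towards g_best.
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        vel = np.clip(vel, -v_max, v_max)
        # Move the particles, keeping them inside the box.
        pos = np.clip(pos + vel, lower, upper)
        fit = np.array([objective(p) for p in pos])
        improved = fit < p_best_fit
        p_best[improved] = pos[improved]
        p_best_fit[improved] = fit[improved]
        g = int(np.argmin(p_best_fit))
        if p_best_fit[g] < g_best_fit:
            g_best, g_best_fit = p_best[g].copy(), p_best_fit[g]
    return g_best, float(g_best_fit)
```

Since $g^{best}$ is only replaced by a strictly better personal best, the returned fitness never exceeds the best fitness of the initial swarm.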

The Degree of Membership Levels (MLC k )
From the two previous stages, we obtain six fuzzy rules (Fig. 6). Each pair of rules describes the distance between the data records ($X_i$) and a cluster center ($c_j$).
In this stage, we use the sigmoid function (Fig. 7) to improve the precision of the results and to accelerate the training of the neural networks. The advanced fuzzy distance to cluster k, $AFD_k$, is then derived using this function. The membership level of a record $X_i$ to the k-th cluster, $MLC_k(X_i)$, is inversely related to the distance from $X_i$ to the cluster center $c_k$, $AFD_k(X_i)$. Thus, we can construct a new training sample $(X_i, MLC_1(X_i), MLC_2(X_i), MLC_3(X_i), MLC_4(X_i))$ for the adaptive neural network evaluation stage (Fig. 1).

Adaptive Neural Networks Evaluating Stage
The concept of Artificial Neural Networks (ANNs) originates from biological science (the neurons of an organism). Their components are connected according to a pattern of connectivity, with weights associated with the connections, and the weight of a neural connection is updated by learning. ANNs can identify nonlinear patterns by learning from a data set. Back Propagation (BP) training algorithms are probably the most popular ones. A BP neural network consists of an input layer, a hidden layer and an output layer, containing I, J and L nodes, respectively. $w_{ij}$ denotes the numerical weights between the input and hidden layers, and $w_{jl}$ those between the hidden and output layers, as shown in Fig. 8. In this stage, we propose an adaptive neural network evaluation system consisting of four neural networks, where cluster k is associated with the k-th BP network. For each cluster, the training sample is fed into a parallel Back Propagation Network (BPN) whose learning rate is adapted to the cluster membership level ($MLC_k$) of each record of the training data set. The structure of the proposed system is shown in Fig. 1.
The adaptive neural network learning algorithm is composed of two procedures: (a) a feed-forward step and (b) a back-propagation weight training step. These procedures are explained in detail as follows:
Step 1: All BP networks are initialized with the same random weights.
Step 2: Feed-forward computation. For each $BPN_k$ (associated with the k-th cluster), each input factor in the input layer is denoted by $x_i^k$.
Step 3: Back-propagation weight training. The error function is defined as:

$$E = \frac{1}{2}\sum_{k} e_k^2, \qquad e_k = t_k - y_k$$

where $t_k$ is the predefined network output (the desired output or target value), $y_k$ is the actual network output and $e_k$ is the error in each output node.

The goal is to minimize E by adjusting the weight of each link so that the final output matches the desired output. The learning speed can be improved by introducing a momentum term α; the learning rate η usually falls in the range [0, 1]. For iteration n and for $BPN_k$ (associated with the k-th cluster), the learning rate of $BPN_k$ is adapted to the cluster membership level and used in the weight variation $\Delta w^k$. Consequently, the higher the Membership Level ($MLC_k$) of data record $X_j$ to the k-th cluster, the larger the variations of the $BPN_k$ weights ($w_{oj}^k$ and $w_{ol}^k$). If $MLC_k$ of $X_j$ is close to zero, the changes in the $BPN_k$ weights are minimal.
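The adaptive training step can be sketched as a one-hidden-layer sigmoid network with a single output. Since the exact adaptive learning-rate equation is not reproduced here, the sketch assumes the simplest form consistent with the description, $\eta_k = \eta \cdot MLC_k(X_j)$, combined with a standard momentum term α. Bias terms are omitted for brevity, and the class name and default values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class AdaptiveBPN:
    """Hypothetical sketch of BPN_k: one hidden layer, one output, no biases."""

    def __init__(self, n_in, n_hidden, eta=0.5, alpha=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w_ij = rng.uniform(-0.5, 0.5, (n_in, n_hidden))   # input -> hidden
        self.w_jl = rng.uniform(-0.5, 0.5, (n_hidden, 1))      # hidden -> output
        self.eta, self.alpha = eta, alpha                       # base rate, momentum
        self.dw_ij = np.zeros_like(self.w_ij)                   # previous variations
        self.dw_jl = np.zeros_like(self.w_jl)

    def forward(self, x):
        """Step 2: feed-forward pass through sigmoid hidden and output layers."""
        self.x = np.asarray(x, dtype=float)
        self.h = sigmoid(self.x @ self.w_ij)
        self.y = sigmoid(self.h @ self.w_jl)
        return self.y

    def train_step(self, x, target, mlc):
        """Step 3: one back-propagation update; mlc is MLC_k of this record."""
        y = self.forward(x)
        e = target - y                                      # e_k = t_k - y_k
        delta_out = e * y * (1.0 - y)
        delta_hid = (self.w_jl @ delta_out) * self.h * (1.0 - self.h)
        eta_k = self.eta * mlc                              # assumed: eta_k = eta * MLC_k
        self.dw_jl = eta_k * np.outer(self.h, delta_out) + self.alpha * self.dw_jl
        self.dw_ij = eta_k * np.outer(self.x, delta_hid) + self.alpha * self.dw_ij
        self.w_jl += self.dw_jl
        self.w_ij += self.dw_ij
        return 0.5 * float(np.sum(e ** 2))                  # E before this update
```

With `mlc = 0` and no previous variation, the weights of this network stay unchanged, matching the behaviour described above. With momentum, however, a record with a low membership level can still move the weights through the carried-over previous variation.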
The configuration of the proposed BPN is established as follows:

Constructing GFCBPN Sales Forecasting ES
Our proposed system (GFCBPN) has three stages. First, all normalized data records are categorized into k clusters using the fuzzy c-means model. Second, a Genetic Fuzzy System (GFS) extracts the fuzzy rules of membership levels to each cluster. Finally, for each cluster, a parallel BP network is trained with a learning rate adapted to the cluster membership level of each record of the training data set. The proposed GFCBPN system was applied to forecast the PCB sales data; the results are presented in Table 4.
Using the Membership Cluster Genetic Fuzzy System (MCGFS, Fig. 4) for clusters 1, 2, 3 and 4, the fuzzy rules of membership levels to each cluster are extracted; the results are summarized in Fig. 6.

Comparisons of GFCBPN Model with Other Previous Models
Experimental comparison of the outputs of GFCBPN with those of other methods shows that the proposed model outperforms the previous approaches (Tables 4-9 and Figs. 9-14). We use two performance measures, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE), to compare the GFCBPN model with the previous methods, i.e., KGFS, KFNN, FNN, WES, BPN and RBFNN:

$$MAPE = \frac{100}{N}\sum_{t=1}^{N}\left|\frac{Y_t - P_t}{Y_t}\right|, \qquad RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(Y_t - P_t\right)^2}$$

where $P_t$ is the expected (forecast) value for period t, $Y_t$ is the actual value for period t and N is the number of periods. As shown in Fig. 15, using MCGFS enhanced by the sigmoid function in the proposed architecture (GFCBPN) gives more precise results in the test stage than using fuzzy c-means clustering. GFCBPN achieved a MAPE of 1.7 and an RMSE of 1820. Therefore, the forecasting accuracy of GFCBPN outperforms the previous approaches in terms of MAPE and RMSE, as summarized in Table 10.
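The two performance measures can be computed as follows. The sketch assumes the usual definitions, with MAPE expressed in percent, where `actual` holds the $Y_t$ values and `predicted` the $P_t$ values; the function names are hypothetical.

```python
import math


def _check(actual, predicted):
    """Hypothetical guard shared by both measures."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")


def mape(actual, predicted):
    """MAPE in percent: (100 / N) * sum(|Y_t - P_t| / |Y_t|)."""
    _check(actual, predicted)
    return 100.0 * sum(abs(y - p) / abs(y) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """RMSE: sqrt((1 / N) * sum((Y_t - P_t) ** 2))."""
    _check(actual, predicted)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))
```

For actual values `[100.0, 200.0]` and forecasts `[110.0, 190.0]`, MAPE is about 7.5 (percent) and RMSE is about 10.0. MAPE is scale-free, whereas RMSE is expressed in the units of the sales series, which is why the two figures reported for GFCBPN differ so much in magnitude.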

CONCLUSION
This article proposes a new hybrid system based on genetic fuzzy clustering and Back-propagation Neural Networks with adaptive learning rate (GFCBPN) for sales forecasting.
The experimental results show that GFCBPN outperforms the previous and traditional approaches: WES, BPN, RBFNN, KFNN, FNN and KGFS. They also indicate that our modeling approach (GFCBPN) has properties such as fast convergence, high precision, and robust and accurate forecasting.
Previous research tends to use classical hard clustering methods (K-means clustering) to divide the data set into subgroups in order to reduce noise and form more homogeneous clusters. In contrast, the advantage of our proposed system (GFCBPN) is that it uses fuzzy clustering (fuzzy c-means clustering), which allows each data record to belong to each cluster to a certain degree. The clusters can therefore be larger, which in turn increases the accuracy of the forecasting system.
Another advantage of our system is that it uses a kind of Genetic Fuzzy System (GFS), called MCGFS, which estimates the degree of membership of each data record to each cluster without depending on the position of the cluster centers.
We applied GFCBPN to sales forecasting in the Printed Circuit Board (PCB) industry as a case study. The results demonstrated the effectiveness and superiority of GFCBPN compared with previous approaches in terms of MAPE and RMSE. Other academic researchers and industrial practitioners may find these contributions interesting.