A Hybrid Clustering Process using a Genetic Fuzzy System for the Knowledge Base of a Fuzzy Rule-Based System

Abstract: The present paper proposes a new hybrid clustering process based on a Genetic Fuzzy System. The proposed approach consists of two steps: (1) using fuzzy clustering, all data elements are clustered into N groups; (2) using a Genetic Fuzzy System, the fuzzy rules of membership to each cluster are generated. Compared with other work that uses hard clustering, fuzzy clustering increases the number of elements in each cluster and improves the accuracy of the proposed system. The approach also reduces complexity: because the system is based on a hybrid intelligent method, the clusters do not need to be regenerated every time a new data point is added. Experimental results on estimation models using clustering methods on synthetic data show that the proposed algorithm outperforms several commonly used clustering algorithms.


Introduction
Nowadays, researchers increasingly use clustering methods to improve the accuracy of their results. Clustering can be regarded as the most important unsupervised learning problem; like every problem of this kind, it deals with finding a structure in a collection of unlabeled inputs. Such methods are popular owing to their high speed, but the clusters they produce are spherical and the results are highly sensitive to initialization.
A hybrid intelligent clustering system based on artificial neural networks (ANN) and change point detection was proposed by Oh and Han (2001). The basic structure of the proposed model is obtained by detecting the change points. The authors concluded that the proposed model is more accurate than the traditional one.
Lately, some researchers have shown that the hybridization of fuzzy logic and genetic algorithms (GA), which leads to Genetic Fuzzy Systems (GFSs) (Cordón et al., 2001), performs better than traditional intelligent systems. Orriols-Puig et al. (2009), Martínez-López and Casillas (2009) and Esmin (2007) employed GFSs in several management applications, and all obtained good results.
Recently, hybrid intelligence techniques combining fuzzy logic, Particle Swarm Optimization (PSO) and genetic algorithms have proved to be among the best approaches. Many studies use hybrid models because sales data are nonlinear. Hartigan (1975) developed the K-means clustering algorithm, a simple method and the best known one. The principle of this process is to start with K cluster centers and divide the data into K subsets.
Our research compares K-means with other clustering methods (Dunham, 2002; Rakhlin and Caponnetto, 2007; Berkhin, 2002; Borah and Ghose, 2009; Han and Kamber, 2006; Xiong et al., 2009; Park et al., 2006). This paper proposes a novel hybrid clustering approach using a Genetic Fuzzy System. The article is organized as follows: Section 2 describes the proposed model, named Membership Cluster Genetic Fuzzy Systems (MCGFS). Finally, Section 3 concludes the article.

Materials and Methods
The proposed architecture consists of two stages (Fig. 1):
Stage 1: Using fuzzy c-means, all the inputs are grouped into K clusters.
Stage 2: The distances from the cluster centers ($c_j$) to all data records ($x_i$) are fed into independent Membership Cluster Genetic Fuzzy Systems (MCGFS).
The key variables ($K_1$, $K_2$, $K_3$, $K_4$) from the historical data of an electronics company in Taiwan, which have been used in different studies, serve as the case for the clustering approach.

Data Preprocessing Stage
This stage contains two steps: first, all the data records are normalized; second, the normalized records are grouped into K clusters using the fuzzy clustering method.

Data Normalization
All the input values ($K_1$, $K_2$, $K_3$, $K_4$) are scaled into the interval [0.1, 0.9] to suit the properties of neural networks.
The normalization can be expressed as follows:
$$N_i = 0.1 + 0.8 \times \frac{K_i - \min(K_i)}{\max(K_i) - \min(K_i)}$$
where $N_i$ is a normalized input, $K_i$ is a key variable, $\max(K_i)$ is the maximum of the key variable and $\min(K_i)$ is the minimum of the same key variable.
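The scaling of the key variables into [0.1, 0.9] can be illustrated with a minimal Python sketch of linear min-max rescaling. The function name `normalize_inputs`, the sample records and the handling of constant columns are assumptions made for illustration only.

```python
import numpy as np

def normalize_inputs(K, low=0.1, high=0.9):
    """Linearly rescale each column (key variable) of K into [low, high]."""
    K = np.asarray(K, dtype=float)
    k_min = K.min(axis=0)
    k_max = K.max(axis=0)
    # Assumption: a constant column would divide by zero, so its span is set to 1
    span = np.where(k_max > k_min, k_max - k_min, 1.0)
    return low + (high - low) * (K - k_min) / span

# Hypothetical records with two key variables
records = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]])
normalized = normalize_inputs(records)
```

Each column is rescaled independently, so the smallest value of a key variable maps to 0.1 and the largest to 0.9.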

Fuzzy C-Means Clustering
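In hard clustering, the data are divided into distinct clusters and each element belongs to exactly one cluster. In fuzzy clustering, data elements can belong to several clusters, and each element is associated with a set of membership levels. The Fuzzy C-Means (FCM) model, developed by Dunn (1973) and improved by Bezdek (1981), is based on the minimization of the following objective function:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{K} u_{ij}^{m} \lVert x_i - c_j \rVert^2, \quad m > 1$$
where $u_{ij}$ is the degree of membership of the record $x_i$ in cluster j, $c_j$ is the center of cluster j and m is the fuzziness exponent. The algorithm proceeds as follows:
Step 1: Initialize the membership matrix $U^{(0)} = [u_{ij}]$.
Step 2: At iteration k, compute the cluster centers $c_j$ with $U^{(k)}$:
$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$
Step 3: Update the membership coefficients of each point in the clusters ($U^{(k)}$, $U^{(k+1)}$):
$$u_{ij} = \frac{1}{\sum_{l=1}^{K} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{2/(m-1)}}$$
Step 4: If $\lVert U^{(k+1)} - U^{(k)} \rVert < \varepsilon$, with $0 < \varepsilon < 1$, then stop; otherwise return to Step 2.
This process converges to a local minimum or a saddle point of $J_m$. The parameter values adopted for m and ε are m = 2 and ε = 0.5, following Bezdek (1981).
Using the fuzzy c-means model, we can observe (Table 1) that four clusters give the best result among all the cluster numbers tested.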
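The fuzzy c-means iteration can be sketched in Python as follows. This is a minimal illustration of the standard algorithm, not the authors' implementation: the function name `fuzzy_c_means`, the random initialization, the default tolerance `eps`, the iteration cap and the sample data are all assumptions.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Return (centers, U), where U[i, j] is the membership of record i in cluster j."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step 1: random membership matrix whose rows sum to 1
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 3: new memberships from the distances to every center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against a record sitting on a center
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 4: stop when the membership matrix no longer changes
        converged = np.linalg.norm(U_new - U) < eps
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical normalized records forming two well-separated groups
X = np.array([[0.10, 0.10], [0.12, 0.10], [0.10, 0.13],
              [0.90, 0.90], [0.88, 0.90], [0.90, 0.87]])
centers, U = fuzzy_c_means(X, n_clusters=2)
```

Unlike a hard assignment, every row of `U` holds one membership level per cluster, and the levels of a record sum to 1.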

Extract the Fuzzy IF-THEN Rules of Membership Levels to Each Cluster
The distance between the cluster center and the input record determines the degree of belonging to a cluster. Therefore, the degree of belonging depends strongly on the position of the cluster center, which changes every time a new input is added.
To avoid this dependency, we use Genetic Fuzzy Systems (MCGFS) to obtain rules that describe the distance between the cluster center and the input records. The generated fuzzy rules can then be used to measure the distance between an input and a cluster. In recent years, fuzzy systems have become some of the most popular algorithms for solving such problems. The principle of a fuzzy system is to store the relevant knowledge in the form of fuzzy linguistic rules in a knowledge base (Fig. 2), which is composed of the Rule Base (RB) and the Data Base (Casillas et al., 2004).
However, human experts find it very difficult to express their knowledge in the form of fuzzy IF-THEN rules.
To overcome this issue, many methods have been proposed to generate fuzzy rules from historical records. In this regard, intelligent methods such as Genetic Algorithms (GA) (Casillas et al., 2004; Cordon and Herrera, 1997) or Particle Swarm Optimization (Esmin, 2007) have proved to be efficient tools for tasks such as the generation of fuzzy IF-THEN rules. This approach is called Genetic Fuzzy Systems (GFS) (Cordon and Herrera, 1997).
This stage uses a new type of Genetic Fuzzy System (GFS) that learns fuzzy IF-THEN rules for the degree of membership to clusters. For each cluster, MCGFS returns the two best rule bases, which give the distance between the cluster center and the input. MCGFS derives the best RB in two steps:
Step 1: For every cluster, the rule base for the distance between the training records and the cluster center is extracted using the genetic algorithm.
Step 2: To tune the data base of the fuzzy system and improve the accuracy of the results, Particle Swarm Optimization is used to adjust the shapes of the membership functions.

Genetic Rule Base Learning Process for FRBS (GA)
The goal of this part is to extract, for each cluster, the two best fuzzy RBs for the distance between the cluster center and the training records. Each variable is described by a fuzzy linguistic term encoded with two bits (e.g., none (00), small (01), medium (10) and large (11)).
Using the fuzzy rules, the distance from each cluster center to each input record is represented for each cluster. The result is given by the best chromosomes of the final population. This stage proceeds through the following steps (Fig. 3):
Step 1: Encoding of chromosomes.
The linguistic term of each input or output variable, associated with a triangular membership function, is encoded by two genes, and a chromosome is formed by concatenating many genes. With four input variables and one output variable, a chromosome encodes a fuzzy rule base (Fig. 4).
Step 2: Generating the initial population.
The chromosomes of the first population are generated randomly.
Step 3: Calculating the fitness values.
To estimate the deviation on the training inputs, the mean squared error (MSE) is used as the objective function:
$$MSE(C_j^k) = \frac{1}{N} \sum_{i=1}^{N} \left( Dist_i^k - Out_i^k \right)^2$$
where $Dist_i^k$ is the actual distance between the i-th training element $x_i$ and the k-th cluster center, $Out_i^k$ is the output distance between the i-th training element $x_i$ and the k-th cluster center obtained from the FRBS using the RB coded in the j-th chromosome $C_j^k$, and N is the number of training inputs.
Step 4: Selection.
In this stage, roulette wheel selection is applied (Goldberg, 1989). The two best individuals of every generation are copied unchanged into the next one. A binary tournament is used for each selection: two individuals are chosen at random and the better one is selected as a parent.
Step 5: In this stage, two-point crossover is applied after parameter design.
Step 6: In this stage, one-point crossover is applied after parameter design (Goldberg, 1989).
Step 7: The new population produced by the preceding steps replaces the old population.
Step 8: Stopping criteria.
Stop if the number of generations reaches the maximum number of generations; otherwise return to Step 3.
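The rule-base learning loop can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the triangle parameters in `TRIANGLES`, the reading of the term "none" as "variable not used", the min/weighted-average inference in `infer`, the population size and the number of rules are all hypothetical, and only elitism, binary tournament selection and two-point crossover are shown.

```python
import random

TERMS = {(0, 0): None, (0, 1): "small", (1, 0): "medium", (1, 1): "large"}
# Assumed triangular membership functions (left, peak, right) on the normalized range
TRIANGLES = {"small": (-0.4, 0.1, 0.5), "medium": (0.1, 0.5, 0.9), "large": (0.5, 0.9, 1.4)}
N_VARS = 5                    # four inputs and one output
GENES_PER_RULE = 2 * N_VARS   # two genes (bits) per linguistic term

def triangle(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def decode(chromosome):
    """Split the bit string into rules; each rule is a list of five linguistic terms."""
    rules = []
    for start in range(0, len(chromosome), GENES_PER_RULE):
        bits = chromosome[start:start + GENES_PER_RULE]
        rules.append([TERMS[(bits[i], bits[i + 1])] for i in range(0, GENES_PER_RULE, 2)])
    return rules

def infer(chromosome, x):
    """Assumed inference: min for AND, weighted average of the consequent peaks."""
    num = den = 0.0
    for *antecedents, consequent in decode(chromosome):
        if consequent is None:
            continue
        strength = 1.0
        for value, term in zip(x, antecedents):
            if term is not None:  # assumption: "none" means the variable is ignored
                strength = min(strength, triangle(value, *TRIANGLES[term]))
        num += strength * TRIANGLES[consequent][1]
        den += strength
    return num / den if den > 0 else 0.0

def mse(chromosome, records, distances):
    return sum((d - infer(chromosome, x)) ** 2 for x, d in zip(records, distances)) / len(records)

def two_point_crossover(p1, p2, rng):
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def evolve(records, distances, n_rules=4, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    length = n_rules * GENES_PER_RULE
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    fitness = lambda c: mse(c, records, distances)
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = [pop[0][:], pop[1][:]]  # the two best individuals pass on unchanged
        while len(nxt) < pop_size:
            # binary tournament: draw two individuals, keep the better one as parent
            p1 = min(rng.sample(pop, 2), key=fitness)
            p2 = min(rng.sample(pop, 2), key=fitness)
            nxt.extend(two_point_crossover(p1, p2, rng))
        pop = nxt[:pop_size]
    pop.sort(key=fitness)
    return pop[:2]  # the two best rule bases of the final population

# Hypothetical training records (normalized inputs) and their distances to one center
records = [(0.1, 0.1, 0.1, 0.1), (0.5, 0.5, 0.5, 0.5), (0.9, 0.9, 0.9, 0.9)]
distances = [0.1, 0.5, 0.9]
best_two = evolve(records, distances)
```

Because the two best chromosomes are copied unchanged, the best MSE in the population can never get worse from one generation to the next.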

Tuning Process of Fuzzy Rule Bases (PSO)
This sub-stage applies a tuning process similar to the genetic tuning process suggested by Cordon and Herrera (1997). To improve the accuracy of the two best fuzzy rule bases returned by the above generation method, the proposed tuning process uses Particle Swarm Optimization (PSO) to adjust the shapes of the membership functions of the two preliminary RBs of each cluster.
Particle Swarm Optimization (PSO) is a population-based optimization method that searches for the optimal solution using a population of particles (Eberhart and Kennedy, 1995). Every particle is a candidate solution in the solution space. PSO was originally developed by simulating bird flocking. PSO can achieve faster convergence than a Genetic Algorithm (GA) because of the balance between exploration and exploitation in the search space (Sivanandam and Visalakshi, 2009).
For the fuzzy Rule Base (RB) of each cluster, we apply PSO to adjust the parameters (shapes) of the membership functions in order to improve the accuracy of the estimated distances between the training records and the cluster center. The proposed tuning process is as follows:
Step 1: Defining the search space.
PSO operates on a population (named a swarm) of candidate solutions (named particles). In our case, the search space is the set of all possible triples of values defining the triangles of the membership functions. The dimensionality of the search space is 15. Each particle is represented by:
$$P_i = (p_1^1, p_2^1, p_3^1, \; p_1^2, p_2^2, p_3^2, \; p_1^3, p_2^3, p_3^3, \; p_1^4, p_2^4, p_3^4, \; q_1, q_2, q_3)$$
where $(p_1^k, p_2^k, p_3^k)$ are the three parameters that specify the triangular fuzzy membership function of the k-th input variable ($X_k$), and $(q_1, q_2, q_3)$ are three other parameters that specify the output triangular fuzzy membership function of the fuzzy distance between the cluster center $c_j$ and the normalized data record $X = (X_1, X_2, X_3, X_4)$.
Step 2: Generating the initial population.
Step 3: Calculating the fitness value of each particle.
The fitness value of each particle is computed as the MSE over a training data set:
$$MSE(P_i) = \frac{1}{N} \sum_{l=1}^{N} \left( Dist_l^k - Out_l^k \right)^2$$
where $Dist_l^k$ is the actual distance between the l-th training element ($x_l$) and the k-th cluster center ($c_k$), $Out_l^k$ is the output distance between the l-th training element and the k-th cluster center obtained from the fuzzy rule base coded in the particle $P_i$, and N is the number of training data.
Step 4: Assigning the best particle's $P_i^{best}$ value to gbest.
Compare the fitness of each particle $P_i$ with that of its personal best $P_i^{best}$. If the current value is better than $P_i^{best}$, set $P_i^{best}$ to the current position $P_i$. Then compare the fitness with the population's previous global best (gbest). If the current value is better than gbest, reset gbest to the current particle's position.
Step 5: Calculating the velocity of each particle.
The velocity of each particle ($P_i$) is updated for the next generation t+1.
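A minimal Python sketch of Steps 2 to 5 follows. It assumes the commonly used inertia-weight form of the velocity update, in which the new velocity combines the previous velocity with attractions toward the particle's own best position and toward gbest. The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the function name `pso` and the toy fitness function are assumptions for illustration; in the tuning process described here the fitness would be the MSE of the FRBS whose 15 membership-function parameters are encoded in the particle.

```python
import numpy as np

def pso(fitness, dim=15, n_particles=20, iterations=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a dim-dimensional space; return (gbest, gbest_fitness)."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, dim))              # Step 2: initial population in [0, 1]
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])   # Step 3: fitness of each particle
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]  # Step 4: best particle becomes gbest
    for _ in range(iterations):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Step 5 (assumed form): inertia + pull toward pbest + pull toward gbest
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        improved = fit < pbest_fit                    # update the personal bests
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmin()
        if pbest_fit[g] < gbest_fit:                  # update the global best
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, gbest_fit

# Toy fitness standing in for the MSE of the tuned rule base (hypothetical target)
target = np.linspace(0.1, 0.9, 15)
toy_fitness = lambda p: float(np.mean((p - target) ** 2))
best, best_fit = pso(toy_fitness)
```

A real tuning run would also need to keep each triple of triangle parameters ordered (left, peak, right); the sketch leaves that constraint out.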

The Degree of Membership Levels (MLC k )
Using the two previous stages, we obtain six fuzzy rules as the outcome (Fig. 6). Each pair of rules gives the distances between the data records ($X_i$) and a cluster center ($c_j$).
At this level, the sigmoid function (Fig. 5) is used to improve the accuracy of the results and to speed up the training of the neural network, which yields the advanced fuzzy distance to cluster k ($AFD_k$). The degree of membership of a record $X_i$ to the k-th cluster ($MLC_k(X_i)$) is inversely related to the distance from the record $X_i$ to the cluster center $c_k$ ($AFD_k(X_i)$).

Constructing MCGFS Model
The proposed system (MCGFS) has two stages: (1) using fuzzy clustering, all data elements are clustered into N groups; (2) using a Genetic Fuzzy System, the fuzzy rules of membership levels to each cluster are generated. The proposed MCGFS system was applied to forecast the sales data of the PCB. The results are given in Tables 2-4. BPN with clustered data was chosen as the forecasting method. Parallel BP networks are trained with a learning rate adapted to the degree of cluster membership of every training record. We compare the results of BPN with three clustering methods:
- K-means
- Fuzzy c-means
- Membership Cluster Genetic Fuzzy Systems (MCGFS)

Comparisons of GFCBPN Model with Other Previous Models
Experimental comparison of the outputs of GFCBPN with other methods shows that the proposed model outperforms the previous approaches. In the error measure used for the comparison, $P_t$ is the expected value for period t, $Y_t$ is the actual value for period t and N is the number of periods. As shown in Fig. 10, MCGFS gives better precision than fuzzy c-means clustering.

Conclusion
This article presents a new hybrid system based on genetic fuzzy clustering (MCGFS). Other approaches tend to use classical hard clustering methods, such as K-means clustering, to separate the data set into subgroups so as to reduce noise and form more homogeneous clusters (Chang et al., 2009). The benefit of the proposed system (MCGFS) is that it employs fuzzy clustering (fuzzy c-means), which allows each data record to belong to each cluster to some degree. This allows the clusters to be larger, which consequently increases the accuracy of the forecasting results.
Another benefit of our approach is that the degree of belonging of each input record to each cluster is calculated without depending on the positions of the cluster centers.