Memory Convergence and Optimization with Fuzzy PSO and ACS

Abstract: Associative neural memories are models of biological phenomena that allow the storage of pattern associations and the retrieval of a desired output pattern when a possibly noisy or incomplete version of an input pattern is presented. In this study, we introduce a fuzzy particle swarm optimization technique, based on fuzzy set theory, for the convergence of associative neural memories. Fuzzy Particle Swarm Optimization (FPSO) clusters the swarm's particles with the fuzzy c-means algorithm to attain the neighborhood best. We present a singular value decomposition method for selecting efficient rules from a given rule base, required to attain the global best, and we illustrate the proposed method with examples. Further, the Ant Colony System (ACS) algorithm is used to study the symmetric Traveling Salesman Problem (TSP). The optimum parameters for this algorithm have to be found by trial and error; here, the ACS parameters on a designed subset of TSP instances are optimized by means of Particle Swarm Optimization (PSO).


INTRODUCTION
An artificial neural network (ANN) is an analysis paradigm loosely modeled on the brain, and the back-propagation algorithm is one of the most popular methods to train such networks. Recently there have been significant research efforts to apply evolutionary computation techniques to evolve one or more aspects of artificial neural networks.
The efficient supervised training of feedforward neural networks (FNNs) is a subject of considerable ongoing research, and numerous algorithms have been proposed to this end. The back-propagation (BP) algorithm [1] is one of the most common supervised training methods. Although BP training has proved efficient in many applications, its convergence tends to be slow and it often yields suboptimal solutions [2].
Attempts to speed up training and to reduce convergence to local minima have been made in the context of gradient descent [3,4,5]. However, these methods adapt the BP algorithm dynamically through variable weights, learning rates, step sizes and biases, while using a constant gain for the sigmoid function throughout the training cycle.
Evolutionary computation methodologies have been applied to three main attributes of neural networks: network connection weights, network architecture (network topology, transfer function) and network learning algorithm.
Particle swarm optimization (PSO) is a population-based stochastic optimization technique [6,7] inspired by the social behavior of bird flocking or fish schooling. It is modeled by particles in a multidimensional space that have a position and a velocity. These particles fly through a hyperspace and have two essential reasoning capabilities: the memory of their own best position and knowledge of the swarm's best, "best" simply meaning the position with the smallest objective function value. Members of a swarm communicate good positions to each other and adjust their own position and velocity based on these good positions. There are two main ways this communication is done:
• A global best that is known to all particles and is immediately updated when a new best position is found by any particle in the swarm.
• A neighborhood best, where each particle only communicates with a subset of the swarm about best positions.
In this study we have designed an algorithm using a Particle Swarm Optimization (PSO) framework to optimize the parameters of the ACS algorithm working on a single symmetric Travelling Salesman Problem (TSP) instance. For each instance the algorithm computes an optimal set of ACS parameters; we then study their performance on all instances (not only their related instance) and the characteristics of their related instance, for the purpose of finding correlations.
The first ACO algorithm, called Ant-System, was proposed in [8][9][10] . A full review of ACO algorithms and applications can be found in [11] . ACS is a version of the Ant System that modifies the updating of the pheromone trail [12,13] . We have chosen this ACS algorithm to work with because of the theoretical background we have found on it [11,12] and the previous fine-tuning research on the parameters by [14] .
We have chosen PSO because it is easy to implement for integer and real parameters and, like genetic algorithms, it performs a blind search over all possible sets of parameters. In our algorithm the domain of the PSO will be all possible sets of parameters for ACS. For a position of a particle, we compute the fitness by running the ACS algorithm on a TSP instance with the parameters given by the position.

THE PARTICLE SWARM OPTIMIZATION
PSO's precursor was a simulator of social behavior used to visualize the movement of a birds' flock. Several versions of the simulation model were developed, incorporating concepts such as nearest-neighbor velocity matching and acceleration by distance [6,15]. Two variants of the PSO algorithm were developed, one with a global neighborhood and one with a local neighborhood [16].
Suppose that the search space is D-dimensional; then the i-th particle of the swarm can be represented by a D-dimensional vector X_i = (x_i1, x_i2,...,x_iD). The velocity (position change) of this particle can be represented by another D-dimensional vector V_i = (v_i1, v_i2,...,v_iD). The best previously visited position of the i-th particle is denoted as P_i = (p_i1, p_i2,...,p_iD). Defining g as the index of the best particle in the swarm (i.e., the g-th particle is the best) and letting the superscript denote the iteration number, the swarm is manipulated according to the following two equations [6]:

v_id^(n+1) = v_id^n + c r_1^n (p_id^n - x_id^n) + c r_2^n (p_gd^n - x_id^n)    (1)
x_id^(n+1) = x_id^n + v_id^(n+1)    (2)

where d = 1,2,...,D, i = 1,2,...,N, N is the size of the swarm, c is a positive constant called the acceleration constant, r_1, r_2 are random numbers uniformly distributed in [0,1], and n = 1,2,... denotes the iteration number. Equations 1 and 2 define the initial version of the PSO algorithm. Since there was no actual mechanism for controlling the velocity of a particle, it was necessary to impose a maximum value V_max on it: if the velocity exceeded this threshold, it was set equal to V_max. This parameter proved to be crucial, because large values could result in particles moving past good solutions, while small values could result in insufficient exploration of the search space. This lack of a control mechanism for the velocity resulted in low efficiency for PSO.
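The V_max clamping described above can be sketched in a few lines (an illustrative helper, not the original implementation):

```python
import numpy as np

def clamp_velocity(v, v_max):
    """Limit each velocity component to [-v_max, v_max], as required
    by the initial PSO formulation."""
    return np.clip(v, -v_max, v_max)
```

For example, `clamp_velocity(np.array([5.0, -7.0, 0.3]), 2.0)` leaves the in-range component untouched and clips the other two to ±2.0.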
Various attempts have been made to improve the performance of the baseline PSO, with varying success. In [16] emphasis was placed on optimizing the update equations for the particles. Some researchers used a selection mechanism in an attempt to improve the general quality of the particles in the swarm. In [6] a cluster analysis technique is used to modify the update equation, so that particles attempt to conform to the centre of their clusters rather than to a global best.
The aforementioned problem was addressed by incorporating a weight parameter for the previous velocity of the particle. Thus, in the later version of the PSO, Eqs. 1 and 2 are changed to the following ones [17]:

v_id^(n+1) = w v_id^n + c_1 r_1^n (p_id^n - x_id^n) + c_2 r_2^n (p_gd^n - x_id^n)
x_id^(n+1) = x_id^n + v_id^(n+1)

where w is the inertia weight. In our proposed model both approaches are considered together. First, we cluster the swarm by applying the fuzzy c-means algorithm to attain the neighborhood best, and then we reduce the number of rules required to attain the global best by means of the singular value decomposition method.
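A minimal sketch of the inertia-weight update loop, minimizing a simple sphere function as a stand-in objective. The function names, bounds and the stable parameter pair w = 0.729, c1 = c2 = 1.494 are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def pso(objective, dim, n_particles=20, n_iter=200,
        w=0.729, c1=1.494, c2=1.494, bounds=(-5.0, 5.0), seed=0):
    """Inertia-weight PSO: v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))              # particle velocities
    pbest = x.copy()                              # personal best positions
    pbest_f = np.array([objective(p) for p in x])
    g = int(pbest_f.argmin())                     # index of the global best
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (pbest[g] - x)
        x = x + v
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better] = x[better]
        pbest_f[better] = f[better]
        g = int(pbest_f.argmin())
    return pbest[g], float(pbest_f[g])
```

Calling `pso(lambda p: float((p ** 2).sum()), dim=2)` drives the swarm toward the origin, the minimizer of the sphere function.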

NEIGHBORHOOD BEST USING FCM
The fuzzy c-means (FCM) algorithm generalizes the hard c-means algorithm to allow a particle of the swarm to partially belong to multiple clusters. Therefore, it produces a soft partition of a given swarm. To do this, the objective function J of hard c-means has been extended in two ways (Fig. 1 and 2):
• The fuzzy membership degrees in clusters were incorporated into the formula.
• An additional parameter p was introduced as a weight exponent on the fuzzy memberships.

The extended objective function, denoted J, is

J(P; v_1,...,v_k) = Σ_{i=1}^{k} Σ_{x ∈ X} μ_{C_i}(x)^p ||x − v_i||^2

where P is a fuzzy partition of the swarm X formed by C_1, C_2,...,C_k, and the parameter p is a weight that determines the degree to which partial membership in a cluster affects the clustering result.
Theorem: A constrained fuzzy partition (C_1, C_2,...,C_k) can be a local minimum of the objective function J only if the following conditions are satisfied:

μ_{C_i}(x) = 1 / Σ_{j=1}^{k} ( ||x − v_i||^2 / ||x − v_j||^2 )^{1/(p−1)}    (3)

v_i = Σ_{x ∈ X} μ_{C_i}(x)^p x / Σ_{x ∈ X} μ_{C_i}(x)^p    (4)

Based on this theorem, FCM updates the prototypes and the membership function iteratively using Eqs. 3 and 4 until a convergence criterion is reached.
The FCM algorithm can be described as FCM(X, c, p, ε), where:
X: the unlabeled swarm
c: the number of clusters to form
p: the weight exponent in the objective function
ε: a threshold for the convergence criterion
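The FCM loop just described can be sketched as follows. This is an illustrative implementation; the deterministic prototype initialization and the convergence test on prototype movement are assumptions:

```python
import numpy as np

def fcm(X, c, p=2.0, eps=1e-5, max_iter=100):
    """Fuzzy c-means sketch: alternate the membership update (Eq. 3)
    and the prototype update (Eq. 4) until the prototypes stop moving."""
    X = np.asarray(X, float)
    n = len(X)
    # deterministic initial prototypes spread over the particle list
    V = X[np.linspace(0, n - 1, c).astype(int)].copy()
    for _ in range(max_iter):
        # squared distances of every particle to every prototype
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        d2 = np.maximum(d2, 1e-12)                 # guard against division by 0
        inv = d2 ** (-1.0 / (p - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # memberships; rows sum to 1
        Up = U ** p
        V_new = (Up.T @ X) / Up.sum(axis=0)[:, None]
        if np.linalg.norm(V_new - V) < eps:
            V = V_new
            break
        V = V_new
    return U, V
```

On two well-separated groups of particles, the dominant memberships recover the groups, mirroring the worked example later in the text.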

GLOBAL BEST USING SVD
In our proposed model, after clustering the particles of the swarm, an orthogonal transformation method is used for selecting important fuzzy rules from a given rule base [18][19][20][21]. Unlike conventional methods, where multiple iterations are usually required to find the optimal number of fuzzy rules, orthogonal transformation methods are non-iterative. They are therefore computationally less expensive than conventional methods, especially when the number of particles in the swarm is very large. In this section we show how to use singular value decomposition (SVD) to select the most important fuzzy rules from a given rule base and construct compact fuzzy models with better generalization ability.
Singular value decomposition takes a rectangular n-by-p matrix A, in which the n rows represent the genes and the p columns represent the experimental conditions [22]. The SVD theorem states:

A_{n×p} = U_{n×n} S_{n×p} V^T_{p×p}

where U^T U = I_{n×n} and V^T V = I_{p×p} (i.e., U and V are orthogonal), and S = diag(σ_1, σ_2, σ_3,...,σ_m) ∈ R^{n×p} (m = min{n,p}) is a diagonal matrix with σ_1 ≥ σ_2 ≥ σ_3 ≥ ... ≥ σ_m ≥ 0. The columns of U are the left singular vectors (mode amplitudes), S holds the singular values on its diagonal, and the rows of V^T are the right singular vectors (expression level vectors).
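The decomposition can be verified numerically; the 3×2 matrix below is an arbitrary illustration (note that NumPy's `svd` returns V^T directly):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])              # n = 3 rows, p = 2 columns

U, s, Vt = np.linalg.svd(A)             # A = U S V^T
assert np.all(s[:-1] >= s[1:])          # sigma_1 >= sigma_2 >= ... >= 0

S = np.zeros((3, 2))                    # embed diag(s) in an n-by-p matrix
S[:2, :2] = np.diag(s)
assert np.allclose(U @ S @ Vt, A)       # the factorization reproduces A
assert np.allclose(U.T @ U, np.eye(3))  # U is orthogonal
```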
To explain the basic principle of using SVD for fuzzy rule selection, we use a fuzzy model with constant consequent constituents as an example. This type of fuzzy model, usually referred to as the zero-order TSK model, has rules of the following form [23]:

Rule i: IF x_1 is A_{i1} AND ... AND x_n is A_{in} THEN y = C_i

where C_i is the constant consequent. The total output of the model is computed by

y = Σ_i w_i C_i / Σ_i w_i

where w_i is the matching (firing) degree of rule i. The SVD-based method starts with an oversized rule base and then removes redundant or less important fuzzy rules in a one-pass operation. Finally, the efficient rules obtained are obeyed by all the swarm's clusters to approach the global best. We now illustrate the method with an example.
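The weighted-average output of the zero-order TSK model can be computed as follows (a minimal sketch; the matching degrees and consequents passed in are illustrative):

```python
import numpy as np

def tsk_output(w, c):
    """Zero-order TSK defuzzification: y = sum(w_i * C_i) / sum(w_i)."""
    w = np.asarray(w, float)   # matching (firing) degrees w_i
    c = np.asarray(c, float)   # constant consequents C_i
    return float((w * c).sum() / w.sum())
```

For instance, with degrees (0.2, 0.8) and consequents (1.0, 3.0), the output is (0.2·1 + 0.8·3)/(0.2 + 0.8) = 2.6.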
Suppose we are given a swarm of six particles, each of which has two features F_1 and F_2. The particles are listed in Table 1. Assuming that we want to use FCM to partition the swarm into two clusters [23], suppose we set the parameter p in FCM to 2 and the initial prototypes to v_1 = (5, 5) and v_2 = (10, 10).
The initial membership degrees of the particles in the two clusters are calculated using Eq. 3, and we proceed similarly for the remaining particles. Using the prototypes of the two clusters, the membership functions indicate that x_1 and x_2 lie mostly in the first cluster, while the remaining particles of the swarm lie mostly in the second cluster.
The FCM algorithm then updates the prototypes according to Eq. 4. As shown in Fig. 3 (an example of the fuzzy c-means algorithm), the updated prototype v_1 is moved closer to the center of the cluster formed by x_1, x_2 and x_3, while the updated prototype v_2 is moved closer to the cluster formed by x_4, x_5 and x_6. We now illustrate how SVD is solved to obtain the efficient rules for approaching the global best, taking a matrix example.
For an n×n matrix W, a nonzero vector X is an eigenvector of W if WX = λX, where λ is the eigenvalue of W corresponding to X. So, to find the singular values of a matrix A, we compute the matrices AA^T and A^T A: the eigenvectors of AA^T make up the columns of U, the eigenvectors of A^T A make up the columns of V, and the square roots of their common nonzero eigenvalues are the singular values of A.
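This relation between the eigendecompositions of A^T A and the singular values can be checked numerically (the matrix here is an arbitrary example, not the one from the study):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Singular values of A = square roots of the eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)            # ascending order
sv_from_eig = np.sqrt(eigvals[::-1])             # descending order
sv_direct = np.linalg.svd(A, compute_uv=False)
assert np.allclose(sv_from_eig, sv_direct)       # both give [4.0, 2.0]
```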

ANT COLONY SYSTEM
The ACS works as follows. It has a population of n ants. Each arc e = (i, j) in the TSP-instance graph carries a heuristic value η_e, set to the inverse of the cost of traversing the arc, and a pheromone value τ_e, initially set to τ_0 = 1/L_nn for all arcs e, where L_nn is the tour length computed by the nearest-neighbor heuristic. Let q_0, α, ρ ∈ [0,1] be real values and ϕ, β integer values between 0 and 8. For each vertex s ∈ V a neighbor set N(s) is defined among the nearest vertices. For a given ant r, let NV(r) be the set of non-visited vertices. We denote by J_r(s) = N(s) ∩ NV(r) the set of non-visited vertices among the neighbor set, for a given vertex s and a given ant r.
In every iteration, each ant constructs a tour solution for the TSP instance. The construction phase works as follows. Each ant is initially placed at a random vertex; then, at each step, every ant moves to a non-visited vertex. Given an ant r at the current vertex s, p_krs is computed as a reference value for visiting vertex k or not:

p_krs = (τ_sk)^ϕ (η_sk)^β,  k ∈ J_r(s)    (5)

This formula includes a small modification with respect to the original ACS algorithm, introducing ϕ as the exponent of the pheromone level; this allows a deeper study of the effects of the possible combinations of the ϕ and β parameters.
A random value q is sampled. If q ≤ q_0, the ant visits the vertex k with maximum p_krs (exploitation of the knowledge); otherwise, ACS follows a random-proportional rule on p_krs over all candidates k (biased exploration). If there are no non-visited vertices in the neighbor set of vertex s, we extend Eq. 5 to all vertices in NV(r) \ N(s) (those not yet visited by ant r and not included in the neighbor set of s) and visit the vertex with maximum p_krs.
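The pseudo-random proportional rule can be sketched as follows (an illustrative helper over a vector of candidate scores, not the authors' code):

```python
import numpy as np

def choose_next(p, q0, rng):
    """ACS pseudo-random proportional rule over candidate scores p_krs:
    with probability q0 exploit (take the argmax); otherwise explore by
    sampling a candidate with probability proportional to its score."""
    p = np.asarray(p, float)
    if rng.random() <= q0:
        return int(p.argmax())                      # exploitation
    return int(rng.choice(len(p), p=p / p.sum()))   # biased exploration
```

With q0 close to 1 the ant almost always follows the strongest pheromone/heuristic trail; lowering q0 shifts the balance toward exploration.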
After an arc is inserted into a route (a new vertex is visited), its pheromone trail is updated. This phase is called the local update; for an inserted arc e ∈ E:

τ_e = (1 − ρ) τ_e + ρ τ_0    (6)

This reduces the pheromone level on the visited arcs, so exploration over the set of possible tours is increased. When all the tours have been computed, a global update phase is performed. For each arc e belonging to the global-best tour found:

τ_e = (1 − α) τ_e + α Δτ_e    (7)
Δτ_e = 1 / L_gb    (8)

where L_gb is the length of the global-best tour found.
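Both pheromone updates can be sketched as follows (a minimal illustration using a dict keyed by arcs; the names are assumptions):

```python
def local_update(tau, e, rho, tau0):
    """Local update (Eq. 6): move the pheromone of a just-traversed
    arc e back toward the initial level tau0."""
    tau[e] = (1.0 - rho) * tau[e] + rho * tau0

def global_update(tau, best_tour_arcs, alpha, L_gb):
    """Global update (Eqs. 7-8): reinforce only the arcs of the
    global-best tour with delta_tau = 1 / L_gb."""
    for e in best_tour_arcs:
        tau[e] = (1.0 - alpha) * tau[e] + alpha * (1.0 / L_gb)
```

Because the global update touches only the best-tour arcs, pheromone concentrates along the best solution found so far, as the next paragraph notes.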
In the original ant algorithm and in most of the later versions, pheromone global update is performed in all the edges; ACS only updates pheromone level in the set of edges pertaining to the best tour.
We consider a trial to be a run of 1000 iterations. The lowest tour length found after all iterations have finished is the best solution found by the trial. A feasible set of parameters for running ACS is a combination of feasible q_0, ϕ, β, α, ρ, number of ants (na) and a concrete neighbor definition.

PSO-ACO ALGORITHM AND IMPLEMENTATION
The algorithm is run each time on a single TSP instance. The set of ACS parameters that defines a point in the PSO domain is q_0, ϕ, β, φ, α, ρ and the number of ants (na). Most of them have already been explained. Let φ denote the percentage of vertices that will be included in N(v) for any vertex v ∈ V, so that for a given v ∈ V and φ = 0.5, |N(v)| = [φ·|V|]. The range of each parameter is shown in Table 2; the product of these ranges is the domain of the PSO. We define the fitness value of a given position (point) as the length of the best tour computed by an ACS run using the related parameters on the given instance. If two different positions have the same length value, computing time is considered: we consider better those parameters that minimize the length of the tour and, secondarily, the computing time. To compute the fitness of a given position, the integer parameters (na, ϕ and β) are first rounded up as shown in Fig. 4; then the algorithm runs five trials of the ACS algorithm using the rounded parameters on the TSP instance and returns the best value obtained from the trials.
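The fitness evaluation just described can be sketched as follows; `run_acs` is a placeholder for the actual ACS routine, and the parameter ordering inside the position tuple is an assumption:

```python
import math

def fitness(position, run_acs, n_trials=5):
    """Fitness of a PSO position: round the integer ACS parameters
    (na, phi, beta) up, run several ACS trials with the resulting
    parameter set and keep the best (lowest) tour length."""
    q0, phi, beta, nbr, alpha, rho, na = position
    phi, beta, na = math.ceil(phi), math.ceil(beta), math.ceil(na)
    return min(run_acs(q0, phi, beta, nbr, alpha, rho, na)
               for _ in range(n_trials))
```

Taking the minimum over five trials smooths the stochastic noise of individual ACS runs before the value is used by the PSO.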
For each particle of the population, its initial velocity is set randomly. For half of the population, the initial position is set using predefined parameters, ensuring that for every parameter there is a particle containing a value covering the full range. The positions of the other half of the initial population are set randomly.
Parameters for the PSO have been set following [6,8]: c_1 = c_2 = 2, χ = 0.729; the inertia weight is set initially to 1 and gradually decreased from 1 to 0.1 (at each PSO iteration, w = 0.99w). The maximum number of iterations is set to 500 due to computing-time constraints (for 1000 PSO iterations, more than one day of computing time was necessary). The PSO-ACS pseudo-code is as follows:

Initialize the particles (positions and velocities)
Do
    For each particle, evaluate the fitness (five ACS trials with the rounded parameters)
    Update each particle's best position and select the global best
    Update velocities and positions
While the maximum number of iterations is not reached
Return the set of parameters related to the best tour length found and the tour length.
The algorithm is based on a PSO framework, where particles are initialized and iteratively move through the domain of the sets of parameters. The goal of the algorithm is, for a given instance, to compute the tour with the lowest length and the set of ACS parameters, among those in the PSO domain, that yields the best ACS performance. Those final parameters are related to the selected TSP instance.

RESULTS AND ANALYSIS
The algorithm was coded in C++ and has been run on six of the most widely used TSP instances. Computational results are given in four parts: PSO-ACS behavior, PSO-ACS optimum values obtained, best sets of parameters, and a comparison of the performance of the sets of parameters.
Computationally, each PSO-ACS run shows a clear convergence: when the optimum (as defined by the algorithm) number of ants and φ are nearly fixed, the computational time is also fixed (Fig. 5). In fewer than 100 iterations the algorithm computes an optimum for the integer parameters, and within 200 iterations there are only small differences between the optimum found and the particles' positions for the real parameters. Figure 6 shows the evolution of the algorithm over the first 100 iterations. For the average fitness of the swarm in a given iteration, there is a decreasing global tendency, and after iteration 75 the average fitness stays within a fixed range whose size is variable, as shown in (c) and (d). For the minimum value obtained by the swarm in a given iteration, computational results show that at the beginning there are increasing and decreasing phases, while the particles are exploring their local optima and also moving toward the global one; near iteration 100 the minimum is maintained, as in (a), or frequently revisited, as in (b). This fast convergence can be an advantage as well as a drawback, because it can lead to a fast, undesirable convergence.
We attribute this fast convergence to the PSO framework used and, mainly, to the method for evaluating a set of parameters: in a stochastic algorithm there is a probability that a bad set of parameters performs well; if all the particles move into this area, the number of iterations spent there increases, occasionally producing good solutions that cause the algorithm to remain in this non-optimal area. Table 3 shows the optimum sets of parameters found by running PSO-ACS on each of the selected instances.

CONCLUSION
In this research, a new approach is proposed for the convergence of associative neural memories using the Fuzzy Particle Swarm Optimization (FPSO) technique. The approach focuses on the neighborhood best and the global best to increase the speed of convergence. In addition, the proposed model overcomes the local-minima problem, which is a major drawback of the PSO technique.
The illustrated example suggests that our new approach can be used successfully as a real-time memory convergence technique for artificial neural networks.
Computational results suggest that there is no single optimal set of ACS parameters yielding the best-quality solutions on all TSP instances. Nevertheless, PSO-ACS has been able to find sets of ACS parameters that work well on a majority of instances, unlike others known so far.
The PSO-ACS algorithm works well across different instances because it adapts itself to the instance characteristics, but it has a high computational overhead. Future work will try to modify the algorithm framework to reduce this cost.
PSO-ACS also exhibits fast convergence that can lead to a bad set of parameters. This may be due to two reasons: first, the specific PSO framework used, which we expect to modify to obtain better results; second, the way the sets of parameters are evaluated, which may have to be reviewed, as a bad set of parameters can lead to undesired convergence.