Swarm-Based Feature Selection for Handwriting Identification

Problem statement: Handwriting identification is the study of identifying or verifying the writer of a given handwritten document. Since handwriting features are the cornerstone of the writer classification process, classifier accuracy is sensitive to how writers are scored on the features used. Approach: In this study, we introduced swarm intelligence as a feature weighting mechanism to differentiate between features of high and low importance in the identification process. The weights obtained from the swarm experiments were used to adjust the feature scores and then to identify the most important feature subset for writer classification. Results: The experimental results showed a significant influence of the feature weights on the handwriting identification process. Conclusion: This communication investigated the influence of feature importance on the handwriting identification process. Binary Particle Swarm Optimization (BPSO) is used as the feature selection method, and the Euclidean Distance (ED) is used as the evaluation function for the BPSO. The BPSO is trained on 956 words of the off-line IAM data (English handwriting) to learn the feature weights. Each word is represented by 29 statistical features.


INTRODUCTION
The ease of handwriting has made it the oldest communication medium. This common practice has given rise to two main research areas: handwriting recognition and handwriting identification. Handwriting recognition is the task of determining the meaning of a handwritten text by transforming a language represented in graphical marks into a symbolic representation. Handwriting identification is the study of identifying or verifying the writer of a given handwritten document. It is a relatively new area of handwriting research compared with handwriting recognition or signature verification. Handwriting features are characteristics useful for writer discrimination (Srihari et al., 2005). Analysis of allographs (characters) and allograph combinations (words) is the key to obtaining those discriminating features (Zhang and Srihari, 2003). Several studies have shown that handwritten elements are not equal in their discriminating power.
Handwritten words carry more individuality than handwritten allographs (Zhang and Srihari, 2003; Tomai et al., 2004). Pervouchine and Leedham (2006) showed that capital letters bear more individual information than simple characters such as "i" or "c". Wang and Ding (2004) used Principal Component Analysis (PCA) to reduce the dimensionality of the feature space for Chinese handwriting. Schlapbach et al. (2005) aimed to improve identification accuracy. They evaluated set-search feature selection algorithms (Kittler, 1978) and a genetic algorithm (GA) on the writer identification problem. They concluded that feature selection can significantly improve the writer identification rate while using a substantially smaller set of features. Writer-relevant features based on artificial immune systems were presented by Muda and Shamsuddin (2005). Chapran (2006) described a writer identification system that can be used in a resource-constrained embedded environment. Such an environment requires minimal computational cost and minimal identification error, so a feature selection algorithm based on likeness coefficients was proposed. A GA was used by Gazzah and Amara (2006) to select feature subsets for writer identification in Arabic script. Particle Swarm Optimization (PSO) was used by Das and Dulger (2007) to select a reliable signature verification rate (fine-tuning the balance between acceptance and rejection rates). Pervouchine (2006) showed that not every document examiner feature can easily be represented as a computational feature, and vice versa. Writer-invariant features have been studied by Bensefia et al. (2002) and Muda et al. (2008).
Since handwriting features are the cornerstone of the identification process, classifier accuracy is sensitive to how writers are scored on these features. This communication investigates the effect of feature selection and feature weighting on the handwriting identification problem. PSO (Kennedy and Eberhart, 1995) is able to perform this role and learn the feature weights. It works at both a local level (particle level) and a global level (swarm level): many solutions to the problem are proposed, and the best among them is selected. PSO has not yet been tested for handwriting identification. However, it has shown high performance in related fields such as pattern classification (Tu et al., 2006; Huang and Kechadi, 2006), signature verification (Das and Dulger, 2007) and handwritten digit recognition (Sahel Ba-Karait and Shamsuddin, 2008).

Particle Swarm Optimization (PSO):
Swarm Intelligence (SI) is the collective intelligence arising from the behavior of (unsophisticated) individuals that interact locally with one another and with their environment, causing coherent functional global patterns to emerge (Ahmed, 2004). Two techniques are the primary computational parts of swarm intelligence. Particle Swarm Optimization (PSO) is inspired by the social behavior of bird flocking or fish schooling. Ant Colony Optimization (ACO) is inspired by the behavior of ants.
Particle Swarm Optimization (PSO) is a population-based stochastic search algorithm. Like other evolutionary algorithms (e.g., the genetic algorithm), PSO searches using a population (called a swarm) of individuals (called particles) that are updated from iteration to iteration. Compared with a GA, PSO is easier to implement. It has no evolution operators such as crossover and mutation, and therefore has fewer parameters to adjust.
PSO was originally developed by Kennedy and Eberhart (1995). Its velocity update is made up of three parts: a momentum part, a cognitive part and a social part. The momentum part consists of the $V_{id}$ term, which keeps the particle moving without getting trapped. The cognitive part involves $p_{id}$: each particle keeps track of its best location, which is associated with the best solution the particle has achieved so far. The best position ever encountered by any particle in the swarm is also communicated to all particles. This is the social part: the $p_{gd}$ term represents the information shared among all particles.
Generally, PSO starts by distributing the particles randomly over the search space. Each particle represents a potential solution to the problem. It flies through the search space at a velocity that is dynamically adjusted according to two factors: its own experience (pBest) and the experience of all other particles (gBest). At each iteration, every particle updates its velocity and its position using Eq. 1 and 2, respectively. After several iterations, the optimized (optimal or near-optimal) solution is found. Figure 1 summarizes the working mechanism of PSO-ED.

$$V_{id}(t+1) = w\,V_{id}(t) + c_1 r_1 \left(p_{id}(t) - x_{id}(t)\right) + c_2 r_2 \left(p_{gd}(t) - x_{id}(t)\right) \qquad (1)$$

Where:
$V_{id}(t)$ = the velocity of particle i at time t along dimension d in the search space
$p_{id}(t)$ = the best position in which the particle previously obtained its highest fitness value, called pBest
$x_{id}(t)$ = the current position of particle i in the search space
$r_1$ and $r_2$ = random numbers in the range (0, 1)
$p_{gd}(t)$ = the overall best position in which any particle obtained the best fitness value, called gBest
$c_1$ and $c_2$ = acceleration parameters
$w$ = the inertia weight, whose value decreases linearly over time from 0.9 to 0.4

$$x_{id}(t+1) = x_{id}(t) + V_{id}(t+1) \qquad (2)$$

Where:
$x_{id}(t+1)$ = the new position to which particle i must move
$x_{id}(t)$ = the current position of particle i
$V_{id}(t+1)$ = the new velocity of particle i computed by Eq. 1, which mainly determines the new position of the particle

The velocity of each particle must stay within the range $[V_{min}, V_{max}]$.

PSO was initially set up as an optimization technique for real-number spaces. However, many optimization problems occur in spaces featuring discrete, qualitative distinctions between variables and between levels of variables. Binary PSO (Kennedy and Eberhart, 1997) therefore extends continuous PSO to problems in which the particle position is represented as a bit string rather than as real numbers.
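As a rough illustration of Eq. 1 and 2, the following Python sketch performs one continuous PSO step for a single particle. The inertia weight decreases linearly from 0.9 to 0.4, and the velocity is clamped to $[V_{min}, V_{max}]$. The defaults $c_1 = c_2 = 2$ and $V_{min} = -4$, $V_{max} = 4$ are taken from the training procedure of this study. The function names and the exact form of the linear schedule are assumptions for illustration, not code from this study.

```python
import random

def inertia_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Inertia weight decreased linearly from w_start (t = 0) to w_end (t = t_max)."""
    return w_start - (w_start - w_end) * t / t_max

def update_particle(x, v, pbest, gbest, w, c1=2.0, c2=2.0,
                    v_min=-4.0, v_max=4.0, rng=random):
    """One continuous PSO step for one particle: Eq. 1 (velocity), then Eq. 2 (position)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()   # uniform in [0, 1)
        vd = (w * v[d]                         # momentum part
              + c1 * r1 * (pbest[d] - x[d])    # cognitive part
              + c2 * r2 * (gbest[d] - x[d]))   # social part
        vd = max(v_min, min(v_max, vd))        # keep velocity in [V_min, V_max]
        new_v.append(vd)
        new_x.append(x[d] + vd)                # Eq. 2
    return new_x, new_v
```

A particle that already sits at both pBest and gBest with zero velocity stays where it is. A particle far from them moves at most $V_{max}$ per dimension in one step.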
In BPSO, the velocity becomes a probability through the sigmoid function of Eq. 3, which maps velocity values into the range (0, 1):

$$S\left(V_{id}\right) = \frac{1}{1 + e^{-V_{id}}} \qquad (3)$$

Each bit is then set to 1 if a uniformly drawn random number in (0, 1) is less than $S(V_{id})$, and to 0 otherwise.

Handwriting features: The features used in this paper are 29 statistical features (moments and word measurements). The invariant moment features are given by Eq. 4-16. The word measurement features are word area, length, height, upper zone height, middle zone height, lower zone height and their relationships (e.g., the aspect ratio of word length to its height). They are given by Eq. 17-22.

Moment features:
Moments have been used extensively in computer vision, pattern recognition, handwriting recognition and writer identification. Geometric moments have proved most useful for capturing aspects of handwriting shape. It was determined that features corresponding to human perception of word shape can be extracted from two- and three-dimensional moments (Liu et al., 1995). The geometric moment of order (p+q) of a digital image of size M×N is computed by Eq. 4:

$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^{p} y^{q} f(x, y) \qquad (4)$$

where p, q = 0, 1, 2, …, n; x, y are the image coordinates; f(x, y) is the pixel value at (x, y); and $m_{pq}$ is the geometric moment of order (p+q). The central moments are computed by Eq. 5:

$$\mu_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y) \qquad (5)$$

where p, q = 0, 1, 2, …, n; $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$ are the means of the x and y coordinates; and $\mu_{pq}$ is the central moment of order (p+q).
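The following minimal Python sketch evaluates Eq. 4 and Eq. 5 on a binary word image stored as a list of rows, where 1 marks a black pixel. The coordinate convention (x = column index, y = row index, both starting at 1) and the function names are assumptions.

```python
def geometric_moment(img, p, q):
    """Eq. 4: m_pq = sum over pixels of x^p * y^q * f(x, y), with 1-based x (column) and y (row)."""
    total = 0.0
    for y, row in enumerate(img, start=1):
        for x, f in enumerate(row, start=1):
            if f:
                total += (x ** p) * (y ** q) * f
    return total

def central_moment(img, p, q):
    """Eq. 5: moment about the centroid (x_bar, y_bar) = (m10/m00, m01/m00).

    Assumes the image contains at least one black pixel (m00 > 0).
    """
    m00 = geometric_moment(img, 0, 0)
    x_bar = geometric_moment(img, 1, 0) / m00
    y_bar = geometric_moment(img, 0, 1) / m00
    total = 0.0
    for y, row in enumerate(img, start=1):
        for x, f in enumerate(row, start=1):
            if f:
                total += ((x - x_bar) ** p) * ((y - y_bar) ** q) * f
    return total
```

For a plus-shaped 3×3 image, the centroid is at (2, 2), $\mu_{20} = \mu_{02} = 2$ and $\mu_{11} = 0$.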
The following features are extracted from the second- and third-order moments, where p, q = 0, 1, 2, …, n; $\lambda_1$, $\lambda_2$ are the inertial moments; and $\mu_{pq}$ is the central moment of order (p+q).
The orientation feature is computed by Eq. 9, the inertial ratio feature by Eq. 10 and the aspect ratio feature by Eq. 11. The horizontal skewness feature is computed by Eq.
Physically, the features $f_1, \ldots, f_8$ measure the variance and skewness of the black pixels along the x and y coordinates. They represent the extent of an object in the horizontal and vertical directions.
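The paper's own equations for these second- and third-order features are not reproduced in the text. The sketch below therefore uses common textbook definitions instead:

- the inertial moments $\lambda_1 \ge \lambda_2$ are taken as the eigenvalues of the second-order central-moment matrix;
- orientation is the principal-axis angle;
- the inertial ratio is taken as $\lambda_2/\lambda_1$;
- skewness is a standardized third-order moment such as $\mu_{30}/\mu_{20}^{3/2}$.

All of these definitions, and the function names, are assumptions that may differ from the equations used in this study.

```python
import math

def second_order_features(mu20, mu02, mu11):
    """Inertial moments, orientation and inertial ratio from second-order central moments.

    Standard image-moment definitions (assumed, not taken from this study's equations).
    """
    root = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    lam1 = (mu20 + mu02 + root) / 2                         # major inertial moment
    lam2 = (mu20 + mu02 - root) / 2                         # minor inertial moment
    orientation = 0.5 * math.atan2(2 * mu11, mu20 - mu02)  # principal-axis angle, radians
    inertial_ratio = lam2 / lam1 if lam1 > 0 else 0.0       # assumed ratio lam2 / lam1
    return lam1, lam2, orientation, inertial_ratio

def skewness(mu3, mu2):
    """Standardized third-order moment along one axis, e.g. mu30 / mu20**1.5 (assumed form)."""
    return mu3 / mu2 ** 1.5 if mu2 > 0 else 0.0
```

For example, $\mu_{20} = 4$, $\mu_{02} = 1$, $\mu_{11} = 0$ gives $\lambda_1 = 4$, $\lambda_2 = 1$, an orientation of 0 (horizontal principal axis) and an inertial ratio of 0.25.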

Word's measurements features:
This study extracted 21 statistical features from each word. They are based on measurements such as area, length, height, upper zone height, middle zone height, lower zone height and their relationships (e.g., the aspect ratio of word length to its height). The following subsections describe how these features were computed.

Area:
The area of the word image is found by counting the black pixels in the image, Eq. 17:

$$f_9 = \sum_{x=1}^{M} \sum_{y=1}^{N} f(x, y) \qquad (17)$$

where f(x, y) represents the black pixel at coordinates (x, y), and M and N are the image dimensions.
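Eq. 17 reduces to counting black pixels. The minimal sketch below assumes a binary image stored as a list of rows, with 1 marking a black pixel. The function name is hypothetical.

```python
def word_area(img):
    """Eq. 17: number of black pixels (value 1) in a binary word image."""
    return sum(sum(1 for v in row if v) for row in img)
```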

Length:
The length of the word is determined by finding the column numbers of the first and last black pixels along the x axis. The column number of the first pixel is then subtracted from that of the last pixel, and the result is taken as the word length:

$$\text{Length} = x_{max} - x_{min} \qquad (18)$$

Where: $x_{min}$ = the column number where the first pixel is located, and $x_{max}$ = the column number where the last pixel is located.

Height:
The height of the word image is found in the same way as its length, but using row numbers instead of column numbers. The height is obtained by subtracting the row number of the first pixel from the row number of the last pixel in the word image:

$$\text{Height} = y_{max} - y_{min} \qquad (19)$$

where $y_{min}$ and $y_{max}$ are the row numbers of the first and last pixels along the y axis.
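The length and height described above are the horizontal and vertical extents of the black pixels. The sketch below makes the same binary-image assumption. It uses 0-based indices, which does not change the differences $x_{max} - x_{min}$ and $y_{max} - y_{min}$. The function name is hypothetical.

```python
def word_length_height(img):
    """Return (x_max - x_min, y_max - y_min) over the black pixels of a binary image."""
    xs = [x for row in img for x, v in enumerate(row) if v]   # columns holding black pixels
    ys = [y for y, row in enumerate(img) if any(row)]         # rows holding black pixels
    if not xs:
        return 0, 0                                           # blank image: no extent
    return max(xs) - min(xs), max(ys) - min(ys)
```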
Upper zone, middle zone, lower zone: As depicted in Fig. 2, English writing has three zones: the upper zone, the middle zone and the lower zone. To find the height of each zone, we first find the position of the middle zone in the word image and then use Eq. 20-22 to calculate the zone heights. In this study, the position of the middle zone is determined using the horizontal projection.

After all the features have been computed, the feature vector is normalized. This is needed because the features can have different scales, although they refer to comparable objects.
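The text does not specify how the middle zone is located from the horizontal projection, nor which normalization is applied. The sketch below therefore makes two illustrative assumptions:

- a hypothetical rule for the middle zone: the contiguous band of rows around the densest row, whose black-pixel count is at least `alpha` times the maximum (default `alpha = 0.5`);
- min-max normalization per feature.

Both choices, and the function names, are assumptions.

```python
def zone_heights(img, alpha=0.5):
    """Estimate (upper, middle, lower) zone heights, in rows, from the horizontal projection.

    Hypothetical rule: the middle zone is the contiguous band of rows around the
    densest row whose black-pixel count is at least alpha * (maximum count).
    """
    proj = [sum(1 for v in row if v) for row in img]   # horizontal projection profile
    rows = [y for y, c in enumerate(proj) if c > 0]
    if not rows:
        return 0, 0, 0
    y_min, y_max = rows[0], rows[-1]                   # first and last rows of the word
    peak = max(range(len(proj)), key=lambda y: proj[y])
    threshold = alpha * proj[peak]
    top = peak
    while top > 0 and proj[top - 1] >= threshold:      # grow the band upwards
        top -= 1
    bottom = peak
    while bottom + 1 < len(proj) and proj[bottom + 1] >= threshold:  # grow downwards
        bottom += 1
    return top - y_min, bottom - top + 1, y_max - bottom

def min_max_normalize(vectors):
    """Scale each feature (column) to [0, 1] across samples; constant features map to 0."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(vec, lows, highs)]
            for vec in vectors]
```

Zone heights are returned as row counts, so upper + middle + lower equals the number of rows spanned by the word.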

Particle Position Representation:
BPSO is obtained by replacing the position update formula (Eq. 2) with Eq. 3, while leaving the velocity update formula (Eq. 1) unchanged. The velocity becomes a probability, and the value of each bit is obtained from Eq. 3. In our case, we use binary PSO, in which the particle position is coded as a binary bit string (Fig. 3). Each bit can take only the value one or zero, which represents the selection state of one feature. When the bit value is 1, the corresponding feature is selected; when it is 0, the corresponding feature is not selected.
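Below is a minimal Python sketch of the BPSO position update of Eq. 3 and of decoding a particle's bit string into selected features. Under the standard binary PSO rule, bit d becomes 1 when a uniform random number falls below $S(V_{id})$. The function names and feature labels are hypothetical.

```python
import math
import random

def sigmoid(v):
    """Eq. 3: map a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def binary_position(velocity, rng=random):
    """BPSO position update: bit d is 1 with probability sigmoid(v_d), else 0."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]

def selected_features(bits, names):
    """Decode a particle's bit string: a 1 selects the corresponding feature."""
    return [name for bit, name in zip(bits, names) if bit == 1]
```

With velocities clamped to [-4, 4], each bit keeps a probability between roughly 0.018 and 0.982 of being 1. The search therefore never fixes a feature's selection state with certainty.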
Evaluation function: Generally, a feature selection model (e.g., PSO) consists of a search mechanism and a fitness function. The search mechanism finds feature subsets according to a selection criterion, while the fitness function scores the candidate feature subsets. The choice of fitness function depends on the application; in handwriting identification, a classifier is usually used as the fitness function. In this study, we use the Euclidean Distance (ED), Eq. 23, as the evaluation function for BPSO.
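The sketch below uses Eq. 23 to score a questioned document against reference documents. The comparison can optionally be restricted to the features selected by a BPSO bit string. The nearest-reference decision rule and the dictionary layout of the references (writer label mapped to feature vector) are assumptions made for illustration.

```python
import math

def euclidean_distance(r, q, mask=None):
    """Eq. 23: ED between reference vector r and questioned vector q.

    If a bit mask is given, only features whose bit is 1 contribute.
    """
    if mask is None:
        mask = [1] * len(r)
    return math.sqrt(sum((ri - qi) ** 2 for ri, qi, m in zip(r, q, mask) if m))

def identify_writer(questioned, references, mask=None):
    """Assumed nearest-reference rule: return the writer with the smallest ED."""
    return min(references,
               key=lambda writer: euclidean_distance(references[writer], questioned, mask))
```

Consider `references = {"A": [0, 0], "B": [5, 10]}` and the questioned vector `[0, 10]`. Using all features identifies writer B, while keeping only the first feature (`mask=[1, 0]`) identifies writer A. This shows how feature selection can change the identification result.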
In the BPSO architecture, the fitness function is responsible for calculating the value of each particle. The list of particle values obtained in each run constitutes the weight vector. Each feature weight corresponds to one bit in the particle position, and that bit contains either one or zero. The score of a feature is calculated by summing the feature weights that correspond to bits containing ones; weights corresponding to bits containing zeros are excluded.

Based on the resulting feature scores, handwriting identification can be carried out with two methods. The first is the feature selection method: the top n features in the ranked weight list are selected as the optimal feature subset. The second is the feature weight method: all the features are used in the identification process. This differs from the usual process in that each feature is represented according to its importance, that is, its weight. The Euclidean distance between a reference document and a questioned document is computed by Eq. 23:

$$ED = \sqrt{\sum_{i=1}^{n} \left(r_i - q_i\right)^2} \qquad (23)$$

Where: n = the number of features, $r_i$ = the i-th feature of the reference document, and $q_i$ = the i-th feature of the questioned document.

The Training Procedure: We set the BPSO parameters as follows:

- number of particles: 5
- $V_{max} = 4$, $V_{min} = -4$
- $c_1 = 2$, $c_2 = 2$
- w decreases from 0.9 to 0.4
- maximum number of iterations: 500
- number of runs: 10

In each iteration, each particle selects a specific number of features. Based on the selected features, a writer identification process is created and evaluated using the fitness function of Eq. 23. BPSO works as follows: (1) in the first iteration, the evaluation value of each identification process is taken as the pBest of the corresponding particle, and the best of these five evaluation values is taken as gBest; (2) from the second iteration onward, the new pBest and gBest are selected by comparing the new evaluation values with the previous pBests; (3) at the end of each run, the position of the particle with the gBest value is taken as the vector of the best selected features.
In step 2, the comparison is done as follows: (a) if a new evaluation value is better than the current pBest, it becomes the new pBest; (b) if the pBest of any particle changes, the new pBest is compared with the current gBest, and the better of the two becomes the new gBest. Finally, the feature weights for each writer are calculated as the average of the vectors obtained in the runs. The final feature weights are then calculated over the weight vectors of all writers in the data collection.
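The training procedure above can be sketched in Python as follows. The sketch covers steps 1 to 3, the pBest/gBest comparison rules and averaging over runs, and uses the parameter values given in the text as defaults. Several parts are assumptions:

- the fitness function is left to the caller and is assumed to return a value where higher is better (for a distance-based score, pass its negative);
- positions and velocities are initialized randomly;
- the function names are hypothetical.

```python
import math
import random

def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def bpso_select(fitness, n_features, n_particles=5, n_iter=500, c1=2.0, c2=2.0,
                v_min=-4.0, v_max=4.0, w_start=0.9, w_end=0.4, rng=None):
    """One BPSO run. Returns (gbest_bits, gbest_fitness); higher fitness is assumed better."""
    if rng is None:
        rng = random.Random()
    # Assumed initialization: random bits and random velocities within the clamp range.
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    V = [[rng.uniform(v_min, v_max) for _ in range(n_features)] for _ in range(n_particles)]
    # Step (1): first-iteration evaluations become pBest; the best of them becomes gBest.
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for t in range(1, n_iter):
        # Inertia weight decreases linearly, reaching w_end at the last iteration.
        w = w_start - (w_start - w_end) * t / (n_iter - 1)
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * V[i][d]
                     + c1 * rng.random() * (pbest[i][d] - X[i][d])
                     + c2 * rng.random() * (gbest[d] - X[i][d]))         # Eq. 1
                V[i][d] = max(v_min, min(v_max, v))
                X[i][d] = 1 if rng.random() < _sigmoid(V[i][d]) else 0  # Eq. 3
            f = fitness(X[i])
            # Step (2a): a better evaluation replaces the particle's pBest.
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = X[i][:], f
                # Step (2b): a changed pBest is compared with gBest.
                if f > gbest_fit:
                    gbest, gbest_fit = X[i][:], f
    # Step (3): the gBest position is the vector of selected features for this run.
    return gbest, gbest_fit

def feature_weights(fitness, n_features, n_runs=10, **kwargs):
    """Average the gBest bit vectors of several runs (one writer) into feature weights."""
    runs = [bpso_select(fitness, n_features, **kwargs)[0] for _ in range(n_runs)]
    return [sum(col) / n_runs for col in zip(*runs)]
```

For the per-writer averaging described in the text, `feature_weights` would be called once per writer with that writer's fitness function. The resulting vectors would then be averaged again over all writers.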

RESULTS
The main purpose of this experiment is to investigate the effectiveness of feature weights in the handwriting identification process. Figure 3 shows the final weights of the features used in this study. The data collection consists of 956 English handwritten words. For each writer, each feature was first assigned a weight equal to the average of its selection values (1 if selected, 0 otherwise) over the ten runs. The final weight of the feature is then the average of these per-writer weights over the total number of writers. The results show that the features differ in their importance for the identification process (Fig. 4). For example, a quarter of the word measurement features received high importance, whereas most of the moment features received medium importance.

Based on these results, handwriting identification can be carried out with two methods. The first is identification based on feature selection: the top n high-weight features are selected as the optimal feature subset. The second is identification based on feature weights: all the extracted features are used after each feature is multiplied by its weight. This differs from the usual process in that each feature is represented according to its importance.
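The two identification methods can be illustrated with a short sketch. Method 1 keeps the indices of the n highest-weighted features. Method 2 multiplies every feature by its weight before the distance is computed. The function names and the tie-breaking rule (lower index first) are assumptions.

```python
def top_n_features(weights, n):
    """Method 1: indices of the n highest-weighted features (ties broken by lower index)."""
    order = sorted(range(len(weights)), key=lambda i: (-weights[i], i))
    return sorted(order[:n])

def weight_features(vector, weights):
    """Method 2: keep every feature but scale it by its learned weight."""
    return [v * w for v, w in zip(vector, weights)]
```

The indices returned by `top_n_features` can be turned into a bit mask for a masked distance. The output of `weight_features` can be fed directly into an unmasked Euclidean distance.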

DISCUSSION
Among all the phases of handwriting identification, feature selection addresses several optimization criteria. These include simplifying the feature extraction task, reducing identification system complexity, reducing running time and improving classification accuracy. The goal of feature selection is to obtain the significant features by removing redundant ones. This communication investigated the effect of feature selection and feature weighting on the handwriting identification problem. The weights suggested by PSO promoted the scores of the highly important features, giving each handwriting feature the score it deserves. In future work, we will apply the feature weights obtained in this study to handwriting identification.

CONCLUSION
In this study, we have investigated the influence of feature importance on the handwriting identification process. Binary particle swarm optimization was used as the feature selection method during training.
The Euclidean Distance (ED), Eq. 23, was used as the evaluation function for the BPSO. We used 956 words of English handwriting for training. The feature weights obtained by the training process showed a significant influence on the identification process. The final average weights will be used as a mechanism to distinguish between high-importance and low-importance features. Future work will apply the feature weights obtained in this study to handwriting identification.