Broken Character Image Restoration Using Genetic Snake Algorithm: Deep Concavity Problem

Active contours, also known as snakes, have become familiar and widely used tools for image segmentation and for the restoration of historical documents over the last few decades. The Gradient Vector Flow (GVF) snake succeeds in converging to boundary concavities, which is the main drawback of traditional snakes. However, the deep concavity problem remains an obstacle for the GVF snake when restoring broken characters in historical documents. In this study we propose an algorithm that combines a genetic algorithm with the GVF snake to optimize the snake points so that they reach the correct positions on deep concavity boundaries. We also add a divergence factor as a third force to improve the restoration and recognition results. The experimental results show that the proposed algorithm has a greater capture capability than GVF alone.


Introduction
Old documents contain important information and are therefore converted into digital images to protect that information from degradation. Most of these documents suffer from degradation caused by humidity, storage conditions, ink, and water washing. When one or more characters in these documents are broken, historical or security information may be lost (Gonzalez and Woods, 2002).
Snakes were first introduced by Kass et al.: an initial closed curve, represented as a sequence of discrete points, converges to the desired edge by minimizing an energy functional. Active contours (snakes) are efficient methods used in image segmentation to detect object boundaries. First, the object location is estimated and the snake is placed around the object boundaries. Driven by energy minimization, the snake is pushed toward the object boundaries by an external force, while an internal force keeps the curve from bending excessively so that it converges. Unlike other edge detection techniques, snakes have the advantage of producing a continuous segmentation (Kass et al., 1988a).
Drawbacks of traditional snakes, such as sensitivity to the initialization of the snake curve and poor convergence to concave areas, motivated new techniques able to reach concave object boundaries. Gradient Vector Flow (GVF), introduced by Xu and Prince (1998), overcomes the limited convergence to concave boundaries. Although GVF has a large capture range and can converge to concave boundaries, it struggles to achieve accurate segmentation of the deep concavities of complex shapes. Accordingly, we present a Genetic-Snake algorithm that optimizes the GVF snake to enhance its capability to capture deep concavity boundaries.
The purpose of this paper is to overcome the deep concavity problem by using a genetic algorithm to optimize the positions of GVF snake points on deep concavity boundaries. The paper is organized as follows: Section 2 reviews related work, Section 3 introduces active contours, Section 4 introduces genetic algorithms, Section 5 describes genetic active contours, Section 6 discusses the experimental results and Section 7 presents the conclusion.

Related Works
Few studies have addressed the recognition of broken characters. Bose and Kuo (1994) introduced a Hidden Markov Model (HMM) method for recognizing degraded and touching text. Yuasa et al. (1996) presented Hopfield neural networks to restore degraded characters. Other work addressed the recognition of degraded characters in Indian-language script documents (Chaudhuri et al., 2001). Allier and Emptoz presented an active contour approach to restore degraded characters (Bénédicte and Emptoz, 2002). Pilevar and Pilevar (2011) presented a chain method and template matching to recognize broken and touching Persian characters. Sumetphong and Tangwongsan (2012) introduced a set partitioning technique to recognize broken Thai characters. Mosa and Nasrudin (2015) used a balloon algorithm with triangle steps to restore broken character images.

Active Contours (Snakes)
Active contours, or snakes, are deformable models that move under the influence of internal and external forces (Kass et al., 1988b). An active contour is defined by an energy functional; by minimizing this functional, the contour converges to the object boundaries and the solution is obtained.
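The energy functional of a parametric snake v(s) = (x(s), y(s)), s ∈ [0, 1], can be written as:

$$E_{\text{snake}} = \int_0^1 \left[ E_{\text{int}}(v(s)) + E_{\text{ext}}(v(s)) \right] ds \tag{1}$$

The internal energy is defined as:

$$E_{\text{int}}(v(s)) = \frac{1}{2}\left( \alpha(s)\,\lvert v'(s) \rvert^2 + \beta(s)\,\lvert v''(s) \rvert^2 \right) \tag{2}$$

The internal energy is composed of a first-order term controlled by α(s) and a second-order term controlled by β(s). The parameter α(s) controls the tension of the contour and β(s) controls its rigidity.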
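Equation 2 can be discretized for a closed contour of N snaxels with finite differences. The sketch below is a minimal illustration, assuming equally spaced snaxels, constant α and β, and wrap-around indexing; the default coefficients are the values reported in the experiments, and the function name is hypothetical.

```python
import numpy as np


def internal_energy(x, y, alpha=0.4, beta=0.3):
    """Discrete version of Equation 2 summed over a closed contour (sketch).

    Assumes constant alpha (tension) and beta (rigidity) and equally
    spaced snaxels; np.roll closes the contour by wrapping the indices.
    """
    v = np.stack([np.asarray(x, float), np.asarray(y, float)], axis=1)
    nxt, prev = np.roll(v, -1, axis=0), np.roll(v, 1, axis=0)
    d1 = nxt - v               # first difference, approximates v'(s)
    d2 = nxt - 2 * v + prev    # second difference, approximates v''(s)
    tension = np.sum(d1 ** 2, axis=1)
    rigidity = np.sum(d2 ** 2, axis=1)
    return 0.5 * float(np.sum(alpha * tension + beta * rigidity))


# Unit square: the tension terms sum to 4 and the rigidity terms sum to 8.
print(internal_energy([0, 1, 1, 0], [0, 0, 1, 1]))
```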
The external energy is defined as:

$$E_{\text{ext}}(x, y) = -\gamma \left\lvert \nabla \left[ G_\sigma(x, y) * I(x, y) \right] \right\rvert^2 \tag{3}$$

where G_σ(x, y) * I(x, y) denotes the image I convolved with a Gaussian filter of standard deviation σ, ∇ is the gradient operator and γ is a weight associated with the image energy. Traditional snakes have two drawbacks: the initial curve must be placed close to the object, and they fail to converge to boundary concavities. Xu and Prince (1998) introduced the GVF snake to converge to boundary concavities.
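Equation 3 can be evaluated on a pixel grid as sketched below. This NumPy-only illustration assumes a separable Gaussian kernel truncated at ±3σ, zero padding at the borders (which creates weak artificial edges along the image border), and np.gradient for ∇; the helper names are hypothetical, and γ = 0.2 is the value reported in the experiments.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at +/- 3 sigma, normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth(image, sigma):
    """Separable Gaussian blur (rows, then columns) with zero padding.

    Assumes both image dimensions are larger than the kernel length.
    """
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image.astype(float))
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)


def external_energy(image, sigma=1.0, gamma=0.2):
    """E_ext = -gamma * |grad(G_sigma * I)|^2, evaluated at every pixel."""
    g = smooth(image, sigma)
    gy, gx = np.gradient(g)          # axis 0 = rows (y), axis 1 = columns (x)
    return -gamma * (gx ** 2 + gy ** 2)


# Vertical step edge between columns 9 and 10: E_ext is lowest along the edge.
img = np.zeros((20, 20))
img[:, 10:] = 1.0
E = external_energy(img)
```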
The difference between the GVF snake and the traditional snake model lies in the external energy.
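For the GVF snake, the external force is replaced by the gradient vector flow field:

$$V(x, y) = \left( u(x, y),\; v(x, y) \right) \tag{4}$$

V(x, y) is obtained by minimizing the following energy functional:

$$\varepsilon = \iint \mu \left( u_x^2 + u_y^2 + v_x^2 + v_y^2 \right) + \lvert \nabla f \rvert^2 \, \lvert V - \nabla f \rvert^2 \, dx\, dy \tag{5}$$

where ∇f is the gradient of the edge map f and μ is a regularization parameter. To minimize the energy in Equation 5, V must satisfy the following Euler equations:

$$\mu \nabla^2 u - (u - f_x)\left(f_x^2 + f_y^2\right) = 0 \tag{6}$$

$$\mu \nabla^2 v - (v - f_y)\left(f_x^2 + f_y^2\right) = 0 \tag{7}$$

where ∇² is the Laplacian operator. In homogeneous regions f_x and f_y are both zero, so the field there is determined by diffusion alone. Although GVF converges well to objects with concave boundaries, it has difficulty with complex shapes that have deep concavity boundaries.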
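A common way to solve the Euler equations (6) and (7) is to treat u and v as functions of time and iterate the corresponding diffusion equations to a steady state. The sketch below uses a simple explicit scheme, assuming a normalized edge map, unit pixel spacing, replicated borders for the Laplacian, and a time step small enough for stability; the parameter values and function names are illustrative, not the ones used in this paper.

```python
import numpy as np


def laplacian(a):
    """5-point Laplacian with replicated (edge) borders."""
    p = np.pad(a, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * a


def gvf(edge_map, mu=0.2, iterations=200, dt=0.5):
    """Iterate Equations 6 and 7 with an explicit scheme; returns (u, v)."""
    f = np.asarray(edge_map, float)
    f = (f - f.min()) / (f.max() - f.min() + 1e-12)   # normalize f to [0, 1]
    fy, fx = np.gradient(f)                            # rows = y, columns = x
    mag2 = fx ** 2 + fy ** 2                           # |grad f|^2
    u, v = fx.copy(), fy.copy()                        # initialize with grad f
    for _ in range(iterations):
        u = u + dt * (mu * laplacian(u) - (u - fx) * mag2)
        v = v + dt * (mu * laplacian(v) - (v - fy) * mag2)
    return u, v


# Edge map: the outline of a square. Far from the outline, where grad f is
# zero, the diffused field still points toward the square.
edges = np.zeros((40, 40))
edges[15:26, 15] = 1
edges[15:26, 25] = 1
edges[15, 15:26] = 1
edges[25, 15:26] = 1
u, v = gvf(edges)
```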
We propose an energy minimization procedure that combines the GVF snake with a genetic algorithm: the genetic algorithm optimizes the snake points to minimize the snake energy and thereby overcome the difficulties associated with deep concavity areas.

Genetic Algorithm
The genetic algorithm is an adaptive and efficient optimization method based on the evolutionary ideas of natural selection and genetics. A genetic algorithm can find a near-optimal or optimal solution. It creates a new population of chromosomes from the current population using natural selection together with crossover and mutation operators. Each chromosome represents a candidate solution to the problem, and the best solutions in the search space are chosen according to the assigned selection method. Each chromosome consists of "genes", each gene being an instance of a particular "allele" (0 or 1) (Sivanandam and Deepa, 2007). Encoding is the process of representing individual genes; it can use bits, numbers, trees, arrays, lists or other objects, and the choice depends mainly on the problem being solved. Once the encoding scheme and population size are fixed, each chromosome (solution) is evaluated using a fitness function. After every individual in the population has been assigned a fitness value, three genetic operators are applied to produce the next generation while avoiding premature convergence (Mitchell, 1996):
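To make the encoding and evaluation steps concrete, the sketch below represents each chromosome as a binary array of genes (alleles 0 or 1), decodes a group of genes into an integer, and evaluates a whole population with a fitness function. The toy "OneMax" fitness (the number of 1 alleles) is a stand-in chosen only for illustration; it is not the snake fitness used later in this paper, and the function names are hypothetical.

```python
import numpy as np


def decode_bits(genes):
    """Read a 0/1 gene array (most significant bit first) as an integer."""
    value = 0
    for g in genes:
        value = (value << 1) | int(g)
    return value


def evaluate(population, fitness):
    """Apply the fitness function to every chromosome (one per row)."""
    return np.array([fitness(chromosome) for chromosome in population])


rng = np.random.default_rng(0)
population = rng.integers(0, 2, size=(6, 8))              # 6 chromosomes of 8 genes
scores = evaluate(population, lambda ch: int(ch.sum()))   # toy OneMax fitness
print(population[0], decode_bits(population[0]), scores)
```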

Selection Operator
This operator is applied to the population to select the individuals with the highest fitness values to produce a new generation. Several selection methods exist for choosing the best members of the population.
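As one concrete selection method, stochastic universal sampling (SUS), which this paper uses in its selection stage, places n equally spaced pointers over the cumulative fitness so that each individual is chosen roughly in proportion to its fitness. The minimal sketch below assumes non-negative fitness values with a positive sum; the function name is hypothetical, and the final "keep the two fittest of four" step mirrors the procedure described in the Materials and Methods section.

```python
import numpy as np


def sus_select(fitness, n, rng):
    """Return indices of n individuals chosen by stochastic universal sampling.

    Assumes all fitness values are non-negative and their sum is positive.
    """
    cumulative = np.cumsum(np.asarray(fitness, dtype=float))
    step = cumulative[-1] / n                   # distance between pointers
    start = rng.uniform(0.0, step)              # one random offset for all pointers
    pointers = start + step * np.arange(n)
    # first index whose cumulative fitness exceeds each pointer
    idx = np.searchsorted(cumulative, pointers, side="right")
    return np.minimum(idx, len(cumulative) - 1)  # guard against rounding at the end


rng = np.random.default_rng(1)
fitness = np.array([4.0, 1.0, 3.0, 0.0, 2.0])
chosen = sus_select(fitness, 4, rng)                 # four candidates
best_two = chosen[np.argsort(fitness[chosen])[-2:]]  # two fittest go to crossover
```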

Crossover Operator
After the two best individuals are selected for reproduction, the crossover operator produces two offspring from these parents. Three methods can be used to perform crossover: one-point crossover, two-point crossover and uniform crossover.
The crossover probability (PC) is a parameter that determines how often crossover is performed. When crossover does not occur (with probability 1 - PC), the parents are copied unchanged to the next generation. When it occurs, the offspring are built from parts of the parent chromosomes using the chosen crossover method. Crossover is performed so that the new chromosomes (offspring) can combine good parts of their parents and improve on them.
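The sketch below shows uniform crossover controlled by PC, assuming gene arrays of equal length: with probability 1 - PC the parents are copied unchanged; otherwise each gene position is inherited from one parent or the other with equal probability. The function name and the 0.5 per-gene swap probability are illustrative assumptions.

```python
import numpy as np


def uniform_crossover(parent_a, parent_b, pc, rng):
    """Return two offspring; parents are copied when crossover does not occur."""
    a, b = np.asarray(parent_a), np.asarray(parent_b)
    if rng.random() >= pc:                     # no crossover (probability 1 - pc)
        return a.copy(), b.copy()
    mask = rng.random(a.shape) < 0.5           # True: child 1 takes the gene from a
    return np.where(mask, a, b), np.where(mask, b, a)


rng = np.random.default_rng(2)
child1, child2 = uniform_crossover(np.zeros(8, int), np.ones(8, int), pc=0.9, rng=rng)
```

Because the two offspring use complementary masks, every gene of the parents survives in exactly one child.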

Mutation Operator
Mutation produces a new chromosome by changing a gene value. The mutation operator can restore valuable genetic information that may have been lost during the execution of the algorithm, and it also prevents the algorithm from converging too quickly and becoming trapped in local minima. There are various mutation operators. The mutation probability (PM) determines how often parts of a chromosome are mutated: a random number in [0, 1] is generated for each gene, and if it is less than PM the gene is mutated; otherwise it is left unchanged. A common choice for the mutation rate is PM = 1/L, where L is the chromosome length. These operators are applied generation after generation until a termination criterion is reached, such as a predefined time limit, a number of generations, or population convergence (Alberto and Juan, 2006).
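Bit-flip mutation with the default rate PM = 1/L can be sketched as follows, assuming a one-dimensional array of binary genes; each gene is flipped when its uniform random number falls below PM, so on average one gene per chromosome changes. The function name is hypothetical.

```python
import numpy as np


def bit_flip_mutation(chromosome, rng, pm=None):
    """Flip each binary gene independently with probability pm (default 1/L)."""
    genes = np.array(chromosome, copy=True)
    if pm is None:
        pm = 1.0 / genes.size              # PM = 1/L
    flip = rng.random(genes.size) < pm     # mutate when r < PM
    genes[flip] = 1 - genes[flip]
    return genes


rng = np.random.default_rng(3)
parent = np.zeros(12, dtype=int)
child = bit_flip_mutation(parent, rng)     # one flip expected on average
```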

Genetic Active Contour (Snake)
Active contours have drawbacks that hinder convergence to boundary concavities and make the initial position of the contour critical. GVF snakes converge to concave areas with high success rates, but their convergence remains limited on complex shapes with deep boundary concavities. Combining active contours with a GA therefore addresses the optimization problem of active contours.
When combining active contours with a GA, the most important concerns are choosing a proper population size and fitness function. An appropriate population (chromosome) size reduces processing time, and a proper fitness function reduces convergence time. Determining the search region is also important for the GA to detect the object contour (Mun et al., 2004). Several genetic-snake methods have been proposed to overcome the drawbacks of active contours. Ballerini first introduced genetic-snake approaches for segmenting medical images (Ballerini, 1999) and color images (Lucia, 2001). Other authors built on this approach, updating the chromosome structure and fitness function (Rad and Kashanian, 2006; Talebi et al., 2011). Genetic snakes have also been used for optic nerve head segmentation (Hussain, 2008). Fig. 1 shows a genetic active contour model in which energy minimization is performed by a genetic algorithm. Solution declaration, fitness definition and initialization are the basic elements of a genetic snake. The x and y coordinates of the snake points are encoded in the chromosome using Gray code; the total number of snake points is also encoded in the chromosome and optimized by the GA, within ranges assigned by the user.
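Gray code suits coordinate genes because consecutive values differ in a single bit, so moving a snake point by one pixel requires flipping only one bit. The sketch below converts a coordinate to and from a Gray-coded gene, assuming 6-bit genes (the gene length reported in the experiments) and integer coordinates in 0 to 63; the function names are hypothetical.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g):
    """Invert to_gray by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def encode_coordinate(value, bits=6):
    """Gray-coded gene list, most significant bit first (assumes value < 2**bits)."""
    return [int(c) for c in format(to_gray(value), f"0{bits}b")]


def decode_coordinate(genes):
    """Recover the integer coordinate from a Gray-coded gene list."""
    return from_gray(int("".join(str(b) for b in genes), 2))


# Neighbouring coordinates differ in exactly one gene.
print(encode_coordinate(31), encode_coordinate(32))
```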
The region of interest (ROI) of the genetic snake is the region in which the initial population is randomly generated and in which every solution must lie; its limits r and R are defined by the user. Setting r = 0 and R = max covers the whole area of the image, so the snake is initialized automatically.
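One possible reading of r and R is as the minimum and maximum distance of each snaxel from a centre point, with the snaxels placed at equally spaced angles. The sketch below samples random initial contours under that assumption; the centre, the equal angular spacing, the population size of 10 and the function name are illustrative assumptions, not details given in this paper. The 50 snaxels per contour match the experiments.

```python
import numpy as np


def random_initial_contour(center, r, R, n_points, rng):
    """Snaxels at equal angles with random radii in [r, R) around center."""
    angles = 2 * np.pi * np.arange(n_points) / n_points
    radii = rng.uniform(r, R, n_points)
    x = center[0] + radii * np.cos(angles)
    y = center[1] + radii * np.sin(angles)
    return np.column_stack([x, y])


rng = np.random.default_rng(4)
# A small illustrative population of 50-snaxel contours inside the ROI.
population = [random_initial_contour((32, 32), 5, 20, 50, rng) for _ in range(10)]
```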

Materials and Methods
As mentioned above, the traditional snake has two disadvantages: first, it must lie close to the object edge in order to converge; second, it is unable to converge to boundary concavities. The GVF snake algorithm solves these drawbacks, except for convergence on complex objects with deep concavity boundaries. We therefore use a genetic algorithm to optimize the snake positions v(s) = (x(s), y(s)) in the image plane. Selecting an appropriate population and fitness function is essential to reduce convergence time. The chromosome structure consists of the total number of snake points N and the x and y coordinates, which are encoded and optimized by the GA as shown in Fig. 2.
First, a population of chromosomes is generated, each representing an initial contour whose genes are structured as described in Fig. 1. Once the initial contours are formed, their contour points are determined. Each of these points is evaluated and the best ones are selected. A fitness function must therefore be defined whose maximization corresponds to minimizing the active contour energy (Equation 8). The divergence is calculated as:

$$\operatorname{div} F = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} \tag{9}$$

where P and Q are the horizontal and vertical components of the vector field F, respectively. In the divergence field of Equation 9, low values belong to object boundaries, while large values belong to areas far from the boundaries.
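Equation 9 can be approximated on the pixel grid with central differences. In the sketch below, P and Q are assumed to be the x (column) and y (row) components of a field such as the GVF field, and the function name is hypothetical. The example field points toward a centre, so its divergence is negative; this illustrates why boundary pixels, where GVF vectors converge from both sides, have low divergence values.

```python
import numpy as np


def divergence(P, Q):
    """div F = dP/dx + dQ/dy, with x along columns and y along rows."""
    return np.gradient(P, axis=1) + np.gradient(Q, axis=0)


ys, xs = np.mgrid[0:21, 0:21].astype(float)
P, Q = -(xs - 10), -(ys - 10)     # a field converging on pixel (10, 10)
div = divergence(P, Q)            # constant and negative for this linear field
```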
A threshold θ is set to determine which values of div F at coordinate (i, j) lie on the boundary; θ must be chosen carefully because its value delimits the deep concavity area. A point (i, j) is considered to reach the boundary when:

$$\operatorname{div} F(i, j) < \theta \tag{10}$$

Each snaxel of the snake that does not satisfy Equation 10 is removed from the snake.
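Applying Equation 10 to a contour amounts to a boolean mask over its snaxels. The sketch assumes snaxels stored as (x, y) = (column, row) pairs, rounded to the nearest pixel and clipped to the image; the θ value, the toy divergence map and the function name are illustrative.

```python
import numpy as np


def prune_snaxels(points, div, theta):
    """Keep only snaxels whose divergence satisfies div F(i, j) < theta."""
    pts = np.asarray(points, dtype=float)
    rows = np.clip(np.rint(pts[:, 1]).astype(int), 0, div.shape[0] - 1)
    cols = np.clip(np.rint(pts[:, 0]).astype(int), 0, div.shape[1] - 1)
    keep = div[rows, cols] < theta
    return pts[keep]


div = np.array([[0.0, -1.0, 0.0],
                [2.0, -3.0, 2.0],
                [0.0, 0.0, 0.0]])
snake = [(1, 1), (0, 1), (1, 0)]   # (x, y) pairs
print(prune_snaxels(snake, div, theta=-0.5))
```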
Based on the fitness function defined in Equation 8, we compute the fitness over all contour points at every step, rather than evaluating each point in isolation, in order to escape local minima and obtain a better evaluation. The total fitness of a contour is the sum of the fitness values of its individual points. The contours with the highest fitness values are then selected as the best solutions. In the selection stage, we use Stochastic Universal Sampling (SUS) to select four individuals from the population, and of these the two with the highest fitness values are chosen for crossover. The selected pairs then undergo uniform crossover. Next, mutation is applied to the population to prevent premature convergence and obtain a better generation. This process continues until the termination criterion is reached and the best contour is obtained. Finally, the best points of each contour are selected from the last generation, and the final contour is obtained by connecting those points. We evaluate the proposed algorithm on broken characters with complex shapes and deep boundary concavities, on which the GVF snake algorithm had failed to converge. We extended the genetic-snake algorithm by adding the divergence property as a third force, alongside the internal and external forces, in the fitness formula of Equation 8. Using the updated fitness function and chromosome structure, the GA moves the snake points toward boundary positions with low divergence values while minimizing the internal and external energies, which together determine the fitness value.
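Because Equation 8 is not reproduced in this section, the sketch below uses a hypothetical stand-in for the per-snaxel fitness: the negative of the discrete internal energy plus the external energy and a divergence term weighted by an assumed factor delta, summed over the contour as described above. It assumes that e_ext already includes the weight γ (Equation 3), that points are stored as (x, y) pairs, and that α and β take the values from the experiments; it is not the authors' formula.

```python
import numpy as np


def contour_fitness(points, e_ext, div, alpha=0.4, beta=0.3, delta=1.0):
    """HYPOTHETICAL fitness: sum over snaxels of -(E_int + E_ext + delta * div)."""
    v = np.asarray(points, dtype=float)
    nxt, prev = np.roll(v, -1, axis=0), np.roll(v, 1, axis=0)
    e_int = 0.5 * (alpha * np.sum((nxt - v) ** 2, axis=1)
                   + beta * np.sum((nxt - 2 * v + prev) ** 2, axis=1))
    rows = np.clip(np.rint(v[:, 1]).astype(int), 0, e_ext.shape[0] - 1)
    cols = np.clip(np.rint(v[:, 0]).astype(int), 0, e_ext.shape[1] - 1)
    point_energy = e_int + e_ext[rows, cols] + delta * div[rows, cols]
    return float(-np.sum(point_energy))    # lower total energy -> higher fitness


e_ext = np.zeros((10, 10))
div = np.zeros((10, 10))
div[:, 5:] = 1.0                            # right half: high divergence
small = [(1, 1), (2, 1), (2, 2), (1, 2)]
shifted = [(x + 5, y) for x, y in small]    # same shape, high-divergence region
print(contour_fitness(small, e_ext, div), contour_fitness(shifted, e_ext, div))
```

Since this stand-in score is negative, it would need to be shifted or rescaled to non-negative values before a fitness-proportional method such as SUS is applied.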
The GVF snake fails to converge on the character "G", which has difficult deep concavity boundaries. The gradient vector field of the character "G", shown in Fig. 3, reveals vector directions that prevent the GVF snake from converging into the concavity.
For comparison with the genetic algorithm, the character "G" restored by GVF alone is not recognized well because of the deep concavity problem, as shown in Fig. 4. The genetically optimized GVF snake was able to move the snake points into the deep boundary concavities by maximizing a fitness function derived from minimizing the energy forces and the divergence. Fig. 5 shows the convergence of a broken character suffering from the deep concavity problem after the GVF snake was optimized using the genetic algorithm.
To evaluate the performance of the proposed algorithm against the GVF snake alone, images of the Latin alphabet taken from the ISO basic Latin alphabet were acquired and segmented using the genetic snake algorithm. In this test we used 50 points for each contour, and the energy coefficients were α = 0.4, β = 0.3 and γ = 0.2. The test was repeated 600 times with 100 chromosomes. The Vextractor application was used to obtain ground truth images from the ISO raster images. To evaluate the performance of the genetic snake, we used the Hausdorff Distance (HD) to compare the ground truth images with those restored by the genetic snake. These results were compared with those of the GVF snake algorithm improved using the balloon force algorithm with triangle steps (Mosa and Nasrudin, 2015), as shown in Table 1. The Hausdorff distance measures the degree of mismatch between two finite point sets. If A = {a_1, a_2, ..., a_m} and B = {b_1, b_2, ..., b_n} are two point sets, HD is defined as follows (Huttenlocher et al., 1993):

$$H(A, B) = \max\left( h(A, B),\; h(B, A) \right)$$

where:

$$h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$$

and ‖·‖ is a norm on the points of A and B (for example, the Euclidean norm). When the two images are identical, HD is zero. In this algorithm we used 40 chromosomes and a gene length of 6 bits.
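The Hausdorff distance above can be computed directly for modest point sets, such as the boundary pixels of a restored character and of its ground truth. The brute-force sketch below assumes the Euclidean norm and builds the full m × n distance matrix, which is practical for a few thousand points; the function names are hypothetical.

```python
import numpy as np


def directed_hausdorff(A, B):
    """h(A, B): largest distance from a point of A to its nearest point of B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)   # m x n distances
    return float(d.min(axis=1).max())


def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A)); zero when the sets are identical."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


truth = [(0, 0), (1, 0), (2, 0)]
restored = [(0, 0), (1, 0)]
print(hausdorff(truth, restored))   # the missing point (2, 0) is 1 pixel away
```

For larger point sets, SciPy's `scipy.spatial.distance.directed_hausdorff` avoids building the full distance matrix.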

Conclusion
In this article we applied a genetic algorithm (Genetic-Snake) to solve a drawback of the GVF snake algorithm, namely the deep concavity problem. The genetic algorithm optimizes the snake points so that they reach the correct positions on deep concavity boundaries, with a third force (the divergence property) added to the snake energies in the fitness function to restore and recognize broken characters. The results obtained by applying the proposed algorithm to images of broken alphabet characters show that the characters are well restored and that the convergence limitation is eliminated. The limitation of this algorithm is the difficulty of adjusting the control parameters of the internal and external energies. In future work these parameters will be included in the chromosome structure so that the GA assigns their values, yielding a technique able to restore broken characters without adjusting the control parameters for every character.