Swarm Optimization Techniques for Segmenting Gel Electrophoresis Images

Gel electrophoresis (GE) is the main tool for separating DNA fragments. It helps in analyzing the genome: each resulting image consists of lanes, and each lane contains several bands. Image segmentation plays a central role in image processing and helps to produce accurate results in medical diagnosis. It works by dividing an image into regions that together cover the full image. Conventional image segmentation methods can be applied, but they still have defects that prevent accurate results. Swarm optimization methods, on the other hand, segment images with high efficiency. In this study, swarm optimization techniques for image segmentation are proposed. The proposed technique applies different segmentation methods: Fuzzy C-Means (FCM) and Particle Swarm Optimization (PSO). PSO is used extensively in computer science, is simple and easy to implement, and is based on swarm intelligence. PSO is useful in image segmentation because its results are more exact and efficient. Furthermore, Darwinian PSO (DPSO) and Fractional Order Darwinian PSO (FODPSO) produce precise results. The efficiency of the proposed approach is compared with other methods by computing image quality measurement parameters such as Peak Signal to Noise Ratio (PSNR), Mean Square Error (MSE) and others. The proposed technique, especially FODPSO, produces more accurate results when segmenting GE images.


Introduction
Deoxyribonucleic Acid (DNA) is the backbone of all living organisms (Zhu et al., 2011). DNA consists of two sugar-phosphate strands connected by nucleotide bases. There are four bases: Adenine (A), Cytosine (C), Guanine (G) and Thymine (T). The variety of DNA among living beings depends on variation in the length of the helix and in the order of its bases. Because the two strands are complementary, with (A) pairing with (T) and (C) pairing with (G), it is easy to determine the sequence of bases on one strand if the sequence on the other strand is known. Figure 1 displays two strands of DNA.
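The base-pairing rule can be illustrated with a few lines of Python. This is only a sketch: the name complement_strand is hypothetical, and the function returns the base-by-base complement without reversing the strand direction.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (no reversal)."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


if __name__ == "__main__":
    # Knowing one strand is enough to write down the other.
    print(complement_strand("ATCGGA"))
```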
DNA sequencing is the process of determining the exact order of the four bases of DNA or RNA. It is helpful for identifying people, for example in genetic testing and DNA fingerprinting, because the sequence is unique to every person. There are several techniques for DNA sequencing, such as Maxam-Gilbert sequencing, introduced in 1977 by Allan Maxam and Walter Gilbert, and Sanger sequencing, the most popular technique, introduced by Frederick Sanger in 1977 (França et al., 2002). The choice of technique depends on the number of bases. If a DNA sequence is longer than 1000 base pairs, shotgun sequencing and electrophoresis are used.
Electrophoresis, which Fred Sanger used in his sequencing method, is the main tool for DNA sequencing. The DNA molecule is first cut into several pieces of different sizes by restriction enzymes (Lee et al., 2011). Gel electrophoresis then separates the DNA pieces according to size under an electric field. Because DNA is negatively charged, it moves towards the electrode of opposite charge (Fig. 2). There are several types of gel, and the selection depends on the size of the molecules. Agarose gel separates large nucleic acids, whereas acrylamide gel is used for short nucleic acids and proteins (Akhter et al., 2008). The result of this method is a single image that includes multiple vertical lanes, each containing several horizontal bands. The image resulting from gel electrophoresis is still the main way to work with a DNA sequence. Its principal defect is that it suffers from several kinds of noise that reduce image quality, so image processing is very useful in this area.
In this study, image processing is composed of three levels: image preprocessing, image segmentation and result evaluation. Image preprocessing is the first level in this approach. It consists of image enhancement techniques such as filtering, and it helps to obtain a clearer image (Taher et al., 2013). Image segmentation is the next step, applied after noise has been removed from the image. It separates an image into regions that together cover the image. It is important to apply segmentation methods to the gel electrophoresis image to detect all bands and lanes and to use the resulting image to identify diseases such as breast cancer.
The term swarm comes from a moving group of birds searching for food in a search space. The birds do not know where the best feeding area is; therefore, if any bird detects the destination, the whole group moves towards it.
The core problem of many segmentation methods is that their results are not accurate enough to segment gel electrophoresis images.
The aim of this paper is therefore to solve this problem by applying several swarm optimization techniques, since they produce more accurate and highly efficient results for medical image segmentation. The proposed segmentation method applies Fuzzy C-Means (FCM) and Particle Swarm Optimization (PSO), a stochastic optimization algorithm based on swarm intelligence. It also applies Darwinian PSO (DPSO), which addresses the main problem faced by PSO, and Fractional Order Darwinian PSO (FODPSO), both popular techniques in image clustering and image segmentation. The results are computed using image quality measurements, and the techniques are then compared to identify the best one.
The main contribution of this paper is to produce clean segmented images with clear bands and lanes without losing the features of the images, because they are real images. The resulting image is then classified to detect breast cancer.
The rest of this paper is divided into four sections. Section Two presents the methods that have been applied to image segmentation. It is divided into two subsections: FCM and Swarm Optimization. Swarm Optimization is divided into three sub-subsections: PSO, DPSO and FODPSO. Section Three includes two subsections, which state the preprocessing steps and illustrate a block diagram of the proposed segmentation technique. Section Four includes three subsections. Subsection Data Sets presents the image data sets that have been tested. Subsection Performance Evaluation shows some figures obtained after applying the segmentation techniques and compares the results using specific parameters. Subsection Comparison with Other Techniques displays the results in tables, comparing them using specific image error measurements to obtain the best result. Section Five presents the summary and conclusion of the paper.

Related Work
Many researchers have applied segmentation techniques to gel electrophoresis images to obtain more accurate and efficient output (Talukder, 2011). Noor et al. (2011) proposed a multilevel Otsu thresholding method based on Particle Swarm Optimization to segment gel electrophoresis images of DNA. In experiments, this technique segmented all bands in the gel electrophoresis image efficiently. Its disadvantage is that it did not segment all lanes. Sengar et al. (2012) applied the watershed method with a wavelet transform to segment protein spots in 2D gel electrophoresis images. The advantage of this technique is that it applies a single threshold factor. It is helpful for segmenting spots, but some spots are missing after segmentation. Ahmad et al. (2013) proposed segmenting gel electrophoresis images of DNA using PSO with Kapur multilevel thresholding. The aim of that paper is to detect the best threshold level; the advantages of the technique are that it runs in little time and segments the bands of the gel electrophoresis image correctly. Its drawback is that it removes the background, so some details may be lost. Savelonas et al. (2012) proposed a segmentation method for protein spots in gel electrophoresis images based on active contours. The benefits of this method are that it solves some problems in image analysis, such as noisy images and weak spots, and that its results have higher quality. Lee et al. (2011) proposed an analysis of DNA gel electrophoresis images, treated as spots, using enhanced FCM for image segmentation. That paper compares basic FCM with enhanced FCM. The benefit of this method is that bands are detected correctly by first detecting the lane that includes them, and repeated bands are removed. Its weakness is that it detects bands without segmenting them.
Raju and Rao (2013) presented a segmentation of mammography images based on FCM and PSO techniques. PSO techniques were used to improve the result of FCM. The advantages of FODPSO combined with FCM are that they are the best techniques for medical image segmentation and that their computation time is very low. Ghamisi et al. (2012) applied the DPSO technique to segment remote sensing images. The main goal of this technique is to find the (n-1) optimal thresholds of n-level thresholding. The advantage of this technique is that it solves PSO's problem of becoming trapped in local optima. It is also more efficient than traditional PSO. This technique needs further development because it was the first time it had been used on remote sensing images. Sandeli and Batouche (2014) proposed a new segmentation technique consisting of PSO, GA and Artificial Bee Colony (ABC). This combination is called the Generalized Island Model (GIM). The aim of this method is to solve the local optima problem. The results of this technique are good, but DPSO and FODPSO are more accurate by comparison. This technique needs further development to enhance system performance. Ghamisi et al. (2014) presented a segmentation method based on FODPSO combined with a Support Vector Machine (SVM) for remote sensing images, to solve many optimization problems and the Otsu problem. The advantage of this technique is that it reduces n-level thresholding to detecting the optimal thresholds that maximize the variance between classes. Additionally, its computation time is very low, and it is more convenient than DPSO for finding the global optimum.

Theoretical Background
In this section, we discuss several segmentation techniques that are applied to gel electrophoresis images. These are Fuzzy C-Means (FCM) and swarm optimization techniques, which produce accurate results in medical image segmentation. This paper discusses Particle Swarm Optimization (PSO), Darwinian Particle Swarm Optimization (DPSO) and Fractional Order Darwinian Particle Swarm Optimization (FODPSO), each of which was developed to solve a problem in its predecessor.

Fuzzy C-Means
Cluster analysis is a primary method in pattern recognition, image processing and image segmentation. It is an unsupervised method built on splitting a data set D into small subsets d (clusters). Clustering is the method of collecting similar objects into groups; it consists of two approaches: hard clustering and fuzzy clustering (Yang, 1993). In hard clustering, each item in the data set belongs to exactly one cluster, while fuzzy clustering is used if an item may belong to two or more clusters (Hemanth and Anitha, 2015). FCM (Fuzzy C-Means) is one of the fuzzy clustering methods (Dias et al., 2015) and is the principal clustering method based on fuzzy clustering theory. The method first appeared in 1973 in the work of Dunn and was developed further by Bezdek in 1981 (Menon and Ramakrishnan, 2015; Yang and Huang, 2012). It is an iterative method. The advantage of FCM is that its results are more accurate; one important disadvantage is its longer execution time, a problem that can be avoided by using Particle Swarm Optimization and its extensions (Hemanth and Anitha, 2015). The technique works by dividing the image into regions whose pixels are similar to each other and different from those of other regions. The goal of the method is to minimize the objective function (Alsmadi, 2015; Hemanth and Anitha, 2015), which is given by:

$$J_m = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m}\, d_{ij}^{2} \tag{1}$$

where the weighting exponent is $m \in [1, \infty)$, the membership $u_{ij}$ is between 0 and 1, $c_i$ is the centroid of cluster $i$, $x_j$ is a data point and $d_{ij}$ is the distance between the center and the data point, given by:

$$d_{ij} = \lVert x_j - c_i \rVert \tag{2}$$

To obtain an optimal objective function as in Equation 1, the membership is updated by:

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{d_{ij}}{d_{kj}} \right)^{2/(m-1)}} \tag{3}$$

The FCM algorithm is given by the following steps:
Step 1: Set the values of c and m randomly, where $2 \le c < n$ and $n$ is the number of data points.
Step 2: Set the initial value of the membership matrix $u^{k} = u^{0}$, where $k$ is the iteration number.
Step 3: Compute the cluster centers $c_i$ using the equation:

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \tag{4}$$

Step 4: Using Equation 3, update $u^{k}$ to compute $u^{k+1}$.
Step 5: Compare $u^{k+1}$ with $u^{k}$: if $\lVert u^{k+1} - u^{k} \rVert < \varepsilon$ then terminate; otherwise return to Step 3.
Note that the threshold value $\varepsilon$ is between 0 and 1.
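As an illustration of Steps 1 to 5, the following Python sketch clusters a list of scalar grey levels. It is not the implementation used in this study, which relied on MATLAB. The function name fcm_1d, the default parameter values, the fixed random seed and the use of the largest absolute membership change as the stopping norm are all assumptions made for the example.

```python
import random


def fcm_1d(values, c=2, m=2.0, eps=1e-4, max_iter=100, seed=0):
    """Fuzzy C-Means on scalar values; returns (centers, memberships).

    memberships[i][j] is the degree to which point j belongs to cluster i.
    """
    rng = random.Random(seed)
    n = len(values)
    # Step 2: random initial memberships, each column normalised to sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col
    centers = [0.0] * c
    for _ in range(max_iter):
        # Step 3: cluster centres as membership-weighted means.
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            centers[i] = sum(wj * x for wj, x in zip(w, values)) / sum(w)
        # Step 4: membership update from the distances to the centres.
        new_u = [[0.0] * n for _ in range(c)]
        for j, x in enumerate(values):
            d = [abs(x - ck) for ck in centers]
            zero = [i for i in range(c) if d[i] == 0.0]
            if zero:
                # The point sits exactly on a centre: full membership there.
                for i in zero:
                    new_u[i][j] = 1.0 / len(zero)
            else:
                for i in range(c):
                    new_u[i][j] = 1.0 / sum(
                        (d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c)
                    )
        # Step 5: stop when the largest membership change falls below eps.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(c) for j in range(n))
        u = new_u
        if change < eps:
            break
    return centers, u
```

For an image, the pixel intensities would be flattened into values, and each pixel would be assigned to the cluster in which its membership is largest.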

SWARM
In this subsection, we discuss three segmentation methods based on swarm intelligence: PSO, DPSO and FODPSO. Swarm Intelligence (SI) is helpful in several areas, such as optimization. The main goal of an optimization method is to find the maximum or minimum of an objective function over some feasible region (Talukder, 2011).

Particle Swarm Optimization
Particle Swarm Optimization (PSO) is an effective stochastic optimization algorithm. The technique was introduced by Eberhart and Kennedy in 1995, based on the social behavior of bird flocking and on swarm theory (Rini et al., 2011). It is useful for solving many optimization problems in several areas, such as image segmentation, where its results are more accurate (Mohsen et al., 2011). The objective of PSO is to find the global optimal solution in a complex search space. Several versions of PSO now exist. There are several similarities between this method and Genetic Algorithms (GA). The advantage of PSO is that it is easier to compute and much faster than GA (Kaur and Singh, 2012). PSO is used in multiple fields such as signal processing, image segmentation, image processing, neural networks, data mining and medical imaging (Tandan and Raja, 2013). It consists of particles, each of which is a candidate solution. The aim of every particle is to find an optimal solution in the search space. The traditional PSO algorithm consists of two main equations (Raju and Rao, 2013; Tandan and Raja, 2013):

$$v_{i}^{n}[t+1] = m\, v_{i}^{n}[t] + c_1 r_1 \big(g_{i}^{n} - x_{i}^{n}[t]\big) + c_2 r_2 \big(\breve{x}_{i}^{n} - x_{i}^{n}[t]\big) + c_3 r_3 \big(n_{i}^{n} - x_{i}^{n}[t]\big) \tag{5}$$

$$x_{i}^{n}[t+1] = x_{i}^{n}[t] + v_{i}^{n}[t+1] \tag{6}$$

where $[t]$ denotes the iteration. At the beginning, the particle velocities are set to zero and each particle is placed at a random position.
Table 1 lists the factors used in the Particle Swarm Optimization method. Equations 5 and 6 contain several parameters: the velocity $v_i^n$, the particle position $x_i^n$ and the random numbers $r_1, r_2, r_3$. The global best $g_i^n$ is the best value found by all particles, the local best $\breve{x}_i^n$ is the best value found by the particle itself and the neighborhood best $n_i^n$ is the best value found by the neighboring particles. The constants $m, c_1, c_2, c_3$ are weights. The following algorithm explains how the Particle Swarm Optimization method works.
The pseudocode of standard PSO is as follows (Yetirajam and Jena, 2012; Cui et al., 2005; Raju and Rao, 2013):
Start each particle with a random position $x_i^n$ and velocity $v_i^n$
For all particles in the search space from 1 to n Do
    Compute the fitness value
    Compare fitness($x_i$) with fitness($\breve{x}_i$)
    If fitness($x_i$) > fitness($\breve{x}_i$) Then $\breve{x}_i = x_i$
    Compare fitness($x_i$) with fitness($g_i$)
    If fitness($x_i$) > fitness($g_i$) Then $g_i = x_i$
    Update the velocity and position using Equations 5 and 6
End
The previous pseudocode ends when $g_i$ is the optimal solution. The most common defect of PSO is that a configuration that solves one problem well may fail on another. Another defect is that PSO depends on its parameters, so a change in one parameter can change the speed of the technique.
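The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the code used in this study. It keeps only the global-best and local-best terms and omits the neighborhood-best term. The function name pso_maximize, the parameter values (w, c1, c2), the clamping of positions to the bounds and the fixed number of iterations are assumptions made for the example.

```python
import random


def pso_maximize(fitness, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise `fitness` over a box; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]                 # local best of each particle
    pbest_fit = [fitness(p) for p in x]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]   # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to global best + pull to local best.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (gbest[d] - x[i][d])
                           + c2 * r2 * (pbest[i][d] - x[i][d]))
                # Position update, kept inside the search box.
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            f = fitness(x[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[i][:], f
    return gbest, gbest_fit
```

A fixed iteration budget is used as the stopping rule because, in practice, the optimal solution is not known in advance.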

Darwinian Particle Swarm Optimization
Optimization algorithms such as PSO face a common problem: they may become trapped in a local optimum (Raju and Rao, 2013; Tillett and Rao, 2005). Darwinian Particle Swarm Optimization (DPSO) was developed by Tillett and Rao (2005) to help PSO escape from local optima. Because PSO uses a single swarm of test solutions, it is difficult for that swarm to differentiate between a local optimum and a global optimum. In DPSO, many swarms of test solutions exist at any time. If a search gets stuck in a local optimum, the search in that area is abandoned in favor of another area of the search space. DPSO compares favorably with PSO: it produces results with efficient performance and in less CPU time.
Pseudocode of the basic DPSO technique (Ghamisi et al., 2012; Tillett and Rao, 2005):
For every swarm in the search space Do
    Apply the swarm (discussed below)
    Move to a new swarm
    If the swarm is unsuccessful Then
        remove it
    End If
End
For every particle in the swarm Do
    Update the particle fitness and update the particle best
    If a global best fitness is found Then
        use a new particle
    End If
    If the swarm failed to reach the best global fitness Then
        a particle is removed
    End If
End
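A simplified Python sketch of this idea follows. It is an assumption-laden illustration rather than the algorithm of Tillett and Rao (2005) in full: a swarm that improves its best fitness gains a particle, a swarm that stagnates for several iterations loses its worst particle, and a swarm that becomes too small is removed. Spawning of new swarms by successful swarms is omitted. The name dpso_maximize and all parameter values are hypothetical.

```python
import random


def dpso_maximize(fitness, dim, bounds, n_swarms=3, n_particles=8, iters=100,
                  min_particles=3, max_particles=12, stagnation_limit=5,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Simplified Darwinian PSO; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def new_particle():
        pos = [rng.uniform(lo, hi) for _ in range(dim)]
        return {"x": pos, "v": [0.0] * dim,
                "best": pos[:], "best_fit": fitness(pos)}

    def new_swarm():
        parts = [new_particle() for _ in range(n_particles)]
        top = max(parts, key=lambda p: p["best_fit"])
        return {"particles": parts, "gbest": top["best"][:],
                "gbest_fit": top["best_fit"], "stagnant": 0}

    swarms = [new_swarm() for _ in range(n_swarms)]
    best, best_fit = None, float("-inf")
    for _ in range(iters):
        for s in swarms:
            improved = False
            for p in s["particles"]:
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    p["v"][d] = (w * p["v"][d]
                                 + c1 * r1 * (s["gbest"][d] - p["x"][d])
                                 + c2 * r2 * (p["best"][d] - p["x"][d]))
                    p["x"][d] = min(hi, max(lo, p["x"][d] + p["v"][d]))
                f = fitness(p["x"])
                if f > p["best_fit"]:
                    p["best"], p["best_fit"] = p["x"][:], f
                if f > s["gbest_fit"]:
                    s["gbest"], s["gbest_fit"] = p["x"][:], f
                    improved = True
            if improved:
                # Reward: reset the stagnation counter and add a particle.
                s["stagnant"] = 0
                if len(s["particles"]) < max_particles:
                    s["particles"].append(new_particle())
            else:
                # Punish: after too many stagnant steps, drop the worst particle.
                s["stagnant"] += 1
                if s["stagnant"] >= stagnation_limit:
                    worst = min(s["particles"], key=lambda p: p["best_fit"])
                    s["particles"].remove(worst)
                    s["stagnant"] = 0
            if s["gbest_fit"] > best_fit:
                best, best_fit = s["gbest"][:], s["gbest_fit"]
        # Remove unsuccessful swarms; restart with a fresh one if none is left.
        swarms = [s for s in swarms if len(s["particles"]) >= min_particles]
        if not swarms:
            swarms = [new_swarm()]
    return best, best_fit
```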

Fractional Order Darwinian Particle Swarm Optimization
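One extension of Darwinian Particle Swarm Optimization (DPSO) is FODPSO, presented in Pires et al. (2010). It is based on fractional calculus (Raju and Rao, 2013). Fractional calculus (FC) has attracted the attention of many researchers. The concept of the fractional differential is taken from the Grünwald-Letnikov definition, which is given by the following equation (Kaur, 2012; Raju and Rao, 2013):

$$D^{\alpha}[y(t)] = \lim_{h \to 0} \frac{1}{h^{\alpha}} \sum_{k=0}^{\infty} \frac{(-1)^{k}\, \Gamma(\alpha+1)\, y(t-kh)}{\Gamma(k+1)\, \Gamma(\alpha-k+1)} \tag{7}$$

where $\alpha \in \mathbb{C}$ is the fractional coefficient, $\Gamma$ is the gamma function and $y(t)$ represents a general signal. In discrete time, the signal $D^{\alpha}[y(t)]$ can be defined as:

$$D^{\alpha}[y(t)] = \frac{1}{T^{\alpha}} \sum_{k=0}^{r} \frac{(-1)^{k}\, \Gamma(\alpha+1)\, y(t-kT)}{\Gamma(k+1)\, \Gamma(\alpha-k+1)} \tag{8}$$

where $T$ is the sampling period and $r$ is the truncation order. With the inertia weight $m = 1$, the left-hand side of Equation 5, $v_i^n[t+1] - v_i^n[t]$, is the discrete derivative of order $\alpha = 1$, which gives:

$$D^{\alpha}\big[v_{i}^{n}[t+1]\big] = c_1 r_1 \big(g_{i}^{n} - x_{i}^{n}[t]\big) + c_2 r_2 \big(\breve{x}_{i}^{n} - x_{i}^{n}[t]\big) + c_3 r_3 \big(n_{i}^{n} - x_{i}^{n}[t]\big) \tag{9}$$

where $[t]$ denotes the iteration. Using Equation 8 with $T = 1$ and $r = 4$ together with Equations 5 and 6, Equation 9 can be rewritten as follows:

$$v_{i}^{n}[t+1] = \alpha\, v_{i}^{n}[t] + \tfrac{1}{2}\alpha(1-\alpha)\, v_{i}^{n}[t-1] + \tfrac{1}{6}\alpha(1-\alpha)(2-\alpha)\, v_{i}^{n}[t-2] + \tfrac{1}{24}\alpha(1-\alpha)(2-\alpha)(3-\alpha)\, v_{i}^{n}[t-3] + c_1 r_1 \big(g_{i}^{n} - x_{i}^{n}[t]\big) + c_2 r_2 \big(\breve{x}_{i}^{n} - x_{i}^{n}[t]\big) + c_3 r_3 \big(n_{i}^{n} - x_{i}^{n}[t]\big) \tag{10}$$

DPSO can be seen as a special case of FODPSO with $\alpha = 1$ in the previous equation (Kaur, 2012). Finally, FODPSO is faster than PSO and more efficient than DPSO at avoiding local optima.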
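The fractional-order velocity update can be illustrated with a short Python sketch. It assumes the usual truncation of the Grünwald-Letnikov series to the four most recent velocities and, for brevity, keeps only the global-best and local-best terms with scalar quantities. The names gl_weights and fodpso_velocity and the default coefficient values are hypothetical.

```python
def gl_weights(alpha, r=4):
    """Weights applied to v[t], v[t-1], ..., v[t-r+1] in the velocity update.

    They come from the truncated Grunwald-Letnikov series: the k-th weight is
    -(-1)^k * C(alpha, k), where C is the generalised binomial coefficient.
    """
    weights, binom = [], 1.0
    for k in range(1, r + 1):
        binom *= (alpha - k + 1) / k          # C(alpha, k) from C(alpha, k-1)
        weights.append(-((-1) ** k) * binom)
    return weights


def fodpso_velocity(history, x, gbest, pbest, alpha=0.6,
                    c1=1.5, c2=1.5, r1=0.5, r2=0.5):
    """New velocity from the last four velocities (most recent first)."""
    memory = sum(w * v for w, v in zip(gl_weights(alpha, len(history)), history))
    return memory + c1 * r1 * (gbest - x) + c2 * r2 * (pbest - x)


if __name__ == "__main__":
    # With alpha = 1 only the most recent velocity contributes.
    print(gl_weights(1.0))
    print(gl_weights(0.5))
```

In a full optimizer, r1 and r2 would be drawn at random at every update; they are fixed arguments here only to keep the example deterministic.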

Proposed Segmentation Technique
This section contains two subsections, each describing steps of the approach proposed in this study. The first discusses the preprocessing steps and shows how to convert a DNA sequence to a gel electrophoresis image. The second explains the segmentation steps used for this type of image and states them in pseudocode.

Preprocessing Step
In this subsection, we consider the steps for preprocessing a DNA gel electrophoresis image. First, we explore the algorithm that converts DNA sequences to gel images.
The gel electrophoresis image of a DNA sequence is created by the following steps:
Step 1: Create a file containing multiple DNA sequences, all in the same format, such as FASTA.
Step 2: Apply ClustalW DNA sequence alignment to each sequence in the file.
Step 3: Produce the aligned file and apply a restriction enzyme to it to cut the DNA sequence.
Step 4: Draw the electrophoresis image: one gel for each sequence and one lane for each restriction enzyme.
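Step 3 can be illustrated with a small Python sketch that cuts a linear sequence at every occurrence of a recognition site and returns the fragment lengths, which determine the band positions in a lane. This is not the BioPerl procedure used in this study. The function name digest and the example sequence are made up; the EcoRI site GAATTC, cut after the first G, is used as the example enzyme.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at each `site`; return the fragment lengths.

    `cut_offset` is the position of the cut inside the recognition site.
    """
    sequence = sequence.upper()
    cuts, start = [], 0
    while True:
        idx = sequence.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    edges = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    # Made-up sequence with two EcoRI sites (G^AATTC).
    fragments = digest("AAGAATTCGGGGAATTCTT", "GAATTC", 1)
    # Shorter fragments migrate further through the gel.
    print(sorted(fragments, reverse=True))
```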
This algorithm is applied to multiple DNA sequences, following the above steps, to obtain images that consist of lanes and bands. It can be run with the BioPerl toolkit and is the first step of image preprocessing. We then applied some MATLAB operations for image preprocessing: converting the image from RGB color to grayscale if it is a color image, subtracting the background from the image and enhancing it by applying a filter. After this step, we apply the segmentation techniques FCM, PSO, DPSO and FODPSO. Finally, we evaluate the results to determine the best technique.
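The image preprocessing operations can be sketched in plain Python as follows. This is an illustration only, not the MATLAB code used in this study. The grayscale conversion uses the common ITU-R BT.601 luminance weights. Treating the background as a single constant level and choosing a 3x3 median filter are assumptions, since the source does not name the background model or the filter. All function names are hypothetical.

```python
from statistics import median


def to_gray(rgb_image):
    """Rows of (R, G, B) tuples -> rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def subtract_background(gray, background):
    """Subtract a constant background level, clipping at zero."""
    return [[max(0.0, p - background) for p in row] for row in gray]


def median_filter3(gray):
    """3x3 median filter; windows are truncated at the image border."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(h):
        for x in range(w):
            window = [gray[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median(window)
    return out
```

A median filter is a typical choice for removing isolated noisy pixels while keeping band edges reasonably sharp.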

Segmentation Step
Image segmentation is a main subject in image analysis, medical image processing and pattern recognition (Kannan et al., 2012; Mohsen et al., 2012; Yang and Huang, 2012). The target of image segmentation is to divide an image into several regions that are homogeneous in features such as color and texture, and to detect their boundaries (Kannan et al., 2012; Wang and Bu, 2010; Yang and Huang, 2012). There are several categories of image segmentation, such as clustering-based segmentation and edge-based segmentation. This paper discusses FCM and PSO techniques for image segmentation. It is important to apply segmentation methods to the gel electrophoresis image to detect all bands and lanes and to use the resulting image to identify diseases such as breast cancer.
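The source does not state the fitness function that the swarm techniques optimize. One common choice in the thresholding work cited in the Related Work section is Otsu's between-class variance, and the Python sketch below assumes that choice for a single threshold. The exhaustive search in best_threshold stands in for the swarm search, and all function names are hypothetical.

```python
def between_class_variance(histogram, threshold):
    """Otsu criterion for splitting grey levels into [0, t) and [t, L)."""
    total = sum(histogram)
    n0 = sum(histogram[:threshold])
    n1 = total - n0
    if n0 == 0 or n1 == 0:
        return 0.0
    mu0 = sum(i * h for i, h in enumerate(histogram[:threshold])) / n0
    mu1 = sum(i * h for i, h in
              enumerate(histogram[threshold:], start=threshold)) / n1
    return (n0 / total) * (n1 / total) * (mu0 - mu1) ** 2


def best_threshold(histogram):
    """Exhaustive search; a swarm optimizer would replace this loop."""
    return max(range(1, len(histogram)),
               key=lambda t: between_class_variance(histogram, t))


def segment(gray, threshold):
    """Label each pixel 1 (band) or 0 (background) by thresholding."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]
```

With several thresholds the exhaustive search becomes expensive, which is the usual motivation for using PSO-type methods to search the threshold space.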

Experimental Results
In this section, a database of several images is segmented by the proposed segmentation technique shown in Fig. 3, using MATLAB.
Results from the proposed segmentation technique are evaluated by image quality measurement factors and are then compared with each other to identify the most accurate result. We test four segmented images to obtain the best result.

Data Sets Performance Evaluation
The pseudocode of the proposed segmentation techniques was implemented in MATLAB R2012a. Multiple electrophoresis images were used in our experiments. The following figure shows

Comparison with Other Techniques
In this subsection, we evaluate the quality of the previously discussed figures using tables that report MSE, PSNR and other factors. We noticed that when MSE is low, PSNR is very high, which means that there is very little error in the image.

Mean Squared Error (MSE)
MSE is the mean of the squared differences between the original image and the segmented image. It measures the error between the preprocessed reference image and the segmented image; a large value means a poor-quality image (Desai and Kulkarni, 2010; Ece and Mmu, 2011). It is defined by:

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{k=1}^{N} \big(x(i,k) - x'(i,k)\big)^{2}$$

where $x(i,k)$ is the preprocessed reference image, $x'(i,k)$ is the segmented image and $M \times N$ is the image size.

Peak Signal to Noise Ratio (PSNR)
Used to measure the quality between two images after applying an operation to them, such as image compression, image enhancement or image segmentation (Desai and Kulkarni, 2010; Ece and Mmu, 2011).

Average Difference (AD)
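Computes the difference between the original image and the segmented image and then takes the average of the result. With $x(i,k)$ the original image and $x'(i,k)$ the segmented image, both of size $M \times N$, it is given by this equation (Desai and Kulkarni, 2010; Ece and Mmu, 2011):

$$\mathrm{AD} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{k=1}^{N} \big(x(i,k) - x'(i,k)\big)$$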

Maximum Difference (MD)
Computes the differences between the original image $x(i,k)$ and the segmented image $x'(i,k)$ and then takes their maximum value (Desai and Kulkarni, 2010; Ece and Mmu, 2011). A large MD value means that the image has poor quality:

$$\mathrm{MD} = \max_{i,k} \big| x(i,k) - x'(i,k) \big|$$

Structural Content (SC)
Measures the similarity between two images and is considered a type of correlation (Desai and Kulkarni, 2010; Ece and Mmu, 2011).

Normalized Cross-Correlation (NK)
It is also a type of correlation. It computes the similarity between the original image and the segmented image (Ece and Mmu, 2011).
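The six measures can be computed together, as in the following Python sketch. It uses the commonly cited definitions of these measures, which are assumed here rather than taken from the source: SC is the ratio of the summed squared reference pixels to the summed squared segmented pixels, NK is the summed pixel product divided by the summed squared reference pixels, and PSNR uses a peak value of 255 for 8-bit images. The function name quality_metrics is hypothetical.

```python
import math


def quality_metrics(reference, segmented, peak=255.0):
    """Return MSE, PSNR, AD, MD, SC and NK for two equally sized images.

    Images are rows of grey levels. SC and NK are undefined (division by
    zero) if the relevant image is entirely zero.
    """
    ref = [p for row in reference for p in row]
    seg = [p for row in segmented for p in row]
    n = len(ref)
    diffs = [a - b for a, b in zip(ref, seg)]
    mse = sum(d * d for d in diffs) / n
    # PSNR is infinite when the two images are identical.
    psnr = float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
    return {
        "MSE": mse,
        "PSNR": psnr,
        "AD": sum(diffs) / n,
        "MD": max(abs(d) for d in diffs),
        "SC": sum(a * a for a in ref) / sum(b * b for b in seg),
        "NK": sum(a * b for a, b in zip(ref, seg)) / sum(a * a for a in ref),
    }
```

Because PSNR is a decreasing function of MSE, a low MSE always corresponds to a high PSNR.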

Conclusion
Image segmentation methods for segmenting and detecting all bands of gel electrophoresis images representing DNA sequences are proposed and implemented. This work focuses on using several image segmentation techniques and comparing them to find the technique that produces the best results. The method first applies threshold segmentation and FCM to segment gel electrophoresis images and records the results, and then compares them with the results of the proposed technique, which uses swarm optimization and its variants: PSO, DPSO and FODPSO. PSO and DPSO techniques are quite effective. The benefits of FODPSO are that it decreases the computational time and is more efficient than DPSO at avoiding local optima. The experiments demonstrate that the image segmented by the proposed technique with FODPSO shows high accuracy and effective results, making FODPSO the best technique for segmenting and detecting all bands. In the future, we can extend this technique to DNA image classification to diagnose diseases.