IMPROVING INDEPENDENT COMPONENT ANALYSIS USING SUPPORT VECTOR MACHINES FOR MULTIMODAL IMAGE FUSION

The objective of this study is to combine multiple images of a scene acquired by different sensors to create a new image with all important information from the input images. Recent studies show that bases trained using Independent Component Analysis (ICA) are effective in multisensor fusion and perform better than traditional wavelet approaches. In ICA based fusion, the coefficients of the input images are combined simply by selecting the coefficients with maximum magnitude. This rule, however, produces fused images with poor contrast because of the distortion introduced in constant background areas. The performance of ICA based fusion can be greatly improved by a region based approach with intelligent decision making that chooses the significant regions in the source images. Hence, a new region based image fusion algorithm for combining visible and Infrared (IR) images using Independent Component Analysis and Support Vector Machines (SVM) is proposed. Joint segmentation of the source images is carried out in the spatial domain and important features of each region are computed in the spatial and transform domains. A Support Vector Machine is trained to select the regions with significant features from the source images, and the corresponding ICA coefficients are combined to form the fused ICA representation. The proposed algorithm is applied to different sets of multimodal images to validate its robustness and is compared with some standard image fusion methods. The fusion results demonstrate that the proposed scheme performs better than the state-of-the-art image fusion methods and shows a significant improvement in the Entropy, Petrovic and Piella evaluation metrics.


INTRODUCTION
In recent years, with the availability of low cost imaging sensors, multiple sensors have come to be used in applications such as robotics, surveillance, defence, medical imaging and remote sensing in order to improve system performance. In such multisensor systems, redundant and complementary information from different sensors is combined to form a fused image that contains all important information from the input images (Lewis et al., 2007). Image fusion enhances the perception of a scene by integrating data from multiple sensors (Mitianoudis and Stathaki, 2007). The fusion process creates more accurate fused images suitable for further processing (Nirmala et al., 2011). The multiple sensors deployed to capture images of a scene may be of the same type or of different types. In multimodal fusion, images from different sensors such as visible and Infrared (IR) are combined (Saha et al., 2013).

Merging visual and infrared images is now common in many surveillance and tracking applications (Drajic and Cvejic, 2007). The fused images obtained by this multimodal fusion carry more information than the images from the individual sensors, which improves detection.
Image fusion algorithms are broadly classified into two categories, namely pixel based and feature based. In pixel level fusion, the raw pixels obtained directly from the imaging sensors are combined based on certain fusion rules. Feature based fusion extracts some features from the source images and performs fusion based on these features. This is usually achieved by using a region-based scheme. In the recent past, many researchers have demonstrated the effectiveness of Multiresolution (MR) transforms in image fusion applications (Blum and Liu, 2005). A popular multiresolution transform is the Discrete Wavelet Transform (DWT) (Mallat, 1998). Li et al. (1995) first employed the DWT for image fusion. In recent years, many pixel-level image fusion algorithms using MR transforms have been proposed and successfully applied for fusion (Ellmauthaler et al., 2013).
Region-based methods are now widely used in the majority of applications, as the objects and the region details are more important than the individual pixels. Hence, the object and region details are incorporated for fusion (Piella, 2002). In these methods the input images are divided into several regions and some important features of each region are determined. These features are used to decide the input image from which a particular region is to be selected in the fused representation. Region based methods are more flexible in adapting to intelligent fusion rules than pixel based approaches (Wan et al., 2009). These methods also circumvent the common problems of noise sensitivity and blurring associated with pixel based schemes. A region based fusion technique using the Dual Tree Complex Wavelet Transform (DT-CWT) was proposed by Lewis et al. (2007). In that technique, a single region feature, the Shannon entropy computed from the DT-CWT coefficients, is used as the activity measure for selecting regions from the input images. The fusion performance of this region based method is found to be comparable to that of the pixel-level fusion method using DT-CWT. Piella (2002) presented a standard region based multiresolution scheme that outperforms other multiresolution fusion techniques. However, the implementation was at a preliminary stage and the results still needed to be optimized.
In image processing applications, it is necessary to find a suitable representation for the image data. Such representations are generally based on linear transformations such as the Fourier, Cosine, Haar and wavelet transforms. Each of these transforms has some advantages over the others. If a linear transformation is estimated from the data to be processed itself, it can adapt to any type of data (Jutten and Herault, 1991). Independent Component Analysis (ICA) is such a fundamental linear transformation, in which the bases are determined from the data itself, and it is therefore used for image representation in the proposed work. ICA bases are very effective for image fusion and can outperform wavelet based schemes (Mitianoudis and Stathaki, 2007). In the recent past, several methods have been implemented in which ICA is successfully applied for image fusion in the transform domain. Mitianoudis and Stathaki (2007) used bases trained using ICA as a tool for fusion and applied this algorithm to fuse visual and IR images. In the fused images produced by this method, the important objects from the infrared images were blurred. Cvejic et al. (2007) proposed a region based method using ICA bases in which the entropy of regions is used as a priority measure. The authors used an adaptive approach for reconstructing the fused image using the Piella quality index (Piella and Heijmans, 2003). Though this approach improved performance, it adds a high computational overhead and consumes more time.
Region based image fusion involves a feature extraction and selection process. This requires a suitable classifier that uses multiple features of the input images to identify the most important regions (Lu and Weng, 2007). The Support Vector Machine (SVM) is a classifier that has been employed successfully in various applications and is found to outperform conventional classifiers (Li et al., 2004). The SVM, a supervised classification method, can handle large input data and feature sets (Luo et al., 2012) and is hence suitable for image processing applications. A pixel based method for fusing multifocus images using the Discrete Wavelet Frame Transform (DWFT) and SVM was proposed by Li et al. (2004), in which a trained SVM classifier chooses the pixels from the source images that have the highest activity at each coefficient location. Chen et al. (2008) presented Empirical Mode Decomposition (EMD) with SVM for merging multifocus images. Fusion results of these two methods demonstrate that the use of an SVM classifier improved performance over plain DWFT and EMD fusion.
The related work suggests that combining ICA with SVM is well suited to fusing multiple images. In this study, the combination of ICA and SVM is used for multimodal image fusion in order to improve ICA based fusion. The method employs bases trained using Independent Component Analysis and a Support Vector Machine classifier to choose regions with significant features from the segmented source images. The algorithm takes into account multiple region-based measurements in the spatial and transform domains to decide the significance of the various regions in the input images. The decision to choose or discard a region, based on the region features, is made by a trained SVM. The region with the highest priority is always chosen and the corresponding ICA coefficients form the fused ICA representation. The inverse transformation is applied to this fused ICA representation to obtain the fused image.
The rest of this study is organized as follows: In Section 1.1 the basics of ICA and SVM are discussed. In Section 2 the proposed fusion method is described in detail. The fusion results are presented in Section 3 and Section 4 concludes the paper.

Background
The proposed study employs Independent Component Analysis as the linear transformation tool for image analysis. Segmentation is performed to divide the source images into meaningful regions and a Support Vector Machine classifier is used to identify the most important regions based on the region properties. The algorithm then combines the ICA coefficients of the selected regions to obtain the fused representation. In this section, the theoretical background of the proposed study is discussed.

Image Analysis using ICA
In image analysis, an image is usually represented as a combination of basis images drawn from standard bases such as wavelet and cosine. In ICA, by contrast, the bases are estimated from the input data itself. Generating these bases requires training images whose statistical properties are similar to those of the input images to be combined. For an $M_1 \times N_2$ image $f(x,y)$, an image patch $f_w(u,v)$ is defined by Equation 1 for all $u,v \in [0, N-1]$, where $w(u,v)$ is an $N \times N$ window centered at the pixel $(u_0, v_0)$. Each $N \times N$ image patch $f_w(u,v)$ is expressed as a linear combination of $P$ basis image patches $b_j(u,v)$ weighted by scalar values $s_j$:
$$f_w(u,v) = \sum_{j=1}^{P} s_j\, b_j(u,v) \qquad (2)$$
Several $N \times N$ training patches are chosen randomly from the training set using a rectangular window and each patch is converted into a vector $f_w(t)$. The resulting vector is represented in terms of the basis vectors $b_j$ as follows:
$$f_w(t) = \sum_{j=1}^{P} s_j(t)\, b_j \qquad (3)$$
where $t$ is the patch index and $P$ is the number of basis image patches.
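The patch vectorisation step described above can be illustrated with a minimal sketch. It assumes a rectangular (all-ones) window and row-by-row lexicographic ordering; the function name `extract_patches` and the toy 4×4 image are hypothetical and not taken from the paper.

```python
import numpy as np

def extract_patches(image, n):
    """Return every n x n patch of `image` as one row vector per patch."""
    rows, cols = image.shape
    patches = []
    for r in range(rows - n + 1):
        for c in range(cols - n + 1):
            # Lexicographic ordering: stack the patch rows one after another.
            patches.append(image[r:r + n, c:c + n].reshape(-1))
    return np.array(patches, dtype=float)

# Toy 4 x 4 image (hypothetical data) split into all 2 x 2 patches.
image = np.arange(16, dtype=float).reshape(4, 4)
patches = extract_patches(image, 2)
```

For training, a random subset of these rows would be drawn rather than all of them.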
Equation (2) can be written in matrix form as Equations 4 and 5:
$$f_w(t) = B\,s(t) \qquad (4)$$
$$s(t) = B^{-1} f_w(t) = A\,f_w(t) \qquad (5)$$
where $s(t) = [s_1(t), s_2(t), \ldots, s_P(t)]^T$, $B = [b_1, b_2, \ldots, b_P]$ is the synthesis kernel and $A = B^{-1}$ is the analysis kernel. The goal is to determine a set of $L$ basis vectors that are statistically independent. To first obtain $L < N^2$ uncorrelated basis vectors, Principal Component Analysis (PCA) is used. This also reduces the dimensionality (Hyvarinen, 1999a) and hence the computational cost. The correlation matrix of the input data is given by Equation 6:
$$C_d = \varepsilon\{f_w f_w^T\} \qquad (6)$$
where $\varepsilon$ is the expectation operator. PCA is the eigendecomposition of the matrix $C_d$. After applying PCA, the $L$ eigenvectors with the $L$ largest eigenvalues are selected. Let $V_R$ be the reduced PCA matrix of dimension $L \times N^2$; the transformed patches are then given by Equation 7.
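As a rough illustration of this PCA step, the sketch below estimates the correlation matrix from vectorised patches, keeps the eigenvectors with the largest eigenvalues and projects the patches onto them. Subtracting the mean patch and using random data are assumptions made for the example; the name `pca_reduce` is hypothetical, and any whitening that the paper may fold into $V_R$ is not shown.

```python
import numpy as np

def pca_reduce(patches, n_keep):
    """patches: (T, N*N) array with one vectorised patch per row."""
    centered = patches - patches.mean(axis=0)   # assumption: remove the mean patch
    # Sample estimate of the correlation matrix C_d = E{f f^T}.
    c_d = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(c_d)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_keep]   # indices of the n_keep largest
    v_r = eigvecs[:, order].T                    # reduced PCA matrix, shape (L, N*N)
    return v_r, centered @ v_r.T                 # transformed patches, shape (T, L)

rng = np.random.default_rng(0)
patches = rng.normal(size=(500, 64))             # stand-in for 8 x 8 training patches
v_r, transformed = pca_reduce(patches, 32)
```

With 8×8 patches and 32 retained components this matches the sizes reported later in the experiments, but the data here is synthetic.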

Following the pre-processing step using PCA, the basis vectors $b_j$ that are statistically independent are selected by optimizing the negentropy cost function. This optimization leads to the FastICA algorithm proposed by Hyvarinen and Oja (2000), whose update rule is given by Equations 8 and 9, where $a_i$ are the projection vectors and $G(\cdot)$ is any non-quadratic function. In this study, $G(x)$ is chosen as given by Equation 10, where $\beta$ is a constant with $1 \le \beta \le 2$ (Hyvarinen, 1999b).
The ICA update rule given in Equation (8) is iterated in a chosen neighborhood until $a_i$ converges. The selected random patches $f_k(t)$, and likewise the input image patches of the source images, are then converted to the ICA domain representation $s_k(t)$ as given by Equation 11:
$$s_k(t) = A\,f_k(t) \qquad (11)$$
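For readers unfamiliar with FastICA, the following is a minimal sketch of the standard symmetric fixed-point iteration on whitened data. It assumes the log-cosh contrast $G(x) = \frac{1}{\beta}\log\cosh(\beta x)$, whose derivative is $\tanh(\beta x)$, and it omits any neighborhood structure; whether this matches the paper's Equations 8 to 10 exactly cannot be confirmed from the text. The names `sym_orth` and `fast_ica` and the two synthetic sources are hypothetical.

```python
import numpy as np

def sym_orth(w):
    """Symmetric orthogonalisation: W <- (W W^T)^(-1/2) W."""
    vals, vecs = np.linalg.eigh(w @ w.T)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T @ w

def fast_ica(z, n_iter=200, beta=1.0, seed=0):
    """z: whitened data of shape (L, T). Returns projection vectors a_i as rows."""
    rng = np.random.default_rng(seed)
    dim, n_samples = z.shape
    w = sym_orth(rng.normal(size=(dim, dim)))
    for _ in range(n_iter):
        y = w @ z
        g = np.tanh(beta * y)                 # derivative of (1/beta) log cosh(beta y)
        g_prime = beta * (1.0 - g ** 2)       # second derivative of the contrast
        # Fixed-point step for each row: E{z g(a^T z)} - E{g'(a^T z)} a
        w_new = g @ z.T / n_samples - np.diag(g_prime.mean(axis=1)) @ w
        w = sym_orth(w_new)
    return w

# Synthetic demo: mix two non-Gaussian sources, whiten, then unmix.
rng = np.random.default_rng(1)
sources = np.vstack([rng.uniform(-1, 1, 2000), rng.laplace(size=2000)])
mixed = np.array([[1.0, 0.5], [0.3, 1.0]]) @ sources
mixed -= mixed.mean(axis=1, keepdims=True)
vals, vecs = np.linalg.eigh(np.cov(mixed))
z = np.diag(vals ** -0.5) @ vecs.T @ mixed    # whitening
a = fast_ica(z)
```

The symmetric orthogonalisation keeps the rows of `a` orthonormal at every iteration, which is why whitening is applied first.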

Support Vector Machines
An SVM is a classification method that discriminates between two classes by fitting an Optimal Separating Hyperplane (OSH) that maximises the margin between the two classes of training data (Gunn, 1998). With $\{x_j\}$ as a set of $n$ training samples in an $N$-dimensional space and $y_j \in \{-1, 1\}$ as the corresponding outputs, the training data is given by Equation 12. The SVM maps the training patterns from the input space to a feature space and constructs a hyperplane that separates the classes with maximum margin. This is a quadratic programming problem (Gunn, 1998) that involves maximizing Equation 13, where $\kappa(x_i, x_j)$ is the SVM kernel function that performs the mapping and $\alpha_i$ are the Lagrange multipliers (the training points with non-zero $\alpha_i$ are the support vectors), subject to $0 \le \alpha_i \le C$ for $i = 1, 2, \ldots, n$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$. $C$ is a regularization parameter defined by the user: the higher the value assigned to $C$, the higher the penalty assigned to margin errors.
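For reference, the standard soft-margin SVM dual problem, which is presumably what Equations 12 and 13 express, can be written as below. This is the textbook form found in tutorials such as Gunn (1998), given here as an assumption and not as a verbatim copy of the paper's equations.

```latex
% Training data (cf. Equation 12)
D = \{(x_1, y_1), \ldots, (x_n, y_n)\}, \qquad x_j \in \mathbb{R}^{N},\; y_j \in \{-1, 1\}

% Dual objective to be maximised (cf. Equation 13)
W(\alpha) = \sum_{i=1}^{n} \alpha_i
          - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
            \alpha_i \alpha_j\, y_i y_j\, \kappa(x_i, x_j)

% Constraints
0 \le \alpha_i \le C \;\; (i = 1, \ldots, n), \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```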
One of the popular kernels used in SVM is the Radial Basis Function (RBF) kernel, which has a parameter known as the Gaussian width, $\sigma$. In this study, the Gaussian Radial Basis Function (GRBF) kernel is used and is given by
$$\kappa(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
where $x_i$ and $x_j$ denote the training patterns. Interested readers may refer to Gunn (1998) for more details.
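A small numerical sketch of a Gaussian RBF kernel matrix follows. It assumes the common parameterisation $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$; the function name `grbf_kernel` and the two sample points are hypothetical.

```python
import numpy as np

def grbf_kernel(x, sigma):
    """Kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for rows of x."""
    sq_norms = (x ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    np.maximum(sq_dist, 0.0, out=sq_dist)     # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Two 2-D feature vectors at Euclidean distance 5, Gaussian width 5.
x = np.array([[0.0, 0.0], [3.0, 4.0]])
k = grbf_kernel(x, sigma=5.0)
```

A larger $\sigma$ makes distant training patterns look more similar, so the decision boundary becomes smoother.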
To perform classification, the input data is divided into two sets: training and test. From the input data, several features or attributes are computed, which form the input variables. Each instance in the training set consists of a target value (the class label) and the observed input variables. In the training phase, the SVM is trained using the training data generated. In the testing phase, the trained SVM determines the target values of the test data based on its attributes.

Region Based Fusion
Region based fusion methods employ segmentation in order to divide an image into several disjoint regions (O'Callaghan and Bull, 2005). The input images to be combined are segmented using a suitable method into regions of different sizes. For each region one or more important properties/features can be computed. These features are used in deciding whether a region in the fused representation should be from the infrared image or visible image.
The segmentation process employed should produce the optimum number of regions with all salient objects included, since a larger number of regions increases the time taken to generate the fused image. For fusion, it is better to use joint segmentation (Lewis et al., 2007), as the joint segmentation map produces the same regions in each source image.

PROPOSED FUSION METHOD
Consider $M$ registered source images $f_k(x,y)$, $k = 1, \ldots, M$, each of size $M_1 \times N_2$, to be combined. Divide the given source images into all possible patches of size $N \times N$ and convert each patch into a vector $f_k(t)$ using lexicographic ordering. Transform these representations to the ICA domain to obtain $s_k(t)$. The method employs segmentation in the spatial domain to divide the source images into non-overlapping regions. For each region, various features are extracted and an SVM classifier is trained to select the regions from the visual or IR input images based on these features. Figure 1 shows the block schematic of the proposed image fusion method.

Training ICA Bases
From the set of training images which contains visual and IR images, 10,000 rectangular patches are randomly taken. These patches are transformed to the ICA domain, using the estimated analysis kernel A as given in (11). This training process is required to be done only once. After training the desired bases, the estimated transform is used for fusion of images of similar type.

Segmentation
In this study, joint segmentation of the source images is performed using the combined morphological-spectral unsupervised image segmentation algorithm proposed by O'Callaghan and Bull (2005). The first step of this spatial domain algorithm uses textured and non-textured regions of the input image to perform the initial segmentation. The texture features are generated from the detail sub-bands of the Dual-Tree Complex Wavelet Transform (DT-CWT). A Gaussian gradient function is then applied to all scales and combined with the intensity gradient information to form the final gradient with all perceptual edges in the images. This is followed by a watershed transformation of the perceptual gradient. The second stage uses a clustering algorithm to group the primitive regions provided by the watershed algorithm (O'Callaghan and Bull, 2005).
Once the joint segmentation map is obtained, it is applied to the visual and IR source images separately to divide them into different segments, so that the individual segments can be compared independently. In order to separate the various segments, all the regions formed by the segmentation process in each source image are labeled.
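Applying one joint label map to both source images can be sketched as follows. The label map here is a hand-made toy array standing in for the output of the segmentation algorithm, which is not reproduced; `split_regions` is a hypothetical helper name.

```python
import numpy as np

def split_regions(image, label_map):
    """Return {label: 1-D array with the pixel values of that region}."""
    return {int(lab): image[label_map == lab] for lab in np.unique(label_map)}

# Toy joint segmentation map with two labelled regions (hypothetical data).
label_map = np.array([[1, 1, 2],
                      [1, 2, 2]])
visual = np.array([[10, 20, 30],
                   [40, 50, 60]], dtype=float)
infrared = np.array([[5, 5, 90],
                     [5, 90, 90]], dtype=float)

visual_regions = split_regions(visual, label_map)
ir_regions = split_regions(infrared, label_map)
```

Because both images share the same map, region 1 in the visual image covers exactly the same pixels as region 1 in the IR image, which is what allows a region-by-region comparison.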

Feature Extraction and SVM Training
The segmentation process is followed by feature extraction, in which various region features are computed in order to determine the importance of each region in the source images. In this study, features extracted in the DT-CWT domain, the ICA transform domain and the spatial domain are used in order to improve the performance of the fusion process. Using a larger number of features is of great significance, and seven features that proved to be good measures of importance are used in this method. The average energy, entropy and standard deviation of a region are generally good measures in deciding its importance. In the first stage of the segmentation process, the DT-CWT is used to extract the various texture features. The DT-CWT coefficients $d_{k(\theta,1)}(m,n)$ generated in segmentation are reused to compute these three region features.
There are different activity measures that can be computed from the ICA coefficients to define the significance of a region. Some such activity measures for fusion were proposed by Piella (2002). The ideas proposed by the author are modified in this study to extract region features from ICA coefficients. The following two features are computed for each region from the ICA coefficients.
• Mean absolute value: the mean absolute value ($L_1$-norm) of the ICA coefficients of each region, arranged as a vector, is given by Equation 18, where $N$ is the total number of segmented regions.

• Variance: the variance of the ICA coefficients of each region is given by Equation 19, where $C_{ik}(j)$ are the ICA coefficients of the $i$-th region in image $k$ arranged as a vector.

Large values of $M(S)$ and $\mathrm{Var}(S_{ik})$ correspond to increased activity in that region.
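The two ICA-domain region features can be sketched as below, assuming they are the plain sample mean of the absolute coefficients and the population variance of the coefficients; the exact normalisation in Equations 18 and 19 is not recoverable from the text. The function names and the four sample coefficients are hypothetical.

```python
import numpy as np

def mean_absolute_value(coeffs):
    """Mean of |C_ik(j)| over all ICA coefficients of one region."""
    return float(np.mean(np.abs(coeffs)))

def coefficient_variance(coeffs):
    """Variance of the ICA coefficients of one region (population form)."""
    return float(np.var(coeffs))

# ICA coefficients of one region arranged as a vector (hypothetical values).
c_ik = np.array([1.0, -1.0, 3.0, -3.0])
m_value = mean_absolute_value(c_ik)
v_value = coefficient_variance(c_ik)
```

Both numbers grow when a region contains strong edges or texture, since such structure produces large-magnitude coefficients.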
Region contrast and sharpness are two image features of great visual importance. Region sharpness is a measure of the edge information a region can convey. Regions with a high contrast with respect to their surroundings correspond to increased activity (Lu and Weng, 2007). Hence, region contrast and region sharpness are calculated for all the regions of the input images in the spatial domain, where $S_{ik}(p,q)$ is the $(p,q)$-th entry in the $i$-th region of image $k$ in the spatial domain and $G_x$ and $G_y$ are the gradients along the $x$ and $y$ directions for the $i$-th region.
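Since the defining equations for these two spatial features are not available in the text, the sketch below uses assumed forms: sharpness as the mean gradient magnitude $\sqrt{G_x^2 + G_y^2}$ inside the region, and contrast as a normalised difference between the mean intensity of the region and that of its surroundings. Both definitions and all names are illustrative assumptions, not the paper's formulas.

```python
import numpy as np

def region_sharpness(image, mask):
    """Mean gradient magnitude inside the region selected by the boolean mask."""
    g_y, g_x = np.gradient(image.astype(float))   # gradients along rows (y) and columns (x)
    return float(np.sqrt(g_x ** 2 + g_y ** 2)[mask].mean())

def region_contrast(image, mask, eps=1e-12):
    """Assumed contrast: |mean inside - mean outside| / (mean inside + mean outside)."""
    inside = image[mask].mean()
    outside = image[~mask].mean()                 # requires at least one outside pixel
    return float(abs(inside - outside) / (inside + outside + eps))

# Toy image: horizontal ramp 0..3; the region is the right half.
image = np.tile(np.arange(4, dtype=float), (4, 1))
mask = np.zeros((4, 4), dtype=bool)
mask[:, 2:] = True

sharpness = region_sharpness(image, mask)
contrast = region_contrast(image, mask)
```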
Thus seven important features computed in the spatial and transform domains are used in this algorithm to quantify the importance of regions for making decisions, and additional features can easily be incorporated. To train the SVM, a set of visual and IR training images is required whose content and statistical properties are similar to those of the inputs to be fused. The training images are segmented into regions using the algorithm described in the Segmentation section. Some regions of different sizes are randomly selected from the segmented images for training and the seven features given above are computed. These seven features are the inputs to the SVM. The SVM is trained to determine whether a region from the visual or the IR image should be included in the fused image. The target output is positive (+1) if a region in the visual image has at least four of the seven feature values greater than the corresponding IR image feature values and negative (-1) otherwise (supervised learning). The SVM is thus trained to select important regions from the visual or IR images using the extracted features.
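The labelling rule for the SVM training targets can be written as a short sketch. It reads "any four" as "at least four of the seven features"; the function name `region_label` and the feature values are hypothetical.

```python
import numpy as np

def region_label(visual_features, ir_features, min_wins=4):
    """Return +1 if the visual region wins on at least `min_wins` features, else -1."""
    wins = int(np.sum(np.asarray(visual_features) > np.asarray(ir_features)))
    return 1 if wins >= min_wins else -1

# Seven feature values per region (hypothetical numbers).
visual = [0.9, 0.8, 0.7, 0.6, 0.1, 0.1, 0.1]
infrared = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
label = region_label(visual, infrared)            # visual wins on four features
label_ir = region_label(infrared, visual)         # visual-side vector wins on three only
```

With seven features, a four-feature threshold amounts to a simple majority vote between the two modalities.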

Fusion Process
The SVM training is followed by the testing phase, in which fusion is performed based on the SVM output. If the output is positive, the ICA coefficients for the corresponding region are taken from the visual image; otherwise they are taken from the IR image. The SVM always picks the regions with the most important features, and the coefficients corresponding to the selected regions form the fused ICA domain representation. The synthesis kernel B is used to generate the fused image from the fused ICA coefficients.
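The selection step can be sketched as follows, under the assumption that every patch has already been assigned to one region (for example via its centre pixel, which is an assumption of this example and not stated in the paper). The name `fuse_coefficients`, the identity matrix standing in for the synthesis kernel B, and the toy coefficient arrays are all hypothetical.

```python
import numpy as np

def fuse_coefficients(s_vis, s_ir, patch_region, decisions):
    """Pick ICA coefficients per patch according to the classifier output of its region.

    s_vis, s_ir  : (T, L) ICA coefficients of the visual and IR patches
    patch_region : (T,) region label of each patch
    decisions    : {region label: +1 (take visual) or -1 (take IR)}
    """
    take_visual = np.array([decisions[int(r)] == 1 for r in patch_region])
    return np.where(take_visual[:, None], s_vis, s_ir)

s_vis = np.ones((4, 3))                  # toy visual coefficients
s_ir = np.zeros((4, 3))                  # toy IR coefficients
patch_region = np.array([1, 1, 2, 2])
decisions = {1: 1, 2: -1}                # region 1 from visual, region 2 from IR

s_fused = fuse_coefficients(s_vis, s_ir, patch_region, decisions)
b = np.eye(3)                            # stand-in for the synthesis kernel B
patches_fused = s_fused @ b.T            # f(t) = B s(t), one reconstructed patch per row
```

The reconstructed patch vectors would then be reshaped to N×N and placed back, with overlapping patches averaged, to form the fused image.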

RESULTS AND DISCUSSION
In this study, region based fusion of different sets of multimodal images was carried out with the ICA-SVM framework and the results are presented. Before the fusion process, the ICA bases were trained using 2000 image patches of size 8×8 taken from a set of eight visual and eight IR images of similar content from the image fusion site (www.imagefusion.org). PCA was then applied to the selected patches and the 32 most important bases were selected based on the largest eigenvalues.
The training images were segmented using the algorithm discussed in the Segmentation section and the joint segmentation maps were formed. For SVM training, 100 regions of various sizes were randomly selected from the visual and IR training images, with 50 from each. The seven features discussed in the previous section were computed for the selected regions in the visible and IR images and represented as a 7-dimensional feature vector. This feature vector was fed to an SVM for classification. The same image set used for training the ICA bases was also used to train the SVM. The training patterns for the SVM were only a small portion of the full image data.
Experiments in this study were carried out in MATLAB on a Core i3-370M CPU with 4 GB RAM. The Bioinformatics toolbox in MATLAB, with the GRBF kernel, was used for the SVM. The regularization parameter C was set to 2000, as the performance of the SVM classifier is found to be stable for large values of C (Li et al., 2004). The value of σ for the GRBF kernel was chosen as the scatter radius of the training samples. The ICA bases and the SVM were trained offline only once using randomly selected samples and were thereafter used unchanged for fusion.
The multimodal images used in this study are: surveillance UN camp images from the image fusion site (www.imagefusion.org), OTCBVS images available in the site (www.cse.ohiostate.edu/otcbvs-bench) and another pair of visible and IR surveillance images.
The results obtained by the proposed method for the source image sets were compared with fusion algorithms such as standard ICA (Mitianoudis and Stathaki, 2007), Laplacian Pyramid (LAP) (Burt and Adelson, 1983), Discrete Wavelet Transform (Li et al., 1995), DT-CWT (Lewis et al., 2007) and Region based ICA (Cvejic et al., 2007) in order to demonstrate the improvement in performance. In standard ICA, the source images were combined by the fusion scheme proposed by Mitianoudis and Stathaki (2007). The LAP and DWT based schemes have been demonstrated to perform well and were hence included in the comparison. For LAP, DWT and DT-CWT, three levels of decomposition were used and the transform coefficients with the maximum absolute value were selected to form the fused representation. In Region based ICA, fusion was performed following the algorithm given by Cvejic et al. (2007), using entropy as the priority measure for selecting regions.
The effectiveness of the proposed method in multimodal fusion is tested using the standard objective fusion metrics Entropy, Standard Deviation (SD), Mutual Information (MI), the Petrovic metric ($Q^{AB/F}$) (Xydeas and Petrovic, 2000) and the Piella metric ($Q_{piella}$) (Piella and Heijmans, 2003). Mutual Information as a fusion metric measures the degree of dependence between the fused image and the input images. The Petrovic metric $Q^{AB/F}$ evaluates the amount of edge detail in the fused image that has been transferred from the inputs. The quality measure $Q_{piella}$ estimates the relative amount of detail transferred from the source images to the fused image. The larger the values of these metrics, the better the performance.
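Two of these metrics, Entropy and Mutual Information, are simple to compute from grey-level histograms, as the sketch below shows. It assumes 8-bit images, 256 histogram bins, and the common convention that the fusion MI is the sum of the MI between the fused image and each source; the paper does not spell out its exact convention. Function names are hypothetical.

```python
import numpy as np

def entropy(image, bins=256):
    """Shannon entropy (bits) of the grey-level histogram of an 8-bit image."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(a, b, bins=256):
    """Mutual information (bits) between two images of equal size."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)      # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)      # marginal of b, shape (1, bins)
    nz = p_ab > 0
    return float((p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])).sum())

def fusion_mi(fused, src_a, src_b):
    """Assumed fusion MI: MI(fused, A) + MI(fused, B)."""
    return mutual_information(fused, src_a) + mutual_information(fused, src_b)
```

As a sanity check, an image whose pixels are half black and half white has an entropy of one bit, and its mutual information with itself equals that entropy.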
Once the ICA bases are generated and the SVM is trained, fusion of the input images can be performed. The ICA coefficients for the source images to be fused were obtained using Equation 11. Joint segmentation of the source images was performed in the spatial domain. Figure 2 shows the segmentation outputs obtained with the UN camp source images. For all regions in both the visible and IR images, the seven region features were computed. The trained SVM was then used to make intelligent decisions and fusion was performed based on the SVM output. Figure 3a and b show the UN camp visual and IR source images, and Fig. 3c-h show the fused images obtained using standard ICA, LAP, DWT, DT-CWT, Region based ICA and the proposed method. The person appears brighter in all methods tested except standard ICA (Mitianoudis and Stathaki, 2007). It is observed from the fused images that only the proposed method is able to transfer the fine details present in the visible image to the fused image. The trees and the full fence detail in the visible image are seen more clearly with the proposed scheme than with any other method tested. Visual comparison also shows that the fused image in Fig. 3h is superior to those of standard ICA (Mitianoudis and Stathaki, 2007), region based ICA (Cvejic et al., 2007) and the other multiresolution methods compared. The comparison results based on the objective fusion metrics for the UN camp image are given in Table 1. The proposed method scores the highest MI value of 7.1, while region based ICA scores 4.162. All the multiresolution methods included for comparison score low values for mutual information. This indicates that more useful detail from the input images is available in the fused image of the proposed method. The proposed method also scores higher than all other methods on the two important fusion metrics $Q^{AB/F}$ and $Q_{piella}$. Even a small difference of 0.01 in these two metrics is considered significant for quality rating (Wan et al., 2009).
Figure 4a and b show a visible image and an IR image of the same scene, and Fig. 4c-h show the fused images of the methods tested. Visual inspection of the fused images shows that only DT-CWT and the two region based methods are able to transfer the target (the gun) information from the input IR image to the fused images. However, DT-CWT based fusion (Lewis et al., 2007) resulted in a fused image with a blurred background. The fused image of the ICA-SVM method in Fig. 4h has taken the entire background information from the visual image and only the object (the gun) information from the IR image. The fusion performance results for the source images shown in Fig. 4a and b are presented in Table 2. The proposed method scores slightly higher than the region based ICA proposed by Cvejic et al. (2007) in terms of all the fusion metrics and significantly outperforms the individual multiresolution methods in terms of Entropy, MI, $Q^{AB/F}$ and $Q_{piella}$.
Table 2. Performance metrics of the multimodal images in Fig. 4a and b
Figure 5a and b show the OTCBVS multimodal source images and the fused images are presented in Fig. 5c-h. The fused image obtained by the proposed method clearly contains more detail than those of the other methods tested. The fused image obtained by the proposed ICA-SVM method in Fig. 5h has taken most of the significant information from the source images and scores higher than the tested methods in terms of the performance metrics. The comparison results for the OTCBVS images are listed in Table 3. Comparing the fusion evaluation metrics, the proposed method yields a fused image with an MI value of 7.93, a much higher score than any other method tested. Experimental results show that the proposed region-based ICA-SVM scheme is able to extract most of the important information in the input images.

The proposed algorithm performs better than the ICA based fusion algorithms for the tested multimodal images, with higher scores in the objective performance metrics. It also performs much better than the various multiresolution methods compared. Visual comparison of the fused images shows that the proposed method is superior to standard ICA (Mitianoudis and Stathaki, 2007), the region based ICA method (Cvejic et al., 2007) and the other multiresolution transform based fusion methods. The subjective quality of the fused images is improved significantly by the use of intelligent decision making using SVM.
Experiments were also carried out to show how the fusion metrics $Q^{AB/F}$ and $Q_{piella}$ vary with the number of training patches. The number of training patches was varied from 100 to 10000 and the two metrics were determined for each setting. The performance plots of the three ICA based methods, namely standard ICA, Region based ICA and Region based ICA with SVM, are presented in Fig. 6 for the UN camp images and in Fig. 7 for the OTCBVS images. The plots of the Petrovic metric ($Q^{AB/F}$) and the Piella metric ($Q_{piella}$) versus the number of training patches are shown in Fig. 6(a) and (b) for the UN camp image and in Fig. 7(a) and (b) for the OTCBVS images. The Petrovic and Piella fusion metrics are found to increase with the number of training patches. Similar behaviour is observed for the multimodal source images shown in Fig. 4(a) and (b). This suggests that the quality of the fused images could be improved further by increasing the number of ICA training patches.

CONCLUSION
In this study, a novel image fusion framework using Independent Component Analysis and Support Vector Machines is introduced and its performance in multimodal image fusion is investigated. This region based approach employs segmentation in the spatial domain and uses several statistical properties of regions, such as energy, entropy and contrast, to let the SVM make intelligent decisions. Based on the SVM outputs, regions are selected from the visible or IR images to form the fused representation. Experiments were conducted with a number of multimodal images, and the fused images obtained have the best visual quality among the methods tested. The results show that the proposed multimodal scheme outperforms standard ICA, the individual multiresolution methods and region based ICA fusion in terms of the various fusion metrics. The fused images obtained by the proposed ICA-SVM method contain more useful information than the original multimodal images, and hence subsequent processing tasks are expected to produce accurate results. The proposed method may appear to increase the computational load, since the ICA bases have to be estimated and the SVM has to be trained. However, these two steps are carried out only once; they can be performed offline using the same training set and then reused for fusing any set of multimodal images of similar type. Hence, training is a one-time overhead that is not incurred each time images are fused. The proposed method can be extended further by applying it to different image types such as medical, remote sensing and multifocus images, with more region features.