Local Features Supported by the Complement Feature for Image Segmentation

Email: salahameer@alumni.uwaterloo.ca

Abstract: An Eigen formulation is proposed for image thresholding/segmentation. Each pixel is represented by a vector of local features: the normalized intensity of the pixel and those of its neighboring pixels. A “complement” component is appended to this vector to produce a “unit” vector. The autocorrelation matrix is then computed from the unit vectors of all pixels in the image. The first components (corresponding to the intensity of the current pixel) of all Eigen vectors obtained from the autocorrelation matrix are used as multi-level thresholds. A similar procedure can be adopted using powers of the current pixel intensity value. In general, more than one threshold can be obtained. Results on a wide range of images are presented to show the effectiveness of the proposed schemes.


Introduction
Multi-level image thresholding is an essential part of many image segmentation schemes used in computer vision tasks. The ultimate goal is to partition the image so as to obtain useful descriptions of the objects comprising the scene. To achieve this goal, many algorithms have been (and are still being) developed. Details regarding the categorization of these algorithms and the feature spaces used can be found in traditional survey papers such as (Sezgin and Sankur, 2004). In fact, the field is so vast and diverse that there are survey papers on single subcategories, e.g., (Ilea and Whelan, 2011; Peng et al., 2013; Unnikrishnan and Hebert, 2005).
In this study, the image segmentation problem is treated as a multi-level thresholding task, and a simple but effective Eigen structure is proposed as a solution. The schemes build on a recent work (Ameer, 2020).

Method
Without loss of generality, the intensities of the original image are normalized to the interval [0, 1] (or [-1, 1]) and concatenated to produce a column vector of size N × 1, where N is the number of pixels in the image. Each pixel can be represented by any subset of its 8-neighborhood. For ease of notation, the description is given for a single neighbor; it generalizes directly to more neighbors. Each element is then extended to a "unit" vector given by Equation (1), where w_i is a weighting vector that sums to one and k = 1, …, N. In this study, uniform, power-dependent and variance weightings have been implemented, with only minor differences in their performance. The author is not confident that other weighting schemes can provide better performance.
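Because Equation (1) is not reproduced in this text, the sketch below uses one assumed form that matches the description: each feature x_i is scaled by sqrt(w_i), and the complement sqrt(1 - Σ w_i x_i²) is appended so that every row has unit norm. The function names `normalize` and `unit_features`, the edge-replicated borders and the (row, col) offset convention are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def normalize(img, signed=False):
    """Min-max normalize intensities to [0, 1], or to [-1, 1] when signed=True."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    x = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    return 2.0 * x - 1.0 if signed else x

def unit_features(x, offsets=((0, 0), (0, 1)), weights=None):
    """ASSUMED form of Equation (1): one unit-norm row per pixel.

    Column i holds sqrt(w_i) * intensity at offsets[i] = (row, col) relative to
    the pixel (offsets must stay inside the 3 x 3 window); the last column is
    the complement sqrt(1 - sum_i w_i x_i^2).
    """
    x = np.asarray(x, dtype=float)
    m = len(offsets)
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                       # weights sum to one
    padded = np.pad(x, 1, mode="edge")    # replicate borders for 3x3 neighbors
    rows, cols = x.shape
    feats = [padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols].ravel()
             for dr, dc in offsets]
    F = np.stack(feats, axis=1) * np.sqrt(w)          # N x m weighted features
    comp = np.sqrt(np.clip(1.0 - (F ** 2).sum(axis=1), 0.0, None))
    return np.column_stack([F, comp])                 # N x (m + 1), unit rows
```

With the nine offsets of a 3 × 3 neighborhood, each row has n = 10 entries, matching the count given in the text; the construction stays unit-norm for both the [0, 1] and [-1, 1] normalizations because every x_i² ≤ 1.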
G is now an N × n matrix, where n is the size of the vector used in Equation (1); n = 10 for a 3 × 3 neighborhood (nine intensities plus the complement component). An autocorrelation matrix A_G of size n × n is then constructed from G as:

$$A_G = G^{\mathsf{T}} G$$

and the Eigen problem

$$A_G V = \lambda V$$

is solved. The Eigen vectors of A_G represent the axes of inertia of the data set. The Eigen vector Vmax corresponding to the largest Eigen value λmax points in the direction of maximum inertia (Ameer, 2020). Effectively, there can be many schemes depending on the number of neighbors included in forming Equation (1) or Equation (4).
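The autocorrelation and Eigen steps map directly onto NumPy. This sketch assumes A_G = Gᵀ G (a 1/N scaling would leave the Eigen vectors unchanged) and uses `numpy.linalg.eigh`, which suits A_G because it is symmetric; the helper name `eigen_axes` is hypothetical.

```python
import numpy as np

def eigen_axes(G):
    """Eigen-decompose the n x n autocorrelation matrix A_G = G^T G.

    Returns eigenvalues in descending order and the matching unit Eigen
    vectors as columns, so column 0 is Vmax (direction of maximum inertia).
    """
    G = np.asarray(G, dtype=float)
    A = G.T @ G                        # n x n, symmetric positive semi-definite
    vals, vecs = np.linalg.eigh(A)     # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]     # reorder to descending
    return vals[order], vecs[:, order]
```

Sorting in descending order places Vmax in column 0, which is the only vector needed for the single-threshold (thresholding) case.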
Another alternative is to use powers (not necessarily integers) of the pixel intensity and/or those of its neighbors. One generalization of Equation (4) is Equation (5), where k is the pixel index and s(·) is the sign function. A similar argument can be used to generalize Equation (1).
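A minimal sketch of power-based features for single pixel values, assuming a form analogous to the unit-vector idea of Equation (1) (Equation (5) itself is not reproduced here): each column is sqrt(w_i)·s(x_k)·|x_k|^(p_i), followed by a complement column. The default powers, the uniform weights and the name `signed_power_features` are illustrative assumptions; positive powers are assumed so that |x|^p stays within [0, 1].

```python
import numpy as np

def signed_power_features(x, powers=(1.0, 0.5, 2.0), weights=None):
    """ASSUMED power-based feature rows (not the paper's Equation (5)).

    Column i is sqrt(w_i) * s(x_k) * |x_k|**p_i for pixel k, where s() is the
    sign function; a complement column makes every row a unit vector.
    Powers are assumed positive and x is assumed to lie in [-1, 1].
    """
    x = np.asarray(x, dtype=float).ravel()
    p = np.asarray(powers, dtype=float)
    m = p.size
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                                               # weights sum to one
    F = np.sign(x)[:, None] * np.abs(x)[:, None] ** p * np.sqrt(w)  # N x m
    comp = np.sqrt(np.clip(1.0 - (F ** 2).sum(axis=1), 0.0, None))
    return np.column_stack([F, comp])
```

Keeping the sign through s(·) preserves the distinction between negative and positive intensities under the [-1, 1] normalization, which plain fractional powers would lose.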
It should be pointed out that, in all the schemes described above, the resulting Eigen vectors should be normalized back using the same formulation used to build the features, e.g., Equation (1) or Equation (4).
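The threshold extraction and the "normalize back" step can be sketched as follows. This assumes the [0, 1] normalization and a sqrt(w_1) scaling of the current-pixel feature, so a threshold is recovered as |v_1| / sqrt(w_1), where v_1 is the first component of an Eigen vector; the absolute value removes the sign ambiguity of Eigen vectors, and values outside (0, 1) are discarded. Each region is then replaced by its mean, as in the results discussion. Both helper names are hypothetical.

```python
import numpy as np

def thresholds_from_eigvecs(vecs, w1):
    """ASSUMED back-mapping for [0, 1] data: threshold = |v_1| / sqrt(w1).

    `vecs` holds Eigen vectors as columns; row 0 is the current-pixel slot.
    Only values strictly inside (0, 1) are kept.
    """
    t = np.abs(np.asarray(vecs, dtype=float)[0, :]) / np.sqrt(w1)
    return np.unique(t[(t > 0.0) & (t < 1.0)])        # sorted, de-duplicated

def segment_by_thresholds(x, thresholds):
    """Label each pixel by its threshold interval; replace regions by their means."""
    x = np.asarray(x, dtype=float)
    labels = np.searchsorted(np.sort(thresholds), x, side="right")
    out = np.empty_like(x)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = x[mask].mean()                   # region mean as output level
    return out, labels
```

Using `searchsorted` with `side="right"` assigns a pixel equal to a threshold to the upper interval; the opposite convention would only move boundary pixels.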
The performance is assessed through the traditional Root Mean Square Error (RMSE), given by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{\lVert x \rVert} \sum_{i} (x_i - y_i)^2}$$

where x and y stand for the original and segmented images and ||x|| is the cardinality of the set (the number of pixels). Adjustment is needed when the images have different ranges. In addition, it is unfair to compare performance between images having different numbers of segments.
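A direct transcription of the RMSE formula, with ||x|| taken as the number of pixels; the name `rmse` is illustrative.

```python
import numpy as np

def rmse(x, y):
    """Root mean square error between original x and segmented y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    # mean over all pixels divides by ||x||, the number of pixels
    return float(np.sqrt(np.mean((x - y) ** 2)))
```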
Another evaluation measure is the SSIM, given by (Wang et al., 2004):

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where µ is the mean, σ² is the variance, σ_xy is the covariance between x and y, C_1 = 0.0001 and C_2 = 0.0009.

Experimental Results
Figure 1 shows the test images used. Figs. 2-7 illustrate the results obtained using neighboring intensity values in the same fashion as in Equation (1). On the other hand, Figs. 8-13 illustrate the results using neighbors in the same fashion as in Equation (4). Tables 1 and 2 list the values of RMSE and SSIM, respectively, for the images in Figs. 2-13.
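The SSIM values follow the formula given in the Method section. The sketch below evaluates that formula once over the whole image (a global, single-window version) with C_1 = 0.0001 and C_2 = 0.0009. Wang et al. (2004) compute SSIM over local windows and average the result, so values from this sketch may differ from windowed implementations; the use of population statistics (`ddof=0`) is also an assumption.

```python
import numpy as np

def ssim_global(x, y, C1=1e-4, C2=9e-4):
    """Single-window SSIM of x and y using the constants from the text."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                  # population variances (ddof=0)
    cxy = np.mean((x - mx) * (y - my))         # covariance sigma_xy
    num = (2 * mx * my + C1) * (2 * cxy + C2)
    den = (mx ** 2 + my ** 2 + C1) * (vx + vy + C2)
    return float(num / den)
```

Identical images give exactly 1; replacing a two-level image by its mean collapses the structure term and drives the value toward 0.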
Using more neighbors may result in better performance, at an increased computational cost. However, some saturation is inherent: performance may not improve once the number of neighbors goes beyond a certain value, although that value is image dependent. This result favors Equation (4) over Equation (1). Normalization to [-1, 1] can be slightly better than [0, 1]; however, more tests are needed, as this seems to be more image dependent.
As shown in Tables 1 and 2, the measures are encouraging for the proposed schemes. However, care should be taken when using RMSE and SSIM, as they are sensitive to scale: all the segmented images would have inferior values in these measures if they were rescaled to [0, 1] or [-1, 1] instead of keeping the computed results, i.e., the means of each region. In addition, a higher number of regions tends to give a higher SSIM (lower RMSE) without necessarily reflecting better performance. The scheme used in Fig. 13 was applied iteratively to each image until the number of resultant regions no longer changed, with each image normalized after each iteration. The output images are shown in Fig. 14. Figure 15 shows the result of using Equation (5) to segment the images in Fig. 1.
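The iterative procedure behind Fig. 14 can be outlined as a loop that re-segments and re-normalizes until the region count stops changing. In this sketch the segmentation and normalization steps are caller-supplied functions, the region count is approximated by the number of distinct gray levels (which holds when each region is replaced by its mean), and `max_iter` is an added safeguard; none of these names come from the paper.

```python
import numpy as np

def iterate_until_stable(img, segment, normalize, max_iter=50):
    """Repeat segment + normalize until the number of regions stops changing.

    `segment(x)` returns a segmented image and `normalize(x)` rescales it to
    the working range; both are hypothetical, caller-supplied callables.
    """
    x = normalize(img)
    prev = np.unique(x).size                  # distinct levels ~ region count
    for _ in range(max_iter):
        x = normalize(segment(x))             # normalize after each iteration
        count = np.unique(x).size
        if count == prev:                     # no change in region count: stop
            break
        prev = count
    return x
```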
To further illustrate the effectiveness of the proposed schemes, some images from the Berkeley Segmentation Database (BSD) (Martin et al., 2001) are segmented in Fig. 16 using the schemes used in Figs. 13 and 15, respectively. The resultant images are normalized after the segmentation process. It is easily noticed from these images that the [-1, 1] normalization produces a higher number of regions than [0, 1]. However, the latter performs better for thresholding tasks, while the former is advantageous for segmentation tasks.
The schemes are easily extended to any feature space using Equations (1), (4) and (5) or any combination of them.

Conclusion and Future Work
Simple algorithms have been proposed in this study to perform multi-level image segmentation. Thresholding is a special case where only Vmax is used. The proposed schemes are effective, as indicated by the values of RMSE and SSIM. However, better evaluation schemes are needed to distinguish the performances more reliably.
More elaboration is needed on the best aggregation of features used in selecting the thresholds. In particular, how many neighbors are required? Is it beneficial to append the mean or the median? Which powers are essential? Would some mixture of these be better?
The proposed algorithms can easily be extended to any feature space, e.g., color. However, the optimum selection and/or weighting of features still needs to be determined. The price is the extra computation required.
The component added to obtain a unit vector in Equation (1) can be generalized to any fuzzy complement. However, more work is needed to find the best formula and to determine whether improvements can be attained; see (Ameer, 2020) for some suggestions.
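As one concrete family to try, two standard fuzzy complements are the Yager complement c(a) = (1 - a^w)^(1/w) and the Sugeno complement c(a) = (1 - a)/(1 + λa) with λ > -1. For w = 2 the Yager complement is sqrt(1 - a²), which makes the pair (a, c(a)) a unit vector, so it connects naturally to the unit-vector idea; whether other members of these families improve results is the open question raised above. The function names and the parameter name `lam` are illustrative, and neither family is claimed to be the author's choice.

```python
import numpy as np

def yager_complement(a, w=2.0):
    """Yager fuzzy complement (1 - a**w)**(1/w), for a in [0, 1] and w > 0."""
    a = np.asarray(a, dtype=float)
    return (1.0 - a ** w) ** (1.0 / w)

def sugeno_complement(a, lam=0.0):
    """Sugeno fuzzy complement (1 - a) / (1 + lam * a), for lam > -1."""
    a = np.asarray(a, dtype=float)
    return (1.0 - a) / (1.0 + lam * a)
```

Both families are involutive (applying the complement twice returns the original value), and both reduce to the standard complement 1 - a for w = 1 and λ = 0, respectively.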

Ethics
This article is original and contains unpublished material. The corresponding author confirms that all of the other authors have read and approved the manuscript and that no ethical issues are involved.