The Use of Band Selection in Weighted Linear Prediction for Hyperspectral Image Classification



Introduction
Hyperspectral imaging is a rapidly developing area of remote sensing (Chang, 2003). It extends and enhances the capabilities of multispectral image analysis by using many contiguous spectral channels to reveal materials that usually cannot be resolved by multispectral sensors. With such high spectral resolution, many subtle objects and materials can now be uncovered and extracted by hyperspectral imaging sensors with very narrow diagnostic spectral bands for detection, discrimination, classification, identification, recognition, and quantification. Many of its applications are yet to be explored. It has been common sense to regard hyperspectral imaging as a natural extension of multispectral imaging with band expansion (Chang, 2007). We almost always face a mixed pixel problem, in which the measurement represents a response from a composite of different materials. For instance, a field of grass typically contains blades of several species mixed with various weeds, in addition to contributions from the underlying soil, which can be composed of different organic compounds and soil types (Chang, 2007).
As reported in (Chang, 2003), one of the great challenges for hyperspectral imaging is subpixel detection, which is not addressed by standard spatial-based image processing. After a target is detected, the next step is to classify the detected targets according to their spatial or spectral properties. The main contribution of the present work is a new band selection approach, called WLPBS, that selects informative bands for the classification task.
Hyperspectral band selection, which is used in a variety of data preprocessing algorithms, has received considerable attention, with many researchers continually proposing methods and ideas of their own. These techniques can generally be grouped into two types, namely supervised and unsupervised. Supervised methods require prior information, such as training samples and target signatures, as in (Kuo et al., 2014; Yang et al., 2011). In general, these techniques first define some criterion, such as the Jeffreys-Matusita (JM) distance (Yang et al., 2011), class divergence, or signature angle (Keshava, 2004). Then, the subset of bands that optimizes the criterion for the training samples is selected. However, because prior information is often unavailable in practice, supervised techniques are often not appropriate for hyperspectral band selection (Martinez-Uso et al., 2007). Hence, this paper focuses on unsupervised band selection methods.
Unsupervised band selection methods are usually built on data assessment techniques and attempt to choose the subset with maximum information (Chavez et al., 1982; Chang et al., 1999), maximum information divergence (Chang and Wang, 2006), or optimal values of other information measures (Jia et al., 2012; Mojaradi et al., 2008). However, these prioritization-based strategies do not take band correlation into consideration, so the bands they select are generally highly correlated. Other methods do consider band correlation, for instance through a new similarity measure. The Linear Prediction Error (LPE) criterion (Du and Yang, 2008) reduces dimensionality by choosing a subset of distinctive and informative spectral bands. However, most of these supervised and unsupervised techniques take only band information into account, and few works focus on the quality of the bands. In general, a high level of outliers in a band can produce large variances. Therefore, if band quality is ignored, outlying observations (pixels) in a band receive higher selection weights, which is likely to lead to unreasonable results. To overcome this limitation, the aim of this research is to assign smaller weights to possible outlier observations using a weighted least-squares function.
In this study, we propose an improvement based on the Weighted Linear Prediction (WLP) criterion that reduces the influence of outliers in the selection algorithm and minimizes the correlation between the selected bands. The proposed feature selection method is called Weighted Linear Prediction-based Band Selection (WLPBS).
To guarantee that the selected bands are both distinctive and informative, data preprocessing, including bad band pre-removal and data whitening, is required.
Experimental results show that WLPBS reduces the influence of outliers and minimizes the correlation between the selected bands. The results on AVIRIS datasets show that the proposed similarity-based band selection method achieves a clear performance improvement in terms of information conservation and class separability compared with the widely used technique of (Du and Yang, 2008) and with the state-of-the-art MIFS (Banit'ouagua et al., 2016) and MI-est (Guo et al., 2006) methods, which select complementary bands via MI evaluation.
The rest of this paper is organized as follows. Section 2 reviews some relevant works on the topic of unsupervised hyperspectral band selection. In section 3 we present the proposed approach. The experimental results are presented in section 4. Finally, we give some closing comments in section 5.

Related Works
In the relevant literature, hyperspectral bands are often characterized by sensitive spectral information. Researchers have examined various unsupervised criteria to identify these bands, such as spectral distance, band correlation, orthogonality, Linear Prediction (LP), Mutual Information (MI), and Kullback-Leibler Divergence (KLD). Among these criteria, spectral distance is probably the most fundamental and well known.
Some unsupervised techniques take the linear characteristics of bands into account and search for distinctive bands in the vector space. These procedures can be divided into two categories: one extracts distinctive pixels based on similarity measurement, and the other uses geometric concepts, such as a simplex. The endmember extraction algorithms employing Unsupervised Fully Constrained Least-Squares Linear Unmixing (UFCLSLU) (Heinz and Chang, 2001) and Orthogonal Subspace Projection (OSP) in ) belong to the first category, whereas the well-known pixel purity index (Boardman et al., 1995) and N-FINDR algorithms (Winter, 1999; Wang et al., 2007) belong to the second. The linear prediction error is also used to reduce the number of bands by estimating the similarity between bands (Du and Yang, 2008). Sun et al. (2014) proposed an Autocorrelation Matrix-based Band Selection (ACMBS) method, which uses the minimum Linear Prediction (LP) error as the selection criterion and searches for a suboptimal subset by Sequential Backward Selection (SBS), and obtained promising results. The authors of (Han et al., 2017) propose an Improved Similarity Measurement method based on LP (ISMLP) for hyperspectral sea ice detection.
Clustering methods are frequently used to select representative and distinct bands (Martinez-Uso et al., 2007; Cao et al., 2016). Ranking-based techniques rank bands according to some ranking score (Sun et al., 2015). Mutual Information (MI) is also favored by many researchers for its ability to capture nonlinear relationships (Banit'ouagua et al., 2016; Guo et al., 2006).
Based on the defined criteria, two major kinds of optimization strategies, namely incremental search (Whitney, 1971) and integrated search (Marill and Green, 1963), are used in the literature to search for distinctive bands. An incremental search is a bottom-up method that starts with a single band and incrementally adds new bands. If the combination of a certain band with the preselected bands meets the optimal criterion, this band is added as the new band. The procedure is repeated until the number of bands is sufficiently large. An integrated search is the complement of an incremental search: it is a top-down method that initially treats the complete set of bands as candidates. Bands are then removed from the candidate set one by one until the desired number of selected bands remains.
Without a reasonable stopping criterion, the band selection process may run unnecessarily long, or even indefinitely, depending on the search strategy. Optimization strategies and evaluation functions can influence the choice of a stopping criterion. Stopping criteria based on a generation procedure include: (i) when a predefined number of features has been selected and (ii) when a predefined number of iterations has been reached. Stopping criteria based on an evaluation function include: (i) when the addition (or deletion) of any feature does not produce a better subset; and (ii) when an optimal subset according to some evaluation function is obtained. The loop continues until the stopping criterion is satisfied (Dash and Liu, 1997).
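The two search strategies and the stopping rules described above can be sketched in Python as follows. This is a minimal, generic illustration rather than the procedure of any cited work: `score` is a hypothetical criterion function (higher is better) standing in for whatever measure is used, and the function names are ours. The incremental search stops at a predefined number of bands or when no added band improves the score; the integrated search stops when the desired number of bands remains.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical criterion: maps a subset of band indices to a score (higher is better).
Criterion = Callable[[Sequence[int]], float]


def incremental_search(n_bands: int, score: Criterion, max_bands: int) -> List[int]:
    """Bottom-up (sequential forward) search: start empty, add one band at a time."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_bands:  # stop: predefined number of bands reached
        candidates = [b for b in range(n_bands) if b not in selected]
        if not candidates:
            break
        trial: Dict[int, float] = {b: score(selected + [b]) for b in candidates}
        band = max(trial, key=trial.get)
        if trial[band] <= best_score:  # stop: adding any band gives no better subset
            break
        selected.append(band)
        best_score = trial[band]
    return selected


def integrated_search(n_bands: int, score: Criterion, n_keep: int) -> List[int]:
    """Top-down (sequential backward) search: start with all bands, remove one at a time."""
    remaining = list(range(n_bands))
    while len(remaining) > n_keep:  # stop: desired number of bands remains
        # Remove the band whose deletion leaves the best-scoring subset.
        trial = {b: score([r for r in remaining if r != b]) for b in remaining}
        remaining.remove(max(trial, key=trial.get))
    return remaining
```

With a toy score such as the sum of per-band weights minus a size penalty, the incremental search stops early once an extra band no longer pays for its penalty, which illustrates evaluation-function-based stopping.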
Recently, many relevant works have turned to the concept of Virtual Dimensionality (VD) (Chang and Liu, 2014) or to subspace identification methods (Ghamary Asl et al., 2014) to estimate the number of spectrally distinct signatures that characterize the data.

Data Preprocessing
The experimental results in (Du and Yang, 2008) suggest that applying data whitening (after bad band removal) before band selection may offer slightly better performance than using the original bands. This is because the noise component varies across bands: if the noise component of a band is larger, the band may appear different from the others even though it is not informatively distinct. Therefore, in this study, we apply data whitening to the original bands (after bad band removal), which can easily be achieved by eigendecomposition of the data covariance matrix. Readers are referred to (Du et al., 2003) for a demonstration that the net effects of noise whitening and data whitening are similar.
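Data whitening by eigendecomposition of the covariance matrix can be sketched with NumPy as below. The text does not specify which whitening transform is used; this sketch assumes the symmetric (ZCA) form $C^{-1/2} = V \Lambda^{-1/2} V^T$, chosen because each whitened column then still corresponds to one original band, which band selection needs. The function name `whiten` and the `eps` guard are our own illustrative choices.

```python
import numpy as np


def whiten(data: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Whiten pixel-by-band data (rows = pixels, columns = bands).

    Assumes symmetric (ZCA) whitening so that column j of the output
    still corresponds to band j of the input.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)      # band-by-band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigendecomposition (cov is symmetric)
    # Scale every eigen-direction to unit variance; eps guards near-zero eigenvalues.
    transform = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ transform
```

After this transform the sample covariance of the output is (numerically) the identity, so no band looks distinct merely because of a larger noise or signal scale.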

Linear Prediction-Based Band Selection Method
To choose the most distinctive or dissimilar bands, a similarity metric must be defined. The authors of (Du and Yang, 2008) used an approach in which band similarity is assessed jointly rather than pairwise. It starts with the best two-band combination, which is then expanded to three, four, and so on, until the desired number of bands is selected. Linear Prediction (LP), a new criterion for similarity comparison, can jointly assess the similarity between a single band and multiple bands.
There are two parameters for LP-based Band Selection (BS): the initial bands and the number of bands to be selected. The latter must be predetermined. The following algorithm can be used to find the best initial pair of bands (Du and Yang, 2008).
1. Randomly select a band $B_1$ from the original data set and project the other $N-1$ bands onto its orthogonal subspace $\langle B_1 \rangle^{\perp}$. Find the band $B_2$ with the maximum projection in $\langle B_1 \rangle^{\perp}$, which is regarded as the band most dissimilar to $B_1$.
2. For band $B_2$, project the other $N-1$ bands onto its orthogonal subspace $\langle B_2 \rangle^{\perp}$. Find the band $B_3$ with the maximum projection in $\langle B_2 \rangle^{\perp}$.
3. If $B_3 = B_1$, the algorithm terminates, because $B_1$ and $B_2$ are confirmed to be the pair with the highest dissimilarity; either $B_1$ or $B_2$ can then be used as the initial band. If $B_3 \neq B_1$, go to step 4.
4. For $B_i$, find the most dissimilar band, continuing until $B_{i+1} = B_{i-1}$; then either $B_{i-1}$ or $B_i$ can be used as the initial band (or the two bands are used as the initial band pair).
LP-based BS benefits from a simple algorithmic concept and has shown high efficiency in selecting appropriate bands. Under the hypothesis of this method, a high degree of linear independence indicates that a band is suitable for selection. However, this criterion does not always serve the classification purpose: it may fail when the candidate band has many outliers, because these outlying observations (pixels) receive higher selection weights, which leads to unreasonable results. To resolve this problem, we propose WLPBS, which can both exclude correlated bands and minimize the weights of outlying observations (pixels) in the band.
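The initial-pair search in steps 1 to 4 can be sketched in NumPy as follows. This is an illustrative sketch under our own assumptions: data are arranged pixel-by-band (rows are pixels, columns are bands), the "projection in $\langle B \rangle^{\perp}$" is measured as the norm of the residual after removing each band's component along $B$, and the `max_iter` safeguard is our addition. The function names are hypothetical.

```python
import numpy as np


def most_dissimilar(data: np.ndarray, ref: int) -> int:
    """Index of the band with the largest projection onto <B_ref>^perp.

    `data` is pixel-by-band (rows = pixels, columns = bands).
    """
    b = data[:, ref]
    coeffs = (data.T @ b) / (b @ b)         # component of every band along B_ref
    residual = data - np.outer(b, coeffs)   # projection onto the orthogonal subspace
    norms = np.linalg.norm(residual, axis=0)
    norms[ref] = -np.inf                    # never pick the reference band itself
    return int(np.argmax(norms))


def initial_pair(data: np.ndarray, start: int = 0, max_iter: int = 100) -> tuple:
    """Follow 'most dissimilar band' links until B_{i+1} == B_{i-1} (steps 1-4)."""
    prev, cur = start, most_dissimilar(data, start)
    for _ in range(max_iter):
        nxt = most_dissimilar(data, cur)
        if nxt == prev:                     # the two bands are mutually most dissimilar
            return prev, cur
        prev, cur = cur, nxt
    return prev, cur                        # safeguard if no such pair is reached
```

Starting from an arbitrary band, the search usually settles on a mutually most dissimilar pair within a few hops, so the random choice of $B_1$ matters little in practice.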

Proposed Algorithm: Weighted Linear Prediction-Based Band Selection Method
The basic steps of the WLP-based BS algorithm can be described as follows:
1. As in Linear Prediction-based Band Selection (LPBS), initialize the algorithm by choosing a pair of bands $B_1$ and $B_2$; the selected subset is denoted as $S = \{B_1, B_2\}$.
2. Find a third band $B_3$ that is the most dissimilar to all the bands in the current $S$ by using a certain criterion (discussed below), and then update the selected band subset as $S = S \cup \{B_3\}$.
3. Repeat step 2 until the stopping condition is satisfied.
The criterion used in the WLPBS algorithm is described as follows. Assume that the current band subset $S$ contains $B_1$ and $B_2$. To find a third band that is the most dissimilar to $B_1$ and $B_2$, each remaining band $B$ is estimated as
$$B' = a_0 + a_1 B_1 + a_2 B_2,$$
where $B'$ is the weighted linear prediction of band $B$ from $B_1$ and $B_2$, and $a_0$, $a_1$ and $a_2$ are the parameters that minimize the linear prediction error $e = \|B - B'\|$. Using a weighted least-squares solution, $\mathbf{a} = (a_0, a_1, a_2)^T$ can be estimated as
$$\mathbf{a} = (X^T W X)^{-1} X^T W \mathbf{y},$$
where $X$ is an $N \times 3$ matrix whose first column is all ones, whose second column contains all the pixels of $B_1$, and whose third column contains all the pixels of $B_2$; $\mathbf{y}$ is an $N \times 1$ vector containing all the pixels of $B$; and $W$ is a matrix used to weight the outlier observations (pixels) in the bands. WLS uses a kernel to weight nearby observations more heavily than other observations. The kernel assigns a weight given by:
where $k$ is the number of already selected bands, and $x_i$ and $x$ are the candidate and estimated bands, respectively. This builds the weight matrix $W$, which has nonzero elements only on its diagonal.
The band that yields the largest error $e_{\min}$ (obtained with the optimal parameters in $\mathbf{a}$) is considered the most dissimilar to $B_1$ and $B_2$ and is selected as $B_3$; then $S = \{B_1, B_2, B_3\}$. A schematic depiction of the WLPBS approach is shown in Fig. 1.
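The weighted linear prediction criterion and the greedy selection loop can be sketched in NumPy as below. Several details are assumptions, because the kernel formula is not given in the text: we assume a Gaussian kernel $w_i = \exp(-(x_i - x)^2 / (2k^2))$, where $x_i$ is a pixel of the candidate band, $x$ is its ordinary least-squares estimate from a first pass, and $k$ is the number of already selected bands acting as the bandwidth. The stopping tolerance `tol` is also our reading of "the linear prediction error cannot be further optimized". Function names are hypothetical; data are pixel-by-band and assumed whitened.

```python
import numpy as np


def wls_error(selected: np.ndarray, band: np.ndarray, k: float) -> tuple:
    """Weighted linear prediction of `band` from the columns of `selected`.

    Returns (a, e) with a = (X^T W X)^{-1} X^T W y and e = ||B - B'||.
    """
    n = band.shape[0]
    X = np.column_stack([np.ones(n), selected])  # N x (1 + number of selected bands)
    # Pass 1 (assumption): ordinary least-squares estimate x of the band.
    a_ols, *_ = np.linalg.lstsq(X, band, rcond=None)
    estimate = X @ a_ols
    # Assumed Gaussian kernel: pixels far from the estimate (likely outliers)
    # get small weights; k acts as the kernel bandwidth.
    w = np.exp(-((band - estimate) ** 2) / (2.0 * k**2))
    # Pass 2: weighted least squares with W = diag(w), without forming W explicitly.
    Xw = X * w[:, None]
    a = np.linalg.solve(Xw.T @ X, Xw.T @ band)
    e = float(np.linalg.norm(band - X @ a))
    return a, e


def wlpbs(data: np.ndarray, initial_pair: tuple, max_bands: int, tol: float = 1e-6) -> list:
    """Greedy WLPBS sketch on pixel-by-band data (rows = pixels, columns = bands)."""
    selected = list(initial_pair)
    while len(selected) < max_bands:
        k = len(selected)  # number of already selected bands
        errors = {}
        for b in range(data.shape[1]):
            if b not in selected:
                _, errors[b] = wls_error(data[:, selected], data[:, b], k)
        if not errors:
            break
        best = max(errors, key=errors.get)  # largest e_min = most dissimilar band
        if errors[best] <= tol:  # assumed stop: every remaining band is well predicted
            break
        selected.append(best)
    return selected
```

The two-pass weighting means a single corrupted pixel barely affects the fitted coefficients $\mathbf{a}$, whereas an unweighted fit would tilt $\mathbf{a}$ toward that pixel.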
The stopping condition: the search in WLPBS stops when the linear prediction error cannot be further optimized.
The performance of the aforementioned algorithms is assessed on two real AVIRIS datasets (Indian Pines and Salinas) in terms of dimensionality reduction and classification accuracy using an SVM classifier. Each dataset was divided into two parts: a training set and a test set. Half of the pixels from each class were randomly selected for training, and the remaining 50% formed the test set on which performance was evaluated. The results are compared with those of the original unsupervised approach LPBS (Du and Yang, 2008) and with the state-of-the-art MIFS (Banit'ouagua et al., 2016) and MI-est (Guo et al., 2006) methods, which select complementary bands via MI evaluation. We point out that MIFS (Banit'ouagua et al., 2016) outperforms the MI-based method using the associated reference map, an entropy-based method, and a correlation-based method. This is why we chose MIFS as a reference for comparison when assessing the WLPBS algorithm for band selection in hyperspectral images.
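The per-class 50/50 split of labeled pixels into training and test sets can be sketched as follows. This is a minimal illustration under our own assumptions: `labels` is a flat array of ground-truth class indices with 0 marking unlabeled background pixels, odd class sizes put the extra pixel in the test set, and the function name is hypothetical.

```python
import numpy as np


def split_half_per_class(labels: np.ndarray, seed: int = 0) -> tuple:
    """Randomly assign half of each class's pixels to training, the rest to test.

    Assumes label 0 marks background pixels, which are excluded from both sets.
    """
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(labels[labels > 0]):
        idx = rng.permutation(np.flatnonzero(labels == c))  # shuffle this class's pixels
        half = len(idx) // 2
        train.append(idx[:half])
        test.append(idx[half:])
    return np.concatenate(train), np.concatenate(test)
```

Splitting within each class, rather than over all labeled pixels at once, keeps small classes represented in both sets.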
Our band reduction approach is evaluated by supervised classification using Support Vector Machines (SVMs). One-vs-one multiclass SVMs were trained using the available ground truth and tested on the reduced datasets. For this, we use the Library for Support Vector Machines (LIBSVM) package with the Gaussian Radial Basis Function (RBF) kernel, as in previous work (Banit'ouagua et al., 2016). A cross-validation strategy is used to optimize the penalty parameter C and the kernel width parameter γ. The classification accuracy is computed for each band reduction technique.
It should be noted that the whitened data are used only for band selection processes, while the classifications are carried out on original data.

Data Description
Indian Pines scene: This is a public hyperspectral dataset acquired over the Indian Pines test site in northwestern Indiana, USA. The AVIRIS sensor nominally collects 224 spectral reflectance bands in the 0.4-2.5 µm range, covering the visible, near-infrared, and shortwave-infrared regions.
Four of these bands contain only zeros and are discarded. Therefore, 220 bands from the 92AV3C dataset are used in the experiments. Each of the 220 band images is 145 × 145 pixels. About 49% of the pixels are labeled into 16 classes (Table 1). The remaining pixels are difficult to assign to any of these classes and are labeled as background.
In our experiments, as specified in (Du and Yang, 2008), we removed the noisiest bands from this dataset and kept 196 spectral bands as the initial data for every method. More precisely, we removed bands [1-3], [104-108], [150-163] and [218-220].
Owing to the large number of pixels in a remote sensing image, the matrices involved in the computation are large, which decreases efficiency. However, using only a relatively small subset of pixels in the band selection process generally does not change the results (Du and Yang, 2008; Han et al., 2017), because hyperspectral data exhibit high spatial correlation. Here, for band selection, we used only 10% of the N pixels, scattered throughout the labeled pixels, to determine the bands to be selected. Table 2 shows the overall classification accuracy obtained with an SVM classifier on the Indian Pines data. As we can see, the proposed WLP-based band selection algorithm outperforms the LPBS, MIFS (Banit'ouagua et al., 2016) and MI-est (Guo et al., 2006) methods. The reason is that LPBS is influenced by outliers, whereas WLPBS reduces their weight; accordingly, the bands selected by WLPBS are better suited for classification. Compared to the MIFS algorithm (Banit'ouagua et al., 2016), WLPBS is faster in terms of execution time and does not require many parameters, because it uses only 10% of the N pixels, scattered throughout the labeled pixels, to determine the bands to be selected. Figure 4 presents the linear prediction error for 25 bands selected using WLPBS and LPBS. This figure shows that WLPBS yields a lower prediction error (with the optimal parameters in $\mathbf{a}$) than LPBS for the Indian Pines data. WLPBS provides the best-estimated band, which is used to select the band most dissimilar to $B_1$ and $B_2$ by maximizing $e_{\min}$. LPBS does not surpass WLPBS in the quality of the selected bands. The experimental results in Fig. 4 suggest that using WLPBS to select bands may offer slightly better performance than using LPBS.
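Drawing the 10% pixel subset used for band selection can be sketched as below. The phrase "10% of N pixels scattered throughout the labeled pixels" is interpreted here, as an assumption, as round(0.1 × N) pixels with N the total number of pixels, sampled uniformly without replacement from the labeled pixels only (label 0 taken as background). The function name is hypothetical.

```python
import numpy as np


def sample_labeled_pixels(cube: np.ndarray, labels: np.ndarray,
                          fraction: float = 0.1, seed: int = 0) -> np.ndarray:
    """Return round(fraction * N) pixel spectra drawn from the labeled pixels.

    cube: rows x cols x bands image; labels: rows x cols ground truth (0 = background).
    """
    rng = np.random.default_rng(seed)
    labeled = np.flatnonzero(labels.ravel() > 0)          # flat indices of labeled pixels
    n = min(labeled.size, max(1, round(fraction * labels.size)))
    chosen = rng.choice(labeled, size=n, replace=False)   # uniform, without replacement
    return cube.reshape(-1, cube.shape[-1])[chosen]       # rows = sampled pixel spectra
```

The returned pixel-by-band matrix can then be whitened and passed to the band selection step, so the matrices in the least-squares fits shrink by roughly a factor of ten.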
This is because outlier observations vary across bands: if the outlier component of a band is larger, the band may appear different from the others even though it may not be informatively distinct.

Results and Discussion
Without reducing the weight of outliers, the bands selected by LPBS are nearly similar, which explains the low variation of the LPE for the LPBS method. Figure 2 presents classification maps obtained using 25 bands selected by WLPBS and LPBS. We can see that 25 bands are adequate to distinguish the materials contained in the region (Table 4). The figure also shows that the thematic map produced using the 25 bands chosen by the WLPBS algorithm is the most similar to the Ground Truth (GT).

Data Description
The Salinas scene was collected by the 224-band AVIRIS sensor over Salinas Valley, California, and is characterized by high spatial resolution (3.7 m pixels).

Results and Discussion
The supervised classification results obtained from the Salinas band sets are shown in Table 3. The lower accuracy of LPBS can be ascribed to the fact that this method is based on band correlation and gives little consideration to data quality. Consequently, classification results based on the bands it selects are less accurate when whitened bands are used. The highest accuracy is achieved with WLPBS, which considers both band correlation and outliers.
The results in Fig. 5 show that the WLPBS algorithm achieves better LPE reduction (best B′ with optimal parameters in a) than LPBS on the Salinas dataset, meaning that WLPBS selects the most dissimilar set of bands. Figure 3 presents classification maps for 25 bands selected using WLPBS and LPBS. As in the Indian Pines scene, the thematic map produced using 25 bands selected from the Salinas dataset with the WLPBS algorithm is the most similar to the GT map. Table 4 presents the classification accuracy of each class for several methods.

The Selected Number of Bands (NB)
The VD used by (Du and Yang, 2008) may be a reasonable indicator of the appropriate number of bands to select. However, this criterion works well only when noise is independent and identically distributed, which is unfortunately not true for real image data. For this reason, we use the LPE as a criterion to stop the selection process: selection stops when no remaining candidate band can further optimize the linear prediction error. For the Indian Pines scene, NB = 25 when LPE = $10^{-3}$ for the WLPBS method and NB = 25 when LPE = $10^{-1}$ for the LPBS algorithm (Fig. 4). When 25 bands were selected, the classification accuracy was 88.79% for WLPBS and 87.50% for LPBS (Table 2). As the LPE decreases, the number of selected bands and the classification accuracy increase. Thus, the LPE may be a reasonable indicator of the appropriate number of bands to select in real data.

Conclusion
Band selection in hyperspectral imaging is essential, regardless of the learning algorithm used to train on the hyperspectral dataset. Because irrelevant and redundant bands are present, higher predictive accuracy can be obtained by selecting only the relevant bands. This study addresses the problem of band selection, that is, selecting only the bands that are important for the classification procedure.
Among the many solutions to this problem, algorithms based on band similarity measures are preferable because they account for band correlation, whereas the bands obtained by other methods are generally highly correlated.
The existing LPBS, one of these algorithms, excludes correlated bands effectively, but it may fail when the candidate band to be selected has many outliers.
To resolve this problem, an unsupervised band selection method called WLPBS has been proposed. WLPBS has the ability to both exclude the correlated bands and minimize the weights of outlying observations (pixels) in the band.
The experiments show that WLPBS can remove low-quality bands and thereby achieve high classification accuracy. Nonetheless, the major drawback of WLP-based band selection is its high computational cost when all pixels are used. In future work, we will apply the N-FINDR algorithm for pixel selection and then use the selected pixels for band selection.