Characterization of the Earth's Surface State by Unsupervised Classification: Case of Vegetated, Aquatic and Mineral Surfaces

Corresponding Author: Sié Ouattara, Laboratory of Signals and Electrical Systems (L2SE), Institut National Polytechnique Houphouët Boigny, Yamoussoukro, Côte d'Ivoire. Email: sie_ouat@yahoo.fr

Abstract: In this study, we propose an unsupervised classification scheme based on the Dempster-Shafer Theory (DST) and the Dezert-Smarandache Theory (DSmT) to characterize vegetated, aquatic and mineral surfaces. From pre-processed ASTER satellite images (georeferencing, geometric correction and resampling to 15 m), neo-channels were produced by computing the spectral indices NDVI, MNDWI and NDBaI, which are considered as the sources of information for classifying a given pixel. NDVI is a contrast function that highlights vegetation, whereas MNDWI characterizes water and NDBaI identifies mineral surfaces. We then modeled the formalisms of the DST and the DSmT. These formalisms are modeling tools, close to advanced probability theory, based on the notions of belief functions and fusion, which take into account certain imperfections (uncertainty, ignorance, etc.) encountered in image acquisition. The DST handles disjunctions between the sources, whereas the DSmT handles both disjunctions and conjunctions. Next, we designed the algorithms and the related code, which we implemented in the MATLAB environment. Our contribution lies in taking into account the imperfections (inaccuracies and uncertainties) of the source information through mass functions based on a Gaussian simple support model, in order to model each focal element independently of the others and to evaluate the membership of a pixel in a class with respect to the majority of the elements representing that class.
The results show that the DST approach is relatively satisfactory for the unsupervised classification of mineral and aquatic surfaces, but not for vegetated surfaces, in all the proposed models. The DSmT, by contrast, gives satisfactory results for all the proposed models. The model with the exclusion integrity constraint E∩V∩M = ∅ was selected as the best model because, in addition to an average rate of well-classified pixels of 93.34%, it has a rate of compliance with the terrain (96.37%) higher than those of the other models implemented.


Introduction
The mapping of the state of the Earth's surface from satellite imagery can be summarized in three categories of entities: vegetated surfaces (surfaces covered by vegetation of various types or densities), aquatic surfaces (surfaces entirely covered by rivers and water bodies) and mineral surfaces (surfaces covered by natural or artificial mineral formations: geological outcrops, soils, buildings, roads, etc.). In the field, depending on the size of the surface portion considered, seven categories of entities can be observed: vegetated surfaces; aquatic surfaces; mineral surfaces; vegetated and aquatic surfaces; vegetated and mineral surfaces; aquatic and mineral surfaces; and vegetated, aquatic and mineral surfaces. The characterization of these entities on satellite images is often marred by imperfections (uncertainty, inaccuracy, confusion, etc.) due to the unsuitable spatial and/or spectral characteristics of the images used. A good characterization of these entities therefore requires either satellite images of very high spatial and/or spectral resolution or, failing that, good discrimination criteria for these entities. The first point is a matter of image availability, the second a matter of image processing. On the latter point, researchers have resorted to spectral indices to classify satellite images for mapping vegetated surfaces, water surfaces, bare and built-up soils, etc. However, it is difficult to determine the threshold values that give ideal results (Chen et al., 2006; Ji et al., 2009; Uddin et al., 2010). This leads to uncertainties and inaccuracies in the information produced by the images associated with these indices. A question therefore arises: how can these indices be exploited, taking their imperfections into account, to improve the characterization of the Earth's surface state?
To address this question, we propose to use information fusion to take into account and manage the imperfections of the images associated with the indices.
The general objective of the study is to contribute to the unsupervised classification of satellite images through information fusion, by developing an unsupervised classification approach based on spectral indices and on the Dempster-Shafer (DST) and Dezert-Smarandache (DSmT) theories, in order to characterize vegetated, aquatic and mineral surfaces.
Specifically, the aim is first to model the guiding elements of the classification, then to carry out several classifications by implementing the algorithms and code developed in the MATLAB environment, and finally to evaluate the models and select one for classification. This paper first reviews some information fusion techniques for managing imperfections, together with the spectral indices; it then lists the material used and sets out the methodology developed in this study; finally, it presents and discusses the main results, focusing on the images classified with the DST and those classified with the DSmT.

Information Fusion Techniques and Spectral Indices
There are several information fusion techniques (Martin, 2005). Those that address the management of information imperfections include Bayesian methods, possibility theory, Dempster-Shafer theory and Dezert-Smarandache theory. Bayesian methods are based on conditional probabilities and allow reasoning only on singletons and under the closed-world assumption, that is, a situation in which the set of possible solutions is known. These methods model uncertainty well but often conflate imprecision with it, hence their inefficiency in correctly handling the imprecision often found in satellite images. Moreover, they were not natively designed for information fusion.
Possibility theory (Dubois and Prade, 1988), for its part, is derived from fuzzy set theory (Zadeh, 1968) and makes it possible to represent both uncertainty and imprecision (Roux and Desachy, 1996; Masson, 2005). However, its formalism makes it difficult, on the one hand, to choose or estimate the membership functions or possibility distributions and, on the other hand, to carry out the fusion.
In contrast to the two previous approaches, those based on mass functions can be considered more general and more flexible to implement than probabilistic or possibilistic approaches (Masson, 2005). In addition, methods based on the Dempster-Shafer Theory (DST) and the Dezert-Smarandache Theory (DSmT) handle the imprecision and uncertainty of information well. In this framework, Abbas (2009) developed and applied DST and DSmT models for the fusion and classification of satellite images, in order to map land cover and to detect and quantify changes using multisource, multitemporal and multiscale satellite images.
Therefore, in this study, we propose to use the DST and the DSmT, whose formalisms are presented below, to manage the imperfections of satellite images.

Formalism of the DST and the DSmT
The formalisms of the DST and the DSmT can each be summarized in four stages: modeling, estimation, combination and decision.
Modeling consists of defining:

A Framework of Discernment
For the DST, it is an exhaustive set of mutually exclusive answers to a given question for the situation studied, denoted Ω; for the DSmT, it is an exhaustive set of distinct answers that are not necessarily exclusive, denoted Ω'.

A Reasoning Framework
For the DST, it is the set of elements of the framework of discernment together with all their possible unions, denoted 2^Ω; for the DSmT, all possible intersections are added as well, and it is denoted D^Ω'.
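As an illustration of the DST reasoning framework, the following minimal Python sketch enumerates 2^Ω for a three-class frame. The labels E, V and M (assumed here to stand for aquatic, vegetated and mineral surfaces) and the helper name `power_set` are choices made for this example only; the DSmT framework, which also contains intersections, is not enumerated.

```python
from itertools import combinations

def power_set(frame):
    """Return every subset of `frame` as a frozenset (the DST framework 2^Omega)."""
    items = sorted(frame)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Assumed labels: E = aquatic, V = vegetated, M = mineral
omega = {"E", "V", "M"}
subsets = power_set(omega)

# The 8 elements: the empty set, 3 singletons, 3 pairwise unions and the full frame
labels = ["empty" if not s else " U ".join(sorted(s)) for s in subsets]
```

With three classes the framework contains 2^3 = 8 elements, of which the full set E U M U V represents total ignorance.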

A Function of Masses
By setting Θ = Ω and G = 2^Ω (for the DST) or Θ = Ω' and G = D^Θ (for the DSmT), we define a mass function m from G into [0,1] satisfying the conditions of Equation 1, where ∅ is the empty set. The value m(A) quantifies the belief that the class sought belongs to the subset A of Θ (and not to any more specific subset of A). The subsets A such that m(A) > 0 are called focal elements.
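For reference, the standard conditions defining a normalized mass function, assumed here to be those intended by Equation 1, can be written as follows, with G and ∅ as defined above:

```latex
m : G \to [0,1], \qquad m(\emptyset) = 0, \qquad \sum_{A \in G} m(A) = 1
```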

Special Cases of Interpretation of Mass Function
• If m(A) = 1, with A a non-singleton element of G: source S has imprecise knowledge; it believes only in A
• If m(C) = 1, with C a singleton element of Θ: source S has precise knowledge; it fully believes in C
• If m(A) = s and m(Θ) = 1 - s, with 0 < s < 1: source S has uncertain and imprecise knowledge; it partly believes in A, but nothing more

Estimation consists of determining the parameter values of the chosen mass function model.
For example, for the simple support mass function model of Equation 2, estimation amounts to determining the parameter ω that characterizes the ignorance attached to the situation studied. This model makes it possible to characterize each element of G independently of the others.

Combination, which is the fundamental step of fusion, is carried out for the DST with Dempster's orthogonal combination rule (Dempster, 1967), given by Equation 3 for two initial mass functions m1 and m2 representing the information of two different sources. The term K of this rule, whose expression is given by Equation 4, is called the inconsistency of the fusion and can be interpreted as a measure of conflict. It corresponds to the mass assigned to the empty set. If K = 1, the sources of information cannot be combined: they are in total conflict and give contradictory information on the object considered.
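Dempster's orthogonal rule and the conflict term K described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation of the study: mass functions are represented as dictionaries mapping frozensets to masses, the function name `dempster_combine` is hypothetical, and the numerical masses are invented for the example.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule.

    Returns the normalized combined mass function and the conflict K.
    """
    joint = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            joint[inter] = joint.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc  # product that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("K = 1: sources in total conflict, combination undefined")
    # Normalize by 1 - K so that the combined masses sum to 1
    return {a: v / (1.0 - conflict) for a, v in joint.items()}, conflict

E, V, M = frozenset("E"), frozenset("V"), frozenset("M")
THETA = E | V | M  # total ignorance

# Hypothetical simple support masses for two sources (not values from the study)
m_ndvi = {V: 0.7, THETA: 0.3}
m_mndwi = {E: 0.6, THETA: 0.4}

combined, K = dempster_combine(m_ndvi, m_mndwi)
```

In this example the only conflicting pair is (V, E), so K = 0.7 × 0.6 = 0.42 and the remaining products are rescaled by 1/(1 - K).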
For the DSmT, two types of combination are used: the classic combination and the hybrid combination. The classic combination is performed with the conjunctive combination rule given by Equation 5. The hybrid combination is performed in the presence of integrity constraints. An integrity constraint on a set states that no nonzero mass can be assigned to this set, so its mass is forced to zero. The mass it would otherwise have received characterizes a conflict between the elements that compose it. This mass is redistributed proportionally to the focal elements involved in generating the conflict, according to a chosen redistribution rule.
The combination rule with Proportional Conflict Redistribution (5th version: PCR5) is regularly used for the hybrid combination because it gives satisfactory results (Djiknavorian, 2008), with a simplified implementation as follows:
1. Determine the combined masses with the classic combination
2. Evaluate the conflicting masses
3. Redistribute the conflicting masses, totally or partially, in proportion to the nonzero masses of the sets involved in the conflict

The decision is made through one of several rules; the most widely used, which give satisfactory results, are based on the combined mass function, the credibility function (Equation 6) and the plausibility function (Equation 7). The maximum of credibility (Equation 8) or the maximum of plausibility (Equation 9) is used for the DST, and the maximum of combined mass (Equation 10) for the DSmT. The class C* retained is thus the element of Θ or G with the highest value of the chosen decision criterion.
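The credibility and plausibility decision criteria can be illustrated with the short Python sketch below. It assumes the standard definitions, in which Bel(A) sums the masses of the non-empty focal elements included in A and Pls(A) sums the masses of the focal elements that intersect A. The function names and the masses are hypothetical and serve only to show that the two criteria may select different classes.

```python
def belief(m, a):
    """Bel(A): total mass of the non-empty focal elements included in A."""
    return sum(v for b, v in m.items() if b and b <= a)

def plausibility(m, a):
    """Pls(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)

def decide(m, candidates, criterion):
    """Return the candidate class with the highest value of the criterion."""
    return max(candidates, key=lambda a: criterion(m, a))

E, V, M = frozenset("E"), frozenset("V"), frozenset("M")
THETA = E | V | M

# Hypothetical combined mass function (not values from the study)
m = {E: 0.30, V: 0.35, E | M: 0.25, THETA: 0.10}

singletons = [E, V, M]
best_bel = decide(m, singletons, belief)        # maximum of credibility
best_pls = decide(m, singletons, plausibility)  # maximum of plausibility
```

Here the maximum of credibility retains V (Bel = 0.35), whereas the maximum of plausibility retains E (Pls = 0.65), because the mass on E ∪ M and on Θ supports E only in the optimistic reading.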

Spectral Indices NDVI, MNDWI and NDBaI
Spectral indices are neo-channels obtained by mathematical operations on the original channels of the image considered. They are designed to characterize specific entities. Spectral indices have thus been developed to characterize the state of the Earth's surface in terms of vegetated surfaces (Equation 11), aquatic surfaces (Equation 12) and mineral surfaces (Equation 13). While these indices discriminate the above categories of entities, they present imperfections (uncertainty, imprecision, etc.) due to the discrimination thresholds used (Rousse et al., 1973; Zhao and Chen, 2005; Chen et al., 2006; Uddin et al., 2010; Xu, 2006; Xiao-Ling et al., 2006; Szabo et al., 2016). Hence the value of information fusion for managing these imperfections with a view to improving the decision.
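The three indices are normalized differences of two bands. The Python sketch below uses their usual definitions: NDVI from the near-infrared and red bands, MNDWI from the green and shortwave-infrared bands, and NDBaI from the shortwave-infrared and thermal-infrared bands. The mapping of these generic bands to specific ASTER bands is not fixed here, and the per-pixel scalar interface and function names are assumptions made for the example.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0

def spectral_indices(green, red, nir, swir, tir):
    """Compute (NDVI, MNDWI, NDBaI) for one pixel from band values."""
    ndvi = normalized_difference(nir, red)      # vegetation
    mndwi = normalized_difference(green, swir)  # water
    ndbai = normalized_difference(swir, tir)    # bare or mineral surfaces
    return ndvi, mndwi, ndbai

# Hypothetical band values for a densely vegetated pixel
ndvi, mndwi, ndbai = spectral_indices(green=0.10, red=0.10, nir=0.50,
                                      swir=0.30, tir=0.30)
```

Each index lies in [-1, 1], which is why fixed thresholds on this interval are used to segment the index images.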

Material
The material used consists of software, positioning tools and data.
As regards software, ENVI 4.7 was used to preprocess the ASTER images, and MATLAB was used to develop the classification models, which rely on the segmented NDVI, MNDWI and NDBaI spectral indices and on the DST and the DSmT to classify the aquatic, mineral and vegetated surface states.
The positioning tools consist of a Garmin GPS with 2.5 m resolution, used to determine the coordinates of the classification entities, fixed points and outcrops, and three topographic maps (Gagnoa, Bouaké and Dimbokro sheets) used to identify the localities of the experimental study area, which lies in the forest-savanna transition zone in the center of Côte d'Ivoire.
The data for this study are of two types: field data and remote sensing data.
Field data consist of the geographic coordinates of fixed points and outcrops. The remote sensing data come from the ASTER sensor and are rectified satellite images of the AST_L1A_00301102004105832 scene. This sensor has 14 bands covering a broad spectral region: the visible and near infrared (VNIR: Band 1, Band 2 and Band 3N), the shortwave infrared (SWIR: Band 4 to Band 9) and the thermal infrared (TIR: Band 10 to Band 14).
The spatial resolution of these images is 15 m in the visible and near infrared, 30 m in the shortwave infrared and 90 m in the thermal infrared.

Methods
The approach consisted first in preprocessing the ASTER satellite images in ENVI to generate the sources of information used to develop the classification models. We then developed four classification models based on the DST, followed by four others based on the DSmT. All these classification models were built from algorithms and code implemented in the MATLAB environment. Finally, the models were evaluated in order to retain only one.

Preprocessing
To take full advantage of the spatial and spectral resolutions, the ASTER satellite images underwent georeferencing, geometric correction and resampling to build a consistent database of 14 bands.
First, each band was georeferenced with the k nearest neighbors method. Then the geometric correction was made with the bilinear method from 100 control points chosen to cover the ASTER scene uniformly. Finally, the SWIR bands (4 to 9) and the TIR bands (10 to 14) were resampled to a 15 m step with the bilinear method.
Georeferencing and geometric correction make it possible to superimpose these satellite images on other georeferenced data in the same coordinate system.
The fourteen preprocessed images were used to compute the NDVI, MNDWI and NDBaI indices, whose associated images were then segmented according to the thresholds given in Table 1.
The segmented NDVI, MNDWI and NDBaI images are the sources of information used to develop the classification models according to the DST and the DSmT. Figure 1 shows the flowchart of the ASTER satellite image preprocessing methodology.

Modeling Frameworks of Discernment and Reasoning
The vegetated surfaces, the aquatic surfaces and the mineral surfaces are mutually exclusive elements; consequently, the frameworks of discernment Θ (Ω for the DST and Ω' for the DSmT) and of reasoning G (2^Ω for the DST and D^Θ for the DSmT) adopted are given by Equations 14-16.

Modeling and Estimation of Mass Functions
The mass functions of the sources are defined on G (2^Ω or D^Θ) according to the theory used.
Consider the normal distribution of variable x with parameters µ_A and σ_A given by Equation 17, where µ_A and σ_A are respectively the mean and the standard deviation of the data x belonging to A. The mass functions of the sources are then defined by Equations 18-24.

[Table 1: segmentation thresholds. Recoverable entries: X ≤ -0.9; -0.9 < X ≤ 0.1; 0.1 < X; Y ≤ 0.9; Z < -0.1; 0]
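The normal density referred to as Equation 17 has the standard form below, written with the mean µ_A and the standard deviation σ_A of the data belonging to A. The notation N(x; µ_A, σ_A) is an assumption of this sketch.

```latex
\mathcal{N}(x;\mu_A,\sigma_A) = \frac{1}{\sigma_A\sqrt{2\pi}}
\exp\!\left(-\frac{(x-\mu_A)^2}{2\sigma_A^2}\right)
```

The density is largest when x is close to µ_A, so a pixel whose index value resembles the majority of the values of class A receives a larger weight for that class.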

Mass Function of NDVI
With NDVI(x): value of the pixel x of the NDVI image, we have:

Mass Function of MNDWI
With MNDWI(x): value of the pixel x of the MNDWI image, we have:

Mass Function of NDBaI
With NDBaI(x): value of the pixel x of the NDBaI image, we have:

The proposed Gaussian simple support mass function model was used to assign a mass to each element of G independently of the other elements and to take into account the similarity of a pixel to the majority of the pixels belonging to the class being tested.

Combined Mass Function
For the DST, the combined mass function was computed with Dempster's orthogonal combination rule (Equation 3) applied to the sources associated with the segmented NDVI, MNDWI and NDBaI images.
For the DSmT, the combined mass function was computed with the 5th version of the Proportional Conflict Redistribution combination rule (PCR5) (Djiknavorian, 2008) applied to the sources associated with the segmented NDVI, MNDWI and NDBaI images.
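The principle of PCR5 can be sketched for two sources as follows. The sketch makes two simplifying assumptions that differ from the study: it combines only two sources, whereas three are fused here, and it treats all classes as mutually exclusive (sets represented as frozensets), whereas the hybrid model of the study only forbids E∩V∩M. Each conflicting product m1(B)·m2(C) is returned to B and C in proportion to m1(B) and m2(C). The function name and the masses are hypothetical.

```python
from itertools import product

def pcr5_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with two-source PCR5."""
    out = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            # Non-conflicting product: conjunctive (classic) combination
            out[inter] = out.get(inter, 0.0) + mb * mc
        elif mb + mc > 0:
            # Partial conflict mb*mc is split between b and c,
            # proportionally to the masses mb and mc that produced it
            out[b] = out.get(b, 0.0) + mb * mb * mc / (mb + mc)
            out[c] = out.get(c, 0.0) + mc * mc * mb / (mb + mc)
    return out

E, V, M = frozenset("E"), frozenset("V"), frozenset("M")
THETA = E | V | M

# Hypothetical masses for two sources (not values from the study)
m_ndvi = {V: 0.7, THETA: 0.3}
m_mndwi = {E: 0.6, THETA: 0.4}

fused = pcr5_combine(m_ndvi, m_mndwi)
```

Unlike Dempster's rule, no normalization is needed: the conflicting mass 0.42 is entirely redistributed to V and E, so the fused masses still sum to 1.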

Measure of Belief
For the DST, the measure of belief was carried out with the credibility (Bel) and plausibility (Pls) functions on the frameworks of discernment and reasoning. This produced four classification models for the DST: model 1 (simple Bel), model 2 (full Bel), model 3 (simple Pls) and model 4 (full Pls). The simple models are built on the framework of discernment and the full models on the framework of reasoning.
Similarly, four models were developed for the DSmT by applying the following conditions, based on an exclusion integrity constraint (Djiknavorian, 2008):
• Model 1: E∩V∩M = ∅ and classification on D^Θ
• Model 2: E∩V∩M = ∅ with strict paradoxical and plausible information
• Model 3: E∩V∩M = ∅ with strict paradoxical information
• Model 4: E∩V∩M = ∅ and classification on Θ

To carry out the classifications associated with these models, encodings were developed to simplify the implementation of the algorithms and code in the MATLAB environment (Okaingni et al., 2017a; 2017b).

Evaluation
The evaluation of the classification models is based first on a statistical analysis and then on a visual compliance analysis. The statistical analysis uses a confusion matrix M_CF(k) built from the ground truth classes and those of the image combined by model k (k = 1, 2, 3, 4). Ground truth classes are placed in columns and those of the combined image in rows. The total number of pixels of each ground truth class is distributed among the classes of the combined image. Performance indices are then computed from Equations 25-28 (Ji et al., 2009), where:
• i: class number of the ground truth
• j: class number of the image combined by the model
• N_i: number of pixels of class i of the ground truth
• M_CF(j,i): number of pixels of class i of the ground truth assigned, after classification, to class j of the image combined by the model
• GCR_i: rate of well-classified pixels of class i of the ground truth
• GCR_moy: average rate of well-classified pixels of the ground truth
• ECR_ji: rate of pixels of class i of the ground truth misclassified into class j of the image combined by the model
• ECR_i: rate of misclassified pixels of class i of the ground truth

The visual compliance analysis consisted in verifying in the field the correspondence of the different compound classes produced by the classification. Portions of the image were selected and their geographic coordinates determined, and these coordinates were used for field verification.
The model finally retained is the one with an average rate of well-classified pixels above 90% and a compliance rate above 95%.
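The statistical indices defined above can be computed from a confusion matrix as in the following Python sketch. It assumes a square matrix with the same class order in rows and columns, rates expressed as percentages of N_i, and GCR_moy taken as the arithmetic mean of the GCR_i; these choices, the function name and the pixel counts are assumptions made for the example.

```python
def classification_rates(m_cf):
    """Compute GCR_i, GCR_moy, ECR_ji and ECR_i from a confusion matrix.

    m_cf[j][i] is the number of pixels of ground-truth class i (column)
    assigned to class j (row) of the classified image.
    """
    n = len(m_cf)
    totals = [sum(m_cf[j][i] for j in range(n)) for i in range(n)]  # N_i
    gcr = [100.0 * m_cf[i][i] / totals[i] for i in range(n)]
    ecr_ji = [[100.0 * m_cf[j][i] / totals[i] if j != i else 0.0
               for i in range(n)] for j in range(n)]
    ecr = [sum(ecr_ji[j][i] for j in range(n)) for i in range(n)]
    gcr_mean = sum(gcr) / n
    return gcr, gcr_mean, ecr_ji, ecr

# Hypothetical pixel counts for classes E, V, M (not results from the study)
m_cf = [[90,  5,  0],
        [10, 80, 10],
        [ 0, 15, 90]]

gcr, gcr_mean, ecr_ji, ecr = classification_rates(m_cf)
retained = gcr_mean > 90.0  # first selection criterion stated in the text
```

By construction GCR_i + ECR_i = 100 for every ground truth class, which gives a quick consistency check on the matrix.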

Results and Discussion
Images Classified with the DST

The approach based on the DST, with the criteria of maximum credibility and maximum plausibility on the frameworks of discernment and reasoning, applied to the segmented images derived from the NDVI, MNDWI and NDBaI indices, produced four classified images from four models (Fig. 2-5). The analysis of these images through the evaluation statistics (Table 2- [...]

Models 2 and 4 reveal confusions that are characteristic of partial ignorance in the determination of classes E, V and M, the most pronounced of which are found in model 2. This could be due to the decision criterion applied, maximum credibility, which measures how pessimistic the information produced is.

Images Classified with the DSmT
The results obtained with the DSmT approach derive from the classifications made with the hybrid combination based on the PCR5 rule for the four models generated by an integrity constraint and simplifications (Fig. 6-9). The statistics produced by these models indicate rates of well-classified pixels for vegetated, aquatic and mineral surfaces greater than 90% for each entity and for the four models (Tables 6-9), with the highest rate [...]. The lowest rate is obtained for vegetated surfaces in each model.

[Table: statistics for model 3, ground truth in columns (E, V, M), classified image in rows. Row E: 94.44, 0.00, 0.00. Row V: 0.00, 91.[...]]

The comparison with the DST approach shows that the DSmT approach gives more satisfactory results. Therefore, on the basis of the average rates of well-classified pixels and the rates of compliance of the entities with the field for the four proposed DSmT models (Table 10), model 1 (model with exclusion integrity constraint E∩V∩M = ∅) was chosen as the best model because it has, in addition to an average rate of well-classified pixels of 93.34%, a rate of compliance of the entities with the field (96.37%) higher than those of the other models implemented. With this model, the physical correspondence in the field of the classified entities was established; it is presented in Fig. 10.

Conclusion
In this study, the aim was to propose an unsupervised classification approach, based on the use of the NDVI, MNDWI and NDBaI spectral indices and the Dempster-Shafer and Dezert-Smarandache theories, to characterize vegetated surfaces, aquatic surfaces and mineral surfaces.
The experimental study area lies in the forest-savanna transition zone in central Côte d'Ivoire. We used ASTER satellite images, the ENVI 4.7 and MATLAB software, a GPS, topographic maps and the geographic coordinates of fixed points and outcrops. The methodology consisted first in preprocessing the ASTER satellite images, then in developing and implementing DST and DSmT models for unsupervised classification, and finally in evaluating these models in order to retain one.
The results show that the DST approach is relatively satisfactory for the unsupervised classification of mineral and aquatic surfaces, but not for vegetated surfaces, in all the proposed models. The DSmT, by contrast, gives satisfactory results for all the proposed models. The model with the exclusion integrity constraint E∩V∩M = ∅ was selected as the best model because, in addition to an average rate of well-classified pixels of 93.34%, it has a rate of compliance with the terrain (96.37%) higher than those of the other models implemented.
The proposed approach could also be used, with appropriate adaptations, for other characterization and mapping purposes.