The Development of Geo-Information System for Finding Potential Zones of Water using Environmental Parameters and Engineering

Problem statement: Water is a major natural resource for all living beings. People's lives and livelihoods depend on water, and most people use ground water for drinking because it is unpolluted or less polluted than surface water. The need for clean water increases continually with world population growth. People in many areas lack the fresh drinkable water that is essential for their survival. Maintaining secure water supplies for drinking, industry and agriculture is not possible without ground water, so it is necessary to locate potential ground water zones where wells can be dug. Approach: In this study, we used a set of parameters to identify the level of water in a particular area. The ground water identification system is designed with the help of Histogram Equalization, Neural Network and PCA. The collected data is given as input to data normalization and a feature is computed for every data item using PCA. The presence of ground water in a particular location is identified using the trained neural network. Results: The experimentation is carried out using synthetic data to show the performance of the ground water identification system. Conclusion: The potential zones of ground water can be identified if the specified parameters of the particular location are given as input to the neural network.


INTRODUCTION
In recent years, the use of Geographic Information Systems (GIS) (Morales et al., 2002; Sudiana and Arbain, 2011; Carpio, 2011; Nouri et al., 2006) in groundwater management and research has increased considerably. GIS is now commonly used to create digital geographic databases, to manipulate and prepare data as input for various model parameters and to display model output. These functions primarily allow overlay or index operations, but newer GIS functions that exist or are under development can further support process-based approaches. A GIS-managed hydrogeological database has been generated to support the data used in vulnerability assessment systems and in numerical modeling for groundwater flow and contaminant transport studies. Additionally, coupling between the database and process-based numerical systems was implemented. The data and information needed for hydrogeological studies are complex. Information about geology, hydrology, geomorphology, soil, climate, land use, topography and anthropogenic features has to be analyzed and combined. Data are collected from existing databases and maps and also from field measurements.

MATERIALS AND METHODS
A GIS is a system for the input, storage, manipulation and output of geographically referenced data. GIS provides a means of representing the real world through integrated layers of constituent spatial information. In GIS, geographic information is represented as objects or fields. The object approach describes the real world through simple objects like points, lines and areas. The objects describing entities are characterized by geometry, topology and non-spatial attribute values. Spatial objects like wells, piezometers, boreholes, galleries and zones of protection are a few instances in hydrogeology. Attribute values of an object could be, for example, the set of wells, the ownership or the diameter of a gallery or drain. The field approach represents the real world as fields of attribute data without defining objects. This approach gives the attribute values at any location. In GIS, the distinction between objects and fields is frequently associated with vector data models and raster data models. Vector models represent spatial phenomena through differences in the distribution of properties of points, lines and areas. Each layer in this system is a mixture of one or more classes of geometrical features. A raster model contains a rectangular array of cells in which a value is assigned to each cell. In the raster model each cell has only one value.
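As a rough illustration of the two data models described above, the following minimal Python sketch contrasts an object (vector) feature with a field (raster) layer. The `Well` class, its attribute names, the coordinates and the 100 m cell size are hypothetical and chosen only for illustration; they do not come from any particular GIS package.

```python
from dataclasses import dataclass, field


@dataclass
class Well:
    """Vector (object) model: a point feature with non-spatial attributes."""
    x: float
    y: float
    attributes: dict = field(default_factory=dict)


def make_raster(rows, cols, fill=0.0):
    """Raster (field) model: a rectangular array in which each cell holds one value."""
    return [[fill for _ in range(cols)] for _ in range(rows)]


def cell_of(x, y, origin_x, origin_y, cell_size):
    """Return the (row, col) of the raster cell that contains the point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((y - origin_y) // cell_size)
    return row, col


# Hypothetical well and a 5 x 5 raster with 100 m cells.
well = Well(x=125.0, y=340.0, attributes={"owner": "village council", "depth_m": 42.0})
water_table = make_raster(rows=5, cols=5)

# Transfer one attribute of the vector object into the raster cell it falls in.
row, col = cell_of(well.x, well.y, origin_x=0.0, origin_y=0.0, cell_size=100.0)
water_table[row][col] = well.attributes["depth_m"]
```

The object carries its geometry and attributes together, while the raster only stores one value per cell and relies on the grid origin and cell size to locate it.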
In general, geographic data processing is seen as a subfield of data processing. Geographic features and attributes have to be modeled to create a digital version of a real geographic form or pattern. Process-based modeling uses equations that describe the physical or biochemical process to be simulated in order to understand and predict its behavior. A useful relationship can be established between these two forms of modeling. Most GIS can perform overlay and index operations quickly but cannot execute groundwater modeling tasks related to groundwater flow and transport processes. Coupling a GIS to a process-based model gives an efficient tool for processing, storing, manipulating and displaying hydrogeological data. An effective GIS can reduce the time required for data preparation and presentation. Modeling groundwater flow and contaminant transport in aquifers is a spatial and temporal problem that needs the integration of deterministic process-based models with GIS.
Each model parameter should be represented by three- or four-dimensional (x, y, z and time) information layers to model the physical and chemical processes in the aquifer.
In this study, we have taken five parameters to identify the potential zone of water: Valleys, Leaf Area Index, Water Logging, Lakes and Rainfall. We have used Histogram Equalization, Neural Network and PCA for ground water identification. The collected data is given as input to data normalization and a feature is computed for every data item using PCA. The presence of ground water in a particular location is identified using the trained neural network. The following sections discuss the motivation behind the research, the proposed system of ground water identification, the experimentation and the conclusion.

Valleys:
We have classified the distance between the target area and the valley into five categories. The first category is a distance of 0-1 km, the second 2-3 km, the third 4-5 km, the fourth 6-10 km and the fifth above 10 km. We then have to estimate the distance from the valley to the target area.

Motivation behind the research: Groundwater is the most precious natural resource in the world. Groundwater supports human health, economic development and ecological diversity. Because of its inherent qualities, it has become an essential and reliable source of water supply in all climatic regions of developed and developing countries. In India, more than 90% of the rural population and almost 30% of the urban population depend on ground water for drinking and domestic purposes. Groundwater has therefore emerged as a poverty reduction tool in developing countries. Nowadays groundwater studies (Li et al., 2011; Zhenmin, 2009; Maio et al., 2009) are very important not only for targeting groundwater potential zones, but also for examining and preserving this vital resource. Usually, test drilling and stratigraphy analysis are used to determine the location and physical characteristics of aquifers. However, such an approach is very costly and time consuming and needs skilled manpower. The Ghataprabha sub-basin of the Krishna River in peninsular India is facing a severe water shortage for both irrigation and domestic purposes. Every summer most of the surface water dries up, causing very serious water shortages for both domestic and irrigation purposes. The availability of surface water in the right quantity at the necessary time cannot be ensured because of the changeable nature of the south-west monsoon in India.
Therefore, most of the irrigated area in the Ghataprabha Basin is cultivated with the support of groundwater obtained from dug wells and tube wells. Unrestricted excessive pumping has reduced the groundwater in some areas. During the dry period, the dug wells and hand pumps also go out of use every year, aggravating the water problems in such areas. It is therefore necessary to identify the potential renewable groundwater resources. To do so, we first have to identify the areas that are most likely to yield renewable water resources.

Parameters collected for ground water identification: To identify the potential zone of water, we require the parameters given in Fig. 1:
• Valleys are extended depressions of the earth's surface. Valleys are generally drained by rivers and may form in relatively flat plain regions or between ranges of hills or mountains
• Leaf Area Index (LAI) represents the quantity of leaf material in an ecosystem and is geometrically defined as the total one-sided area of photosynthetic tissue per unit ground surface area
• A lake is a small body of water which is entirely surrounded by land. Lakes are generally found at the bottom of a basin
• The quantity of precipitation falling over a given area in a given period of time is called rainfall
• A raised water table in the soil is called water logging. This happens in poorly drained soils where the water cannot penetrate deeply
For valleys, the possibility of ground water in the target area is 80% if the estimated distance falls in the first category, 60% in the second category, 40% in the third category and 20% in the fourth category. If the estimate does not fall in any of these four categories, the possibility of ground water in the target area is less than 10%, or there may be no possibility of ground water at all.
Lakes: For lakes, we have classified the distance between the target area and the lake into five categories. The first category is a distance of 0-1 km, the second 2-3 km, the third 4-5 km, the fourth 6-10 km and the fifth above 10 km. We then have to estimate the distance from the lake to the target area.

RESULTS AND DISCUSSION
For lakes, the possibility of ground water in the target area is 80% if the estimated distance falls in the first category, 60% in the second category, 40% in the third category and 20% in the fourth category. If the estimate does not fall in any of these four categories, the possibility of ground water in the target area is less than 10%, or there may be no possibility of ground water at all.
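The distance rule used for valleys and lakes can be expressed as a small lookup. The following Python sketch is only an illustration: the function and constant names are hypothetical, and two assumptions are made that the text does not settle, namely that distances falling between the published ranges (for example 1.5 km) belong to the next, farther category, and that "less than 10%" is represented by its upper bound of 0.10.

```python
# Upper bounds (km) of the first four distance categories given in the text.
# Assumption: a distance between two published ranges (e.g. 1.5 km) is placed
# in the next, farther category.
CATEGORY_UPPER_BOUNDS_KM = (1.0, 3.0, 5.0, 10.0)

# Possibility of ground water per category. The text gives "less than 10%"
# for the fifth category, so 0.10 is used here as an upper bound (assumption).
CATEGORY_LIKELIHOOD = {1: 0.80, 2: 0.60, 3: 0.40, 4: 0.20, 5: 0.10}


def distance_category(distance_km):
    """Return the category (1-5) for a distance to the nearest valley or lake."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    for category, upper_bound in enumerate(CATEGORY_UPPER_BOUNDS_KM, start=1):
        if distance_km <= upper_bound:
            return category
    return 5


def groundwater_likelihood(distance_km):
    """Return the possibility of ground water for a given distance."""
    return CATEGORY_LIKELIHOOD[distance_category(distance_km)]
```

For a target area 2.5 km from a lake, `distance_category(2.5)` gives the second category and `groundwater_likelihood(2.5)` gives 0.60.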

Leaf area index:
We have classified the Leaf Area Index (LAI) into five categories based on its estimated value. The first category is an LAI above 80%, the second between 61 and 80%, the third between 41 and 60%, the fourth between 20 and 40% and the fifth below 20%. We then have to estimate the Leaf Area Index of the target area.
If the estimated Leaf Area Index of the target area falls in the first category, the possibility of ground water is 80%. In the second category it is 60%, in the third category 40% and in the fourth category 20%. If the estimate falls in the fifth category, the possibility of ground water is less than 10%, or there is no possibility of ground water in that target area.

Rainfall:
We have classified the estimated rainfall in the target area into five categories based on its value in cm per year. The first category covers values above 800 cm, the second values from 501 to 800 cm, the third values between 101 and 500 cm, the fourth values between 40 and 100 cm and the fifth values below 40 cm.
If the estimated rainfall in the target area falls in the first category, the possibility of ground water in that target area is 80%. In the second category it is 60%, in the third category 40% and in the fourth category 20%. If the estimate falls in the fifth category, the possibility of ground water in the target area is less than 10%, or there may be no possibility of ground water at all.
Water logging: In flat lands, heavy rain may cause water logging. Based on how long the logged water takes to drain from the flat land surface, we have classified the duration of water logging into five categories. In the first category the logged water takes less than one day to drain from the ground surface, in the second 1-2 days, in the third 3-4 days, in the fourth 5-6 days and in the fifth more than one week. The reason for considering this parameter is that if the soil is more porous, rain water drains easily from the ground surface because it passes through the pores of the soil to recharge the aquifer. The aquifer is the underground layer of water-bearing permeable rock. We then have to estimate how many days the logged water takes to drain from the ground surface in the target area. If the drainage time falls in the first category, the possibility of ground water in that target area is 80%. In the second category it is 60%, in the third category 40% and in the fourth category 20%. If the drainage time falls in the fifth category, the possibility of ground water in that target area is less than 10%, or there may be no possibility of ground water at all.
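The category rules for Leaf Area Index, rainfall and water logging can be sketched in the same spirit. The Python code below is illustrative only; all names are hypothetical. The published ranges leave small gaps (for example between 60 and 61% LAI, or between 6 days and one week), so the handling of such in-between values here is an assumption, as is the use of 0.10 to stand for "less than 10%".

```python
# Possibility of ground water per category; 0.10 stands for "less than 10%" (assumption).
LIKELIHOOD = {1: 0.80, 2: 0.60, 3: 0.40, 4: 0.20, 5: 0.10}


def lai_category(lai_percent):
    """Leaf Area Index: higher values fall in better (lower-numbered) categories."""
    if lai_percent > 80:
        return 1
    if lai_percent > 60:   # text: 61-80 %; values in (60, 61) assigned here by assumption
        return 2
    if lai_percent > 40:   # text: 41-60 %
        return 3
    if lai_percent >= 20:  # text: 20-40 %
        return 4
    return 5


def rainfall_category(rainfall_cm_per_year):
    """Rainfall in cm per year: higher values fall in better categories."""
    if rainfall_cm_per_year > 800:
        return 1
    if rainfall_cm_per_year > 500:  # text: 501-800 cm
        return 2
    if rainfall_cm_per_year > 100:  # text: 101-500 cm
        return 3
    if rainfall_cm_per_year >= 40:  # text: 40-100 cm
        return 4
    return 5


def waterlogging_category(drain_days):
    """Days for logged water to drain: shorter times fall in better categories."""
    if drain_days < 1:
        return 1
    if drain_days <= 2:
        return 2
    if drain_days <= 4:
        return 3
    if drain_days <= 6:
        return 4
    return 5  # text: more than one week; 6-7 days assigned here by assumption


def site_likelihoods(lai_percent, rainfall_cm_per_year, drain_days):
    """Return the per-parameter possibility of ground water for one target area."""
    return {
        "leaf_area_index": LIKELIHOOD[lai_category(lai_percent)],
        "rainfall": LIKELIHOOD[rainfall_category(rainfall_cm_per_year)],
        "water_logging": LIKELIHOOD[waterlogging_category(drain_days)],
    }
```

Each parameter is scored independently, which mirrors the text; how the individual scores are combined is left to the identification system described in the following sections.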

System for ground water identification:
This section presents the ground water identification system developed based on Principal Component Analysis (PCA). The block diagram given in Fig. 2 shows the system architecture for ground water identification. First, we collect the attributes for ground water identification, prepare the data and store it in the ground water identification database. The data is then normalized using histogram equalization, a widely applied method for confining data values to a particular range. Subsequently, the normalized data is converted into a feature space that captures the significance of the data.
Finally, the ANN/SVM is trained on the data in feature space to identify the potential zone of ground water. The proposed system for ground water identification consists of three important phases:
• Data preprocessing
• Feature extraction
• Identification using AI techniques

Data preprocessing: Data preprocessing is important for any identification system because it prepares the data for the subsequent steps. Here, we identify some of the attributes for identifying ground water and then apply the following preprocessing steps. (a) Data transformation: First, the input data is converted into an M×N format in which every row represents a location and every column represents an attribute taken for the identification of ground water. (b) Normalization using Histogram: After converting the data into this format, we perform a normalization procedure that confines the data values to a particular range. The normalization procedure can help to improve performance by reducing the time complexity. In data normalization, the input data is represented as a histogram in which the distribution of the data elements is given by their frequency. Once the frequency histogram is computed, the new data value is computed through general histogram equalization (Garg et al., 2011; Ahmed, 2011; Ravichandran, 2012). The new data value is then assigned to the corresponding data observation in the database.
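The two preprocessing steps can be sketched as follows. This is a minimal NumPy sketch under stated assumptions rather than the implementation used in the study: each attribute column of the M×N matrix is equalized independently, the number of histogram bins (16) is an arbitrary choice, and the function names and sample values are hypothetical.

```python
import numpy as np


def equalize_column(values, n_bins=16):
    """Map one attribute column to (0, 1] through its cumulative frequency histogram."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=n_bins)
    cdf = np.cumsum(counts) / counts.sum()  # cumulative relative frequency per bin
    # Bin index of every value; the interior edges split the range into n_bins bins.
    bin_index = np.digitize(values, edges[1:-1])
    return cdf[bin_index]


def normalize(data, n_bins=16):
    """Apply histogram equalization to every column of an M x N data matrix."""
    data = np.asarray(data, dtype=float)
    return np.column_stack(
        [equalize_column(data[:, j], n_bins) for j in range(data.shape[1])]
    )


# Hypothetical M x N input: one row per location, one column per attribute
# (distance to valley in km, Leaf Area Index in %).
raw = np.array([
    [0.5, 90.0],
    [2.5, 70.0],
    [12.0, 10.0],
    [4.5, 50.0],
])
normalized = normalize(raw)
```

After equalization every column lies in the same range regardless of its original units, and the order of the values within a column is preserved.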
Feature extraction using PCA: The second step of our technique is to find the feature space of the data, in which identification is easier than in the data space. Here, we use Principal Component Analysis (PCA) (Kargupta et al., 2001; Sharma et al., 2006), which is mathematically described as an orthogonal linear transformation that maps the data to a new coordinate system. It transforms several possibly correlated variables into fewer uncorrelated variables, called principal components. PCA is thus a statistical technique that determines the key variables in a high-dimensional data set that describe the variation in the observations, and it can also be used to facilitate the analysis and visualization of high-dimensional data sets without considerable loss of information. Theoretically, a principal component can be described as a linear combination of optimally weighted observed variables that maximizes the variance of the combination and has zero covariance with the preceding PCs. PCs are calculated by eigenvalue decomposition of the data covariance matrix or singular value decomposition of the data matrix, usually after centering the data for each attribute. The following are the steps to obtain the principal component vector from the input data matrix:
• Find the mean value of every column vector
• Subtract the mean of every column vector from the corresponding data values
• Compute the covariance matrix of the output obtained from the previous step
• Compute the eigenvectors and eigenvalues of the covariance matrix
• Select components from the eigenvectors of the covariance matrix to form a feature vector
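The steps listed above can be sketched with NumPy as follows. This is a minimal sketch, not the study's code: the function name `pca_features` is hypothetical, and the rule of keeping the eigenvectors with the largest eigenvalues, as well as the number of components kept, are assumptions since the text does not state the selection criterion. The sketch also assumes at least two attribute columns.

```python
import numpy as np


def pca_features(data, n_components):
    """Project an M x N data matrix onto its leading principal components."""
    data = np.asarray(data, dtype=float)

    # Steps 1 and 2: find the mean of every column and subtract it.
    mean = data.mean(axis=0)
    centered = data - mean

    # Step 3: covariance matrix of the centered data (columns are attributes).
    covariance = np.cov(centered, rowvar=False)

    # Step 4: eigenvalues and eigenvectors of the symmetric covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)

    # Step 5: keep the eigenvectors with the largest eigenvalues (assumed rule).
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:n_components]]

    features = centered @ components
    return features, components, eigenvalues[order]
```

With five attributes per location, `pca_features(data, 2)` returns an M×2 feature matrix whose columns are uncorrelated and whose variances equal the two largest eigenvalues.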

Identification of ground water using AI techniques:
The feature vector obtained from the previous step is then given to the ANN for training. The two important phases of the neural network are described as follows.
Training phase: A multi-layer perceptron feed-forward neural network with the backpropagation algorithm is used as the learning mechanism. The backpropagation algorithm can be used successfully to train neural networks. Here, the feature vector is given to the input layer and the target output is zero or one, signifying whether or not the corresponding location has ground water.
Testing phase: For the target location, the above-mentioned attributes are identified and the presence of ground water is obtained from the trained neural network.

Neural network: A mathematical model inspired by the structure and functional features of biological neural networks is called an artificial neural network (Cybenko, 1989; Hornik, 1991; Wan, 1990; Bayati et al., 2009), or simply a neural network. The general architecture of the neural network is given in Fig. 3. A neural network contains an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases, an artificial neural network is an adaptive system that can alter its structure based on external or internal information that flows through the network during the learning phase. Modern neural networks are non-linear statistical data modeling tools. Neural networks are commonly used to model complex relationships between inputs and outputs or to find patterns in data.
A neural network is generally defined by three types of parameters:
• The interconnection pattern between the different layers of neurons
• The learning method for updating the weights of the interconnections
• The activation function that converts a neuron's weighted input to its output activation
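To make the training and testing phases concrete, the following NumPy sketch trains a small feed-forward network with backpropagation on a binary target. It is a sketch under assumptions, not the network used in the study: one hidden layer of four sigmoid units, a cross-entropy loss, the learning rate, the number of epochs and the synthetic data are all hypothetical choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(features, targets, n_hidden=4, learning_rate=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer perceptron with full-batch backpropagation."""
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = features.shape
    w1 = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = targets.reshape(-1, 1).astype(float)

    for _ in range(epochs):
        # Forward pass through the hidden and output layers.
        hidden = sigmoid(features @ w1 + b1)
        output = sigmoid(hidden @ w2 + b2)

        # Backward pass: with a sigmoid output and cross-entropy loss,
        # the output error term is simply (output - target).
        delta_out = (output - y) / n_samples
        delta_hidden = (delta_out @ w2.T) * hidden * (1.0 - hidden)

        # Gradient descent update of the interconnection weights.
        w2 -= learning_rate * hidden.T @ delta_out
        b2 -= learning_rate * delta_out.sum(axis=0)
        w1 -= learning_rate * features.T @ delta_hidden
        b1 -= learning_rate * delta_hidden.sum(axis=0)
    return w1, b1, w2, b2


def predict(features, params):
    """Return 1 where the network indicates ground water, otherwise 0."""
    w1, b1, w2, b2 = params
    hidden = sigmoid(features @ w1 + b1)
    return (sigmoid(hidden @ w2 + b2) >= 0.5).astype(int).ravel()


# Hypothetical synthetic feature vectors with a zero/one target.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 3))
targets = (features[:, 0] + features[:, 1] > 0).astype(int)

params = train_mlp(features, targets)
accuracy = (predict(features, params) == targets).mean()
```

The three defining elements named above appear directly in the code: the layer interconnections are the weight matrices `w1` and `w2`, the learning method is the gradient update inside the loop, and the activation function is `sigmoid`.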

Support Vector Machine (SVM):
The identification of ground water can also be done using an SVM that takes the feature vector of the previous step. In the training step, the SVM is trained using the feature vector, and the presence of ground water is identified when the corresponding attributes are given as input. The Support Vector Machine (SVM) is a classification system based on statistical learning (Cristianini and John, 2000; Christopher, 1998). An SVM separates the classes with a decision surface that maximizes the margin between the classes. This surface is generally called the optimal hyperplane and the data points closest to the optimal hyperplane are called support vectors. These support vectors are the critical elements of the training set. Some variations of SVM are: (1) by employing nonlinear kernels, the SVM can be turned into a nonlinear classifier and (2) by combining a large number of binary SVM classifiers, a multiclass classifier can be formed. The pairwise classification strategy is regularly used for multiclass classification. The result of the SVM classification is the decision values of each pixel for every class, which are used for probability estimates.
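The margin-maximizing idea can be illustrated with a small linear SVM. Note that this sketch swaps in a simpler technique than a dedicated SVM solver: it minimizes the soft-margin hinge loss with plain sub-gradient descent. The function names, the regularization strength, the learning rate and the synthetic two-cluster data are hypothetical.

```python
import numpy as np


def train_linear_svm(samples, labels, regularization=0.01, learning_rate=0.1, epochs=500):
    """Fit w and b by sub-gradient descent on the soft-margin hinge loss.

    labels must be +1 or -1. The objective is
    (regularization / 2) * ||w||^2 + mean(max(0, 1 - label * (w . sample + b))).
    """
    n_samples, n_features = samples.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = labels * (samples @ w + b)
        violating = margins < 1.0  # samples inside the margin or misclassified
        grad_w = regularization * w - (
            labels[violating][:, None] * samples[violating]
        ).sum(axis=0) / n_samples
        grad_b = -labels[violating].sum() / n_samples
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def classify(samples, w, b):
    """Return +1 or -1 depending on the side of the hyperplane (0 if exactly on it)."""
    return np.sign(samples @ w + b)


# Hypothetical synthetic data: two well-separated clusters in two dimensions.
rng = np.random.default_rng(0)
positive = rng.normal(loc=2.0, size=(50, 2))
negative = rng.normal(loc=-2.0, size=(50, 2))
samples = np.vstack([positive, negative])
labels = np.concatenate([np.ones(50), -np.ones(50)])

w, b = train_linear_svm(samples, labels)
accuracy = (classify(samples, w, b) == labels).mean()
```

Only the samples with a margin below one contribute to the gradient, which reflects the role of the support vectors: points far from the decision surface do not influence the final hyperplane.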
In the two-class scenario, a support vector classifier tries to find a hyperplane that maximizes the distance from the members of each class to the optimal hyperplane. The two-class classification problem can be stated as follows: suppose there are $M$ training samples given by the set of pairs $\{(y_i, x_i),\ i = 1, 2, \ldots, M\}$, where $y_i$ is a sample and $x_i \in \{-1, +1\}$ is its class label. The SVM algorithm establishes an optimum separating hyperplane such that the samples with labels $\pm 1$ lie on each side of the hyperplane and the distance from the closest vectors on each side to the hyperplane is maximal. These closest vectors are called support vectors and the distance is the optimal margin. The hyperplane is given by the equation $\langle w, y \rangle + b = 0$, where $(w, b)$ are the parameters of the hyperplane. Vectors that do not lie on this hyperplane give $\langle w, y \rangle + b \neq 0$, which allows the classifier to be defined as $f(y; \alpha) = \operatorname{sgn}(\langle w, y \rangle + b)$. The support vectors lie on the two hyperplanes parallel to the optimal hyperplane, with equation $\langle w, y \rangle + b = \pm 1$. Maximizing the margin subject to the equations of the two support vector hyperplanes leads to the following constrained optimization problem:

\[ \min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad x_i \left( \langle w, y_i \rangle + b \right) \geq 1, \quad i = 1, 2, \ldots, M \]

Experimentation: This section presents the experimentation of the ground water identification system developed based on PCA and SVM. The proposed system is implemented and its performance is analyzed in terms of accuracy and computation time. We synthesize the input data and divide it into two sets, namely training data and testing data. The training data is given to the ANN as well as the SVM for training, and the testing data is used to check the performance of the proposed system. Accordingly, the accuracy and computation time are computed for the testing data and plotted in the graphs shown in Fig. 5-8. Figure 5 presents the accuracy of the proposed system for various hidden layer configurations.
From the figure, we can say that the performance does not vary even when the hidden layer or the attributes are varied. Fig. 6 presents the computation time of the identification system. The computation time is also independent of the attributes and the hidden layer. From Fig. 7, the accuracy of the proposed system is constant for various attribute sizes and the two different functions. However, the time varies across the functions, as shown in Fig. 8. In conclusion, the performance of the neural network is significantly better than that of the SVM.

CONCLUSION
In this study, we have chosen five parameters for the identification of ground water and developed a system using principal component analysis and a neural network. Initially, the input data was normalized using histogram equalization and the features were extracted with the help of PCA. Subsequently, the neural network was trained on the feature vectors to find the presence of ground water. Finally, the experimentation was carried out using synthetic data to evaluate the performance of the proposed system. In future, the experimentation will be performed using real data to establish the significance of the proposed system.