Data Relay Clustering Algorithm for Wireless Sensor Networks: A Data Mining Approach

Problem statement: Sensors are essential today for monitoring environments that humans cannot visit often. Wireless Sensor Networks (WSNs) are used in many real-world applications such as environmental monitoring, traffic control and trajectory monitoring. A sensor network must sense and collect a large amount of data that is continuous over time and forward it to the sink for further decision making, which is challenging. Clustering of sensory data is a core task of data mining. Clustering in a WSN involves selecting cluster heads and assigning cluster members (sensors) to them for efficient data relay. Constraints on power supply, communication bandwidth and storage resources are the major challenges facing WSNs today. Conclusion: This study presents the K-Means Data Relay (K-MDR) clustering algorithm for grouping sensor nodes, thereby reducing the number of nodes that transmit data to the sink node; this reduces the communication overhead and in turn increases network performance. Furthermore, the Conserve and Observe Modes (COM) algorithm reduces the number of active nodes within each cluster without compromising its coverage region. The contribution of K-MDR is to reduce power consumption, and the simulation results show that the algorithm is also time efficient.


INTRODUCTION
Advances in wireless communications have made it possible to develop tiny hardware components into multifunctional, intelligent sensor nodes whose major advantages are low power and low cost. These devices are small and usually communicate over short distances on a radio-frequency channel. The components of these tiny nodes sense, process and communicate data, and together realize the objectives of wireless sensor networks (Taherkordi et al., 2008).
Compared with traditional sensors, wireless sensor networks promise significant improvements. A wireless sensor network is formed by a large number of integrated sensor nodes that are densely deployed either inside the observed phenomenon or very close to it. The nodes cooperate with each other through a wireless network to gather environmental information or to react to particular events. Classical applications of sensor networks include monitoring of medical data, weather monitoring, object tracking, vehicle monitoring and battlefield surveillance (Ilyas and Mahgoub, 2005). The majority of sensor network applications fall into the querying class of applications, which must continuously collect and integrate data for later analysis and mining. The distinctive characteristics of WSNs lead to new research challenges in several data mining processes. WSNs face severe resource constraints in communication bandwidth, power supply, storage and processor capacity (Ma et al., 2005), whereas traditional mining techniques are normally centralized, computationally expensive and focused on disk-resident data. In data mining, grouping similar data is known as clustering, which is a preparatory step for further data analysis. In this study, a new distributed K-means clustering algorithm is proposed for clustering the sensor nodes of a WSN. The nodes within a cluster forward their data to the sink through the cluster head, where the data is aggregated.
Cluster analysis in data mining: Data clustering is the process of grouping data objects into clusters according to their similarities and functionalities. Clustering makes it much easier to collect and process the data sensed from the environment. Dissimilar sensors are placed in different clusters. The differences between clusters depend on the characteristics of each cluster and are measured by distance functions such as the Manhattan distance and the Euclidean distance (Han et al., 2011; Forman and Zhang, 2000). A number of data clustering algorithms have been developed in the past. Many of them are designed for data stored in a traditional database management system, but a WSN is a distributed environment, so the data to be analyzed is distributed across multiple sites. A distributed environment poses many challenges for data analysis because of privacy and limited bandwidth (Silva et al., 2005). Many algorithms have been proposed for cluster analysis in distributed environments. Silva et al. (2005) group a number of distributed algorithms according to the rounds of message passing between the data sites and a central site. A wireless sensor network is made up of low-power sensors in a distributed environment. Therefore, these challenges, together with the sensors' constraints on communication and computation, should be addressed when designing a data clustering algorithm.
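As a small illustration of the two distance functions named above, the following sketch computes the Manhattan and Euclidean distances between two sensor readings. The readings and the function names are hypothetical and chosen only for this example.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences between two readings."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance between two readings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical readings, e.g. (temperature, humidity).
r1 = (20.0, 55.0)
r2 = (23.0, 51.0)

print(manhattan(r1, r2))  # 7.0
print(euclidean(r1, r2))  # 5.0
```

Readings that differ strongly in a single attribute are separated more by the Manhattan distance than by the Euclidean distance, so the choice of function can change which cluster a sensor joins.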
Cluster support sensor networks: The energy limitations of a sensor network make it hard to manage sensor energy, and conserving power is a crucial factor in attaining a prolonged network lifetime. Many studies on sensor energy consumption have adopted the clustering of sensors into groups as an efficient and adaptive approach to node communication and routing (Kim et al., 2005).
A cluster-based mechanism (Ghiasi et al., 2002) is employed for node communication and routing. In a clustered WSN, the sensors communicate data only to their cluster head. The cluster heads then transfer the aggregated data to the processing center or base station, which is called the sink, as shown in Fig. 1. The base station is a specialized device or one of the sensors. According to Bandyopadhyay and Coyle (2003), the essential operation is to select a set of cluster heads among the nodes in the network and to cluster the other nodes around these heads. The cluster heads are selected according to negotiated rules; normally the more powerful nodes in the topology play the role of cluster head, and the other nodes forward their sensed data to them. Communication is the main factor to be considered in the energy consumption of sensors. Ghiasi et al. (2002) note that the amount of energy used depends on the distance the data has to travel between the sender and the receiver. Since the sensors in a clustered sensor network communicate data only to cluster heads over smaller distances, the total energy spent on communication is much lower than in the situation in which every sensor communicates directly with the base station. Lee et al. (2004) report that many heuristic algorithms have been proposed for choosing cluster heads. Moreover, all of these approaches attempt to estimate the optimal number of cluster heads. Kim et al. (2005) identify balancing the energy consumption over the entire network as the objective of a WSN.
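To illustrate why relaying through a cluster head lowers the communication cost, the following minimal sketch compares direct transmission with clustered transmission. It assumes, purely for illustration, that the energy of one transmission grows with the square of the distance and that the cluster head forwards a single aggregated packet to the sink. The coordinates, the function name tx_cost and the cost model are hypothetical and are not taken from the cited studies.

```python
def tx_cost(a, b):
    """Assumed cost model: energy units equal to the squared distance."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

sink = (0, 0)
head = (100, 0)                                     # hypothetical cluster head
members = [(90, 0), (110, 0), (100, 10), (100, -10)]

# Every node (members and head) sends its own packet straight to the sink.
direct = sum(tx_cost(n, sink) for n in members + [head])

# Members send to the head; the head sends one aggregated packet to the sink.
clustered = sum(tx_cost(m, head) for m in members) + tx_cost(head, sink)

print(direct)     # 50400
print(clustered)  # 10400
```

Under these assumptions the saving comes from replacing several long-range transmissions with short intra-cluster hops and one long-range transmission.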
Accordingly, LEACH, ASCENT, SPAN, GAF, ACE and HEED all attempt to save and balance the energy dissipation of the network using cluster-based architectures, thereby extending the lifetime of the entire network (Lee et al., 2004).

Problem definition:
Consider N sensor nodes dispersed uniformly in a field to detect D attributes. We assume that these sensors are clustered into groups around predetermined cluster heads, resulting in a clustered sensor network. The goal of the proposed approach is to develop an algorithm that clusters the sensor nodes for data relay without compromising the coverage area. Most existing algorithms do not address convergence time and computation time.
Proposed work: The following initial assumptions are made:
• The network is homogeneous
• The cluster heads are responsible for performing computations and for long-range transmissions to the sink
• Every sensor node transfers data directly only to other nodes in its cluster
• The number of iterations in K-means is controlled by the sum of mean square error
The K-means algorithm is generally used to group the sensor nodes for efficient data relay. First, k random points are chosen as the cluster heads by the sink node. The main drawback of this algorithm is that the chosen cluster heads may not be able to fulfil their assigned role as representatives. Moreover, in smaller networks, clustering into the predefined number k incurs a high cost even though the number of clusters could be smaller. The proposed K-MDR algorithm conserves energy by limiting the clustering factors in a number of ways, thus improving efficiency.
K-MDR Algorithm
1: Select k random points as chj (j = 1, 2, 3, ..., k)
2: repeat
3:   for all si ∈ S do /*assign each sensor to clusters*/
4:     Qi ← Ch /*candidate clusters*/
5:     prune candidate representatives from Ch = {ch1, ch2, ch3, ..., chk}
6:     for all chj ∈ Ch do
7:       if MDRi ⊆ V(chj) then
8:         Qi ← {chj} /*the one and only candidate*/
9:     if |Qi| = 1 then /*only one candidate remains*/
10:      f(i) ← j where chj ∈ Qi
11:    else
12:      for all chj ∈ Qi do /*remaining candidates*/
13:        Compute ED(si, chj)
14:      f(i) ← argmin j: chj ∈ Qi {ED(si, chj)}
15:  for all j = 1, ..., k do /*readjust cluster representatives*/
16:    chj ← centroid of {si ∈ S | f(i) = j}
17: until Ch and f become stable
To improve the performance of the K-means algorithm, the time spent on expected distance (ED) calculations needs to be reduced, because it dominates the execution time of the algorithm; in this step the computation cost is taken to be much higher than the transmission cost. One way to achieve this is to avoid ED computations whenever possible. For a given sensor si, the set Qi holds the candidate cluster representatives that are likely to be the closest to si. Initially, Qi = Ch. A pruning algorithm then removes from Qi the candidate representatives that are guaranteed not to be the nearest to sensor si. If only one candidate remains in Qi, sensor si is assigned to that cluster. If not, the expected distances between si and the remaining candidates in Qi are calculated, and the sensor is assigned to the cluster with the smallest expected distance. A good pruning algorithm should reduce Qi to a very small cardinality, so that the number of ED calculations that must be carried out in line 13 is as small as possible. In this way K-MDR reduces the number of ED calculations.
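The assignment and readjustment loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Euclidean distance stands in for ED, the loop stops when the sum of squared errors no longer decreases, and the pruning test of line 7 is not reproduced. Instead, an optional prune callback (a hypothetical hook) may return the indices of the candidate heads Qi for a sensor; when it is omitted, all k heads remain candidates and the procedure reduces to plain K-means. The names kmeans_relay, initial_heads and tol are likewise introduced only for this example.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance, used here as a stand-in for ED(si, chj)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans_relay(sensors, k, initial_heads=None, prune=None,
                 tol=1e-9, max_iter=100):
    # Line 1: choose k starting representatives (random unless supplied).
    heads = list(initial_heads) if initial_heads else random.sample(sensors, k)
    assign = [0] * len(sensors)
    prev_sse = float("inf")
    for _ in range(max_iter):
        # Lines 3-14: assign each sensor to its nearest candidate head.
        for i, s in enumerate(sensors):
            # prune is a hypothetical hook returning a non-empty list of
            # head indices; without it every head is a candidate.
            candidates = prune(s, heads) if prune else range(k)
            assign[i] = min(candidates, key=lambda j: euclidean(s, heads[j]))
        # Lines 15-16: move each head to the centroid of its members.
        for j in range(k):
            members = [s for s, a in zip(sensors, assign) if a == j]
            if members:
                heads[j] = centroid(members)
        # Stop when the sum of squared errors has stopped improving.
        sse = sum(euclidean(s, heads[a]) ** 2 for s, a in zip(sensors, assign))
        if prev_sse - sse <= tol:
            break
        prev_sse = sse
    return heads, assign

sensors = [(0, 0), (0, 2), (2, 0), (2, 2),
           (10, 10), (10, 12), (12, 10), (12, 12)]
heads, assign = kmeans_relay(sensors, 2, initial_heads=[(0, 0), (12, 12)])
print(assign)  # [0, 0, 0, 0, 1, 1, 1, 1]
print(heads)   # [(1.0, 1.0), (11.0, 11.0)]
```

The starting heads are passed explicitly in the usage example because a random start can settle in a poorer grouping; the fewer candidates a pruning hook returns, the fewer distance evaluations the assignment step performs.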
The clustering method above can still consume a considerable amount of energy at individual sensors in the network. A second algorithm, COM, is therefore used to extend the lifetime of the entire network without compromising the coverage area within each cluster, by minimizing the number of nodes that sense and forward data.
COM Algorithm for sensor activation: After the first iterative step, a selected subset of nodes enters the observe (monitoring) mode while the remaining sensor nodes go to conserve mode. The selected observe nodes provide full coverage of the monitored field during this iteration. In COM, each sensor node assigns itself an activation delay that is proportional to its cost function C(s) = 1/E(s). When the active sensors for the next iteration are chosen, the node with the smallest cost has the highest chance of becoming active. Every sensor node waits for its delay before deciding whether to stay in observe mode in the next communication round. While a sensor node waits for its observe mode delay to expire, it can receive observe mode messages from neighboring nodes that have smaller activation delays (smaller cost) and have decided to become active during the upcoming iteration. Once its delay expires, the sensor node checks whether the neighboring nodes entirely cover its region for the monitoring process. If they do not, the sensor node broadcasts an observe mode message to its neighbors announcing its decision to stay active; otherwise it goes to conserve mode. All sensor nodes in the network take part in the activation phase together, regardless of the cluster to which they belong. This eliminates the redundant activation of sensor nodes on the borders of the clusters, which may happen when nodes are activated in each cluster independently.
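The activation phase described above can be sketched as a centralized simulation. This is a simplified illustration under several assumptions that are not stated in the text: E(s) is taken to be a positive energy value per node, every node has the same circular sensing region of a given radius, a node's region is approximated by sample points on a square grid, and processing the nodes in order of increasing cost stands in for the expiry of their activation delays. The names com_select, sample_region and is_covered are hypothetical.

```python
import math

def sample_region(node, radius, step):
    """Grid points approximating the node's circular sensing region."""
    x0, y0 = node
    n = int(radius / step)
    points = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = i * step, j * step
            if dx * dx + dy * dy <= radius * radius:
                points.append((x0 + dx, y0 + dy))
    return points

def is_covered(point, observers, radius):
    """True if at least one observing node senses the point."""
    return any(math.dist(point, o) <= radius for o in observers)

def com_select(nodes, energy, radius, step=1.0):
    """Return the indices of the nodes that stay in observe mode."""
    # Cost C(s) = 1/E(s): a smaller cost means a shorter delay,
    # so that node decides first.
    order = sorted(range(len(nodes)), key=lambda i: 1.0 / energy[i])
    active = []
    for i in order:
        observers = [nodes[a] for a in active]
        region = sample_region(nodes[i], radius, step)
        # Stay active only if some part of the region is still uncovered.
        if not all(is_covered(p, observers, radius) for p in region):
            active.append(i)
    return sorted(active)

nodes = [(0, 0), (0, 0), (10, 0)]   # two co-located nodes and a distant one
energy = [1.0, 2.0, 0.5]            # hypothetical E(s) values
print(com_select(nodes, energy, radius=2))  # [1, 2]
```

In the usage example, node 1 has the smallest cost and becomes active first; node 0 shares its region and goes to conserve mode, while node 2 covers an area no other node senses and therefore stays active.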
Experimental Result: The simulation of the proposed work was carried out with MATLAB. A comparison between the K-MDR and K-means algorithms is given in Fig. 2. The results indicate that K-MDR with COM is an effective and efficient algorithm for node clustering (Fig. 3). Furthermore, an optimal number of nodes participated in each iteration within the clusters without compromising network coverage.