A REVIEW OF PEER-TO-PEER BOTNET DETECTION TECHNIQUES

Peer-to-Peer (P2P) technology has come into extensive use in recent years. Botnets have exploited this technology efficiently, giving rise to the P2P botnet, which uses a P2P network for remote control of its bots and has become one of the most significant threats to computer networks. P2P botnets are used to launch Distributed Denial of Service (DDoS) attacks, generate spam, commit click fraud and steal sensitive information. Compared with traditional botnets, P2P botnets are harder to defend against and to hijack. In this study we discuss various P2P botnet detection approaches, evaluate their effectiveness and identify the advantages and shortcomings of each technique. This can give researchers a better understanding of P2P botnets and make it easier for them to develop more effective detection techniques. Our evaluation shows that each technique has its own advantages and limitations, so two or more detection techniques may need to be combined to achieve robust P2P botnet detection.


INTRODUCTION
A botnet is a network of infected computers (bots) running malicious software, usually installed through attack techniques such as worms, Trojan horses and viruses. Each bot is remotely controlled by an attacker (the botmaster). Bots respond to the botmaster's orders and carry out malicious activities such as email spam, key logging, password cracking and Distributed Denial of Service (DDoS) attacks.
Botnets are generally built on a centralized architecture, in which the botmaster commands and controls the compromised computers through a command and control (C&C) server. This server is a central point of failure: if the C&C server is tracked down, the entire botnet can easily be detected and shut down.
To avoid the weakness of the centralized architecture, botnets imitate the architecture of Peer-to-Peer (P2P) networks and adopt a P2P control mechanism to increase their stability. In a P2P network there is no centralized node for command and control. Each node acts as both a client and a server, so even if a node is taken offline by defenders, the botnet remains under the control of other nodes (Ping et al., 2010). Compared with traditional botnets, P2P botnets are harder to hijack and to defend against. Figure 1 shows how a P2P botnet works.

P2P Botnet Analysis
The botnet lifecycle has four phases: formation, C&C, attack and post-attack (Leonard et al., 2009). In the first phase, formation, the botmaster infects other computers on the Internet to form a botnet. One way of forming a P2P botnet is to use the indexes of a P2P file sharing system to connect bots to each other, which lets nodes learn the IP addresses and port numbers of other nodes. A new bot receives an index from the spreading nodes and then tries to contact the bots whose IP addresses are included in the index. This process of building a P2P botnet is called bootstrapping, and botnets built by this method are called index-based botnets.

Fig. 1. P2P botnet operation
After the botnet is built, all bots should be ready to communicate with their botmaster for further instructions, such as starting an attack or performing an update. This takes place in the C&C phase, which is the most important part of the botnet because it defines the network topology and its strength against defenses. P2P botnets also use P2P traffic indexes to send commands. The C&C phase includes two mechanisms: pull and push. In the pull mechanism, bots retrieve commands from the botmaster. This is commonly used in centralized botnets; in a P2P botnet, a peer sends a query message for the needed file and the message is passed around according to the routing algorithm of the system. The search for the desired file continues until a peer receives the query message and returns it with the command encoded, or until the query message expires. In the push mechanism, bots wait passively for commands and forward them to other bots.
According to the instructions, bots carry out malicious activities during the attack phase. After the attack, if some bots have been detected and stopped, the botmaster will plan to build a new botnet.
P2P bots can spread very quickly in a P2P network because of the huge popularity of P2P file sharing systems. Moreover, their traffic can blend in completely with regular P2P traffic, which makes them more difficult to detect.

P2P Botnet Detection
To protect networks against bots, the bots must be stopped from spreading. This is not easy with a P2P botnet, since there is no central point at which to detect and stop it. Researchers are therefore working on methods to detect botnet communication, in order to prevent the bots from forming a new botnet or launching an attack.
There are two main approaches to botnet defense: the first is analyzing the network traffic and the second is using honeypots (Zhaosheng et al., 2008).
Analyzing the network traffic helps to identify existing botnets in a network, collect their characteristics and behaviors and build a common model from them, which defenders can then use to detect botnets. In this model, a botnet is identified by the presence of network anomalies such as high traffic volumes, high network latency and traffic on unusual ports. Using the common model, hosts that share similar communication patterns and similar malicious activity patterns can be identified. Although this approach is effective for detecting known botnets, it is less powerful in detecting new ones. Honeypots, on the other hand, are useful for analyzing the characteristics of new botnets but are not effective in detecting infected programs. Therefore, defenders tend to use both approaches together to detect botnets and identify their C&C mechanisms (Li et al., 2011).

P2P Botnet Detection Techniques
The P2P botnet is still an emerging technology; therefore most of the literature concerns centralized botnets. Recently, researchers have focused on analyzing and modeling P2P botnets (Grizzard et al., 2007). There have been some efforts to detect P2P botnets, but detection remains a great challenge.
The following subsections discuss most of the P2P botnet detection approaches proposed by researchers in recent years.

BotMiner
Gu et al. (2008) proposed a general botnet detection framework named BotMiner, intended for both centralized IRC botnets and P2P botnets. BotMiner assumes that bots are coordinated malware and therefore show the same communication patterns and malicious activities. The first stage of the framework clusters hosts in the network traffic that have similar malicious activities and similar communication patterns; the resulting clusters are called the A-plane (activity traffic) and the C-plane (C&C communication traffic), respectively. The second stage applies cross-correlation between the A-plane and C-plane clusters. Hosts that show both kinds of behavior are detected as bots.
Real network traffic was used to evaluate the proposed framework. The results show relatively high detection efficiency, with low numbers of false positives and false negatives. Furthermore, the framework required reasonable time and resources.
BotMiner has two main limitations. First, it targets a group of infected computers within a monitored network, whereas a monitored network may contain only a single compromised host that belongs to a larger botnet. In that case BotMiner is not effective in detecting the compromised host. Second, BotMiner assumes that every infected host can be classified systematically. In a P2P botnet, a bot may behave maliciously while exchanging C&C messages that look normal, so BotMiner will not consider it a bot. Under such a scenario, BotMiner may fail to detect bots that exchange covert C&C messages (Gu et al., 2008).
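The cross-correlation stage can be illustrated with a minimal sketch. It assumes the clustering has already been done and that each plane is simply a list of host sets; the function name `cross_plane_bots`, the `min_shared` parameter and the addresses are hypothetical, and the real BotMiner scoring is more elaborate than a plain set intersection.

```python
def cross_plane_bots(a_clusters, c_clusters, min_shared=2):
    """Flag hosts that sit together in an A-plane and a C-plane cluster.

    a_clusters / c_clusters: lists of sets of host addresses (assumed input).
    min_shared: assumed minimum number of hosts two clusters must share.
    """
    suspects = set()
    for a_cluster in a_clusters:
        for c_cluster in c_clusters:
            shared = set(a_cluster) & set(c_cluster)
            if len(shared) >= min_shared:
                suspects |= shared
    return suspects


# Made-up clusters: hosts grouped by malicious activity and by C&C-like traffic.
a_plane = [{"10.0.0.2", "10.0.0.3", "10.0.0.9"}, {"10.0.0.7"}]
c_plane = [{"10.0.0.2", "10.0.0.3", "10.0.0.5"}, {"10.0.0.7", "10.0.0.8"}]

print(sorted(cross_plane_bots(a_plane, c_plane)))
```

In this toy input the host 10.0.0.7 appears in both planes but has no companion in the monitored network, so it is missed when `min_shared` is 2. This mirrors the first limitation described above.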

Network Streams Analysis
As shown in Fig. 2, Liu et al. (2010) present a general P2P botnet detection framework that includes three main algorithms.
P2P node detection algorithm: P2P nodes are filtered out according to the burstiness (paroxysm) and distribution features of their network streams.
P2P node clustering algorithm: Nodes are clustered according to their connection characteristics. The research uses the K-means clustering algorithm, based on the connection degree between pairs of nodes.
Botnet behavior detection algorithm: By extracting the similarities in the malicious behaviors of the bots, which may occur several times a day, the algorithm can detect whether the P2P network is infected by bots.
Unlike other detection models, this model takes its test features from macroscopic statistics of the network streams, so it can effectively detect P2P botnets that use unknown protocols.
The authors ran a simulation of the three algorithms in a LAN environment and obtained good results in extracting P2P streams, clustering nodes and distinguishing botnets from the normal network (Liu et al., 2010).
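The clustering step can be sketched with a plain K-means (Lloyd's algorithm) in pure Python. The two features per node (number of distinct peers contacted and fraction of failed connections) and all values are assumptions made for illustration; the source only states that clustering is based on the connection degree between nodes.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical per-node features: (distinct peers contacted, failed-connection ratio).
nodes = [(3, 0.05), (40, 0.60), (4, 0.10), (38, 0.55), (5, 0.02), (42, 0.65)]
labels, centroids = kmeans(nodes, k=2)
print(labels)
```

Because the features are not scaled, the peer count dominates the distance here; a real deployment would likely normalize the features first.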

Multi-Phased Flow Model
P2P bots generate phased flows to connect with outside peers in order to construct the botnet. Based on this observation, the researchers proposed a multi-phased flow model to detect malicious traffic. The proposed model identifies a P2P botnet by observing similar flows between network hosts. The proposed system consists of three stages, shown in Fig. 3.
Flow grouping: The system groups the huge volume of traffic generated by P2P botnets by clustering TCP/UDP connections.
Flow compression: Information is extracted from each flow group value.
Flow modeling: The P2P flows are modeled using a matrix constructed from the transition information.
Finally, a likelihood ratio is computed from the probability-based models and used to detect bots.
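The modeling and likelihood-ratio steps can be sketched as a first-order transition model. This is an illustrative simplification, not the published model: the single-letter flow states ("S", "M", "L"), the add-one smoothing and the training sequences are all assumptions.

```python
import math
from collections import Counter


def transition_probs(sequences, states, smoothing=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    probs = {}
    for a in states:
        total = sum(counts[(a, b)] for b in states) + smoothing * len(states)
        for b in states:
            probs[(a, b)] = (counts[(a, b)] + smoothing) / total
    return probs


def log_likelihood(seq, probs):
    """Log-probability of the transitions in seq under one model."""
    return sum(math.log(probs[(a, b)]) for a, b in zip(seq, seq[1:]))


def log_likelihood_ratio(seq, bot_probs, benign_probs):
    """Positive values mean the bot model explains the sequence better."""
    return log_likelihood(seq, bot_probs) - log_likelihood(seq, benign_probs)


# Hypothetical compressed flow-group symbols: S(mall), M(edium), L(arge).
states = "SML"
bot_model = transition_probs(["SSMSSMSSM", "SMSSMSSMS"], states)
benign_model = transition_probs(["LLMLLLML", "MLLLMLLL"], states)

print(log_likelihood_ratio("SSMSSM", bot_model, benign_model) > 0)
print(log_likelihood_ratio("LLMLL", bot_model, benign_model) > 0)
```

A host would be reported when the ratio for its observed flow sequence exceeds a chosen cut-off; the cut-off itself is not given in the source.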

Node Behavior Detection
This research proposed a new method to detect P2P bots inside a LAN. It correlates each process name with its ports and its network traffic protocols. To evaluate the system on a real network, a dataset infected with the Storm bot was used. The research was conducted at Universiti Teknologi Malaysia (UTM), which operates the UTM-AntiBot system to monitor input and output flows and network communication. In this research UTM-AntiBot was used to observe the network traffic between the Internet and the internal hosts. After filtering out all the processes in the network traffic, PPNT correlates each process with its associated port and the connected IP address. The behavior of a normal user in a controlled LAN was examined first. The study concluded that a normal user cannot send thousands of SMTP packets in less than 10 minutes; therefore, when such SMTP traffic appears together with UDP packets on a fixed port, it confirms that the host is part of the Storm botnet. The experimental results show an acceptable but not high detection rate (Rostami et al., 2011).
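The decision rule described above can be written as a small sketch. The record layout `(timestamp, process, protocol, port)`, the threshold of 1000 SMTP packets, the process names and the port numbers are all assumptions chosen for illustration; the source only speaks of "thousands" of SMTP packets in under 10 minutes and UDP traffic on a fixed port.

```python
from collections import defaultdict


def max_in_window(timestamps, window):
    """Largest number of events that fall inside any span shorter than `window`."""
    ordered = sorted(timestamps)
    best = start = 0
    for end, t in enumerate(ordered):
        while t - ordered[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def flag_processes(records, window=600, smtp_threshold=1000):
    """Flag processes with an SMTP burst and UDP traffic on one fixed port."""
    smtp_times = defaultdict(list)
    udp_ports = defaultdict(set)
    for ts, process, protocol, port in records:
        if protocol == "SMTP":
            smtp_times[process].append(ts)
        elif protocol == "UDP":
            udp_ports[process].add(port)
    flagged = set()
    for process, times in smtp_times.items():
        burst = max_in_window(times, window) >= smtp_threshold
        fixed_udp = len(udp_ports.get(process, set())) == 1
        if burst and fixed_udp:
            flagged.add(process)
    return flagged


# Made-up capture: one process bursts SMTP and uses one UDP port, one does not.
records = [(i // 2, "svc.exe", "SMTP", 25) for i in range(1200)]
records += [(i * 10, "svc.exe", "UDP", 7871) for i in range(50)]
records += [(i * 180, "mail.exe", "SMTP", 25) for i in range(20)]
records += [(i, "mail.exe", "UDP", 50000 + i) for i in range(5)]

print(flag_processes(records))
```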

Entropy Theory Detection
Kang and Zhang (2009) propose a new detection method that applies information entropy theory within multi-chart CUSUM detection.
The malicious activities of the Storm botnet appear as abnormalities in network flows on the CUSUM chart. The researchers collected these abnormalities and transformed them into proportions, then integrated the UDP packet characteristics into the data flow entropy. The resulting data serve as the detection input factors of the multi-chart non-parametric CUSUM algorithm.
The algorithm steps are as follows:
• Take the data from the monitoring device and convert it into the proportions C_UDP, C_ICMP and C_SMTP.
• Combine the inputs into a detection value D using the weight values α, β, γ and η, whose sum α + β + γ + η is fixed.
• If D > K, where K is a network constant, the traffic is judged abnormal and a botnet is considered to exist; otherwise it is not.
The proposed method was evaluated on an experimental network platform consisting of a protected network in which several computers are connected to a firewall through a hub. One host logged the network traffic with Wireshark, and some of the hosts acted as Storm bots. The results show that entropy theory has its own advantages in detecting P2P botnets (Kang and Zhang, 2009).
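The two building blocks named above, Shannon entropy and a non-parametric CUSUM chart, can be sketched generically. This is not the authors' exact statistic: the baseline, slack and threshold values, the sample series and the rule "alarm if any chart alarms" are assumptions used only to show how several charts can run side by side.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cusum_alarms(series, baseline, slack, threshold):
    """One-sided CUSUM: S_n = max(0, S_{n-1} + x_n - baseline - slack)."""
    s, alarms = 0.0, []
    for i, x in enumerate(series):
        s = max(0.0, s + x - baseline - slack)
        if s > threshold:
            alarms.append(i)
    return alarms


def multi_chart_first_alarm(charts, baseline, slack, threshold):
    """Run one chart per traffic proportion; report the earliest alarm index."""
    firsts = []
    for series in charts.values():
        alarms = cusum_alarms(series, baseline, slack, threshold)
        if alarms:
            firsts.append(alarms[0])
    return min(firsts) if firsts else None


# Made-up per-interval traffic proportions for three protocols.
charts = {
    "udp": [0.10, 0.12, 0.11, 0.10, 0.45, 0.50, 0.55, 0.52],
    "icmp": [0.02, 0.03, 0.02, 0.02, 0.03, 0.02, 0.03, 0.02],
    "smtp": [0.05, 0.04, 0.05, 0.05, 0.06, 0.30, 0.35, 0.40],
}
print(cusum_alarms(charts["udp"], baseline=0.11, slack=0.05, threshold=0.5))
print(multi_chart_first_alarm(charts, baseline=0.11, slack=0.05, threshold=0.5))
print(shannon_entropy([64, 64, 128, 128]))
```

The slack term keeps small fluctuations around the baseline from accumulating, so only a sustained shift raises an alarm.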

Behavioral Correlation
Al-Hammadi and Aickelin (2010) developed an algorithm to detect P2P bots by correlating their behavioral attributes, using Peacomm (the Storm P2P bot) as a case study. They collected their data on the assumption that the bot was already installed on the victim host, and therefore used extrusion detection to limit the bot's activities. They developed an interception program (APITrace) to record behavioral attributes and capture certain function calls made by the monitored processes. These function calls were used as input to the developed algorithm.
The state of the system was defined by three signal categories, S1, S2 and S3, collected by the interception program APITrace. S1 is derived from the change rate of three fields: Failed Connection Attempts (FCA), Destination Unreachable (DU) and Reset connections (RST). S2 is derived from the change rate of the number of packets sent per second. S3 represents the time difference between two successive outgoing communication functions.
The algorithm finds the correlation between S1, S2 and S3 by setting a Sensitivity Value (SV) and checking each value of the three signals against it. If a value exceeds SV, the value one is assigned to the signal record; otherwise, zero is assigned. The signal records are then examined, and if they have the same value, the value one is assigned to represent the correlation between the three signals. After the process has been repeated for all the signals in the data (log files), the anomaly factor and the correlation values are calculated.
The evaluation shows that correlating different activities can enhance the detection of P2P bots. The main disadvantage of this algorithm is that the threshold value is not defined. In addition, the evaluation covered only one type of bot (Peacomm) (Al-Hammadi and Aickelin, 2010).
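The thresholding and correlation steps can be sketched as follows. The sketch assumes that "the same value" means all three signals exceed SV at the same record, and it reports a simple fraction of correlated records rather than the anomaly factor, whose formula is not given in the source. The sample log and the SV of 0.5 are made up.

```python
def binarize(value, sv):
    """Return 1 if the signal value exceeds the sensitivity value, else 0."""
    return 1 if value > sv else 0


def correlate(records, sv):
    """records: list of (s1, s2, s3) tuples. Returns one 0/1 flag per record.

    Assumption: a record counts as correlated when all three signals exceed sv.
    """
    flags = []
    for s1, s2, s3 in records:
        bits = [binarize(v, sv) for v in (s1, s2, s3)]
        flags.append(1 if all(bits) else 0)
    return flags


def correlated_fraction(records, sv):
    """Share of records in which the three signals fire together."""
    flags = correlate(records, sv)
    return sum(flags) / len(flags) if flags else 0.0


# Made-up, already normalized signal log: (S1, S2, S3) per time step.
log = [(0.1, 0.2, 0.1), (0.9, 0.8, 0.7), (0.9, 0.1, 0.8), (0.6, 0.7, 0.9)]
print(correlate(log, sv=0.5))
print(correlated_fraction(log, sv=0.5))
```

The result depends strongly on the chosen SV, which illustrates why an undefined threshold is listed as the main disadvantage.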

Network Behavior Analysis and Machine Learning
This research proposes a new method for detecting botnets by identifying their network behavior characteristics. The approach aims to detect the Command and Control (C&C) phase of a P2P botnet, which allows bots to be detected before they attack their victims. In addition, the study discusses the requirements of an online botnet detection framework and investigates the ability of five Machine Learning (ML) techniques to meet these requirements. The evaluation results show promising performance for the ML techniques, but none of them satisfies all the requirements of the online botnet detection framework (Sherif et al., 2011).

Association between Common Network Behaviors and Host Behaviors
This research proposed a new P2P botnet detection approach that relies on the association between common host behaviors and network behaviors.
The proposed framework consists of six stages:
• Detected system: distinguishes between single programs and communicating programs, since the main characteristic of bots is communication with other bots on other computers
• Filtering: reduces the traffic load so that the system can work more efficiently
• Feature extraction from P2P data: selects the most relevant features to form a subset that properly describes the P2P data
• Botnet detection: depending on the data source, this stage includes host data detection and network data detection; the objective is to detect both known botnets and unknown malware
• Report: if the detected behavior is known, the system reports it; if not, the system detects the bot behavior by correlating host and network behaviors
• Solution: after finding the botnet, the system can either fire back at it or take it down
This method has some limitations; for example, bots that use encryption algorithms cannot be detected (Yin and Ghorbani, 2011).

User Behavior Sociality and Traffic Entropy Function
Based on the observation that the user behavior and social actions of botnet nodes differ from those of normal nodes, this research proposed a new structure for identifying P2P botnets and treats it as a key basis for P2P botnet detection. The proposed structure includes:
• Analyzing sociality characteristics, such as centrality, from the original network data and marking nodes with excessively high centrality as suspicious
• Finding the packet size characteristics and using the entropy concept to model the data packets of each suspicious node
• Performing deep packet detection with improved entropy
The experimental evaluation shows that this structure can identify P2P botnets with high accuracy. However, the identification accuracy decreases when the download rate of the network traffic is very high or when the user's video streaming volume is too large (Zhigang et al., 2012).
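The first two steps of this structure can be sketched with degree centrality and packet-size entropy. Degree centrality is only one possible centrality measure, and the 0.8 centrality cut-off, the host names and the packet sizes are assumptions; the source does not say which direction of entropy change marks a bot, so the sketch only computes the value for suspicious nodes.

```python
import math
from collections import Counter, defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    return {node: len(peers) / (n - 1) for node, peers in neighbours.items()}


def packet_size_entropy(sizes):
    """Shannon entropy (in bits) of the packet-size distribution of one node."""
    counts = Counter(sizes)
    total = len(sizes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up connection graph and packet sizes (bytes) per host.
edges = [("h1", "h2"), ("h1", "h3"), ("h1", "h4"),
         ("h1", "h5"), ("h1", "h6"), ("h2", "h3")]
packet_sizes = {"h1": [64, 64, 64, 64, 128, 64], "h2": [40, 1500, 576, 900]}

centrality = degree_centrality(edges)
suspicious = {node for node, c in centrality.items() if c > 0.8}
entropies = {node: packet_size_entropy(packet_sizes[node]) for node in suspicious}
print(suspicious, entropies)
```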

Data Mining
This research proposed a P2P botnet detection approach that relies on monitoring gateway traffic and analyzing network behavior with data mining techniques. To evaluate the proposed method, the authors used the free WEKA software and three popular data mining algorithms: J48, Naïve Bayes and Bayesian networks. The resulting accuracy rates were 98%, 89% and 87%, respectively. Based on these results, the proposed method can be used to distinguish the flows of infected bots from those of other bots, and J48 was the most appropriate of the three algorithms (Liao and Chang, 2010).

TCP Distinctive Behavior
This study presents a new approach to recognizing P2P botnets through their Transmission Control Protocol (TCP) connections. The authors focus on analyzing the abnormal characteristics of the network traffic behavior of P2P botnets. The approach can be used for early detection and warning of P2P botnet activity in the network, since P2P botnets begin their activities with TCP connections. The proposed framework includes filtering, detecting malicious activity and analyzing. The study also combines the general P2P botnet detection framework with the P2P botnet detection model proposed by Dan et al. (2010). That model involves three steps: detection of the P2P nodes, clustering of the P2P nodes and detection of the botnets.
The proposed framework was implemented on both a normal P2P network test-bed and abnormal P2P traffic infected by a P2P botnet. The dataset captured in each case is analyzed on the basis of the TCP protocol using network analysis tools. At the end of the framework, a comparison is made to classify and detect the P2P botnet characteristics (Syahirah et al., 2011). Figure 4 shows the proposed TCP framework.

Behavior Clustering and Statistical Tests
Chang and Daniels (2009) present two schemes for detecting P2P botnet C&C behaviors. Based on observing the correlations of node behaviors at different times, they design algorithms that apply formal statistical tests to popular behavior clusters in the network. The tests check whether there is otherwise undetectable C&C activity in P2P botnets using non-P2P protocols, by measuring the impact of P2P botnet C&C behaviors on normal behavior clusters. They evaluate this approach in both simple and realistic cases and achieve an encouraging detection rate for the C&C channel (Chang and Daniels, 2009).

Table 1. Summary of P2P botnet detection techniques
Study | Proposed approach | Findings and limitations
Gu et al. (2008) | Proposed a general botnet detection framework named BotMiner, based on clustering analysis of network traffic | BotMiner can be useful for detecting IRC botnets, but it is not effective for detecting P2P botnets
Masud et al. (2008) | Consider the network traffic as an infinite data stream and use data mining techniques to detect P2P botnets | Better detection accuracy than other data stream classification techniques
Noh et al. (2009) | Propose using a multi-phased flow model to detect malicious traffic | The proposed system shows its efficiency with the SpamThru, Storm and Nugache botnets
Kang and Zhang (2009) | Apply information entropy theory in multi-chart CUSUM detection to detect new P2P botnets | The results show that entropy theory has its own advantages in detecting P2P botnets
Chang and Daniels (2009) | Present two detection schemes using behavior clustering and statistical tests | The proposed algorithms achieve an encouraging detection rate for the C&C channel
Chen et al. (2009) | Propose a method for detecting P2P-controlled bots on the host, using API function calls and algorithms to process the APIs | Effective in detecting the controlled bots on the host, but has a few limitations, such as the large training set required to improve the detection accuracy
Hangxia (2010) | Proposes mitigating P2P botnets using two Sybil attacks, based on analyzing botnets' weaknesses | The results show that the Sybil attack technique can be quite effective in defending against P2P botnets
Liu et al. (2010) | Present a general P2P detection model and algorithms based on network stream analysis | It can be used to detect unknown-protocol P2P botnets effectively
Al-Hammadi and Aickelin (2010) | Developed an algorithm to detect P2P bots by correlating their behavioral attributes | The proposed correlation method can enhance the detection of P2P bots; the key limitation is that the threshold value is not defined
Liao and Chang (2010) | Propose a detection approach that relies on monitoring traffic at the gateway and using data mining to analyze network behavior | The proposed method can be used to distinguish the flows of infected bots from those of other bots
Rostami et al. (2011) | Propose detecting P2P botnet connections from node behavior, by correlating processes with their associated ports and traffic protocols | The experimental results show an acceptable but not high detection rate
Syahirah et al. (2011) | Propose recognizing P2P botnets through their TCP connections, by analyzing the abnormal characteristics of the network traffic behavior | Can be used for early detection and warning of P2P botnet activities in the network
Sherif et al. (2011) | Detect P2P botnets by identifying network behavior characteristics and using Machine Learning techniques | The results show promising performance for the ML techniques, but none of them satisfies all the requirements of the online botnet detection framework
Yin and Ghorbani (2011) | Detection relies on the association between common host behaviors and network behaviors | The main disadvantage is that bots using encryption algorithms cannot be detected
Zhigang et al. (2012) | Proposed a new structure to identify P2P botnets, based on the user behavior and social actions of botnet nodes | The proposed structure can identify P2P botnets with high accuracy; however, the identification accuracy decreases when the download rate of the network traffic is very high

Cyber-Security: A Data Mining Approach
Masud et al. (2008) treat the network traffic as an infinite data stream and divide it into chunks of equal size. However, they propose a new technique for storing the data: they divide the chunks among several classifiers and introduce a multi-chunk, multi-level ensemble for data stream classification. This technique reduces the expected error of the single-chunk, single-level ensemble method. They evaluate the proposed technique theoretically and empirically and obtain better detection accuracy than other data stream classification techniques (Masud et al., 2008).
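For orientation, the single-chunk, single-level baseline that this work improves on can be sketched as a majority-vote ensemble with one model per chunk. This is deliberately the simpler baseline, not the multi-chunk, multi-level method itself. The nearest-centroid classifier, the single numeric feature and the sample stream are assumptions.

```python
from collections import Counter


def chunks(stream, size):
    """Split the stream into consecutive full chunks of equal size."""
    return [stream[i:i + size] for i in range(0, len(stream) - size + 1, size)]


def train_centroids(chunk):
    """Train a nearest-centroid model: mean feature value per label."""
    sums, counts = {}, {}
    for x, label in chunk:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def ensemble_predict(models, x):
    """Majority vote over the per-chunk models."""
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Made-up labeled stream: (one flow feature, label).
stream = [(1.0, "benign"), (9.0, "bot"), (2.0, "benign"), (8.0, "bot"),
          (1.5, "benign"), (9.5, "bot"), (2.5, "benign"), (8.5, "bot"),
          (3.0, "benign"), (7.0, "bot"), (2.0, "benign"), (9.0, "bot")]

max_models = 3  # assumed ensemble size: keep only the most recent models
models = [train_centroids(c) for c in chunks(stream, size=4)][-max_models:]
print(ensemble_predict(models, 7.5), ensemble_predict(models, 4.0))
```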

Controlled Bots on the Host
Chen et al. (2009) proposed a general approach for detecting P2P-controlled bots on the host. They aim to detect malicious behaviors and P2P communication simultaneously. They use API function calls and an N-gram algorithm to process the API sequence, and they use a static signature to detect P2P communication traffic. The advantage of this method is that it detects the bots on the host. The approach has a few shortcomings, such as the large training set required to improve the detection accuracy and its reliance on a signature-based technique (Chen et al., 2009).
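The N-gram processing of API sequences can be illustrated with a short sketch. The API names are common Windows and socket calls used purely as sample data, and the scoring rule (share of n-grams seen only in bot traces) is an assumption rather than the published algorithm.

```python
def ngrams(calls, n=3):
    """Return all contiguous n-grams of an API call sequence as tuples."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def build_bad_profile(bot_traces, benign_traces, n=3):
    """N-grams that occur in bot traces but never in benign traces."""
    bad = {g for trace in bot_traces for g in ngrams(trace, n)}
    good = {g for trace in benign_traces for g in ngrams(trace, n)}
    return bad - good


def malicious_ratio(calls, bad_profile, n=3):
    """Share of the trace's n-grams that appear in the bad profile."""
    grams = ngrams(calls, n)
    if not grams:
        return 0.0
    return sum(g in bad_profile for g in grams) / len(grams)


# Made-up training traces.
bot_trace = ["socket", "connect", "send", "recv",
             "CreateFile", "WriteFile", "CreateProcess"]
benign_trace = ["CreateFile", "ReadFile", "CloseHandle",
                "socket", "connect", "send"]
profile = build_bad_profile([bot_trace], [benign_trace])

observed = ["socket", "connect", "send", "recv", "CreateFile", "WriteFile"]
print(malicious_ratio(observed, profile))
```

With a single training trace per class the profile is tiny, which hints at why a large training set is needed for good accuracy.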

Mitigating Peer-to-Peer Botnets by Sybil Attacks
Hangxia (2010) proposed a new technique that includes mitigating the behavior of P2P bots. Based on an analysis of botnets' weaknesses, two Sybil attack methods are presented: the d-choice Sybil attack and the random Sybil attack. The study also examines how the number of Sybil nodes affects the attack on a P2P botnet. The proposed method was evaluated both theoretically and by simulation (Hangxia, 2010).
Table 1 summarizes the P2P botnet detection approaches discussed in this study.

CONCLUSION
Detecting P2P botnets is one of the most critical issues in network security, since they are based on non-centralized command and control (C&C) communication. In this study we presented most of the P2P botnet detection techniques proposed by researchers. In addition, we identified the advantages and shortcomings of each technique, which can give researchers a better understanding of P2P botnets and make it easier for them to develop more effective detection techniques.
From the detection approaches discussed in this study, we can observe that:
• Most of the studied approaches rely on a single technique for detecting bots, which may reduce detection accuracy
• Many techniques focus on detecting bots after the attack; this cannot stop bots from spreading, since the remaining bots will build a new botnet
• Most approaches are evaluated only theoretically or by simulation in an environment that is not a real P2P botnet