Evaluation of Customer Behaviour Irregularities in Cameroon Electricity Network using Support Vector Machine

Non-Technical Losses (NTLs) in the Cameroonian electricity network amount to approximately 30 to 40% of production and cost the national electricity company (ENEO) an estimated several billion CFA francs per year, hence the importance of finding effective solutions to combat these losses. The purpose of this work was to develop a fraud detection tool for ENEO using support vector machines. The approach consisted of data preprocessing based on load profiles, development of a classification model, parameter optimization, and detection and prediction of customer irregularities.


Introduction
In Cameroon, the Cameroon National Electricity Company (ENEO) is the sole electricity company, with nearly one million subscribers. To this figure must now be added the people and groups who connect clandestinely to the low voltage network or who commit fraud by tampering with meters. The number of these unofficial customers remains uncertain and difficult to determine in urban and peri-urban areas, which are characterized by unplanned, spontaneous housing, a preferred setting for unauthorized and clandestine electrical connections. The losses amount to several billion CFA francs each year and represent a real technical and economic headache for the ENEO distribution department. According to information gathered from the Inter-Employer Grouping of Cameroon (GICAM): "Business demand for electric power grows by 8% each year while supply grows by barely 2%. Since 2003, electricity supply difficulties have caused losses estimated at more than 60 billion FCFA for ENEO, more than one point of the country's annual growth rate". To address this, Cameroon's economic program for 2020 aims to increase energy production from the current 1337 MW to 3000 MW. The objective is not only to cover the national deficit but also to export electricity. In order to absorb the energy deficit, Cameroon is engaged in several projects such as:
• The Dibamba Power Developer Company, responsible for the production of electricity from a
The research question of this study is: how can we improve the supply, availability and quality of electricity in Cameroon by reducing non-technical losses in the existing distribution network? The objective of the research is to provide national electricity companies with a reliable tool, based on Support Vector Machines (SVM), for locating non-technical losses in the distribution network in order to eradicate fraud in distribution. Customer consumption patterns are extracted using data mining techniques.
Based on the assumption that load profiles contain abnormalities when fraud occurs, the SVM classifies customer load profiles to detect fraud suspects. This research concentrates only on scenarios where abnormal changes appear in load profiles, indicating fraudulent activity, because no clear database was available at ENEO Cameroon for the other factors contributing to NTL activities.

Support Vector Machine
Support Vector Machines (SVMs), also known as large margin separators, are a class of learning algorithms designed for discrimination, that is, for predicting a qualitative variable that is initially binary (Nagi et al., 2008a). They are based on the search for the optimal margin hyperplane which, when possible, correctly classifies or separates the data while being as far as possible from all observations. The principle is to find a classifier, or discrimination function, whose generalization ability (forecast quality) is as large as possible (Nagi et al., 2008b). The approach reduces the discrimination problem to the linear problem of searching for an optimal hyperplane, and two ideas or tricks achieve this objective (Nagi et al., 2010a; 2010b):
• Define the hyperplane as the solution of a constrained optimization problem whose objective function is expressed only through scalar products between vectors
• Obtain nonlinear separating surfaces by introducing a kernel function in the scalar product
As in any learning situation, we consider a variable Y to predict; to simplify this basic introduction, we assume it is dichotomous with values in {-1, 1}. Let X denote the explanatory or predictor variables and φ(x) a model for Y as a function of x. More generally, we can simply consider the variable X with values in a set F.
We denote by $z = \{(x_1, y_1), \dots, (x_n, y_n)\}$ a statistical sample of size n with unknown distribution. The objective is to build an estimate $\hat{\varphi}$ of φ, a function of X with values in {-1, 1}, such that the probability P(φ(X) ≠ Y) is minimal. The problem thus amounts to searching for a decision boundary in the space F in which X takes its values.
Conventionally, a compromise must be found between the complexity of this boundary, which can also be expressed as its capacity to shatter a cloud of points and hence as the fitting capacity of the model, and the generalization or prediction qualities of the model (Fourie and Calmeyer, 2004). This amounts to solving a classification or separation problem, as illustrated in Fig. 1.
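In this case, the hyperplane H is defined through the scalar product by its equation:

$$\langle w, x \rangle + b = 0$$

where w is a vector orthogonal to the plane and x is the point to predict; the sign of $f(x) = \langle w, x \rangle + b$ indicates on which side of H the point lies. A point (x, y) is well classified if and only if:

$$y \, (\langle w, x \rangle + b) > 0$$

Since the pair (w, b) is defined only up to a multiplicative constant, the constraint is normalized to $y \, (\langle w, x \rangle + b) \geq 1$. Searching for the maximum margin separating hyperplane then amounts to solving the constrained quadratic problem below:

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i \, (\langle w, x_i \rangle + b) \geq 1, \quad i = 1, \dots, n$$

The dual problem is obtained by introducing Lagrange multipliers $\alpha_i \geq 0$, and the solution is provided by a saddle point $(w^*; b^*; \alpha^*)$ of the Lagrangian:

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i \, (\langle w, x_i \rangle + b) - 1 \right]$$

The cancellation of the partial derivatives of the Lagrangian allows writing:

$$w^* = \sum_{i=1}^{n} \alpha_i^* y_i x_i \quad \text{and} \quad \sum_{i=1}^{n} \alpha_i^* y_i = 0$$

These equality constraints allow us to express the following dual formula:

$$W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

which is maximized subject to $\alpha_i \geq 0$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$.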
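Because the dual formulation involves the data only through scalar products, replacing the scalar product with a kernel function gives a nonlinear classifier. The following is a minimal sketch of the resulting decision rule with a Gaussian (RBF) kernel, not the implementation used in the study. The function names and the toy support vectors, multipliers and offset are hypothetical values chosen only so that the example runs.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, labels, alphas, b, gamma):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(
        alpha * y * rbf_kernel(sv, x, gamma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    ) + b


def predict(x, support_vectors, labels, alphas, b, gamma):
    """Return +1 or -1 according to the sign of the decision value."""
    value = decision_value(x, support_vectors, labels, alphas, b, gamma)
    return 1 if value >= 0 else -1


# Hypothetical toy model (not taken from the study): two support vectors,
# with multipliers that satisfy sum_i alpha_i * y_i = 0.
toy_support_vectors = [[0.0, 0.0], [1.0, 1.0]]
toy_labels = [-1, 1]
toy_alphas = [1.0, 1.0]
toy_b = 0.0
toy_gamma = 1.0
```

With these toy values, a point close to the second support vector is assigned to class +1 and a point close to the first one to class -1.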

Organization of the Method
The research methodology framework proposed to develop our intelligent fraud detection system for the detection, identification and prediction of NTL activities is shown in Fig. 2.

Data Acquisition
Electricity customer consumption data were obtained from the billing system of the Cameroon National Electricity Company (ENEO). The database consisted of 62 000 customers of the central region commercial unit over a period of 12 months (January to December 2014), as shown in Fig. 3.
Other factors contributing to NTL activities were not used in this work because no clear database for them was available at ENEO Cameroon. Although some electrical power loss is inevitable, steps can be taken to ensure that it is minimized. Several measures have been applied to this end, including those based on technology and those that rely on human effort and ingenuity. The factors contributing to NTL activities, grouped by the components identified, are listed in Table 1.
For the majority of the factors contributing to NTL activities listed in Table 1, electricity customers intentionally avoid paying their bills or are involved in pilferage, theft and unauthorized use. The present study therefore focuses on detecting and identifying NTL activities in the distribution network where deviations in customer behavior exist. This is why the approach adopted in this research is a data mining method that uses support vector machines to extract patterns of customer behavior, in the form of load profiles, from the historical consumption database.

Customer Filtering, Selection and Extraction
The raw data obtained from the ENEO billing system were filtered to extract customer load profiles and features. Data mining techniques based on database querying were applied for:
• Removing customers repeated within the monthly data
• Removing customers having no consumption (0 kWh) throughout the entire 12-month period
• Removing customers who are not present throughout the entire 12-month period, that is, new customers registered after the first month
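The three filtering rules above can be sketched as follows. This is an illustrative sketch only: the record layout (customer identifier, month number, kWh) and the function name are assumptions, since the structure of the ENEO billing export is not described, and keeping the first reading of a duplicated month is one possible way of removing repeats.

```python
from collections import defaultdict


def filter_customers(records, n_months=12):
    """Build load profiles from (customer_id, month, kwh) records.

    Assumed record layout (hypothetical): month is an integer in 1..n_months.
    Returns {customer_id: [kwh for month 1..n_months]} for retained customers.
    """
    by_customer = defaultdict(dict)
    for customer_id, month, kwh in records:
        # Rule 1: repeated rows for the same customer and month are dropped;
        # the first reading seen is kept (an assumption).
        by_customer[customer_id].setdefault(month, kwh)

    profiles = {}
    for customer_id, months in by_customer.items():
        # Rule 3: drop customers that are missing from any month.
        if any(m not in months for m in range(1, n_months + 1)):
            continue
        profile = [months[m] for m in range(1, n_months + 1)]
        # Rule 2: drop customers with 0 kWh over the whole period.
        if all(value == 0 for value in profile):
            continue
        profiles[customer_id] = profile
    return profiles
```

Each retained customer is then described by a list of 12 monthly consumption values, ready for normalization.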

Data Normalization
The load data need to be represented on a normalized scale for the SVM classifier. Therefore, the monthly average kWh consumption feature data are normalized as follows:

$$x_m^{\text{norm}} = \frac{x_m - \min(x_m)}{\max(x_m) - \min(x_m)}$$

where $x_m$ represents the current kWh consumption of the customer, and $\min(x_m)$ and $\max(x_m)$ represent the minimum and maximum values in the 12-month consumption feature set, as shown in Fig. 4. Typical customer load profiles were then established, each load profile being represented by the 12 normalized monthly average kWh consumption features.
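The min-max scaling described above maps each customer's 12 monthly values onto the interval [0, 1]. A minimal sketch follows; the function name is hypothetical, and mapping a perfectly flat profile to zeros is an assumption made here to avoid a division by zero, which the source does not discuss.

```python
def normalize_profile(profile):
    """Min-max normalize one customer's monthly kWh values to [0, 1]."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        # Assumption: a constant profile carries no shape information,
        # so it is mapped to all zeros instead of dividing by zero.
        return [0.0] * len(profile)
    return [(x - lo) / (hi - lo) for x in profile]
```

For example, a profile of 100, 150 and 200 kWh becomes 0.0, 0.5 and 1.0.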

Feature Adjustment
All 62 000 customers were given a label represented by an integer value («1» for suspect customers and «2» for normal customers). Normalized feature values with labels are represented as a LIBSVM feature file, denoted by the matrix W, with one row per customer holding the label followed by the normalized monthly features, where «m» is the total number of customers and «M» is the total number of months.
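In the LIBSVM sparse text format, each line holds a label followed by `index:value` pairs with indices starting at 1. A minimal sketch of how one labeled load profile could be written in that format is given below; the function name and the four-decimal precision are assumptions, not details taken from the study.

```python
def to_libsvm_line(label, features):
    """Format one sample as '<label> 1:<v1> 2:<v2> ...' (1-based indices)."""
    pairs = " ".join(
        f"{index}:{value:.4f}" for index, value in enumerate(features, start=1)
    )
    return f"{label} {pairs}"


def to_libsvm_file(labeled_profiles):
    """Join (label, features) samples into the text of a feature file."""
    return "\n".join(to_libsvm_line(lbl, feats) for lbl, feats in labeled_profiles)
```

A suspect customer (label 1) with 12 normalized monthly features would thus occupy one line with indices 1 to 12.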

Graphical Interface
The graphical interface of the fraud detection system (a decision support tool for fighting non-technical losses in the distribution network, applied to the case of ENEO Cameroon), shown in Fig. 5, was designed to be simple to use for the detection and identification of suspicious customers (list of customers to check). The different buttons and their functions are presented in Table 2.

Data Base Selection
To launch the software, a data file in Excel format must first be selected. To select a data file, the user clicks the "Import Customer Data" button, which opens a file browser dialog box in which the Excel file is selected, as indicated in Fig. 6. Loading of the customer database is shown in Fig. 7.

Implementation of Detection
Once the customer data file is selected, clicking start runs the detection, as indicated in Fig. 8. The software applies all the procedures described in the previous sections. Detection is complete once the customer list is displayed with the status of each customer, as indicated in Fig. 9.

Display of Customer Load Profile
Once the software has displayed the list of customers with their status, their load profiles can be visualized. Simply right-click on the customer whose profile you want to see, as indicated in Fig. 10 to 12 for suspect, normal and unclassified customers, respectively.

Discussion
The training accuracy of the SVC model is assessed while tuning the SVC kernel parameter and the error penalty parameter, C. In this study, the RBF kernel is used; hence the parameter γ, which controls the width of the Gaussian, must be fine-tuned.
Experimentally, by iterating over different parameter combinations for our model, we first obtained an accuracy of 75.9%, as shown in Fig. 13. To increase this success rate, we used a grid search to optimize the kernel parameters. The best results were obtained for the optimal parameters C = 8 and γ = 0.0078, with w1 = 416.66 and w2 = 131.57, giving an accuracy of 83%, as shown in Fig. 14. This result is in agreement with those obtained in the literature, as indicated in Table 3.
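A grid search of this kind is commonly run over exponentially spaced values of C and γ. The sketch below assumes a power-of-two grid (C from 2^-5 to 2^15 and γ from 2^-15 to 2^3, in steps of 2 in the exponent); the source does not state which grid was used, although 2^3 = 8 and 2^-7 = 0.0078125 would be consistent with the reported optimum. The scoring function is a hypothetical placeholder standing in for a cross-validated accuracy and is not derived from the study's data.

```python
import itertools
import math


def power_of_two_grid(c_exponents, gamma_exponents):
    """All (C, gamma) pairs of the form (2**a, 2**b)."""
    return [
        (2.0 ** a, 2.0 ** b)
        for a, b in itertools.product(c_exponents, gamma_exponents)
    ]


def grid_search(score_fn, c_exponents, gamma_exponents):
    """Return (best_score, best_C, best_gamma) over the grid.

    score_fn(C, gamma) is expected to return a validation score, for
    example the cross-validated accuracy of an RBF-kernel classifier.
    """
    best = None
    for c, gamma in power_of_two_grid(c_exponents, gamma_exponents):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


def placeholder_score(c, gamma):
    """Hypothetical stand-in for a real cross-validation run.

    It is NOT based on the study's data; it simply peaks at C = 2**3 and
    gamma = 2**-7 so that the search has a maximum to find.
    """
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 7) ** 2)


best_score, best_c, best_gamma = grid_search(
    placeholder_score, range(-5, 16, 2), range(-15, 4, 2)
)
```

In practice, `placeholder_score` would be replaced by a function that trains the classifier on the training folds with the given C and γ and returns the accuracy on the held-out fold.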
In order to apply a supervised learning technique, we chose a database of 331 customers. This database was used to establish the SVM classification model through training and testing, as shown in Table 4.
We then selected the data of 61 782 customers to test our model, and obtained 56 501 normal customers, 5 281 suspect customers and 8 unclassified customers.

Conclusion
NTLs on the Cameroonian transmission and distribution network are about 30-40% of electricity production, causing enormous losses to the State estimated at several billion FCFA per year, hence the importance of finding effective solutions to these losses. The purpose of this work was to develop a fraud detection tool for the Cameroon National Electricity Company (ENEO). First, we presented the context and the problems of our study in the introduction. Second, we presented the support vector machine methodology, which consisted of data preprocessing, development of a classification model, parameter optimization, and testing and validation of the SVM model.
For the implementation of this work we used:
• A database of nearly 62 000 customers from the ENEO central region business unit
• A database of 331 customers, chosen in order to apply a supervised learning technique and used to establish the SVM classification model through training and testing
• The data of 61 782 customers selected from the database to test our model
Despite the good performance achieved, validating the proposed system requires comparing the results of the SVC with practical cases detected by ENEO.
The development prospects are many, including:
• Use of Genetic Algorithms for fine tuning of SVM parameters such as the penalty coefficient C and γ
• Use of data from several different regions for training and testing the fraud detection system, that is, customer data already classified as normal or suspect
• Use of the experience of ENEO agents in the development of the final decision algorithm
Finally, we believe that the fraud detection system, which is available offline, can be operated at ENEO commercial agencies to create an optimal customer management system. In addition, the use of the proposed system will allow ENEO to improve its management of NTLs and its revenue protection.