A NOVEL MULTICLASS SUPPORT VECTOR MACHINE ALGORITHM USING MEAN REVERSION AND COEFFICIENT OF VARIANCE

Inaccuracy of a kernel function used in a Support Vector Machine (SVM) can be found when it is simulated with nonlinear and nonstationary datasets. To minimise the error, we propose a new multiclass SVM model that uses a mean reversion and coefficient of variance algorithm to partition and classify imbalanced datasets. Judged by a series of test statistics, the proposed algorithm outperformed the SVM model without the multiclass extension in simulations.


Background on Mean Reversion and Coefficient of Variance
There are many definitions of mean reversion. In general, it is an asset model which shows that the asset price tends to fall (or rise) after hitting a maximum (or minimum). The process of mean reversion is a lognormal diffusion, but the variance does not grow in proportion to the time interval (Pillay and O'Hare, 2011). The variance grows at the start and sometimes stabilises at a certain value. The most basic mean reversion model is the (arithmetic) Ornstein-Uhlenbeck process (Uhlenbeck and Ornstein, 1930). This model is a stochastic process with a stationary, Gaussian and Markovian distribution, used to describe the velocity of a massive Brownian particle under the influence of friction. In another approach, Zhao et al. (2011) introduced an autoregressive (AR) process in which the value drifts to its mean in the long run. Currently, two main methodologies are used for measuring mean reversion: (i) the variance ratio and (ii) regression. Cochrane (1988) used the variance ratio to measure the relative importance of the random walk component. Poterba and Summers (1988) and Lo and Mackinlay (1988) compared the relative variability of returns over different time horizons using the variance ratio in discrete time series. Bali and Demirtas (2008) confirmed these reports by showing that in a 'high variance' scenario, using the mean reversion concept could cause negative drift. Using regression tests, Fama and French (1992) determined the correlation with currency asset returns, while Chen and Jeon (2000) measured its mean reversion behaviour and found that the returns were positively autocorrelated over shorter periods but negatively autocorrelated over longer periods. To measure imbalance in multiclass datasets, Cieslak and Chawla (2009) recommended using the coefficient of variance to solve the problem. Diebold (2004) also reported mean reversion using fractional unit-root analysis of real exchange rates under the gold standard.
Diebold (2004) showed that the power of the test for mean reversion could be raised.
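The variance-ratio idea above can be made concrete with a short simulation. The sketch below is our own illustration, not code from the paper: it simulates an arithmetic Ornstein-Uhlenbeck process by Euler-Maruyama and compares its variance ratio with that of a random walk; the parameter values (theta, sigma, the horizon q) are arbitrary assumptions.

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, n, dt, rng):
    """Euler-Maruyama simulation of dX = theta*(mu - X) dt + sigma dW."""
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = (x[t - 1] + theta * (mu - x[t - 1]) * dt
                + sigma * np.sqrt(dt) * rng.standard_normal())
    return x

def variance_ratio(x, q):
    """VR(q) = Var(q-period differences) / (q * Var(1-period differences)).
    Values well below 1 at long horizons indicate mean reversion."""
    return (x[q:] - x[:-q]).var() / (q * np.diff(x).var())

rng = np.random.default_rng(0)
ou = simulate_ou(theta=2.0, mu=0.0, sigma=0.5, x0=0.0, n=5000, dt=0.01, rng=rng)
rw = np.cumsum(rng.standard_normal(5000))   # random walk for contrast
vr_ou, vr_rw = variance_ratio(ou, 200), variance_ratio(rw, 200)
print(vr_ou, vr_rw)   # the mean-reverting series shows a much lower ratio
```

For the mean-reverting series the q-period variance stops growing with q, so the ratio falls well below the random walk's value of roughly one.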

Background on the EMD Algorithm
The analysis of nonlinear and nonstationary data is important in many applications such as bioinformatics (Shi et al., 2008;Huang et al., 1996), signal processing (Huang et al., 1998;Huang and Attoh-Okine, 2005;Huang and Shen, 2005), geophysics (Wang et al., 1990;Datig and Shlurmann, 2004) and finance (Huang and Attoh-Okine, 2005;Guhathakurta et al., 2008). Huang et al. (1996) formulated an a posteriori algorithm with adaptive control over a separate data structure, which was later termed the Hilbert-Huang Transform (HHT) (Huang et al., 1998). The HHT overcomes the limitations of the Hilbert transform, which is only suitable for a narrow band-pass signal. The key element of the HHT algorithm is the EMD, in which any complicated dataset can be decomposed into a finite and often small number of Intrinsic Mode Functions (IMFs) that allow a well-behaved Hilbert transform. Since this decomposition is based on the local characteristic time scale of the data, it is applicable to nonlinear and nonstationary processes (Huang and Attoh-Okine, 2005). In signal processing, high-frequency noises from input data are considered simple intrinsic mode oscillations (Huang and Attoh-Okine, 2005). The EMD uses a sifting process and curve spline technique to decompose a signal into a new oscillatory signal termed the IMF. After a number of decomposition iterations, the characteristics of the IMF must meet two conditions. First, in the entire dataset the number of extrema (maxima plus minima) and number of zero crossings must either be the same or differ by at most one. Second, at any point the mean values of the envelopes defined by the local maxima and the local minima must be zero (Huang and Shen, 2005). The EMD algorithm, which is fundamental to the HHT, can thus reduce high-frequency noise from input data, such as noise from retail trades on a stock exchange.
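The first IMF condition above (extrema and zero crossings equal in number, or differing by at most one) can be checked numerically. The sketch below is our own illustration under that condition only; a full EMD implementation would also need the spline-envelope condition and the sifting loop.

```python
import numpy as np

def count_extrema(x):
    """Count local maxima plus minima via sign changes of the first difference."""
    dx = np.diff(x)
    return int(np.sum(np.sign(dx[:-1]) != np.sign(dx[1:])))

def count_zero_crossings(x):
    s = np.sign(x)
    s = s[s != 0]          # ignore exact zeros
    return int(np.sum(s[:-1] != s[1:]))

def looks_like_imf(x):
    """First IMF condition: extrema and zero crossings equal or differ by one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

t = np.linspace(0.0, 1.0, 1000)
mono = np.sin(2 * np.pi * 5 * t)   # pure tone: satisfies the condition
riding = mono + 2.0                # offset tone: extrema but no zero crossings
print(looks_like_imf(mono), looks_like_imf(riding))
```

The offset signal fails the check because it never crosses zero, which is exactly why the sifting process subtracts the envelope mean before testing a candidate IMF.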

Objective
This study presents a novel multiclass SVM model that uses a mean reversion and coefficient of variance algorithm to partition and classify the datasets and then applies an SVM to measure their performance. For verification, we compared the multiclass SVM model with an SVM model that did not use the mean reversion and coefficient of variance algorithm.

SVM Model
There are several methods used for data classification, one of them being the SVM. Let $x$ be a vector in a vector space. A boundary hyperplane is expressed as:

$$w^T x + b = 0 \quad (1)$$

where $w$ is a weight coefficient vector and $b$ is a bias term. The distance between a training vector $x_i$ and the boundary, called the margin, is expressed as:

$$\frac{|w^T x_i + b|}{\|w\|} \quad (2)$$

Since the hyperplanes expressed by Equation 1 are identical when $w$ and $b$ are multiplied by a common constant, we introduce a restriction to this expression:

$$\min_{w,b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1 \quad (3)$$

where $y_i = 1$ if $x_i$ belongs to one set and $y_i = -1$ if $x_i$ belongs to the other set. If the boundary classifies the vectors correctly, then $y_i (w^T x_i + b) > 0$ for all $i$. This conditional optimisation is achieved by Lagrange's method of indeterminate coefficients. Define the function:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 \right] \quad (5)$$

where $\alpha_i \ge 0$ are the indeterminate coefficients. If $w$ and $b$ take their optimal values, the partial derivatives are zero:

$$\frac{\partial L}{\partial w} = 0, \qquad \frac{\partial L}{\partial b} = 0 \quad (6)$$

Setting these derivatives to zero, we obtain:

$$w = \sum_i \alpha_i y_i x_i \quad (7)$$

$$\sum_i \alpha_i y_i = 0 \quad (8)$$

Substituting Equations 7 and 8 back into Equation 5, we obtain:

$$L(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad (10)$$

Since minimising the second term of Equation 5 corresponds to maximising $L$ with respect to $\alpha$, the optimisation reduces to a quadratic programming problem:

$$\max_{\alpha} \ \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{subject to} \quad \alpha_i \ge 0, \ \sum_i \alpha_i y_i = 0 \quad (11)$$

Let $\Phi$ be a transformation to a higher-dimensional space. The transformed space should be such that a distance is defined in it and that this distance has a relationship to the distance in the original space. The kernel function $K(x, x')$ is introduced to satisfy these conditions:

$$K(x, x') = \Phi(x)^T \Phi(x') \quad (12)$$

Equation 12 indicates that the kernel function is equivalent to the inner product of $x$ and $x'$ measured in the higher-dimensional space transformed by $\Phi$. If we measure the margin by the kernel function and perform the optimisation, a nonlinear boundary is obtained.

Note that the boundary in the transformed space is obtained as:

$$w^T \Phi(x) + b = 0 \quad (13)$$

Substituting Equation 7 into Equation 13 while replacing each $x_i$ with $\Phi(x_i)$, we obtain:

$$\sum_i \alpha_i y_i K(x_i, x) + b = 0 \quad (14)$$

The optimisation function of Equation 11 in the transformed space is obtained by the same substitution. These results mean that all the calculations can be carried out using $K(x_i, x_j)$ only; we do not need to know what $\Phi$ or the transformed space actually is.

A sufficient condition for $K(x_i, x_j)$ to define a valid transformed space is that it is positive definite. Several such kernel functions are known, for example the Laplacian kernel:

$$K(x, x') = \exp(-\sigma \|x - x'\|) \quad (15)$$
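The positive-definiteness condition can be checked numerically: a valid kernel must produce a positive semidefinite Gram matrix on any set of points. The sketch below is our own illustration (the kernel forms and the value of sigma are assumptions), verifying this for Laplacian and Gaussian kernels on random data.

```python
import numpy as np

def laplacian_kernel(X, Y, sigma=1.0):
    """K(x, x') = exp(-sigma * ||x - x'||), Euclidean norm."""
    d = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2))
    return np.exp(-sigma * d)

def gaussian_kernel(X, Y, sigma=1.0):
    """K(x, x') = exp(-sigma * ||x - x'||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sigma * d2)

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
for kernel in (laplacian_kernel, gaussian_kernel):
    K = kernel(X, X)
    eigvals = np.linalg.eigvalsh(K)           # Gram matrix spectrum
    print(kernel.__name__, eigvals.min() >= -1e-10)
```

A non-negative spectrum (up to numerical tolerance) confirms the Gram matrix is a valid inner-product matrix in some transformed space, which is what the dual optimisation requires.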

Data Classification Using Mean Reversion and Coefficient of Variance Algorithm
We present a novel multiclass model for use with the SVM family, using a mean reversion and coefficient of variance algorithm to partition and classify the time horizon (span) of multiclass datasets, respectively. In practice, the typical curve of exchange rates tends to shift towards the mean, so the point of reversal can be used to determine changes in its direction, i.e., from up to down and vice versa. The datasets are then partitioned at the reversal points using the mean reversion algorithm, explained later, as a decision tool. The standard deviations of a nonstationary dataset are not constant, so we measure the datasets between consecutive reversal points and input them into an SVM model. The procedure for using the mean reversion and coefficient of variance algorithm is as follows: • Compute the mean µ_n(t) of the random variables X_n(t), mark the intercept point on the x-axis and denote it M_1, i.e., the value X_m(t), where r = 1, 2, ..., c and c is the last class. The original dataset X_n(t) is thus classified into different coefficient of variance classes, termed CV classes. The next step simulates X_n(t) individually using the SVM model for each CV group. As a result, multiple sets of kernel parameters are created, one per SVM simulation; we then integrate all the blocks partitioned by the CV. Schneider and Moore (1997) classified model evaluation into many methods, e.g., judging model quality based on residuals, black-box model selection and cross-validation. According to their report, cross-validation is a model evaluation method that separates training and test data and uses the test data to assess performance after the training data have been fitted by a statistical model. Cross-validation uses the following three main methods.
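The partition-then-classify step can be sketched as follows. This is a simplified stand-in for the procedure above, not the paper's exact algorithm: the crossing rule (splitting at crossings of the overall mean), the CV definition (standard deviation over absolute mean) and the class thresholds are our assumptions.

```python
import numpy as np

def partition_at_mean_crossings(x):
    """Split a series into blocks wherever it crosses its overall mean --
    a simplified stand-in for the mean-reversion reversal points."""
    mu = x.mean()
    above = x >= mu
    cuts = np.flatnonzero(above[:-1] != above[1:]) + 1
    return np.split(x, cuts)

def cv(block):
    """Coefficient of variance of one block: std / |mean|."""
    return block.std() / abs(block.mean())

def assign_cv_classes(blocks, edges):
    """Bin each block's CV into a class delimited by the threshold 'edges'."""
    return [int(np.digitize(cv(b), edges)) for b in blocks]

rng = np.random.default_rng(2)
series = 1.3 + 0.005 * np.cumsum(rng.standard_normal(500))  # synthetic "exchange rate"
blocks = partition_at_mean_crossings(series)
classes = assign_cv_classes(blocks, edges=[0.01, 0.02, 0.05])
print(len(blocks), sorted(set(classes)))
```

Each block would then be fed to its own SVM, and blocks sharing a CV class share one set of kernel parameters.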

Holdout
The Holdout method is the simplest type of cross-validation. The dataset is separated into two sets: the training set and the test set. The estimation model is fitted to the training set only, leaving the test data blind. The performance of the Holdout method can be quantified using a variety of test statistics, i.e., MSE, MAE, MAPE, R2, AIC and BIC. This method is usually preferable to the residual method and does not take much longer to compute. However, its evaluation can have a high variance, which may depend greatly on how the training and test data are divided. In this study, we select and divide the datasets using different ratios of training data to test data, i.e., 30:70, 50:50 and 70:30.
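A minimal sketch of the holdout split and the error statistics named above (MSE, MAE, MAPE) follows; the 70:30 ratio matches the study, while the naive last-value forecast is only a placeholder for the SVM forecast.

```python
import numpy as np

def holdout_split(x, train_frac):
    """Chronological holdout: the first train_frac of the series trains,
    the remainder is held out blind for testing."""
    k = int(len(x) * train_frac)
    return x[:k], x[k:]

def mse(y, yhat):  return float(np.mean((y - yhat) ** 2))
def mae(y, yhat):  return float(np.mean(np.abs(y - yhat)))
def mape(y, yhat): return float(np.mean(np.abs((y - yhat) / y)) * 100)

data = np.linspace(1.0, 2.0, 100)          # stand-in for an exchange-rate series
train, test = holdout_split(data, 0.7)
naive = np.full_like(test, train[-1])      # placeholder forecast: repeat last value
print(len(train), len(test), round(mape(test, naive), 2))
```

Because only one split is evaluated, repeating this with different ratios (30:70, 50:50, 70:30) exposes the variance the text warns about.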

K-Fold
The K-fold method was proposed to improve on the Holdout method. It divides the entire dataset into k subsets and applies the Holdout method k times: in each iteration, the model is fitted to the training data and evaluated on the test data. The average error is then computed across all k trials. The advantage of this method is that it matters less how the data are divided: every data point appears in the test set exactly once and in a training set k-1 times. The variance of the estimate is reduced as k is increased. The disadvantage is that the training algorithm has to be rerun from scratch k times, so it requires k times as much computation to make an evaluation. A variant of this method is to randomly divide the data into a test and training set k different times.
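The k-fold loop can be sketched in a few lines. This is our own illustration: the least-squares line stands in for the SVM, and the fold layout (contiguous splits) is an assumption.

```python
import numpy as np

def kfold_indices(n, k):
    """Yield (train, test) index pairs: each point is tested exactly once
    and used for training in the other k-1 folds."""
    idx = np.arange(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold

def fit_line(x, y):
    """Toy model standing in for the SVM: a least-squares line."""
    a, b = np.polyfit(x, y, 1)
    return lambda q: a * q + b

def kfold_mse(x, y, k):
    errs = []
    for tr, te in kfold_indices(len(x), k):
        model = fit_line(x[tr], y[tr])
        errs.append(np.mean((y[te] - model(x[te])) ** 2))
    return float(np.mean(errs))   # average error over the k trials

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 60)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(60)
print(round(kfold_mse(x, y, k=5), 4))
```

The model is retrained from scratch inside the loop, which is exactly the k-fold computational cost the text describes.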

Leave-One-Out
The name Leave-One-Out explains itself. This method resamples by taking one particle (data unit) out of the overall training and test datasets, while the remaining data are used for reference. Its advantage is the accuracy of the outcome, but this is traded off against the massive computational power required when handling large input datasets. Moreover, this method was designed only for model evaluation or in-sample forecasting, so it is rather difficult to apply to out-of-sample forecasting.

The OAO and OAA Strategies
In general, the multiclass SVM is an ongoing research issue and can fundamentally be built using two different strategies: (i) OAA, which builds one SVM per class and trains it with the rest of the training data pooled as the other class, so-called 'All'; the method of building the 'One' class is flexible, depending upon the structure of the datasets; and (ii) OAO, which is similar to the OAA strategy except that the 'All' class is individually partitioned into subclasses. The discrimination by the OAA strategy between an information class and all others often leads to the estimation of complex discriminant functions (Schneider and Moore, 1997). In brief, OAO decomposes the original problem into a set of small problems of two information classes with n(n-1)/2 binary SVMs. In this study, we selected the mean reversion and coefficient of variance algorithm for selecting the training data.
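The OAO decomposition and its voting step can be sketched as follows. This is our own illustration: each of the n(n-1)/2 binary SVMs is replaced by a nearest-centroid rule so the sketch stays self-contained, and the synthetic clusters are assumptions.

```python
import numpy as np
from itertools import combinations

def train_pairwise(X, y):
    """Fit one binary rule per class pair -- n(n-1)/2 in total.
    A nearest-centroid rule stands in for each binary SVM."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        models[(a, b)] = (X[y == a].mean(axis=0), X[y == b].mean(axis=0))
    return models

def predict_oao(models, x):
    """Each pairwise rule casts one vote; the class with most votes wins."""
    votes = {}
    for (a, b), (ca, cb) in models.items():
        winner = a if np.linalg.norm(x - ca) <= np.linalg.norm(x - cb) else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

rng = np.random.default_rng(4)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)
models = train_pairwise(X, y)
preds = [predict_oao(models, x) for x in X]
print(len(models), float(np.mean(np.array(preds) == y)))
```

With n = 3 classes, three pairwise models are trained; OAA would instead train three one-versus-rest models over the full dataset.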

RESULTS
The simulations were conducted on a personal computer with an Intel Core i5-3210M CPU clocked at 2.50 GHz and 8 GB of RAM. All the SVMs were trained with R programming software (Kim and Oh, 2012). We used all the kernel functions, i.e., Gaussian, polynomial, linear, Laplacian, Bessel and ANOVA RBF, and then selected the best-performing of those functions.
In Materials and Methods, we introduced theoretical considerations related to the SVM model, its background in supervised learning and its structural risk minimisation. We also selected the Holdout method, part of the model evaluation used to segregate training and test datasets, before partitioning them using the mean reversion and coefficient of variance algorithm. This technique is novel and is used to construct a multiclass scheme for the SVM family.

Simulation Procedure
We describe the simulation procedure used by the multiclass SVM model, as shown in Fig. 1. The systematic order of the workflow is as follows:
• Retrieve 2322 datasets of EUR-USD exchange rates from a Bloomberg terminal, spanning 2001 to 2011
• As shown in Fig. 1, use the Holdout method to separate the datasets into training and test data at ratios of 30:70, 50:50 and 70:30. The 30% portion of the datasets is used for out-of-sample forecasting
• At the reversal points of the curves obtained from the mean reversion technique described in 2.2, use the CV to classify each group of dependent variables, yielding a number of sub-groups
• From (3), simulate the multiclass SVM model for each sub-group using the R programming software (Kim and Oh, 2012). This fits the multiclass kernel functions and parameters for each sub-group; during this stage, we obtain many kernel parameters for each partitioned group, as shown in Table 1
• Integrate all of the results from (4) and Table 1. We assign the first CV class to 'One' and the remaining classes to 'All' as per the OAA strategy; we then select the second CV class as 'One' and the remaining classes as 'All', and so on. Alternatively, we introduce the OAO strategy, pairing the classes as follows: (a) Class 1 and Class 2, Class 1 and Class 3, ..., Class 1 and Class n, where n is the last class; (b) Class 2 and Class 3, Class 2 and Class 4, ..., Class 2 and Class n; and finally (c) Class n-1 and Class n
• For the One-Against-All strategy, compare the simulation results of the data points located in the correct classes with the results from (5) and then analyse the percentage of error. For the One-Against-One strategy, use the outcomes of each pairing to vote for the best performance in each class

Simulation Results
Using the procedures in 3.1, we retrieved the original datasets. The next step was to introduce the Holdout method to segregate the training and test data using different ratios, i.e., 30:70, 50:50 and 70:30. We introduced the mean reversion and coefficient of variance algorithm to partition and classify the training data, respectively. We left the test data as a reference for comparison with the predicted outcome of the SVM simulation. As a result, 48 blocks/partitions were separated into six CV classes, as shown in Table 1. We then grouped each CV class and simulated them with the SVM classification model using R programming scripts (Kim and Oh, 2012). The simulation results are shown in Tables 2-6. Table 1 was used to plot a graph and Fig. 2 illustrates the distribution of the six CV classes, where the x-axis represents the blocks/partitions and the y-axis represents the CV values of the different classes. The figures in brackets show the number of data points per CV class and are re-displayed in Table 4. Each block/partition contained a different number of data points, depending on the change in the curve direction, i.e., upward to downward or vice versa.
At this point, we had classified the datasets into multiple classes. The next step was to run simulations of the SVM models. The results of those simulations are in Table 3.
We fitted the kernel distribution functions and their parameters. The outcomes shown in Table 2, using different kernel functions, i.e., radial, polynomial, linear, Laplacian, hyperbolic tangent, Bessel and ANOVA RBF, were different. Moreover, the number of support vectors, the parameters and the training error are presented in Fig. 2. Definitions of the parameters given in Table 2 are as follows:
• Sigma: the inverse kernel width used by the radial, Gaussian, Laplacian, Bessel and ANOVA RBF kernel functions
• Degree: the degree of the polynomial, Bessel or ANOVA RBF kernel functions, i.e., a positive integer
• Scale: the scaling parameter of the polynomial and tangent kernel functions; a convenient way of normalising patterns without any need to modify the data itself
• Offset: the offset used in the polynomial or hyperbolic tangent kernel functions
• Order: the order of the Bessel function
Next, we used the test statistics, i.e., MSE, MAE and MAPE, to measure the performance of each kernel function in all CV classes of the datasets. The results given in Table 3 show that the Laplacian kernel function performed best. Having successfully simulated the SVM with the Laplacian kernel function and its parameters, we conclude that there were six CV classes in 48 blocks/partitions and that each block had a different number of data points. Table 4 shows the CV classes, including the Laplacian kernel functions and parameters. At this stage, we had completed the simulations required for building up the training data.
Referring to the SVM classification model, we simulated the test data from the 70:30 training-to-test ratio using the trained Laplacian kernel functions and their parameters. The simulation results of the three datasets are shown in Fig. 3, in which the x-axis represents the number of data points used for classification and the y-axis represents the EUR-USD rate. The results demonstrate that the graph of the multiclass SVM model (black solid line) agreed with the graph of the original datasets (dotted blue line), whereas the graph of the SVM model alone deviated from those two graphs. For ease of presentation, the x-axis represents the test data ranked between the 2001st and 2049th points of the original datasets.
To verify the performance of the multiclass model using the mean reversion and coefficient of variance algorithm, we plotted three graphs showing the simulation results for the multiclass SVM model and the SVM without the multiclass algorithm (SVM model only). The two graphs of the SVMs simulated with and without the multiclass algorithm were significantly different, as shown in Table 5, using the Laplacian kernel function and a training-to-test data ratio of 70:30. For example, the multiclass SVM model achieved an accuracy count of 79.88%, whereas the SVM model without the multiclass algorithm achieved 73.21%. Finally, we measured the performance of the multiclass SVM model against the SVM model alone using the test statistics, i.e., MSE, MAE, MAPE, R2, AIC and BIC. The results in Table 5 show that the proposed model outperformed the simulation from the SVM model alone.
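The statistics used in that comparison can be computed as below. This is a generic sketch: the Gaussian-likelihood form of AIC/BIC is one common convention (the parameter count k is an assumption), and the 'good'/'poor' fits are synthetic stand-ins for the two models, not the paper's results.

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

def aic(y, yhat, k):
    """Gaussian-likelihood AIC with k fitted parameters."""
    n = len(y)
    return float(n * np.log(np.sum((y - yhat) ** 2) / n) + 2 * k)

def bic(y, yhat, k):
    """Gaussian-likelihood BIC with k fitted parameters."""
    n = len(y)
    return float(n * np.log(np.sum((y - yhat) ** 2) / n) + k * np.log(n))

rng = np.random.default_rng(5)
y = np.linspace(1.2, 1.4, 200) + 0.002 * rng.standard_normal(200)
good = y + 0.001 * rng.standard_normal(200)   # stand-in for the multiclass SVM fit
poor = y + 0.010 * rng.standard_normal(200)   # stand-in for the plain SVM fit
print(r2(y, good) > r2(y, poor), aic(y, good, 3) < aic(y, poor, 3))
```

Lower AIC/BIC and higher R2 for the tighter fit reproduce, on synthetic data, the direction of the comparison reported in Table 5.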

Robustness Test
We reused the same R programming software application, which generated the results shown in Tables 3-4, to simulate the OAA and OAO strategies. The simulation procedure was similar to that in 3.1, with input data consisting of four different exchange rates, namely EUR-USD, EUR-CNY, EUR-RUB and EUR-CHF. Each input dataset contained 2322 data points retrieved from a Bloomberg terminal, spanning 2001 to 2011. In the parameter selection of the SVM simulation software, we selected the Laplacian kernel function and compared the simulation results of the OAA and OAO strategies. Table 6 presents a performance comparison of the multiclass SVM models using the different input datasets. The accuracy count for the OAA and OAO strategies reached 100% for every simulated input dataset except EUR-CNY, which yielded 99.78%, and the training errors of the simulations were similar.

DISCUSSION
The mechanism of the mean reversion and coefficient of variance algorithm starts by classifying each dataset by its mean, giving two separate groups, termed 'mean1+' and 'mean1-', and continues to divide each of 'mean1+' and 'mean1-' until the minimum and maximum values of the dataset are located at both ends. We thereby obtain 'mean2+' and 'mean2-', as well as 'mean3+' and 'mean3-', and so on. In this study, we optimised the classification process and found six possible CV classes. Future work will be in the area of optimisation of kernel functions in the frequency domain.
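The recursive mean1+/mean1-, mean2+/mean2- splitting described above can be sketched as follows; this is our own illustration (the stopping depth and the synthetic data are assumptions), showing how the extremes end up in the outermost groups.

```python
import numpy as np

def recursive_mean_split(x, depth):
    """Split values above/below their mean, recursing 'depth' times --
    a sketch of the mean1+/mean1-, mean2+/mean2- hierarchy."""
    if depth == 0 or len(x) < 2:
        return [x]
    mu = x.mean()
    hi, lo = x[x >= mu], x[x < mu]
    return recursive_mean_split(hi, depth - 1) + recursive_mean_split(lo, depth - 1)

rng = np.random.default_rng(6)
data = rng.normal(1.3, 0.1, 1000)              # synthetic rate-like values
groups = recursive_mean_split(data, depth=3)    # up to 2**3 = 8 groups
print(len(groups))
```

After each split the global maximum stays in the '+' branch and the global minimum in the '-' branch, so the extremes settle at the two ends of the hierarchy, as the text describes.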

CONCLUSION
The multiclass algorithm for the SVM model, which uses a mean reversion and coefficient of variance algorithm to partition and classify nonlinear, nonstationary datasets, yields a significant improvement compared with the conventional SVM model without the multiclass algorithm. To verify the robustness of the proposed algorithm, the OAA and OAO strategies were introduced. Using a variety of test statistics, the simulation results on different inputs from several exchange rates, i.e., EUR-USD, EUR-CNY, EUR-RUB and EUR-CHF, confirmed that the proposed multiclass algorithm significantly outperformed the simulations of the conventional SVM model.

ACKNOWLEDGMENT
This study is fully inspired by the collaborations of the Department of Electrical and Electronic Engineering and Centre of Bio-Inspired Technology, Imperial College London. The authors are grateful to Prof. Nicos Christofides, Dr. Peerapol Yuvapoositanon, Janpen