Design of Output Codes for Fast Covering Learning using Basic Decomposition Techniques
© 2006 Science Publications

We propose the design of output codes for solving multi-class classification problems with the Fast Covering Learning Algorithm (FCLA). For a complex multi-class problem, classifiers are normally constructed by combining the outputs of several binary ones. In this paper, we use two basic decomposition methods, One Per Class (OPC) and Error Correcting Output Code (ECOC), with FCLA, a binary-to-binary mapping algorithm, as the base binary learner. The methods have been tested on Fisher’s well-known Iris data set, and the experimental results show that the ECOC method improves classification ability.


INTRODUCTION
In the last two decades, binary neural networks (BNNs) have attracted the attention of many researchers, and there are now many established approaches for constructing BNNs. They include the Boolean Like Training Algorithm (BLTA) [3] and Improved Expand and Truncated Learning (IETL) [8]. In these methods, predefined output codes are used to represent multiple classes. Using predefined output codes makes the problem independent of the specific application and of the class of hypotheses used to construct the binary classifiers [9]. Experimental work has shown that output coding can greatly improve performance measures such as generalization and prediction accuracy [1].
Several output coding methods have been suggested and tested so far, such as comparing each class against the rest (One Per Class: OPC), comparing all pairs of classes (Pair Wise Coupling: PWC), random codes, exhaustive codes, Error Correcting Output Codes and margin classifiers [1,5,6,7].
In this paper, we extend the Fast Covering Learning Algorithm (FCLA) [2] to multi-class problems (i.e., K classes, where K>2). Further, this paper addresses the design of output codes for binary-to-binary mapping learning. In our work, we use two output coding schemes: One-Per-Class (OPC) and Error Correcting Output Code (ECOC). Output coding of multi-class problems is composed of two stages. In the first stage, the training stage, we construct the hidden layer from K independent binary classifiers, where K is the number of classes to be learned. The output layer is then constructed by training the number of neurons required by the coding scheme used.
In the second stage, the classification stage, the class of an applied sample is predicted by combining the outputs of the binary classifiers. OPC separates one class from all other classes, whereas ECOC consists of several dichotomizers with class redundancy, which gives robustness in case some dichotomizers fail [5,6,7]. The ECOC approach improves generalization performance [1,5,7]. These coding schemes are used for output coding in the training phase of the neural network. In the reconstruction stage, when a new sample arrives, a similarity measure is required to find the class to which it belongs. If the generated string is in binary form, the Hamming distance criterion is used to decide the class of the new sample [5,7].
In the case of OPC, for the training of the output layer, one class is separated from the rest of the classes. Therefore, at the output layer, a single neuron per dichotomizer collects the outputs of the hidden layer neurons of its respective class. The weights and thresholds in the output layer are set to one for each dichotomizer (neuron).
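As a minimal sketch of the OPC output layer described above, the Python snippet below models each output neuron as a threshold unit whose weights and threshold are all one. The hard-limiter rule (the neuron fires when the weighted sum reaches the threshold) and the function names are assumptions made for illustration; they are not taken from [2].

```python
def opc_output_neuron(hidden_outputs, weight=1, threshold=1):
    """Threshold unit with unit weights and unit threshold (assumed hard limiter)."""
    net = sum(weight * h for h in hidden_outputs)
    return 1 if net >= threshold else 0


def opc_output_layer(hidden_outputs_by_class):
    """One output neuron per class, fed only by that class's hidden neurons."""
    return [opc_output_neuron(h) for h in hidden_outputs_by_class]


# Hypothetical hidden-layer outputs for three classes.
outputs = opc_output_layer([[0, 0], [0, 1, 0], [0]])
print(outputs)
```

With unit weights and a unit threshold, each output neuron under this assumption acts as a logical OR over the binary outputs of its class's hidden neurons.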
In ECOC [1], each class is assigned a unique binary string. We refer to these strings as codewords. We then train K classifiers at the hidden layer and L output neurons at the output layer (where L is the length of the codeword). The predicted class is the one whose codeword is closest to the generated output. The similarity measure is the Hamming distance (i.e., the number of bits in which the output differs from the codeword).
We show that using the ECOC method with FCLA improves generalization capability over OPC. This comparison has been carried out experimentally on the Iris data set. Also, because a binary-to-binary mapping algorithm is used, the convergence problem of the backpropagation algorithm is avoided, and training time is thus reduced. The use of integer weights and thresholds also reduces prediction time, since fewer computations are needed.
In section 2, we discuss the basic concepts for extending the FCLA framework. In sections 3 and 4, we present the formulae used in training and the training algorithm of FCLA. In section 5, the extension of the FCLA framework is presented. Section 6 gives an illustrative example, section 7 gives a performance comparison, and section 8 gives concluding remarks.

BASIC CONCEPTS
Let s={x 1 …

For each of the k classes, the FCLA [2] algorithm can be applied separately to train the hidden layer. The algorithm can therefore be run in parallel over the k classes to find the hidden layer neurons for each class. To combine the outputs of the hidden layer neurons, the FCLA approach can be extended to train the output layer using either of the two coding schemes, OPC or ECOC, which forms the three-layered network structure depicted in figure 2. To define the output codes, let $s_1, s_2, \ldots, s_k$ be k distinct binary strings of length L. The length of the strings depends on the decomposition method used, OPC or ECOC. We call each string $s_i$ the codeword for class $c_i$. Now define L hypotheses $f_1, f_2, \ldots, f_L$.
For OPC, k hypotheses are learned: one function $f_i$ is defined for each class, such that $f_i(x)=1$ if $f(x)=c_i$ and zero otherwise. During learning, the set of hypotheses $\{f_1, f_2, \ldots, f_k\}$ is learned. To classify a new example $x'$, we compute the value of $f_i(x')$ for each i. The predicted value of $f(x')$ is the class $c_i$ for which $f_i(x')$ outputs 1.
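The OPC decision rule above can be sketched in a few lines of Python. The text does not say what happens when no output, or more than one output, equals 1; returning `None` in those cases is an assumption of this sketch, as are the function and class names.

```python
def opc_predict(outputs, classes):
    """Return the class whose hypothesis outputs 1, or None if the vote is ambiguous."""
    winners = [c for c, bit in zip(classes, outputs) if bit == 1]
    # Assumption: zero or several active outputs are treated as "no decision".
    return winners[0] if len(winners) == 1 else None


classes = ["c1", "c2", "c3"]
print(opc_predict([0, 1, 0], classes))
print(opc_predict([1, 1, 0], classes))
```

The ambiguous case matters in practice because OPC codewords differ in only two bit positions, so a single wrong output cannot be corrected.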
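For ECOC, the L hypotheses $f_1, f_2, \ldots, f_L$ are defined through the codewords. The codeword for class $c_1$ consists of all ones, i.e., $f_j = 1$ for all j = 1 to L. The codeword for each class $c_i$ with i > 1 consists of alternating runs of $2^{k-i}$ zeroes and $2^{k-i}$ ones.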
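The codeword rule above can be sketched as follows. The text does not state the codeword length; this sketch assumes the standard exhaustive-code length of $2^{k-1}-1$ bits, so the last run of each codeword is truncated. The function name is hypothetical.

```python
def exhaustive_codewords(k):
    """Build k codewords: class 1 is all ones, class i alternates runs of 2**(k-i)."""
    length = 2 ** (k - 1) - 1  # assumed length (exhaustive code), not given in the text
    codewords = [[1] * length]
    for i in range(2, k + 1):
        run = 2 ** (k - i)
        # Bit j falls in run number j // run; even runs are zeroes, odd runs are ones.
        codewords.append([(j // run) % 2 for j in range(length)])
    return codewords


for word in exhaustive_codewords(4):
    print("".join(str(bit) for bit in word))
```

Under this assumption, a three-class problem such as Iris would get codewords of length 3, and a four-class problem codewords of length 7.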
During learning, the hidden layer neurons are trained using the two-class learning algorithm to learn each function $g_j$ from the examples $x_1, x_2, \ldots, x_m$. The output layer neurons are trained according to the coding scheme used for classification, OPC or ECOC, as presented in the next section. The output layer has L hypotheses $\{f_1, f_2, \ldots, f_L\}$.
To classify a new example $x'$, we apply each of the learned functions to compute the binary string $s' = \langle f_1(x'), f_2(x'), \ldots, f_L(x') \rangle$. We then determine which codeword $s_i$ is nearest to $s'$. The predicted value of $f(x')$ is the class $c_i$ corresponding to the nearest codeword $s_i$ (the one with minimum Hamming distance).
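Nearest-codeword decoding as described above can be sketched in Python. The codewords below are illustrative values for a hypothetical four-class problem, and the tie-breaking rule (the first class in the dictionary wins) is an assumption, since the text does not specify one.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(output_bits, codewords):
    """Return the class whose codeword is nearest to the generated output."""
    # min() keeps the first minimum, so ties go to the earliest class (assumption).
    return min(codewords, key=lambda c: hamming_distance(output_bits, codewords[c]))


codewords = {
    "c1": [1, 1, 1, 1, 1, 1, 1],
    "c2": [0, 0, 0, 0, 1, 1, 1],
    "c3": [0, 0, 1, 1, 0, 0, 1],
    "c4": [0, 1, 0, 1, 0, 1, 0],
}
# The codeword of c2 with one output bit flipped is still decoded as c2.
print(decode([0, 0, 0, 0, 1, 0, 1], codewords))
```

These four codewords are pairwise at Hamming distance 4, which is why a single failing dichotomizer does not change the decision.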

FORMULAE USED: FAST COVERING LEARNING ALGORITHM
While constructing the BNN, suppose that $\{x_1, x_2, \ldots, x_v\}$ are v (true) vertices included in one hypersphere. FCLA [2] defines the centre of this hypersphere, three radii, and the formulae for the weights and threshold value of a neuron.

TRAINING FOR THE CONSTRUCTION OF NETWORK
For our extension, there are two broad steps involved in the construction of the network.

A. Training of hidden layer
The training of the hidden layer is done in parallel for each of the k classes using FCLA [2].

B. Training the output layer
According to FCLA [2] , at the output layer a single neuron is needed to collect the outputs of all the hidden neurons with respect to a two class problem as depicted

EXTENSION OF FCLA FRAMEWORK
We now use coding schemes to extend the FCLA framework for solving classification problems (figure 3). We use two coding schemes for the construction of the output layer: (1) the OPC scheme and (2) the ECOC scheme. The number of neurons required at the output layer depends on the coding scheme used.

A. Construction of hidden layer
For a given K-class problem $\{G_1, G_2, \ldots, G_k\}$, we apply FCLA [2] (Algorithm 1) separately to each class. The hidden neurons are thus determined for each of the classes. To collect the outputs of the hidden neurons, we propose the approach in the next section.
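The per-class decomposition described above can be sketched as a split of the training set into K independent two-class problems, each of which could then be passed to the binary learner. The function name and the toy 3-bit samples are hypothetical, and the FCLA training step itself is not shown.

```python
def one_vs_rest_problems(samples, labels):
    """Split a K-class training set into K two-class (true/false vertex) problems."""
    problems = {}
    for c in sorted(set(labels)):
        true_vertices = [x for x, y in zip(samples, labels) if y == c]
        false_vertices = [x for x, y in zip(samples, labels) if y != c]
        problems[c] = (true_vertices, false_vertices)
    return problems


# Toy binary input vectors, made up for illustration.
samples = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["G1", "G1", "G2", "G3"]
for cls, (true_v, false_v) in one_vs_rest_problems(samples, labels).items():
    print(cls, true_v, false_v)
```

Because the K problems share no state, they can be trained in any order or in parallel.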

B. Training Of Output Layer
The outputs generated by the hidden layer are combined at the output layer.

ILLUSTRATIVE EXAMPLE
We illustrate the proposed approach with the following example. The regions marked A, B, C, D and E in the figure can be approximated on a 6×6 grid. Table 1 gives the approximation of these regions through 6-bit binary values.

Table 1: Data sets with respect to the approximated regions.

Applying Algorithm 1 of section 2, the results of the construction of the hidden layer are as follows:

As discussed in section 2 (figure 2), a three-layered network structure is formed: input layer, hidden layer and output layer. The input layer does not contain any processing elements; its nodes only provide inputs to the hidden layer. The hidden and output layers contain the neurons. The network structure formed with respect to table 3 is depicted in figure 4. The network structure for table 4 is shown in figure 5.

Fig. 4: Example Solution using OPC scheme
We use Fisher's Iris data set to compare the performance of the OPC and ECOC coding schemes for designing classifiers in FCLA. Fisher's Iris data set contains 150 patterns representing three classes [10], with 50 patterns in each class. Each pattern has four properties, and classification is done on the basis of the combination of these properties. For applying the inputs to the network, each of the four properties of the original pattern is represented by its 7-bit binary equivalent. Thus each input pattern consists of 4 × 7 = 28 bits.

For testing, we split the 50 patterns of each class into 40 training and 10 test patterns. The testing results show that ECOC performs better in terms of classification accuracy. For Setosa and Versicolor, ECOC gives 100% accuracy (i.e., it classifies all 10 samples correctly). For Virginica, 80% accuracy is achieved with ECOC. Using OPC in the same setting, the results are not satisfactory.
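The 7-bit input representation mentioned above can be sketched as follows. The text does not say how a real-valued measurement is mapped to an integer; this sketch assumes that each measurement (in centimetres, with one decimal place) is multiplied by 10 before conversion, which keeps every Iris value below 128. The function names are hypothetical.

```python
def encode_feature(value_cm, bits=7):
    """Encode one measurement as a fixed-width list of bits (assumed scaling by 10)."""
    level = round(value_cm * 10)
    if not 0 <= level < 2 ** bits:
        raise ValueError("value does not fit in the requested number of bits")
    return [int(b) for b in format(level, f"0{bits}b")]


def encode_pattern(features):
    """Concatenate the 7-bit codes of all features into one binary input vector."""
    bits = []
    for value in features:
        bits.extend(encode_feature(value))
    return bits


pattern = encode_pattern([5.1, 3.5, 1.4, 0.2])
print(len(pattern), pattern[:7])
```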

CONCLUSION
In this paper, we extend the FCLA [2] method to multi-class problems by designing classifiers using coding schemes. The trained hidden layer is modular: the modules corresponding to each class can be trained independently [4] and in parallel, which reduces training time. For output layer training, the paper has examined the use of the Error Correcting Output Code and One Per Class coding schemes for a binary-to-binary mapping learning algorithm. The performance of the method has been compared on Fisher's well-known Iris data set. The results show that ECOC gives higher classification accuracy than OPC.