Discrimination-based Artificial Immune System: Modeling the Learning Mechanism of Self and Non-self Discrimination for Classification

This study presents a new artificial immune system for classification. It is named the discrimination-based artificial immune system (DAIS) and is based on the principle of self and non-self discrimination by T cells in the human immune system. The ability of a natural immune system to distinguish between self and non-self molecules is applicable to classification, in that one class is distinguished from the others. We model this ability and the mechanism of education in the thymus for classification. In particular, we introduce a method for deciding the recognition distance threshold of an artificial lymphocyte, which serves as the negative selection algorithm. We apply DAIS to real-world data sets and show that its performance is comparable to that of other classifier systems. We conclude that this modeling is appropriate and that DAIS is a useful classifier.


INTRODUCTION
Biological mechanisms have attractive applications in computer science. Many algorithms, called biologically inspired algorithms, have been proposed; for example, neural networks, genetic algorithms (GA), genetic programming and artificial immune systems (AIS). An AIS is an algorithm that mimics the human immune system to solve various problems [1] and has been studied in many fields, including computer security [2], information filtering [3], data clustering [4], concept learning [5] and classification [6]. These studies show that AIS is effective for data analysis.
One of the main objectives of AIS is to apply the immune metaphor or mechanism to computations [1]. In a natural immune system, recognition by the T cell is one of the most important mechanisms. It allows the immune system to discriminate between self and non-self molecules. This characteristic is well suited to machine learning, particularly classification and concept learning, in that one class (self) is distinguished from the other classes (non-self).
In AIS research, however, the main topic has been the recognition mechanism and network theory of B cells [4,5]. Although the mechanism of T cells is essential to immune discrimination, there are few studies in which this concept is applied to classification problems. Most of these studies either oversimplify fundamental processes [7] or are too complex because they feature GA with many parameters [8,9]. A generalized model of T cells and of their learning mechanism is needed before these can be applied to machine learning.
We developed a new AIS that is applicable to the classification of data sets by incorporating the training and recognition mechanisms of T cells. It is called the discrimination-based artificial immune system (DAIS). In DAIS, we introduced a variable recognition threshold and a new negative selection algorithm that decides the threshold. DAIS employs only immune-inspired algorithms with five parameters, of which only one is important. In this paper, we explain the concept of DAIS and its algorithm. Then, we demonstrate its behavior by applying it to real-world data sets. The results show that DAIS has properties comparable to those of other classifier systems and that the mechanism of the T cell is useful for classification.

IMMUNE SYSTEM
A biological immune system consists of two different response systems: the innate and the adaptive immune systems. Through their cooperation, the whole immune system defends the body against harmful foreign antigens. The innate system responds to foreign antigens. This mechanism has not been used in artificial immune systems. In contrast, the adaptive system acts against specific targets, and it is this mechanism that artificial immune systems have used.
The important functions of the adaptive immune system are to recognize, eliminate and memorize foreign antigens, as well as to discriminate between self and non-self molecules. This means that the immune system is able to react only to non-self molecules. The adaptive immune system consists of two kinds of lymphocytes. One is the B cell, which originates in the bone marrow and has the ability to produce antibodies. B cells can recognize the shape of foreign antigens, eliminate and memorize them. The other is the T cell, which can discriminate between self and non-self molecules in a body. T cells acquire the ability to discriminate by education in an organ named the thymus.
What happens in the thymus? If a T cell recognizes self molecules, it undergoes apoptosis and dies. This is called negative selection. If it does not react, it is tested with non-self molecules. If the T cell is not able to recognize them, it dies. Thus, only T cells that can react with non-self molecules survive. This is positive selection. T cells that pass through the thymus are able to react to non-self molecules but unable to react to self molecules (Fig. 1).
After training in the thymus, T cells circulate throughout the body. When B cells react with foreign antigens, they need the help of T cells. B cells do not react with self molecules if T cells do not recognize them. As a result, the adaptive immune system does not attack self molecules.

DISCRIMINATION-BASED ARTIFICIAL IMMUNE SYSTEM
We propose a discrimination-based artificial immune system (DAIS), which consists of artificial lymphocytes (ALC), a learning algorithm and a classification method. In this section, we first describe the idea inspired by education in the thymus. Second, we present the algorithm that applies this idea to classification, which is the basic algorithm of DAIS. Third, we describe the whole learning algorithm of DAIS and, finally, we explain its classification method.
Modeling of the immune receptor is one of the most important items, because the immune reaction is based on recognition via the receptor. In AIS, the receptor is represented in shape space [1]. This is a phase space whose axes represent the properties of the receptor; for example, the length, width and height of any bump or groove in the combining site, the electrostatic charge, etc. The position in shape space therefore represents the shape of a receptor. The distance between two positions represents the affinity between the two shapes: the smaller the distance, the higher the affinity. If the distance is smaller than a threshold, the two interact with each other.
To apply education in the thymus to classification, we need to understand how to represent this mechanism in shape space. There are two kinds of region in shape space. Thymic education ensures that the surviving T cells are distributed in only one region, called the non-self region (Fig. 2). The other region is called the self region. Negative selection removes T cells from the self region and positive selection chooses T cells distributed in the non-self region. This is very useful for classification, because data belonging to the same class form clusters.

Basic learning algorithm:
In this subsection, we describe how to apply the above idea to classification. The algorithm consists of three main parts: the self and non-self classes, the artificial lymphocyte and the learning processes. The data set is normalized beforehand.

Self and non-self classes:
To apply the self and non-self regions and the mechanism of dividing shape space to classification, it is necessary to divide the data set into two classes: the self class and the non-self class. The self class consists of only one class in the problem and the non-self class consists of all the remaining classes. This mirrors the immune system, which categorizes self molecules as the self class and all other molecules as the non-self class. Data belonging to the self class are named self data and those in the non-self class are named non-self data.
Artificial lymphocyte:
In the immune system, each T cell recognizes molecules individually. AIS uses artificial lymphocytes (ALC) to recognize data individually in the same way. An ALC consists of a receptor, a recognition distance threshold (RDT) and a self class.
The receptor is represented in shape space. An ALC recognizes an individual datum if the distance (Euclidean distance, Hamming distance, etc.) between the receptor of the ALC and the datum is smaller than the RDT.
The self class of an ALC is one of the classes. The remaining classes are the non-self class for that ALC. The self class is decided in a learning process (next subsection): if class i is the self class in a learning process, the self class of the ALC is i . ALC i then denotes an ALC whose self class is i , and ALCs with the same self class are said to be of the same kind.
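The three components of an ALC can be sketched as a small data structure. This is a minimal illustration rather than the authors' implementation: the class name `ALC`, its field names and the choice of Euclidean distance are assumptions made here, and recognition is taken to mean that the distance is strictly smaller than the RDT.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ALC:
    """Hypothetical container for one artificial lymphocyte."""
    receptor: Tuple[float, ...]  # position in shape space
    rdt: float                   # recognition distance threshold
    self_class: int              # class this ALC must not recognize

    def recognizes(self, point) -> bool:
        # Assumption: Euclidean distance; the text also allows Hamming distance.
        return math.dist(self.receptor, point) < self.rdt


alc = ALC(receptor=(0.8, 0.8), rdt=0.3, self_class=1)
print(alc.recognizes((0.9, 0.9)))  # nearby point
print(alc.recognizes((0.1, 0.1)))  # distant point
```

Making the dataclass frozen keeps a stored memory ALC from being modified by accident after learning.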
Learning processes:
Next, we describe the algorithm for acquiring the set of ALC i such that these ALC i cannot recognize the data belonging to their own self class ( i ). It is inspired by education in the thymus and consists of the iteration of several processes: proliferation and mutation (clonal expansion), determination of RDT (negative selection) and activation and selection (positive selection). After passing through these processes, the ALC i is named memory ALC i .
In the proliferation and mutation process, one ALC i produces clones. Mutation of cells occurs in the immune system and it is important to use this for computation. A clone is made by partially changing the receptor of the ALC. This process is the same as in evolutionary computation and is known to be effective.
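As a rough sketch of clonal expansion, the snippet below copies a receptor and perturbs some of its coordinates. The text does not specify the mutation operator, so everything here is an assumption for illustration: the function name `make_clones`, Gaussian noise with standard deviation `sigma`, the per-coordinate `mutation_rate`, and clipping to [0, 1] (which presumes min-max normalized data).

```python
import random


def make_clones(receptor, n_clones, mutation_rate=0.5, sigma=0.1, rng=None):
    """Return n_clones receptors, each a partially mutated copy of `receptor`."""
    rng = rng or random.Random()
    clones = []
    for _ in range(n_clones):
        clone = []
        for value in receptor:
            if rng.random() < mutation_rate:
                # Assumed operator: Gaussian noise, clipped to the unit interval.
                value = min(1.0, max(0.0, value + rng.gauss(0.0, sigma)))
            clone.append(value)
        clones.append(tuple(clone))
    return clones


clones = make_clones((0.5, 0.5, 0.5), n_clones=5, rng=random.Random(0))
print(len(clones), clones[0])
```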
In negative selection, we reverse the idea. Rather than choosing the ALCs that cannot recognize the self data set, we decide the RDT of each ALC so that it cannot recognize the self data set. The reason is computational cost: in a real thymus, only about 3% of an enormous number of T cells survive, which is costly. The RDT is determined by multiplying the minimum distance between the ALC and the self data set by a constant α equal to or less than 1 (Fig. 3). This ensures that ALCs cannot recognize the self data used as training data. It introduces a variable recognition threshold for the ALC and a new negative selection algorithm. Parameter α is the most important parameter in DAIS.
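The threshold rule described above amounts to RDT = α × (distance from the receptor to its nearest self datum). A minimal sketch follows, assuming Euclidean distance and a strict "distance < RDT" recognition test; the function name `determine_rdt` is hypothetical.

```python
import math


def determine_rdt(receptor, self_data, alpha):
    """Scale the distance to the nearest self datum by alpha."""
    nearest = min(math.dist(receptor, s) for s in self_data)
    return alpha * nearest


self_data = [(3.0, 4.0), (6.0, 8.0)]
rdt = determine_rdt((0.0, 0.0), self_data, alpha=0.8)
# With alpha <= 1, no self datum falls strictly inside the RDT radius.
print(rdt, any(math.dist((0.0, 0.0), s) < rdt for s in self_data))
```

Because the threshold is computed directly, no candidate has to be generated and discarded, which is the cost saving the text refers to.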
In positive selection, an ALC i is selected according to two kinds of activity. To define the activities, the non-self data set is divided into two parts: the data that are not yet recognized by the memory ALC i s and the data already recognized by them, denoted U i and R i , respectively. The primary activity of an ALC i is the number of data in U i that it can recognize. The secondary activity of an ALC i is the number of data in R i that it can recognize. ALC i s are ranked by primary activity. If several ALC i s have the same primary activity, they are ranked by secondary activity. If several ALC i s also have the same secondary activity, they are ranked by RDT, because the larger the RDT of an ALC i , the wider the region it covers. The remaining ALC i s are removed. These processes make an ALC i more effective.
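The three-level ranking can be expressed as a lexicographic sort key. The sketch below assumes that candidates are `(receptor, rdt)` pairs and that Euclidean distance is used; the names `activities` and `select_best` are illustrative, not taken from the paper.

```python
import math


def activities(receptor, rdt, unrecognized, recognized):
    """Count recognized data in U_i (primary) and in R_i (secondary)."""
    primary = sum(math.dist(receptor, x) < rdt for x in unrecognized)
    secondary = sum(math.dist(receptor, x) < rdt for x in recognized)
    return primary, secondary


def select_best(candidates, unrecognized, recognized):
    """Pick the candidate with the largest (primary, secondary, RDT) tuple."""
    def key(candidate):
        receptor, rdt = candidate
        primary, secondary = activities(receptor, rdt, unrecognized, recognized)
        return (primary, secondary, rdt)
    return max(candidates, key=key)


U = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]   # not yet covered by memory ALCs
R = [(0.2, 0.0)]                           # already covered
candidates = [((0.0, 0.0), 0.15), ((0.0, 0.0), 0.25), ((1.0, 1.0), 0.5)]
print(select_best(candidates, U, R))
```

Python compares tuples element by element, so ties on primary activity fall through to secondary activity and then to RDT, matching the order given in the text.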
The above processes are repeated to obtain a more useful ALC i . In each repetition, only the selected ALC is allowed to pass (Fig. 4). This is repeated a predetermined number of times, and the last selected ALC is stored as a memory ALC. In detail, one ALC passes through the three main processes to yield a more effective ALC. First, in the proliferation and mutation process, the ALC produces its clones. Next, the self data set is presented to these clones and their RDT is determined. Finally, the clones are activated by the non-self data set and the best ALC is selected. The steps are as follows.
1. Set class i to self class and the remaining classes to non-self class.
2. Select one datum from the non-self data set that the memory ALC i s have not recognized.
3. Make an ALC i whose receptor is the same as the selected non-self datum.
... Remove the ALCs that cannot recognize the datum selected in Step 2. (Activation)
8. Select the ALC whose primary activity is the largest. If more than one ALC has the same primary activity, select the ALC whose secondary activity is the largest among them. If more than one ALC has the same secondary activity, select the ALC which has the largest RDT. Then, remove the rest. (Selection)
9. If the number of repetitions from Step 4 to Step 8 reaches 100, go to Step 10. If not, return to Step 4.
10. Store the ALC selected in Step 8 as a memory ALC i .
11. If all non-self data are recognized by the set of memory ALC i s, the learning is complete. Otherwise, return to Step 2.
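The loop for one self class can be sketched as follows. Since Steps 4 to 7 are not fully listed, the cloning, mutation and RDT steps follow the prose description and are assumptions, as are Gaussian mutation with `sigma`, the clone count `n_clones`, keeping the parent among the candidates, always picking the first unrecognized datum in Step 2, and the function name `learn_memory_alcs`.

```python
import math
import random


def recognizes(receptor, rdt, x):
    """An ALC recognizes x when x lies strictly inside its RDT radius."""
    return math.dist(receptor, x) < rdt


def learn_memory_alcs(self_data, nonself_data, alpha=0.8, n_clones=10,
                      n_iterations=100, sigma=0.1, seed=0):
    """Return memory ALCs as (receptor, rdt) pairs for one self class."""
    rng = random.Random(seed)

    def rdt_for(receptor):
        # Negative selection: alpha times the distance to the nearest self datum.
        return alpha * min(math.dist(receptor, s) for s in self_data)

    def activity(alc, data):
        return sum(recognizes(alc[0], alc[1], x) for x in data)

    memory = []
    while True:
        covered = [any(recognizes(r, t, x) for r, t in memory)
                   for x in nonself_data]
        unrecognized = [x for x, c in zip(nonself_data, covered) if not c]
        recognized = [x for x, c in zip(nonself_data, covered) if c]
        if not unrecognized:               # Step 11: all non-self data covered
            return memory
        target = tuple(unrecognized[0])    # Step 2 (choice rule is an assumption)
        best = (target, rdt_for(target))   # Step 3
        if best[1] <= 0:
            raise ValueError("a non-self datum coincides with a self datum")
        for _ in range(n_iterations):      # Step 9: fixed number of repetitions
            clones = [best]                # assumption: the parent competes too
            for _ in range(n_clones):
                receptor = tuple(v + rng.gauss(0.0, sigma) for v in best[0])
                clones.append((receptor, rdt_for(receptor)))
            # Activation: drop clones that no longer recognize the target.
            alive = [c for c in clones if recognizes(c[0], c[1], target)]
            # Step 8: rank by primary activity, then secondary, then RDT.
            best = max(alive, key=lambda c: (activity(c, unrecognized),
                                             activity(c, recognized), c[1]))
        memory.append(best)                # Step 10


self_data = [(0.1, 0.1), (0.2, 0.15), (0.15, 0.25)]
nonself_data = [(0.8, 0.8), (0.9, 0.7), (0.7, 0.9), (0.85, 0.95)]
print(len(learn_memory_alcs(self_data, nonself_data, n_iterations=5)))
```

Keeping the parent among the candidates guarantees that at least one ALC survives activation, so each pass of the outer loop covers at least one more non-self datum and the loop terminates.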

Demonstration:
We demonstrate the behavior of the basic learning algorithm and investigate its recognition mechanism visually. The algorithm is applied to two kinds of artificial data set: linearly and non-linearly separable data sets. Both are two-dimensional, two-class problems. They are trivial, but our purpose here is to demonstrate the behavior of the algorithm. Figure 5 shows the result for the linearly separable data set and Fig. 6 shows the result for the non-linearly separable data set. Closed circles and triangles represent the data belonging to class 1 and class 2, respectively. Class 1 is the self class. Open circles are the memory ALC 1 s. The figures show that the memory ALC 1 s are distributed in the non-self region (class 2). Therefore, the proposed learning algorithm distributes the memory ALCs in the non-self region.
Whole learning algorithm:
Generally, the number of classes in a classification problem is more than three. Therefore, DAIS needs to acquire all kinds of memory ALC i ($i = 1, \dots, N_c$; $N_c$: the number of classes) and classify unknown data using these memory ALCs. All kinds of ALC are obtained by repeating the basic learning algorithm (section 3.2) $N_c$ times. At first, class 1 becomes the self class. Next, class 2 becomes the self class. DAIS repeats this until class $N_c$ becomes the self class. This is the whole learning algorithm of DAIS.
After the learning process, there are all kinds of memory ALC i s that can distinguish between one class (their own self class) and the remaining classes.
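The repetition over classes can be sketched as a loop that treats each class in turn as the self class. The helper name `learn_all_classes` and the `learn_one` callback are assumptions; `learn_one` stands for any implementation of the basic learning algorithm that takes the self and non-self data and returns the memory ALCs for that self class.

```python
def learn_all_classes(samples, labels, learn_one):
    """Run the basic learner once per class; return {self_class: memory ALCs}."""
    memory = {}
    for self_class in sorted(set(labels)):
        self_data = [x for x, y in zip(samples, labels) if y == self_class]
        nonself_data = [x for x, y in zip(samples, labels) if y != self_class]
        memory[self_class] = learn_one(self_data, nonself_data)
    return memory


# Stand-in learner for illustration only: it just reports the split sizes.
def count_split(self_data, nonself_data):
    return len(self_data), len(nonself_data)


sizes = learn_all_classes([(0, 0), (1, 1), (2, 2), (3, 3)], [1, 2, 2, 3], count_split)
print(sizes)
```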

Classification method:
In the classification process, all memory ALCs are used. The classification method is an elimination process. If a datum is recognized by an ALC i , the datum does not belong to the self class of ALC i (class i ). If memory ALC k is the only kind that cannot recognize a datum, its class is determined as k . For example, in the case of a three-class problem ($N_c = 3$), if a datum is recognized by both ALC 1 and ALC 2 , then it belongs to class 3 (Fig. 7, left). This method has two problems. One is that a datum may fail to be recognized by two or more kinds of ALC. This case is named hole (Fig. 7, center). The other is that a datum may be recognized by all kinds of ALC. This case is named overlap (Fig. 7, right).
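The elimination rule and its two failure cases can be sketched as follows. The layout of `memory` (a dict from self class to a list of `(receptor, rdt)` pairs), the function names and the toy data are assumptions made for this illustration.

```python
import math


def free_classes(memory, x):
    """Classes whose memory ALCs all fail to recognize x."""
    return [c for c, alcs in memory.items()
            if not any(math.dist(r, x) < rdt for r, rdt in alcs)]


def classify_by_elimination(memory, x):
    """Return (class, status); status is 'classified', 'hole' or 'overlap'."""
    free = free_classes(memory, x)
    if len(free) == 1:
        return free[0], "classified"
    if not free:
        return None, "overlap"   # every kind of ALC recognizes x
    return None, "hole"          # two or more kinds fail to recognize x


def toy_memory(rdt):
    # Toy setup: ALCs of kind c sit on the centers of the other two classes.
    centers = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 1.0)}
    return {c: [(p, rdt) for k, p in centers.items() if k != c] for c in centers}


print(classify_by_elimination(toy_memory(0.4), (0.1, 0.1)))
print(classify_by_elimination(toy_memory(0.4), (0.5, 0.5)))
print(classify_by_elimination(toy_memory(0.8), (0.5, 0.5)))
```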
In the case of hole, DAIS sets a new threshold (rdt) for the memory ALCs that cannot recognize the datum. rdt is the value obtained by multiplying RDT by w. If there is only one kind of ALC which cannot recognize the datum, then its own self class is the class of the datum. Otherwise, w increases ($w = w + \delta$) and rdt is calculated again. This process is repeated until the above-mentioned condition is satisfied. However, if rdt is too big and all kinds of memory ALCs can recognize the datum (overlap), w decreases ($w = w - \delta$) and δ is halved ($\delta = \delta / 2$). Then, the above process is repeated.
In the case of overlap, DAIS sets a new threshold (rdt) for the memory ALCs that can recognize the datum. rdt is the value obtained by multiplying RDT by w. If there is only one kind of ALC that cannot recognize the datum, then its own self class is the class of the datum. Otherwise, w decreases ($w = w - \delta$) and rdt is calculated again. This process is repeated until the above-mentioned condition is satisfied. However, if rdt is too small and two or more kinds of memory ALCs cannot recognize the datum (hole), w increases ($w = w + \delta$) and δ is halved ($\delta = \delta / 2$).
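Both adjustment procedures can be sketched with one routine that moves w in the appropriate direction and halves δ after an overshoot. The initial values `w=1.0` and `delta=0.1`, the iteration cap `max_steps`, the `memory` layout (dict from self class to `(receptor, rdt)` pairs) and the function name are assumptions, since the text does not state them.

```python
import math


def resolve_ambiguity(memory, x, w=1.0, delta=0.1, max_steps=200):
    """Scale RDTs by w until exactly one kind of ALC fails to recognize x."""
    def free_kinds(scale, kinds):
        return [c for c in kinds
                if not any(math.dist(r, x) < scale * rdt for r, rdt in memory[c])]

    free = free_kinds(1.0, list(memory))
    if len(free) == 1:
        return free[0]
    hole = len(free) >= 2
    # Hole: rescale only the kinds that fail to recognize x. Overlap: all kinds.
    kinds = free if hole else list(memory)
    step = 1 if hole else -1
    for _ in range(max_steps):
        w += step * delta
        free = free_kinds(w, kinds)
        if len(free) == 1:
            return free[0]
        overshoot = (len(free) == 0) if hole else (len(free) >= 2)
        if overshoot:
            w -= step * delta   # undo the last step
            delta /= 2          # and continue with a finer step
    return None                 # unresolved, e.g. exact distance ties


def toy_memory(rdt):
    # Toy setup: ALCs of kind c sit on the centers of the other two classes.
    centers = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 1.0)}
    return {c: [(p, rdt) for k, p in centers.items() if k != c] for c in centers}


print(resolve_ambiguity(toy_memory(0.4), (0.6, 0.5)))  # starts as a hole
print(resolve_ambiguity(toy_memory(0.8), (0.6, 0.5)))  # starts as an overlap
```

The iteration cap is a safeguard added here: when two kinds of ALC are exactly equidistant from the datum, no value of w isolates a single class, and the sketch returns `None` instead of looping forever.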

Accuracy:
We applied DAIS to four real-world data sets and compared the results with those of other classifier systems. The data sets chosen are Iris, Ionosphere, Diabetes and Sonar, all obtained from the UCI machine learning repository [10]. Iris and Diabetes have few features, whereas Ionosphere and Sonar have many features. Their details are given in Table 1. Table 2 shows the accuracy and ranking of DAIS and the other classifiers [6,11]. The DAIS results were obtained by averaging ten runs. For the Iris, Diabetes and Sonar data sets, five-, ten- and thirteen-fold cross-validation schemes were employed, respectively. For the Ionosphere data set, the first 200 items were chosen for the training set and the remaining 151 items for the test set. DAIS compares well with some of the best general-purpose classifiers available. In particular, for the Ionosphere and Sonar data sets, DAIS showed high accuracy and ranked highly, which suggests that DAIS is useful for pattern classification with many features.
To obtain these results, we changed only parameter α. The value of α is 0.85, 0.8, 1.3 and 0.65 for Iris, Ionosphere, Diabetes and Sonar data sets, respectively. The best value is dependent on the problem. However, it is easy to optimize only one parameter.
We also compared DAIS with the artificial immune recognition system (AIRS), which is a classification algorithm based on the recognition mechanism of B cells [6]. For Iris and Diabetes, the accuracy of DAIS is about 1% lower than that of AIRS. However, for the Ionosphere and Sonar data sets, the accuracy of DAIS is about 2% and 5% higher than that of AIRS, respectively. If the problem involves many features, DAIS is more useful than AIRS. This means that the mechanism of T cell discrimination is as useful as that of B cells for classification.
Next, we discuss the accuracy on the training set. DAIS was applied to the satellite image data set (STATLOG version) obtained from the UCI machine learning repository. It offers 36 features and 5 classes, and provides a training set with 4435 items and a test set with 2000 items. The result was obtained by averaging ten runs. Table 3 compares the accuracy of DAIS with those of the other systems for the training and test sets. On the test set, the accuracy of DAIS is only 0.5% lower than that of the best system. In addition, the accuracy of DAIS on the training set is 100%. This is a unique characteristic of DAIS, and it is ensured whenever α is equal to or less than 1. This means that DAIS appropriately models the training mechanism in the thymus. The complete accuracy of DAIS on a training set is different from the over-training of a neural network, since DAIS also shows high accuracy on the test set. Considering the results for both the training and test sets, DAIS is the best classifier system for the satellite image data set.

Property:
In this subsection, we discuss the characteristics of DAIS for the value of α. We focus on the accuracy and size of memory ALCs. It is important to decrease the memory size, because it makes computational cost low when DAIS classifies unknown data. The cost is proportional to the memory size, since DAIS classifies data using all the memory ALCs.
The relationships between α and accuracy for the Ionosphere data set are depicted in Fig. 8. This shows that the accuracy falls when α is around 1. This is a common property for the other data sets, except in the case of the Diabetes data set. However, there is no other common property.
In contrast, the number of memory ALCs decreases monotonically with α (Fig. 9). The vertical axis represents the ratio of the number of memory ALCs to the number of training data. The reason for this decrease is that a large value of α enlarges the recognition domain of each ALC. From the viewpoint of memory size, it is therefore desirable for α to be as large as possible, and the best value of α is a trade-off between the accuracy and the memory size.
These results mean that small α and large memory size do not always make the accuracy high and we must seek an optimized value of α for each problem.

CONCLUSION
This study introduced a new learning system based on the mechanism of the immune system. The results show that it is useful to apply the metaphor of self and non-self discrimination by T cells to classification problems. We proposed a method that decides the RDT using α. This is a novel algorithm in the field of AIS. The accuracy of DAIS is sufficiently high; if the value of α is equal to or less than 1, the accuracy of DAIS on the training set is 100%. This shows that the proposed negative selection algorithm is an acceptable model of the T cell and its training mechanism.
DAIS is applicable to various classification problems. Unlike a neural network, DAIS does not need to optimize its own structure: it automatically determines the distribution of the memory ALCs. In addition, optimization of parameters is easy, because only the value of α needs to be optimized. These properties are important for classification. We think that DAIS is a general-purpose classification method. Thus, the new negative selection algorithm and DAIS are useful for classification.