Intelligent Voice-Based Door Access Control System Using Adaptive-Network-based Fuzzy Inference Systems (ANFIS) for Building Security

Secure buildings are currently protected from unauthorized access by a variety of devices. Although many kinds of devices are available to guarantee system safety, such as PIN pads, conventional and electronic keys, identity cards, and cryptographic and dual-control procedures, a person's voice can also be used. The ability to verify the identity of a speaker by analyzing speech, known as speaker verification, is an attractive and relatively unobtrusive means of providing security for admission into an important or secured place. An individual's voice cannot be stolen, lost, forgotten, guessed, or impersonated with accuracy. Motivated by these advantages, this paper describes the design and prototyping of a voice-based door access control system for building security. In the proposed system, access may be authorized simply by an enrolled user speaking into a microphone attached to the system. The system then decides whether to accept or reject the user's identity claim, or possibly reports insufficient confidence and requests additional input before making the decision. Furthermore, an intelligent system approach is used to develop models of the authorized persons based on their voices. In particular, an Adaptive-Network-based Fuzzy Inference System (ANFIS) is used in the proposed system to distinguish authorized from unauthorized persons. Experimental results confirm the effectiveness of the proposed intelligent voice-based door access control system in terms of the false acceptance rate and the false rejection rate.


INTRODUCTION
The personal safety of people in public and private buildings has always been a concern in daily life. Access control for buildings is an important tool for protecting both the building occupants and the structure itself. One of the important systems in building security is door access control. Door access control is a form of physical security that protects a room or building by limiting access to specific people and by keeping records of such accesses.
The most widespread authentication method for such systems is based on smartcards. A smartcard system limits room or building access to only those people who hold an allocated smartcard. However, it is difficult to prevent another person from obtaining and using a legitimate person's card. A conventional smartcard can be lost, duplicated, stolen, forgotten, or impersonated with accuracy. Due to the limitations of conventional security procedures, a range of biometric verification options are currently under consideration for security systems, including door access control [1,2]. In biometric methods, the idea is to enable automatic verification of identity by computer assessment of one or more behavioral and/or physiological characteristics of a person. Recent biometric methods for personal authentication use features such as the face, the voice, the hand shape, the fingerprint and the iris [1,2]. Each method has its own advantages and disadvantages in terms of usability and security [3]. Among the biometric methods, voice rates highly on the usability characteristics, which include simplicity for the user, the user's feeling of resistance, speed of authentication and false-rejection rate [4].
In order to overcome the problems of smartcard-based door access control, this paper introduces an intelligent voice-based door access control system for building security. The proposed system is a performance biometric that provides positive verification of identity from an individual's voice characteristics for access to secure locations (e.g. office, laboratory, home). In the proposed system, features are extracted from the person's voice data, and an Adaptive-Network-based Fuzzy Inference System (ANFIS) is then used to develop models of the authorized persons based on the features extracted from their voices.
First, the prototype of the door access control is described. Next, the speaker verification process used in the proposed system is discussed in detail. Finally, the performance of the proposed intelligent voice-based door access control is evaluated experimentally for door access control at the Intelligent Mechatronics System Laboratory, Faculty of Engineering, International Islamic University Malaysia.

PROPOSED VOICE-BASED DOOR ACCESS CONTROL
Proposed system description: Figure 1 shows the schematic diagram of the proposed intelligent voice-based access control. The proposed system consists of three main components, namely a voice sensor, a speaker verification system and a door access control. A low-cost microphone of the kind commonly used with computer systems serves as the voice sensor that records the person's voice. The recorded voice is then sent to the voice-based verification system, which verifies the authenticity of the person based on his/her voice. A personal computer (PC) with a 1.5 MHz Pentium III processor and a sound card is used to implement the speaker verification. The sound card records the voice data at a sampling frequency of 22 kHz. In this system, all of the voice data processing and speaker verification algorithms are implemented on the PC using MATLAB and its toolboxes. As a result of the voice-based verification, a decision signal that accepts or rejects the access is sent through the parallel port of the PC to the door access control.
As shown in Fig. 2, an electromagnetic lock is attached to the door to control its opening and closing. The electromagnetic lock works on a 12 V DC power supply and is set in the normally closed (NC) condition. Therefore, without a command signal from the verification system, the lock is always switched on and the door remains closed. When a person is verified by the proposed voice-based verification as an authorized user, access is granted. The parallel port sends a signal to the electromagnetic lock driver, shown in Fig. 3, so that the electromagnetic lock is demagnetized. As a result, the door can be opened by the authorized person for a certain period of time.
The access control system in general makes four possible decisions: the authorized person is accepted, the authorized person is rejected, the unauthorized person (impostor) is accepted, and the unauthorized person (impostor) is rejected. The accuracy of the access control system is specified by the rates at which the system rejects authorized persons and accepts unauthorized persons. The rate at which the system rejects authorized persons is called the false rejection rate (FRR), and the rate at which it accepts unauthorized persons is called the false acceptance rate (FAR). Mathematically, both rates are expressed as percentages using the following simple calculations [5]:

$$\mathrm{FRR} = \frac{NFR}{NAA} \times 100\%, \qquad \mathrm{FAR} = \frac{NFA}{NIA} \times 100\%$$

where NFR and NFA are the numbers of false rejections and false acceptances respectively, while NAA and NIA are the numbers of authorized person attempts and impostor attempts. To achieve high security of the door access control system, the proposed system is expected to have both a low FRR and a low FAR.
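As a small illustration of the two error rates defined above, the following Python sketch computes FRR and FAR from raw attempt counts. The function name and the counts in the usage lines are hypothetical and are not results from this paper.

```python
def error_rates(n_false_reject, n_authorized_attempts,
                n_false_accept, n_impostor_attempts):
    """Return (FRR, FAR) in percent from raw attempt counts."""
    if n_authorized_attempts <= 0 or n_impostor_attempts <= 0:
        raise ValueError("attempt counts must be positive")
    frr = 100.0 * n_false_reject / n_authorized_attempts   # NFR / NAA
    far = 100.0 * n_false_accept / n_impostor_attempts     # NFA / NIA
    return frr, far


# Hypothetical counts, for illustration only.
frr, far = error_rates(n_false_reject=6, n_authorized_attempts=50,
                       n_false_accept=3, n_impostor_attempts=60)
print(f"FRR = {frr:.1f}%, FAR = {far:.1f}%")
```

Keeping the two denominators separate matters: FRR is normalized by authorized attempts only and FAR by impostor attempts only, so the two rates cannot be read off a single overall error percentage.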
Voice-based verification system: It is well known that a person's voice not only conveys a message but also indicates the person's identity. Therefore, it can also be used in a biometric system. The use of the voice for biometric measurement has become more popular for several reasons: the signal is generated naturally, it is convenient to process and distribute, and it is applicable to remote access. There are two kinds of voice-based recognition, or speaker recognition: speaker identification and speaker verification [5]. In speaker verification, the system decides whether a person is the one he/she claims to be. Speaker identification, on the other hand, determines which person among a group of persons is speaking. Speaker recognition is further divided into two categories, text-dependent and text-independent. Text-dependent speaker recognition relies on specific spoken phrases, whereas in text-independent speaker recognition the speaker can utter any word.
The most appropriate method for voice-based door access control is speaker verification, since the objective in access control is to accept or reject a person's request to enter a specific building or room. Figure 4 shows the basic structure of the proposed voice-based verification system. As in other biometric-based security systems, there are two phases in the proposed system. The first phase is the training or enrollment phase, shown in Fig. 4(a). In this phase the authorized persons are registered and their voices are recorded. Features are then extracted from the recorded voices and used to develop models of the authorized persons.
The second phase in the proposed system is the testing or operational phase, depicted in Fig. 4(b). In this phase a person who wants to access the building/room is required to enter the claimed identity and his/her voice. The entered voice is then processed and compared with the model of the claimed person to verify the claim. In this phase, a decision process determines whether the features extracted from the given voice match the model of the claimed person. In order to give a definite answer of acceptance or rejection, a threshold is set. When the degree of similarity between the given voice and the model is greater than the threshold, the system accepts the access; otherwise the system denies the person access to the building/room.
Feature extraction: As shown in Fig. 4, feature extraction is one of the important processes in the proposed system. Feature extraction is the process of converting the raw voice signal into a feature vector that can be used for classification. Features are quantities that are extracted from the preprocessed voice and can be used to represent the voice signal. In general, there are two types of feature extraction technique, namely cepstral-coefficient-based features and prosodic-based features such as fundamental frequency and formant frequency. In this paper, Perceptual Linear Prediction (PLP) coefficients are used as the features in the proposed system. Figure 5 shows schematically the extraction of the PLP coefficients from the raw voice signal. PLP, similar to LPC analysis, is based on the short-term spectrum of speech. In contrast to pure linear predictive analysis of speech, PLP modifies the short-term spectrum of the speech by several psychophysically based transformations. In this method the spectrum is warped according to the Bark scale. PLP uses an all-pole model to smooth the modified power spectrum. The output cepstral coefficients are then computed based on this model [6].

Fig. 5: PLP coefficient extraction process
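In summary, the PLP coefficients are calculated based on the following steps:
Critical band analysis: First, the voice data is framed and windowed (using an available window function such as the Hamming window) and then transformed into the frequency domain using the Fast Fourier Transform (FFT). Then, the obtained spectrum is warped from the radial frequency (ω) to the Bark frequency (Ω) scale using the following formula [6]:

$$\Omega(\omega) = 6 \ln\left\{ \frac{\omega}{1200\pi} + \left[ \left( \frac{\omega}{1200\pi} \right)^{2} + 1 \right]^{0.5} \right\}$$

The power spectrum S(Ω) of the warped spectrum is then calculated. Next, the convolution operation is carried out between the warped power spectrum S(Ω) and the power spectrum of a simulated critical-band masking curve Ψ(Ω), which has the following form [6]:

$$\Psi(\Omega) = \begin{cases} 0, & \Omega < -1.3 \\ 10^{2.5(\Omega + 0.5)}, & -1.3 \le \Omega \le -0.5 \\ 1, & -0.5 < \Omega < 0.5 \\ 10^{-1.0(\Omega - 0.5)}, & 0.5 \le \Omega \le 2.5 \\ 0, & \Omega > 2.5 \end{cases}$$

The following is the result of the convolution operation:

$$\Theta(\Omega_i) = \sum_{\Omega = -1.3}^{2.5} S(\Omega - \Omega_i)\, \Psi(\Omega), \qquad i = 1, 2, \ldots, B$$

where B is the number of samples. The sampling intervals are chosen so that the critical bands, added together, cover the frequency scale evenly.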
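The critical-band step can be sketched in Python as follows. This is a minimal illustration that assumes the standard Bark warping and masking curve of the PLP method described in [6]; the function names, the 257-bin flat test spectrum and the choice to spread the band centres evenly from 0 Bark up to the Nyquist frequency are assumptions made here, not details taken from the paper.

```python
import math


def hz_to_bark(f_hz):
    """Bark warping; with omega = 2*pi*f, omega/(1200*pi) equals f/600."""
    x = f_hz / 600.0
    return 6.0 * math.log(x + math.sqrt(x * x + 1.0))


def masking_curve(bark_offset):
    """Simulated critical-band masking curve, piecewise in Bark."""
    if bark_offset < -1.3 or bark_offset > 2.5:
        return 0.0
    if bark_offset <= -0.5:
        return 10.0 ** (2.5 * (bark_offset + 0.5))
    if bark_offset < 0.5:
        return 1.0
    return 10.0 ** (-1.0 * (bark_offset - 0.5))


def critical_band_energies(power, sample_rate, n_bands):
    """Weight a one-sided power spectrum (DC to Nyquist) into n_bands
    samples spaced equally on the Bark axis."""
    nyquist = sample_rate / 2.0
    last = len(power) - 1
    bin_bark = [hz_to_bark(k * nyquist / last) for k in range(len(power))]
    step = hz_to_bark(nyquist) / (n_bands - 1)
    energies = []
    for i in range(n_bands):
        centre = i * step
        # Each FFT bin is weighted by the masking curve evaluated at its
        # Bark distance from the band centre.
        energies.append(sum(p * masking_curve(b - centre)
                            for p, b in zip(power, bin_bark)))
    return energies


# Hypothetical input: a flat (white) power spectrum with 257 bins.
theta = critical_band_energies([1.0] * 257, sample_rate=22000, n_bands=17)
```

Because the warping is logarithmic at high frequencies, the upper bands collect many more FFT bins than the lower ones, which is the intended loss of spectral resolution at high frequencies.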
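Equal-loudness pre-emphasis: First, an equal-loudness curve is constructed. An approximation of this curve for frequencies up to 5 kHz is [6]:

$$E(\omega) = \frac{(\omega^{2} + 56.8 \times 10^{6})\, \omega^{4}}{(\omega^{2} + 6.3 \times 10^{6})^{2}\, (\omega^{2} + 0.38 \times 10^{9})}$$

Then, the simulated equal-loudness curve E(ω) is used to pre-emphasize the sampled Bark power spectrum Θ(Ω_i). The following is obtained from the equal-loudness pre-emphasis:

$$\Xi[\Omega(\omega)] = E(\omega)\, \Theta[\Omega(\omega)]$$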
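The pre-emphasis step can be illustrated with the short Python sketch below. It assumes the standard equal-loudness approximation used in the PLP method of [6] (valid up to about 5 kHz); the function names and the idea of passing band centre frequencies in hertz are illustrative assumptions.

```python
import math


def equal_loudness(f_hz):
    """Approximate equal-loudness weight E(omega) for omega = 2*pi*f."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 * w2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def pre_emphasize(band_energies, band_centres_hz):
    """Multiply each critical-band energy by the weight at its centre."""
    return [equal_loudness(f) * e
            for e, f in zip(band_energies, band_centres_hz)]
```

The weight is zero at DC, rises steeply through the low frequencies and approaches one at high frequencies, which mimics the reduced sensitivity of hearing to low-frequency sound.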

Intensity-loudness power conversion: Here, a cubic-root compression of the amplitude is performed as follows:

$$\Phi(\Omega) = \Xi(\Omega)^{0.33}$$

Inverse discrete Fourier transform: The power spectrum resulting from the amplitude compression is converted back to the time domain using the inverse FFT (IFFT), which yields an autocorrelation signal.
Calculation of the all-pole coefficients: The autocorrelation signal is used to calculate the all-pole coefficients with the well-known Levinson-Durbin algorithm. A detailed discussion of the PLP coefficients can be found in [6].
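The last three steps (compression, inverse transform and all-pole fitting) can be sketched in Python as shown below. This is an illustrative implementation under stated assumptions, not the MATLAB code used in the paper: the one-sided spectrum is mirrored to make it even-symmetric before the inverse DFT, the sign convention is A(z) = 1 + a_1 z^{-1} + ... + a_p z^{-p}, and all function names are hypothetical.

```python
import math


def compress(loudness_weighted):
    """Cubic-root amplitude compression (exponent 0.33)."""
    return [v ** 0.33 for v in loudness_weighted]


def autocorrelation_from_spectrum(spec, order):
    """Inverse DFT of a one-sided spectrum mirrored to be even-symmetric.
    Returns autocorrelation lags 0..order."""
    n = len(spec) - 1
    r = []
    for k in range(order + 1):
        acc = spec[0] + spec[-1] * math.cos(math.pi * k)
        acc += 2.0 * sum(spec[m] * math.cos(math.pi * k * m / n)
                         for m in range(1, n))
        r.append(acc / (2.0 * n))
    return r


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations recursively.
    Returns (a, err) with a[0] == 1 and err the final prediction error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

The recursion needs only the first p + 1 autocorrelation lags for a model of order p, so the inverse transform does not have to be evaluated beyond lag p.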
ANFIS-based speaker model: ANFIS, proposed by Jang [7], is an architecture that functionally integrates the interpretability of a fuzzy inference system with the adaptability of a neural network. Loosely speaking, ANFIS is a method for tuning an existing rule base of a fuzzy system with a learning algorithm, of the kind found in artificial neural networks, based on a collection of training data. Because the fuzzy system has fewer tunable parameters than a conventional artificial neural network, ANFIS is trained faster and more accurately than the conventional artificial neural network.
An ANFIS that corresponds to a Sugeno-type fuzzy model with two inputs and a single output is shown in Fig. 6. The rule set of a first-order Sugeno fuzzy system has the following form:

Rule i: If $x$ is $A_i$ and $y$ is $B_i$ then $f_i = p_i x + q_i y + r_i$.
From the perspective of artificial neural networks, it is a feedforward network consisting of 5 layers. Every node i in the first layer is an adaptive node with the following node function:

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2; \qquad O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4$$

where x (or y) is the input to node i and $A_i$ (or $B_{i-2}$) is a linguistic label associated with this node. In other words, $O_{1,i}$ is the membership degree of the fuzzy set A (or B) and specifies the degree to which the input x (or y) satisfies it. The membership function for A (or B) can be a Gaussian function, a triangular membership function or others. The parameters of the membership functions used in this layer are termed premise parameters.
Fig. 6: ANFIS architecture [8]
The second layer combines the outputs of the first layer so that it has the following output:

$$O_{2,i} = w_i = \mu_{A_i}(x)\, \mu_{B_i}(y), \qquad i = 1, 2$$

Here each output represents the firing strength of a rule. The third layer normalizes the outputs of the previous layer as follows:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \qquad i = 1, 2$$

In the fourth layer, the following output is calculated based on the third-layer outputs:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right)$$

where $f_i$ is the function used in the first-order Sugeno-type fuzzy system. The parameters in this node ($p_i$, $q_i$ and $r_i$) are referred to as consequent parameters. Finally, the output of the ANFIS is the output of the last layer, which is given as

$$O_{5,1} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}$$

The main objective of the ANFIS design is to optimize the ANFIS parameters. There are two steps in the ANFIS design. The first is the design of the premise parameters and the second is the training of the consequent parameters. Several methods have been proposed for designing the premise parameters, such as grid partition, fuzzy c-means clustering and subtractive clustering [8].
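The five layers can be traced in a few lines of Python. The sketch below assumes Gaussian membership functions and the product as the rule-combination operator; the parameter layout and function names are illustrative choices, and the code is not the Fuzzy Logic Toolbox implementation used in the paper.

```python
import math


def gaussian(v, centre, sigma):
    """Gaussian membership degree of v (layer 1 node function)."""
    return math.exp(-0.5 * ((v - centre) / sigma) ** 2)


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a first-order Sugeno ANFIS with two inputs.

    premise:    one ((centre_A, sigma_A), (centre_B, sigma_B)) per rule
    consequent: one (p, q, r) per rule
    """
    # Layers 1 and 2: membership degrees, then firing strength per rule.
    w = [gaussian(x, ca, sa) * gaussian(y, cb, sb)
         for (ca, sa), (cb, sb) in premise]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted first-order rule outputs.
    weighted = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: overall output is the sum of the layer-4 outputs.
    return sum(weighted)
```

For an input far from every membership centre, all firing strengths can underflow to zero and the normalization would divide by zero; a practical implementation would guard against that case.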
Once the premise parameters are fixed, the consequent parameters are obtained based on the input-output training data. A hybrid learning algorithm is a popular learning algorithm used to train the ANFIS for this purpose. In summary, the steps for building a person model based on the voice data using ANFIS are as follows:
* Voice data collection and feature extraction of the voice data.
* Determining the premise parameters.
* Training of the ANFIS using the input pattern and desired output to obtain the consequent parameters.
* Validation of the trained ANFIS using training data.
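With the premise parameters fixed, the ANFIS output is linear in the consequent parameters, so they can be estimated by least squares. The Python sketch below illustrates only this least-squares part of hybrid learning, solved through the normal equations for brevity; it assumes Gaussian membership functions, and the function names, the two-rule premise and the synthetic data are hypothetical, not taken from the paper or from the MATLAB toolbox.

```python
import math


def gaussian(v, centre, sigma):
    return math.exp(-0.5 * ((v - centre) / sigma) ** 2)


def design_row(x, y, premise):
    """Regressor row [w1*x, w1*y, w1, w2*x, ...] using normalized strengths."""
    w = [gaussian(x, ca, sa) * gaussian(y, cb, sb)
         for (ca, sa), (cb, sb) in premise]
    total = sum(w)
    row = []
    for wi in w:
        row += [wi / total * x, wi / total * y, wi / total]
    return row


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_consequents(samples, targets, premise):
    """Least-squares estimate of (p, q, r) per rule, premise held fixed."""
    rows = [design_row(x, y, premise) for x, y in samples]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    theta = solve_linear(ata, atb)
    return [tuple(theta[3 * k:3 * k + 3]) for k in range(n // 3)]


# Hypothetical two-rule example with synthetic, noise-free targets.
premise = [((0.0, 1.0), (0.0, 1.0)), ((2.0, 1.0), (2.0, 1.0))]
true_params = [(1.0, 2.0, 3.0), (-1.0, 0.5, 2.0)]
flat = [v for rule in true_params for v in rule]
samples = [(-1.0 + 0.5 * i, -1.0 + 0.5 * j)
           for i in range(9) for j in range(9)]
targets = [sum(c * t for c, t in zip(design_row(x, y, premise), flat))
           for x, y in samples]
fitted = fit_consequents(samples, targets, premise)
```

Each additional rule adds three consequent parameters for two inputs, so the size of the linear system grows with the number of rules produced by the premise design.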

Experimental setup:
In order to evaluate the effectiveness of the proposed intelligent voice-based door access control, the proposed system is installed at the Intelligent Mechatronics System Laboratory, Faculty of Engineering, International Islamic University Malaysia. Voices of nine (9) speakers from the YOHO database are used in the experiment. Three (3) speakers are considered to be the persons authorized to access the laboratory and the other six (6) speakers are assumed to be outside impostors. Each speaker who is assumed to be an authorized person has to say the word 'seven' 70 times; 20 of these utterances are used as training data and the other 50 are used as testing data. This means that text-dependent speaker verification is used in the proposed system. An example of the raw voice signal of the word 'seven' for an authorized person is shown in Fig. 7. To obtain the PLP coefficients, 17 critical-band filters are used, covering a 17-Bark frequency range. These filters are simulated by integrating the FFT spectrum of 20-ms Hamming-windowed speech segments taken at a frame interval of 10 ms. Figure 8 shows the 13 PLP coefficients extracted from the voice signal shown in Fig. 7.
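The framing described above can be made concrete with a short Python sketch. It assumes the 22 kHz sampling frequency quoted in the system description, which gives 440 samples per 20-ms window and a shift of 220 samples per 10-ms interval; the function name and the one-second constant test signal are hypothetical.

```python
import math


def frame_signal(samples, sample_rate, win_ms=20.0, hop_ms=10.0):
    """Split a signal into overlapping Hamming-windowed frames."""
    win = int(round(sample_rate * win_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (win - 1))
              for n in range(win)]
    frames = []
    # Only full-length frames are kept; trailing samples are dropped.
    for start in range(0, len(samples) - win + 1, hop):
        segment = samples[start:start + win]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


# Hypothetical input: one second of a constant signal sampled at 22 kHz.
frames = frame_signal([1.0] * 22000, sample_rate=22000)
print(len(frames), len(frames[0]))
```

With a 10-ms interval and a 20-ms window, consecutive frames overlap by half, so each sample away from the signal edges contributes to two frames.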

Training of the ANFIS-based speaker models:
The ANFIS-based speaker model is developed using the Fuzzy Logic Toolbox of MATLAB. To allow the ANFIS to learn from the available input-output data so that the consequent parameters are obtained, the structure of the ANFIS first has to be designed. The ANFIS structure is designed by determining the premise parameters. Here the subtractive clustering method is used with different radius parameters. Once the premise parameters are obtained, the ANFIS model is trained using the hybrid learning algorithm for 10 iterations. Table 1 shows the training time and the classification rate of all the ANFIS-based speaker models for different subtractive clustering parameters. As shown in the table, all of the speaker models give perfect classification rates. There are no errors in identifying the authorized persons based on the voice data used in the training phase. However, the training time differs significantly for different radii. A smaller radius causes a longer training time. This is because a smaller cluster radius usually yields more, smaller clusters in the data and hence more rules. More rules in the ANFIS result in a larger number of consequent parameters. As a consequence, a longer time is needed in the training process to optimize the parameters. Hence, it can be concluded that a larger radius is preferable for shortening the training time.
Testing of the ANFIS-based speaker models: Tables 2-4 show the performance of the ANFIS models when the testing voice data is used. In terms of both FAR and FRR, ANFIS 2 performs better than the other models. Hence it can be concluded that ANFIS 2 is the best candidate for the voice-based model in the proposed system. From the security point of view, ANFIS 2 is the best model for protecting the laboratory from unauthorized persons (impostors) since it gives the lowest FAR. The overall FAR of ANFIS 2 is smaller than 10%, which is good enough for a common security system. Where a high level of security is needed, further improvement is required so that the proposed system produces a FAR smaller than 1%. However, although the FRR of ANFIS 2 is also the smallest, it is larger than 10%. Although this does not affect the level of security, such a large FRR makes the access control system inconvenient for the authorized persons. Further improvement is needed to raise the usability of the ANFIS-based model for access control systems.