Used for Nonlinear Adaptive Prediction

Neural networks have been widely used for many applications in digital communications. They can solve complex problems thanks to their nonlinear processing and their learning and generalization capabilities. Neural networks are one of the key technologies for the communication domain, and accordingly a special effort is expected to be devoted to real-time hardware implementation issues. In this study, we propose a digital hardware implementation of a neural system based on a multilayer perceptron (MLP). The neural system is used for the nonlinear adaptive prediction of nonstationary signals such as speech. The implemented MLP architecture is generated from a generic elementary neuron (EN). A polynomial approximation method is used to implement the sigmoidal activation function, and the back-propagation algorithm is used to implement the prediction task. The circuit architecture is detailed with the aim of achieving real-time prediction of speech signals. The designed ASIC includes a neural network block, an on-chip learning block and a memory used to store the synaptic weights during updating.


INTRODUCTION
Many physical signals, such as speech, are generated by a nonlinear mechanism and have statistically nonstationary properties, which makes their prediction difficult. Artificial neural networks have been widely used as powerful tools for modelling nonlinear dynamical systems. They are also able to solve complex problems in digital communications thanks to their nonlinear processing, parallel distributed architecture, capacity for learning and generalization, and suitability for efficient hardware implementation. A neural network is well suited to the nonlinear prediction of nonstationary signals by virtue of the distributed nonlinearity built into its design and its ability to learn from its environment.
Digital implementation of neural networks on configurable systems was presented in [1,2]. Several commercial hardware solutions that can be used to implement neural circuits have also reached the market [3]. Implementing the learning algorithm remains the main difficulty when an autonomous system is planned, particularly with regard to the running frequency. The implementation of the nonlinear activation function of the neurons and of its derivative, used by the learning algorithm, is often solved by a linear approximation [4-6], but no implementation method has emerged as a universal solution [7].
In this study we propose a hardware implementation of a neural system used for time-series prediction. An adequate and optimized architecture is used for the implementation of the learning algorithm. As a final result, we propose a digital hardware implementation of a neural prediction circuit as a digital ASIC.
Problem position - brief presentation: Neural networks are able to solve complex problems in digital communications thanks to their nonlinear processing, parallel distributed architecture, capacity for learning and generalization, and efficient hardware implementations [8]. In this section we describe a neural system for the nonlinear adaptive prediction of nonstationary signals and demonstrate its application to a speech signal [9,10]. The neural network used here is a multilayer perceptron (MLP) trained with the backpropagation algorithm. This structure is not recursive; it reduces the training delay and gives a considerable gain in hardware implementation time.
The prediction system: The system is modeled as a feedforward multilayer neural network fitted with tapped delay lines at its input. Figure 1 shows a schematic diagram of the neural network and its environment. The purpose of the design is to filter a set of samples Y of the input signal S, represented by the vector Y(t) = [S(t), S(t-1), ..., S(t-p+1)]^T, where the superscript T denotes transposition. These samples give the past of the signal at discrete time t. The aim of the operation is to produce a prediction S_P(t+1) of the signal one step into the future. The S_P(t+1) value is used to update the weight values of the network neurons, and the prediction error is injected into the learning block.
An example of a similar predictor was given in [9]. The pipelined recurrent neural network (PRNN) gives satisfactory results but is relatively complex for hardware implementation. A non-recurrent neural network is less complex and makes it easier to implement the entire prediction system on a silicon chip. The non-recurrent neural network: The structure of the neural network used in the design is a non-recurrent, fully connected multilayer perceptron (MLP). The proposed MLP architecture has three layers. The first layer is the input layer, composed of p neurons. The second, hidden layer is composed of q neurons used for intermediate computation.
The output layer, the third one, is composed of a single neuron and calculates the predicted value.
Let W1 and W2 denote the synaptic weight matrices of the first and second layers respectively. The activation function of every neuron in each layer is the binary sigmoid function f(x) = 1/[1+exp(-x)], where x is the internal potential of the neuron. Let X1 be the output vector of the second layer, calculated by X1 = f(W1·Y), and let X2 be the output of the third layer, expressed as X2 = f(W2·X1).
A learning algorithm calculates, at each time step, the weight correction terms ∆W1 and ∆W2 in order to update the weight matrices W1 and W2. The error is calculated by comparing the estimated and the real value of the sample: e(t) = S(t) - S_P(t+1) = S(t) - X2.
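For illustration, the following Python/NumPy sketch makes the forward pass described above concrete: it builds the tapped-delay-line input Y and computes the prediction X2. The signal, weight values and helper names are hypothetical and are not the fixed-point arithmetic of the circuit.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(W1, W2, Y):
    """Forward pass of the MLP: X1 = f(W1.Y), X2 = f(W2.X1)."""
    X1 = sigmoid(W1 @ Y)      # hidden-layer outputs (q values)
    X2 = sigmoid(W2 @ X1)     # single output neuron: predicted sample S_P(t+1)
    return X1, X2

p, q = 2, 12                                  # network sizes reported later in the paper
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(q, p))       # first-layer weight matrix
W2 = rng.normal(scale=0.1, size=(1, q))       # second-layer weight matrix

S = 0.5 + 0.4 * np.sin(np.arange(200) / 7.0)  # toy normalized signal standing in for speech
t = 20
Y = S[t - p + 1 : t + 1][::-1]                # tapped delay line [S(t), S(t-1), ..., S(t-p+1)]
X1, X2 = predict(W1, W2, Y)
e = S[t + 1] - X2[0]                          # prediction error fed to the learning block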
The prediction of a nonstationary time series, such as a speech signal, requires continuous learning.
Owing to the fact that the neuron errors in the hidden layer cannot be estimated directly, we chose the backpropagation learning algorithm detailed in [11] to correct the synaptic weights. The new matrices are calculated according to the following equations: W1_new = W1 + ∆W1 and W2_new = W2 + ∆W2. In our case the backpropagation algorithm used to train the neural network is performed by the equations:

∆W_k,h(3) = -a·δ_k(3)·y_h(2) : update of the weights in the third layer
∆W_k,h(2) = -a·δ_k(2)·y_h(1) : update of the weights in the second layer
v_k = Σ_h W_k,h·y_h : the potential of the neurons

W_k,h is the synaptic weight between neuron h of the previous layer and neuron k of the considered layer, δ represents the output error term of a neuron, a is the learning rate and v is the potential of each neuron before activation. The numbers between parentheses indicate the layer number and y represents the neuron's output.
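A minimal NumPy sketch of one such weight-update step is given below. It is an illustration only: the hidden-layer error term δ(2) is written out with the standard back-propagation formula, which the text above does not reproduce, and the helper names are hypothetical.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(W1, W2, Y, target, a=0.1):
    """One on-line update of W1 and W2 for input Y and desired sample 'target'."""
    v1 = W1 @ Y                  # hidden-layer potentials v(2)
    y1 = sigmoid(v1)             # hidden-layer outputs y(2)
    v2 = W2 @ y1                 # output-layer potential v(3)
    y2 = sigmoid(v2)             # network output X2
    e = target - y2              # prediction error e(t)
    d3 = -e * y2 * (1.0 - y2)                 # output error term delta(3) = -e.f'(v(3))
    d2 = (W2.T @ d3) * y1 * (1.0 - y1)        # hidden error term delta(2), back-propagated
    W2_new = W2 - a * np.outer(d3, y1)        # dW(3) = -a.delta(3).y(2)
    W1_new = W1 - a * np.outer(d2, Y)         # dW(2) = -a.delta(2).y(1)
    return W1_new, W2_new, float(e[0])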

SIMULATION RESULTS
Three different speech signals, denoted S1, S2 and S3, were used to test the nonlinear predictor. These signals are recordings sampled at 8 kHz and coded on 8 bits. The amplitude of the signals is normalized to lie in the definition domain of the function f.
The numbers p and q of neurons in the first and second layers are determined by an optimization procedure that minimizes the squared prediction error E = Σ_t e²(t) accumulated over the signal. The optimal parameters obtained are p = 2 and q = 12 [12]. Figures 2-4 show the temporal representation of the various signals. The curve with stars '*' represents the real signal superimposed on the predicted signal, represented with crosses 'x'. The error curve, drawn with a continuous line, represents the difference between the two signals.
The error signal has a small amplitude. There is a very slight lag between the signals; this lag is due to the weight update procedure, which is delayed by the re-injection of the error signal.
That is why the maximal value of the error is not significant; moreover, this value does not exceed 31%. The mean value of the error is lower than 4% and the squared error is lower than 0.32%. Table 1 shows the mean, the mean square and the maximum value of the prediction error.

ACTIVATION FUNCTION IMPLEMENTATION
For the hardware implementation of the neural network, several constraints must be considered, such as the nonlinearity of the activation function, the silicon area and the time delay. The most popular activation function is the sigmoid, often used with gradient-descent type learning algorithms [11,13-16], as represented in Fig. 5. There are different possibilities for implementing this function, such as look-up tables or piecewise linear approximation. Moreover, when using the gradient descent algorithm, it is necessary to use the first derivative of the activation function, described by f'(x) = f(x)·[1 - f(x)]. The operation realized by an artificial neuron unit in a multilayer perceptron is described by y = f(Σ_i w_i·x_i), where x_i is the input of the neuron, w_i is the synaptic weight related to the considered input, f is the neuron activation function and n is the total number of inputs.
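The short sketch below simply makes these two formulas concrete: it evaluates a single neuron y = f(Σ w_i·x_i) on arbitrary example values and checks numerically that f'(x) = f(x)·[1 - f(x)] against a finite-difference estimate.

import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

# Single artificial neuron: weighted sum of the inputs followed by the sigmoid.
x = np.array([0.3, -0.7, 0.5])     # example inputs x_i (arbitrary values)
w = np.array([0.2, 0.4, -0.1])     # example synaptic weights w_i
y = f(np.dot(w, x))                # y = f(sum_i w_i * x_i)

# Numerical check of f'(x) = f(x) * (1 - f(x)) on a grid of points.
xs = np.linspace(-5.0, 5.0, 11)
h = 1e-6
finite_diff = (f(xs + h) - f(xs - h)) / (2 * h)
analytic = f(xs) * (1.0 - f(xs))
assert np.allclose(finite_diff, analytic, atol=1e-8)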
The implementation of the neuron's nonlinear activation function and of its derivative, used by the learning algorithm, is often solved by a piecewise linear approximation [4,5,7,17-20]. However, no implementation method has emerged as a universal solution. For hardware implementation, the efficiency criteria of a successful approximation are the achieved accuracy, the speed and the area resources. If the circuit performs on-chip learning, the multiplier used by the learning algorithm (or used by the neuron to multiply the inputs by the weights) can also be used in a time-sharing manner to compute an approximation of the sigmoid function. In this section, we propose a polynomial approximation of the sigmoid activation function and of its derivative used in artificial neural networks.
Preliminary study: Let P_N be the set of polynomials of degree less than or equal to the integer N. Let p* be an element of P_N, the approximation of the function f on the interval [a, b]. It is important to note that the Taylor expansion of f at a given point of the interval is in general not satisfactory, because it gives a local and not a global approximation. The following two theorems are fundamental results [21-23]; the first is due to Weierstrass and the second to Chebyshev. We seek an approximation of a continuous function f on the interval [a, b].
By the Weierstrass approximation theorem, for any ε > 0 there exists a polynomial P such that max over x in [a, b] of |f(x) - P(x)| is smaller than ε. This means that any continuous function can be approximated uniformly by a polynomial, but it gives no information about the polynomial's degree; Bernstein [23] shows that the required degree can be arbitrarily large. We then use the Chebyshev theorem: P*_N is the best uniform approximation polynomial of f on [a, b], in the set of polynomials of degree less than or equal to N, if and only if there exist N+2 points x_1 < x_2 < ... < x_{N+2} in [a, b] at which the error f(x_i) - P*_N(x_i) alternates in sign and its absolute value equals the maximum error max over [a, b] of |f(x) - P*_N(x)|.

Sigmoid function approximation:
The sigmoid is a continuous and strictly monotonic function on ]-∞, +∞[. In practice, we may consider that this function tends to 0 on ]-∞, -5] and to 1 on [+5, +∞[. For complexity reasons, we consider only polynomials of degree one. We apply the preceding theorems to find the best approximation of the sigmoid function on the intervals [-5, -4] to [4, 5], by steps of one, and the function is assumed constant on ]-∞, -5] and on [+5, +∞[. Let us present an example of the approximation procedure on the interval [-5, -4]; the same procedure is applied on the remaining interval fragments. Let P_1(x) = a·x + b be the approximation of f on [-5, -4]. The maximal error ε is reached at three points of the interval. Furthermore, the convexity of f on [-5, -4] implies that the two extreme points with the greatest error are the edges of the interval (Fig. 6).
This means that when the degree of the polynomial is one, the function f is approximated on [-5, -4] by the polynomial P_1(x) = 0.01129·x + 0.06248, and the maximum error (ε = 0.0006) is reached at the points -5, -4.5 and -4.
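These coefficients can be recovered numerically. For a convex function on [a, b], the best degree-one approximation has the slope of the chord, and the equal-ripple offset places the line halfway between the chord and the curve at the point of maximum gap. The sketch below applies this to the sigmoid on [-5, -4]; it is an illustration only and should give values close to a = 0.01129, b = 0.06248 and ε = 0.0006 (small differences come from rounding).

import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

def best_linear_convex(func, lo, hi, n=100001):
    """Minimax degree-1 approximation a*x + b of a convex 'func' on [lo, hi]."""
    a = (func(hi) - func(lo)) / (hi - lo)                    # slope of the chord
    xs = np.linspace(lo, hi, n)
    gap = np.max(a * xs + (func(lo) - a * lo) - func(xs))    # max distance chord - func
    b = (func(lo) - a * lo) - gap / 2.0                      # shift the chord halfway down
    return a, b, gap / 2.0                                   # equal-ripple (maximum) error

a, b, eps = best_linear_convex(f, -5.0, -4.0)
print(a, b, eps)   # close to the values a = 0.01129, b = 0.06248, eps = 0.0006 quoted above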
We apply the same procedure to the remaining intervals. The detailed piecewise approximation results for the sigmoid function f are given in Table 2. The approximation error is represented in Fig. 7; the mean error is 0.2% and the maximum approximation error is 1.11%. Because of this limited error, the approximation is accepted. A comparison between the sigmoid approximation described in this work and other similar approximations is summarized in Table 3.

Table 3: Comparison with other sigmoid approximations
Method                                            Mean error   Max error
Approximation of [17]                             2.47%        4.90%
Approximation of Alippi and Storti-Gajani [18]    0.87%        1.89%
PLAN approximation [19]                           0.59%        1.89%
CRI approximation [5], q = 0                      2.41%        11.9%
This work                                         0.2%         1.11%
By comparison, we can see that although the cutting intervals used are different, the maximal error of our approximation always remains lower than those reported by the other authors.
Approximation of the first derivative function: By applying the same technique, the function g(x) = f'(x) = f(x)·[1 - f(x)] can be approximated by a polynomial of degree one. The approximating function P_1(x) = a·x + b is found on each fragment of the interval.
On the interval [-5, -4], for example, we obtain a = 0.01101, α = -4.5, b = 0.06107 and ε = 0.0006. The detailed approximation results for the first derivative f' are shown in Table 4. The derivative approximation error is represented in Fig. 8; the maximum error is less than 0.5%. This error is very limited, so the approximation is retained, and we can say that this function is faithfully reproduced.
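The same numerical construction can be applied to the derivative, since g is also convex on [-5, -4]. The sketch below is an illustration only and should give values close to a = 0.01101, b = 0.06107 and ε = 0.0006 on that interval.

import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

def g(x):
    return f(x) * (1.0 - f(x))        # g(x) = f'(x)

def best_linear_convex(func, lo, hi, n=100001):
    """Minimax degree-1 approximation a*x + b of a convex 'func' on [lo, hi]."""
    a = (func(hi) - func(lo)) / (hi - lo)
    xs = np.linspace(lo, hi, n)
    gap = np.max(a * xs + (func(lo) - a * lo) - func(xs))
    return a, (func(lo) - a * lo) - gap / 2.0, gap / 2.0

a, b, eps = best_linear_convex(g, -5.0, -4.0)
print(a, b, eps)   # close to 0.01101, 0.06107 and 0.0006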
Digital implementation: The principal advantage of a digital implementation over an analog one is the flexibility of the design flow. In this section, we describe the digital implementation of the predictor system. The main parts of the target ASIC are the neural network and the learning algorithm. A memory is used to store the synaptic weights for updating, and a controller synchronizes all the parts of the circuit. The neural network system is based on a generic elementary neuron (EN).

The elementary neuron:
Using an HDL description of one EN, we generate an architecture composed of a MAC unit, multiplexers and a sigmoid calculation block, itself composed of a multiplier and an adder. The general architecture of the digital artificial neuron is shown in Fig. 9.
A local control unit synchronizes the different blocks. A 'Start' signal triggers the calculation process by activating the EN processes, and an 'End' signal indicates the end of the process. Except for the first layer, each neuron of each layer has two input buses: the input samples and the synaptic weights. The EN outputs are connected to the neurons of the next layer; the EN communicates its calculated output value and its internal potential. The neuron's potential value is used by the learning block.
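As a behavioral illustration (not the HDL of the actual EN), the sketch below mimics the MAC loop followed by the piecewise-linear sigmoid block: the potential is accumulated one product per step, then a segment index selects the (a, b) pair used by the multiplier and adder of the sigmoid block. The segment coefficients generated here are simple chord placeholders, not the minimax values of Table 2.

import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder segment table standing in for Table 2: for each unit interval
# [k, k+1) of [-5, 5), one (slope, offset) pair of a degree-one approximation.
SEGMENTS = []
for k in range(-5, 5):
    a = f(k + 1) - f(k)                    # chord slope on the unit interval
    SEGMENTS.append((a, f(k) - a * k))     # chord offset (illustrative, not minimax)

def sigmoid_piecewise(v):
    """Sigmoid block: one multiply and one add per evaluation."""
    if v <= -5.0:
        return 0.0
    if v >= 5.0:
        return 1.0
    a, b = SEGMENTS[int(np.floor(v)) + 5]  # segment index derived from the potential
    return a * v + b

def elementary_neuron(inputs, weights):
    """MAC loop: accumulate one product per step, then apply the sigmoid block."""
    acc = 0.0
    for x, w in zip(inputs, weights):
        acc += w * x                       # multiply-accumulate step
    return acc, sigmoid_piecewise(acc)     # (internal potential, output value)

potential, output = elementary_neuron([0.3, 0.8], [0.5, -0.2])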
Neural network implementation: The EN architecture is then used to build the entire multilayer perceptron (MLP). Each layer is connected to the next one and contains the appropriate number of elementary neurons (p and q). A global control unit commands and controls all the neurons and circuit blocks of the design. The 'Start' signal triggers the process; when all the neurons of one layer have finished, raising all their 'End' signals high, the next layer is triggered. A logic AND gate controls this process. The synaptic weights are stored in an on-chip RAM memory, and the neuron outputs and the synaptic weights are communicated to the learning block. The memory used is a static single-port RAM, a pre-designed 0.35 µm circuit with a delay time of 4 ns. Figure 10 represents the neural network architecture built from the generic EN.
Learning algorithm implementation: The prediction system needs a real-time, gradient-descent type learning algorithm; we chose the backpropagation algorithm [11,13] expressed above. The arithmetic operations used to perform the equations are addition, subtraction and multiplication. This module, which performs the learning algorithm, has the drawback of being relatively slow because of the many arithmetic multiplications built into it; it is the critical part of the circuit. Figure 11 shows the detailed architecture of the learning block.
The learning block reads the synaptic weights from the memory to perform its calculations, then stores the new synaptic weights in the memory so that they can be reused in the next prediction step. The block calculating f' is a simple implementation of a linear function using a ROM; the ROM contains the constants needed to calculate the piecewise approximation described in Table 4.
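A possible software view of the f' block is sketched below: a small ROM-like table holds, per unit segment, the (a, b) constants of the degree-one approximation of f', and the block returns a·v + b for a neuron potential v. Only the first pair is the (0.01101, 0.06107) example quoted in the text; the other entries are illustrative placeholders standing in for the actual contents of Table 4.

import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

# ROM-style table for the piecewise-linear approximation of f'(v): one (a, b)
# pair per unit segment of [-5, 5).
FPRIME_ROM = {-5: (0.01101, 0.06107)}      # worked example from the text
for k in range(-4, 5):
    g0, g1 = f(k) * (1 - f(k)), f(k + 1) * (1 - f(k + 1))
    a = g1 - g0
    FPRIME_ROM[k] = (a, g0 - a * k)        # chord placeholder, not the Table 4 values

def fprime_block(v):
    """Learning-block helper: approximated f'(v) from the ROM constants."""
    if v <= -5.0 or v >= 5.0:
        return 0.0                         # derivative treated as negligible outside [-5, 5]
    a, b = FPRIME_ROM[int(np.floor(v))]    # read the segment constants from the ROM
    return a * v + b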
Tests and circuit design: Neural network systems are known to be able to extract and restore information from erroneous or disturbed data. This ability is due to the capability of the neural network to learn from its environment and to the distributed non-linearity built into its design. The ASIC circuit was designed using 0.35 µm CMOS technology. The circuit area is about 8 mm² and it contains 250 thousand logic gates. Figure 12 shows the layout of the ASIC, in which the neural network block and the backpropagation block can be identified.
The minimum delay time is about 25 ns, which corresponds to a maximum frequency of 40 MHz. The critical path lies in the learning block, where many arithmetic operations are processed. Circuit emulation shows that it is possible to reach 50 MHz; at such frequencies the prediction procedure introduces some calculation errors, but the stability of the process is preserved.

CONCLUSION
In this work, a neural system used for time-series prediction has been designed and implemented in hardware in a fully digital manner. The implemented system is an interconnection of a neural network, a memory, a controller and a learning module. The neural network part is built from a generic elementary neuron (EN), and the device performs on-chip learning. Regarding the robustness of the ASIC implementation, we observe that the process remains stable, with some small calculation errors, when frequencies greater than 40 MHz are used. The ASIC area is about 8 mm² using 0.35 µm technology.