Adaptive Control of Robotic Manipulators with Improvement of the Transient Behavior Through an Intelligent Supervision of the Free-Design Parameters and the Sampling Period

An adaptive control scheme for mechanical manipulators is proposed. The control loop consists of a network that learns the robot's inverse dynamics and generates the control signal on-line. A supervisor is used to improve the performance of the system during the adaptation transients. The supervisor exerts two supervisory actions. The first one consists of updating the free-design parameters of the adaptive controller so that the value of a quadratic loss function is kept sufficiently small. The second supervisory action consists of an on-line adjustment of the sampling period within an interval centered at its nominal value. Some simulation results are provided to evaluate the design.


INTRODUCTION
The problem of designing adaptive control laws for rigid robot manipulators has interested researchers for many years. The development of effective adaptive controllers represents an important step towards high-precision robotic applications. In recent years, adaptive control results for robotic systems have included rigorous stability analysis (for instance [1][2][3][4]). On the other hand, over the last few years the possible use of learning networks within a control systems environment has been considered ([5][6] and references therein). Achieving good transient performance is therefore important when synthesizing adaptive control laws. Particularly useful tools for that purpose are the on-line updating of the free parameters of the adaptation algorithm and the on-line generation of the sampling period, so that the tracking error is improved during the transient.

In this paper, an adaptive control scheme for mechanical manipulators is presented that takes advantage of the relationships between adaptive and neural controllers. The control loop basically consists of a simple neural network which learns the robot's inverse dynamics, so that the control signal can be generated on-line. The synthesized controller uses a supervisor to improve the transient performance, since such a strategy has proved useful for improving the adaptation transients in classical adaptive control problems. The proposed supervisory scheme consists of two major actions, namely:
* An on-line updating procedure for one of the free-design parameters of the estimation algorithm. An optimization horizon is considered that includes a set of past measurements and, possibly, tracking error predictions.
* The generation of the sampling period from an updating sampling law that depends on the tracking error rate.

Problem formulation:
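The vector equation of motion of an n-link robot manipulator can be written as:

$$\tau = M(\Theta)\,\ddot{\Theta} + V(\Theta,\dot{\Theta}) + G(\Theta) + F(\Theta,\dot{\Theta}) \qquad (1)$$

where $\tau$ is an $n \times 1$ vector of joint torques; $\Theta$, $\dot{\Theta}$ and $\ddot{\Theta}$ are the $n \times 1$ vectors of joint positions, speeds and accelerations, respectively; $M(\Theta)$ is the $n \times n$ mass matrix of the manipulator; $V(\Theta,\dot{\Theta})$ is an $n \times 1$ vector of centrifugal and Coriolis terms; $G(\Theta)$ is an $n \times 1$ vector of gravitational terms and $F(\Theta,\dot{\Theta})$ is an $n \times 1$ vector of friction terms. The equations of motion (1) form a set of coupled nonlinear ordinary differential equations which are quite complex, even for simple manipulators.

One of the most widely used techniques to design a trajectory-following control system for such a device is the computed-torque method, in which the control torque is obtained from (1) with the parameters being replaced by their estimates:

$$\tau = \hat{M}(\Theta)\,\ddot{\Theta}' + \hat{V}(\Theta,\dot{\Theta}) + \hat{G}(\Theta) + \hat{F}(\Theta,\dot{\Theta})$$

and $\ddot{\Theta}'$ has been calculated as:

$$\ddot{\Theta}' = \ddot{\Theta}_d + K_v \dot{E} + K_p E, \qquad E = \Theta_d - \Theta$$

with $\Theta_d$ being the desired trajectory and $E$ the servo error. Equating this control law with (1) yields the closed-loop error equation:

$$\ddot{E} + K_v \dot{E} + K_p E = \hat{M}^{-1}(\Theta)\left[\tilde{M}(\Theta)\,\ddot{\Theta} + \tilde{V}(\Theta,\dot{\Theta}) + \tilde{G}(\Theta) + \tilde{F}(\Theta,\dot{\Theta})\right] \qquad (5)$$

where the modeling parametrical errors are $\tilde{M} = M - \hat{M}$, $\tilde{V} = V - \hat{V}$, $\tilde{G} = G - \hat{G}$ and $\tilde{F} = F - \hat{F}$, and $K_p$, $K_v$ are the proportional and derivative gain matrices. If all the robot's parameters are perfectly known, the closed-loop equation (5) takes the following linear and decoupled form, since the terms in the right-hand side brackets of (5) become zero:

$$\ddot{E} + K_v \dot{E} + K_p E = 0$$

It thus becomes clear that a suitable selection of $K_p$ and $K_v$ can easily regulate the evolution of the servo error. However, although some parameters of a robot are easily measurable, other effects, such as friction, mass distribution or payload variations, cannot in general be accurately modeled, and thus the assumption of negligible modeling errors is quite unrealistic in practice. Under these conditions, some sort of adaptive parameter estimation mechanism should be included in the control loop, so that equation (5) becomes approximately linear and uncoupled and the servo errors can be asymptotically eliminated.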
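As an illustration of the computed-torque idea described above, the following minimal sketch evaluates the control torque and then checks the servo-error equation on a toy two-joint example. All numerical values, the helper names (`matvec`, `solve2`, `servo_residual`) and the choice of diagonal gain matrices (stored as lists `kp`, `kv`) are assumptions made for this example only; they are not taken from the paper.

```python
def matvec(A, x):
    """Matrix-vector product for small dense matrices stored as lists of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def computed_torque(M_hat, V_hat, G_hat, F_hat, qdd_d, e, e_dot, kp, kv):
    """tau = M_hat (qdd_d + Kv e_dot + Kp e) + V_hat + G_hat + F_hat (diagonal gains assumed)."""
    qdd_ref = [a + v * ed + p * ei
               for a, v, ed, p, ei in zip(qdd_d, kv, e_dot, kp, e)]
    inertial = matvec(M_hat, qdd_ref)
    return [m + v + g + f for m, v, g, f in zip(inertial, V_hat, G_hat, F_hat)]

def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule (enough for the toy example)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def servo_residual(M, V, G, F, tau, qdd_d, e, e_dot, kp, kv):
    """Evaluate e_ddot + Kv e_dot + Kp e for the true plant (1) driven by tau."""
    qdd = solve2(M, [t - v - g - f for t, v, g, f in zip(tau, V, G, F)])
    return [a - q + v * ed + p * ei
            for a, q, v, ed, p, ei in zip(qdd_d, qdd, kv, e_dot, kp, e)]

# Hypothetical two-joint data, frozen at one time instant.
M = [[2.0, 0.3], [0.3, 1.0]]
V, G, F = [0.1, -0.2], [4.0, 1.5], [0.05, 0.02]
kp, kv = [100.0, 100.0], [20.0, 20.0]
e, e_dot, qdd_d = [0.1, -0.05], [0.0, 0.2], [1.0, -1.0]

# Perfect model: the right-hand side of the error equation vanishes.
tau = computed_torque(M, V, G, F, qdd_d, e, e_dot, kp, kv)
residual_exact = servo_residual(M, V, G, F, tau, qdd_d, e, e_dot, kp, kv)

# Mismatched mass matrix and gravity estimates: the residual no longer vanishes.
M_bad = [[1.2 * m for m in row] for row in M]
G_bad = [0.8 * g for g in G]
tau_bad = computed_torque(M_bad, V, G_bad, F, qdd_d, e, e_dot, kp, kv)
residual_mismatch = servo_residual(M, V, G, F, tau_bad, qdd_d, e, e_dot, kp, kv)
```

With exact parameters the residual is zero up to rounding, while the mismatched model leaves a nonzero forcing term, which is the motivation for adding parameter adaptation.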
Adaptive control scheme: The equations of motion (1), although quite complex and nonlinear in general, can be expressed in a form that is linear in the parameters, since all the potentially unknown parameters (link masses, lengths, friction coefficients, etc.) appear as coefficients of known functions of the generalized coordinates. In an adaptive control system design context, one usually takes advantage of this linearity in the parameters by rewriting (1) as:

$$\tau = W(\Theta,\dot{\Theta},\ddot{\Theta})\,P \qquad (7)$$

where $P$ is an $r \times 1$ vector containing the robot's unknown parameters and $W(\Theta,\dot{\Theta},\ddot{\Theta})$ is an $n \times r$ matrix of known nonlinear functions, often referred to as the regression matrix. In the same way, the $r \times 1$ estimated parameter vector $\hat{P}$ fulfills an analogous relation in terms of the parameter estimation error $\tilde{P}$, defined as $\tilde{P} = P - \hat{P}$.

Figures 2 and 3 show the adaptive control scheme. The design is a neural extension of the computed-torque control strategy. A two-layered learning network with $n \times r$ inputs and $n$ outputs is used to learn the manipulator's inverse dynamics, so that the control law can be generated on-line. The network's inputs are known nonlinear functions of the system response (more concretely, the elements $w_{ij}$ of the regression matrix $W(\Theta,\dot{\Theta},\ddot{\Theta})$ defined in eqn. (7)), while its outputs are the estimates of the input torques given by eqn. (10), each of which is a piecewise-constant signal from the zero-order sampling and hold device. Defining the connection weights vector $\hat{P}_k$ and the estimated torques vector $\hat{\tau}$, eqn. (10) can be expressed in the familiar matrix form $\hat{\tau} = W(\Theta,\dot{\Theta},\ddot{\Theta})\,\hat{P}_k$, where $\hat{P}_k$ is estimated in a discrete-time way, i.e., it is only updated at sampling instants by the adaptation algorithm.

The inverse dynamics is learned through the adaptation algorithm (13)-(14), where $E_{\tau k}$ is the prediction error vector, $W_k$ is the regression matrix used for updating the parameters and $F_k$ is an adaptation gain matrix which is positive definite (at the limit it can become semidefinite) and time-decreasing. The norms taken in (13) are the Euclidean norms.
The above approach is then used in the simulated example to evaluate the supervision efficiency. If the manipulator's inverse dynamics is correctly learned by the neural network, both nonlinear dynamics cancel each other according to the block diagram shown in Fig. 3. Thus, the closed-loop system becomes linear and the closed-loop tracking properties are adjusted with a suitable selection of the proportional and derivative gain matrices $K_p$, $K_v$. This is the same effect obtained with the conventional adaptive control approach described in the previous section.
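To make the role of a positive free-design parameter in a normalized learning rule more concrete, the sketch below applies a generic normalized-gradient update to a model that is linear in the parameters. It is only an assumed stand-in for the adaptation algorithm of the paper: the update form, the identity adaptation gain, the two-parameter toy model and the sinusoidal regressors are all hypothetical choices made for illustration.

```python
import math

def normalized_update(P_hat, W, tau, c):
    """One normalized-gradient step on tau ~ W P_hat; c > 0 is the free-design parameter."""
    tau_hat = [sum(w * p for w, p in zip(row, P_hat)) for row in W]
    err = [t - th for t, th in zip(tau, tau_hat)]       # prediction error vector
    norm_sq = sum(w * w for row in W for w in row)      # squared Euclidean norm of W
    grad = [sum(W[i][j] * err[i] for i in range(len(W)))
            for j in range(len(P_hat))]                 # W^T err
    # c keeps the denominator positive even when the regressor is zero.
    return [p + g / (c + norm_sq) for p, g in zip(P_hat, grad)]

def run(c, steps=300):
    """Return the final parameter-error norm after `steps` updates with a fixed c."""
    P_true, P_hat = [1.5, -0.7], [0.0, 0.0]             # hypothetical true parameters
    for k in range(steps):
        W = [[math.sin(0.3 * k), math.cos(0.3 * k)],
             [math.cos(0.7 * k), 1.0]]                  # hypothetical exciting regressor
        tau = [sum(w * p for w, p in zip(row, P_true)) for row in W]
        P_hat = normalized_update(P_hat, W, tau, c)
    return math.hypot(P_hat[0] - P_true[0], P_hat[1] - P_true[1])

err_fast = run(c=0.1)     # c small compared to the squared regressor norm
err_slow = run(c=100.0)   # c large compared to the squared regressor norm
```

In this toy setting the run with the large value of `c` ends with a clearly larger parameter error than the run with the small value, which is the kind of dependence of the adaptation rate on the free parameter that a supervisor can exploit.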

Supervisor design
A. Heuristic motivation: Note, by inspection, that the learning rule associated with the adaptation algorithm (13)-(14) has an adaptation rate that depends strongly on the size of the updating parameter $c_k$, which is a free-design parameter provided that it is positive and bounded. The adaptation rate is very low when the sequence $c_k$ takes very large values compared to the square of the regressor norm. A second action of the supervisor concerns the on-line choice of the sampling period within an interval centered around a nominal sampling period. The boundary of the variation domain of the sampling period is established from a priori knowledge that guarantees closed-loop stability and a prefixed bandwidth, together with other considerations such as the upper limit of the sampling rate or the achievable performance of the application at hand. The overall supervisor is designed for:
* An on-line calculation of a free parameter of the adaptation algorithm.
* A calculation of a time-varying sampling period dependent on the time variation of the tracking error.
It is based upon three main rules, namely:

Rule 1:
If the tracking error is increasing with respect to preceding samples, then decrease (increase) the last value of the sequence of the free-design parameter, provided that the previous action at the preceding sample was to increase (decrease) the value of such a sequence. [In other words, if the tracking performance is deteriorating, then take an action that reverses the last action exerted on the value of the free parameter of the algorithm.]

Rule 2:
If the tracking error is decreasing with respect to preceding samples, then decrease (increase) the last value of the sequence of the free-design parameter, provided that the previous action was to decrease (increase) it. [In other words, if the tracking performance is being improved, then do not modify the last action exerted on the value of the free parameter.]

B. Supervisory action on a free-design parameter of the adaptation algorithm: Define the loss function $J$ as a weighted quadratic function of the tracking errors $E_j$ over an optimization horizon made up of a correction horizon (past samples) and a prediction horizon (future samples), where $Q(\cdot)$ is the weighting matrix and $\sigma$ is the bounded forgetting factor of the loss function. Note that $E_j$ for $j > k$ are predicted tracking errors in the loss function for each k-th sample. In this paper, the free-design parameter in (13) is $c_k$, which has to belong to an admissibility interval compatible with the stability constraint, i.e., it has to be positive and bounded. The horizon sizes, weighting matrix and forgetting factor of the loss function are chosen by the designer according to the following design criteria:
a. How important each robot articulation is relative to the remaining ones. This idea is relevant to the choice of the $Q(\cdot)$ matrix. In Fig. 4, the third articulation could be considered more important, if appropriate, since it has to follow a reference related to the final trajectory for each specific application. If the matrix is chosen as diagonal with identical positive diagonal entries, then all the articulations are considered equally relevant and all the tracking error components enter the supervisory loss function with identical weights.
b. How large the relative weight given in the loss function to the most recent measured errors and their immediate predictions is, compared to that given to the older measurements and the more distant predictions, respectively.
c. The relative weight given in the loss function to the past tracking errors (correction horizon) compared to the predicted errors (prediction horizon).
The supervisory action over $c_k$ is described in the following algorithm:

C. Supervision of $c_k$
Step 0: Define $[c_{min}, c_{max}]$, with $c_{max} > c_{min} > 0$, as the admissibility domain for the free parameter $c_k$ of the adaptation algorithm (13). Define also the loss function $J$ according to the above supervisory design criteria (a) to (c).
Step 3: $k \to k + 1$ and then go to Step 1.

End
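The following minimal sketch illustrates how Rules 1 and 2 and a quadratic loss with a forgetting factor might be combined to supervise the free parameter. It is not the algorithm of the paper: the diagonal weighting, the multiplicative step `factor`, the `direction` flag and the way the horizon is stored are hypothetical choices introduced only for this example.

```python
def quadratic_loss(errors, weights, sigma):
    """Weighted quadratic loss over a window of error vectors (assumed form).

    errors[-1] is the newest sample; older samples are discounted by powers
    of the forgetting factor sigma. weights holds the diagonal of Q.
    """
    newest = len(errors) - 1
    return sum(sigma ** (newest - j) * sum(q * x * x for q, x in zip(weights, e))
               for j, e in enumerate(errors))

def supervise_c(c_prev, direction, J_now, J_prev, c_min, c_max, factor=1.5):
    """Apply Rules 1-2 to the free parameter.

    direction is +1 if the last action increased c and -1 if it decreased it.
    If the loss grew, the last action is reversed (Rule 1); otherwise it is
    repeated (Rule 2). The result is saturated to [c_min, c_max].
    """
    if J_now > J_prev:
        direction = -direction
    c_new = c_prev * factor if direction > 0 else c_prev / factor
    return min(max(c_new, c_min), c_max), direction

# Hypothetical two-joint error window with equal joint weights.
J = quadratic_loss([[1.0, 0.0], [0.0, 2.0]], [1.0, 1.0], 0.5)

# Loss increased after an increase of c: the action is reversed.
c_rev, dir_rev = supervise_c(1.0, +1, J_now=2.0, J_prev=1.0, c_min=0.1, c_max=10.0)
# Loss decreased after an increase of c: the action is kept.
c_keep, dir_keep = supervise_c(1.0, +1, J_now=0.5, J_prev=1.0, c_min=0.1, c_max=10.0)
# Saturation at the upper end of the admissibility domain.
c_sat, _ = supervise_c(9.0, +1, J_now=0.5, J_prev=1.0, c_min=0.1, c_max=10.0)
```

A multiplicative step keeps the parameter positive by construction, and the final clipping enforces the admissibility domain defined in Step 0.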
If the loss function value decreases, the supervisory policy has to be kept. The saturation $g_k$ for the modification of $\Theta_k$ in Step 1 guarantees that $c_k$ is upper-bounded by a predefined positive design constant $K$. The small positive constant $c$ is used to avoid division by zero in the parameter estimation (13) when the measurement regressor is zero. The supervisory learning rule also ensures, apart from the above-mentioned saturation, that the possible corrections on the choice of the parameters increase as the efficiency deteriorates.

D. Error prediction: The values of the loss function in the prediction horizon are calculated by extrapolating preceding predictions or real measurements through truncated Taylor series expansions in which the derivatives are approximated by finite differences of sampled values, with $T$ being the sampling period for any signal $f(t)$ and the i-th derivative $f_k^{(i)}$ being replaced by a discrete approximation. The sampling period is selected according to considerations of stability, bandwidth and the performance requirements of each particular application. The above sampling law, together with five other updating sampling laws listed below, is comparatively tested in the simulations to evaluate the improvements that the sampling rate adaptation brings to the adaptation transient over the basic free-parameter adaptation. The supervisory technique can also be applied to the forgetting factor by making it time-varying with $\lambda_k \in (0, 1]$ to ensure closed-loop stability of the adaptive scheme.
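The extrapolation idea of subsection D can be sketched as follows. The example assumes a second-order Taylor truncation with backward differences, which is only one possible choice since the truncation order is not fixed here; the function names are hypothetical.

```python
def predict_next(f_k, f_km1, f_km2):
    """Second-order Taylor extrapolation of a sampled signal to the next sample.

    With backward differences f' ~ (f_k - f_km1) / T and
    f'' ~ (f_k - 2 f_km1 + f_km2) / T**2, the sampling period T cancels in
    f_k + T f' + (T**2 / 2) f''.
    """
    d1 = f_k - f_km1
    d2 = f_k - 2.0 * f_km1 + f_km2
    return f_k + d1 + 0.5 * d2

def predict_horizon(history, steps):
    """Predict `steps` future samples, feeding earlier predictions back in."""
    window = list(history[-3:])          # needs at least three past samples
    predictions = []
    for _ in range(steps):
        nxt = predict_next(window[-1], window[-2], window[-3])
        predictions.append(nxt)
        window.append(nxt)               # predictions serve as data for later steps
    return predictions

ramp_prediction = predict_horizon([1.0, 2.0, 3.0], 3)
parabola_prediction = predict_next(4.0, 1.0, 0.0)   # samples of t**2 at t = 0, 1, 2
```

A ramp is extrapolated exactly, whereas for the parabola the backward differences give 8.0 instead of the true value 9.0, which shows the kind of approximation error that grows along the prediction horizon.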
A useful technique is to modify the Supervision Algorithm of Section 4.2 to estimate the forgetting factor on-line. The forgetting factor has to belong to an admissibility domain defined through some constant $\delta \in (0, 1)$. The algorithm is used with the change $c \to \lambda > 0$, and Step 2 is modified with the replacement $c_k \to \lambda_k = \rho_k \lambda_{k-1} + \lambda$, with $\rho_k$ being computed as above. Subsequently, the free parameter $c_k$ of the adaptive algorithm is chosen so that $T_r \geq \mathrm{Trace}(F_0) \geq \mathrm{Trace}(F_k) > 0$. The trace of the adaptation matrix thus remains upper-bounded by a prefixed finite bound $T_r$ for all time, in spite of the fact that the adaptation gain matrix is not necessarily time-decreasing.
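The second supervisory action, the on-line adjustment of the sampling period within an interval centered at its nominal value, can be illustrated with a simple rule driven by the tracking error rate. The law below is purely hypothetical and is not one of the sampling laws tested in the paper; the names `T_nom`, `delta` and `gain` are assumptions introduced for the example.

```python
def next_sampling_period(T_nom, delta, e_now, e_prev, T_prev, gain=1.0):
    """Hypothetical sampling law: sample faster when the error varies quickly.

    The period is reduced as the magnitude of the finite-difference error
    rate grows and is always saturated to [T_nom - delta, T_nom + delta].
    """
    rate = abs(e_now - e_prev) / T_prev          # discrete estimate of |de/dt|
    T = (T_nom + delta) / (1.0 + gain * rate)
    return min(max(T, T_nom - delta), T_nom + delta)

# Constant error: the slowest admissible sampling is used.
T_quiet = next_sampling_period(0.01, 0.005, e_now=0.2, e_prev=0.2, T_prev=0.01)
# Rapidly varying error: the period saturates at the lower bound.
T_busy = next_sampling_period(0.01, 0.005, e_now=1.2, e_prev=0.2, T_prev=0.01)
```

The bounds of the interval would in practice come from a priori knowledge on stability and bandwidth, as discussed above for the admissibility domain of the sampling period.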

F. Closed-loop stability:
The following result, whose proof is omitted, establishes that both the basic (supervision-free) system and the supervised ones are stable.

Theorem 1 (Stability results):
The following two items hold:
i. In the absence of supervision, the estimated parameters are bounded if their initial conditions are bounded and the initial adaptation covariance matrix is positive definite. Also, the closed-loop system is globally Lyapunov stable, so that the output, input, estimation error and tracking error are all bounded provided that the reference trajectory is bounded.
ii. If the free parameter $c_k$ of the algorithm (or, alternatively, the forgetting factor) is supervised by the given rule while respecting its positivity and boundedness (respectively, while belonging to the range (0, 1]), then (i) holds. If the sampling period is supervised (with the free parameter being supervised or not) during a finite time interval within its admissibility domain, then (i) still holds.

Moreover, we assume that the center of mass of link 3 is located at the proximal end of the link, that is, it coincides with the center of mass of $m_2$. The simplified planar manipulator with three degrees of freedom can reach arbitrary positions and orientations in the plane. The elements of the dynamic equation (1) [...] $(q_1, q_2, q_3) = (10°, -50°, -20°)$ in 0.5 seconds, following smooth cubic trajectories defined by:

$q_1(t) = 10 + 0.12\,t^2 - 0.016\,t^3$, $t \leq 0.5$; $q_1(0) = 10$, $t \geq 0.5$
$q_2(t) = -50 - 0.12\,t^2 + 0.016\,t^3$, $t \leq 0.5$; $q_2(0) = -50$, $t \geq 0.5$
$q_3(t) = -20 + 0.72\,t^2 - 0.096\,t^3$, $t \leq 0.5$; $q_3(0) = -20$, $t \geq 0.5$

It is assumed that the link masses $m_1$, $m_2$, $m_3$, as well as the mass distribution $I_{zz}$ and the viscous friction coefficient $v_1$ of the manipulator, are unknown. A two-layered neural network with twelve inputs and three outputs will be used to learn the robot's inverse dynamics and to generate the control signal on-line. In particular, the values

D. Summary of the performance efficiency:
Exhaustively worked examples led to the following conclusions. The supervision of the free-design parameter and the use of an adaptive sampling law that depends on the time variation of the tracking error both succeed in improving the transient performance. It is important to select the bounds for the sampling period properly, according to the stability, bandwidth and application requirements derived from a priori knowledge of the system.