Hybrid Hot Strip Rolling Force Prediction using a Bayesian Trained Artificial Neural Network and Analytical Models

The authors discuss the combination of an Artificial Neural Network (ANN) with analytical models to improve the prediction of finishing rolling force in the hot strip rolling mill process. The suggested model was trained using a Bayesian Evidence based algorithm. It was found that the Bayesian Evidence based approach provided a superior and smoother fit to the real rolling mill data. A completely independent set of real rolling data was used to evaluate the capacity of the fitted ANN model to predict unseen regions of the data. Test rolls obtained with the suggested hybrid model showed high prediction quality compared with the usual empirical prediction models.

distribution) testing data, whereas a poorly generalizing network would give very low errors on the training set and relatively poor performance on the test data. Such a network is said to be over-fitted. Generalization can be understood by considering the bias and variance of the network model [6]. There is a trade-off between bias and variance: a function too closely fitted to the training data (a large network with many parameters) will have a large variance and hence generalize poorly to new data. Smoothing the function (a simpler network with fewer parameters) improves the generalization, but if taken too far it yields a model with insufficient complexity to model the data, giving a high bias and hence a large error.
The key to good classifier performance is to find the network with the best generalization performance to new data. This requires careful consideration of the quantity and quality of the training data (it should be statistically well sampled from the generative distribution) and its relation to the overall size of the network [7]. The problem is likely to be exacerbated for large networks (with many independent parameters) and limited amounts of training data [8].
An often cited empirical requirement for classification problems is that the minimum number of training patterns $N_{MIN}$ required for good generalization performance is given by [5,6]:

$$N_{MIN} \approx \frac{W}{\epsilon}$$

where $W$ is the total number of independent network weights and biases and $\epsilon$ is the fraction of classification errors permitted on the test data.
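As an illustration of this rule of thumb, the sketch below counts the adjustable parameters of a single-hidden-layer network and applies the commonly quoted form N_MIN ≈ W/ε, which is assumed here. The network sizes, the error fraction and the function names are hypothetical and chosen only for illustration.

```python
def count_parameters(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Weights and biases of a fully connected single-hidden-layer network."""
    hidden_layer = (n_inputs + 1) * n_hidden    # +1 accounts for each hidden bias
    output_layer = (n_hidden + 1) * n_outputs   # +1 accounts for each output bias
    return hidden_layer + output_layer


def min_training_patterns(n_parameters: int, error_fraction: float) -> float:
    """Rule-of-thumb estimate N_MIN ~ W / epsilon (assumed form)."""
    if not 0.0 < error_fraction <= 1.0:
        raise ValueError("error_fraction must lie in (0, 1]")
    return n_parameters / error_fraction


# Hypothetical network: 6 inputs, 8 hidden nodes, 1 output, 10 % permitted error.
W = count_parameters(n_inputs=6, n_hidden=8, n_outputs=1)
N_MIN = min_training_patterns(W, error_fraction=0.1)
print(W, N_MIN)
```

The estimate grows linearly with the number of parameters, which is why small networks are attractive when training data are limited.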
In practice, early stopping or cross-validation with an independent validation data set is used as the termination or selection criterion for network training [6], with the final network performance evaluated on a third, independent test data set. Good generalization performance is an indicator that the information capacity of the network (reflected in the number of weights) is of the same order as, or smaller than, the total information content of the training set. The rationale is that a sufficiently complex network can memorize all the features of the data, including noise. For the network to generalise, it must store only the important features of the training data. That said, there is also evidence to suggest that generalisation performance depends more on the size of the weights than on their number.
Recent years have seen the development of sophisticated methods of addressing the over-fitting problem through regularisation techniques. A simple way to implement regularization is to add a weight decay term to the error function [9]. A more sophisticated basis for regularization can be found in Bayesian Evidence update techniques for network training, which frame the optimisation problem rather differently.
Such an approach is useful because network weight regularization falls naturally into the framework. In addition, it is possible to estimate confidence bounds on the output predictions based on the widths of the posterior probability functions for the weight matrix [10].
In the current work the authors seek to use a Bayesian implementation of ANN modelling and physical prior knowledge to generate an efficient hybrid hot rolling force prediction model.

Hot strip rolling mill description: A simplified schematic diagram of a steel rolling mill for the production of coil plate is presented in Fig. 1. It shows the transformation stages of slabs from entry at the reheat furnace to their exit at the coiler at the end of the mill.
The feedstock for the rolling mill consists of slabs produced by the continuous casting process in a steel plant (1). These are normally supplied at ambient temperature. The purpose of the reheat furnace (2) is to raise the temperature of the whole slab to around 1250 °C (re-crystallization temperature).
On exit from the reheat furnace, there is a build-up of scale on the surface of the slab, due to oxidation, which is detrimental to surface quality. This is removed within the de-scaling box (3), which consists of jets of high pressure water (140 bar).
After the de-scaling stage, the roughing mill (4) produces a breakdown bar (the product between the roughing mill and the finishing mill) by rolling the slab through a series of forward and reverse passes, typically reducing the slab thickness from 200 to 30 mm. The finishing mill (5) is designed to reduce the gauge (thickness) of the breakdown bar to that of the finished coil, while maintaining the desired width. The finishing mill control system is critical, as constant mass flow must be maintained in all stands to ensure continuous production [11]. On exit from the finishing mill, the rolled strip is still at an elevated temperature, typically above 800 °C, which is above the phase transformation temperature. Critical quality parameters of the finished coil, such as the mechanical and other metallurgical properties, are significantly affected by the cooling process applied on the run-out table (6). On exit from the mill and run-out table cooling system, the hot strip typically has a velocity of up to 15 m/s and can be hundreds of metres in length. The down coiler (7) allows the strip to be converted into a coil of dimensions that can be easily transported. The main characteristics of the considered hot strip rolling mill are given in Table 1 [12].
Hot strip rolling force empirical model: For a given strip temperature, steel grade and rolling speed, the actual separating rolling force can be calculated by the model of Alexander-Ford [13][14][15]:

$$F_a = b \, k_m \, \sqrt{R \, (h_e - h_a)} \; Q_p \qquad (2)$$

where $b$ is the plate width, $R$ is the work roll radius, $k_m$ is the mean constrained flow stress in the roll bite for plane strain conditions (strain resistance), the square-root term is the contact length, $h_e$ is the incoming thickness, $h_a$ is the exit thickness and $Q_p$ is a geometric factor which is strongly affected by the geometry of the roll bite and the interface conditions between the rolls and the rolled strip. The term $k_m$ can be calculated as a function of a set of hot rolling parameters, such as plate temperature $T$, viscous friction coefficient $\mu$ and rolling speed $v$.
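The analytical force model described above can be sketched numerically as follows, assuming the usual product form F = b · k_m · sqrt(R (h_e − h_a)) · Q_p in SI units. The function name and all stand data below are hypothetical and are not taken from the mill considered in this work.

```python
import math


def rolling_force(width, roll_radius, h_entry, h_exit, k_m, q_p):
    """Separating force F = b * k_m * sqrt(R * (h_e - h_a)) * Q_p (assumed form).

    Lengths in metres, k_m in pascals, q_p dimensionless; result in newtons.
    """
    if h_exit >= h_entry:
        raise ValueError("exit thickness must be smaller than entry thickness")
    contact_length = math.sqrt(roll_radius * (h_entry - h_exit))
    return width * k_m * contact_length * q_p


# Hypothetical finishing-stand values, for illustration only.
force = rolling_force(width=1.2, roll_radius=0.35, h_entry=0.004,
                      h_exit=0.003, k_m=200e6, q_p=1.5)
print(f"{force / 1e6:.2f} MN")
```

In such a sketch the uncertainty sits almost entirely in k_m and Q_p, which is the part the hybrid approach aims to improve.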
In practice, there is no accurate physical model that describes the relationship between these hot rolling parameters. They are generally determined by empirical regressions. Many authors have proposed different equations for the calculation of the geometric factor $Q_p$, based on their own experimental rolling data.
An analogous situation holds for the hot flow strength $k_m$, for which many formulas permit its evaluation from values of temperature, strain and strain rate [16]. To improve the accuracy of the hot rolling force model, we suggest combining feed-forward neural networks with empirical knowledge, which gives a hybrid prediction model.

Hybrid ANN prediction model topology: As shown in Fig. 2, the chosen network structure for the regression modelling was a multilayer perceptron (MLP). The reason for this choice was that MLPs have been shown to be universal approximators [19]. The MLP network implementation and training was undertaken in MATLAB™.
The data was presented to a series of MLP networks with a variable number of hidden nodes arranged in a single hidden layer.
Each network had 6 input nodes corresponding to the variables used in equation 2 and a single output node corresponding to the value of the predicted rolling force $F_a$.
The empirical model given in equation (3) is coupled to the structure of the ANN force prediction model to evaluate the value of the strain resistance $k_m$ in terms of strip temperature and other metallurgical parameters, which gives a hybrid hot rolling force prediction model. The output rolling force $F_a$ from the second (hidden) layer was given by:

$$F_a = \sum_{j=1}^{M} w_{kj} \tanh\left( \sum_{i=1}^{d} w_{ji} x_i + b_j \right) + b_k$$

where $x_i$ are the input values, $w_{ji}$ is the weight matrix of the first layer, $w_{kj}$ the weight matrix of the second layer, $b_j$ the bias vector of the first layer, $b_k$ the bias vector of the second layer, $d$ the number of input nodes and $M$ the number of hidden nodes. The non-linear capability of the network was implemented using the tanh transfer function between the first and second layers. Bayesian Evidence based training is investigated.
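The forward pass described above can be sketched in a few lines, assuming a single output node, tanh hidden units and a linear output. The function name, the tiny network size and all weight values are hypothetical; the original implementation was in MATLAB, so this is only an illustrative equivalent.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Single-output MLP with one tanh hidden layer and a linear output.

    w_hidden[j][i] connects input i to hidden node j; w_out[j] connects
    hidden node j to the output.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
        for row, bj in zip(w_hidden, b_hidden)
    ]
    return sum(wk * zj for wk, zj in zip(w_out, hidden)) + b_out


# Hypothetical network with d = 2 inputs and M = 3 hidden nodes.
x = [0.5, -0.2]
w_hidden = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.6]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.7, -0.2, 0.3]
b_out = 0.05
y = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)
```

With all hidden weights and biases set to zero the output collapses to the output bias, which is a convenient sanity check for such an implementation.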
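When implementing Bayesian Evidence training [10], the error function was given by:

$$S(w) = \beta E_D + \alpha E_W = \frac{\beta}{2} \sum_{n=1}^{N} (y_n - t_n)^2 + \frac{\alpha}{2} \sum_{i=1}^{W} w_i^2$$

where $\alpha$ is the weight decay hyper-parameter and $\beta$ is the inverse noise variance. If interest lies only in minimizing the error for a particular weight vector, then the effective value of the regularization parameter depends only on the ratio $\alpha / \beta$.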
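A minimal sketch of such a regularised error function is given below, assuming the standard evidence-framework form with a sum-of-squares data term scaled by β and a quadratic weight penalty scaled by α. The function name and the sample outputs, targets and weights are hypothetical; the values 0.01 and 100 are the initial hyper-parameter settings quoted in the text.

```python
def regularised_error(outputs, targets, weights, alpha, beta):
    """S(w) = beta * E_D + alpha * E_W with quadratic data and weight terms."""
    e_d = 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))
    e_w = 0.5 * sum(w ** 2 for w in weights)
    return beta * e_d + alpha * e_w


# Hypothetical outputs, targets and weights.
s = regularised_error(outputs=[0.52, 0.48], targets=[0.50, 0.50],
                      weights=[0.3, -0.4], alpha=0.01, beta=100.0)

# Scaling alpha and beta together only rescales S, so the minimising
# weights depend on the ratio alpha / beta alone.
s_scaled = regularised_error(outputs=[0.52, 0.48], targets=[0.50, 0.50],
                             weights=[0.3, -0.4], alpha=0.02, beta=200.0)
print(s, s_scaled)
```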
Besides accommodating regularization in a consistent framework, the Bayesian approach has the additional advantage of providing a mechanism to generate confidence bounds on the output prediction values.
Assuming that the posterior distribution of the weights is Gaussian, it is possible to find the variance corresponding to the mean output $y(x, w_{MP})$, i.e. the standard prediction output for the most probable weights. This variance is given by [6]:

$$\sigma^2 = \frac{1}{\beta} + g^T A^{-1} g$$

where $1/\beta$ is the noise variance of the target data, $A$ is the Hessian matrix of second derivatives of the error function and $g$ is the gradient of the network output with respect to the weights, evaluated at $w_{MP}$.
The standard deviation $\sigma$ of the predictive distribution can be interpreted as an error bar on the mean value $y_{MP}$. It has two contributions: the first arises from the intrinsic noise in the target data, the second from the posterior distribution of the network weights. The ease of implementation of these powerful network training paradigms was a major consideration in employing the NETLAB toolbox to realize the network training [10].
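The two contributions to the error bar can be made concrete with a small sketch. It assumes the standard Gaussian-posterior result, variance = 1/β + gᵀA⁻¹g, where g is taken here to be the gradient of the network output with respect to the weights at the most probable weights. The helper names and the two-weight example are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def predictive_std(g, hessian, beta):
    """Error bar sqrt(1/beta + g^T A^{-1} g) under a Gaussian weight posterior."""
    a_inv_g = solve(hessian, g)
    weight_term = sum(gi * vi for gi, vi in zip(g, a_inv_g))
    return math.sqrt(1.0 / beta + weight_term)


# Hypothetical two-weight example.
sigma = predictive_std(g=[0.2, -0.1],
                       hessian=[[50.0, 5.0], [5.0, 40.0]],
                       beta=100.0)
print(sigma)
```

The first term is the same for every input, while the second grows in regions where the output is sensitive to poorly determined weights, so the error bars widen away from the training data.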
For the Bayesian Evidence training an initial value of $\alpha = 0.01$ was employed along with an initial inverse noise variance parameter $\beta = 100$.
During the Evidence update procedure of the network training, these hyper-parameters were re-evaluated iteratively.
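One iteration of such a hyper-parameter update can be sketched with the standard re-estimation formulas of the evidence framework. Whether the toolbox follows exactly this form is an assumption here; the function name, the eigenvalues and the error values are hypothetical.

```python
def update_hyperparameters(eigenvalues, e_w, e_d, n_samples, alpha):
    """One evidence-framework re-estimation step for the two hyper-parameters.

    eigenvalues: eigenvalues of the data-term Hessian (already scaled by beta).
    e_w, e_d:    weight and data error terms at the current most probable weights.
    """
    # gamma is the effective number of well-determined parameters.
    gamma = sum(lam / (lam + alpha) for lam in eigenvalues)
    new_alpha = gamma / (2.0 * e_w)
    new_beta = (n_samples - gamma) / (2.0 * e_d)
    return new_alpha, new_beta, gamma


# Hypothetical values for a three-weight model trained on 100 samples.
new_alpha, new_beta, gamma = update_hyperparameters(
    eigenvalues=[10.0, 1.0, 0.01], e_w=0.5, e_d=2.0, n_samples=100, alpha=0.01)
print(new_alpha, new_beta, gamma)
```

In practice the weights are re-optimised after each such update and the cycle is repeated until the hyper-parameters settle.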
Data acquisition and pre-processing: Measurements of several finishing rolling mill variables were recorded for each coil rolled over a five-day production period using IbaAnalyser© (embedded real-time logging software). For our study, the manufacture of a single grade of steel coils (with constant width and exit thickness varying from 1.2 mm to 4.00 mm) was considered.
The patterns were grouped into training, validation and test sets. The output values of $F_a$ ranged from 0 to 1.
In order to utilise the full dynamic range of the tanh transfer function of the ANNs, the input values were normalised to lie in the range -1 to +1 before presentation to the networks. Note that all results presented in this paper are plotted as the original (un-normalised) data.
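The input normalisation described above can be sketched as a linear min-max mapping onto [-1, +1], together with the inverse mapping used to return to physical units. Min-max scaling is an assumption here, since the exact scaling procedure is not stated; the function names and the temperature values are hypothetical.

```python
def scale_to_unit_range(values):
    """Map values linearly so that min -> -1 and max -> +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant variable")
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
    return scaled, (lo, hi)


def unscale(scaled, lo, hi):
    """Invert scale_to_unit_range using the stored minimum and maximum."""
    return [(s + 1.0) * (hi - lo) / 2.0 + lo for s in scaled]


# Hypothetical strip temperatures in degrees Celsius.
temperatures = [850.0, 900.0, 950.0]
scaled, (lo, hi) = scale_to_unit_range(temperatures)
restored = unscale(scaled, lo, hi)
print(scaled, restored)
```

The minimum and maximum should be taken from the training set only and reused for the validation and test sets, so that no information leaks from the held-out data.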
A sample of the most significant input/output variables used in our study for one rolling coil is shown in the curves of Fig. 3.

RESULTS
The Bayesian Evidence training algorithm was implemented in MATLAB using the NETLAB toolbox. The performance of the network was evaluated by calculating the mean square error of the network predictions with respect to the true target values, normalised by the variance of the targets:

$$E = \frac{1}{N \sigma_t^2} \sum_{n=1}^{N} (y_n - t_n)^2$$

where $y_n$ are the network outputs, $t_n$ the target values, $N$ the number of samples and $\sigma_t^2$ the variance of the target data.
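A minimal sketch of this error measure is given below. It assumes that the mean square error is divided by the variance of the targets, as the listed symbols suggest, and that the population variance (division by N) is used; both are assumptions, and the function name and data are hypothetical.

```python
def normalised_mse(outputs, targets):
    """Mean square error divided by the (population) variance of the targets."""
    n = len(targets)
    mean_t = sum(targets) / n
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    mse = sum((y - t) ** 2 for y, t in zip(outputs, targets)) / n
    return mse / var_t


# Hypothetical targets; predicting their mean everywhere gives an error of 1.
targets = [1.0, 2.0, 3.0]
baseline = normalised_mse([2.0, 2.0, 2.0], targets)
perfect = normalised_mse(targets, targets)
print(baseline, perfect)
```

With this normalisation a value well below 1 indicates that the model explains most of the variation in the measured force.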
For each of the network structures with 1 to 15 hidden nodes, the error was calculated for the training, validation and test data sets. For Bayesian Evidence training, the network with the best generalization of prediction had 8 hidden nodes. The performance of the selected hybrid ANN model was tested using measured rolling forces at the last stand of the finishing mill from 422 coils. Fig. 4 shows that the hybrid ANN model predicts the rolling force more accurately than the analytical model given by equation 2.
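The selection over hidden-node counts can be sketched as choosing the structure with the lowest validation error. The selection criterion is assumed here, and the error values below are hypothetical placeholders, not results from this study.

```python
def select_hidden_nodes(validation_errors):
    """Return the hidden-node count with the lowest validation error."""
    return min(validation_errors, key=validation_errors.get)


# Hypothetical validation errors keyed by number of hidden nodes.
validation_errors = {4: 0.21, 6: 0.17, 8: 0.12, 10: 0.14, 12: 0.16}
best = select_hidden_nodes(validation_errors)
print(best)
```

The test set is kept out of this selection step so that the reported test error remains an unbiased estimate of performance on unseen coils.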

CONCLUSION
The efficiency of a hot strip mill can be increased if the amount of rejected material is reduced. A strip is considered as rejected material if it does not meet the requirements of the customer and thus has to be sold as lower quality or has to be re-melted. This last option implies a tremendous amount of extra materials handling and energy costs.
One way to improve the efficiency of the hot rolling mill is the use of a better finishing set-up model for hot rolling force prediction.
A Bayesian training method was implemented to construct a series of hybrid ANN structures that model hot rolling force from real input/output data and empirical expressions. It has been demonstrated that the suggested approach produced a superior fit to the data.
This was experimentally verified with a completely independent set of data. It was noted that the Bayesian training algorithm tended to produce smoother overall fitting functions to the training data. A smoothly varying model output is an important characteristic for raising confidence in the hot rolling force prediction model. Future work will aim to extend the number of input parameters modelled to allow for steels of different carbon content.