Multi-Phase Support Vector Regression Soft Sensor for Online Product Quality Prediction in Glutamate Fermentation Process

Corresponding Author: Rongjian Zheng, Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, China. E-mail: zhengrjian@163.com

Abstract: Glutamate fermentation is an inherently nonlinear, multi-phase, aerobic fermentation process. Because of long measurement delays and expensive apparatus, on-line measurement of the product concentration is often unavailable. Current fermentation process monitoring and quality prediction rely on manual interpretation of highly informative process measurements, whereas the concentrations of substrates, biomass and products are available only as low-frequency off-line measurements. In this paper, we propose a novel Multi-Phase Support Vector Regression (MPSVR) based soft sensor model for online quality prediction of glutamate concentration. The glutamate fermentation process is divided into a sequence of five phases by detecting the trend variation events (also termed singular points or inflection points) of the online measured O2 in the exhaust gas. The Inflection Points (IP) are easily identified by combining a Moving Window (MW) with the Pearson Correlation Coefficient (PCC). For each estimation phase, an SVR soft sensor model is constructed and its performance is evaluated against fermentation data from a 5 L fermenter. The proposed soft sensor model is shown to outperform previously reported techniques for online product quality prediction in a 5 L glutamate fermentation process.


Introduction
Glutamate is commercially one of the most important amino acids and is produced mainly by fermentation; its annual fermentative production exceeds 2.2 million tons (Xiao et al., 2006; Khan et al., 2005). Glutamate is widely used in human and animal nutrition and as an ingredient of pharmaceutical products, agrochemicals and other industrial derivatives (Pal et al., 2016).
Like other fed-batch processes, the glutamate fermentation process requires sophisticated operator involvement. Together with run-to-run variability, this can lead to abnormal situations in which any deviation from the desired operating regime changes the product quality. This provides a strong incentive for automating operation supervision. During manual operation, a human operator is commonly responsible for providing set points to the regulatory controllers, performing control actions, supervising the process and taking remedial measures when an abnormal condition is detected (Muthuswamy and Srinivasan, 2003). Batch-to-batch changes may be rooted in variations in the raw material quality or in the seed culture. Typically, product quality and batch performance are controlled and monitored through offline laboratory assays of the concentrations of product, substrates and biomass, which may take up to 2 h. These laboratory assays involve high investment costs and manpower, are time consuming and are obtained at low frequencies; hence, they may not provide timely information about the fermentation status of the batch. Online measurements that are easily acquired include temperature, pH, dissolved oxygen, agitation speed and exhaust CO2 and O2, but these measurements do not directly show the state of the process (Doan et al., 2007). Consequently, quality control of the product is delayed, since during this period the fermentation process runs without precise and continual information on product quality (Ge et al., 2011). Moreover, because of technical difficulty, high investment costs and large measurement delays, laboratory assay apparatus is of limited use in practical plants. Furthermore, it is well known that developing a first-principles model that accurately describes the fermentation process requires very significant effort.
Hence, in biochemical plants, soft sensors are widely used to estimate primary quality variables that are difficult to measure online. An inferential model is constructed between the objective variable, which is difficult to measure online, and process variables that are easy to measure online (Facco et al., 2009; Kadlec et al., 2009; Kaneko et al., 2009). To date, many soft sensor methods have been presented for quality prediction, including Artificial Neural Networks (ANN), Partial Least Squares (PLS) and Support Vector Machines (SVM) (Acuña et al., 2014; Facco et al., 2009; Wang et al., 2014). Recently, SVR, an extension of SVM, has received increasing attention for solving nonlinear estimation problems. It has been applied successfully to a variety of time series prediction problems (Kavousi-Fard et al., 2014; Lu et al., 2009; Santamaria-Bonfil et al., 2016; Were et al., 2015).
Besides the inherently nonlinear behavior, it is well known that biomass growth undergoes a series of phases in a fermentation process: lag phase, exponential phase, stationary phase and decline phase (Khan et al., 2005). The metabolism differs from stage to stage and each stage may have its own characteristics, so a single model cannot fully capture the dynamic characteristics of the fermentation process. A straightforward approach is to divide the fermentation process into different operation phases on the basis of changes in the variable cross-correlations and to model each stage separately. Furthermore, minimizing off-line sampling is desirable because of the concomitant risk of contamination, while enough information on product formation and nutrient uptake must still be obtained on-line. As a result, on-line identification of the phases in a fermentation process is of critical importance, and phase partition is a crucial step before multi-phase modeling. Without a proper phase division, the effectiveness of a multi-phase model is questionable (Doan et al., 2007; Sun et al., 2011; Yao and Gao, 2009; Luo et al., 2016).
In recent years, many phase identification methods have been developed based on online analytical measurement of important bioprocess parameters, such as the biomass concentration or the broth composition, using ion chromatography, Near Infrared (NIR) spectroscopy and HPLC. These systems can provide analysis of product concentration, nutrient composition and other metabolites (Alford, 2006). However, they suffer from the aforementioned disadvantages. Another class of approaches uses routinely available online data to identify fermentation phases qualitatively; the most common methods in this class are based on process knowledge, process analysis or process data. A formal framework for inferring process trends from online variables was developed (Cheung and Stephanopoulos, 1990) and applied to fermentation process data (Stephanopoulos et al., 1997; Doan et al., 2007). Another method detects phase changes through singular point detection based only on online measurements (Maiti et al., 2009; Régis et al., 2008). Further multi-phase analysis methods, covering different kinds of phase identification, can be found in Yao and Gao (2009), Camacho et al. (2008), Doan et al. (2007) and Luo et al. (2016). Knowledge-based phase identification fails when the prior process knowledge is insufficient to divide the process into phases reliably, and it is difficult to customise for diverse fermentation processes. Process-analysis-based phase identification works well when certain required process features are known. Finally, process-data-based methods carry out phase partition by detecting variation in the process data. Compared with the other two approaches, data-driven methods are easier to apply because they require only process data. However, the phase partition results they produce are not always consistent with the actual operation phases (Sun et al., 2011; Luo et al., 2016).
In this paper, a novel phase partition method and a Multi-Phase SVR (MPSVR) modeling strategy are presented for online estimation and prediction of glutamate concentration. The glutamate fermentation process goes through a number of phases determined by successive cell growth, substrate uptake and product formation. In addition, the production of glutamate is an aerobic process: the fermentation performance and the metabolic flux distribution are affected drastically by the dissolved oxygen concentration in the liquid phase of the fermenter and by the oxygen concentration in the exhaust gas (Golobič and Gjerkeš, 1999; Xiao et al., 2006). The fed-batch process can therefore be divided into 5 phases by detecting the Inflection Points (IP) of the online measured O2 in the exhaust gas. The IPs are easily identified by combining a Moving Window (MW) with the Pearson Correlation Coefficient (PCC). The phase division result agrees well with the actual fermentation process, and because it depends only on on-line measurements, it can easily be automated. Then, for each estimation phase, an SVR soft sensor model is designed for online prediction of glutamate concentration and its performance is evaluated against glutamate fermentation data from a 5 L fermenter. A comparison with a Neural Network (NN) based prediction approach from the literature is also presented.

Experimental Methods
C. glutamicum S9114, kept by the laboratory of industrial biotechnology, Jiangnan University, was used in the present study. The fermentation conditions and the medium compositions were the same as those previously reported (Zhang et al., 2005; Xiao et al., 2006; Ding et al., 2012; Cao et al., 2013; Zheng and Pan, 2016).
C. glutamicum S9114 was cultured for glutamate production in a 5 L bioreactor. The pH was controlled at 7.0-7.2 by feeding 25% (v/v) ammonia water. The O2 and CO2 concentrations in the exhaust gas were measured on-line by a gas analyzer (LKM2000, Lokas Co., Korea), and the O2 Uptake Rate (OUR) and CO2 Evolution Rate (CER) were computed accordingly. Dissolved Oxygen (DO) was controlled at various levels by automatically or manually adjusting the agitation speed. Temperature was controlled at about 32°C. Electronic balances connected to a PC via RS232 were used to compute the glucose and ammonia consumption rates (Zhang et al., 2005; Xiao et al., 2006; Ding et al., 2012; Cao et al., 2013; Zheng and Pan, 2016).

Support Vector Regression
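SVR seeks a nonlinear mapping function that maps the training data $\{(x_i, y_i);\ i = 1, \ldots, n\}$ to a high-dimensional feature space (Kavousi-Fard et al., 2014). The nonlinear relation can then be represented as follows:

$$f(x) = w^{T}\phi(x) + b \qquad (1)$$

where $w$ and $b$ are the coefficients to be adjusted and $\phi(x)$ denotes the mapping function into the feature space. The empirical risk is defined as:

$$R_{emp}(f) = \frac{1}{n}\sum_{i=1}^{n}\Theta_{\varepsilon}\big(y_i, f(x_i)\big)$$

where $\Theta_{\varepsilon}$ denotes the ε-insensitive loss function, described as follows:

$$\Theta_{\varepsilon}\big(y, f(x)\big) = \begin{cases} |f(x) - y| - \varepsilon, & |f(x) - y| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases}$$

An optimal hyperplane can then be obtained by utilizing this function. With the help of this hyperplane, the training data are divided into two linearly separable subsets with maximum separation distance. SVR is thus an optimization problem with the objective function:

$$\min_{w,\, b,\, \xi,\, \xi^{*}} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)$$

where $C$ is the regularization parameter and $\xi_i$, $\xi_i^{*}$ are slack variables. The constraints of this optimization problem are as follows:

$$\begin{aligned} y_i - w^{T}\phi(x_i) - b &\le \varepsilon + \xi_i, & i &= 1, \ldots, n \\ w^{T}\phi(x_i) + b - y_i &\le \varepsilon + \xi_i^{*}, & i &= 1, \ldots, n \\ \xi_i,\ \xi_i^{*} &\ge 0, & i &= 1, \ldots, n \end{aligned}$$

By solving the above optimization problem, the coefficient $w$ of Equation (1) is obtained as:

$$w = \sum_{i=1}^{n}\left(\beta_i^{*} - \beta_i\right)\phi(x_i)$$

where $\beta_i$ and $\beta_i^{*}$ are the Lagrangian coefficients. The SVR regression function can then be described as:

$$f(x) = \sum_{i=1}^{n}\left(\beta_i^{*} - \beta_i\right)K(x_i, x) + b$$

where $K(x_i, x)$ denotes the kernel function, which in the feature space is given by:

$$K(x_i, x) = \phi(x_i)^{T}\phi(x)$$

In this study, the RBF kernel function is utilized, which can be expressed as:

$$K(x_i, x) = \exp\left(-\frac{\|x_i - x\|^{2}}{2\sigma^{2}}\right)$$

where $\sigma$ denotes the width of the RBF.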
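The building blocks described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, the RBF kernel is assumed to use the common exp(-||x - z||^2 / (2σ^2)) convention, and the coefficients passed to `svr_predict` are assumed to come from an already solved optimization problem.

```python
import math


def rbf_kernel(x, z, sigma):
    """RBF kernel between two feature vectors (assumed 2*sigma^2 convention)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def eps_insensitive_loss(y_true, y_pred, eps):
    """Epsilon-insensitive loss: zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, coeffs, bias, sigma):
    """Evaluate f(x) = sum_i coeffs[i] * K(x_i, x) + bias.

    coeffs[i] plays the role of (beta_i^* - beta_i); the values are assumed
    to be supplied by a solver and are not computed here.
    """
    return bias + sum(
        c * rbf_kernel(sv, x, sigma) for sv, c in zip(support_vectors, coeffs)
    )
```

Only training points with a nonzero coefficient (the support vectors) contribute to the prediction, which is why the sum runs over `support_vectors` and not over the whole training set.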

Phase Partition Technique
The statistical model used in this approach is $Y_t = a + bt + \varepsilon_t$, where $1 \le t \le n$, so that $t$ represents the time with the initial time taken as minute 1; $b$ is the slope, which indicates the current variation trend of O2 with fermentation time $t$; $a$ is the intercept; and $\varepsilon_t$ are random errors. The errors are assumed to be independent and identically distributed. For the online measured O2 data, this assumption may not be valid. However, it is more likely to be satisfied if time averages are used as the response variable (Li et al., 2013; Hess et al., 2001).
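After taking time averages, the regression coefficients are estimated by the least squares method. The estimates of the slope and intercept are given by:

$$\hat{b} = \frac{\sum_{i=1}^{n}(t_i - \bar{t})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(t_i - \bar{t})^{2}}, \qquad \hat{a} = \bar{Y} - \hat{b}\,\bar{t}$$

where $\bar{t}$ and $\bar{Y}$ are the arithmetic means of $t_i$ and $Y_i$.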
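As a small worked illustration of the least squares estimates, the following sketch fits the intercept and slope to a short series. The function name `fit_line` and the sample values are hypothetical and chosen only so that the result is easy to check by hand.

```python
def fit_line(t, y):
    """Ordinary least squares fit of y = a + b*t; returns (a, b)."""
    n = len(t)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired samples")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of t and of cross deviations of t and y.
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    if s_tt == 0:
        raise ValueError("all time values are identical")
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = s_ty / s_tt
    a = y_mean - b * t_mean
    return a, b


# Hypothetical time-averaged O2 readings falling by 0.5 per time step.
a, b = fit_line([1, 2, 3, 4], [20.5, 20.0, 19.5, 19.0])
```

A negative slope `b` indicates that the O2 concentration in the exhaust gas is currently decreasing over the fitted window.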
Since glutamate fermentation is a continuous process, $b$ changes continuously and smoothly relative to $Y$ (O2). We therefore employ the Pearson Correlation Coefficient (PCC) to identify the Inflection Points (IP) of the variation trend $b$. The PCC was developed by Karl Pearson from a related idea introduced by Galton in the late 19th century. It is a well-established measure of correlation, ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 denoting the absence of a relationship (Adler and Parmryd, 2010). The PCC ($r$) is given as follows:

$$r = \frac{\sum_{i}(b_i - \bar{b})(T_i - \bar{T})}{\sqrt{\sum_{i}(b_i - \bar{b})^{2}\,\sum_{i}(T_i - \bar{T})^{2}}}$$

where $\bar{b}$ and $\bar{T}$ are the means of $b_i$ and $T_i$. Fig. 1 gives the phase partition procedure. As the glutamate fermentation proceeds, the fermentation characteristics change over time, and the Moving Window (MW) technique can be used to track the O2 changes. It is essential to discard the oldest data and add the newest data to the model. The most challenging task is the selection of the window length: with a suitable window size, enough information is included to detect the real change trend of the parameter. If the window is too small, the variation trend is disturbed by process noise; if it is too long, phase recognition may lag (Yuan et al., 2016). The window length of $t$ is set to 180 by trial and error, while the window length of $b$ is set to 60. Fig. 2 shows the recognition of the inflection points; the profile of the online measured O2 is shown in Fig. 2A. As shown in Fig. 2B, $r$ changes sharply at the transition point between two different phases.
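The moving-window procedure can be sketched as follows. This is an assumption-laden illustration, not the authors' code: the function names are hypothetical, the sample index stands in for the time variable T, and the rule for flagging a "sharp change" in r is left to the reader because the text does not specify a threshold.

```python
import math


def slope(t, y):
    """Least squares slope of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return s_ty / s_tt


def pearson(u, v):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(u)
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    s_uv = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    s_uu = sum((a - u_mean) ** 2 for a in u)
    s_vv = sum((b - v_mean) ** 2 for b in v)
    if s_uu == 0 or s_vv == 0:
        return 0.0
    return s_uv / math.sqrt(s_uu * s_vv)


def rolling_slopes(y, window):
    """Slope b of y over each trailing window of the given length."""
    return [
        slope(list(range(end - window, end)), y[end - window:end])
        for end in range(window, len(y) + 1)
    ]


def rolling_pcc(b, window):
    """PCC r between the slope series b and its sample index, per trailing window."""
    return [
        pearson(b[end - window:end], list(range(end - window, end)))
        for end in range(window, len(b) + 1)
    ]


# Synthetic stand-in for an O2 profile: its slope grows steadily over time.
o2 = [float(i * i) for i in range(60)]
b_series = rolling_slopes(o2, window=10)   # the text uses a window of 180
r_series = rolling_pcc(b_series, window=5)  # the text uses a window of 60
```

For this synthetic profile the slope rises steadily, so r stays close to +1. On real data, a candidate inflection point would be a time at which r moves abruptly away from its previous level.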
Glutamate fermentation passes through about five phases: growth phase, transition phase, initial and middle production phase, late production phase and end-of-fermentation phase. Correspondingly, according to offline measurements and analysis, the growth of the cells passes through 5 phases: lag phase, accelerate phase, decelerate phase, stationary phase and decline phase.

Fig. 1. Phase partition procedure
A key challenge in exploiting such a multi-phase partition is finding appropriate values of $b$ for the normal fermentation process. Analysis of and experiments on the fermentation data for all normal batches revealed the following conditions: $b_1 < 0$, $b_1/10 < b_2 < 0$, $b_1/3 < b_3 < b_2$, $b_4 > 0$ and $b_5 > b_4$, where $b_1$, $b_2$, $b_3$, $b_4$ and $b_5$ denote the slopes in phase 1, phase 2, phase 3, phase 4 and phase 5, respectively.
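The slope conditions listed above can be encoded directly as a consistency check on a set of five phase slopes. The sketch below is only one possible reading of those inequalities; the function name and the example slope values are hypothetical and are not taken from the fermentation data.

```python
def slopes_match_normal_pattern(b1, b2, b3, b4, b5):
    """Return True if the five phase slopes satisfy the stated inequalities."""
    return (
        b1 < 0                 # phase 1: O2 falls
        and b1 / 10 < b2 < 0   # phase 2: O2 falls, much more slowly than in phase 1
        and b1 / 3 < b3 < b2   # phase 3: falls faster than phase 2, slower than b1/3
        and b4 > 0             # phase 4: O2 rises
        and b5 > b4            # phase 5: O2 rises faster than in phase 4
    )


# Hypothetical slopes for one batch.
ok = slopes_match_normal_pattern(-0.3, -0.02, -0.05, 0.01, 0.04)
```

A batch whose fitted slopes fail this check could be flagged for closer inspection before phase-specific models are applied.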

Quality Prediction Based on Multi-phase SVR
The prediction model based on Multi-Phase SVR (MPSVR) for glutamate concentration is shown in Fig. 3. In a soft sensor, the secondary variables serve as the inputs of the model and the primary variable, here the glutamate concentration, serves as its output.
The online measured variables, namely fermentation time (t), fermentation Temperature (T), pH, Dissolved Oxygen concentration (DO), Agitation rate (AG), O2 Uptake Rate (OUR), CO2 Evolution Rate (CER) and ammonia water consumption Rate (AR), were chosen as input variables. Table 1 shows the input and output configuration of the prediction model. Data from 10 batches were available for modeling and testing: 9 batches were used for training and the remaining one for testing (Zheng and Pan, 2016).
To evaluate the prediction performance, the Root Mean Square Error (RMSE) and the coefficient of determination ($R^2$) are computed as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}, \qquad R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$

where $y_i$ is the offline measured value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the offline measured values and $n$ is the number of samples.
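The multi-phase idea, one regression model per identified phase with each sample routed to the model of its phase, can be sketched as follows. The class names are hypothetical and `MeanRegressor` is a deliberately trivial placeholder standing in for an SVR; any model object exposing `fit` and `predict` could be substituted.

```python
class MeanRegressor:
    """Placeholder model that predicts the mean of its training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MultiPhaseRegressor:
    """Trains one model per phase and dispatches predictions by phase label."""

    def __init__(self, make_model):
        self.make_model = make_model  # factory returning a fresh model
        self.models = {}

    def fit(self, X, y, phases):
        # Group the training samples by their phase label.
        grouped = {}
        for xi, yi, p in zip(X, y, phases):
            xs, ys = grouped.setdefault(p, ([], []))
            xs.append(xi)
            ys.append(yi)
        self.models = {
            p: self.make_model().fit(xs, ys) for p, (xs, ys) in grouped.items()
        }
        return self

    def predict(self, X, phases):
        # Each sample is predicted by the model trained on its own phase.
        return [self.models[p].predict([xi])[0] for xi, p in zip(X, phases)]


# Hypothetical data: one input variable, two phases.
model = MultiPhaseRegressor(MeanRegressor).fit(
    [[1.0], [2.0], [3.0], [4.0]], [10.0, 12.0, 30.0, 34.0], [1, 1, 2, 2]
)
predictions = model.predict([[9.0], [9.0]], [1, 2])
```

In an online setting, the phase label for each new sample would come from the phase partition step, so the prediction depends only on online measurements.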
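The two evaluation metrics can be computed as in the following sketch, which uses the usual definitions of RMSE and of the coefficient of determination. The function names and the sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical offline measurements and model predictions.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
error = rmse(measured, predicted)
fit_quality = r_squared(measured, predicted)
```

A lower RMSE and an $R^2$ closer to 1 both indicate that the predictions track the offline measurements more closely.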

Results and Discussion
The phase partition results for the glutamate fermentation process are shown in Fig. 4, and a comparison between the multi-phase partition and the offline measurement method is shown in Fig. 5. The phase partition results are consistent with the physiological states of the real process as determined by offline measurements and analysis. The phase partition approach can be applied to other fermentation processes for which minimal process information about phase shifts is available. Moreover, it can be used online as a guide when deciding on the timing of off-line sampling. Note that in Fig. 4 the ranges of the variables have been rescaled so that each curve is clearly visible.
Once the phases of the fermentation process have been identified, the proposed MPSVR prediction models are built for online product quality prediction of glutamate fermentation based on the phase partition results. For comparison, SVR, NN and Multi-Phase NN (MPNN) models are also constructed to predict the product concentration using the same fermentation batches.
For the NN, standard 1-layer neural networks were trained and tested automatically using the MATLAB functions "train" and "sim". The number of hidden neurons was selected by cross-validation. For the SVR, a standard support vector implementation for regression and function approximation from the Libsvm toolbox was used. The predicted values of the different models are compared in Fig. 6. The prediction residuals and the residual boxplots of the different models are compared in Fig. 7 and 8. The best prediction result is obtained by the multi-phase SVR (MPSVR) based soft sensor. The maximum residual reaches 7.32 at a fermentation time of 12 h. The reason is that the culture is then in the decelerate phase, in which bacterial activity becomes stronger at a fermentation time of 12 h. Fig. 6-8 also show that the MPSVR based prediction model exhibits better tracking performance than the NN, SVR and MPNN models. The quality prediction results of the different models are listed in Table 2. MPSVR achieves the best prediction performance, with the minimum Root Mean Square Error (RMSE) of 2.8653 among all prediction models, compared with an RMSE of 3.7979 for the SVR model. Furthermore, the $R^2$ of the MPSVR is 0.9908.

Conclusion
In this paper, a soft sensor prediction model based on Multi-Phase SVR (MPSVR) was proposed for online quality prediction in the glutamate fermentation process.
State detection and phase partition are based on detecting the inflection points of the online measured O2 variations using a Moving Window (MW) and the Pearson Correlation Coefficient (PCC). The phase identification result agrees well with the physiological phase changes of the glutamate fermentation process. Hence, this method can help to automatically control and optimize the glutamate fermentation process. Because the phase identification relies only on process data, it can be applied to other fermentation processes for which minimal process knowledge about phase shifts is available. In addition, the phase partition approach can be used online as a guide when deciding on the timing of off-line sampling.
The performance of SVR, NN, Multi-Phase NN and Multi-Phase SVR was compared. The glutamate fermentation case study confirmed the feasibility of the proposed MPSVR soft sensor: the MPSVR model exhibits excellent prediction performance and can provide effective information for monitoring and operating the glutamate fermentation process.