Conjugate Gradient Method: A Developed Version to Resolve Unconstrained Optimization Problems

Division of Computational Mathematics and Engineering, Institute for Computational Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Faculty of Civil Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Department of Mathematics, Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu, Malaysia


Introduction
The Conjugate Gradient (CG) method is used to solve unconstrained optimization problems of the form

$$\min_{x \in \mathbb{R}^n} f(x),$$

where $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is a smooth function whose gradient $g(x) = \nabla f(x)$ is available. Unlike Newton's method and its modifications, the CG method requires neither the second derivative nor an approximation of it, so it is computationally inexpensive. The CG method generates a sequence of points

$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \tag{1}$$

starting from an initial point $x_0$, where $x_k$ denotes the current iterate and $\alpha_k > 0$ is a step length obtained from a line search (Equations (3)-(6)). The search direction $d_k$ of the CG method is defined by

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \geq 1, \end{cases} \tag{2}$$

where $g_k = g(x_k)$ and $\beta_k$ is the CG coefficient.
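To make Equations (1) and (2) concrete, the following Python sketch implements the generic CG loop; the callable interface, `beta_fn` for the coefficient and `line_search` for the step length, is an assumption of this sketch rather than part of the method itself.

```python
import numpy as np

def conjugate_gradient(f, grad, x0, beta_fn, line_search, tol=1e-6, max_iter=10000):
    """Generic CG iteration: Equations (1) and (2).

    beta_fn(g, g_prev, d_prev) returns the coefficient beta_k, and
    line_search(f, grad, x, d) returns a step length alpha_k > 0.
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # d_0 = -g_0, Equation (2)
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha = line_search(f, grad, x, d)   # alpha_k from Equations (3)-(6)
        x = x + alpha * d                    # Equation (1)
        g_prev, g = g, grad(x)
        d = -g + beta_fn(g, g_prev, d) * d   # Equation (2)
    return x
```

Any of the coefficients reviewed below can be supplied as `beta_fn`.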
The exact line search, expressed in Equation (3), can be used to obtain the step length:

$$f(x_k + \alpha_k d_k) = \min_{\alpha \geq 0} f(x_k + \alpha d_k). \tag{3}$$

However, this type of line search is computationally expensive because numerous iterations are required to obtain the step length. Moreover, if the initial point is far from the optimum and/or the dimension of the problem is large, an even greater number of iterations is required. With high-speed processors, sufficient memory and an appropriate choice of $\beta_k$, Equation (3) may be computationally acceptable for some functions. The inexact line search uses an approximation of the function and a reduced search space to find the step length; it is therefore not as computationally expensive as the exact line search. The Strong Wolfe-Powell (SWP) line search is the most popular type of inexact line search and consists of Equations (4) and (5):

$$f(x_k + \alpha_k d_k) \leq f(x_k) + \delta \alpha_k g_k^T d_k \tag{4}$$

and

$$\left| g(x_k + \alpha_k d_k)^T d_k \right| \leq \sigma \left| g_k^T d_k \right|, \tag{5}$$

where $0 < \delta < \sigma < 1$.
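To see why Equation (3) is avoided in practice, note that an exact line search amounts to a full one-dimensional minimization inside every CG iteration. A sketch using scipy.optimize.minimize_scalar follows; the bounded interval (0, 1e3) is an arbitrary assumption, and the unused `grad` argument merely matches the `line_search` interface assumed in the earlier sketch.

```python
from scipy.optimize import minimize_scalar

def exact_line_search(f, grad, x, d):
    """Approximate Equation (3): minimize phi(alpha) = f(x + alpha d), alpha >= 0."""
    phi = lambda a: f(x + a * d)
    res = minimize_scalar(phi, bounds=(0.0, 1e3), method="bounded")
    return res.x   # each call spends many inner function evaluations
```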

The Weak Wolfe-Powell (WWP) line search is given by Equation (4) together with

$$g(x_k + \alpha_k d_k)^T d_k \geq \sigma g_k^T d_k. \tag{6}$$

The SWP line search forces the step length to lie near a stationary point or a local minimum of the function along the search direction, whereas a step length satisfying the WWP line search may lack this advantage. The popular formulas for $\beta_k$ are given in Equations (7)-(11) (Fletcher and Reeves, 1964; Polak and Ribiere, 1969; Fletcher, 1987; Liu and Storey, 1991; Dai and Yuan, 1999):

$$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \tag{7}$$

$$\beta_k^{PRP} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}, \tag{8}$$

$$\beta_k^{CD} = -\frac{\|g_k\|^2}{d_{k-1}^T g_{k-1}}, \tag{9}$$

$$\beta_k^{LS} = -\frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T g_{k-1}}, \tag{10}$$

$$\beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})}. \tag{11}$$

Wei et al. (2006) proposed a new non-negative CG coefficient, relatively similar to the original Polak-Ribière-Polyak (PRP) formula, with global convergence under exact and inexact line searches:

$$\beta_k^{WYL} = \frac{g_k^T \left( g_k - \frac{\|g_k\|}{\|g_{k-1}\|} g_{k-1} \right)}{\|g_{k-1}\|^2}. \tag{12}$$

Theoretically, when $\beta_k^{PRP} < 0$, the search direction restarts automatically. However, Powell (1984) showed by example that the PRP method with an exact line search can cycle without approaching a solution. Hager and Zhang (2005) proposed the following coefficient, which is globally convergent whenever the line search fulfills the Wolfe conditions:

$$\beta_k^{HZ} = \max \left\{ \beta_k^N, \eta_k \right\}, \tag{13}$$

where

$$\beta_k^N = \frac{1}{d_k^T y_k} \left( y_k - 2 d_k \frac{\|y_k\|^2}{d_k^T y_k} \right)^T g_{k+1}, \qquad \eta_k = \frac{-1}{\|d_k\| \min \{ \eta, \|g_k\| \}},$$

with $y_k = g_{k+1} - g_k$. In their numerical experiments, they set $\eta = 0.01$. The method based on $\beta_k^{HZ}$ is called the CG-DESCENT method. Numerous versions of the CG-DESCENT code have appeared recently; additional details are discussed in the section on numerical results.
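For reference, Equations (7)-(12) translate directly into NumPy. The snippet below is a plain transcription of the formulas above, with `g` = $g_k$, `g_prev` = $g_{k-1}$ and `d_prev` = $d_{k-1}$; it is not the original authors' code.

```python
import numpy as np

def beta_fr(g, g_prev, d_prev):   # Fletcher-Reeves, Equation (7)
    return (g @ g) / (g_prev @ g_prev)

def beta_prp(g, g_prev, d_prev):  # Polak-Ribiere-Polyak, Equation (8)
    return (g @ (g - g_prev)) / (g_prev @ g_prev)

def beta_cd(g, g_prev, d_prev):   # Conjugate Descent, Equation (9)
    return -(g @ g) / (d_prev @ g_prev)

def beta_ls(g, g_prev, d_prev):   # Liu-Storey, Equation (10)
    return -(g @ (g - g_prev)) / (d_prev @ g_prev)

def beta_dy(g, g_prev, d_prev):   # Dai-Yuan, Equation (11)
    return (g @ g) / (d_prev @ (g - g_prev))

def beta_wyl(g, g_prev, d_prev):  # Wei et al. (2006), Equation (12)
    scale = np.linalg.norm(g) / np.linalg.norm(g_prev)
    return (g @ (g - scale * g_prev)) / (g_prev @ g_prev)
```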
In addition, Hager and Zhang (2005) proposed an approximate WWP line search:

$$\sigma g_k^T d_k \leq g(x_k + \alpha_k d_k)^T d_k \leq (2\delta - 1) g_k^T d_k, \tag{14}$$

where $0 < \delta < 1/2$ and $\delta \leq \sigma < 1$. The first inequality in Equation (14) is exactly the second Wolfe condition (Equation (6)), while the second inequality corresponds to the first Wolfe condition (Equation (4)) and matches it exactly when the function is quadratic. The newest version of this method, called CG-DESCENT 6.3, was proposed in (Hager and Zhang, 2013).
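The relationships among conditions (4)-(6) and (14) are easiest to see when they are checked side by side. The sketch below evaluates all four for a trial step; the default values of `delta` and `sigma` are illustrative, not prescribed by the text above.

```python
def line_search_conditions(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    """Evaluate conditions (4)-(6) and (14) for a trial step length alpha."""
    gd = grad(x) @ d                   # g_k^T d_k (negative for a descent direction)
    gd_new = grad(x + alpha * d) @ d   # g(x_k + alpha_k d_k)^T d_k
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * gd   # Equation (4)
    strong = abs(gd_new) <= sigma * abs(gd)                  # Equation (5)
    weak = gd_new >= sigma * gd                              # Equation (6)
    approx = sigma * gd <= gd_new <= (2 * delta - 1) * gd    # Equation (14)
    return {"SWP": armijo and strong,        # (4) and (5)
            "WWP": armijo and weak,          # (4) and (6)
            "approximate_WWP": approx}
```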
One of the important conditions for the CG method is the sufficient descent condition, proposed by Al-Baali (1985), which is given as follows: there exists a constant $c > 0$ such that

$$g_k^T d_k \leq -c \|g_k\|^2, \quad \forall k \in \mathbb{N}.$$

However, concerns regarding the memory requirements and CPU time needed to solve unconstrained optimization problems have encouraged the continued development of the CG method. Over the years, numerous new CG formulas have been proposed. Some of these formulas are difficult to use in application fields such as neural networks, engineering and medical science. This restriction motivates us to construct a new version of the CG method that is simple and relatively easy to understand. For more information, the reader can consult Alhawarat and Salleh (2017), Alhawarat et al. (2015), Hestenes and Stiefel (1952), Gilbert and Nocedal (1992) and Salleh and Alhawarat (2016).
The rest of this paper is organized as follows. The new version of the CG method (MCG) is presented in Section 2. Section 3 demonstrates the global convergence analysis for the new formula. The efficiency analysis based on the numerical results is discussed and evaluated in Section 4. We conclude in Section 5.

The Modified Conjugate Gradient (MCG) Method
Alhawarat et al. (2016) presented a new CG formula with new restart criteria:

$$\beta_k^{A} = \begin{cases} \dfrac{\|g_k\|^2 - \mu_k \left| g_k^T g_{k-1} \right|}{\|g_{k-1}\|^2}, & \text{if } \|g_k\|^2 > \mu_k \left| g_k^T g_{k-1} \right|, \\[6pt] 0, & \text{otherwise}, \end{cases} \tag{15}$$

where $\| \cdot \|$ represents the Euclidean norm and the restart parameter $\mu_k$ is defined by

$$\mu_k = \frac{\|x_k - x_{k-1}\|}{\|y_{k-1}\|}, \qquad y_{k-1} = g_k - g_{k-1}.$$

In the present study, we modify this formula to obtain the coefficient of Equation (16). We note that Equation (16) satisfies the descent property without using any line search. The main steps of the MCG method are illustrated in Algorithm 1.
Algorithm 1 (MCG)
Step 1: Given an initial point $x_0$ and a tolerance $\varepsilon > 0$, set $d_0 = -g_0$ and $k = 0$.
Step 2: If $\|g_k\| \leq \varepsilon$, stop.
Step 3: Compute the step length $\alpha_k$ using the line search.
Step 4: Set $x_{k+1} = x_k + \alpha_k d_k$.
Step 5: Compute $\beta_{k+1}$ by Equation (16) and the new search direction by Equation (2).
Step 6: Increment k and go to Step 2.
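A compact structural rendering of Algorithm 1 in Python follows. Because it uses the restart coefficient of Equation (15) as a stand-in for the modified coefficient of Equation (16), and a simple backtracking Armijo search in place of the modified WWP line search, it should be read as an outline under those assumptions, not as the authors' implementation.

```python
import numpy as np

def beta_restart(g, g_prev, x, x_prev):
    """Restart coefficient in the spirit of Equation (15): return 0
    when the mu_k-weighted term dominates ||g_k||^2."""
    y = g - g_prev
    mu = np.linalg.norm(x - x_prev) / np.linalg.norm(y)  # mu_k (assumes y != 0)
    num = g @ g - mu * abs(g @ g_prev)
    return num / (g_prev @ g_prev) if num > 0 else 0.0

def mcg(f, grad, x0, tol=1e-6, max_iter=10000):
    x = np.asarray(x0, dtype=float)        # Step 1
    g, d = grad(x), -grad(x)
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:       # Step 2
            break
        alpha = 1.0                        # Step 3: backtracking Armijo search
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5
        x_prev, x = x, x + alpha * d       # Step 4
        g_prev, g = g, grad(x)
        beta = beta_restart(g, g_prev, x, x_prev)  # Step 5: stand-in for (16)
        d = -g + beta * d
    return x                               # Step 6 is the loop increment
```

For instance, `mcg(lambda x: x @ x, lambda x: 2 * x, np.ones(10))` drives the gradient below the tolerance within a few iterations.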

MCG: Global Convergence Analysis
The following assumption is required to establish the convergence properties of the new formula.

Assumption 1
(i) The level set $\Omega = \{ x \in \mathbb{R}^n : f(x) \leq f(x_0) \}$ is bounded.
(ii) In some neighborhood $N$ of $\Omega$, the function $f$ is continuously differentiable and its gradient is Lipschitz continuous; that is, there exists a constant $L > 0$ such that $\|g(x) - g(y)\| \leq L \|x - y\|$ for all $x, y \in N$.

Theorem 1
Consider the sequences $\{x_k\}$ and $\{d_k\}$ generated by Equations (1), (2) and (16), where $\alpha_k$ is computed by any line search. Then the sufficient descent condition holds.

Proof
We use proof by induction. For $k = 0$, multiplying Equation (2) by $g_0^T$ gives $g_0^T d_0 = -\|g_0\|^2$, so the condition holds. Assume it holds for $k - 1$. Multiplying Equation (2) by $g_k^T$, we obtain

$$g_k^T d_k = -\|g_k\|^2 + \beta_k g_k^T d_{k-1}.$$

When the restart condition of Equation (16) is not met, $\beta_k = 0$ and $g_k^T d_k = -\|g_k\|^2$; otherwise, bounding the term $\beta_k g_k^T d_{k-1}$ by the definition of $\beta_k$ in Equation (16) yields $g_k^T d_k \leq -c \|g_k\|^2 < 0$ for some constant $c > 0$.
The proof is complete.
The following lemma presents the (Zoutendijk, 1970) condition, which is useful for analyzing the global convergence property of the CG method.

Lemma 1
Suppose Assumption 1 holds. Consider any method of the form of Equations (1) and (2) in which $\alpha_k$ satisfies the WWP line search (Equations (4) and (6)) and the search direction is a descent direction. Then the following condition holds:

$$\sum_{k=0}^{\infty} \frac{\left( g_k^T d_k \right)^2}{\|d_k\|^2} < \infty. \tag{23}$$

In addition, Equation (23) holds for the exact and SWP line searches; the proof is presented in (Wei et al., 2006). Substituting the sufficient descent condition (Equation (18)) into Equation (23) yields

$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty.$$

The following theorem shows that $\beta_k^{A}$ has a global convergence property with the SWP line search. To establish the convergence analysis for the modified CG method (Equation (16)) with the modified WWP condition, we need the following theorem.

Proof
We again use proof by induction. Multiplying Equation (2) by $g_k^T$ and proceeding as in the proof of Theorem 1, the required bound follows from the restart structure of Equation (16) together with the modified WWP condition. The proof is complete.
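Combining Theorem 1 with Lemma 1 yields the standard route to the global convergence claimed above; the concluding contradiction argument, which the analysis uses implicitly, is spelled out below for completeness.

```latex
% Sufficient descent, (g_k^T d_k)^2 >= c^2 ||g_k||^4, turns the
% Zoutendijk condition (23) into
\sum_{k=0}^{\infty} \frac{c^{2}\,\lVert g_{k}\rVert^{4}}{\lVert d_{k}\rVert^{2}}
  \le \sum_{k=0}^{\infty} \frac{\left(g_{k}^{T} d_{k}\right)^{2}}{\lVert d_{k}\rVert^{2}}
  < \infty .
% If liminf ||g_k|| > 0, there is an eps > 0 with ||g_k|| >= eps for all k,
% and a bound ||d_k|| <= M (from the boundedness of beta_k) would give
\sum_{k=0}^{\infty} \frac{c^{2}\,\lVert g_{k}\rVert^{4}}{\lVert d_{k}\rVert^{2}}
  \ge \sum_{k=0}^{\infty} \frac{c^{2}\,\varepsilon^{4}}{M^{2}} = \infty ,
% a contradiction; hence liminf_{k \to \infty} ||g_k|| = 0.
```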

Efficiency Analysis: Numerical Results
To analyze the efficiency of the new method, test functions are selected from the CUTE collection (Bongartz et al., 1995), as shown in Table A1 (Appendix A). These functions were obtained from the CCPForge website (Gould et al., 2018). The selected functions and dimensions are similar to those used in (Hager and Zhang, 2005). Furthermore, the modified CG method is compared with CG-DESCENT 5.3 (Hager and Zhang, 2005). The comparison is based on CPU time, function evaluations, number of iterations and gradient evaluations. In this study, the WWP line search is modified (denoted by the modified CG-DESCENT 5.3), and memory equal to zero is used to obtain the results for $\beta_k^{A}$. The code can be downloaded from the webpage of (Hager and Zhang, 2018). The CG-DESCENT 5.3 results are obtained by running CG-DESCENT 6.3 with memory equal to zero. A minimum time of 0.2 s is used for all algorithms with memory equal to zero. The host computer has an Intel(R) Dual-Core CPU and 2 GB of DDR2 RAM. The results are shown in Figures 1-4, using the performance profiles introduced by Dolan and Moré (2002). This performance measure compares a set of solvers $S$ on a set of problems $P$. Assume that there are $n_s$ solvers and $n_p$ problems, indexed by $s$ and $p$, respectively. The measure $t_{p,s}$ is defined as the computational cost (e.g., number of iterations or CPU time) required for solver $s$ to solve problem $p$. To produce a baseline for comparison, the performance of solver $s$ on problem $p$ is scaled by the best performance of any solver in $S$ on that problem, using the ratio

$$r_{p,s} = \frac{t_{p,s}}{\min \{ t_{p,s} : s \in S \}}.$$

Thus, $P_s(t)$ is the probability for solver $s \in S$ that the performance ratio $r_{p,s}$ is within a factor $t \in \mathbb{R}$ of the best possible ratio:

$$P_s(t) = \frac{1}{n_p} \left| \{ p \in P : r_{p,s} \leq t \} \right|.$$

If the function $P_s$ is identified as the cumulative distribution function of the performance ratio, then $P_s : \mathbb{R} \rightarrow [0, 1]$ is non-decreasing and piecewise continuous from the right. The value $P_s(1)$ is the probability that the solver achieves the best performance among all solvers. In general, a solver with high values of $P_s(t)$, which appears in the upper right corner of the figure, is preferable. Figure 1 shows that the modified CG method (Alhawarat) outperforms CG-DESCENT 5.3 in terms of gradient evaluations. Figure 2 illustrates that it strongly outperforms CG-DESCENT 5.3 with regard to function evaluations. Figures 3 and 4 show that the modified formula strongly outperforms CG-DESCENT 5.3 in terms of CPU time and number of iterations.
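The profile $P_s(t)$ is straightforward to compute. The sketch below implements the ratio $r_{p,s}$ and the cumulative fraction exactly as defined above, under the illustrative convention that `times` is an array of shape (n_p, n_s) holding $t_{p,s}$, with np.nan marking a failure, and that every problem is solved by at least one solver.

```python
import numpy as np

def performance_profile(times, t_grid):
    """Return P[i, s] = P_s(t_grid[i]) for a (n_p, n_s) array of measures."""
    t = np.where(np.isnan(times), np.inf, times)   # failed runs never win
    best = t.min(axis=1, keepdims=True)            # best solver per problem
    r = t / best                                   # ratios r_{p,s}
    return np.array([(r <= tau).mean(axis=0) for tau in t_grid])
```

Plotting each column of the returned array against `t_grid` reproduces profiles of the kind shown in Figures 1-4; the first row at $t = 1$ gives the fraction of problems on which each solver is the fastest.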

Conclusion
In this study, a modified version of the CG algorithm (Alhawarat) is proposed and its performance is investigated. The modified formula is restarted on the basis of the value of the Lipschitz constant. The modified WWP line search is used to obtain the step length, and global convergence is established under the WWP conditions. In addition, the descent condition is satisfied without using any line search. Our numerical results show that the new coefficient produces efficient and competitive results compared with other methods, such as CG-DESCENT 5.3. As future work, the new version of the CG method (MCG) will be combined with the feedforward neural network Back-Propagation (BP) algorithm to improve the training process and produce a fast training algorithm for multilayer networks. This should reduce the time needed to train a neural network when the number of training samples is massive.