A New Spectral Conjugate Gradient Method for Nonlinear Unconstrained Optimization

Corresponding Author: Zabidin Salleh, Department of Mathematics, Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu, Malaysia. Email: zabidin@umt.edu.my

Abstract: The conjugate gradient (CG) method is widely used to solve large-scale unconstrained optimization problems. However, the rate of convergence of the conjugate gradient method is linear unless the method is restarted. In this study, we present a new spectral conjugate gradient formula with a restart property that possesses global convergence and descent properties. In addition, we propose a new restart condition for the Fletcher-Reeves conjugate gradient formula. The numerical results demonstrate that the modified Fletcher-Reeves parameter and the new CG formula, together with their restart conditions, are more efficient and robust than other conventional methods.


Introduction
We consider the following problem:

min f(x), x ∈ R^n, (1)

where f: R^n → R is a continuously differentiable function whose gradient g(x) = ∇f(x) is available. Iterative methods are usually used to solve (1); they take the form

x_{k+1} = x_k + α_k d_k, k = 1, 2, ..., (2)

starting from an initial point x_1 ∈ R^n, where the step length α_k is obtained by some line search. The search direction d_k is defined by

d_k = -g_k + β_k d_{k-1}, d_1 = -g_1, (3)

where g_k = g(x_k) and β_k is known as the conjugate gradient parameter.
The exact line search can be used to find the step length α_k. Let φ(α) = f(x_k + α d_k) be the one-dimensional function obtained by departing from x_k along the direction d_k; a successful step length satisfies φ(α_k) < φ(0). If the step length is chosen so that f is minimized along the search direction, i.e.,

f(x_k + α_k d_k) = min_{α ≥ 0} f(x_k + α d_k), (4)

the line search is called exact. An exact line search is computationally expensive.
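As a standard illustration (textbook material, not specific to this paper), for a strictly convex quadratic the exact step length has a closed form, which is why the exact line search is affordable only in the quadratic case:

```latex
% Exact line search for f(x) = g^T x + (1/2) x^T H x with H positive definite:
% phi(alpha) = f(x_k + alpha d_k), so phi'(alpha_k) = 0 gives
\alpha_k = -\frac{g_k^{T} d_k}{d_k^{T} H d_k},
% which is positive whenever d_k is a descent direction (g_k^T d_k < 0).
```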
Therefore, an inexact line search with a lower computational load is preferable. Inexact line searches, in particular the strong Wolfe-Powell (SWP) line search, inherit the advantages of the exact line search while remaining computationally inexpensive. Thus, to reduce the computational cost of the exact line search and the number of objective and gradient evaluations, an inexact line search is usually employed, and the SWP line search is preferable to other line searches. The SWP line search is defined by

f(x_k + α_k d_k) ≤ f(x_k) + δ α_k g_k^T d_k (5)

and

|g(x_k + α_k d_k)^T d_k| ≤ σ |g_k^T d_k|, (6)

where 0 < δ < σ < 1. The weak Wolfe-Powell (WWP) line search (Wolfe, 1969; 1971) is given by (5) and

g(x_k + α_k d_k)^T d_k ≥ σ g_k^T d_k. (7)

The rate of convergence of the CG method rises above linear if the method is restarted (Powell, 1977). Beale (1972) recommended restarting with a direction that retains conjugacy information rather than the plain steepest-descent direction (d_k = -g_k). Powell (1984) recommended restarting d_k using Beale's method if

|g_k^T g_{k-1}| ≥ 0.2 ||g_k||^2.

Dai and Yuan (1998) presented a related restart criterion.

The most famous formulas for β_k are the Hestenes-Stiefel (HS) (Hestenes and Stiefel, 1952), Fletcher-Reeves (FR) (Fletcher and Reeves, 1964) and Polak-Ribière-Polyak (PRP) (Polak and Ribière, 1969) formulas, which are defined as follows:

β_k^{HS} = g_k^T y_{k-1} / (d_{k-1}^T y_{k-1}),   β_k^{FR} = ||g_k||^2 / ||g_{k-1}||^2,   β_k^{PRP} = g_k^T y_{k-1} / ||g_{k-1}||^2,

where y_{k-1} = g_k − g_{k-1}. Polak and Ribière (1969) proved that the CG method with the PRP formula and the exact line search is convergent. Powell (1984) showed by an example that PRP can fail to converge even when the exact line search is used, and recommended using the non-negative part of the PRP formula to secure the convergence analysis. Gilbert and Nocedal (1992) therefore suggested using

β_k^{PRP+} = max{β_k^{PRP}, 0}.

Zoutendijk (1970) obtained the global convergence of the FR formula for the CG method with the exact line search. Al-Baali (1985) proved the convergence of the FR method when σ < 1/2 and the SWP line search is employed, and Guanghui et al. (1995) extended the proof to the case σ = 1/2. Alhawarat et al. (2017) presented the following formula:

β_k^{AZPRP} = (||g_k||^2 − μ_k |g_k^T g_{k-1}|) / ||g_{k-1}||^2 if ||g_k||^2 > μ_k |g_k^T g_{k-1}|, and β_k^{AZPRP} = 0 otherwise,

where μ_k is defined as μ_k = ||x_k − x_{k-1}|| / ||y_{k-1}||. Kaelo et al. (2020) proposed a further CG formula of this type.

As is well known, when the function is quadratic, i.e., f(x) = g^T x + (1/2) x^T H x, and the step size is obtained by the exact line search (4), the CG method satisfies the conjugacy condition d_k^T y_{k-1} = 0. Motivated by the quasi-Newton BFGS and limited-memory BFGS (L-BFGS) methods and using (3), Dai and Liao (2001) presented the following conjugacy condition:

d_k^T y_{k-1} = −t g_k^T s_{k-1}, (8)

where s_{k-1} = x_k − x_{k-1} and t ≥ 0. In the case t = 0, Eq. (8) becomes the classical conjugacy condition. By imposing (8) on the search direction (3), Kaelo et al. (2020) derived a CG formula whose search direction involves a positive scalar denoted by θ_k:

d_k = −θ_k g_k + β_k d_{k-1}, d_1 = −g_1. (9)

When θ_k = 1, the search direction is that of the classical CG method. If β_k = 0, then there are two possibilities for θ_k: if θ_k = [∇²f(x_k)]^{-1} or an approximation of it, then the search direction is the Newton or quasi-Newton direction, respectively.
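To make these formulas concrete, the following is a minimal Python/NumPy sketch of the classical β_k parameters, the PRP+ safeguard and the SWP test (5)-(6); the function names and the δ, σ defaults are ours, chosen for illustration only.

```python
import numpy as np

def beta_fr(g_new, g_old):
    # Fletcher-Reeves: ||g_k||^2 / ||g_{k-1}||^2
    return (g_new @ g_new) / (g_old @ g_old)

def beta_prp(g_new, g_old):
    # Polak-Ribiere-Polyak: g_k^T y_{k-1} / ||g_{k-1}||^2, with y_{k-1} = g_k - g_{k-1}
    y = g_new - g_old
    return (g_new @ y) / (g_old @ g_old)

def beta_hs(g_new, g_old, d_old):
    # Hestenes-Stiefel: g_k^T y_{k-1} / (d_{k-1}^T y_{k-1})
    y = g_new - g_old
    return (g_new @ y) / (d_old @ y)

def beta_prp_plus(g_new, g_old):
    # PRP+ safeguard of Gilbert and Nocedal (1992): max{beta_PRP, 0}
    return max(beta_prp(g_new, g_old), 0.0)

def satisfies_swp(f, grad, x, d, alpha, delta=1e-4, sigma=0.1):
    # Strong Wolfe-Powell conditions (5) and (6), with 0 < delta < sigma < 1
    gd = grad(x) @ d
    sufficient_decrease = f(x + alpha * d) <= f(x) + delta * alpha * gd
    curvature = abs(grad(x + alpha * d) @ d) <= sigma * abs(gd)
    return sufficient_decrease and curvature
```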

The New Formula and the Algorithm
Here, we construct the following new modification to improve the efficiency of the DY CG formula and the robustness of the PRP CG method.
Step 2 If a stopping criterion is satisfied, then stop.
Step 6 Set k := k + 1 and go to Step 2.
In Algorithm 1, note that after the step k := k + 1, the update x_k := x_{k+1} takes place at every iteration; the remaining quantities are updated in the same manner as x_k.
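Since only Steps 2 and 6 of Algorithm 1 are reproduced above, the following Python sketch shows a generic restarted spectral CG loop of the form (2), (9); `beta_fn`, `theta_fn` and `restart_test` are placeholders into which the new formula (11) and its restart condition would be substituted, not the paper's actual definitions.

```python
import numpy as np

def spectral_cg(f, grad, x, beta_fn, theta_fn, line_search,
                tol=1e-6, max_iter=1000, restart_test=None):
    """Generic restarted spectral CG loop for iteration (2) with direction (9).

    beta_fn, theta_fn and restart_test are placeholders; the paper's
    formula (11) and restart condition are not reproduced here.
    """
    g = grad(x)
    d = -g                                    # Step 1: d_1 = -g_1
    for k in range(1, max_iter + 1):
        if np.linalg.norm(g) <= tol:          # Step 2: stopping criterion
            break
        alpha = line_search(f, grad, x, d)    # Step 3: SWP step length
        x = x + alpha * d                     # Step 4: iterate update (2)
        g_new = grad(x)
        if restart_test is not None and restart_test(g_new, g):
            d = -g_new                        # restart with steepest descent
        else:                                 # Step 5: spectral direction (9)
            beta = beta_fn(g_new, g, d)
            theta = theta_fn(g_new, g, d)
            d = -theta * g_new + beta * d
        g = g_new                             # Step 6: k := k + 1
    return x
```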
In the following section, we present the global convergence property of the new formula (11). In the case β_k = 0, the search direction becomes the steepest-descent (negative gradient) direction, which drives the iterates toward a stationary point.

Convergence of CG Algorithm with the Search Direction
Assumption 1. (i) The level set Ω = {x ∈ R^n : f(x) ≤ f(x_1)} is bounded. (ii) In some neighbourhood N of Ω, f is continuously differentiable and its gradient is Lipschitz continuous; that is, there exists a constant L > 0 such that

||g(x) − g(y)|| ≤ L ||x − y|| for all x, y ∈ N.

This assumption implies that there exists a positive constant B such that

||g(x)|| ≤ B for all x ∈ N.

The descent condition is

g_k^T d_k < 0 for all k. (12)

Al-Baali (1985) modified (12) to the following form and used it to prove the convergence of the FR method:

g_k^T d_k ≤ −c ||g_k||^2, c ∈ (0, 1). (13)

Equation (13) is the sufficient descent condition. Note that the general form of the sufficient descent condition is

g_k^T d_k ≤ −c ||g_k||^2 for all k, (14)

with c > 0.
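A minimal helper for checking the sufficient descent condition (14) numerically inside a CG loop (our utility, not from the paper):

```python
def sufficient_descent(g, d, c=1e-4):
    # Checks condition (14): g_k^T d_k <= -c * ||g_k||^2 with c > 0
    return g @ d <= -c * (g @ g)
```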

Descent and Convergence Properties of d_k^{ATAZ} with the SWP Line Search
In fact, there are two types of global convergence, weak global convergence and strong global convergence, and both imply convergence to a stationary point of the optimization problem. However, the convergence and descent properties by themselves say nothing about the efficiency of a CG method; for example, the FR formula has global convergence properties but poor practical efficiency. Thus, to improve efficiency when the method cycles without approaching a solution, the CG algorithm should be restarted. In the following section, we present a new CG method with a restart property based on the steepest-descent method.

Lemma 3.1
Suppose that Assumption 1 holds. Consider any method of the form (2) with a search direction (3) that satisfies the descent condition, and with a step length α_k that satisfies the WWP line search (5) and (7). Then

∑_{k≥1} (g_k^T d_k)^2 / ||d_k||^2 < ∞.

Eq. (14) can also be extended to the spectral search direction (9). Kaelo et al. (2020) present a corresponding theorem (Theorem 3.1) for the global convergence properties.
Proof. The proof is by induction. From (3) with k = 1, we have g_1^T d_1 = −||g_1||^2 < 0. Suppose that the claim holds up to k − 1, i.e., g_i^T d_i < 0 for i = 1, 2, ..., k − 1; then two cases follow, each of which is handled by using Assumption 1. By using Theorem 3.1, we obtain lim inf_{k→∞} ||g_k|| = 0.
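For completeness, the standard argument behind Lemma 3.1, textbook material rather than anything specific to this paper, can be sketched as follows:

```latex
% Standard derivation of the Zoutendijk condition under Assumption 1 and WWP.
% The curvature condition (7) gives (g_{k+1} - g_k)^T d_k \ge (\sigma - 1)\, g_k^T d_k,
% while Lipschitz continuity gives (g_{k+1} - g_k)^T d_k \le L \alpha_k \|d_k\|^2, hence
\alpha_k \ge \frac{(\sigma - 1)\, g_k^T d_k}{L\, \|d_k\|^2} > 0 .
% Substituting this bound into the sufficient decrease condition (5) and summing over k
% (f is bounded below on \Omega) yields the Zoutendijk condition:
\sum_{k \ge 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty .
```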

The New Restart Criteria for FR Family
The Fletcher-Reeves formula is a simple CG method; it has global convergence properties with the SWP line search and it satisfies the descent property. However, the FR formula is not as efficient as β_k^{PRP}, while the latter has problems with its convergence properties on some optimization functions. Powell (1977) attributed the inefficiency of FR to jamming: once the method takes a small step along a poor direction, subsequent FR steps tend to remain small, so a restart is needed to recover.
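A minimal sketch of Powell's classical restart test, usable as the `restart_test` hook in the loop sketched earlier; the new restart condition proposed in this paper for the FR family is a separate criterion:

```python
def powell_restart(g_new, g_old, nu=0.2):
    # Restart when consecutive gradients are far from orthogonal:
    # |g_k^T g_{k-1}| >= nu * ||g_k||^2, with Powell's threshold nu = 0.2
    return abs(g_new @ g_old) >= nu * (g_new @ g_new)
```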

Numerical Results and Discussion
To study the efficiency of the new search direction, we selected several test problems in Table 1 from CUTEr (Bongartz et al., 1995) and Andrei (2008). The test functions consist of unimodal and multimodal functions. We also selected examples according to similarities in significant physical properties and shapes. For example, the Rosenbrock function has a long, narrow valley; the Himmelblau function, the six-hump function and the three-hump function have many local minima; the Booth function is plate-shaped; and the Sum Squares function is bowl-shaped. As the CG method is useful for small- and large-scale optimization problems, we also varied the dimensions of the functions from 2 to 10000. All of the functions are nonlinear. In Table 1, "Gen" denotes generalised, "Ext" denotes extended and "Dim" denotes dimension(s). We employed the MATLAB programming environment (ver. 7.9). The results are shown in Figs. 1 and 2, using the performance measure introduced by Dolan and Moré (2002).

Since we are interested in finding the stationary point(s) of the optimization problems, we tested every function in Table 1 from more than one initial point, for dimensions between 2 and 5000. Different initial points generally lead to different stationary points, which implies more than one solution for multimodal functions; we then record the best solution. In addition, we select small and large dimensions for every function, with dimensions ranging from 2 to 10000. We conclude that using different dimensions and different initial points makes the results more convincing than using the original initial points and a single dimension. However, the choice of starting point needs more study.

In Figs. 1 and 2, the curve of the new method lies uppermost among all curves. In addition, it is clear that the FR* formula is better than the original FR formula, which supports the discussion presented by Powell; the program is terminated by the user when the number of iterations exceeds 1000. The PRP+ formula is efficient, since its curve starts above the other curves; however, it does not satisfy the descent property with the SWP line search, so the program terminates automatically. In addition, we present the following two functions. The first is the Extended Beale function, given by the following formula:

f(x) = ∑_{i=1}^{n/2} [ (1.5 − x_{2i−1}(1 − x_{2i}))^2 + (2.25 − x_{2i−1}(1 − x_{2i}^2))^2 + (2.625 − x_{2i−1}(1 − x_{2i}^3))^2 ],

with number of variables n = 500, 1000, 5000, 10000 and initial points (−1, −1, ..., −1), (0.5, 0.5, ..., 0.5), (1, 1, ..., 1), (2, 2, ..., 2).
This function has only one global minimum surrounded by a flat plateau. At the four corners lie four ascending steep walls that taper toward the tip and become higher as the values of the two variables increase. The minimum is x* = (3, 0.5) and the function value is f(x*) = 0 for the two-variable version (see Fig. 3 for a three-dimensional plot).

2 1
The second function is called the Perturbed Quadratic function, with initial points (0.5, 0.5, ..., 0.5). This function has a smooth, dish-like shape; its minimum x* = (0, 0) with function value f(x*) = 0 lies at the bottom for the two-variable version (see Fig. 4 for a three-dimensional plot).
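Sketch implementations of the two benchmark functions, assuming the standard forms from Andrei's collection (the Extended Beale pairs variables as (x_{2i−1}, x_{2i}); the Perturbed Quadratic is taken as ∑ i x_i^2 + (1/100)(∑ x_i)^2, which matches the stated minimiser and minimum value):

```python
import numpy as np

def extended_beale(x):
    # Extended Beale: pairs (x_{2i-1}, x_{2i}); minimum f = 0 at (3, 0.5, 3, 0.5, ...)
    a, b = x[0::2], x[1::2]
    return np.sum((1.5 - a * (1 - b)) ** 2
                  + (2.25 - a * (1 - b ** 2)) ** 2
                  + (2.625 - a * (1 - b ** 3)) ** 2)

def perturbed_quadratic(x):
    # Perturbed Quadratic: sum_i i * x_i^2 + (1/100) * (sum_i x_i)^2; minimum f = 0 at x = 0
    i = np.arange(1, x.size + 1)
    return np.sum(i * x ** 2) + 0.01 * np.sum(x) ** 2

# Example: values at a stated initial point and at the Beale minimiser
x0 = np.full(1000, 0.5)
print(extended_beale(np.tile([3.0, 0.5], 500)))  # ~0 at the minimiser
print(perturbed_quadratic(x0))
```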
Moreover, we present another strong comparison, between ATAZ and CG-Descent, on the benchmark functions in Table 2. The numerical results in Figs. 5, 6 and 7 show that the new modification ATAZ outperforms CG-Descent in terms of the number of iterations, the number of function evaluations and CPU time. The test functions can be downloaded from the CUTEr collection (Bongartz et al., 1995).
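For reference, the Dolan and Moré (2002) performance measure used in Figs. 1, 2 and 5 to 7 can be computed with a short sketch like the following; the cost-matrix layout and the failure convention are our assumptions:

```python
import numpy as np

def performance_profile(T):
    """Dolan-More performance profile.

    T: (n_problems, n_solvers) array of costs (e.g., iteration counts,
    function evaluations or CPU time), with np.inf marking a failure.
    Returns (taus, rho), where rho[:, s] is the fraction of problems that
    solver s solves within a factor tau of the best solver.
    """
    best = np.min(T, axis=1, keepdims=True)   # best cost per problem
    ratios = T / best                          # r_{p,s} = t_{p,s} / min_s t_{p,s}
    taus = np.unique(ratios[np.isfinite(ratios)])
    rho = np.array([[np.mean(ratios[:, s] <= tau) for s in range(T.shape[1])]
                    for tau in taus])
    return taus, rho
```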