Two Versions of Conjugate Gradient Algorithms Based on Conjugacy Conditions for Unconstrained Optimization

Problem statement: The Conjugate Gradient (CG) algorithms investigated in this study are widely used in optimization, especially for large-scale problems, because they require no matrix storage. The purpose of this work was to construct new CG algorithms suitable for solving large-scale optimization problems. Approach: Based on the pure conjugacy condition and a convex quadratic model, two new versions of CG algorithms were derived and shown to generate descent directions at each iteration; the global convergence of these algorithms under the Wolfe line search conditions was proved. Results: Numerical results for a set of standard test functions were reported and compared with the classical Fletcher-Reeves and Hestenes-Stiefel algorithms, showing considerable improvement over these standard CG algorithms. Conclusion: Two new versions of CG algorithms were proposed, together with their numerical properties and convergence analysis, and they outperformed the standard HS and FR CG algorithms.


INTRODUCTION
The problem of interest is to find a local minimizer x* of the unconstrained optimization problem:

min f(x), x ∈ R^n (1)

where f: R^n → R is continuously differentiable and its gradient g = ∇f is available. There are different types of iterative algorithms for solving problem (1); all of them use an iteration of the form:

x_{k+1} = x_k + α_k d_k (2)

where x_0 is a starting point, α_k is a step size computed by a line search procedure and d_k is a descent direction. If f ∈ C^2 and the Hessian matrix G = ∇^2 f(x) is available and positive definite, then an ideal choice for d_k is the Newton direction [7]:

d_k = -G_k^{-1} g_k (3)

It is known that when G_k is positive definite and x_k lies in some neighborhood of x*, the sequence {x_k} generated by (2) and (3) converges to x* with second-order rate. These local convergence properties represent the ideal behavior that other algorithms aim to emulate as far as possible [6]. Despite these desirable properties, Newton's algorithm also has drawbacks: it must handle an n × n matrix, and when x_k is remote from x* the algorithm may be undefined because G_k need not be positive definite. Therefore other algorithms can be used for solving problem (1), such as quasi-Newton algorithms, which modify Newton's algorithm and use directions of the form:

d_{k+1} = -H_{k+1} g_{k+1} (4)

where H_{k+1} is an approximation of the inverse Hessian matrix. The Conjugate Gradient (CG) algorithm is a suitable approach for solving the minimization problem (1) when n is large. When a CG algorithm is used to minimize a non-quadratic objective function, the related algorithm is called a non-linear CG algorithm [12,14,15]. The search directions of a CG algorithm have the form:

d_{k+1} = -g_{k+1} + β_k d_k, d_0 = -g_0 (5)

where β_k is a scalar known as the CG parameter.
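As a concrete illustration of the iteration (2) and the search direction (5), the sketch below runs a CG iteration with the Fletcher-Reeves parameter and an exact line search on a small convex quadratic; the matrix, the vector and the closed-form exact step size are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimize q(x) = 0.5 x^T G x - b^T x with the CG iteration (2) and the
# direction update (5); for a quadratic the exact line search step is
# alpha_k = -g_k^T d_k / (d_k^T G d_k).
G = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # symmetric positive definite
b = np.array([1.0, 2.0])
x = np.zeros(2)
g = G @ x - b                         # gradient of q at x
d = -g                                # d_0 = -g_0
for k in range(2):                    # exact CG terminates in at most n steps
    alpha = -(g @ d) / (d @ G @ d)    # exact step size
    x = x + alpha * d                 # iteration (2)
    g_new = G @ x - b
    beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves parameter
    d = -g_new + beta * d             # direction update (5)
    g = g_new
# x is now the minimizer, i.e., the solution of G x = b
```

After n = 2 iterations the gradient vanishes and x solves G x = b, which is the finite-termination property of CG on convex quadratics.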
Different CG algorithms correspond to different choices of the parameter β_k, so a crucial element in any CG algorithm is the formula for β_k. Some well-known CG algorithms are the Hestenes-Stiefel (HS), Fletcher-Reeves (FR), Polak-Ribiere (PR) and Dai-Yuan (DY) algorithms [4,9,10,11]:

β_k^{HS} = g_{k+1}^T y_k / d_k^T y_k (6)
β_k^{FR} = ||g_{k+1}||^2 / ||g_k||^2 (7)
β_k^{PR} = g_{k+1}^T y_k / ||g_k||^2 (8)
β_k^{DY} = ||g_{k+1}||^2 / d_k^T y_k (9)

where y_k = g_{k+1} - g_k. Hager and Zhang [8] showed that CG algorithms with ||g_{k+1}||^2 in the numerator of β_k have strong global convergence properties with exact and inexact line searches, in particular with the following Wolfe conditions:

f(x_k + α_k d_k) ≤ f(x_k) + c_1 α_k g_k^T d_k (10a)
g(x_k + α_k d_k)^T d_k ≥ c_2 g_k^T d_k (10b)

where 0 < c_1 < c_2 < 1. For some CG algorithms the stronger version of the Wolfe conditions, i.e., (10a) together with:

|g(x_k + α_k d_k)^T d_k| ≤ c_2 |g_k^T d_k| (11)

is needed to ensure global convergence and hence stability [8], but these algorithms perform poorly in practice. On the other hand, CG algorithms with g_{k+1}^T y_k in the numerator have uncertain global convergence for general non-linear functions but perform well in practice. One of the open questions in optimization is whether we can construct a CG algorithm that has both global convergence and good numerical performance in practical computation. In this study we try to derive new CG algorithms with a global convergence property and acceptable performance in practice. All the algorithms mentioned earlier (Newton, quasi-Newton and CG algorithms) are called conjugate direction algorithms, since the search directions they generate satisfy:

d_i^T G d_j = 0, i ≠ j (12)

when the objective function is a convex quadratic and the step size α_k is exact. For general non-linear functions, we know by the mean value theorem that there exists some γ ∈ (0, 1) such that:

d_{k+1}^T y_k = α_k d_{k+1}^T ∇^2 f(x_k + γ α_k d_k) d_k

Therefore it is reasonable [5] to replace (12) with the following conjugacy condition:

d_{k+1}^T y_k = 0 (13)

which is called the pure conjugacy condition.
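The four classical parameters and the Wolfe conditions (10a)-(10b) can be written down directly. The snippet below is a minimal sketch; the function names and the example vectors are illustrative, not from the paper.

```python
import numpy as np

def cg_betas(g, g1, d):
    """The four classical CG parameters for gradients g_k, g_{k+1}
    and previous direction d_k."""
    y = g1 - g                        # y_k = g_{k+1} - g_k
    return {
        "HS": (g1 @ y) / (d @ y),     # Hestenes-Stiefel
        "FR": (g1 @ g1) / (g @ g),    # Fletcher-Reeves
        "PR": (g1 @ y) / (g @ g),     # Polak-Ribiere
        "DY": (g1 @ g1) / (d @ y),    # Dai-Yuan
    }

def wolfe_ok(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
    """Check the standard Wolfe conditions (10a) and (10b) for step alpha."""
    gd = grad(x) @ d
    x1 = x + alpha * d
    return (f(x1) <= f(x) + c1 * alpha * gd      # (10a) sufficient decrease
            and grad(x1) @ d >= c2 * gd)         # (10b) curvature

# Example: on f(x) = 0.5 ||x||^2 from x = (1, 0) along d = -g, the full step
# alpha = 1 satisfies both conditions, while alpha = 0.001 violates (10b).
f = lambda x: 0.5 * x @ x
grad = lambda x: x
x0, d0 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
```

The curvature condition (10b) is what rules out uselessly small steps such as alpha = 0.001 in the example.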
Dai and Liao [5] combined the search direction given in (3) or (4) with the secant equation to modify the pure conjugacy condition (13) as follows:

d_{k+1}^T y_k = -g_{k+1}^T s_k (14)

where B_{k+1} is a symmetric positive definite n × n matrix satisfying the quasi-Newton (secant) equation:

B_{k+1} s_k = y_k (15)

where s_k = x_{k+1} - x_k; therefore:

d_{k+1}^T y_k = -t g_{k+1}^T s_k (16)

where t > 0 is a scalar. The main object of this study is to find new CG algorithms whose search directions d_k have the same form as (5). This is done in the next section; the descent and global convergence properties are then proved, and numerical results are reported and compared with some standard CG algorithms.
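The Dai-Liao construction can be checked numerically: with the well-known Dai-Liao parameter β_k = (g_{k+1}^T y_k - t g_{k+1}^T s_k) / (d_k^T y_k), the direction of form (5) satisfies condition (16) exactly, by construction. A small sketch with random illustrative vectors (not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
g1, y, d = rng.standard_normal((3, 5))   # illustrative g_{k+1}, y_k, d_k
t, alpha = 0.1, 0.5
s = alpha * d                            # s_k = alpha_k d_k

# Dai-Liao parameter and the resulting direction d_{k+1} = -g_{k+1} + beta d_k
beta = (g1 @ y - t * (g1 @ s)) / (d @ y)
d1 = -g1 + beta * d

# d1 satisfies the modified conjugacy condition (16):
# d_{k+1}^T y_k = -t g_{k+1}^T s_k
lhs, rhs = d1 @ y, -t * (g1 @ s)
```

Algebraically, d1 @ y = -g1 @ y + beta (d @ y) = -t (g1 @ s), so the identity holds for any vectors with d @ y ≠ 0.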

Derivation of two new versions of CG-algorithms:
It is known that all conjugate direction algorithms generate conjugate directions, at least in theory [6]; hence the key element in the derivation of the new algorithms is the pure conjugacy condition (13). Since the derivation of every conjugate direction algorithm assumes that the objective function is a convex quadratic, we begin with the convex quadratic function q(x) defined by:

q(x) = (1/2) x^T G x + b^T x + c (18)

where x ∈ R^n and G is a positive definite n × n matrix. Since q(x) is strictly convex, G is nonsingular, and the gradient of q(x) is given by:

∇q(x) = G x + b (19)

The main property of the quadratic function is:

y_k = G s_k (20)

From (18)-(20) we can write:

g_{k+1} = g_k + G s_k (21)

Therefore the Newton direction (3) for the function defined in (18) can be written as:

d_{k+1} = -G^{-1} g_{k+1} (22)

Applying the conjugacy condition (13), which Newton directions satisfy under exact line searches, gives:

g_{k+1}^T G^{-1} y_k = 0 (23)

Similarly, CG algorithms generate conjugate directions, so substituting the direction (5) into (13) gives:

-g_{k+1}^T y_k + β_k d_k^T y_k = 0 (24)

Using (22) and (23) in (24) yields two expressions for the CG parameter, (25) and (26). Following the idea of Dai and Liao [5], we then modify (25) and (26) by combining the quasi-Newton condition with the pure conjugacy condition, which gives the relations (28) and (29). From (28) and (29) we obtain the new v1 parameter (30), and letting s_k = α_k d_k yields its practical form (31). For convenience, we summarize the above derivation as the new v1 algorithm; the new v2 algorithm is summarized similarly.
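The quadratic identities used above can be verified numerically: for q(x) = (1/2) x^T G x + b^T x the gradient is G x + b as in (19), property (20) y_k = G s_k holds between any pair of iterates, and CG with exact line search produces G-conjugate directions as in (12). A sketch with illustrative data:

```python
import numpy as np

G = np.array([[3.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 5.0]])       # symmetric positive definite
b = np.array([1.0, -2.0, 0.5])
grad = lambda x: G @ x + b            # gradient (19)

x = np.zeros(3)
g = grad(x)
d = -g
dirs = []
for _ in range(3):
    alpha = -(g @ d) / (d @ G @ d)    # exact line search on the quadratic
    x = x + alpha * d
    g_new = grad(x)
    y, s = g_new - g, alpha * d
    assert np.allclose(y, G @ s)      # property (20): y_k = G s_k
    dirs.append(d)
    d = -g_new + ((g_new @ y) / (d @ y)) * d   # HS update (= FR here)
    g = g_new

# the generated directions are G-conjugate: d_i^T G d_j = 0 for i != j  (12)
conjugate = all(abs(dirs[i] @ G @ dirs[j]) < 1e-8
                for i in range(3) for j in range(i))
```

With exact line searches on a quadratic the HS and FR parameters coincide, since successive gradients are mutually orthogonal.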

Convergence analysis:
In the investigation of the global convergence of many iterative algorithms, the following assumption is often needed.
• Assumption A: the level set S = {x ∈ R^n : f(x) ≤ f(x_0)} is bounded and, in some neighborhood N of S, f is continuously differentiable and its gradient is Lipschitz continuous, i.e., there exists a constant L > 0 such that:

||g(x) - g(y)|| ≤ L ||x - y|| for all x, y ∈ N

Note that Assumption A implies that there exists a constant γ > 0 such that ||g(x)|| ≤ γ on S. In order to ensure global convergence of our algorithms, the step size α_k is computed by the Wolfe line search, i.e., α_k satisfies (10a) and (10b). The following lemma, called the Zoutendijk condition, is often used to prove the global convergence of CG algorithms; it was originally given by Zoutendijk [16] and Wolfe [13].

Lemma (Zoutendijk condition): Suppose Assumption A holds and the descent directions d_k and step sizes α_k satisfy the Wolfe conditions. Then:

Σ_{k=0}^{∞} (g_k^T d_k)^2 / ||d_k||^2 < ∞

For the proof see [16] or [13].
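A step satisfying both Wolfe conditions always exists along a descent direction of a function bounded below, and one simple way to find it is bracketing with bisection and extrapolation. The sketch below is a generic illustration of such a search, not the cubic-fitting routine used in the paper; all names and data are illustrative.

```python
import numpy as np

def wolfe_search(f, grad, x, d, c1=1e-4, c2=0.9, max_iter=50):
    """Find alpha satisfying the Wolfe conditions (10a)-(10b) along a
    descent direction d, by bisection with extrapolation."""
    lo, hi = 0.0, np.inf
    alpha = 1.0
    fx, gd = f(x), grad(x) @ d         # gd < 0 for a descent direction
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + c1 * alpha * gd:
            hi = alpha                 # (10a) fails: step too long
        elif grad(x + alpha * d) @ d < c2 * gd:
            lo = alpha                 # (10b) fails: step too short
        else:
            return alpha               # both Wolfe conditions hold
        alpha = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return alpha

# Example on an ill-conditioned quadratic (illustrative data)
Q = np.diag([10.0, 1.0])
f = lambda x: 0.5 * x @ (Q @ x)
grad = lambda x: Q @ x
x0 = np.array([1.0, 1.0])
a = wolfe_search(f, grad, x0, -grad(x0))
```

The returned step can be checked directly against (10a) and (10b) with c_1 = 10^-4 and c_2 = 0.9, the values used in the experiments below.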
The descent or sufficient descent condition plays an important role in the global convergence analysis of many CG algorithms. In the following theorem we show that the new v1 algorithm produces sufficient descent directions, i.e.,

g_k^T d_k ≤ -c ||g_k||^2

where c is some positive scalar.

Proof (sketch): For the initial direction (k = 0), d_0 = -g_0, so g_0^T d_0 = -||g_0||^2 and the condition holds with c = 1. For k ≥ 0, multiplying the new v1 direction by g_{k+1}^T, applying the Lipschitz condition of Assumption A and using the identity

y_k^T y_k = ||g_{k+1}||^2 - 2 g_{k+1}^T g_k + ||g_k||^2

gives the bound g_{k+1}^T d_{k+1} ≤ -c ||g_{k+1}||^2 with γ = c^2(L - 1). For global convergence, suppose on the contrary that ||g_k|| is bounded away from zero. Dividing both sides of (35) by ||g_{k+1}||^2, squaring and summing from k = 0 to ∞ yields a divergent series, contradicting the Zoutendijk condition. Therefore lim inf_{k→∞} ||g_k|| = 0.
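The role of the sufficient descent condition can be seen concretely: with exact line searches g_{k+1}^T d_k = 0, so any direction of the form (5) gives g_{k+1}^T d_{k+1} = -||g_{k+1}||^2, i.e., sufficient descent with c = 1. A numerical check of this general fact, using the FR parameter on an illustrative quadratic (not one of the paper's new formulas):

```python
import numpy as np

G = np.diag([1.0, 4.0, 9.0])           # illustrative SPD Hessian
grad = lambda x: G @ x
x = np.array([1.0, 1.0, 1.0])
g = grad(x)
d = -g
for _ in range(2):                     # keep g_k nonzero (n = 3 here)
    alpha = -(g @ d) / (d @ G @ d)     # exact line search
    x = x + alpha * d
    g_new = grad(x)
    d = -g_new + ((g_new @ g_new) / (g @ g)) * d   # FR direction (5)
    # sufficient descent with c = 1: g_{k+1}^T d_{k+1} = -||g_{k+1}||^2
    assert np.isclose(g_new @ d, -(g_new @ g_new))
    g = g_new
```

With inexact (Wolfe) line searches the cross term β_k g_{k+1}^T d_k no longer vanishes, which is exactly why the theorem above needs the Lipschitz bound.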
Note: For the convergence analysis of the new v2 algorithm we give only conditions for the descent property, since this algorithm does not in general generate descent directions except under suitable conditions. It is clear that for exact line searches the directions are descent for all k. When instead the parameter α_k satisfies the Wolfe conditions, a descent condition follows by bounding g_{k+1}^T d_{k+1} using the curvature condition (10b) together with the fact that s_k^T y_k > 0 under the Wolfe conditions.

RESULTS
Here we report numerical results obtained with an implementation of the new v1 and v2 algorithms on a set of unconstrained optimization test problems taken from [2,3]. We selected 15 large-scale unconstrained optimization problems in extended or generalized form; for each test function we considered numerical experiments with the number of variables n = 100, 1000 and 10000.
These two new versions are compared with two well-known CG algorithms. The first is the Hestenes-Stiefel (HS) algorithm, one of the best-known CG algorithms in practice [5], which always generates conjugate directions independently of the line search and the objective function; the second is the original Fletcher-Reeves (FR) algorithm. All algorithms are implemented with the standard Wolfe line search conditions (10a) and (10b) with c_1 = 0.0001 and c_2 = 0.9, where the initial step size is α_0 = 1/||g_0|| and a corresponding initial guess is used for the step size at subsequent iterations (k > 0). In all cases the stopping criterion is ||g_k|| ≤ 10^{-6}. The problem numbers indicate: 1, Extended trigonometric; 2, Extended Rosenbrock; 3, Penalty; 4, Perturbed quadratic; 5, Raydan 1; 6, Extended three exponential terms; 7, Generalized tridiagonal 2; 8, Extended Powell; 9, Extended Wood; 10, Quadratic QF1; 11, Quadratic QF2; 12, Extended tridiagonal 2; 13, Almost perturbed quadratic; 14, Tridiagonal perturbed quadratic; 15, ENGVAL1 (CUTE).
Because the main costs in numerical optimization are the Function and Gradient Evaluations (FGEV) and the number of Iterations (IT), our comparison is based on these counts; the numbers of function and gradient evaluations are equal for these CG algorithms, since a cubic-fitting technique is employed as the line search subprogram. The comparison also considers the ability of each algorithm to solve particular test problems.
All codes are written in double-precision FORTRAN with F77 default compiler settings; they were originally written by Andrei [1,2] and modified by the authors. In Table 1 we compare the HS, FR, new v1 and new v2 algorithms for the three values of n. The symbol * in Table 1 means that the algorithm was unable to solve the particular problem within the maximum number of iterations, which is 2000 in our comparisons.
From Table 1 we observe that for n = 100 the new v2 algorithm is the best in terms of the number of wins in IT and FGEV. Details of the best results of the compared algorithms are given in Table 2, from which we also observe that the new v1 algorithm is the best for n = 1000 and for n = 10000, over all 45 problem-dimension test instances.

CONCLUSION
The two suggested algorithms give better numerical results than the original CG algorithms in terms of IT and FGEV, as is clear from Table 2.