Ant Colony Optimization for Capacitated Vehicle Routing Problem

Problem statement: The Capacitated Vehicle Routing Problem (CVRP) is a well-known combinatorial optimization problem concerned with the distribution of goods between a depot and customers. It is of economic importance to businesses, as approximately 10-20% of the final cost of goods is contributed by the transportation process. Approach: The problem was tackled using Ant Colony Optimization (ACO) combined with heuristic approaches that act as route improvement strategies. The proposed ACO modified the pheromone evaporation procedure of the standard ant algorithm by introducing an evaporation rate that depends on the solutions found by the artificial ants. Results: Computational experiments were conducted on a benchmark data set. The results showed that combining two different heuristics in the ACO improved the ants' solutions more than embedding only one heuristic. Conclusion: ACO with the swap and 3-opt heuristics is able to tackle the CVRP with satisfactory solution quality and run time. It is a viable alternative for solving the CVRP.


INTRODUCTION
The Capacitated Vehicle Routing Problem (CVRP) concerns the design of a set of minimum cost routes, starting and ending at a single depot, for a fleet of vehicles that services a number of customers with known demands. Mathematically, it can be represented by a weighted graph G = (V, A) with V = {0, 1, 2, …, n} as the vertex set and A = {(i, j) | i, j ∈ V} as the edge set. The depot is denoted as vertex 0 and the n cities or customers to be served are represented by the other vertices. For each edge (i, j), i ≠ j, there is a nonnegative distance d_ij, measured as the Euclidean distance. Each customer i, i = 1, 2, …, n, is associated with a nonnegative demand q_i and a service time δ_i, both of which have to be satisfied. The demand at the depot is set to q_0 = 0 and its service time is set to δ_0 = 0. Each vehicle has a capacity Q. The objective of the CVRP is to find a set of minimum cost routes that serves all the customers while satisfying the following constraints, which are listed in Voss (1999): (i) each customer is visited exactly once by exactly one vehicle, (ii) all vehicle routes start and end at the depot, (iii) for each vehicle route, the total demand does not exceed the vehicle capacity Q and (iv) for each vehicle route, the total route length (including service times) does not exceed a given bound L.
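As an illustration of constraints (i) to (iv), the following minimal Python sketch evaluates a candidate solution. It is not taken from the study: the function names and the data layout (a list of routes, each a list of customer indices with the depot 0 implied at both ends) are assumptions made for this example. Constraint (ii) holds by construction under that layout.

```python
import math

def route_length(route, coords, service=None):
    """Travel distance of one route (depot 0 implied at both ends),
    plus service times when a service-time list is supplied."""
    path = [0] + list(route) + [0]
    travel = sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))
    if service is None:
        return travel
    return travel + sum(service[c] for c in route)

def total_cost(routes, coords):
    """Objective value: total Euclidean distance over all routes."""
    return sum(route_length(r, coords) for r in routes)

def is_feasible(routes, coords, demand, service, Q, L):
    """Check constraints (i), (iii) and (iv) for a set of routes."""
    visited = sorted(c for r in routes for c in r)
    if visited != list(range(1, len(coords))):        # (i) each customer exactly once
        return False
    for r in routes:
        if sum(demand[c] for c in r) > Q:             # (iii) capacity
            return False
        if route_length(r, coords, service) > L:      # (iv) route length bound
            return False
    return True
```

For example, with the depot at (0, 0) and customers at (3, 4), (6, 8) and (0, 5), the routes [[1, 2], [3]] have lengths 20 and 10, so the total cost is 30.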
Since the CVRP is an NP-hard problem, only small instances can be solved to optimality using exact solution methods (Toth and Vigo, 2002; Baldacci et al., 2010). As a result, heuristic methods are used to find good, but not necessarily optimal, solutions within a reasonable amount of computing time. Starting from simple constructive approaches such as the savings algorithm proposed by Clarke and Wright (1964) and basic improvement methods such as the 2-opt heuristic, general-purpose heuristic methods (called metaheuristics) have been developed to guide subordinate heuristics so that they avoid or overcome local optimality. During the past two decades, a growing body of literature on heuristic approaches to the CVRP has appeared. A summary and discussion of several important and state-of-the-art modern heuristics for the problem can be found in the studies by Cordeau et al. (2002) and Szeto et al. (2011).
Ant Colony Optimization (ACO) was first introduced by Dorigo and Stutzle (2004). It is inspired by the real-life behavior of ants foraging for food. While travelling between the nest and a food source, a moving ant lays a chemical substance called pheromone on the trail. The pheromone trail is a form of communication among the ants that attracts other ants to use the same path. Thus, a higher amount of pheromone increases the probability that the next ant selects that path. Over time, as more ants complete the shorter path, pheromone accumulates faster on the shorter path than on the longer one. Consequently, the majority of the ants end up travelling on the shortest path. Detailed descriptions of the ACO can be found in the book by Dorigo and Stutzle (2004). Recent applications of ACO can be found in Naganathan and Rajagopalan (2011) and Yap et al. (2012).
To apply the ACO to the CVRP, Voss (1999) first developed an ACO algorithm called Ant System (AS) for the problem and then presented an improved AS in Bullnheimer et al. (1999). Since then, many researchers have proposed new methods to improve the original ACO, especially by embedding other algorithms in the ACO to tackle the large-scale CVRP. For instance, Doerner et al. (2002) proposed a hybrid approach for solving the CVRP by combining the AS with the savings algorithm. Reimann et al. (2002) then improved on the method of Doerner et al. (2002) by presenting a Savings based Ant System (SbAS), and Reimann et al. (2004) proposed an approach called D-Ants which is competitive with the best Tabu Search (TS) algorithm in terms of solution quality and computation time. Mazzeo and Loiseau (2004), Bell and McMullen (2004), Yu et al. (2009) and Zhang and Tang (2009) have also made major contributions to the development of ACO for the CVRP. This study aims to compare the solution quality of different basic heuristics combined with an original ACO in solving the problem.

MATERIALS AND METHODS
The main tasks in an ACO algorithm are the solution construction, the management of the pheromone trails and additional techniques such as improvement heuristics. The main procedures of the proposed ACO for solving the CVRP are summarized in the flowchart in Fig. 1.

Solution construction:
In an ACO, each artificial ant simulates a vehicle and its complete set of routes is constructed by successively choosing customers to visit until all the customers have been visited. A new route is started from the depot whenever the choice of the next customer would lead to an infeasible solution because of the vehicle capacity or the total route length constraint. Consequently, a total of m solutions are constructed sequentially by the m artificial ants in one iteration.
Initially, each ant is assigned a randomly chosen customer as the first city to visit from the depot. Then, at each construction step, an ant k at the current city i selects the next city j from its feasible neighborhood N_i^k according to the probability distribution in Eq. 1:

$$p_{ij}^{k} = \begin{cases} \dfrac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}\,[\mu_{ij}]^{\gamma}}{\sum_{l \in N_i^k} [\tau_{il}]^{\alpha}\,[\eta_{il}]^{\beta}\,[\mu_{il}]^{\gamma}}, & \text{if } j \in N_i^k \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where η_ij = 1/d_ij is a heuristic value, τ_ij denotes the pheromone concentration on the edge connecting cities i and j, and µ_ij = d_i0 + d_0j - d_ij is the savings of combining two cities i and j on one tour as opposed to visiting them on two different tours. The parameters α, β and γ bias the relative influence of the pheromone concentration, the heuristic value and the savings value. With the probability in Eq. 1, the selection of a city that has not yet been visited depends on the following criteria:

• Pheromone concentration, τ_ij, which indicates how good the choice of the next city j from the current city i has been in the past
• Attractiveness, η_ij, which indicates how promising the choice of the next city j is from the current city i
• Savings, µ_ij, which measures the favorability of combining two cities i and j in a tour, where high savings indicate that visiting the next city j from the current city i is a good choice
• Feasible neighborhood, N_i^k, also called the candidate list, which includes only the closest cities to the current city i as available choices for the next city to be visited in the route

The probability of choosing a particular edge (i, j) increases with the value of the corresponding pheromone concentration τ_ij, whereas the values of the heuristic information η_ij and the savings µ_ij do not change over time. However, when all the cities in the candidate list have already been visited by the ants, one city out of those not in the candidate list will be chosen.
In this case, an ant k selects the city (among the remaining cities) with the maximum value of [τ_ij]^α [η_ij]^β [µ_ij]^γ as the next city to move to. The candidate list can significantly reduce the computation time needed for the ants to construct solutions, since the ants choose among a much smaller set of cities. It should be noted, however, that a truncated candidate list can prevent the optimal solution from being found. During route construction, an ant k returns to the depot when the next customer's demand would exceed the vehicle capacity or the total route length constraint would be violated. The same ant k, which represents a vehicle, then starts a new route to serve the customers that have not yet been visited. This process is repeated until all the customers have been visited.
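The selection rule described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names, the `nearest` mapping that holds each city's candidate list, and the uniform fallback when every weight is zero are assumptions made for this example. Capacity and route length checks are omitted for brevity.

```python
import random

def edge_weight(i, j, tau, dist, alpha=2, beta=5, gamma=9):
    """[tau_ij]^alpha * [eta_ij]^beta * [mu_ij]^gamma for edge (i, j)."""
    eta = 1.0 / dist[i][j]
    # Savings value; clamped at 0 to guard against rounding noise.
    mu = max(dist[i][0] + dist[0][j] - dist[i][j], 0.0)
    return (tau[i][j] ** alpha) * (eta ** beta) * (mu ** gamma)

def transition_probabilities(i, candidates, tau, dist):
    """Return {j: p_ij} over the given candidate cities."""
    weights = {j: edge_weight(i, j, tau, dist) for j in candidates}
    total = sum(weights.values())
    if total == 0.0:
        # Assumption: fall back to a uniform choice, e.g. at the depot,
        # where every savings value is zero.
        return {j: 1.0 / len(weights) for j in weights}
    return {j: w / total for j, w in weights.items()}

def choose_next(i, unvisited, nearest, tau, dist, rng=random):
    """Pick the next city for an ant standing at city i."""
    candidates = [j for j in nearest[i] if j in unvisited]
    if candidates:
        probs = transition_probabilities(i, candidates, tau, dist)
        cities = list(probs)
        return rng.choices(cities, weights=[probs[j] for j in cities])[0]
    # Candidate list exhausted: take the remaining city with the largest weight.
    return max(unvisited, key=lambda j: edge_weight(i, j, tau, dist))
```

Note that the savings value is zero for every city when the ant is at the depot (i = 0), which is consistent with the first customer of each ant being chosen at random.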

Heuristics:
After an artificial ant has finished constructing a solution, but before the following ants start to build their solutions, the pheromone is updated and the ant's solution is improved by applying a heuristic. Four basic heuristics are of interest here:

Swap:
This heuristic aims at improving the clustering of the solution by exchanging two customers from different routes, i.e., a customer i from route a is exchanged with a customer j from route b if the exchange improves the solution quality. In the proposed ACO, the swap heuristic stops once two cities have been successfully exchanged between two different routes or no improving exchange is found for the solution built.
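A minimal Python sketch of the swap move described above is given here for illustration. The function name and data layout are assumptions, the route length bound L is not checked, and the search order over pairs of routes is arbitrary. The function stops at the first improving exchange, as the stopping rule above states.

```python
def swap_first_improvement(routes, dist, demand, Q):
    """Exchange two customers from different routes if this shortens the
    solution and respects capacity. Applies the first improving exchange
    in place and returns True, or returns False if none exists."""
    def length(route):
        path = [0] + route + [0]
        return sum(dist[a][b] for a, b in zip(path, path[1:]))

    for a in range(len(routes)):
        for b in range(a + 1, len(routes)):
            load_a = sum(demand[c] for c in routes[a])
            load_b = sum(demand[c] for c in routes[b])
            before = length(routes[a]) + length(routes[b])
            for x, ci in enumerate(routes[a]):
                for y, cj in enumerate(routes[b]):
                    # Skip exchanges that would overload either vehicle.
                    if load_a - demand[ci] + demand[cj] > Q:
                        continue
                    if load_b - demand[cj] + demand[ci] > Q:
                        continue
                    new_a = routes[a][:x] + [cj] + routes[a][x + 1:]
                    new_b = routes[b][:y] + [ci] + routes[b][y + 1:]
                    if length(new_a) + length(new_b) < before - 1e-9:
                        routes[a], routes[b] = new_a, new_b
                        return True
    return False
```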
Subtour reversal:
This heuristic adjusts the sequence of cities visited in the current solution by selecting a subsequence of the cities and reversing its order. In detail, for an n-city route, the heuristic starts with a feasible route and tries to improve it by reversing 2-city subtours, followed by 3-city subtours, and continues until it reaches subtours of size n-1. The reversal that gives the largest decrease in travelled distance is selected and ties are broken randomly. The heuristic stops when no subtour reversal improves the vehicle routes in the solution.

2-opt:
This heuristic is applied separately to each of the vehicle routes built by an ant. Starting from a feasible route, it modifies the current route by deleting two edges, reversing one of the resulting paths and then reconnecting the route with two new edges. In the proposed ACO, the 2-opt is applied to each vehicle route using the best-improvement rule.
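The 2-opt move can be sketched as below. This is a hedged illustration, not the authors' code: it assumes symmetric distances and interprets the best-improvement rule as applying the single best improving move in each pass until no improving move remains. The function name and the tolerance of 1e-9 are choices made for this example.

```python
def two_opt(route, dist):
    """Best-improvement 2-opt on one route (customers only; the depot 0
    is implied at both ends). Returns the improved customer sequence."""
    path = [0] + list(route) + [0]
    while True:
        best_gain, best_move = 1e-9, None
        for i in range(len(path) - 2):
            for j in range(i + 2, len(path) - 1):
                # Delete edges (path[i], path[i+1]) and (path[j], path[j+1]),
                # reconnect with (path[i], path[j]) and (path[i+1], path[j+1]).
                gain = (dist[path[i]][path[i + 1]] + dist[path[j]][path[j + 1]]
                        - dist[path[i]][path[j]] - dist[path[i + 1]][path[j + 1]])
                if gain > best_gain:
                    best_gain, best_move = gain, (i, j)
        if best_move is None:
            return path[1:-1]
        i, j = best_move
        # Reverse the path between the two deleted edges.
        path[i + 1:j + 1] = path[i + 1:j + 1][::-1]
```

On a unit square with the depot at one corner, the crossing route [2, 1, 3] is untangled into [1, 2, 3].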

3-opt:
In a 3-opt move, three edges of a tour are removed and a new tour is obtained by replacing at most three of its edges. Removing three edges results in three paths, which can be recombined into a full tour in eight different ways, as shown in Fig. 2. However, only four of the eight ways (e, f, g, h) actually introduce three new edges, while the other four (a, b, c, d) can be obtained by the 2-opt move.
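The count of eight reconnections can be checked with a short Python sketch. The function names and the way the tour is cut (at two positions i and j, with the third removed edge being the one that closes the tour) are assumptions made for this example. The labels a to h of Fig. 2 are not reproduced.

```python
from itertools import product

def edges(tour):
    """Undirected edge set of a closed tour."""
    return {frozenset(e) for e in zip(tour, tour[1:] + tour[:1])}

def three_opt_reconnections(tour, i, j):
    """Cut a closed tour into paths A = tour[:i], B = tour[i:j] and
    C = tour[j:], then return all eight ways to rejoin them: B and C
    may each be reversed and may appear in either order after A."""
    A, B, C = tour[:i], tour[i:j], tour[j:]
    results = []
    for (X, Y), rev_x, rev_y in product([(B, C), (C, B)], [False, True], [False, True]):
        X2 = X[::-1] if rev_x else X
        Y2 = Y[::-1] if rev_y else Y
        results.append(A + X2 + Y2)
    return results

def count_new_edges(tour, candidate):
    """Number of edges in the candidate that are absent from the tour."""
    return len(edges(candidate) - edges(tour))
```

For the tour [0, 1, 2, 3, 4, 5] cut at i = 2 and j = 4, the eight reconnections introduce 0, 2, 2, 2, 3, 3, 3 and 3 new edges: one is the original tour, three are 2-opt moves and four are pure 3-opt moves, in line with the description above.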
Different combinations of the four heuristics above were applied to the original ACO during the computational experiments to determine the best combination of heuristics with ACO for solving the CVRP.

Pheromone update:
After all the artificial ants have improved their solutions through the heuristics, the pheromone trails are updated. This is the main feature of an ACO algorithm that helps improve future solutions, since the updated pheromone trails reflect the ants' performance and the quality of the solutions found. There are two main phases of the pheromone update in an AS algorithm (Dorigo and Stutzle, 2004): pheromone evaporation and pheromone deposition. In the proposed ACO, the usual pheromone evaporation is modified, whereas the pheromone deposition follows Bullnheimer et al. (1999) and comprises the elitist strategy and the concept of ranking. The details of the pheromone update procedures implemented in the proposed ACO are described as follows:

Pheromone evaporation:
First, the pheromone concentration on all edges is lowered by an evaporation factor according to Eq. 2, where 0 ≤ ρ < 1 is the trail persistence and θ is a constant. This evaporation factor stands in contrast to the pheromone evaporation in an AS algorithm from Dorigo and Stutzle (2004), which uses only ρ as the trail persistence. The idea is to simulate the evaporation process of the pheromone trail in nature, which depends on the length of the path travelled by an ant: the longer the path, the more pheromone evaporates. Consequently, it favors the exploration of edges not yet visited by making the edges already visited by the ants less attractive. Furthermore, this process can prevent early convergence of all the ants toward a suboptimal solution.
Pheromone deposition:
After the pheromone evaporation process, only the best ants and the elitist ants deposit pheromone on the edges that they have travelled, following Eq. 3. In Eq. 3, Δτ_ij^λ is nonzero only if the λth best ant travels on edge (i, j) and is 0 otherwise; likewise, Δτ_ij^* is nonzero only if edge (i, j) is part of the best solution and is 0 otherwise. As in Bullnheimer et al. (1999), two types of pheromone trails are laid during the pheromone update process with Eq. 3. Firstly, the best-so-far solution (objective value L^*) found since the start of the ACO algorithm is updated as if σ elitist ants had traversed it. The quantity of pheromone deposited by the elitist ants is Δτ_ij^*. Secondly, only the σ-1 best ants out of the m ants of the current iteration are allowed to lay pheromone on the edges that they have traversed. The amounts of pheromone laid by these ants depend on their rank λ and their solution quality L_λ, where the λth best ant lays an amount of pheromone equal to Δτ_ij^λ. In short, the idea of the elitist strategy is to provide strong additional reinforcement to the edges belonging to the best solution found so far after every iteration. The aim is to guide the search in succeeding iterations, as it is likely that some edges of the best-so-far solution are part of the optimal solution. On the other hand, the concept of ranking, which is suggested in Bullnheimer et al. (1997), aims to avoid the danger of over-emphasized pheromone trails caused by many ants using suboptimal routes.
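The two phases of the update can be sketched in Python as follows. Two points are assumptions rather than facts from this text. First, the evaporation factor is passed in as a plain number, because the exact expression of Eq. 2 is not reproduced here. Second, the deposit amounts follow the rank-based form commonly attributed to Bullnheimer et al. (1999), namely (σ - λ)/L_λ for the λth best ant and σ/L^* on the best-so-far solution; the proposed ACO may differ in detail. All function and parameter names are hypothetical.

```python
def update_pheromone(tau, factor, ranked, best_edges, best_len, sigma=3):
    """Evaporate, then deposit on a symmetric pheromone matrix, in place.

    tau        : square list-of-lists pheromone matrix
    factor     : evaporation factor applied to every edge (assumed given)
    ranked     : [(length, edge_list), ...] for the current iteration,
                 sorted from best to worst solution
    best_edges : edges of the best-so-far solution, with length best_len
    """
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= factor

    # Rank-based deposit: only the sigma - 1 best ants of this iteration.
    for lam, (length, edge_list) in enumerate(ranked[:sigma - 1], start=1):
        amount = (sigma - lam) / length        # assumed form of the ranked deposit
        for i, j in edge_list:
            tau[i][j] += amount
            tau[j][i] += amount

    # Elitist deposit: as if sigma ants had traversed the best-so-far solution.
    amount = sigma / best_len                  # assumed form: sigma * (1 / L*)
    for i, j in best_edges:
        tau[i][j] += amount
        tau[j][i] += amount
```

With σ = 3, only the two best ants of the iteration deposit pheromone, and the best-so-far solution receives three times the amount 1/L^* on each of its edges.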

RESULTS
The computational experiments were performed on a set of benchmark problems which are publicly available at the VRPWeb at: http://neo.lcc.uma.es/radi-aeb/WebVRP/. These fourteen vehicle routing test problems have been widely used as benchmarks and their characteristics are summarized in Table 1. From the initial investigation, we observed that the following parameter settings give a good compromise between the computation time and the solution quality for the proposed ACO:

• m = n artificial ants
• α = 2, β = 5, γ = 9
• ρ = 0.80, θ = 80
• candidate list size of n/3
• σ = 3 elitist ants

In addition, following the suggestion of Dorigo and Stutzle (2004), the initial pheromone concentration was set to τ_0 = m/L_nn, where L_nn is the total length of the solution generated by the nearest-neighbor heuristic. The reason is that it is good practice to set the initial pheromone concentration to a value slightly higher than the expected amount of pheromone deposited by the ants in one iteration. For all the problems tested, the maximum number of iterations was set to 50000.
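The initial pheromone value τ_0 = m/L_nn can be illustrated with the Python sketch below. How the nearest-neighbor solution is built for the CVRP is not detailed in this text, so the capacity-aware variant here (go to the nearest unvisited customer that still fits, otherwise return to the depot) is an assumption, as are the function names. The route length bound is ignored.

```python
def nearest_neighbor_length(dist, demand, Q):
    """Total length of a capacity-aware nearest-neighbor solution."""
    unvisited = set(range(1, len(dist)))
    total, current, load = 0.0, 0, 0
    while unvisited:
        fits = [c for c in unvisited if load + demand[c] <= Q]
        if not fits:
            if current == 0:
                raise ValueError("a customer demand exceeds the vehicle capacity")
            total += dist[current][0]          # close the route at the depot
            current, load = 0, 0
            continue
        nxt = min(fits, key=lambda c: dist[current][c])
        total += dist[current][nxt]
        load += demand[nxt]
        current = nxt
        unvisited.remove(nxt)
    return total + dist[current][0]            # return from the last customer

def initial_pheromone(dist, demand, Q, m):
    """tau_0 = m / L_nn."""
    return m / nearest_neighbor_length(dist, demand, Q)
```

For three customers on a line at positions 1, 2 and -3 with unit demands and Q = 2, the nearest-neighbor solution has length 10, so m = 3 ants give τ_0 = 0.3.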
The computational results are presented in Table 2 and Table 3. For each instance in both tables, the results are given as the relative percentage deviation (RPD), both for the best solution obtained and on average, together with the average run time in CPU seconds. An RPD value of 0.00% indicates that the best published solution in terms of total distance was obtained. The last row of each table shows the average RPD and the average run time over all instances tested for the different approaches. Table 2 shows the results obtained by the original ACO algorithm and by the ACO algorithms combined with one heuristic. We further extended the investigation by combining two heuristics with the original ACO algorithm; these results are given in Table 3. Only three combinations of two different heuristics are considered in our study, since only the swap heuristic improves the clustering by modifying different vehicle routes, whereas the other three heuristics improve the routing by altering a single vehicle route.
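The RPD measure can be written as a one-line computation. The formula below, the percentage gap between an obtained total distance and the best published one, is the usual definition and is assumed here because the text does not state it explicitly. The numbers in the usage note are illustrative only and are not results from Table 2 or Table 3.

```python
def rpd(cost, best_known):
    """Relative percentage deviation of a solution cost from the best known cost."""
    return 100.0 * (cost - best_known) / best_known

def average_rpd(costs, best_known_costs):
    """Mean RPD over several instances, as in the last row of a results table."""
    values = [rpd(c, b) for c, b in zip(costs, best_known_costs)]
    return sum(values) / len(values)
```

Under this definition, a solution of length 105 on an instance whose best published length is 100 has an RPD of 5%, and matching the best published length gives 0%.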

DISCUSSION
All the ACOs with a heuristic perform better than the original ACO with respect to solution quality. However, they consume more run time than the original ACO, especially the ACOs that apply the 3-opt heuristic. Nevertheless, the ACO with the 3-opt heuristic performs best in Table 2, with an average RPD of 4.65%, and the ACO with the swap and 3-opt heuristics outperforms the other ACOs in Table 3, with an average RPD of 4.16%. From Table 2, the performance of the ACOs can be ranked as follows: 3-opt performs best, followed by 2-opt, subtour reversal and then the swap heuristic. The same ordering holds in Table 3, where the ACO with swap and 3-opt performs better than the ACO with swap and 2-opt, followed by the ACO with swap and subtour reversal. Moreover, Table 2 and Table 3 show that applying two different heuristics in the ACO improves the ants' solutions more than embedding only one heuristic. For instance, the ACO with swap and 2-opt obtains an average RPD of 5.38%, whereas the ACO with swap and the ACO with 2-opt only reach average RPDs of 8.15% and 5.73% respectively.

CONCLUSION
The CVRP has been an attractive topic in the field of distribution and logistics, motivated by both its practical relevance and its considerable difficulty. In this study, we have compared the solution quality of different basic heuristics combined with an original ACO in solving the problem. The computational results on fourteen benchmark problems showed that the ACO combined with the swap and 3-opt heuristics is able to tackle the CVRP with satisfactory solution quality and run time. Therefore, it is a viable alternative for solving the CVRP.