Optimization of capacitated vehicle routing problem using initial route with same size K-means and greedy algorithm for vaccines distribution

Vaccination is the solution currently underway to tackle COVID-19. In this paper, vaccine distribution for hospitals in Central Java is developed. The problem is classified as a Capacitated Vehicle Routing Problem (CVRP). The proposed method uses an initial route that follows the cluster-first route-second (CFRS) method: the same size K-means is used for the clustering phase and the Greedy algorithm for the routing phase. The initial route is a clustered route for each vehicle with a balanced load. Each cluster is then re-optimized with the Guided Local Search metaheuristic from Google OR-tools. Our experimental results show that using the initial route reduces runtime by 97.37% to 99.17% compared with not using it. This is because the same size K-means breaks the problem into parts, and the Greedy algorithm then reduces the number of possible routes. However, the total distance increases by 8.22% to 16.69% because no cluster member is allowed to move to another cluster.


INTRODUCTION
Coronavirus disease 2019 (COVID-19) has brought terrible hardship around the world. As of 22 February 2021, more than 110 million people had been infected with COVID-19. This has prompted many parties to race to develop vaccines against it. The World Health Organization R&D Blue Print (2021) lists 84 vaccines in clinical development and 184 vaccines in pre-clinical development.
In Indonesia, vaccination has started but is not yet widespread. As of 22 February 2021, only 700 thousand of Indonesia's 270 million people had been vaccinated. In the future, vaccines must be distributed widely and massively. Weintraub et al. (2021) explained that COVID-19 vaccination is urgent and requires proper distribution strategies to ensure that distribution suppresses transmission.
This paper develops vaccine distribution for hospitals in Central Java, the province with the third-highest number of COVID-19 cases in Indonesia. Vaccines are distributed to 285 hospitals in Central Java, and each hospital's vaccine need differs because it depends on the hospital's class. Distribution is carried out by several vehicles with limited capacity. The problem is therefore classified as a Capacitated Vehicle Routing Problem (CVRP), a variant of the Vehicle Routing Problem (VRP).
Furthermore, VRP has been a concern in the field of Operations Research since it was introduced by Dantzig and Ramser (1959). Because VRP comes from real-world problems, it has many constraints. Some of the well-known constraints in VRP are
In this paper, the Capacitated Vehicle Routing Problem (CVRP) is the problem case. CVRP is NP-hard and can be solved exactly only for small-sized problems; for larger problems, heuristic approaches yield the best results but do not guarantee optimality (Caric and Gold, 2008). Metaheuristics are a common way to solve CVRP because they find solutions in reasonable computational time, although the result is not guaranteed to be the best solution. Metaheuristics widely used to solve CVRP include Genetic Algorithms (Kramer, 2017), Guided Local Search (Dorigo and Socha, 2018), Ant Colony Optimization (Dorigo and Socha, 2018), Simulated Annealing (Delahaye et al., 2019), and Tabu Search (Sivaram et al., 2019).
One of the methods to solve CVRP is the cluster-first route-second method (CFRS). CFRS is a two-phase technique: the first phase is clustering, and the second is routing, in which each cluster is treated independently as a Traveling Salesman Problem (TSP).
The following papers apply the CFRS method to CVRP. Li et al. (2020) addressed an e-commerce product distribution problem with K-means as the clustering method and Tabu Search as the routing method. Mostafa and Eltawil (2017) employed K-means as the clustering method and added valid inequalities in the routing method for customer supply chains. Comert et al. (2018) addressed supermarket chain distribution using K-means, K-medoids, and random clustering, followed by a branch-and-bound algorithm in each cluster. Comparing the three clustering methods, they found that K-medoids combined with branch and bound provided better solutions than the other two. Shalaby et al. (2020) handled the clustering phase with a modified demand-weighted fuzzy c-means and used traditional optimization software for the routing phase.
In this paper, CFRS is also used, to create an initial route that is then re-optimized within each cluster. Clustering is carried out with the same size K-means, and routing with the Greedy algorithm. The same size K-means implemented here considers hospital demand so that each vehicle has a balanced vaccine load to distribute. The Greedy algorithm builds each route by repeatedly moving from the current node to the nearest unvisited node, starting from the cluster's centroid. The same size K-means and the Greedy algorithm are combined in the CFRS method because both are well known for their speed and simplicity.
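The Greedy routing step can be illustrated as a nearest-neighbour tour. This is a minimal Python sketch, not the authors' code: it assumes each hospital is an (x, y) coordinate pair compared with Euclidean distance, and it interprets "starting from the cluster's centroid" as using the centroid only as the starting position, so the first stop is the hospital nearest to it. The function name `greedy_route` is hypothetical.

```python
import math


def greedy_route(centroid, points):
    """Nearest-neighbour ordering of one cluster's hospitals.

    centroid: (x, y) starting position; points: list of (x, y) hospitals.
    Returns hospital indices in visiting order. The centroid is only the
    starting position, not a stop on the route (an assumption).
    """
    unvisited = list(range(len(points)))
    route = []
    current = centroid
    while unvisited:
        # Pick the unvisited hospital closest to the current position.
        nearest = min(unvisited, key=lambda i: math.dist(current, points[i]))
        route.append(nearest)
        unvisited.remove(nearest)
        current = points[nearest]
    return route


# Made-up coordinates, not the paper's data.
print(greedy_route((0, 0), [(5, 0), (1, 0), (2, 0)]))  # [1, 2, 0]
```

Each step only looks at the current position, which is what makes the method fast but also why its routes usually need further improvement.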
After the initial route is created, each cluster is re-optimized with the Guided Local Search metaheuristic. The purpose is to improve the performance of Google Optimization Tools (OR-tools) (Perron, 2011) when solving CVRP with Guided Local Search. The rest of this paper is organized as follows. Section 2 explains the materials and methods. Section 3 presents the results and discussion. Section 4 outlines the conclusion and possible directions for future research.

Data Collection and Processing
The dataset used in this paper was scraped from Central Java's official COVID-19 website (Tanggap COVID-19 Provinsi Jawa Tengah, 2021). It contains data on 285 hospitals, including each hospital's name, address, class, latitude, and longitude. Some hospitals lack a complete address, latitude, or longitude, so the missing data were completed with the Geocoding API from Google Maps Platform.
Moreover, two distance measures are used in this paper. First, the Euclidean distance, computed from latitude and longitude, is used to build the initial route. Second, a road distance matrix is obtained with the Distance Matrix API from Google Maps Platform. The original distance matrix is 285 × 285, but it becomes 286 × 286 because one dummy node is added as the depot, so that the CVRP has no actual depot. This distance matrix is used in Google OR-tools.
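A sketch of how the 285 × 285 road-distance matrix could be padded to 286 × 286 with a dummy depot. It assumes, since the paper does not say, that the dummy node sits at index 0 and has zero distance to and from every hospital, which lets each route begin and end at any hospital. The function name and the 3 × 3 matrix are made up for illustration. Integer distances (for example, metres) are used because OR-tools routing costs are integers.

```python
def add_dummy_depot(matrix):
    """Return a copy of `matrix` with a dummy depot prepended at index 0.

    The dummy node has zero cost to and from every hospital (an assumption),
    so a route effectively starts at its first hospital and ends at its last.
    """
    size = len(matrix) + 1
    padded = [[0] * size]  # row 0: dummy depot -> every node
    for row in matrix:
        padded.append([0] + list(row))  # column 0: every node -> dummy depot
    return padded


# Hypothetical 3 x 3 road-distance matrix in metres (not the paper's data).
road = [[0, 1200, 3400],
        [1200, 0, 2500],
        [3400, 2500, 0]]
padded = add_dummy_depot(road)
print(len(padded), len(padded[0]))  # 4 4
```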

Algorithm for Creating Initial Route
This paper compares the results of Google OR-tools when solving CVRP with and without an initial route, using Guided Local Search as the metaheuristic. The initial route follows CFRS with a few modifications and is created in four stages. First, the number of points for each hospital is modified based on the hospital's class. Second, the traditional K-means algorithm partitions the hospitals into as many areas as there are vehicles. Third, the same size K-means is applied. Last, the points are modified again and the Greedy algorithm is applied to each cluster to obtain the initial route.

Modify the Number of Each Point Based on the Hospital Class
In Indonesia, hospitals are divided into several classes. A hospital's class indicates its readiness, in this case its readiness to deal with COVID-19. There are three classes, namely class 1, 2, and 3, with class 1 hospitals being the best prepared. In this paper, it is assumed that class 1 hospitals require 2000 vaccines, class 2 hospitals 1000 vaccines, and class 3 hospitals 500 vaccines. The lower limit is set at 500 vaccines, so 1 point represents 500 vaccines. Accordingly, each class 1 hospital becomes 4 points, each class 2 hospital 2 points, and each class 3 hospital 1 point. The purpose of this modification is to create clusters that take the vaccine requirements into account.
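The point expansion can be illustrated as follows. This hypothetical sketch assumes each hospital is given as a (latitude, longitude, class) tuple and that all copies of a hospital share its coordinates; the demand figures and the 500-vaccine unit come from the text above, while the function name `expand_points` and the example coordinates are made up.

```python
# Demand per hospital class and vaccines per point, as stated in the text.
DEMAND_BY_CLASS = {1: 2000, 2: 1000, 3: 500}
VACCINES_PER_POINT = 500


def expand_points(hospitals):
    """Turn each hospital into demand-weighted points.

    hospitals: list of (latitude, longitude, hospital_class) tuples.
    Returns (points, owner) where owner[j] is the hospital index of points[j].
    Copies share the hospital's coordinates (an assumption).
    """
    points, owner = [], []
    for h, (lat, lon, cls) in enumerate(hospitals):
        copies = DEMAND_BY_CLASS[cls] // VACCINES_PER_POINT  # 4, 2 or 1
        points.extend([(lat, lon)] * copies)
        owner.extend([h] * copies)
    return points, owner


# Made-up example: one class 1 hospital and one class 3 hospital.
pts, owner = expand_points([(-7.0, 110.4, 1), (-7.1, 110.5, 3)])
print(len(pts), owner)  # 5 [0, 0, 0, 0, 1]
```

Keeping `owner` alongside the points makes it possible to fold the points back into hospitals after clustering.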

K-means Clustering
Cluster analysis is one of the main methods of data analysis. K-means is one of the simplest and most widely used data clustering algorithms (Sinaga and Yang, 2020) and is often referred to as Lloyd's algorithm (Lloyd, 1982). The number of clusters must be specified in advance; in this paper, each cluster represents one vehicle. Distances are computed with the Euclidean distance, which for data points $a = (a_1, a_2)$ and $b = (b_1, b_2)$ is given by formula (1) (Smith, 2011):

$$ d(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2} \tag{1} $$

Given an initial set of k means $m_1^{(1)}, \dots, m_k^{(1)}$, K-means proceeds by alternating between two steps (Mackay, 1995). First, in the assignment step, each observation $x_p$ is assigned to the cluster with the nearest mean, as in formula (2), where $S_i^{(t)}$ is the set of observations assigned to cluster $i$ at iteration $t$. Second, in the update step, the mean (centroid) of the observations assigned to each cluster is recalculated, as in formula (3).

$$ S_i^{(t)} = \left\{ x_p : d\left(x_p, m_i^{(t)}\right) \le d\left(x_p, m_j^{(t)}\right) \ \forall j,\ 1 \le j \le k \right\} \tag{2} $$

$$ m_i^{(t+1)} = \frac{1}{\left|S_i^{(t)}\right|} \sum_{x_j \in S_i^{(t)}} x_j \tag{3} $$
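For reference, the assignment and update steps can be written as plain Lloyd's algorithm. This is an illustrative Python sketch, not the implementation used in the paper: initial means are drawn at random from the data (one common choice among several), distances are Euclidean on raw coordinates, and an empty cluster simply keeps its previous mean. The function name `kmeans` is hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on 2-D points with Euclidean distance.

    Returns (labels, centroids). Initial means are k randomly sampled points.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial means m_1, ..., m_k
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each mean becomes the centroid of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's mean
        if new_centroids == centroids:
            break  # converged: the means no longer move
        centroids = new_centroids
    return labels, centroids
```

On two well-separated groups, for example `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)`, the first three points end up with one label and the last three with the other. Note that nothing here controls cluster sizes, which is the gap the same size K-means step addresses.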

Same Size K-means
The purpose of the same size K-means is to give each vehicle the same load to distribute. This matters because, with traditional K-means, the distribution loads of the vehicles are not balanced.
The same size K-means procedure used in this paper consists of the following steps.
1. Cluster the points with traditional K-means.
2. Set the time limit.
3. While the runtime is less than the time limit, repeat steps 4 to 7.
4. Set the first cluster as the current cluster.
5. If the current cluster has fewer points than the equal number (the target number of points per cluster), move the nearest points from other clusters that exceed the equal number into the current cluster until the current cluster reaches the equal number.
6. If the current cluster has more points than the equal number, move the points in the current cluster that are nearest to other clusters into those clusters until the current cluster reaches the equal number.
7. Move to the next cluster in order and repeat steps 5 and 6 until all clusters are processed.
8. When all clusters have been processed, go to step 3.
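The balancing loop in steps 2 to 8 could look like the sketch below. Several details are assumptions because the text leaves them open: "nearest" is measured to the fixed centroids returned by traditional K-means, the equal number for each cluster is the total number of points divided by the number of clusters (the first clusters absorb any remainder), and points are moved one at a time. The function name `balance_clusters` is hypothetical.

```python
import math
import time


def balance_clusters(points, labels, centroids, time_limit=1.0):
    """Rebalance K-means labels so every cluster holds a target number of points.

    Hypothetical sketch of steps 2-8: "nearest" is measured to the fixed
    centroids from traditional K-means (an assumption). time_limit is in seconds.
    """
    labels = list(labels)
    n, k = len(points), len(centroids)
    # Target size ("equal number") per cluster; sizes differ by at most one.
    target = [n // k + (1 if c < n % k else 0) for c in range(k)]

    def dist(i, c):
        return math.dist(points[i], centroids[c])

    start = time.perf_counter()
    while time.perf_counter() - start < time_limit:  # step 3
        sizes = [labels.count(c) for c in range(k)]
        if sizes == target:
            break  # already balanced
        for cur in range(k):  # steps 4 and 7: visit clusters in order
            # Step 5: pull the nearest points out of oversized clusters.
            while sizes[cur] < target[cur]:
                j = min((j for j in range(n) if sizes[labels[j]] > target[labels[j]]),
                        key=lambda j: dist(j, cur))
                sizes[labels[j]] -= 1
                labels[j] = cur
                sizes[cur] += 1
            # Step 6: push the point closest to an undersized cluster into it.
            while sizes[cur] > target[cur]:
                j, c = min(((j, c) for j in range(n) if labels[j] == cur
                            for c in range(k) if sizes[c] < target[c]),
                           key=lambda jc: dist(*jc))
                labels[j] = c
                sizes[cur] -= 1
                sizes[c] += 1
    return labels
```

Because step 5 only draws from oversized clusters and step 6 only feeds undersized ones, a cluster that has reached its target is not disturbed again, so in this sketch a single pass balances all clusters and the time limit acts as a safety bound.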

Modify Each Point and Use the Greedy Algorithm for Each Cluster
After the same size K-means, the points are modified again so that their number returns to the number of hospitals, one point per hospital. Each hospital is assigned to its dominant cluster, that is, the cluster that holds most of its points. In the last step, the Greedy algorithm connects all points in each cluster, and the initial route is ready to use. The steps for applying the Greedy algorithm in each cluster are described as follows.
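Collapsing the expanded points back to one point per hospital by majority vote might look like this. It is a hypothetical sketch: `owner` records which hospital each expanded point came from, and ties (possible, for instance, when a class 2 hospital's two points land in different clusters) are broken in favour of the cluster encountered first, an assumption the text does not settle. The function name `dominant_clusters` is made up.

```python
from collections import Counter


def dominant_clusters(owner, point_labels):
    """Map each hospital to the cluster holding most of its points.

    owner[j]: hospital index of expanded point j; point_labels[j]: its cluster.
    Ties go to the cluster encountered first (an assumption).
    """
    votes = {}
    for hospital, cluster in zip(owner, point_labels):
        votes.setdefault(hospital, Counter())[cluster] += 1
    return {h: counts.most_common(1)[0][0] for h, counts in votes.items()}


print(dominant_clusters([0, 0, 0, 0, 1, 1, 2], [1, 1, 0, 1, 2, 0, 2]))
# {0: 1, 1: 2, 2: 2}
```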

Google OR-tools
Google Optimization Tools (Google OR-tools) is an open-source, fast, and portable software suite for solving combinatorial optimization problems. Google OR-tools provides algorithms for solving the Travelling Salesman Problem and the Vehicle Routing Problem.
In our work, CVRP results with and without the initial route are compared. Both use the Google OR-tools library with Guided Local Search as the metaheuristic. Guided Local Search is a high-level strategy that interacts with the local improvement procedure through an efficient penalty-based technique. This interaction results in a process that can escape local optima, improving the efficiency and robustness of the underlying local search algorithms (Tsang et al., 2016).
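For readers unfamiliar with the penalty mechanism, the textbook form of Guided Local Search is summarized below; it is the general scheme, not a statement of how OR-tools implements it internally. Local search minimizes an augmented cost $h$ instead of the original cost $g$, where $I_i(s)$ is 1 if solution $s$ has feature $i$ and 0 otherwise, $p_i$ is the penalty counter of feature $i$ (initially 0), $c_i$ is its cost, $M$ is the number of features, and $\lambda$ weights the penalties. In routing, a feature is typically an arc and $c_i$ its length.

$$ h(s) = g(s) + \lambda \sum_{i=1}^{M} p_i \, I_i(s) $$

When the search reaches a local minimum $s^*$, the penalty $p_i$ is incremented by one for the features with the highest utility

$$ \mathrm{util}_i(s^*) = I_i(s^*) \, \frac{c_i}{1 + p_i} $$

so costly features that have not yet been penalized often are discouraged first, which pushes the search out of the local minimum.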
In the final stage, both with and without the initial route, distances are taken from the road distance matrix so that the route solutions reflect real-life conditions.
When solving with the initial route, each cluster is run separately and treated as a Travelling Salesman Problem. When solving without an initial route, the whole problem is treated as a Vehicle Routing Problem. The parameters used in Google OR-tools are shown in the table below (Google developers, 2020).

Initial Route Results
Our experiment used data from 285 hospitals in Central Java, a province covering 32,801 km². The experiments assume 3 (low), 5 (medium), and 7 (high) vehicles. The number of vehicles equals the number of clusters, that is, the value of k in K-means. An initial route is then created with the algorithm described in this paper. Fig. 1 shows the process of creating an initial route for 3 vehicles. Fig. 1(a) shows the K-means result on the modified points, in which the 285 hospitals become 380 points. The number of points increases because of the capacity adjustment described in Algorithm 1. Fig. 1(b) shows the same size K-means result. In the last step, the points are modified back to 285 points and connected with the Greedy algorithm, as shown in Fig. 1(c). The route in Fig. 1(c) is used as the initial route in Google OR-Tools. The same steps are shown in Fig. 2 for 5 vehicles and in Fig. 3 for 7 vehicles.
Furthermore, the same size K-means results in Fig. 1(b) and Fig. 2(b) show that no point in any cluster deviates too far. In Fig. 3(c), however, some points deviate too far, and it would be more effective if those points moved to another cluster.

Parameters used in Google OR-tools
| Parameter | Value | Description |
|---|---|---|
| FirstSolutionStrategy | | Starting from a route "start" node, connect it to the node that produces the cheapest route segment, then extend the route by iterating on the last node added to the route. FirstSolutionStrategy is used only when the program runs without an initial route. |
| LocalSearchMetaheuristic | GUIDED_LOCAL_SEARCH | Uses guided local search to escape local minima. This is generally the most efficient metaheuristic for vehicle routing. |
| ReadAssignmentFromRoutes | initial_route | Specifies a set of initial routes for a CVRP. The initial route given is based on the algorithm described in this paper. |
| Time_Limit | Between 100 and 1000 | Time limit for the search, in milliseconds. |

The final route after processing with Google OR-Tools is shown in Fig. 4 for 3 vehicles, Fig. 5 for 5 vehicles, and Fig. 6 for 7 vehicles. All results are plotted on a map of Central Java. Without an initial route, as seen in Fig. 4(a), Fig. 5(a), and Fig. 6(a), each vehicle serves many scattered points. These points lie far apart, but few road segments are traversed repeatedly. In contrast, with the initial route, as shown in Fig. 4(b), Fig. 5(b), and Fig. 6(b), each vehicle serves adjacent points, but many road segments are traversed repeatedly, which looks less effective.

Total Distance Comparison
As seen in Fig. 7, Guided Local Search in Google OR-tools produces a longer total distance when started from the initial route. This is because points cannot move to another cluster, which leaves the solution stuck in a local optimum. Table 2 shows the detailed results, with the better result for each number of vehicles in bold. Each problem was repeated 10 times, with the time limit varied from 100 milliseconds to 1000 milliseconds.
Table 2 shows that the largest increase occurs with 7 vehicles, consistent with the weaker same size K-means result for 7 vehicles. The quality of the same size K-means result therefore strongly affects the total distance.

Runtime Comparison
The runtimes of the two methods differ greatly, as seen in Fig. 8. Table 3 can be summarized as follows:
1. Using the initial route can reduce runtime in Google OR-tools by up to 98.89%.
2. The average runtime when using the initial route is 5 to 6 milliseconds. However, it depends on how quickly the program creates the same size K-means; with 7 vehicles, creating the same size K-means takes longer.
3. With the initial route, runtime increases with the number of vehicles.
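The runtime reductions and distance increases reported in this paper are relative changes against the run without the initial route. A minimal sketch of that arithmetic follows; the function name and the numbers are hypothetical and are not the paper's measurements.

```python
def percent_change(baseline, new):
    """Relative change of `new` versus `baseline`, in percent.

    Negative values are reductions (e.g. runtime), positive values are
    increases (e.g. total distance).
    """
    return (new - baseline) / baseline * 100.0


# Hypothetical numbers, not the paper's measurements:
print(round(percent_change(200.0, 5.0), 2))      # -97.5 (runtime, ms)
print(round(percent_change(1000.0, 1100.0), 2))  # 10.0 (distance, km)
```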

CONCLUSION
This paper presented the results of solving the Capacitated Vehicle Routing Problem (CVRP) with an initial route that follows the cluster-first route-second (CFRS) method. To create the initial route, the same size K-means is used for the clustering phase and the Greedy algorithm for the routing phase. Each cluster was then re-optimized with the Guided Local Search metaheuristic from Google OR-tools.
As shown in Table 3, the initial route reduces runtime by 97.37% to 99.166% compared with solving without it. This is because the same size K-means breaks the problem into parts, and the Greedy algorithm then reduces the number of possible routes. Consequently, when the problem is run with Guided Local Search in Google OR-tools, the reduction in runtime is very significant.
However, the total distance when using the initial route increased by 8.22% to 16.69%, as seen in Table 2. This is because no cluster member is allowed to move to another cluster, which leaves the route stuck in a local optimum. Since each cluster is solved separately, there is no interaction between clusters.
In conclusion, an initial route built with the same size K-means and the Greedy algorithm solves CVRP in a much more acceptable runtime, which suits problems that must be solved quickly, such as vaccine distribution. However, this method does not produce a better total distance. Future work will test the method on a larger dataset and consider additional constraints such as time windows.