Data Clustering-based Metaheuristic for Physical Internet Supply Chain Network

Abstract: In this study, a data clustering-driven technique is proposed for a Physical Internet Supply Chain Network (PI-SCN) to reduce data complexity, compress process time, and remedy the lack of process optimization. Given a set of data points, a clustering algorithm aims to assign each data point to a specific group. Points within a group should have similar properties and/or features, while points in different groups should have highly dissimilar properties and/or features. The study pursues three aims. First, an improved metaheuristic algorithm named ISCA is proposed as a new data clustering technique to improve and integrate a variety of PI-SCN decisions. Second, the framework provides a tool that supports clear decisions by enterprise owners. Third, the robustness of the proposed approach is tested against five recent metaheuristics using twelve benchmark datasets. The presented technique achieves higher accuracy and more complete coverage of the search space than the existing methods.


Introduction
In recent years, a worldwide challenge has been to ensure sustainable performance within a dynamic environment for logistics and supply chain management operations. The driving force is the high uncertainty that has become an inherent characteristic to consider at all supply chain planning levels. In this field, the logistics web and the Physical Internet (PI or π) are futuristic concepts that rely on the interconnection of the global logistics system to improve efficiency while ensuring global sustainability. The idea of the PI was put forward by Ballot et al. (2014), who proposed treating physical objects in logistics in the same way that the digital Internet treats information objects. This logistics system, which encompasses the entire range of supply chain operations, is implemented through interconnected networks and uses a standard set of collaborative routing protocols, modular containers, and intelligent interfaces. The uncertainty discussed above arises from different sources, such as customer demand and road traffic status, in addition to the high-level disruption caused by a major crisis, of which the COVID-19 epidemic is a well-known illustration.
A prevailing requirement for enhancing the PI in supply chain networks is the ability to assess strategic, tactical, and operational performance indicators. This assessment is mandatory for an organization's competitiveness. It is important for businesses to ensure long-term efficiency through an effective decision-making process Bhagwat and Sharma (2007). Thus, Taticchi et al. (2012) suggested that organizations need an arrangement for analyzing their environment in order to grow successfully with a clear configuration of performance indicators. The challenge here is to maintain real benefits from an overall performance measurement framework.
In this context, a large amount of data is constantly handled through creation, collection, and archiving processes. These data constitute a major asset for companies in further process operation, control, and design. Smart exploitation of these data builds up information and knowledge. Because the supply chain network is large and has multiple interconnections, the volume of generated data is huge, which compels organizations to opt for new technologies to rapidly extract and wisely interpret this big volume of data Darvazeh et al. (2020); Tiwari et al. (2018). Traditional methods have revealed their limits under this new requirement for big data handling. Therefore, professionals are looking for new concepts and abilities to use big data efficiently and achieve the required performance. Among the principal approaches, Artificial Intelligence (AI) techniques are suitable for overcoming big data treatment concerns. In this field, data clustering is a well-known AI sub-discipline that can be applied to real-world data treatment problems. Its purpose is the grouping of data points: given a set of data points, a clustering algorithm aims to assign each data point to a specific group. In theory, data points in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. Numerous algorithms have been suggested in the literature to solve data clustering problems. The most well-known is K-means Hartigan and Wong (1979), an iterative algorithm that attempts to partition the dataset into K predefined, distinct, non-overlapping subgroups (clusters) where each data point belongs to only one group.
However, this approach relies strongly on the initially designated cluster centers and is thus prone to converging to local optima. Data clustering belongs to the class of NP-hard problems, which are difficult for deterministic algorithms to solve. Such algorithms tend to become trapped in local optima on NP-hard applications, which degrades their global performance.
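The dependence of K-means on its initial centers can be illustrated with a minimal sketch. The function names `sq_dist` and `kmeans`, the toy dataset, and the two sets of starting centers are illustrative assumptions, not material from the study; the loop is a plain Lloyd-style iteration rather than the Hartigan and Wong variant.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd-style K-means started from user-supplied centers (illustrative)."""
    centers = [tuple(float(v) for v in c) for c in init_centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(c)  # empty cluster: keep its center
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wcss


# Hypothetical toy data: the four corners of a 4 x 1 rectangle, K = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, good = kmeans(points, [(0, 0.5), (4, 0.5)])  # converges to the left/right split
_, bad = kmeans(points, [(2, 0), (2, 1)])       # stays in the top/bottom split
print(good, bad)  # 1.0 16.0
```

Both runs terminate at a fixed point of the iteration, but the second one is a local optimum whose within-cluster sum of squares is sixteen times larger, which is the kind of entrapment that motivates a metaheuristic search.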
Generally speaking, the optimization process is very tough because of the non-linear objective functions and the large search space in data clustering applications. Consequently, selecting the most suitable optimization algorithm becomes essential. The chosen algorithm should have a well-adjusted structure to balance exploration and exploitation. Hatamlou (2013) suggested an optimization approach inspired by the black hole phenomenon for data clustering. For the same purpose, Saida et al. (2014) introduced an approach based on the cuckoo search. Many other studies have been presented in the literature. For example, Particle Swarm Optimization (PSO) implementations for data clustering have been reviewed by Rana et al. (2011). To improve the global search, a novel technique has been incorporated into the Gravitational Search Algorithm (GSA) Han et al. (2017). A revised gravitational search algorithm has been presented by Yang et al. (2007); Tavazoei and Haeri (2007). Chuang et al. (2011) highlighted a method that improves the performance of optimization algorithms by using chaotic sequences instead of random numbers. Inspired by chaos theory, Wan et al. (2012) proposed a data clustering structure using a chaotic map and PSO, while Singh (2020) used chaotic sequences. Using the same technique, Singh and Saxena (2021) presented a method based on a chaotic-sequence Harris hawks optimizer. Senthilnath et al. (2013) proposed a chaotic number and opposition learning approach. In addition, Abdulwahab et al. (2019) considered a search strategy based on a Lévy flight-based cuckoo algorithm. Besides, Rojas-Morales et al. (2017) proposed a directed methodology based on Lévy flight for data clustering.
To enhance the search diversity of optimization algorithms, Kumar and Sahoo (2017) successfully implemented an opposition learning-based method. Moreover, Sun et al. (2019) presented a monarch butterfly optimization algorithm based on opposition learning combined with random local perturbation. For the same purpose, Nasiri and Khiyabani (2018) proposed a whale optimization approach for data clustering, whose added value and efficiency the authors demonstrated through several comparative performance analyses. For data clustering using optimization algorithms, Kushwaha et al. (2018) proposed a magnetically guided approach. A hybrid approach based on the exponential Grey Wolf Optimizer (GWO) and the Whale Optimization Algorithm (WOA) is presented in Jadhav and Gomathi (2018) to deal with the data clustering problem. A variance-based differential evolution approach is suggested in Alswaitti et al. (2019). Zhou et al. (2019) and Eesa and Orman (2020) introduced optimization algorithms based on symbiotic organism and cuttlefish-guided approaches, respectively.
It should be highlighted that, despite the many optimization algorithms in the literature, it is hard to find one that outperforms all the others across a benchmark of datasets for data clustering. This hurdle is captured by the no free lunch theorem of Wolpert and Macready (1997), which states that no algorithm is best for all test problems. This explains why so many optimization algorithms have been proposed for data clustering. A good optimization algorithm should therefore strike a compromise between exploitation and exploration to reach good solutions within a reasonable time. The main difficulty is the contradictory nature of these two operations: focusing on exploitation can cause premature convergence, whereas prioritizing exploration degenerates into random search. Exploitation searches the neighborhood of good solutions, while exploration seeks good solutions in diverse areas of the search domain. From a practical standpoint, when constructing algorithms for data clustering, the designer should build in the ability to obtain the best solutions from the neighborhood while also reaching good solutions that lie far from the poor ones. Thus, the key is to find a good way to update the positions of the solution vectors efficiently.
Recently, Mirjalili (2016) proposed the Sine-Cosine Algorithm (SCA) to solve optimization problems, including real-world case studies. With respect to population diversity and convergence rate, the performance of SCA on multifaceted problems has already been demonstrated in the literature. For data clustering, this study proposes a novel SCA-based approach named ISCA. The effectiveness of ISCA relies on its capability for fast convergence towards the global optimal solution in addition to local optima avoidance. The main contributions of this study are as follows:
• Data clustering is used in PI-SCN to identify dense and sparse regions and, therefore, to discover overall distribution patterns and interesting correlations among data attributes
• An enriched SCA approach is proposed for data clustering for PI-SCN
• The suggested method is assessed on 12 benchmark machine learning datasets
• The quality of the resulting solutions is evaluated by comparison with 5 algorithms
• The quality of the suggested approach is evaluated statistically through 3 statistical tests
• The efficiency of the suggested approach is justified by convergence curves and by statistical and experimental values

Physical Internet Supply Chain Network (PI-SCN) System
The PI is presented as an open, global, and multimodal logistics system based on universal physical, digital, and operational interconnectivity, implemented through the standardization of smart collaboration interfaces, protocols, and modular containers to ensure sustainability and increase efficiency Ballot et al. (2014). Another description has been proposed by Treiblmaier (2019), where the PI is considered a comprehensive and measurable supply chain framework based on a network of physical components. The standardization of these components, together with optimization efforts, contributes to improving the efficiency and sustainability of supply chain management operations. As these definitions highlight, the PI proposes an overhaul of the basics of logistics. The term "interconnection" thus refers to the close and intensive connection between the actors and the components of the network. The second key aspect of the PI is the desire to open logistics networks and share assets. Today, companies generally form private and relatively stable networks that own their warehouses and vehicles. The PI breaks with this logic and assumes that assets should be shared among all users of this new network and used as needed. The PI concept has been proposed to satisfy growing requirements in terms of the environment and service performance. Indeed, the current logistics system presents malfunctions that are harmful to the environment and tend to compromise worldwide objectives. For this reason, the PI was designed to give logistics services a chance to be more resilient, efficient, sustainable, and adaptable for their users by changing the way physical objects move through the network.
Accordingly, different technologies and research areas are required to ensure such a transition toward the PI, for example, the Internet of Things (IoT) for interconnecting logistics networks, in addition to key performance indicators for continuous measurement and improvement. The PI has the potential to become a disruptive innovation that inspires vast multidisciplinary collaboration to solve several pressing social and business problems by revolutionizing extant logistics and SCM practices Krogsgaard et al. (2018).

The PI Foundations
The idea behind the PI was introduced to deal with the inefficiency and unsustainability of current logistics and SCM practices. Montreuil (2011a) described 13 issues that make the current logistics practices unsustainable: (1) limited space utilization for road, rail, sea, and air transport; (2) empty travel being the norm rather than the exception; (3) poor working conditions for truck drivers; (4) products sitting idle; (5) inefficiency of product distribution; (6) inefficient use of production and storage facilities; (7) mediocre coordination within distribution networks; (8) high inefficiency of multimodal transportation; (9) dysfunctional city logistics; (10) inefficient cross-docking operations; (11) low network security and robustness; (12) difficulty justifying the use of IT in SCs; and (13) limited innovation opportunities. It should be noted that such a paradigm shift in logistics, as suggested by the PI, requires significant transformations at various levels. Montreuil (2011b) suggested 13 PI principles to address the "global logistics sustainability grand challenge". An illustration has been proposed to further clarify how these principles can be implemented. For instance, it will be necessary to develop appropriate information systems using advanced technologies to enable the hyperconnection of actors, increased and standardized sharing of information, and massive data storage. The impacts will occur not only at the level of logistical operations but also in how objects are designed, produced, and delivered. Protocols and interfaces should then be enhanced through technological, infrastructural, and business innovation Montreuil et al. (2013).

The PI Objectives and Constituents
The principal objective for PI is to alter "the way physical objects are handled, moved, stored, realized, supplied and used, aiming towards global logistics efficiency and sustainability" Ballot et al. (2014). Therefore, transportation of physical goods can be organized mutually between supply chain actors as data packages are handled on the digital Internet. The optimization of supply chain operations will be realized by sharing resources such as data and vehicles. Thus, inter-operability will be emphasized to ensure efficiency and sustainability. For this reason, the PI relies on horizontal and vertical cooperation across organizations using defined standards and protocols.

The Proposed PI-SCN System
The contribution of this study is a clustering framework based on an improved metaheuristic algorithm to reduce PI-SCN complexity, compress process time, and remedy process optimization deficiency. The focus of this research is threefold. First, multiple PI-SCN decisions are improved and integrated for a generic structure by a clustering framework based on an improved metaheuristic algorithm. Second, the managerial insights are clarified to show clear feasibility for decision-makers and enterprise owners. Finally, a result-oriented technique for network simplification is presented in order to select the appropriate path within PI-SCN. Figure 1 presents the features reflected in this study for PI-SCN.

Basic Concepts

Brief Introduction to Data Clustering
Data clustering is a well-known unsupervised learning technique that can be applied to real-world data treatment problems. Its purpose is the grouping of data points: given a set of data points, a clustering algorithm aims to assign each data point to a specific group.
In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. Numerous algorithms have been suggested in the literature to solve the problems of data clustering. The notation used in the rest of the paper is as follows:
• A dataset $X = \{x_1, x_2, \ldots, x_N\}$, where $N$ is its cardinality, representing the total number of objects in the dataset.
• An object $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ is a data element characterized by a vector of dimension $D$; each element is a feature.
• Data clustering can be described as the process of partitioning the dataset $X$ into $K$ mutually disjoint clusters, denoted $O = \{O_1, O_2, \ldots, O_K\}$.
In this study, the clustering task is formulated as an optimization problem where the goal is to minimize the fitness function $F(X, O)$, expressed as the within-cluster sum of squares:
$$F(X, O) = \sum_{i=1}^{N} \sum_{j=1}^{K} w_{ij} \, \lVert x_i - z_j \rVert^2 \tag{1}$$
where $\lVert x_i - z_j \rVert$ denotes the Euclidean distance between the data point $x_i$ and the centroid $z_j$ of cluster $O_j$, and $w_{ij}$ is the weight associating data point $x_i$ with cluster $O_j$: $w_{ij} = 1$ if $x_i$ belongs to $O_j$, and $w_{ij} = 0$ otherwise.
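The fitness function can be made concrete with a small sketch. The helper names `assign` and `fitness` and the three-point dataset are hypothetical; the sketch also assumes that membership is decided by the nearest centroid, which the text does not state explicitly at this point.

```python
def assign(data, centers):
    """Binary weights: w[i][j] = 1 if data[i] is nearest to centers[j], else 0.

    Nearest-centroid membership is an assumption of this sketch.
    """
    w = []
    for x in data:
        d2 = [sum((xi - zi) ** 2 for xi, zi in zip(x, z)) for z in centers]
        nearest = d2.index(min(d2))
        w.append([1 if j == nearest else 0 for j in range(len(centers))])
    return w


def fitness(data, centers):
    """Within-cluster sum of squares: sum over i, j of w[i][j] * ||x_i - z_j||^2."""
    w = assign(data, centers)
    total = 0.0
    for i, x in enumerate(data):
        for j, z in enumerate(centers):
            if w[i][j]:
                total += sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return total


# Hypothetical example: three 2-D objects and K = 2 centroids.
data = [(0.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
centers = [(0.0, 1.0), (5.0, 5.0)]
print(assign(data, centers))   # [[1, 0], [1, 0], [0, 1]]
print(fitness(data, centers))  # 2.0
```

Each row of the weight matrix contains exactly one 1, so every object contributes a single squared distance to the total.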

Brief Introduction to Improved Sine-Cosine Algorithm (ISCA)
The Sine-Cosine Algorithm (SCA) is a stochastic optimization algorithm proposed by Mirjalili (2016) to solve optimization problems Li et al. (2018); Abd Elaziz et al. (2017). Starting from multiple randomly generated initial solutions, the SCA performs the search by oscillating outwards or towards the destination (best solution) using a mathematical model based on the trigonometric sine and cosine functions. The SCA integrates several random parameters to balance exploration and exploitation during the search process.
In the SCA, after the population of agents is initialized, the position of each agent is updated at each iteration according to Eq. (2):
$$X_i^{t+1} = \begin{cases} X_i^{t} + r_1 \sin(r_2)\, \lvert r_3\, \mathrm{Des}_i^{t} - X_i^{t} \rvert, & r_4 < 0.5 \\ X_i^{t} + r_1 \cos(r_2)\, \lvert r_3\, \mathrm{Des}_i^{t} - X_i^{t} \rvert, & r_4 \ge 0.5 \end{cases} \tag{2}$$
where $\mathrm{Des}_i^{t}$ is the destination solution (best solution found so far), $X_i^{t}$ is the current solution, and $\lvert \cdot \rvert$ indicates the absolute value. The parameters $r_1$, $r_2$, $r_3$, and $r_4$ are random variables. The parameter $r_1$ balances exploration and exploitation: it determines whether the next location of agent $i$ lies inside the space between the current solution $X_i$ and $\mathrm{Des}_i$ or outside it. This parameter is defined in Eq. (3) Mirjalili (2016):
$$r_1 = a - g\,\frac{a}{G_{max}} \tag{3}$$
where $a$ is a constant, $G_{max}$ is the maximum number of iterations, and $g$ is the current iteration. The random parameter $r_2$ determines whether the next solution moves towards or outwards from the destination. The random parameter $r_3$ gives the destination a stochastic weight that emphasizes ($r_3 > 1$) or de-emphasizes ($r_3 < 1$) its effect in determining the distance. The random parameter $r_4$ switches between the sine and cosine functions in Eq. (2).
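A minimal sketch of one SCA position update follows. The function name `sca_step`, the default `a = 2.0`, and the sampling ranges for r2, r3, and r4 (taken here as [0, 2π], [0, 2], and [0, 1]) are assumptions based on the usual description of SCA; they are not stated in this section. Drawing fresh random numbers for every coordinate is also a choice of the sketch.

```python
import math
import random


def sca_step(x, dest, g, g_max, a=2.0, rng=random):
    """One sine-cosine update of position x towards/around the destination dest.

    a, and the ranges of r2, r3, r4, are assumed values (not given in the text).
    """
    r1 = a - g * a / g_max  # shrinks linearly from a to 0 over the run
    new_x = []
    for xd, pd in zip(x, dest):
        r2 = rng.uniform(0.0, 2.0 * math.pi)  # direction of the move
        r3 = rng.uniform(0.0, 2.0)            # random weight on the destination
        r4 = rng.random()                     # switch between sine and cosine
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_x.append(xd + r1 * trig * abs(r3 * pd - xd))
    return new_x


rng = random.Random(0)
early = sca_step([1.0, -2.0], [0.0, 0.0], g=0, g_max=10, rng=rng)   # wide moves
late = sca_step([1.0, -2.0], [0.0, 0.0], g=10, g_max=10, rng=rng)   # r1 = 0
print(late)  # [1.0, -2.0]
```

Early in the run r1 is close to a, so an agent can jump well beyond the destination (exploration); as g approaches the maximum, r1 goes to zero and the moves vanish (exploitation).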
A good metaheuristic method requires a high explorative capacity and a quick exploitative rate. To strengthen the exploration of the standard SCA, a new position update based on Accelerated Particle Swarm Optimization (APSO) Gandomi et al. (2013) is used. Generally speaking, APSO has a high explorative capacity during the search process and can locate the optimal solution efficiently. After a new solution is generated using the standard mathematical model of SCA given in Eq. (2), this solution is updated with the APSO equation, Eq. (4):
$$X_i^{t+1} = (1 - \beta)\, \tilde{X}_i^{t+1} + \beta\, \mathrm{Des}_i^{t} + \alpha\, r \tag{4}$$
where $\tilde{X}_i^{t+1}$ denotes the solution produced by Eq. (2). Unlike the standard PSO, and as shown in Eq. (4), no velocity is needed, which avoids the disadvantages of velocities when updating the position. In addition, the random parameter $r$ gives more flexibility to avoid local optima. The remaining parameters are $\beta \in [0.2, 0.7]$ and $\alpha = \gamma^{t}$, where $\gamma \in [0.1, 0.99]$ and $t \in [0, T_{max}]$.
The following steps describe the proposed ISCA for data clustering for PI-SCN. For each dataset X with dimension D, there are N data objects to assign to K clusters, where K is an integer to be initialized.
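The velocity-free APSO refinement can be sketched as below, assuming the commonly used APSO form in which the new position is a convex combination of the current position and the destination plus a decaying random term. The function name `apso_refine`, the defaults `beta = 0.5` and `gamma = 0.9` (picked from inside the stated ranges), and the use of a standard normal draw for the random parameter r are all assumptions of this sketch.

```python
import random


def apso_refine(x, dest, t, beta=0.5, gamma=0.9, rng=random):
    """Velocity-free APSO-style update of a position x (illustrative).

    x    : position produced by the SCA step
    dest : best solution found so far
    t    : current iteration, used for the decaying noise scale alpha = gamma ** t
    """
    alpha = gamma ** t
    return [
        (1.0 - beta) * xd + beta * pd + alpha * rng.gauss(0.0, 1.0)  # r ~ N(0, 1) assumed
        for xd, pd in zip(x, dest)
    ]


# Late in the run the noise term is negligible, so the agent lands
# (almost exactly) halfway between its position and the destination.
late = apso_refine([0.0, 4.0], [2.0, 2.0], t=50, beta=0.5, gamma=0.1)
print([round(v, 6) for v in late])  # [1.0, 3.0]
```

With a small t the noise term dominates and spreads agents around the destination; as t grows, alpha shrinks and the update contracts agents towards the best solution.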

1. Initialization: Construct an initial population by randomly generating NP agents, where each agent includes K centers of dimension D.
2. Fitness Evaluation: For each agent, evaluate the fitness using Eq. (1).
3. Update Position: Update the position of each agent with the standard SCA mechanism given in Eq. (2) and then with the APSO mechanism given in Eq. (4).
4. Stopping conditions: If the stop condition is met, the ISCA terminates and returns the best solution. Otherwise, return to Step 2 for the next iteration.
The proposed ISCA for data clustering for PI-SCN is summarized in Algorithm 1.
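The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the flat encoding of K centers as one list of K × D numbers, initialization and clipping within the data bounds, the sampling ranges of the random parameters, the defaults `a`, `beta`, and `gamma`, and the Gaussian noise term are all choices of this sketch. The stopping condition is simply a fixed number of iterations.

```python
import math
import random


def fitness(data, agent, k):
    """Within-cluster sum of squares for the K centers packed in `agent`."""
    d = len(data[0])
    centers = [agent[j * d:(j + 1) * d] for j in range(k)]
    return sum(
        min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers)
        for x in data
    )


def isca(data, k, n_agents=50, max_iter=500, a=2.0, beta=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    d = len(data[0])
    # Per-coordinate data bounds, repeated for each of the K centers (assumption).
    lo = [min(x[i] for x in data) for i in range(d)] * k
    hi = [max(x[i] for x in data) for i in range(d)] * k
    # Step 1: NP random agents, each holding K * D coordinates.
    agents = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(n_agents)]
    best, best_fit = None, math.inf

    def evaluate():
        nonlocal best, best_fit
        for ag in agents:
            f = fitness(data, ag, k)
            if f < best_fit:
                best, best_fit = ag[:], f

    for t in range(max_iter):  # Step 4: stop after max_iter iterations
        evaluate()             # Step 2: fitness of every agent
        r1 = a - t * a / max_iter
        alpha = gamma ** t
        for ag in agents:      # Step 3: SCA move, then APSO refinement
            for i in range(len(ag)):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                trig = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
                x = ag[i] + r1 * trig * abs(r3 * best[i] - ag[i])
                x = (1.0 - beta) * x + beta * best[i] + alpha * rng.gauss(0.0, 1.0)
                ag[i] = min(max(x, lo[i]), hi[i])  # clip to data bounds (assumption)
    evaluate()  # score the final population
    return best, best_fit


# Hypothetical toy data: two well-separated 2 x 2 squares, K = 2.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, score = isca(data, k=2, n_agents=10, max_iter=30, seed=1)
print(len(centers), score >= 4.0)  # 4 True
```

On this toy dataset the best achievable score is 4.0 (centers at the middle of each square), so the returned score can be read as a distance from the optimum. Keeping the best-so-far solution outside the population guarantees that the reported fitness never gets worse from one iteration to the next.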

Analysis of Time Complexity
In this study, the parameters considered for all algorithms are the population size NP, the maximum number of generations Maxitr, the number of independent runs R, and the number N of elements in the dataset. Because every fitness evaluation scans all N elements, the time complexity of a run is O(Maxitr × NP × N). The parameter N changes with the dataset considered, whereas the other parameters are fixed; for a given dataset, the time complexity of the proposed ISCA is therefore O(Maxitr × NP). The total amount of memory used to run the program for each benchmark depends on the mentioned parameters, which are constant-size data types. Therefore, the space complexity is O(1).
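As a rough worked illustration of the O(Maxitr × NP × N) bound, the sketch below counts object-to-agent assignments for the settings reported in the experiments (NP = 50, 500 iterations, 20 runs) and the 100-instance case study. The function name `assignment_count` is hypothetical, and the count ignores the per-assignment cost of comparing against K centers in D dimensions.

```python
def assignment_count(max_iter, n_agents, n_objects, runs=1):
    """Object-to-agent assignments implied by O(Maxitr x NP x N), times the runs."""
    return max_iter * n_agents * n_objects * runs


per_run = assignment_count(500, 50, 100)
all_runs = assignment_count(500, 50, 100, runs=20)
print(per_run)   # 2500000
print(all_runs)  # 50000000
```

Doubling the dataset size doubles both figures, while the iteration and population settings stay fixed across benchmarks.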

Numerical Results
To evaluate the performance of the proposed ISCA-based data clustering for PI-SCN, we compared it to five recently developed algorithms: Grey Wolf Optimizer (GWO) Mirjalili et al. (2014), Sine Cosine Algorithm (SCA).

Algorithm 1 Pseudo-code of ISCA for data clustering in PI-SCN
Inputs:
• X = D-dimensional dataset with N data objects
• K = number of clusters
Outputs:
• O = set of K clusters O1, O2, ..., OK of data objects
Begin
• Set the population size NP
• Set the maximum number of iterations Tmax
• Initialize each agent Ap with K random cluster centers (∀p = 1 : NP)
while (t < Tmax) do
    for (p = 1 : NP) do
        for (i = 1 : N) do
            • Calculate the Euclidean distance of data object xi to the cluster centers of Ap
            • Assign xi to the nearest cluster center of Ap
            • Calculate the fitness using Eq. (1)
        end for
    end for
    Update the agents' positions using Eqs. (2) and (4)
end while

The parameter settings for all algorithms, namely the population size, the maximum number of iterations, and the number of independent runs, are as follows:
• Population size NP = 50
• Maximum iterations = 500
• Independent runs = 20

Case Study
The proposed model was first tested on a dataset of 100 companies operating in the textile industry; the dataset is shown in Table 1. The initial data were modified for confidentiality reasons. The following parameters are considered:
• #Instances = 100
• #Features = 13
• #Classes = 3
The obtained results are reported in Table 2. We assessed the performance of each algorithm using the best, worst, average, and standard deviation of the objective values. Since metaheuristic algorithms are stochastic, the objective value differs from run to run; the best and the worst values represent the upper and lower performance bounds, respectively. The mean indicates the center of all objective function values obtained over multiple runs. In addition, the standard deviation helps to characterize the performance of the algorithm: the lower the standard deviation, the more consistent the performance of the algorithm around the mean.
The results in Table 2 show that the proposed algorithm achieves the highest performance. This behavior is due to its remarkable capability to avoid local optima, which is better than that of the other algorithms. In addition, a comparative convergence experiment was carried out to confirm that ISCA has better convergence performance than the other algorithms. Figures 2-3 show the convergence curves.
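The four reporting metrics can be computed from the final objective values of the independent runs as in the sketch below. The function name `summarize_runs` and the sample values are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text does not say which is reported.

```python
import statistics


def summarize_runs(values):
    """Best, worst, mean, and standard deviation of a minimized objective."""
    return {
        "best": min(values),    # lowest objective value over the runs
        "worst": max(values),   # highest objective value over the runs
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumption)
    }


# Hypothetical final fitness values from eight independent runs.
runs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = summarize_runs(runs)
print(summary["best"], summary["worst"], summary["mean"])  # 2.0 9.0 5.0
```

Because the objective is minimized, "best" is the minimum and "worst" the maximum; a small standard deviation means the runs cluster tightly around the mean.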

Literature Benchmarks
To further confirm the capability of the proposed ISCA approach on literature benchmarks, eight shape datasets and four UCI datasets are considered. These datasets are available at: https://cs.joensuu.fi/sipu/datasets/.
The numerical results of all algorithms for the Shapes and UCI datasets are presented in Tables 3 and 4, respectively. The overall performance of all algorithms is evaluated based on the best, worst, mean, and standard deviation of the sum of intra-cluster distances. By analyzing the results of Tables 5-6, we can observe that the best (minimum) values of the metrics are obtained by the proposed ISCA on most of the considered datasets. This indicates that ISCA avoids local optima and premature convergence better than the rest of the algorithms. On the one hand, the mean values in the tables indicate a difference between algorithms for different datasets. This behavior is due to the complexity of the considered datasets and the difficulty of discovering optimal points. On the other hand, large values of the standard deviation show a considerable difference between the mean values and the values obtained in each run. To demonstrate the advantages of the proposed ISCA over the other algorithms, a statistical test was conducted, as reported in Tables 7-8. If the p-value between ISCA and a given algorithm is below the significance level of 0.05, ISCA is superior in the statistical sense. Figures 4-14 show the convergence curves of the algorithms for each benchmark. These curves show that the convergence rate of the proposed approach was much better than that of the other approaches.
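The p-value comparison can be illustrated with a small sketch. The statistical tests used in the study are not named in this section, so the example swaps in one common choice, a two-sided Wilcoxon rank-sum test with a normal approximation and no tie correction; the function name `rank_sum_pvalue` and the sample values are hypothetical.

```python
import math


def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Tied values share the average of the ranks they span.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical final fitness values over 20 runs for two algorithms.
algo_a = [float(v) for v in range(1, 21)]
algo_b = [v + 100.0 for v in algo_a]
p = rank_sum_pvalue(algo_a, algo_b)
print(p < 0.05)  # True
```

A p-value below 0.05 only says the two samples differ; since the objective is minimized, the algorithm with the lower values is the one that is better in the statistical sense.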

Analysis of the Results and Discussion
Based on the statistical tests, the proposed ISCA improves both the sum of intra-cluster distances and the sum of inter-cluster distances. This behavior is due to its superior avoidance of local optima and premature convergence.
In addition, a new position is generated with the position update mechanism of APSO to guarantee high exploratory behavior. This combination is very useful for avoiding stagnation in local optima even when the standard SCA is in its exploitation stage.

Conclusion
In this study, we addressed the Physical Internet Supply Chain Network (PI-SCN) within a data-clustering framework. Given a set of data points, a clustering algorithm aims to assign each data point to a specific group. Points within a group should have similar properties and/or features, while points in different groups should have highly dissimilar properties and/or features. The study pursued three aims. First, an improved metaheuristic named ISCA was proposed to address data complexity, process time compression, and the lack of process optimization in PI-SCN. Second, a tool was proposed to support clear decisions by enterprise owners. Lastly, the robustness of the proposed ISCA was tested against five recent metaheuristics using twelve benchmark datasets. Based on the numerical results, the proposed approach achieves higher precision and more complete coverage of the search space than the existing methods.

Jamila Elalami: Supervision, Methodology, Formal analysis.