A Genetic Algorithm for Scheduling n Jobs on a Single Machine with a Stochastic Controllable Processing, Tooling Cost and Earliness-Tardiness Penalties

Problem statement: In this research, we addressed the problem of minimizing the earliness-tardiness penalties and manufacturing costs of a single machine with stochastic controllable processing times and tooling cost. Approach: We developed a non-linear integer programming model and its linearised version to find the optimal solution. We introduced a genome representation that is new to the single machine scheduling literature and evolved it with a genetic algorithm to solve the problem. The genome representation includes two genes per job: one represents the job starting time and the other the job processing time. The algorithms were compared on solution quality, CPU time and memory consumption on a set of randomly generated test problems. Results: The developed algorithms could find the global optimal solution of most scheduling problems with n ≤ 20 jobs. For larger n, the developed genetic algorithm outperformed the math models in solution quality and CPU seconds while consuming a moderate 3295 kilobytes of memory on average, compared with 5058 and 1685 kilobytes for the linear and nonlinear models, respectively. Conclusion: The GA's average performance was 6.013 relative to the lower bound of the linear program, whereas the nonlinear model achieved an average of 1.034. The GA's performance relative to the other techniques improved as n increased. We hope to extend the developed algorithms to other configurations such as parallel machines and job shops.


INTRODUCTION
In this article, we examine the single machine scheduling problem with controllable processing time. In various real-life systems, the job processing time may be controlled by allocating extra resources such as money, manpower, energy, catalysts, spindle speed, feed rate, overtime, subcontracting and so on. For interesting applications of such scheduling problems, see Trick (1994); Kaspi and Shabtay (2003); Wang and Cheng (2005); Kayan and Akturk (2005) and Gurel and Akturk (2007a) for CNC turning operations in flexible manufacturing systems. In these systems, resource allocation and scheduling objectives should be optimized simultaneously to achieve the most efficient system performance.
Modern continuous improvement paradigms such as lean manufacturing and 6-sigma focus on creating value through the relentless elimination of waste, by allocating resources precisely and producing jobs according to the just-in-time principle. Moreover, these paradigms seek optimal resource allocations that minimize the manufacturing cost, so that profit increases through cost reduction rather than through the conventional approach of raising prices. Also, a significant component of the product price is the inventory holding cost resulting from excess production, which could be eliminated by considering earliness and lateness penalties. Various research works have combined resource allocation and scheduling objectives, such as Wassenhove and Baker (1982); Janiak (1987); Daniels and Sarin (1989); Daniels (1990); Panwalkar and Rajagoplan (1992); Cheng et al. (1996a;1996b) and Janiak and Kovalyov (1996). Vickson (1980) initiated the research on controllable processing for a single machine with a linear resource consumption function to minimize the total weighted flow time cost plus the controllable job processing cost, followed by Wassenhove and Baker (1982); Janiak (1987) and Janiak and Kovalyov (1996). For a survey of this area before 1990, see Nowicki and Zdrzalka (1990); brief later surveys are given by Chen et al. (1998); Hoogeveen (2005) and Shabtay and Steiner (2007). Hoogeveen and Woeginger (2002) considered a controllable processing single machine scheduling problem to minimize the multi-criteria objective of the total weighted job processing times plus the linear compression cost function of the processing times and showed that the problem was NP-complete.
Kaspi and Shabtay (2004) considered the single machine scheduling problem with controllable processing time for identical and non-identical job release times, restricted by a common limited convex decreasing resource consumption function, to minimize the makespan. Moreover, Shabtay and Kaspi (2004) considered the same problem to minimize the total weighted flow time. Janiak et al. (2005a) minimized the multi-criteria objective of the total weighted completion and compression times for a controllable processing single machine. They showed that the problem was equivalent to the half-product minimization problem. Cheng et al. (2001) and Ng et al. (2005) considered single machine group scheduling problems with partitioned jobs and controllable processing times, in which the machine processes jobs of the same group simultaneously. Janiak et al. (2005b) presented polynomial time algorithms, based on solving two-variable linear programming problems by geometric techniques, to minimize the total weighted resource consumption of sequencing a set of job groups with independent setup times between groups on a single machine. The setup and processing times are compressible depending on the availability of two resources.
The single machine scheduling problems with resource dependent release times have been extensively studied by Janiak (1985); Cheng and Janiak (1994); Janiak (1998); Janiak (1991) and Li (1994), either to minimize a single objective such as the makespan subject to the total resource consumption or to minimize the multi-criteria objective of the total resource consumption cost plus the makespan. The single machine scheduling problem in which both release times and processing times can be controlled by the amount of resource consumed was recently addressed by Wang and Cheng (2005) to minimize the makespan plus the total resource consumption cost.
Although there is a significant body of research on process planning decisions for a turning operation with multiple objectives on parallel machine configurations with controllable processing time, such as minimizing the sum of makespan and total processing cost as in the work of Alidaee and Ahmadian (1993); Cheng et al. (1996a;1996b); Jansen and Mastrolilli (2004) and Shabtay and Kaspi (2006), there is little research on the single machine scheduling case. Recently, Gurel and Akturk (2007b) considered the simultaneous minimization of the total manufacturing cost and total completion time on identical parallel CNC turning machines. They developed an effective math formulation for the problem by minimizing the total manufacturing cost subject to a given makespan value. Also, they proved some optimality properties that facilitate efficient heuristic algorithms for generating approximate non-dominated solutions.
Most of the scheduling literature has focused on optimizing scheduling objectives such as the makespan or the total weighted completion time, or on solving the multi-criteria objective composed of these two scheduling objectives plus a linear resource consumption function. That literature also assumed a discrete compression cost function or linear processing. In contrast, this article focuses on minimizing earliness and tardiness penalties as the scheduling objective, plus a manufacturing cost function expressed as a nonlinear convex function of the processing time, as shown by Kayan and Akturk (2005). In this article the manufacturing cost is defined in terms of tooling and operating costs.
The purpose of this article is to study the effectiveness of applying Genetic Algorithms (GAs) to minimize the total manufacturing cost and the earliness-tardiness penalties of manufacturing a set of n jobs with controllable processing times on a single machine. The main reason for selecting GAs is the non-linear nature of the problem, which makes it difficult to find optimal solutions with math programming algorithms because they become trapped in locally optimal solutions. Moreover, GAs are capable of obtaining near-optimal solutions of optimization problems while consuming less CPU time and memory (Mansour and Dessouky, 2010). They also offer flexibility in modeling complex constraints and objective functions. The manufacturing costs include the cost of operating the machine plus the tooling costs. We develop a genome representation that can be considered a new addition to the single machine scheduling problem with controllable processing and investigate its reliability and applicability for solving large problems.
The remainder of this article is organized as follows. In Section 2, we give the problem definition and provide non-linear and linearised math formulations of the problem. The developed genetic algorithm is presented in Section 3. Section 4 provides numerical results on a set of generated test problems, compared with a commercial math programming solver. Finally, Section 5 provides the conclusions and future research.

Problem definition:
The single machine scheduling problem with controllable processing time can be formulated as follows. Let π = {1,2,3,...,n} be a set of independent, non-preemptive jobs that have to be executed on a single machine and are ready to be processed at the start of the production period. Each job has a single operation to be performed on the machine and a uniform random processing time with upper ($p_j^U$) and lower ($p_j^L$) bounds. Also, each job has a due date ($d_j$) and earliness and tardiness penalties ($\alpha_j$ and $\beta_j$). The cost parameters of performing job j on the machine are given in terms of the machining cost per unit time ($c_j$), the tooling cost multiplier ($m_j$) and the tooling cost exponent ($e_j$). The problem consists of allocating jobs to time slots on the machine and defining the processing time of each job so as to minimize the sum of earliness-tardiness penalties plus the manufacturing costs.
We use the formula of Kayan and Akturk (2005) for the manufacturing cost of producing job j as a function of $p_j$, as shown in Eq. 1 below:
$$f_j(p_j) = c_j p_j + m_j p_j^{e_j} \qquad (1)$$
The first term is a linearly increasing function of $p_j$ representing the cost of operating the machine for $p_j$ time units and the second term is a nonlinear decreasing function of $p_j$ representing the tooling cost. The parameters $m_j$ and $e_j$ are the tooling cost multiplier and exponent of job j; the conditions $m_j > 0$ and $e_j < 0$ always hold and guarantee that $f_j(p_j)$ is a non-linear convex function. Moreover, the processing time of job j is constrained by lower ($p_j^L$) and upper ($p_j^U$) bounds. See Kayan and Akturk (2005) for a detailed description of how $f_j(p_j)$ is formed and how $p_j^L$ and $p_j^U$ are determined.
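The cost trade-off described above can be illustrated with a short Python sketch. It assumes the cost takes the form $c_j p_j + m_j p_j^{e_j}$, which is how the text describes the two terms of Eq. 1; the function names are hypothetical and the closed-form minimiser ignores the earliness-tardiness penalties, so it only shows where the manufacturing cost alone is lowest.

```python
def manufacturing_cost(p, c, m, e):
    """Assumed form of Eq. 1: operating cost c*p plus tooling cost m*p**e."""
    if p <= 0:
        raise ValueError("processing time must be positive")
    return c * p + m * p ** e


def cheapest_processing_time(c, m, e, p_low, p_high):
    """Minimiser of the assumed cost on [p_low, p_high], for m > 0 and e < 0."""
    # Setting the derivative c + m*e*p**(e-1) to zero gives the stationary point.
    p_star = (-m * e / c) ** (1.0 / (1.0 - e))
    # The cost is convex, so clamping to the bounds gives the constrained minimiser.
    return min(max(p_star, p_low), p_high)


if __name__ == "__main__":
    p = cheapest_processing_time(c=1.0, m=2.0, e=-1.0, p_low=0.5, p_high=3.0)
    print(p, manufacturing_cost(p, 1.0, 2.0, -1.0))
```

Shortening a job below this point raises the tooling cost faster than it saves operating cost, which is why the scheduling model has to weigh compression against the earliness and tardiness penalties.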

MATERIALS AND METHODS
We developed a mixed integer non-linear programming model and its linearised version for the scheduling problem. We also developed a genetic formulation based on a new genome representation that is evolved by a genetic algorithm.
The mixed integer non-linear math programming model: The problem can be formulated as a mixed integer Non-Linear Math Programming (NLIP) model, which can be considered an extension of the standard single machine scheduling problem that uses completion time variables $C_j$ and binary variables $y_{jk}$, as in the work of Balas (1984); Queyranne and Wang (1991); Queyranne (1993); Queyranne and Schulz (1994); Pinedo (2002) and Khowala et al. (2005), with $y_{jk} \in \{0,1\}$ for $j, k \in N$ and $j < k$. The nonlinear objective of the mathematical model is to minimize the sum of the earliness and tardiness penalties and the manufacturing and tooling costs of the n jobs, as indicated by Eq. 2. Constraint set (3) ensures that the completion time of each job is greater than or equal to its processing time. The disjunctive constraints in sets (4) and (5) define whether job j is preceded by job k or not. The big M is taken equal to the sum of the upper processing times of all jobs in this article. Equations 6 and 7 define the lower and upper bounds, $p_j^L$ and $p_j^U$, on the processing time of each job. The tardiness and earliness penalties are calculated by satisfying Eq. 8 and 9, respectively. The variable domains are restricted by Eq. 10 and 11.
The mixed integer linear programming model: The solution of the nonlinear model is always trapped at a local optimum, so the model needs to be linearised into an equivalent Linear Programming (LP) model to find the global optimal solution. The NLIP can be linearised by redefining the term $p_j$ as the summation shown in Eq. 12, where $u_i$ is a binary variable that defines the value of $p_j$, $p_r$ is the precision of the data set, which equals 0.01 in this article, and k equals the difference between the upper and lower processing times divided by $p_r$. Equation 12 represents all possible values of the processing time of job j and Eq. 13 restricts this value to be unique in terms of the binary variables $u_i$.
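The idea behind this linearisation can be sketched in Python. The exact form of Eq. 12 and 13 is not reproduced here, so the sketch assumes the usual construction: the processing time is restricted to a grid of k + 1 values spaced $p_r$ apart, and exactly one binary selector $u_i$ picks a grid value. Under the same assumption, a power of the processing time becomes a sum of precomputed constants multiplied by the selectors. All function names are hypothetical.

```python
def processing_time_grid(p_low, p_high, precision=0.01):
    """All candidate processing times between the bounds at the given precision."""
    k = round((p_high - p_low) / precision)  # number of steps, k in the text
    return [round(p_low + i * precision, 10) for i in range(k + 1)]


def select_processing_time(grid, u):
    """Recover p_j from binary selectors u_i; exactly one u_i must equal 1."""
    if len(u) != len(grid) or any(x not in (0, 1) for x in u) or sum(u) != 1:
        raise ValueError("u must be a one-hot binary vector over the grid")
    return sum(value * flag for value, flag in zip(grid, u))


def linearised_power(grid, u, e):
    """p_j**e as a linear expression: constants value**e times the selectors."""
    return sum((value ** e) * flag for value, flag in zip(grid, u))


if __name__ == "__main__":
    grid = processing_time_grid(1.0, 3.0)
    print(len(grid))  # one binary selector per grid value
```

Each job needs one binary variable per grid value, so even modest processing-time ranges produce many binaries per job at a precision of 0.01.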
The term $p_j^{e_j}$ can be linearised by Eq. 14. Equation 15 defines the variable $u_i$ as binary.
The developed GA: GAs have been widely used in the optimization literature to solve non-polynomial time and complex problems such as the satellite daily image selection problem, job shop scheduling, operational problems of flexible manufacturing systems and the optimization of single batch processors in chemical plants. The GA begins by generating a number of genomes equal to a predefined population size and performs evolution processes such as crossover, selection, replacement and mutation until a predefined stopping criterion is satisfied, yielding a near-optimal solution. We describe the genetic formulation of the single machine scheduling problem with controllable processing times as follows.

The genome representation:
The genome representation consists of n pairs of genes. Each pair defines the start time and the amount of processing required for one job. The genes hold real-valued alleles that express time values. For example, consider a 5-job problem represented by 10 genes as shown in Fig. 1, where $s_j$ denotes a possible start time of the corresponding job. The first pair consists of 2 consecutive genes denoting the start and processing times of job 1. The first gene represents the starting time and takes a value between 0 and $\max\left(\sum_{j=1}^{n} p_j^U, \max_j d_j\right)$ in steps of 0.01. The second gene represents the processing time required to finish job 1, and so on.
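A minimal Python sketch of this encoding follows. It assumes the genome is stored as a flat list ordered as start gene then processing gene for each job, and that the job sequence is read off by sorting the start genes; the function names are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Tuple


def start_time_upper_bound(p_upper: List[float], due: List[float]) -> float:
    """Largest start-gene value: max(sum of upper processing times, largest due date)."""
    return max(sum(p_upper), max(due))


def decode(genome: List[float]) -> List[Tuple[float, float]]:
    """Split a flat genome [s_1, p_1, s_2, p_2, ...] into (start, processing) pairs."""
    if len(genome) % 2 != 0:
        raise ValueError("genome needs two genes per job")
    return [(genome[i], genome[i + 1]) for i in range(0, len(genome), 2)]


def sequence(genome: List[float]) -> List[int]:
    """Job order (1-based job numbers) implied by the start-time genes."""
    pairs = decode(genome)
    order = sorted(range(len(pairs)), key=lambda j: pairs[j][0])
    return [j + 1 for j in order]


if __name__ == "__main__":
    genome = [4.0, 1.5, 0.0, 2.0, 2.0, 1.25]  # three jobs, six genes
    print(decode(genome), sequence(genome))
```

Because start times are encoded directly rather than as a permutation, two jobs can be assigned overlapping time windows, so a genome is not guaranteed to describe a feasible schedule.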

The initialization scheme and genetic operators:
For the start time genes, the initial population was generated by drawing a discrete uniformly distributed random number in $\left[0, \max\left(\sum_{j=1}^{n} p_j, \max_j d_j\right)\right]$ for each job j. To initialize the processing time genes, we construct a discrete empirical probability distribution for each job's processing time by dividing the range of the processing time into 10 equal intervals. The numbers in the first interval are given a probability of appearance of 1, those in the second interval a probability of 0.9, and so on. Roulette wheel selection, two-point crossover and swap mutation operators were used to generate new offspring. The experimental section provides the settings used for these operators in this article.
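The initialization scheme and operators can be sketched in Python as follows. Several details are assumptions made for illustration only: the interval values 1, 0.9, ... are treated as relative weights, the "first" interval is taken to be the one nearest the lower bound, the start-time horizon uses the upper processing times, roulette selection assumes larger fitness is better, and swap mutation exchanges the start genes of two jobs. All names are hypothetical.

```python
import random


def interval_weights(n_intervals=10):
    """Relative weights 1.0, 0.9, ..., 0.1 for the processing-time intervals."""
    return [1.0 - 0.1 * i for i in range(n_intervals)]


def sample_processing_time(p_low, p_high, rng, step=0.01):
    """Pick an interval by weight, then a value inside it snapped to the grid."""
    width = (p_high - p_low) / 10
    i = rng.choices(range(10), weights=interval_weights())[0]
    value = rng.uniform(p_low + i * width, p_low + (i + 1) * width)
    return min(p_high, max(p_low, round(value / step) * step))


def random_genome(p_low, p_high, due, rng, step=0.01):
    """One genome [s_1, p_1, ..., s_n, p_n] with start genes on the 0.01 grid."""
    horizon = max(sum(p_high), max(due))
    steps = int(round(horizon / step))
    genome = []
    for lo, hi in zip(p_low, p_high):
        genome.append(rng.randint(0, steps) * step)  # discrete uniform start time
        genome.append(sample_processing_time(lo, hi, rng, step))
    return genome


def roulette_select(population, fitness, rng):
    """Fitness-proportionate choice; assumes non-negative, larger-is-better fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


def two_point_crossover(a, b, rng):
    """Swap the segment between two cut points of equal-length parents."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def swap_mutation(genome, rng):
    """Exchange the start genes of two different jobs (assumed interpretation)."""
    child = list(genome)
    n_jobs = len(child) // 2
    if n_jobs >= 2:
        j, k = rng.sample(range(n_jobs), 2)
        child[2 * j], child[2 * k] = child[2 * k], child[2 * j]
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_genome([1.0, 2.0], [2.0, 3.0], [4.0, 6.0], rng))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when the same problem is solved repeatedly from different seeds.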
The genome score: Our genome representation does not guarantee feasible genomes at initialization or at each step of the evolution process. To penalize infeasible genomes, we added a cost term to the objective score f. The score consists of a genome feasibility measure, $f_{fes}$, and a performance measure, $f_{per}$. The feasibility part measures the conflict between all jobs and the performance part measures the genome total cost. The genome score is calculated by the following procedure:
Step 1: Calculate the sum of processing times, $sp_{ij}$, for every possible pair of jobs.
Step 2: Calculate the genome total overlap, $\delta_g$, defined as the sum of the overlaps over all possible job pairs between 1 and n. The overlap $\delta_{ij}$ between jobs (i, j) is the sum of $p_i$ and $p_j$ minus the actual difference $d_{ij}$, as in Eq. 16. The actual difference $d_{ij}$ between two jobs (i, j) is determined by Eq. 17, where F and S denote the finish and start times of jobs i and j. The value of $\delta_g$ is the sum of all $\delta_{ij}$ calculated using Eq. 16.
Step 3: Calculate the genome feasibility measure, $f_{fes}$, as in Eq. 18. If $f_{fes}$ equals 1, go to step 4; otherwise go to step 5.
Step 4: Calculate the genome's score $f = f_{fes}/2.0$ and end.
Step 5: Calculate the genome's total cost, TC, which equals the sum of the tardiness, earliness, operation and tooling costs of the feasible genome, then calculate the genome performance measure as shown in Eq. 19.
Step 6: Calculate the genome score using Eq. 20 and end.
The GA heuristic was coded in MATLAB and tested on a Fujitsu Siemens laptop with an Intel (R) Pentium (R) M processor at 1.6 GHz, 240 MB RAM and a 40 GB HDD, running Windows XP. The results section investigates the applicability of the developed algorithms to the scheduling problem under consideration.
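The overlap and total cost used in the scoring procedure can be sketched in Python (the authors report a MATLAB implementation; Python is used here purely for illustration). The sketch rests on labeled assumptions: the actual difference $d_{ij}$ is taken to be the span from the earlier start to the later finish of the two jobs, negative overlaps are clipped to zero, and the manufacturing cost has the form $c_j p_j + m_j p_j^{e_j}$. The feasibility and performance measures of Eq. 18 to 20 are deliberately not implemented, since their form is not given in the text. All names are hypothetical.

```python
from itertools import combinations


def pair_overlap(s_i, p_i, s_j, p_j):
    """Overlap of two jobs: (p_i + p_j) minus the span they cover together."""
    span = max(s_i + p_i, s_j + p_j) - min(s_i, s_j)  # assumed form of d_ij
    return max(0.0, (p_i + p_j) - span)  # clipping at zero is an assumption


def total_overlap(jobs):
    """Genome overlap delta_g for jobs given as (start, processing) pairs."""
    return sum(pair_overlap(*a, *b) for a, b in combinations(jobs, 2))


def total_cost(jobs, due, alpha, beta, c, m, e):
    """Earliness, tardiness, operating and tooling cost of a schedule."""
    cost = 0.0
    for (s, p), d, a, b, cj, mj, ej in zip(jobs, due, alpha, beta, c, m, e):
        finish = s + p
        cost += a * max(0.0, d - finish)  # earliness penalty
        cost += b * max(0.0, finish - d)  # tardiness penalty
        cost += cj * p + mj * p ** ej     # assumed manufacturing cost form
    return cost


if __name__ == "__main__":
    jobs = [(0.0, 2.0), (1.0, 2.0), (3.0, 1.0)]
    print(total_overlap(jobs))
```

A genome with zero total overlap describes a schedule in which no two jobs occupy the machine at the same time, which is the condition the feasibility measure is meant to capture.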

RESULTS AND DISCUSSION
Data set: To generate the test problems, five levels of n are considered, ranging from 10-50 in steps of 10. The $C_o$ and $m_j$ factors are generated from a discrete uniform distribution on (0.1,4.5]. The $e_j$ factor is generated from a discrete uniform distribution on (0.1,2.0]. The Tightness Factor (TF) has two levels, 0.2 and 0.6. The earliness and tardiness penalties ($\alpha_j$ and $\beta_j$) are generated from a continuous uniform distribution from (0,
Computations results: Table 1 shows the performance of the developed math programming models and the developed GA in terms of the solution quality, CPU seconds and memory kilobytes consumed by each algorithm. Columns 1-4 depict the problem number, n, TF and RDD values for each problem. (Ng) for each problem. The performance of each algorithm is measured by the ratio of the solution or lower bound found by the algorithm to the lower bound found by the linear programming model using the LINGO software, where a stopping criterion of 36,000 seconds was adopted. Each design of Table 1 represents a total of 50 problems, each one solved by the nonlinear model, the linear model and the developed GA.
The genetic algorithm parameters, such as the population size, number of generations and crossover and mutation probabilities, affect the algorithm's efficiency. The population size and number of generations varied for each test problem as listed in Table 1. The population size was set to 200, 250, 300, 300, 300, 300, 300, 350, 400 and 500 for problems with n of 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100, respectively. The number of generations was determined experimentally for each problem set separately, as shown in column 18 of Table 1.
Extensive experimental work was done on a hypothetical problem with 40 jobs to find the best values for the crossover and mutation operators. From this test case, crossover and mutation probabilities of 0.95 and 0.05 were adopted for all test problems in this article. For each test problem, the GA was run 100 times, each time with a different initial random seed, resulting in 300,000 runs for all models.
In general, the linearised version of the nonlinear model could find the global optimal solution of most 10 and 20 job designs. The developed GA could also find the global optimal solution of these problems with lower computational time. It is also observed that the NLIP model consumes less memory than the LP model and the GA, as shown in Fig. 2. The memory consumed by the non-linear model is much lower than that of the linearised model because the linearised model has a far larger number of binary variables. The GA consumes more memory than the LP model for the first 6 problems of the 10 job set and consumes a moderate amount of memory for all problems.
We first evaluate the developed algorithms on small problems (n = 10), which can be solved optimally by the LP algorithm using the software package LINGO. Table 1 shows that the LP spent 97 seconds on average to find the optimal solutions, while the GA spent less than 25 seconds. The NLIP model could not find any optimum and spent 2431 seconds on average to converge to locally optimal solutions. The developed GA found solutions with a mean performance of 1.12 relative to the LP bounds.
For medium size problem instances (n = 20, 30, 40, 50), the LP and the GA found the optimal solution for problems 8, 10 and 16; however, the GA spent less time than the LP model to find the optimal solutions, and the NLIP algorithm could not find any optimal solution. Moreover, the LP and the GA could find solutions for 10 problems, whereas the NLIP algorithm could find a locally optimal solution for only 8 problems and no optimal solution for this set of problems. For the other, large problems, none of the math models could find a candidate best solution, although each could define a lower bound for each problem, while the developed GA found feasible and good solutions relative to the lower bounds found by the linear model.
Based on these experiments, we observe that the GA formulation of the problem under consideration can be solved in fewer CPU seconds than the math formulations while consuming a moderate amount of memory.

CONCLUSION
In this research, we have proposed an NLIP model for the single machine scheduling problem with controllable processing time that minimizes the total earliness and tardiness penalties and the total manufacturing cost. The developed nonlinear model was linearised to obtain the global optimal solution of the scheduling problem. We also developed a GA for the problem based on a new genome scheme evolved in a genetic evolution process. The new genome representation can be considered a new addition to the single machine scheduling literature. The developed GA found the same solutions as the LP model for small problems with up to 10 jobs and outperformed the developed math programming algorithms on larger problems, while consuming few CPU seconds and little memory.
The natural future development of this work is to find a trade-off between the four criteria and to further improve the solution quality and speed of the developed GA. This research can also be extended to other configurations such as parallel machines, job shops and flexible job shops.