A Hybrid Approach to Process Planning: The Urban Traffic Controller Example

Corresponding Author: Falilat Olaitan Jimoh, Centre for High-Performance Intelligent Computing, Department of Informatics, University of Huddersfield, Queensgate, Huddersfield, HD1 3DH, UK. Email: F.Jimoh@hud.ac.uk

Abstract: Automated planning and scheduling are increasingly used to solve everyday planning tasks. Planning in domains with continuous numeric change presents limitations and considerable challenges to advanced planning algorithms. There are many pertinent examples for the engineering community; here, a case study is provided through the urban traffic controller domain. This paper introduces a novel hybrid approach to state-space planning systems involving a continuous process, which can be used in several applications. We explore Model Predictive Control (MPC) and explain how it can be introduced into planning for domains containing mixed discrete and continuous state variables. This preserves the benefits of the AI Planning approach, namely explicit reasoning and declarative modelling, while leveraging the capability of MPC to manage numeric computation and the control of continuous processes. The hybrid approach was tested on an urban traffic control network to ascertain its practicability on a continuous domain; the results show its potential to control and optimise heavy volumes of traffic.


Introduction
Process planning is the act of selecting and assigning resources towards achieving a desired goal. It is performed programmatically and involves the design of autonomous computer programs that are aware of their environment, can adapt to change, and can generate and scrutinise goals (Russell et al., 1995). Automated planning and scheduling have been implemented successfully for many engineering processes. For example, early work by Khoshnevis and Chen (1991) used automated planning and scheduling in manufacturing processes for comprehensive resource selection and allocation.
This early success motivated the use of autonomous planning and scheduling for many different applications; however, as each solution often contained tightly coupled domain knowledge alongside the algorithms, researchers often spent large amounts of time developing systems that shared similar core algorithms. This resulted in the establishment of domain-independent automated planning, where state-of-the-art algorithms are designed in isolation from the domain knowledge. These algorithms are then used alongside an action model representing the domain-specific knowledge. In addition, emerging developments in automated planning with constraint processing have facilitated the deployment of deliberative reasoning in real-time control applications (Heinrich et al., 2015; Chen et al., 2015). There are many successful applications of domain-independent planning to real-world problems. Examples can be found in computer integrated manufacturing (da Silva Fonseca et al., 2016), the relocation problem (Tierney et al., 2012), calibration of machine tools (Parkinson and Longstaff, 2015), clinical validation (Dinapoli et al., 2016) and crowd sourcing (Machado et al., 2016).
It is vital to enable deliberative reasoning in systems. Introducing deliberation into a controller enables it to reason about its components, environment and functionality. It enables the generation of effective plans towards achieving desirable goals within the control system (Jimoh and McCluskey, 2012). This helps the controller to deal with unexpected situations that have not been learnt, adopted or programmed into the system beforehand (Dusparic et al., 2016). Embedding automated planning into urban traffic control (UTC) systems will introduce deliberative reasoning into urban traffic controllers. Deliberative reasoning in a controller would introduce intelligence into UTC systems through the generation of plans and schedules for self-management. This will ultimately contribute to the reduction of traffic congestion and carbon emissions on roads.
However, as the variety of possible planning applications increases, so does the complexity of the domain knowledge (Jimoh and McCluskey, 2016). This complexity significantly hinders the uptake of novel automated planning applications, because planners are limited in their ability to handle continuous change in numeric values. To avoid this limitation, the complexity of the domain knowledge is currently relaxed by discretising continuous transformation into a profile of linear change (Löhr et al., 2012). For example, in the application to machine tool calibration, non-linear change in environmental temperature is discretised to reduce complexity (Parkinson et al., 2014). However, this discretisation often comes at a cost to the quality and accuracy of the generated plan, and a compromise has to be established. This motivates the requirement for a novel approach to handling continuous processes in planning for control systems. This paper describes a hybrid algorithm that uses automated planning with an embedded MPC strategy to reason with planning problems containing numerics with continuous change. The specific example provided is the urban traffic control environment, where plans are generated for a controller to optimise the traffic situation.
In this study, a hybrid planning system is presented through the introduction of the Model Predictive Control (MPC) approach into a classical state-based planning system. It facilitates efficient planning in the presence of complex numeric and logical changes within a problem domain. The technique's primary application is autonomous traffic management, which is used as an example throughout the paper. However, the traffic management domain shares many characteristics with complex engineering and manufacturing planning problems.
The layout of the paper is as follows: The first section presents a survey of work related to this paper. This leads to the description of the developed hybrid approach. Following this, a case study is presented where the technique is applied to the urban traffic controller.

Background
The growing demand for innovative techniques for plan generation, plan execution, monitoring and recovery has raised awareness of system designs that make use of advanced planning and implementation frameworks (Jimoh and McCluskey, 2015; Laguna et al., 2014). Teleo-Reactive Executive (T-Rex) is an example of such a design. T-Rex is a goal-oriented system for autonomous underwater vehicles that integrates automated planning technology for real-time plan generation and execution. The T-Rex framework is designed to improve research in the field of oceanic science (Pinto et al., 2012). Another example is the Planning and Execution LEarning Architecture (PELEA). PELEA introduces an adaptable modular design that integrates learning with planning and execution. It also incorporates sensing and monitoring for real-time re-planning (Jiménez et al., 2013). We propose the use of a Model Predictive Control (MPC) design in continuous planning to create reasoning in controllers that can solve problems in domains modelled with continuously changing variables. Similarly, Domain Predictive Control (DPC) is another design proposed for continuous (re-)planning in hybrid systems (Löhr et al., 2012). It involves extracting a discretised domain model from a given MPC dynamic equation of a system to control real-time applications. This differs from the work in this study, which creates a symbolic continuous domain model of a system while using MPC, derived from a model of the dynamical equations of the same system, as a heuristic to control the search space in symbolic planning.
Control systems that support Urban Traffic Control (UTC), such as those controlling networks of traffic lights, have used AI techniques since the 1970s (Jimoh and McCluskey, 2014). These systems are embedded in a real-time control environment and are often based on algorithms that rely on feedback and adaptation. They make use of road traffic data, which may be gathered every few seconds or accumulated over several years. As a result, traffic control systems operate on the principle of adaptive signal control, established from stored traffic data. However, these approaches have limitations in unprecedented situations such as road accidents or an unexpected change in traffic demand within a short interval of time (De Oliveira and Bazzan, 2009; Jimoh et al., 2013b). In such circumstances, traffic control systems usually use fixed traffic signal timing or apply some hard-coded approach to revert to a recognised state. Therefore, there is a need for intelligent controls that can generate and execute plans to restore an unpredicted traffic situation to a desired condition. One promising direction is a hybrid control design that supports intelligent systems in reasoning and deliberating with their declarative knowledge in order to manage themselves during unexpected situations (Jimoh et al., 2013a). Such intelligent controls would be an achievement in the urban traffic control domain, and this paper is a step towards realising that goal.

Model Predictive Control
Control engineering is a field within the engineering discipline that applies control theory to design and implement systems with desired behaviours. Predictive control is a subset of control engineering concerned with anticipating the future behaviour of a controlled process in order to adjust its inputs towards a desirable future goal.
MPC attracts considerable attention in the control of dynamic systems, which makes it an essential aspect of control practice (Osusky and Vesely, 2015). MPC was established within the industrial sector as an alternative to traditional Proportional-Integral-Derivative (PID) control (Bennett, 1993). The MPC formulation incorporates optimal control, multi-variable control, stochastic control, dead-time processes and future references where applicable (Camacho and Bordons, 1999). There are several MPC algorithms; they differ in the way they represent the model of the process as well as in the cost function to be minimised. MPC algorithms have been continually enhanced to increase their robustness and scalability for instantaneous processes (Al-Gherwi et al., 2011; Falugi et al., 2010; Tay, 2007; Osusky and Vesely, 2015).

The MPC Approach
The mathematical model of a controlled process, together with the assumed disturbances that might occur during its operation, is built from past operating experience and past data from similar operations within the same system. A cost function is derived from the available resources and constraints that need to be optimised over the entire duration of the process. The system uses the predefined model as a guide to minimise the cost function, given a set of varying input parameters, output parameters and the dynamic changes in the state of the environment. The system plans over a period of time known as the horizon. The generated plan is applied to the system to control the process by moving its current state to a desirable state for a given period of time. The new state is then sampled, and the system re-plans for another horizon, taking into consideration the present state from the feedback loop as well as all the system constraints. This approach to planning is called the "receding horizon". Planning and re-planning in this way makes MPC robust and able to keep a controlled process in a desirable state over a given period. It also allows MPC to function in a partially observable environment, because the dynamic environment is sampled at every sampling time during a re-plan.
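The plan, apply, sample and re-plan cycle can be illustrated with a minimal Python sketch. It assumes a toy single-queue model, a quadratic cost and brute-force enumeration in place of a numeric optimiser; the function names and all numbers are hypothetical and are not taken from the system described in this paper.

```python
from itertools import product

def predict(x, u, d):
    """Toy one-step model (an assumption): demand d adds to the queue, control u drains it."""
    return max(0.0, x + d - u)

def plan(x0, d, controls, horizon):
    """Enumerate every control sequence over the horizon and return the cheapest one."""
    best_seq, best_cost = None, float("inf")
    for seq in product(controls, repeat=horizon):
        x, cost = x0, 0.0
        for u in seq:
            x = predict(x, u, d)
            cost += x * x + 0.1 * u * u  # quadratic penalty on queue and control effort
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq

def receding_horizon(x0, d, controls, horizon, steps):
    """Apply only the first planned control, sample the new state, then re-plan."""
    x, trace = x0, [x0]
    for _ in range(steps):
        seq = plan(x, d, controls, horizon)
        x = predict(x, seq[0], d)  # in a real controller this would be a measured state
        trace.append(x)
    return trace

trace = receding_horizon(x0=20.0, d=2.0, controls=(0.0, 4.0, 8.0), horizon=3, steps=6)
```

Only the first element of each planned sequence is ever executed; the rest of the sequence is discarded once the next sample arrives, which is what gives the scheme its feedback character.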

The Store-and-Forward Model
In 1963, Gazis and Potts introduced the store-and-forward traffic flow model with the aim of achieving a sensible compromise between computational effort and control precision in dynamic systems. A store-and-forward traffic flow model is used in this study to formulate a state space predictive control model; it helps to create a dynamic mathematical formulation of the network model (Guo et al., 2014). Figure 2 depicts the application of MPC to a UTC structure. The simplified store-and-forward traffic model only allows for split optimisation; cycle time and offsets must be calculated by other control algorithms.
A road network is represented as sets of junctions $j \in J$ and links $z \in Z$, as shown in Fig. 1. Each signalised junction $j$ has a set of outgoing links $O_j$ and a set of incoming links $I_j$. The sample urban road in Fig. 1 has two adjacent junctions $M$ and $N$, such that $z \in I_N$ and $z \in O_M$. The remaining fundamental variables are:
• $i$: the stage identifier
• $x_z(t)$: the state variable indicating the number of vehicles in link $z$ at step $t$
• $j$: the junction identifier
• $g_{j,i}$: the control input indicating the green time of stage $i$ at junction $j$
• $t$: the discrete time index, $t = 0, 1, 2, \ldots$
• $S_z$: the saturation flow of link $z$
• $v_z$: the set of stages where link $z$ has right of way
• $t_{w,z}$: the turning rate towards link $z$ from the links $w$ that enter junction $M$
• $T$: the control interval in discrete time steps
• $C_j$: the cycle time of junction $j$
The cycle times $C_j$ of all junctions $j \in J$ are assumed to be equal and fixed, such that $C_j = C$. Under this assumption, Equation 1 describes the dynamics of link $z$. Each link $z \in Z$ has an outflow capacity during its green times, represented by the saturation flow $S_z$. $S_z$ could be fixed to a standard value or calculated by another approach; we assume it is known and constant. The turning rates $t_{w,z}$, with $z \in O_j$ and $w \in I_j$, could be calculated in real time or estimated from statistical values. Assuming $T = C$, a further simplification of Equation 1 (replacing both its second and third terms) yields Equation 2, in which $p_z(t)$ represents the inflow to link $z$, $q_z(t)$ the outflow from link $z$, $d_z(t)$ the demand in link $z$ and $e_z(t)$ the exit flow in link $z$ over the sample interval $[tT, (t+1)T]$. The exit flow can be estimated by $e_z(t) = t_z\,p_z(t)$, assuming that the exit rates $t_z$ are known.
The resulting outflow is given in Equation 3. To reduce computational effort, red-green switching within a cycle is not taken into consideration in the model; the modelled flow instead represents the average real flow for each period.
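As a reading aid, the link dynamics described in this section can be sketched in the following form. This assumes the standard store-and-forward conservation relation and uses only the symbols defined here; the index sets of the sums ($w \in I_M$ for the links feeding $z$, and $i \in v_z$ for the stages giving $z$ right of way at junction $N$) are assumptions rather than this paper's exact notation.

$$x_z(t+1) = x_z(t) + T\left[\,p_z(t) - q_z(t) + d_z(t) - e_z(t)\,\right]$$

$$p_z(t) = \sum_{w \in I_M} t_{w,z}\, q_w(t), \qquad q_z(t) = \frac{S_z}{C} \sum_{i \in v_z} g_{N,i}(t)$$

Substituting the inflow and outflow expressions into the conservation relation makes the queue in link $z$ linear in the green times, which is the property that allows the whole network to be written in state space form.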
Equation 1 is a linear scalar equation for a single link. Organising the interconnected conservation equations of the individual links in state space form gives Equation 4, a state space model of the entire traffic network, in which $x(t)$ is the state vector (the number of vehicles in each link), $g(t)$ is the control vector (all green time settings) and $d(t)$ represents any disturbance within the network. $B$ is a constant coefficient matrix of appropriate dimensions that captures the network characteristics, for instance the network topology.
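A minimal sketch of one state space update is given below. It assumes the common additive form $x(t+1) = x(t) + B\,g(t) + d(t)$ with $T = C$, and a hypothetical two-link network in which link 1 feeds link 2. The saturation flows, turning rate, green times and demand are illustrative values, not data from the case study.

```python
def step(x, g, d, B):
    """One update x(t+1) = x(t) + B g(t) + d(t), written with plain lists."""
    return [
        xi + sum(bij * gj for bij, gj in zip(row, g)) + di
        for xi, row, di in zip(x, B, d)
    ]

# Hypothetical parameters: saturation flows in vehicles per second of green,
# and the share of link 1 outflow that turns into link 2.
S1, S2 = 0.5, 0.5
t12 = 0.8

# Row z of B holds the effect of each green time on link z:
# negative entries drain the link, positive entries feed it from upstream.
B = [
    [-S1, 0.0],
    [t12 * S1, -S2],
]

x = [40.0, 10.0]   # vehicles currently in link 1 and link 2
g = [30.0, 20.0]   # green seconds serving link 1 and link 2
d = [12.0, 0.0]    # vehicles entering from outside during the interval

x_next = step(x, g, d, B)
```

The off-diagonal entry of `B` is where the network topology appears: it is non-zero only because link 1 discharges into link 2.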

MPC Constraints on UTC
Given a UTC traffic model, several constraints have to be considered. The constraints are formulated from the store-and-forward model discussed in the previous section.

Non-Negative Control Constraints
At any given time $t$ there cannot be a negative volume of traffic flowing through link $z$. Also, the green split timing at any given junction must fall within the traffic light cycle at that junction:

$$g_{j,i} \ge g_{j,\min} \quad (5)$$

Traffic Light Cycle Constraints
The cycle constraint on the green times (Equation 6) holds for every stage $i$ at junction $j$, where $L_j$ represents the fixed lost time and $N_j$ the number of stages at junction $j$.

Green Duration Constraints
Equation 7 gives the lower and upper bounds on the green time at a junction:

$$g_{j,\min} \le g_{j,i} \le g_{j,\max} \quad (7)$$

where $g_{j,\max}$ represents the maximum permissible time and $g_{j,\min}$ the minimum permissible time at junction $j$.

Flow Conflict Constraints
This constraint avoids collisions between links at a junction. Given a set of connected links at a junction, only one link can be active at a time.

Non Negative Queue Constraints
The queue on a given link is restricted by the length of the link connecting two junctions:

$$0 \le x_z(t) \le x_{z,\max} \quad (8)$$

where $x_{z,\max}$ specifies the maximum number of vehicles that can be admitted into link $z$. This restriction helps to eliminate oversaturation of a link in rush hours. It also ensures that the queue length on the road is non-negative during the computation of the control input.

Capacity Constraints
The capacity of a link must not be exceeded. Thus, the number of vehicles leaving any link will be limited by the state and capacity of the downstream link.
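The constraints in this section can be checked against a candidate set of green times with a short routine such as the one below. This is a sketch under stated assumptions: the cycle constraint is taken as an inequality (green times plus lost time must not exceed the cycle time), which may differ from the exact form used in this work, and the function and argument names are hypothetical.

```python
def violated_constraints(green, lost_time, cycle, g_min, g_max, queues, queue_max):
    """Return the names of the constraints broken by a candidate plan for one junction.

    green     -- green time of each stage at the junction, in seconds
    queues    -- predicted number of vehicles per link
    queue_max -- maximum number of vehicles each link can hold
    """
    violations = []
    # Cycle constraint (assumed inequality): all greens plus lost time fit in the cycle.
    if sum(green) + lost_time > cycle:
        violations.append("cycle")
    # Green duration constraint: every stage stays within its bounds.
    if any(g < g_min or g > g_max for g in green):
        violations.append("green_bounds")
    # Queue constraint: queues are non-negative and within link capacity.
    if any(x < 0 or x > queue_max[z] for z, x in queues.items()):
        violations.append("queue_bounds")
    return violations

feasible = violated_constraints([30, 40], 10, 90, 10, 60, {"z1": 12}, {"z1": 40})
infeasible = violated_constraints([50, 45], 10, 90, 10, 60, {"z1": 55}, {"z1": 40})
```

A check of this kind is useful for validating controller output; inside an MPC optimiser the same conditions would normally be passed to the solver as constraints rather than tested afterwards.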

The Objective Function
The objective of this MPC formulation is to reduce the number of vehicles waiting in line at a junction. This is evaluated as the total time required to clear the vehicles waiting at the individual junctions within a network of connected links. Thus, to reduce queuing distance on links, Equation 9 represents a quadratic cost function, subject to Equations 4, 6 and 7, with the aim of minimising queues and optimising green times at a junction.
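One common quadratic form for such an objective is shown below. It is given as an assumption for illustration, not as the exact cost used in this work: it penalises the predicted queues $x$ and the green times $g$ over a prediction horizon of $N_p$ steps, and the weighting matrices $Q$ and $R$ are hypothetical tuning parameters.

$$\min_{g}\; J = \sum_{k=1}^{N_p} \left[\, x(t+k)^{\top} Q\, x(t+k) + g(t+k-1)^{\top} R\, g(t+k-1) \,\right]$$

The minimisation is carried out subject to the state space model and the green time and queue constraints of the previous section. A larger $Q$ relative to $R$ favours shorter queues at the expense of more aggressive changes in green time.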

Automated Planning
The ability to reason about the dynamics of life and its environment, creating and implementing plans to solve challenges, is one of the distinguishing qualities of the human race. Embedding this quality in artificial entities such as machines is the foundation of Automated Planning. AI planning is a field that involves the formation of sequential or partially ordered plans whose execution solves a given problem, taking an initial state or situation to a state that satisfies its specified goal conditions (Gupta et al., 1998; Fox and Long, 2003; Garrido et al., 2001). To embed a deliberative property in a control system, it is essential for the controller to be situationally aware of its components, its operating environment and the correlation between its components and environment (Jimoh et al., 2013b). This is accomplished through the extraction of the operational knowledge of a given domain, in this case a road traffic domain. The extracted knowledge is declaratively represented in a language that can be understood by the planning system. The domain knowledge employed in the implementation of this work is represented in a language that is close to PDDL+ (Fox and Long, 2006). This structured language provides a formal declarative representation of the problem and domain entities along with all the operational policies of the domain.
Given a domain of interest with facts and a description of the environment and the problems within that domain, a UTC model can be defined as a symbolic system that has inference and rules representing the domain of interest. Traffic flow models are of three distinct types: macroscopic, microscopic and mesoscopic. Refer to the work of Hoogendoorn and Bovy (2001) for a detailed overview of existing traffic models. A macroscopic model is considered in this analysis through the use of aggregated variables to describe traffic flows.
The syntax and semantics of the domain description language used in this implementation are similar to those of PDDL+. A detailed explanation of PDDL+ syntax and semantics is given by Fox and Long (2006); this includes the semantics for the construction and implementation of state representation and progression.
A domain model has been encoded from a case study town centre area in the United Kingdom, as shown in Fig. 3. The domain model is made up of a static part and a dynamic part (Jimoh et al., 2013a). The static part represents the road network topology, such as road names, road capacities, road lengths and the junctions linking the roads. A directed graph is used to represent the road network layout: edges represent roads, and vertices represent source roads, sink roads or junctions. Vehicles enter the network through a source road and exit the network through a sink road. The dynamic part of the model is represented by the flow rate of vehicles on each road and the queuing distance on each road. It changes continuously due to the movement of vehicles in the road network. The UTC environment is modelled with predicates and fluents. The relationships between objects are represented with predicates. For example, a predicate (link nLSouth wDStr) in a state S indicates that the road nLSouth is connected to wDStr in that state. Thus, traffic is allowed to flow from nLSouth to wDStr, provided all given constraints are satisfied. Fluents can be logical or numeric; their status is subject to change within the model. Rich numeric expressions are possible with the use of numeric fluents. For example, the fluent (= (queueLenght nLNorth) 300.0) indicates that the current value of the queue in nLNorth is 300 m. A UTC planning problem involves the effective navigation of vehicles within a network of roads with the purpose of optimising traffic flow. In our model, we consider action operators, grounded processes and events. Figures 4-6 show sample declarations of an action operator, a grounded process and an event respectively.
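The split between the static and dynamic parts of the model can be mirrored in a few lines of Python. This is an illustrative sketch only: the planner described here uses a PDDL+-like language rather than Python, the second link and the first two queue values are invented for the example, and the key name `queue_length` is a hypothetical stand-in for the model's own fluent name.

```python
# Static part: the directed road graph, expressed as ground predicates.
predicates = {
    ("link", "nLSouth", "wDStr"),
    ("link", "wDStr", "nLNorth"),   # hypothetical second link for illustration
}

# Dynamic part: numeric fluents keyed by (fluent name, road), in metres.
fluents = {
    ("queue_length", "nLSouth"): 120.0,
    ("queue_length", "wDStr"): 45.0,
    ("queue_length", "nLNorth"): 300.0,
}

def downstream(road):
    """Roads that traffic may flow into from `road`, according to the link predicates."""
    return sorted(b for (name, a, b) in predicates if name == "link" and a == road)

def queue_length(road):
    """Current value of the queue fluent for `road`."""
    return fluents[("queue_length", road)]
```

Predicates only ever appear or disappear, whereas fluents carry values that the processes of the model update over time.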

The Hybrid Approach
Exploiting the relationship between MPC and AI planning techniques, and building on their synthesis to solve problems involving both discrete and continuous state variables, lies at the heart of this research work. The hybrid approach uses an A* search technique for node exploration. A node is a point within the search space where search frontiers intersect or branch. State information and transitions are also stored in a node. The current node is expanded by comparing the preconditions of each operator with the propositions and numeric fluents at the node; if the preconditions are satisfied and all other constraints are fulfilled, the effect of the operator is applied at the node. The declared numeric resources and constraints within the model are computed and updated at selected nodes during node exploration. Applicable operators are chosen and applied, in a receding horizon, to each state until the goal condition is satisfied or the expanded set of nodes becomes empty. Some essential definitions in the design and implementation of the hybrid algorithm are explained in the next section.

Preliminary Definitions

Definition 1 (State)
Under the Closed World Assumption (CWA) on S, a state S describes the true situation of some world at a given snapshot of time. Given that N is an assignment of values to the numeric variables and P is the set of atomic propositions, S is a pair 〈P,N〉.

Definition 2 (Initial State)
Given that N is an assignment of values to numeric variables and P is the set of atomic propositions that are true at the start of a planning problem, the Initial State is I = 〈P,N〉.

Definition 3 (Goal Condition)
Given that N is a set of numeric variables and P is a set of atomic propositions, a Goal Condition is G = 〈P,N〉. For a goal G to be satisfied in some state S, each value v must satisfy the numeric constraint vL < v < vU specified by G. Thus, S satisfies the goal condition if S satisfies every proposition in P and ∃v = c ∈ N: vL < c < vU, where c is a constant representing a value between the lower and upper bounds of v.
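The goal test in Definition 3 can be sketched as follows, assuming a state is held as a set of ground propositions plus a dictionary of numeric values, and a goal as a set of propositions plus strict lower and upper bounds per variable. The function and variable names are hypothetical.

```python
def satisfies(state_props, state_nums, goal_props, goal_bounds):
    """True if every goal proposition holds and each bounded variable lies strictly
    between its lower and upper bound."""
    if not goal_props <= state_props:
        return False
    return all(
        var in state_nums and lo < state_nums[var] < hi
        for var, (lo, hi) in goal_bounds.items()
    )

props = {("green", "j1"), ("link", "roadA", "roadB")}
goal_props = {("green", "j1")}
goal_bounds = {"queue_roadA": (0.0, 50.0)}

reached = satisfies(props, {"queue_roadA": 40.0}, goal_props, goal_bounds)
not_reached = satisfies(props, {"queue_roadA": 60.0}, goal_props, goal_bounds)
```

Under the closed world assumption, a proposition that is absent from the state set is treated as false, so the subset test is sufficient for the propositional part.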

Definition 4 (Domain Model)
The Domain Model (DM) consists of:

Definition 5 (Action)
An instantaneous action is characterised by a set of preconditions that must be true prior to the execution of the action and a set of effects that become true after its execution. The logical basis for actions is modelled using a collection of propositions P together with a vector v of numeric variables. Both P and v are referred to and manipulated by actions. The executability of an action is determined by its preconditions.
For example, the action switch to green has the precondition that the light is red and the effect that the light is green. A durative action A has three sets of preconditions: the conditions that must hold at the start, $\mathit{pre}_{\Leftarrow}A$; at the end, $\mathit{pre}_{\Rightarrow}A$; and throughout the execution of the action, $\mathit{pre}_{\Leftrightarrow}A$. Effects can be durative or instantaneous. Instantaneous effects are bound to the start of the action, $\mathit{eff}^{+}_{\Leftarrow}$ and $\mathit{eff}^{-}_{\Leftarrow}$, or to its end, $\mathit{eff}^{+}_{\Rightarrow}$ and $\mathit{eff}^{-}_{\Rightarrow}$, where the positive and negative superscripts denote the propositions added and deleted at the start and end of A respectively. Numeric effects $\mathit{eff}^{num}_{\Leftarrow}$ and $\mathit{eff}^{num}_{\Rightarrow}$ are likewise applied at the start and end respectively. An example of an action declaration is shown in Fig. 4.

Definition 6 (Processes)
A process p comprises a precondition C and a set of continuous effects E, such that if S |= C then the continuous effects are active in state S.
Consider, for instance, the inflow process of vehicles V to a road R through a junction J. This process has the precondition that a given phase at junction J is active (that is, 'Green') and that the road use level of R is less than the road capacity level, together with the constraint that J is a connected inflow junction to road R. Once R is filled or blocked, an event is triggered that stops the process. The effect of the inflow process is to increase the traffic level of R at the flow rate of V, as shown in Fig. 5. The derivative of the traffic level in R is the sum of the rates of the active inflow processes at any given time.

Definition 7 (Event)
An event e is activated in a state S such that S |= C, where C is an assertion expressing what triggers the event e. Given that E describes the effects of event e, the event is defined as a state transition (C, E). The application of effect E to state S produces a new state s′ such that s′ |= E. For example, for the event 'upstreamFilled' to be triggered, the estimated number of vehicles on a road must be equal to or greater than the capacity limit of that road, as shown in Fig. 6.

Definition 8 (Operators)
Given a set of propositions P(s) and numeric fluents N(s), a numeric operator is δ = 〈pre(δ); eff(δ)〉, where the condition for applicability pre(δ) of an operator δ consists of:
• a proposition or set of propositions $\mathit{pre}_{prop}(δ)$ defined over P
• a numeric comparison or set of numeric comparisons $\mathit{pre}_{num}(δ)$ of the form (exp {>, ≥, <, ≤, =} exp′)

Definition 9 (Operators Applicability)
An operator δ is applicable in a state s iff s satisfies the operator's propositional and numeric preconditions. That is:
• $\mathit{pre}_{prop}(δ)$ ⊆ P(s), and
• $\mathit{pre}_{num}(δ)$ is valid (i.e., equal to or within the range of values) for all n ∈ N(s)
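The applicability test can be sketched in Python as follows. The sketch assumes that each numeric precondition is a triple of fluent name, comparison symbol and constant; general expressions on both sides of the comparison are left out for brevity, and the names used are hypothetical.

```python
import operator

# Map each comparison symbol allowed in a numeric precondition to its function.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}

def applicable(pre_prop, pre_num, props, nums):
    """True if all propositional preconditions are in the state and every
    numeric comparison (fluent, symbol, constant) holds."""
    if not set(pre_prop) <= props:
        return False
    return all(
        fluent in nums and COMPARATORS[symbol](nums[fluent], constant)
        for fluent, symbol, constant in pre_num
    )

state_props = {("link", "roadA", "roadB"), ("red", "j1")}
state_nums = {"level_roadB": 12.0, "capacity_roadB": 40.0}

can_switch = applicable(
    pre_prop=[("link", "roadA", "roadB"), ("red", "j1")],
    pre_num=[("level_roadB", "<", 40.0)],
    props=state_props,
    nums=state_nums,
)
```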

Definition 10 (Plan)
A plan comprises action sequences and initiated processes that lead from the initial state to a state satisfying the goal conditions, taking all the stipulated constraints into consideration. A continuous planning problem is given as Ψ = {I, G, DM}, where I is the initial state, G is a set of goal conditions and DM comprises a set of operators. A solution for Ψ is a totally ordered set of operators δ, such that the ordered execution of these operators transforms I into a state where G is satisfied.

UTCPLAN: Top Level Algorithm
The planner takes five inputs: (a) the initial state, (b) the goal condition, (c) the domain model, (d) the prediction horizon value and (e) the control horizon. The initial state "S" comprises a set of propositions "P" and a sequence of numerical variables "R". The goal condition "G" is satisfied in a state S if S satisfies every proposition in P and ∃v = c ∈ N: vL < c < vU for all v in N, where c is a constant representing a value between the lower and upper bounds of v. The components of the domain model are detailed in the preliminary definitions.
The fixed prediction horizon value N p represents the period for which the MPC component generates new future prediction values to guide the search. The control horizon value N c represents the number of node frontiers that are searched in every control horizon window after an MPC prediction episode. N p and N c are tailored to the domain and to the nature of the problem that the planner is intended to solve.
A node is initialised in Lines 1-2. Four components constitute a node in the search space: (a) the set of propositions "P" of "S"; (b) the numerical variables in the "R" component of "S"; (c) the variable ℑ, which updates and saves the dynamic prediction values over successive horizons and is initially set to null; and (d) a partial plan.
The search space is initialised within the outer loop of Line 4. Line 5 uses the MPC numeric optimisation and prediction process to generate numeric control variables. The output of Line 5 can be read as a set of predicted actions whose execution fulfils the stipulated objective function and guides the search towards satisfying the goal condition. The inner loop of Lines 6-11 expands the search frontiers over a fixed horizon window N c . The selection of the best node is informed by the output of the UtiliseMPC procedure. The node closest to the trajectory specified by the partial plan in the current ℑ is picked as the best node "n" and removed from "Q". The selected node "n" is expanded in Line 8, which returns a set of successor nodes "N". Line 9 adds "N" to the open set, as detailed in Algorithm 5.3. There are currently no built-in heuristics specifically for pruning the search space in UTCPLAN.

Algorithm 1 UTCPLAN: Top Level Algorithm
If the goal condition is not met upon exit from the inner loop of Lines 6-11, the best node is retrieved from Q, informed by ℑ. The best node "n" becomes the start node of a new search for the next control horizon window. Selecting a single node may make the algorithm incomplete, but it restricts the search and uses the guidance of the MPC approach to select the best node, thereby pruning the search space. The search and optimisation procedure is repeated from the current node in Line 15 until the goal conditions are satisfied or the open node set becomes empty.
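The interplay between the prediction step and the windowed search can be illustrated with a heavily simplified sketch. It is not the UTCPLAN implementation: the state is a single queue length, the MPC step is replaced by a hypothetical `predict_target` function that returns a desired queue value, and the operator names and effects are invented for the example.

```python
def utcplan_sketch(x0, is_goal, operators, predict_target, n_c, max_windows=50):
    """Windowed best-first search guided by a predicted target value.

    A node is a pair (numeric state, partial plan). After each window of n_c
    expansions, only the node closest to the target is kept as the new start.
    """
    start = (x0, [])
    for _ in range(max_windows):
        if is_goal(start[0]):
            return start[1]
        target = predict_target(start[0])   # stand-in for the MPC prediction step
        open_set = [start]
        for _ in range(n_c):                # one control horizon window
            if not open_set:
                return None
            node = min(open_set, key=lambda n: abs(n[0] - target))
            open_set.remove(node)
            state, plan = node
            if is_goal(state):
                return plan
            open_set.extend((op(state), plan + [name]) for name, op in operators.items())
        if not open_set:
            return None
        # Keep a single best node; this prunes the search but may lose completeness.
        start = min(open_set, key=lambda n: abs(n[0] - target))
    return None

operators = {
    "long_green": lambda x: max(0, x - 5),
    "short_green": lambda x: max(0, x - 2),
    "hold": lambda x: x + 1,
}

result = utcplan_sketch(
    x0=12,
    is_goal=lambda x: x <= 2,
    operators=operators,
    predict_target=lambda x: max(0, x - 10),
    n_c=2,
)
```

In this toy run the target steers the search towards the operator that shortens the queue fastest, so the two-step window is enough to reach the goal without exploring the other branches.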

Nodes Expansion
The current node n is expanded by selecting the appropriate operator that satisfies the conditions at the node. The effect of the operator changes the state at the node from 'n' into a new state 'N', as explained in Algorithm 5.3. The procedures for the application of an action, the initiation of a process and the triggering of an event are explained in Algorithm ?? respectively. Certain assumptions are made with regard to event semantics. For instance, the order of occurrence of simultaneous events makes no difference. The detailed procedure for the application of an operator, a grounded process and an event is explained in Jimoh (2015).

Action Application
Definition 11 (Apply Action). Given an action a and a state s, if a is applicable in s, then a new state s′ is produced and denoted by s[a], as shown in Algorithm 3.
An action consists of logical or numeric preconditions. The effect of an action operator can be logical propositions, numeric updates of the current state after the execution of the action, or both. An example is given in Fig. 4. The action 'switchGreen' has a logical precondition that 'roadA' and 'roadB' must be connected at the same junction. The two roads are also controlled by the same signal phase. The action in Fig. 4 also indicates a numeric precondition of an interrupt level of seven for the linked roads. This means that the connected roads must not be congested. The action effect alters the signal phase at this junction, which consequently initiates a flow process at the connected junction.

Algorithm 2 Expand(n) Algorithm
[...] ++n′.S)}
end for
P := {p′ | p′ is an instantiation of some process p ∈ DM and n makes p′.pre true}
for all p ∈ P do
    n := apply p for a unit of time to n
end for
N := N ∪ {n}

Algorithm 3 Action Application
Input: s, a
Output: s′
1: s′ is initialised to be s
2: All propositions in eff⁺(a) that are not already in s are added to P(s)
3: All propositions in eff⁻(a) are deleted from P(s)
4: All numeric fluents f where (f, op, exp) ∈ eff_num(a) are updated
5: Any state s ∈ S obtained by a non-applicable operator is undefined and does not satisfy any condition
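The steps of Algorithm 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the UTCPLAN implementation (which is written in Java): the State and Action classes, the operator names in OPS and the example propositions are hypothetical, numeric preconditions are omitted, and an undefined state is represented by None.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical numeric update operators for (fluent, op, value) triples.
OPS = {
    "assign": lambda old, value: value,
    "increase": operator.add,
    "decrease": operator.sub,
}


@dataclass
class State:
    props: set                      # logical propositions P(s)
    fluents: dict                   # numeric fluents


@dataclass
class Action:
    pre: set                        # logical preconditions
    add: set                        # eff+
    delete: set                     # eff-
    num_eff: list = field(default_factory=list)  # (fluent, op, value)


def apply_action(state, action):
    """Return s' = s[a], or None when the action is not applicable."""
    if not action.pre <= state.props:
        return None                                    # step 5: undefined
    new = State(set(state.props), dict(state.fluents))  # step 1: copy s
    new.props |= action.add                            # step 2: add eff+
    new.props -= action.delete                         # step 3: delete eff-
    for fluent, op, value in action.num_eff:           # step 4: numeric updates
        new.fluents[fluent] = OPS[op](new.fluents[fluent], value)
    return new


# Toy usage with made-up propositions and a made-up green-time fluent.
s = State({"connected roadA roadB", "phase-red"}, {"green_time": 0})
switch_green = Action(
    pre={"connected roadA roadB", "phase-red"},
    add={"phase-green"},
    delete={"phase-red"},
    num_eff=[("green_time", "assign", 20)],
)
s2 = apply_action(s, switch_green)
```

The sketch copies the state before modifying it, so the original node remains available to the search if the successor is later discarded.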

Simulate Process
Definition 12 (Simulate Process). Given a ground process c and a state s, such that c is applicable in s, the application of c in s, denoted by s[c⁺], simulates continuous numeric changes in s for a period of time, as shown in Algorithm 4.
Whenever a process is initiated within a given node, it runs for a period of time that is discretised into unit steps. For instance, time t takes the values t = 1, 2, 3, ..., t_n, where t_n is the duration of the process simulation. Processes are initiated as an effect of an action or an event trigger. The preconditions of a process are logical conditions or numeric inequalities, but its effects produce a numeric update of the current state at the node. For instance, the effect of the action "switchGreen" in Fig. 4 could initiate a vehicle flow process at the flow rate of traffic on the connected roads, as depicted in Fig. 5. Once a process is initiated at a node, it runs continuously for the specified duration unless it is halted by an event. The current numeric status of the process is updated at the node upon the completion or halting of the process.
Algorithm 4 Simulate Process
Input: s, c
Output: s′
1: initialise process duration time count = dur
2: repeat
3:     All numeric fluents f such that (f, op, exp) ∈ eff_num(c) are updated and modified according to the defined op and exp involved
4:     Time #t and other primitive numeric variables are updated
5: until event e is triggered or dur is exceeded
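A minimal Python sketch of the repeat-until loop of Algorithm 4 is given below. The function name simulate_process, the representation of process effects as per-second rates, and the reading of "dur exceeded" as "dur unit steps have elapsed" are assumptions made for illustration only.

```python
def simulate_process(fluents, rates, dur, event_triggered, dt=1.0):
    """Advance numeric fluents step by step until an event fires or dur elapses.

    fluents: dict of numeric fluents (copied, not modified in place)
    rates: dict mapping a fluent name to its change per unit of time
    event_triggered: predicate on the updated fluents (hypothetical interface)
    """
    state = dict(fluents)
    state.setdefault("time", 0.0)
    elapsed = 0.0
    while True:                              # repeat ... until
        for name, rate in rates.items():     # step 3: numeric effects
            state[name] += rate * dt
        state["time"] += dt                  # step 4: time update
        elapsed += dt
        if event_triggered(state) or elapsed >= dur:  # step 5
            return state


# Toy usage: a queue of 10 vehicles drains from roadA into roadB at 2 veh/sec,
# and a made-up capacity event halts the flow once roadB holds 6 vehicles.
start = {"roadA": 10.0, "roadB": 0.0}
flow = {"roadA": -2.0, "roadB": 2.0}
halted = simulate_process(start, flow, dur=5, event_triggered=lambda s: s["roadB"] >= 6)
completed = simulate_process(start, flow, dur=5, event_triggered=lambda s: False)
```

In the first call the event interrupts the process after three steps; in the second the process runs for its full duration.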

Event Application
Definition 13 (Apply Event). Given a ground event e and a state s, such that e is applicable in s, the application of e in s, represented by s[e], leads to a new state s′, as shown in Algorithm 5.
Event application shares some similarities with an action operator. The key difference is that an action may occur if its preconditions hold, whereas an event must occur if its preconditions hold. An event in the domain can be triggered internally, from within a process, or externally, outside the control of a process. Internally triggered events are interrupts that are activated while a process is running; their preconditions are usually numeric inequalities and their effects are numeric assignments. These numeric assignments are set as preconditions for some actions in the domain. This means that the interrupts tell the planner to execute an action that could change the state of the system or flag a display.
An example of an event is one that manages the constraint of traffic spill-over at junctions during rush hour, as shown in Fig. 6. Its precondition checks the capacity of the connected road during the process of traffic flow at a junction. The effect of this event stops the currently running process from transferring the queue to the upstream road. This is achieved by an interrupt trigger that halts the process and pushes the current state of the node to the priority queue.
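The "must occur" semantics of events can be sketched in Python as below. The function fire_events, the tuple layout of an event and the fluent names of the spill-over example are hypothetical; the sketch also relies on the stated assumption that the order of simultaneous events does not matter.

```python
def fire_events(state, events):
    """Apply every event whose precondition holds; return the names fired.

    events: list of (name, precondition, effect) tuples (hypothetical layout).
    Unlike actions, events are not chosen by the planner: any event whose
    precondition holds is applied.
    """
    fired = []
    for name, precondition, effect in events:
        if precondition(state):
            effect(state)
            fired.append(name)
    return fired


def spill_over_pre(state):
    # Numeric inequality: the connected road has reached its capacity.
    return state["queue_connected"] >= state["capacity_connected"]


def spill_over_effect(state):
    # Numeric assignments: halt the flow process and raise an interrupt flag.
    state["flow_running"] = 0
    state["interrupt"] = 1


events = [("spill-over", spill_over_pre, spill_over_effect)]

congested = {"queue_connected": 12, "capacity_connected": 10,
             "flow_running": 1, "interrupt": 0}
fired_congested = fire_events(congested, events)

free = {"queue_connected": 3, "capacity_connected": 10,
        "flow_running": 1, "interrupt": 0}
fired_free = fire_events(free, events)
```

An action would instead be offered to the search as one option among several, and applied only if the planner selects it.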
Externally triggered events are the result of interaction between domain objects. An example of such an external event is the activation of connectors that link two separate roads. Once the condition for the connector is satisfied, the queue from the previous road flows to the connected routes. This is outside the control of a junction, but the ripple effect of such an event (traffic flow) affects the queues downstream of the junctions. The difference between this connecting event and an action is that once the event's precondition is satisfied, the event has to be activated, computed and applied to the current state, whereas an action is selected only if it is necessary to get the state closer to the goal state.

[...] (look ahead table) for a duration of control horizon N_p window within the UtiliseMPC procedure. A numeric optimisation procedure takes into consideration all constraints in the domain DM and the generated values from the prediction table to compute the best control values ℑ within the horizon window N_c, over a period of N_p. The computed value ℑ is then updated at the node n and used as a guide for the next set of alterations.
The numeric optimisation procedure is implemented as a Satisfiability (SAT) problem solver, as previously used in AI planning by Shin and Davis (2005) and Audemard et al. (2002), such that the continuous numeric variables with their associated constraints are converted to a linear programming problem within the search node. The best combination of inputs satisfying the stipulated numeric constraints is returned and updated at the node. As an example, assume that N_c is set to 300 node counts and N_p is set to 30 sec. At every 300 node counts, the planner retrieves the past numeric fluents, sends them to the UtiliseMPC procedure and updates the result at the node. This means that the past numeric fluents are used to generate a new set of predicted numeric values over a prediction horizon of 30 sec. The newly predicted set of values serves as input to the numeric optimiser, which obtains the best numeric combination to be used during the next successive search frontiers.
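The predict-then-optimise step can be illustrated with a small Python sketch. It deliberately swaps the SAT and linear programming formulation for a brute-force comparison of a few candidate green splits, and it assumes a simple linear queue model for a single two-phase junction. The names predict_queues and utilise_mpc, the model and all numbers are assumptions, not the UtiliseMPC procedure of the paper.

```python
def predict_queues(queues, arrivals, flow_rate, split, horizon):
    """Build a look-ahead table of (north_south, east_west) queues.

    split is the fraction of each second given to the north-south phase;
    the east-west phase receives the remainder (assumed model).
    """
    ns, ew = queues
    table = []
    for _ in range(horizon):
        ns = max(0.0, ns + arrivals[0] - flow_rate * split)
        ew = max(0.0, ew + arrivals[1] - flow_rate * (1.0 - split))
        table.append((ns, ew))
    return table


def utilise_mpc(queues, arrivals, flow_rate, horizon, candidates):
    """Return the candidate split with the smallest total predicted queue."""
    def cost(split):
        table = predict_queues(queues, arrivals, flow_rate, split, horizon)
        return sum(ns + ew for ns, ew in table)
    return min(candidates, key=cost)


# Toy usage with a 30-second prediction horizon and three candidate splits.
splits = [0.2, 0.5, 0.8]
best_for_ns_queue = utilise_mpc((30.0, 0.0), (0.0, 0.0), 1.0, 30, splits)
best_for_ew_queue = utilise_mpc((0.0, 30.0), (0.0, 0.0), 1.0, 30, splits)
```

In a planner following the description above, such a routine would be called once every N_c expanded nodes, and the returned value would be stored at the node to guide the next window of search.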

Implementation Assumptions
It is assumed that a continuous approximation of the numeric counts (queue lengths) is maintained within the network. This is obtained at different levels of abstraction based on the following: the route (R) explored by the planner during search; the queue (Q), denoting the numeric value of each road object at any instant of time; the source (Sc), which represents the roads entering the network; and the sink (Si), which represents the exit roads. Vehicles originate from a source, pass through roads, connectors and junctions, and end up in a sink.
A road is either active or inactive at every time instant. Vehicles are assumed to move on an active road at a flow rate expressed in vehicles per second (veh/sec). We assume that the flow rates of the roads are known and fixed at the initial state. The flow rate of an inactive road is assumed to be zero, since no vehicles move on such a road.
Each junction has two phases (1 and 2). Traffic can move from north to south or from east to west at junctions. Two conflicting roads cannot be activated at the same time at a junction. The domain model incorporates declarative descriptions of grounded events that monitor the movement of traffic within linking roads. The planner selects the appropriate green phase duration to control the traffic on the roads connected at a junction.
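The two-phase assumption and the zero flow rate of inactive roads can be expressed in a few lines of Python. The road names, the PHASES table and the function effective_flow_rates are hypothetical and serve only to illustrate the assumptions listed above.

```python
# Hypothetical mapping from signal phase to the road it activates.
PHASES = {1: "north_south", 2: "east_west"}


def effective_flow_rates(phase, base_rates):
    """Return the flow rate (veh/sec) of each road under the given phase.

    Only the road served by the current phase is active; the conflicting
    road is inactive and therefore has a flow rate of zero.
    """
    active = PHASES[phase]
    return {road: (rate if road == active else 0.0)
            for road, rate in base_rates.items()}


base = {"north_south": 0.5, "east_west": 0.4}
phase1 = effective_flow_rates(1, base)
phase2 = effective_flow_rates(2, base)
```

Because a junction is always in exactly one phase, the two conflicting roads can never both have a non-zero flow rate in this representation.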
All dynamic inputs, such as turning rates, are assumed constant, with the exception of the state variables (x_z(t)) and the controlled variables (g_{j,i}). The flow rate of individual junctions is also assumed to be constant. The rate of flow of vehicles is expressed in vehicles per second (veh/sec). We assume that we cannot control drivers' behaviour; thus, we only control the green split (the controlled variable). We also assume that the traffic flow dynamics are fully defined and included in the domain file.
We consider a linearised version of the quadratic problem, which simplifies real-time calculations. Linearised methods often lead to suboptimal solutions and cannot consider the limits of some constraints exhaustively. Therefore, exploring more complex optimisation solutions that scale better is preferred for future work. The main objective of this implementation is not to scale the output metrics, but to investigate the feasibility of using our UTCPLAN approach in the domain of interest (UTC).

Evaluation
The main evaluation criterion is to show that UTCPLAN can accept expressive domain descriptions within the urban traffic domain as input and output solution plans containing continuous processes, events and actions through the integration of MPC with AI search-based planning techniques. This is measured by creating an expressive description of a UTC domain with traffic flow problems of varying complexity, to test whether UTCPLAN can generate execution plans that control and manage the traffic situation based on specified traffic goals.
The experimental traffic network (domain) is designed to have more than one connected junction in order to test the centralised reasoning of UTCPLAN in managing upstream and downstream traffic from connected roads to the junction. This also allows us to test the feasibility of junction-to-junction traffic relationships within the network. Each junction in the model is designed to have more than one signal phase, for the purpose of evaluating the effectiveness of UTCPLAN at splitting the green times of the signal phases within a junction. The network model also contains several connected roads without a signalled junction, for the purpose of evaluating the effectiveness of UTCPLAN at reasoning with the dynamics of traffic flow in linked roads that are not directly controlled by a signalled junction.
The effectiveness of the embedded MPC approach in the UTCPLAN algorithm is tested with sample traffic domains, to evaluate the performance of UTCPLAN at controlling the signalled junctions while optimising the flow of traffic within the given network during unexpected changes to the traffic situation. To achieve this, two signal configurations were created for experimental purposes:

Fixed
Signal durations are fixed for every junction within the network. The planner cannot alter the signal duration during search. The planner reasons with the domain and problem information to generate solution plans using the fixed signal value at every junction.

Controlled
Signal control is entirely at the discretion of the planner. The signal durations are set in the initial state; however, the planner alters the signal duration during search whenever it anticipates a better control performance, using the embedded MPC approach.
The speed of UTCPLAN was assessed with different volumes of traffic with bottlenecks, to investigate the plan generation time during light and heavy traffic situations. Numerous traffic flows were generated by altering the values of the queuing distance on roads to create a heavier flow of traffic in the test suite. The quality of the plans generated by UTCPLAN was evaluated for both the controlled and the fixed signal experiments. This is achieved by computing the total number of executable actions and initiated processes within the output plans, for both the fixed and the controlled signal.

Evaluation Criteria
To investigate the applicability and effectiveness of UTCPLAN, we use three evaluation criteria for comparison: the total time taken to generate a plan, the average number of processes initiated and the average number of actions in the output plan. Makespan is not considered among these criteria because this implementation does not include a scheduler for makespan optimisation in the plan. Thus, makespan would not be a suitable major metric for the evaluation of the planner.
A variation of UTCPLAN was created for the purpose of comparison and experimental analysis. This variation is a version of the planner without the MPC integration. It produces a Fixed Signal approach and reasons with numerics within the domain in a manner similar to a classical numeric planner (Hoffmann, 2003). The Fixed Signal and the Controlled Signal versions are tested with the same formulation of domain and problems. Several traffic problems of increasing complexity were abstracted and modelled within the UTC domain. The modelled traffic problems are suitable for the evaluation of UTCPLAN because they highlight the advantages of the controlled signal (with MPC integration) over the fixed signal approach. A time discretisation of t = 1.0 is used in both test cases (Fixed and Controlled) and for all tasks in the UTC domain.
The time taken to solve problems in our experiment is shown in Fig. 7. The performance of the planner (controlled signal) is compared with the fixed signal value. The results of the fixed time duration compared with the controlled approach are reported in Table 1. Given that x_2 is the new average value and x_1 is the previous average value, the percentage change in value, y%, is measured by Equation 10 and recorded in Table 1. This helps to visually illustrate the trend in plan quality of both the fixed and the controlled experiments. A decreasing (↓) trend in the value of y implies a good quality plan, while a continuous increase (↑) in the value of y means that the planner output is affected by the complexity of the problem in the domain. The more complex the problem becomes, the greater the challenge of generating a quality plan in a reasonable time. Moreover, when y is zero, the output plan is steady and stable despite an increase in problem complexity.
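As a worked illustration of the percentage change measure, the short Python sketch below assumes that it takes the standard form y = 100 (x_2 − x_1) / x_1, which is consistent with the definitions of x_1 and x_2 given above. The function names and the sample values are made up for illustration and are not taken from Table 1.

```python
def percentage_change(x1, x2):
    """Percentage change from the previous average x1 to the new average x2.

    Assumes the standard definition 100 * (x2 - x1) / x1.
    """
    return 100.0 * (x2 - x1) / x1


def percentage_changes(averages):
    """Percentage change between each pair of consecutive averages."""
    return [percentage_change(a, b) for a, b in zip(averages, averages[1:])]


# Made-up averages for three problems of increasing complexity.
changes = percentage_changes([40, 50, 50])
```

A value of zero between two consecutive problems corresponds to the steady case described above, in which the metric does not grow even though the problem has become harder.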

Test Environment
The UTCPLAN algorithm is implemented in Netbeans Java 8.0, which involved the creation of a continuous planner with an embedded MPC approach. The domain and problem representations (traffic descriptions) are also developed in Java to facilitate easy data transfer between the planner and the network information description. The experiments were run on Ubuntu 15.04 on an Intel Core i7 at 2.20 GHz with 16 GB of RAM.

Result
The plan contains the sequence of action operators needed to optimise traffic flow within an urban traffic network until the goal condition is satisfied. Figure 7 shows an excerpt of a sample plan generated by UTCPLAN for a controller to solve a UTC control problem instance.

Empirical Analysis
An output plan is the sequence of steps needed to reach a goal condition from an initial problem situation. The total length of a plan for a given problem varies from planner to planner. The shorter the generated plan, the better its quality: the fewer the actions and processes needed to achieve a goal condition, the better the quality of the plan for that problem domain.
The average total time taken to generate a plan is a metric that shows the efficacy and speed of the planner. The total time depends mainly on the planner algorithm. It also depends on other factors, such as the language used to implement the planner and the hardware configuration of the system on which the planner runs. The faster the goal condition is achieved, the lower the total time to generate a plan, and vice versa. The total time taken to generate a plan is an essential criterion for the evaluation of planners in AI planning. A planner is effective in a problem domain if the total time to generate a plan for problems in that domain remains steady and stable. However, if the total time to produce a solution increases astronomically with the complexity of the problem, the planner might get stuck in certain problem situations in that domain. Table 1 presents the percentage rate of increase in queues within the network and the effect of those increases on the average total time, as illustrated in Fig. 7. The average total time required to generate a plan varies with the queuing distance and the green split values. With a fixed signal, the percentage change in total time increases with queue length. With a controlled signal, however, the percentage change in total time remains remarkably low as queue length increases.
The trend in the percentage change in the average number of processes initiated by the generated plans is also shown in Table 1. With a fixed signal, the percentage change in the average number of processes increases with queue length. However, when the signal is controlled by UTCPLAN, the percentage change in the average number of processes is reduced to zero percent despite an increase in queue length. It increases slightly when the queue length approaches 200 m but later drops back to zero percent despite a further increase in queue length. The total number of processes initiated by the planner to achieve the goal condition increases with the congestion rate whenever the signal is fixed, as shown in Fig. 8. However, when the green split is controlled by the UTCPLAN approach, the changes are minimal and often become steady despite the increasing queues in the network.
Similarly, Table 1 shows the trend in the percentage change in the average number of action operators within the plans. With a fixed signal, this increases with queue length. However, when the signal is controlled by UTCPLAN, the percentage change in the average number of action operators is reduced to zero percent despite an increase in queue length. The total number of actions generated by the planner to achieve the goal condition increases with the traffic congestion rate whenever the signal is fixed. However, when the green split is controlled by the UTCPLAN approach, the changes are again minimal and often become steady despite the increasing queues in the network, as illustrated in Fig. 9.

Discussion
The percentage change in output value gives a visual illustration of the trend in plan quality of both the fixed and the controlled experiments. A decreasing (↓) trend in the output value implies a good quality plan, while a continuous increase (↑) in the output value means that the planner output is affected by the complexity of the problem in the domain. The more complex the problem, the greater the challenge of generating a quality plan in a reasonable time. Moreover, when the percentage change in output value is zero, the output plan is steady and stable despite an increase in problem complexity, as illustrated by Figs. 7-9.
Stability in plan metrics cannot be achieved by a planner with a fixed duration. It can only be achieved by a planner that can establish a unique approach to numeric fluents during search. The stability in the controlled output plan metric is achieved through the novel integration of the MPC approach with AI planning. This implies that the time to generate a valid plan, as well as the quality of the plan generated, becomes stable at some point irrespective of the increase in complexity of the problem domain. For instance, the results show that a controlled approach is required to optimise any traffic situation. The effectiveness of the UTCPLAN approach at tracking and predicting numeric changes, while evaluating the effect of those changes during search, helps to anticipate increasing or decreasing queue trends within the network. The controlled green time is always suited to the changes in the network. This helps to keep the network in a stable state despite increasing congestion.
The results indicate a favourable output in both signal test cases when planning with tasks of lower complexity. It is inferred from the results that both the fixed and the controlled signal approaches produce excellent control performance in lighter traffic situations. However, a vast difference in output is observed between the two when planning with tasks of higher complexity. The run-time of the controlled signal increases initially and then becomes steady despite an increase in traffic congestion and bottlenecks. The run-time of the fixed signal, in contrast, gets worse with increasing traffic congestion and bottlenecks, as shown in Fig. 7 (right side), because a large traffic demand generates a huge search space and the solution therefore requires more computational time, especially at lower fixed durations.
The total number of actions and initiated simulations in the plans generated with the fixed signal is 45% above that of the controlled signal plans. Thus, the controlled plan is shorter than the fixed plan in over 80% of the tasks in the test suite. This evidence confirms that UTCPLAN generates higher-quality plans. Another benefit of the controlled instance is the ability to reach the goal condition in less time for most of the problem instances, although the domain coverage is the same for both configurations (both test instances solved all the modelled problems in the domain).
The creation of a rich declarative representation of the UTC model facilitates reasoning with logical constants, variables and constraints within the model, whereas a classical MPC formulation might not take logical formalities into consideration. Conversely, the MPC mathematical formulation and computation of UTC numerics within the model facilitate dynamic control of traffic signals and vehicle routing, which might not be effectively achieved by a classical AI planning search mechanism. Integrating the two approaches creates an effective control of continuous numerics combined with the logical component within a model.

Scaling Difficulties
UTCPLAN currently does not have a built-in, domain-specific heuristic for pruning the search space. Integrating advanced planning solvers into the search pattern of this implementation would boost the speed of the planner. The implementation made use of a simple classical numeric solver; the use of a state-of-the-art commercial solver would enhance the robustness and scalability of UTCPLAN in dealing with a larger network of constraints in future implementations.

Conclusion
We introduce UTCPLAN, a planning system that embeds a model predictive control approach into an AI planning search paradigm. UTCPLAN supports the analysis of domain descriptions containing continuously changing processes, events and actions. Experimental evaluation shows that our novel approach can control traffic and reduce congestion when tested on a sample road network. The application to the urban traffic domain is used to validate the practicability of this novel hybrid integration on a continuous domain with logical preferences. The results show that UTCPLAN can reason with continuous processes in the domain and has the potential to generate control and execution plans and schedules that will keep such a domain in a desirable state.