Improving Search by Utilizing State Information in OPTIC Planners Compilation to LP
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Improving Search by Utilizing State Information in OPTIC Planners Compilation to LP Elad Denenberg1 , Amanda Coles2 , and Derek Long2 1 University of Haifa, Abba Khoushy Ave 199, Haifa, Israel 2 King’s College London, 30 Aldwych, London, United Kingdom arXiv:2106.07924v1 [cs.AI] 15 Jun 2021 Abstract that contain temporal as well as discrete and continuous change, the planner would require more complex ap- Automated planners are computer tools that allow au- tonomous agents to make strategies and decisions by deter- proaches. For instance, SMTPlan formulating the problem mining a set of actions for the agent that to take, which will into SAT (Cashmore et al. 2016), ENHSP using interval carry a system from a given initial state to the desired goal relaxation (Scala et al. 2016), DiNo discretizing time state. Many planners are domain-independent, allowing their (Piotrowski et al. 2016), and qtScoty using convex op- deployment in a variety of domains. Such is the broad family timization (Fernández-González, Karpas, and Williams of OPTIC planners. These planners perform Forward Search 2017). and call a Linear Programming (LP) solver multiple times This work focuses on a family of planners that uses at every state to check for consistency and to set bounds on Linear Programming (LP) solvers to schedule the plan. the numeric variables. These checks can be computationally This family includes COLIN (Coles et al. 2012), POPF costly, especially in real-life applications. This paper suggests a method for identifying information about the specific state (Coles et al. 2010), and OPTIC (Benton, Coles, and Coles being evaluated, allowing the formulation of the equations to 2012). Planners from this family were used in a variety of facilitate better solver selection and faster LP solving. The real-world applications including robotics, (Cashmore et al. usefulness of the method is demonstrated in six domains and 2015), Autonomous Underwater Vehicle (AUV) control is shown to enhance performance significantly. (Cashmore et al. 2014), Micro Aerial Vehicles (MAV) con- trol (Bernardini, Fox, and Long 2014) and space applica- tions (Coles et al. 2019; Denenberg and Coles 2018). These 1 Introduction planners perform forward state-space search starting from Automated Planning (often called AI Planning) is concerned the initial state. At each search state, an LP solver is used with formulating a sequence of actions that transforms a sys- once to determine whether a consistent schedule for the tem from a given initial state into a desired goal state. One plan exists. If no consistent schedule exists, the search strength of AI Planning is domain-independence: a single branch can be pruned. If the state is consistent, the LP is general planner can plan in a wide range of different ap- then used several more times to bound the numeric vari- plication domains. Examples of domains in which Planning ables and thus tightening the space of applicable actions in was used include space (Chien et al. 2000), battery usage this state, narrowing the search space ahead. Recent work (Fox, Long, and Magazzeni 2011), and software penetration (Denenberg, Coles, and Long 2019) has shown that solving testing (Obes, Sarraute, and Richarte 2013). To facilitate ap- LPs at every state multiple times may cause the search pro- plication in realistic problems, planners need to reason with cess to become slow and ineffective. expressive models of the world. Such models can be tem- The contribution of this work is twofold: poral: finding a plan with timestamped actions, taking into account action durations and concurrency, and numeric: con- 1. We propose a better translation from the search state to sidering variables that change discretely, or continuously, LP. Our new compilation allows the better selection of over time (Fox and Long 2003). optimization tools required for the consistency check of To solve expressive problems that contain tempo- the problem, possibly eliminating the necessity of an LP ral constraints, the planner requires a scheduler – A solver. technique for assigning values to the action’s times- 2. We propose methods for using information from a given tamps that would result in a valid plan. For example, state to compute the current variable bounds in the search, the temporal planners SAPA (Do and Kambhampati thus calling the solver fewer times and improve the per- 2001) and Temporal Fast Downward (Bell et al. formance of any planner from the OPTIC family. 2010) utilize the decision epoch mechanism, Crikey 3 (Bemporad, Ferrari-trecate, and Morari 2000) uses Sim- The rest of the article is ordered as follows: Section 2 de- ple Temporal Network (STN). To solve hybrid problems scribes the problem and the current way OPTIC planners
m Leg1 Leg2 distance Can start observing Obs1 Obs2 Obs3 Obs4 Obs1 start dist. Obs4 start dist. Precondition Figure 1: The flying observer time Figure 2: Distance Requirement solve it. Section 3 presents the suggested methodology. Sec- tion 4 presents the performance of the suggested improve- ments in a variety of different domains. to plan an Unmanned Aerial Vehicle (UAV) observation mis- sion. The UAV is required to fly legs over a defined stretch 2 Background of land containing objects to be observed. Each leg is of dif- 2.1 Problem Definition ferent length. Each observation has a different duration and A temporal planning problem with discrete and linear con- requires a different type of equipment. A target-start dis- tinuous numeric effects is a tuple: tance defines the area within the leg in which the observa- tion must take place. The observation can only occur when hP, V, I, Gi (1) the UAV has flown more than the target-start distance of that leg (flownl ≥ target-starto ). A continuous numeric effect where P is a set of propositions and V a set of numeric of the flyl action updates the distance flown so far in a leg: variables. A state S is defined as a set of value assignments dflownl to the variables in P and V. I is such a set representing the dt = Vell , where Vell is the flight velocity. initial state of the system. G is the goal: a conjunction of Fig. 1 illustrates an instance of this domain: in this in- propositions in P , and linear numeric conditions over the stance, two legs are defined (marked in solid blue lines). variables in V , of the form w1 v 1 + w2 v 2 + ... + wi v i {< In each leg, two observations are required (marked in red, , ≤, =, ≥ >}c (w1 ...wi and c ∈ R are constants). An action pattern-filled lines). All observations have a target-start dis- changes the values and carries the system from one state to tance defined, but for clarity, only the starting distance of the another. An action is defined as a tuple as well: first and last observations are shown. In order to perform an observation, a defined piece of hd, pre⊢ , eff ⊢ , pre↔ , eff ↔ , pre⊣ , eff ⊣ i (2) equipment needs to be calibrated and configured for a spe- where d is the duration of the action constrained by a con- cific observation. Once the observation is done, the equip- junction of numeric conditions. pre⊢ and pre⊣ are conjunc- ment needs to be released to become available for future ob- tions of preconditions (facts and numeric conditions) that servations. The domain comprises the following actions: must be true at the start and end of the action, pre↔ are take-offl(dur=5; pre⊢ ={on-ground,first-legl}; invariant conditions (preconditions that must hold through- eff ⊢ ={¬on-ground, flownl =0}; eff ⊣ ={flyingl}), out the action’s duration), eff ⊢ and eff ⊣ are instantaneous set-coursel1,l2 (dur=1; pre⊢ ={donel1 , nextl1,l2 }; effects that occur at the start and end of the action. Such ef- eff ⊢ ={¬donel1}; eff ⊣ ={flyingl2, flownl2 =0}), fects may add or delete propositions p ∈ P (eff + , eff − ) flyl (dur=distancel /speedl ; pre⊢ ={flyingl }; or update a numeric variable v i ∈ V according to a linear pre↔ ={flownl ≤distancel }; eff ⊣ ={donel , ¬flyingl } instantaneous change: eff ↔ ={dflownl /dt +=1}), configureo,e(dur=1; pre⊢ ={availablee , optionforo,e}; u{+=, =, -=}w1 v 1 + w2 v 2 + ... + wi v i + c (3) eff ⊢ ={¬availablee } eff ⊣ ={configuredforo, pendingo,e}), where u, v i ∈ V are numeric variables, and c, wi ∈ R are observel,o (dur=time-foro; pre⊢ ={configuredforo, weights. eff ↔ is a conjunction of continuous effects that act containsl,o, awaitingo, target-starto≤flownl }; upon numeric variables throughout the action’s duration. In pre↔ ={flyingl }; eff ⊢ ={¬awaitingo}; eff ⊣ ={observedo}), this work, we assume all change is linear and is of the form: releaseo,e (dur=1; pre⊢ ={pendingo,e}; dv eff ⊢ ={¬configuredforo, ¬pendingo,e}; eff ⊣ ={availablee }) {+=, =, -=} c (4) The target distance precondition and temporal constraints dt force the observe actions to fit within the fly action. The where c ∈ R is a constant. meaning of the precondition is illustrated in Fig. 2: The blue The planner is required to find a set of actions in A and line is a depiction of the distance change as the UAV flies their schedule, that would carry the system from the initial over the leg. The dashed red line is the precondition signi- state to the goal state. fying the distance required for the start of the observation. When the distance reaches the value required in the precon- 2.2 Running Example dition, the observation can start. Notice that this problem is, (Denenberg, Coles, and Long 2019) first introduced this ex- in fact, temporal, the numeric constraint can be easily con- ample; it is an anonymized model of a real-life problem. In verted to a temporal one depending on the manifestation in this domain, named flying observer, the planner is required the temporal state can be seen in Fig. 3.
Fly Leg 1 Fly Leg 1 ǫ is a small positive constant. The temporal and ordering Configure Configure constraints, formulated as Eq. (5), constitute a STN and the Observe Observe planner uses a Simple Temporal Problem (STP) solver to check for temporal consistency. If the STP can solve all Distance requirement Distance requirement equations (i.e., assign values to all time-steps such that the equations are valid), then the STP was able to find a sched- (a) Possible Plan (b) Impossible Plan ule, and the state is temporally consistent. If S ′ is propositionally and temporally consistent, the Figure 3: Durative Meaning of Distance Requirement planner will compile the problem into an LP and then check for numerical consistency. Each step of the plan i is given an LP variable ti . Each variable v ∈ V is given three LP 2.3 OPTIC and Forward Search variables for each step i: vi , vi′ , and δvi . vi denotes the value The OPTIC family of planners is based on the methodology of v just before applying the action in step i, and vi′ is the of converting a state to an LP described in COLIN paper value right after the action’s application. The planner applies (Coles et al. 2012). Here we survey that methodology. Eq. (3) to the affected variable thus: To find a path from the initial state to the goal, OPTIC performs Forward Search. Starting from the initial state, OP- vi′ = vi + w1 u1i + w2 u2i + ... + wn uni + c (6) TIC branches over applicable actions, exploring partially- where v, u ∈ V are the numerical variables, uni is the value ordered but un-time-stamped sequences of instantaneous ac- of the nth numerical variable at step i. wn s are weights. tions. Durative actions are converted to a pair of instanta- δvi is the value of the sum of all changes currently acting neous snap-action. Snap-actions mark the start (A⊢ ) and end on v. Recall; in this work, each continuous effect is defined (A⊣ ) of a durative action A. A⊢ has preconditions pre ⊢ A by a constant. When a new action is applied at step prev, the and effects eff ⊢ A; A⊣ is analogous. We define the set Ainst contribution of its effect is added to δvprev ; when an effect to contain all instantaneous actions in A, including snap- ends at step i, the value is removed from δvi . actions. A state S in the search can be thought of as a set contain- δvi−1 + cA ifAi = A⊢ δvi = (7) ing: propositions (S.p ⊆ P ) that are true in S, and upper δvi−1 − cA ifAi = A⊣ (S.max (v)) and lower (S.min(v)) bounds on the value each The value of vi denoting the value just before the application variable in V can hold in S. In the initial state, all variables of the action at step i can be computed thus: have max (v) = min(v) = vI the value of v specified in the initial state; max (v) and min(v) will only differ from each vi = vprev + δvprev (ti − tprev ) (8) other in following states if a durative action with a continu- where i is the current step index and prev is the index of ous effect has acted on the variable. the last step in which the value of v was computed. Note; if An action is deemed applicable if all its propositional the calculation of the value of v is required for an invariant invariants are satisfied by S.p and if all numerical invari- in step j, the next time v will be computed the time interval ants can be satisfied by any value between S.max (v) and will be ti − tj , regardless whether δv is changed or if the S.min(v). The planner compiles a list of all open applicable action is acting on v. (named openlist). Search proceeds by popping the first state If the STN or LP finds that the state S ′ is inconsistent (i.e., from the openlist: in our work, we use WA* (W=5), so sort there is no solution, no schedule that would enable achiev- the openlist by h(S) + 5.g(s), using the temporal-numeric ing the state), it is pruned, and the search will not advance RPG heuristic of COLIN (Coles et al. 2012). down that branch. If the state is consistent, then the LP solver All successors S ′ of S are generated by adding or deleting is called two more times for each variable v to optimize it all propositions in eff + − a and eff a respectively, and applying and compute the new max (v) and min(v) using the stan- all discrete numeric effects to both max (v) and min(v) for dard temporal/numeric relaxed planning graph heuristic of all v ∈ V affected by eff num a . This guarantees the S ′ to be Colin (Coles et al. 2012). It is then inserted into the openlist, propositionally consistent. providing h(S ′ ) 6= ∞, i.e., the heuristic does not indicate S ′ OPTIC then transforms all the temporal constraints to the is a dead-end. following form: Note that when no numerical change is present, it is suf- Lb ≤ tj − ti ≤ U b (5) ficient to use STN to prove consistency. When continuous changes or numeric constraints are present, the LP solver is where Lb, U b ∈ R are the upper and lower bounds of a required for the proof of consistency. time interval. OPTIC also adds the necessary ordering con- In addition to the consistency check, the LP is also used straints to the plan. The action that has just been applied is to compute the bounds max (v) and min(v) of S ′ . If all con- ordered after the following: last actions to add each of its tinuous actions on v have ended, then the last defined value preconditions, actions whose preconditions it deletes, and vi can be maximized and minimized to compute the bounds. actions with numeric effects on variables it updates or refers If v had a continuous effect start but has not yet ended, an- to in preconditions/effects. All ordering constraints are of other time variable is added to the LP denoted tnow , repre- the form tj − ti ≥ ǫ, where tj ,ti are the times at which senting the latest timestamp. For each variable with an ac- the new and existing action must occur, respectively, and tive effect, a variable vnow is added. tnow is ordered after all
Step Action variables constraints comment 0 TakeOff t0 ≥0 3 Informed Selection of Solver for t1 −t0 ≥ ǫ Step1 afer Step0 Consistency and Update f lown l01 =0 Initial Assignlemt 1 Fly⊢l0 The OPTIC methodology described in the previous chapter f lown l01 Value after action f lown l0′1 ≤ distance l0 Invariant was developed to accommodate the general case in which t2 −t1 ≥ ǫ Step2 after Step1 = f lown l0′1 + 1 ∗ (t2 − t1 ) Value before action hybrid planning is to be done, covering all possible state f lown l02 ≥ target-start o1 Start precondition types. The planner uses the general representation both in 2 Observe⊢o1,l0 ≤ l0 dist Invariant = f lown l02 Value after action consistency check and in the variable update. It was shown f lown l0′2 ≥ T arget dist o1 Start precondition in (Denenberg, Coles, and Long 2019) that the general ap- ≤ l0 length Invariant proach could, at times, lead to slow solving. −t2 ≥ ǫ Step3 after Step2 t3 −t2 ≤ time − f oro1 Action duration In this section, we propose two new methods for identi- = f lown l0′2 + 1 ∗ (t3 − t2 ) Value before action fying two specific cases that frequently arise in the state in 3 Observe⊣o1,l0 f lown l03 ≤ distance l0 Invariant = f lown l03 Value after action real-life problems. In such cases, the use of an LP solver can f lown l0′3 ≤ l0 length Invariant be made redundant, facilitating faster solving. In other cases, tnow −t3 ≥ ǫ,−t2 ≥ ǫ,−t1 ≥ ǫ, After All Steps information from the current state may be injected into the 4 now f lown l0now = f lown l0′3 + 1 ∗ (tnow − t3 ) Value Now problem definition to allow for faster solving. Table 1: LP Equations of a Partial Plan The first method examines the latest added action that car- ried the system from state S to state S ′ . The second involves a conversion of specific numeric constraints and effects into STN form. Finally, we describe how both these processes other time steps, and vnow is calculated using Eq. (8). The can facilitate a more effective update of variable bounds. LP solver then minimizes and maximizes vnow to find the possible bounds. 3.1 Observing the Latest Action As stated previously, OPTIC solvers attempt to prove incon- (Denenberg, Coles, and Long 2019) showed that though sistency with an STP first. Then if continuous numeric ef- it was previously thought that calling the LP solver is ben- fects and numeric constraints are present in the current state, eficial both for state consistency and for the variable bound the planner compiles the problem as an LP. OPTIC planners update, in large real-life domains, the calls to the LP solver treat each state in the most general way: in the general case, may slow the search down. The premise was that the LP every action may render the new state inconsistent. How- problems solved are small, and therefore the call to the off- ever, using knowledge about previous states, some instances the-shelf solver would not be computationally expensive. in which the LP solver can be avoided may be found. However, it was shown that in some real-life applications, Consider the state TakeOff⊢ , TakeOff⊣ , Flyl⊢1 , this was not true: when the state contained many actions configure⊢o1,e2 : The partial plan contains continuous and multiple variables, the LP grew large and the solving numeric effects on the variable f lownl0 . Therefore, to of which became slow. prove this partial plan consistent, OPTIC requires an LP Table 1 demonstrates the process of converting a state into solver. However, since the planner is performing forward an LP. The table shows the LP equations for the partial-plan: search, to reach this state, the planner must have been take-off, fly⊢l0 , observe⊢o1,l0, observe⊣o1,l0 (for conciseness we in a previous state, which it found consistent: TakeOff⊢ , assume that take-off is instantaneous and no configure ac- TakeOff⊣, Flyl⊢1 . The configure action that is added does not tions are required). require the value of f lownl0 , and its effect is propositional. Assume the state S is propositionally, temporally, and nu- The first action receives a single time variable t0 . The merically consistent. The new state S ′ , which is reached ⊢ second action F lyl0 , receives a time variable t1 , which is from S by addition of action A ∈ Ainst cannot be rendered ordered after t0 , and the value of f lown is computed. The numerically inconsistent if A does not contain any numer- value before the fly action is the initial assignment, which ical effects or constraints. Furthermore, since OPTIC only is 0. Since there is no instantaneous effect on the action’s examines states S ′ that are generated to be propositionally start, the value just after the application of the action is the consistent, the state only has to be tested only for temporal same. The invariants on the value are enforced just after the consistency. beginning of the fly action. The same process is repeated Thus, if an added action A contains only propositional or for the next action Observe⊢o0 : assigning a time variable for temporal constraints and effects, the state S ′ can be deemed the action, calculating flown before and after the application consistent by using the STP, and an LP is not required. of the action, and enforcing invariants. Note the ordering Note that if this test has determined that S ′ is consistent, constraints formulated as temporal constraints in all actions. there is no need for a numerical variable bound update, as Also, the f lown variable is computed at each step, and its those do not change. value depends on the previous step. The next section will describe a method for changing 3.2 Reformulation of LP Eq. (8) in a way that would allow calling the LP solver fewer During the search, OPTIC compiles the state into an LP, as times, and compile certain problems containing numerical depicted in the previous section. This transformation is done constraints and change as STP. in a step-wise manner, meaning each step is transformed into
Step Action variables constraints comment 0 TakeOff t0 ≥0 Notice that at step i, all the variables on the right-hand side t1 −t0 ≥ ǫ Step1 afer Step0 of Eq. (11) are known and are constant. Therefore, Eq. (11) 1 Fly⊢l0 f lown l01 =0 Initial Assignlemt can be formulated for each step i as a constraint of the form f lown l01 Value after action f lown l0′1 ≤ distance l0 Invariant of Eq. (10) as long as vpef f is constant. Notice that the nu- t2 −t1 ≥ ǫ Step2 after Step1 merical constraint in Eq. (10) is converted to a temporal con- = f lown l0′1 + 1 ∗ (t2 − t1 ) Value before action straint. If all constraints in the state can be converted thus, f lown l02 ≥ target-start o1 Start precondition 2 Observe⊢o1,l0 ≤ l0 dist Invariant then the problem is, in fact, an STN, and the LP solver is not = f lown l02 Value after action required. f lown l0′2 ≥ T arget dist o1 Start precondition ≤ l0 length Invariant The example given above can be converted in such a way. t3 −t2 ≥ ǫ Step3 after Step2 The constraint f lown l02 > T arget dist o1 can be con- −t2 ≤ time − f oro1 = f lown l0′1 + 1 ∗ (t3 − t1 ) Action duration Value before action verted to t2 − t1 >(T arget dist o1−f lown l01 ) /1 . All numeric 3 Observe⊣o1,l0 f lown l03 constraints in this partial plan can be converted in the same ≤ distance l0 Invariant = f lown l03 Value after action way. This means that even though continuous numerical ef- f lown l0′3 ≤ l0 length Invariant tnow −t3 ≥ ǫ,−t2 ≥ ǫ,−t1 ≥ ǫ, After All Steps fects and numerical constraints are present, this problem is 4 now f lown l0now = f lown l0′1 + 1 ∗ (tnow − t1 ) Value Now temporal, and can be solved with an STP. If all numerical constraints were converted to temporal, Table 2: New LP Equations of a Partial Plan the planner could determine consistency using the STN. However, updating the bounds is still required. Observing Eq. (8), we note that since vpef f is constant, the maximum a set of equations, and each step builds on the previous one. and minimum of vi are dependent on the interval No consideration is taken as to what effect a step has on a variable; as long as the value of the variable is required, its Ti = ti − tpef f (12) value will be computed, and the next step will use said com- The minimal size of the interval is zero. The maximum may puted value. The notation tprev denotes the previous step be drawn from the state: if another constraint exists such that at which the value of variable v was calculated, and the next limits ti or if tpef f is a start action beginning an effect, and step i > prev that computes the value of v will use the value is the only continuous numeric effect present, the maximal stored in vprev . interval is the duration of the action. If no such value can be Here we suggest making a distinction between two types derived from the problem, then the interval is set to infinity. of steps that affect the value of v: A step containing a start or If δvi > 0 then the minimum value for vi is when Ti = 0 end of a continuous numeric effect and a step that does not. and is min (vi ) = vpef f ; and the maximum is max (vi ) = The later is a step containing numerical constraints on v but vpef f + δvef f Ti . The case in which δvi < 0 is analogous. does not change the value of δv. To distinguish between the This update method does not require an external solver and, two, we propose two notations ef f for steps that start or end therefore, very fast. an effect and const for steps containing only constraints. Using the new notation tpef f would be the last time in 3.3 Efficient Variable Update which v had an effect start or end, vpef f would be the value (Denenberg, Coles, and Long 2019) have shown that the of v calculated at that time point. Then Eq. (8) is written variable update is often the task that is most computation- thus: ally expensive as it requires several calls to the LP solver, vi = vpef f + δvpef f ti − tpef f (9) depending on the number of numerical variables in the par- The conversion of the equations is demonstrated in the pre- tial plan. The previous section has detailed several cases in vious example: this is presented in Table 2. Notice the dif- which the variable update can be avoided or done without ference between Table 1: The effect acting on the variable the use of an LP. Here we attempt to facilitate faster LP solv- f lown l0 started in step 1, and therefore the computation of ing in the update phase in case it is still required. f lown l02 f lown l03 is always done with respect to step 1. The LP solver may use one of several optimization meth- If no continuous numeric actions have been acting on ods; however, the selection of the method, as well as the v before the last action at tpef f , then the value of vpef f method speed, depend on whether the feasible space is is a constant. This is seen in our example. The value of bounded and in which direction. We would like to supply f lownl 0′1 is the same as the initial assignment. This conver- the solver with information about variable boundlessness. sion extremely useful when all constraints are of the form: We cannot use the bounds from the previous state S as those might have changed by action A. v≤C (10) Therefore, we again examine the snap action A added in where v ∈ V and C ∈ R. This constraint is written as a the last step that carried the system from previous state S less than-equal-to constraint. Without loss of generality, this to the current state S ′ . Recalling state S contains bounds on includes all constraints that have a single variable on one v (S.max (v) andS.min(v)). We wish to determine whether side and a constant on the other. If vpef f is constant, then the action A is capable of expanding the limits of v, caus- using Eq. (9) when enforcing Eq. (10) at step i, we can write ing the interval [min(v), max (v)] to grow if A causes the bounds to contract, or if the interval retains its size but shifts. C − vpef f If an instantaneous numeric effect exists, then the bounds ti − tpef f ≤ (11) δvpef f on the latest defined vi (the step at which all continuous ef-
Observations Instance Observations Legs Required in Goal server that (Denenberg, Coles, and Long 2019) has pre- 1 10 28 4 sented as a domain stemming from the industry. Two dis- 2 15 38 6 tinct models of this physical domain were tested: The first is 3 20 48 8 4 25 58 10 identical to the running problem. The second variant of this 5 30 68 12 domain has a requirement that the observer is flying before 6 40 78 14 configuring or releasing pieces of equipment. 7 40 88 16 The problems in the first variant all require a single obser- 8 40 88 18 .. .. vation to be made in each leg. The instances are described in . . Table 3. 17 40 88 36 In all instances of the second variant, six observations are Table 3: Single Observation Per-Leg Instances defined for each leg, where the first and last leg both share an observation that cannot be performed in the first leg. The fact the observation cannot be used in the first leg may lead fects have ended, or vnow ) can either be shifted due to an the planner down a branch, which would not be useful. The increase or decrease. An assignment would make the latest first instance contains two legs, the second three, and so on. value a constant (max (v) = min(v) =Assigned value). If (Denenberg, Coles, and Long 2019) showed that these prob- snap action A contains a continuous numeric effect on v, lems are challenging for the planner. then the bounds of v may expand. Therefore, when solving Factory Floor Quality Assurance (QA) This domain de- the LP to update the bounds of v, the bounds of vnow are scribes QA sampling planning on a factory floor. A machine defined as [−∞, ∞]. If snap action A does not contain a produces parts at a specific rate; at some point, several parts continuous numeric effect on v or an instantaneous effect on are taken for sampling. The number of produced parts is lim- v, then the bounds can only contract. Therefore, the bounds ited for storage reasons. This domain is similar to the pre- from S are passed to the LP solver, leading to a smaller vious domain (the flown distance is analogous to produced search space and faster update. parts); only here, we limit the total number of parts that can be produced, adding a global numerical invariant condition. 4 Evaluation In this domain, too, we have two variants - one allowing the In this section, we examine the performance of the pro- calibration of measuring machinery before the beginning of posed changes in six different domains, which stem from the manufacturing process, and one that does not. As in the four physical world examples. The domains brought here previous domain, the second variant is used for instances were chosen to demonstrate both the strengths and the weak- that require multiple samples of the same part. nesses of the contribution: The first four illustrate the fam- Single Rover IPC Domain and Linear Generator The ily of cases in which the contribution is meaningful, shed- Single Rover domain is a standard domain taken from the ding light not only to the benefits of our suggestion but also standard IPC 3, and (Coles et al. 2012) used it to demon- to ways in which to better model a problem for any OP- strate the hybrid planning mechanism. TIC family planner. The other two domains are brought to The Linear Generator is yet another standard domain that demonstrate cases in which the contribution is not useful, in was widely used in previous papers. It describes a generator an attempt to assess the price of using it. consuming fuel to generate energy. The generator may be re- As this work’s contribution is in improving the OPTIC fueled from auxiliary tanks. All actions affect the main-tank family of planners, we compare our suggestions with the fuel quantity, and the fact that all actions contain numerical performance of the latest implementation of OPTIC, and ap- change and constraints it was expected OPTIC-II to show ply our improvements to the same code base, and name it little to no improvement in solving problems from this do- OPTIC with Injected Information (OPTIC-II). main. The results and results of other additional tests on various IPC domains are publicly available at extra data1 . 4.2 Results All tests were performed on an Intel i7-8550U CPU@1.80GHz×8 with 8GB RAM. The PDDL files of the Implementing the changes suggested in this work requires domains, the respective problems, and result data will be additional tests before building an LP. These tests and published on a public website. Runtime results are presented checks come with a computational price. However, as can in Table 4. “X” in the table means the runtime was over be seen in Table 4, that price is not high. In the Linear gen- 1000s. Since the domains and processes are deterministic erator domain and the single rover, none of the changes are no statistical analysis is required, as results may only vary useful. In most states, the information from the applied ac- due to CPU noise. tion cannot reduce the computations, and the problem is not convertible to temporal. This is because in many states, for 4.1 Domains instance, there are often two continuous linear actions op- Flying Observer We demonstrate OPTIC-II on a set of erating on the same variable. The changes not being useful previously published domains. The first is the flying ob- mean that all checks will be false when running OPTIC-II, and the LP solver will be used just as in OPTIC. The results 1 show that OPTIC-II indeed performs a bit slower when used This is currently given in the extra data. When paper is ac- cepted it will be published publicly online in these domains.
Flying Observer No limit on configure Configure only when flying Instance 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 2 3 4 5 6 7 8 OPTIC 0.52 6.58 90.15 172.82 268.59 808.65 X 33.15 55.62 66.42 82.5 91.19 456.68 831.02 X 6.99 35.96 105.62 213.33 237.86 538.46 X X Sec 3.3 0.55 5.19 76.23 164.19 257.38 795.11 X 31.22 52.16 63.58 79.71 89.87 445.21 825.16 X 6.31 34.09 99.76 198.00 224.11 504.43 X X Sec 3.1 0.19 2.91 37.11 85.15 149.76 578.61 X 9.32 13.63 19.84 26.14 26.59 192.26 441.77 924.58 1.69 7.91 27.62 58.88 68.15 442.05 332.87 X Sec 3.1+3.2 0.17 1.83 31.51 77.38 141.17 560.44 X 8.30 11.83 15.88 20.91 22.07 163.18 387.70 854.78 1.17 6.60 22.58 49.40 56.87 93.32 236.55 X Sec 3.1+3.3 0.21 2.31 35.51 84.23 149.12 580.07 X 9.34 13.70 19.81 22.89 26.53 190.98 434.36 924.04 1.64 7.81 27.35 57.95 65.76 441.35 336.51 X OPTIC-II 0.15 1.88 32.91 79.63 140.55 557.89 X 8.29 11.07 14.91 21.82 24.50 167.22 391.78 857.92 1.23 5.98 20.40 49.15 56.91 91.02 241.99 X Factory Floor QA No limit on calibrate Calibrate only when manufacturing Instance 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 OPTIC 0.47 6.34 44.70 293.68 547.98 828.47 X 30.90 63.54 X 594.82 906.96 X 6.35 36.64 122.41 267.91 314.17 452.16 880.83 X Sec 3.3 0.45 6.89 53.80 302.40 535.98 807.05 X 26.95 48.61 X 526.68 814.11 X 5.83 34.54 118.00 254.39 294.76 424.92 827.24 X Sec 3.1 0.33 7.09 53.38 275.61 478.96 733.08 X 28.64 59.74 X 570.86 872.45 X 5.36 30.56 92.71 205.55 232.52 351.25 726.83 X Sec 3.1+3.2 0.33 5.80 43.69 254.10 463.60 730.79 X 24.50 54.89 X 550.04 874.37 X 5.18 29.23 90.89 201.99 234.87 347.51 725.29 X Sec 3.1+3.3 0.35 6.74 47.87 256.16 451.83 714.85 X 26.82 56.17 X 538.03 831.99 X 4.45 24.65 85.27 192.05 219.80 325.14 690.79 X OPTIC-II 0.33 6.92 47.19 253.12 465.30 722.89 X 26.65 56.25 X 548.36 843.43 X 4.19 22.36 78.29 191.23 215.33 327.49 687.34 X Single Rover Instance 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 OPTIC 0.08 0.04 0.09 0.18 0.35 0.88 7.80 12.03 0.29 374.94 0.22 0.06 225.89 2.98 0.66 36.23 X OPTIC-II 0.07 0.04 0.10 0.18 0.35 0.96 8.13 13.09 0.23 375.69 0.23 0.06 221.19 3.04 0.65 36.68 X Linear Generator Tanks 10 20 30 40 50 60 70 80 90 100 OPTIC 0.61 3.28 12.14 130.89 554.31 48.14 91.27 160.68 300.70 X OPTIC-II 0.63 4.18 15.36 127.18 559.67 47.71 91.58 159.89 301.27 X Table 4: Results In the four domains from the Flying Observer and QA, tests slightly slow the search process. OPTIC-II performs far better than OPTIC. We present the results of all three suggested changes, the contribution of Reformulating the LP The last contribution we exam- each change separately, and possible combinations to better ine is the one described in Section 3.2. For implementa- understand the results. tion reasons, this change was only applicable to the previous change. This change was useful in both the flying observer Efficient Variable Update In Section 3.3, we suggest us- domains. When applied to these domains, all problems were ing information about the current action to update the vari- converted to temporal, and no LPs were solved. The result ables’ bounds efficiently. This improves the LP solving in was a significant improvement in planning speed. the variable update stage of the search. Therefore, we expect this change to be more prominent when the planner handles In the QA domains, only some of the states were con- large plans that result in larger LP problems, and when a verted from numerical to temporal, and therefore the change large amount of variables needs to be updated. was less prominent. The planner was able to identify that the In Table 4, the line named “Sec. 3.3” presents OPTIC requirement for the total amount of produced parts was the performance when only this change is present in the four variable that prevented the conversion. If the domain expert domains taken from (Denenberg, Coles, and Long 2019). It believes the total amount limit cannot be reached, he can re- can be seen that this change contributes to the performance move that constraint from the domain and allow for much is more prominent in higher instances where the plan is, in- faster Planning. Thus, using this method, we can improve deed, quite long and contains many variables. performance, and perform knowledge engineering, present- Though this change’s contribution is not visible in simple ing the model expert with possible ways to improve the plan- academic domains, it helps scale large problems such that ning process. arise in a real-life domain and, therefore, useful. Observing the Latest Action In Section 3.1, we sug- 5 Conclusions gested using information about the current action to decide whether an STP can be used to prove temporal consistency This work presented three methods for improving the search even though a numeric change is present in the state. This in the OPTIC family of planners: injecting state information change may lower the number of LPs that will be solved into the consistency check, injecting state information into during the search and, therefore, speed up the search. The the variables bound update, and reformulating the LP as an runtime results of OPTIC running only this improvement is STN. These suggested changes to the planner can improve dubbed “Sec. 3.1” in the table. LP’s solving time or, at times, help avoid using an LP solver In the first four domains, we see a significant reduction of altogether. These changes were shown to be relatively cheap LPs solved in the search2 , which results in a faster search. and useful in many cases. These changes apply to a board In the last two domains, the reduction is minimal, and so the and a popular family of planners. Future work would include additional tests and profiling. 2 Also, exploiting state information in other forward search The number of states proved consistent with an LP out of the visited states is given in the extra data, and will be published online planners can be examined.
References and Planning Applications Workshop (SPARK), 2019 ICAPS Bell, P. C.; Delvenne, J.-C.; Jungers, R. M.; and Blondel, Workshop. V. D. 2010. The continuous Skolem-Pisot problem. Theo- Do, M. B.; and Kambhampati, S. 2001. SAPA: a Domain- retical Computer Science 411(40-42): 3625–3634. Independent Heuristic Metric Temporal Planner. In Euro- Bemporad, A.; Ferrari-trecate, G.; and Morari, M. 2000. Ob- pean Conf. on Planning (ECP). servability and controllability of piecewise affine and hy- Fernández-González, E.; Karpas, E.; and Williams, B. C. brid systems. IEEE Transactions on Automatic Control 45: 2017. Mixed Discrete-Continuous Planning with Convex 1864–1876. Optimization. In AAAI Conference on Artificial Intelligence. Benton, J.; Coles, A.; and Coles, A. 2012. Temporal Fox, M.; and Long, D. 2003. PDDL2.1: An extension to Planning with Preferences and Time-Dependent Continu- PDDL for expressing temporal planning domains. Journal ous Costs. In Proceedings of the Twenty-Second Inter- of artificial intelligence research 20: 61–124. national Conference on International Conference on Auto- Fox, M.; Long, D.; and Magazzeni, D. 2011. Automatic mated Planning and Scheduling (ICAPS), 2–10. Construction of Efficient Multiple Battery Usage Policies. Bernardini, S.; Fox, M.; and Long, D. 2014. Planning the be- In The Twenty-Second International Joint Conference on Ar- haviour of low-cost quadcopters for surveillance missions. tificial Intelligence (IJCAI), 2620–2625. In Twenty-Fourth International Conference on Automated Obes, J. L.; Sarraute, C.; and Richarte, G. 2013. Attack Plan- Planning and Scheduling (ICAPS), 445–453. ning in the Real World. The Computing Research Repository Cashmore, M.; Fox, M.; Larkworthy, T.; Long, D.; and Mag- (CoRR) abs/1306.4044. azzeni, D. 2014. AUV mission control via temporal plan- Piotrowski, W.; Fox, M.; Long, D.; Magazzeni, D.; and Mer- ning. In 2014 IEEE International Conference on Robotics corio, F. 2016. Heuristic Planning for PDDL+ Domains. In and Automation (ICRA), 6535–6541. Proceedings of the Twenty-Fifth International Joint Confer- Cashmore, M.; Fox, M.; Long, D.; and Magazzeni, D. 2016. ence on Artificial Intelligence (IJCAI), 3213–3219. A Compilation of the Full PDDL+ Language into SMT. In Scala, E.; Haslum, P.; Thiebaux, S.; and Ramirez, M. 2016. Twenty-Sixth International Conference on Automated Plan- Interval-Based Relaxation for General Numeric Planning. In ning and Scheduling(ICAPS). Proceedings of the Twenty-Second European Conference on Cashmore, M.; Fox, M.; Long, D.; Magazzeni, D.; Ridder, Artificial Intelligence (ECAI), 655–663. B.; Carreraa, A.; Palomeras, N.; Hurtós, N.; and Carrerasa, M. 2015. ROSPlan: Planning in the Robot Operating Sys- tem. In Proceedings of the Twenty-Fifth International Con- ference on International Conference on Automated Planning and Scheduling (ICAPS), 333–341. Chien, S.; Rabideau, S.; Knight, R.; Sherwood, R.; Engel- hardt, B.; Mutz, D.; Estlin, T.; Smith, B.; Fisher, F.; Barrett, T.; Stebbins, G.; and Tran, D. 2000. ASPEN - Automated Planning and Scheduling for Space Mission Operations. In Space Ops. Coles, A.; Coles, A.; Fox, M.; and Long, D. 2010. Forward- chaining partial-order planning. In Twentieth Interna- tional Conference on Automated Planning and Scheduling (ICAPS). Coles, A.; Coles, A.; Fox, M.; and Long, D. 2012. COLIN: Planning with Continuous Linear Numeric Change. Journal of Artificial Intelligence Research 44: 1–96. Coles, A.; Coles, A.; Martinez, M.; Savas, E.; Keller, T.; Pommerening, F.; and Helmert, M. 2019. On-board plan- ning for robotic space missions using temporal PDDL. In 11th International Workshop on Planning and Scheduling for Space (IWPSS). Denenberg, E.; and Coles, A. 2018. Automated planning in non-linear domains for aerospace applications. In 58th Israel Annual Conference on Aerospace Sciences. Denenberg, E.; Coles, A.; and Long, D. 2019. Evalu- ating the Cost of Employing LPs and STPs in planning: lessons learned from large real-life domains. In Scheduling
You can also read