Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Autonomous Robots manuscript No. (will be inserted by the editor) Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning Kai M. Wurm · Christian Dornhege · Bernhard Nebel Wolfram Burgard · Cyrill Stachniss Received: date / Accepted: date Abstract The efficient coordination of a team of heteroge- 1 Introduction neous robots is an important requirement for exploration, rescue, and disaster recovery missions. In this paper, we One of the fundamental problems in mobile robotics is the present a novel approach to target assignment for hetero- task of autonomously navigating in an environment. In most geneous teams of robots. It goes beyond existing target as- realistic settings, a robot only has partial knowledge about signment algorithms in that it explicitly takes symbolic ac- its environment. For example, the blueprint of a building tions into account. Such actions include the deployment may be known to the robot but furnishing and other obsta- and retrieval of other robots or manipulation tasks. Our cles are usually not known beforehand. Furthermore, there method integrates a temporal planning approach with a tra- are applications in which the environment is entirely un- ditional cost-based planner. The proposed approach was im- known to the robot, for example during planetary explo- plemented and evaluated in two distinct settings. First, we ration and disaster recovery missions. coordinated teams of marsupial robots. Such robots are able In many applications, a coordinated team of robots offers to deploy and pickup smaller robots. Second, we simulated advantages over a single robot. Multi-robot systems have the a disaster scenario where the task is to clear blockades and potential of being more fault tolerant and of reducing the reach certain critical locations in the environment. A simi- overall time to complete a given task [Dudek et al., 1996; lar setting was also investigated using a team of real robots. Cao et al., 1997]. A well-studied problem is the explo- The results show that our approach outperforms ad-hoc ex- ration of an unknown environment with a cooperative team tensions of state-of-the-art cost-based coordination methods of robots. Previous approaches often addressed the prob- and that the approach is able to efficiently coordinate teams lem of exploring an environment with groups of robots that of heterogeneous robots and to consider symbolic actions. have identical capabilities. To coordinate such homogeneous teams, popular approaches determine a set of exploration targets and assign robots to them numerically [Zlot et al., Keywords Multi-robot coordination · temporal planning 2002; Ko et al., 2003; Berhault et al., 2003; Burgard et al., 2005; Stachniss, 2009]. These approaches consider the cost and the expected information gain of each exploration tar- This work was supported by the Deutsche Forschungsgemeinschaft get. (DFG) in the Transregional Collaborative Research Center SFB/TR8 In this paper, we address the problem of coordinating Spatial Cognition projects A3-[MultiBot] and R7-[PlanSpace] and by exploring teams of robots with differing capabilities. The Microsoft Research, Redmond. robots in such heterogeneous teams may differ in their phys- K. Wurm, C. Stachniss, and W. Burgard ical properties such as their sensor setup, their size and pay- University of Freiburg, Dept. of Computer Science, Georges- Köhler-Allee 079, 79110 Freiburg, Germany E-mail: load, their maximum traveling speed, or the type of terrain (wurm|stachnis|burgard)@informatik.uni-freiburg.de they are able to traverse. In our approach, we especially con- C. Dornhege, and B. Nebel sider robots that differ in the actions they are able to per- University of Freiburg, Dept. of Computer Science, Georges- form. For instance, robots might be equipped with manip- Köhler-Allee 052, 79110 Freiburg, Germany E-mail: ulators, they could be able to deploy localization beacons, (dornhege|nebel)@informatik.uni-freiburg.de or they could even deploy other robots. Numeric, cost-based
2 Kai M. Wurm et al. coordination approaches are able to consider differing prop- tions that have to be executed. This description serves as the erties of robots that affect navigation costs. For example, the input to a symbolic planning system that solves the coordi- cost of reaching a target depends on the travel speed of a nation problem. However, classical planning approaches do robot and it is also possible to encode that a robot cannot not consider the execution cost of the solutions they gener- reach a target due to constraints on the size of the robot or ate. Adding costs measures to actions turns the coordination the type of terrain it can traverse. However, it is not straight- problem into a temporal planning problem and there exist forward to take into account actions other than navigating efficient algorithms to solve such problems [Gerevini et al., to exploration targets since there is no efficient mapping of 2008; Eyerich et al., 2009]. We estimate execution costs for such actions to cost or utility measures. While we can usu- each navigation action using a robotic path planner. ally specify the time it takes to perform an action such as In contrast to cost-based coordination approaches, our deploying a robot, it is not meaningful to compare this cost system is able to explicitly plan for the execution of sym- to the cost of exploring a given frontier target. The reason bolic actions. Still, the use of action costs that are deter- for this incompatibility lies in the fact that performing an mined by the robotic path planner allows us to generate action that does not explore parts of the environment has no time-efficient solutions. In this way, our approach combines apparent immediate reward to an exploration system but it the strength of cost-based coordination approaches with the effects its performance in the future. flexibility of symbolic planning systems. To evaluate our co- From a conceptual point of view, the ability to perform ordination approach, we apply our framework to two popu- actions that go beyond navigating to goal positions intro- lar multi-robot applications: exploration of unknown envi- duces corresponding symbolic actions. This term is com- ronments with heterogeneous teams of robots and disaster monly used in classical planning approaches that encode the recovery with heterogeneous teams. state of a system using logical symbols. Classical planning An interesting coordination problem arises when a team approaches execute symbolic actions to change the state of of robots includes members that are able to deploy and the system towards a pre-defined goal state. In the context retrieve other, smaller robots. For a task such as the au- of multi-robot exploration, we define a symbolic action as tonomous exploration of lunar craters, one can imagine any action that does not explore parts of the environment robots that approach the crater and then deploy a specialized directly. Examples of such actions include opening doors, robot which descents into the crater [Cordes et al., 2011; operating elevators, moving obstacles, and deploying other ESA, 2008]. Such systems are frequently referred to as mar- robots. There exist relatively few approaches that coordinate supial robots [Murphy et al., 1999]. The coordination of teams of robots and take into account symbolic actions. One marsupial teams generally requires to carefully plan deploy- specific set of actions that has previously been considered ment and retrieval actions, both are symbolic actions ac- is the deployment and retrieval of robots by other robots. cording to our definition. In addition, one has to take into To execute symbolic actions, previous approaches rely on account the different properties of the robots such as their manually designed strategies [Singh and Fujimura, 1993; sensor setup, their size and payload, their maximum travel- Murphy et al., 1999; Dellaert et al., 2002]. One strategy, ing speed, or the type of terrain they are able to traverse. The for example, is to deploy a smaller robot whenever a large coordination framework proposed in this paper is applied to robot cannot reach a given goal. These strategies, however, the exploration of an unknown environment using marsupial are specific to a certain type of robot and environment and teams of robots. it is unclear whether they are able to efficiently coordinate A further class of applications for teams of heteroge- large teams of robots. neous robots is centered around the scenario that important The approach presented in this paper considers sym- parts of an environment have collapsed after a natural dis- bolic actions explicitly by applying symbolic planning tech- aster. In such a setting, teams of robots can, for instance, niques. In the context of multi-robot exploration our goal is be used to perform search and rescue missions. We consider to explore all exploration targets as fast as possible. We as- the task of carrying out pre-defined actions at certain criti- sume that to reach some targets it is necessary to execute cal locations in the collapsed structure, for example, to re- additional symbolic actions such as removing an obstacle establish safe operation of a failed power station or chemical or operating an elevator. The key idea of our approach is plant. Such a disaster recovery task requires robots with di- to integrate a symbolic planning system and a robotic path verse capabilities. First of all, robots are required to clear planner. Since it is unclear how we can translate symbolic collapsed paths throughout the structure. At the same time, actions into a cost measure as it is used in numeric coordi- there is a need for robots with good sensors or precise ma- nation approaches, we instead treat navigating to an explo- nipulation skills to open doors, operate valves, closely in- ration target as an action in a symbolic planning system. We spect parts of the building, or repair damage. A heteroge- generate a symbolic formulation of the coordination prob- neous team of specialized robots can be used to provide lem that includes navigation goals as well as symbolic ac- these capabilities in a flexible way. We apply our proposed
Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 3 framework to coordinate a team of exploring and clearing To improve the overall performance, later approaches in- robots in a simulated disaster scenario. In addition to ex- troduced techniques to avoid redundant work. Zlot and col- ploratory navigation actions, we explicitly plan for symbolic leagues [2002] proposed an architecture in which the ex- clearing actions. ploration is guided by a market economy. They consider se- Both application scenarios serve as motivating examples quences of potential target locations for each robot and trade for this work. We developed and implemented a coordina- tasks between the robots using single-item first-price sealed- tion approach for both applications and compared our ap- bid auctions. Such auction-based techniques have also been proach to ad-hoc extensions of cost-based numeric coordi- applied by Berhault et al. [2003] to assign robots to bun- nation approaches. As we will show in the evaluation, our dles of targets so that synergy effects between targets are approach produces significantly better plans leading to a de- exploited. crease in both the overall runtime and the sum of traveled Burgard et al. [2005] presented an iterative assignment distances. Additionally, we applied our approach to coordi- method based on the estimated cost of reaching a target. It nate a real-world heterogeneous robotic system. additionally considers visibility constraints between targets. This paper extends our previous work [Wurm et al., Ko et al. [2003] and Stachniss [2009] presented algorithms 2010] in several ways. First, it provides an additional appli- that use the Hungarian method to compute the assignments cation of our approach in the context of disaster recovery. It of robots to exploration targets. In a previous work, we pre- furthermore contains an extended experimental evaluation. sented approaches that use semantic information [Stachniss Additionally, we describe an application realized with a real- et al., 2009] and segmentations of the environment [Wurm et world heterogeneous robotic system. We finally include a al., 2008] for improving coordinated exploration. By assign- description of temporal planning from a robotics perspec- ing robots to unexplored segments instead of frontier targets, tive and discuss strengths and limitations of its application a more balanced distribution of the robots over the environ- in robotics. ment is achieved and the overall exploration time is reduced. The remainder of this paper is organized as follows. Af- Heterogeneous teams of robots consist of robots with ter discussing related work, we give a short introduction to different abilities. An early approach towards coordination temporal planning. In Section 4, we give a general descrip- of heterogeneous robot systems during exploration was pre- tion of our coordination framework and its components. We sented by Singh and Fujimura [1993]. In this approach, if describe a specific application for exploration with marsu- a robot is too big to pass through a narrow passage, other pial robots in Section 5. In Section 6, we present a further ap- robots are informed about this task. A system for mapping plication of our framework in the context of disaster recov- with heterogeneous robots was introduced by Grabowski ery. Our experimental evaluation is presented in Section 7. and Navarro-Serment [2000]. Coordination, however, is per- Finally, strengths and limitations of the proposed approach formed manually in their approach. Howard et al. [2006] are discussed in Section 8. presented an approach to coordinate large teams of hetero- geneous robots. In their experiments, a small number of so- phisticated robots explored an environment and a large num- 2 Related Work ber of simple sensor robots was then deployed to serve as a distributed sensor network. When multiple robots explore an unknown environment the Whenever small robots with low traveling speeds or coordination strategy has the biggest influence on the per- limited power resources are used in a heterogeneous robot formance of the system. Coordination methods for homo- team, it is advantageous to have larger robots transport the geneous teams, in which all members are equipped equally, smaller ones to avoid a serious penalty in exploration time or have been studied extensively in the past. In those cases, the power consumption [Murphy et al., 1999]. The larger carrier coordination task is often formulated as an assignment prob- robots are frequently referred to as marsupial robots. One lem. Such approaches assign robots to exploration targets of the first physical implementation of a marsupial system based on a suitable cost measure. was presented by Murphy et al. [1999] who also described Several methods have been presented to determine an as- ad-hoc methods to deploy the smaller robot. Kadioglu and signment of robots to exploration targets. Most methods de- Papanikolopoulos [2003] presented a further physical im- termine targets by extracting the frontier between explored plementation of a marsupial system. Rybski et al. [2000] and unexplored areas. This target generation approach was developed strategies to use marsupial systems for surveil- introduced by Yamauchi [1997] who also proposed a decen- lance. In their approach, small scout robots are deployed by tralized multi-robot coordination technique that assigns each the marsupial robot to monitor individual rooms. Dellaert et robot to the nearest frontier [Yamauchi, 1998]. The draw- al. [2002] deployed a team of legged robots using a car- back of this coordination method is that several robots can rier robot in a rescue scenario. Denner and Papanikolopou- be assigned to the same target. los [2007] presented a deployment method for marsupial
4 Kai M. Wurm et al. teams that explicitly considers power constraints. In all of the previously described exploration systems, deployment and retrieval of robots in marsupial teams is determined by handcrafted methods. In contrast to that, the approach pre- e sented in this chapter explicitly takes these symbolic actions c into account when coordinating the exploration. The approach presented in this paper employs domain Fig. 1 An exploring robot (e) has to explore both targets t1 and t2 . independent planning techniques to coordinate multiple The path to t1 is blocked at b1 (dashed line). A clearing robot (c) can robots. Domain independent planning is a sub-field of artifi- clear the blockade. cial intelligence that has been investigated thoroughly [Fikes and Nilsson, 1971; Bylander, 1994; Erol et al., 1995] using during planning. They use so called suggesters that provide, methodes like action graphs [Gerevini and Serina, 1999], for example, new motion paths or goal locations. The work planning as satisfiability [Kautz and Selman, 1992] and described in this chapter uses the planner TFD/M [Dornhege heuristic state-space search [Bonet and Geffner, 2001]. et al., 2009], a variant of the temporal fast downward plan- A classical planning problem consists of a set of state- ning system originally developed by Eyerich et al. [2009]. variables with finite domains, an initial state, a set of actions TFD/M provides a generic interface for the integration of and a goal condition. An action is defined by a precondition external modules that compute the values of state variables, and its effects, which is a set of variable assignments. A so- action costs, and numerical effects during the planning pro- lution for a classical planning problem is then defined as a cess using sub-processes. By means of these sub-processes, finite sequence of actions that leads from the initial state to a we combine temporal planning with path planners tradition- goal state. There exist several efficient planning systems for ally used for robot navigation and coordination. The work by classical planning problems [Bonet and Geffner, 2001; Hoff- Gregory et al. [2012] uses modules during planning to eval- mann and Nebel, 2001; Chen et al., 2006; Helmert, 2006; uate not only predicates, but also terms in first order logic. Richter and Westphal, 2010]. Autonomous disaster recovery in partially known envi- In multi-robot coordination, actions need to be executed ronments, as introduced in this chapter, has not been studied concurrently and they usually have variable durations. These in depth. An auction-based coordination method as well as a temporal constraints cannot be handled by classical plan- method based on genetic algorithms is presented by Jones et ning approaches and constitute a new class of problems. The al. [2011]. In their work, they coordinate a simulated team task of finding solutions to these problems is known as tem- of fire trucks and bulldozers leading to a coordination prob- poral planning and several efficient approaches have been lem that is similar to the disaster recovery domain consid- presented [Gerevini et al., 2008; Eyerich et al., 2009]. The ered in this chapter. In contrast to our approach, however, predominant approach of solving temporal planning prob- blockades are assumed to be known beforehand. It is un- lems is forward search with a search guidance function us- clear, how this approach can be extended to partially known ing A∗ or similar algorithms. Most approaches support the environments. Koes et al. [2005] described an approach that use of numerical state variables. In contrast to binary and employs mixed integer linear programming (MILP) to coor- multi-valued state variables, numeric state variables have an dinate a heterogeneous team in a search and rescue scenario. infinite continuous value domain. While numeric state vari- In this domain, disaster victims need to be found and iden- ables lead to undecidability even when used in a very lim- tified in a known environment. This task involves assigning ited form [Helmert, 2002], they are considered to be of high robots with different capabilities to appropriate tasks. Simi- importance when modeling real world domains. Temporal lar to our approach, path costs between target locations are planning is more complex than classical planning, which explicitly taken into account using a robotic path planner. is PSPACE-complete [Bylander, 1994]. Although the gen- The static nature of the environment, however, allows all eral problem of temporal planning is EXPSPACE-complete, paths to be pre-computed for the entire runtime. Such a pre- wide classes are still in PSPACE [Rintanen, 2007]. computation is not possible in the case of partially known While temporal planners generate sequences of abstract environments. actions, most real-world applications in robotics require ex- plicit motion commands that can be executed by the robots. For this reason, a number of approaches have been pro- 3 Temporal Planning posed that integrate other reasoners, such as motion plan- ners, to generate low-level actions. The work by Cam- Our approach employs a symbolic planning system to com- bon et al. [2009] focuses on the integration of manipulation pute actions for a team of robots. Algorithms for classical planning with a symbolic planner. Kaelbling and Lozano- domain independent planning generate plans that consist of Perez [2011] address the problem of successor generation sequences of actions. There is a substantial body of literature
Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 5 on this topic and a description of planning algorithms can, more details on the definition of temporal planning tasks we for example, be found in [Russell and Norvig, 2010]. In con- refer to the work of Eyerich et al. [2009]. trast to these classical planners, temporal planning systems When coordinating a cooperative team of robots, spe- explicitly allow for concurrent actions. Consider the initial cific tasks are assigned to individual robots. In most do- state of the simple planning problem shown in Fig. 1. The mains, robots work in parallel. Still, their actions might de- environment contains two targets t1 and t2 and two robots pend on each other, for example, when a robot is clearing a e and c. The task is to explore both targets using robot e. path for another robot. Coordination problems that include Robot c can clear the blockade b1 using the action (clear such cases can only be solved efficiently when the concur- c b1) and robot e can explore targets using, for example, rency of the domain, but also interdependencies of robot the action (explore e t2). Target t1 is blocked by b1 tasks, are accounted for. Temporal planning allows for the and cannot be explored initially. specification of such conditions and is therefore well suited A classical planner is able to produce a sequence of ac- for multi-robot coordination. tions that solves the problem of exploring all targets. For example the following plan is a valid solution: 3.1 Planning Domain Definition Language (clear c b1) (explore e t2) A wide range of problem types can be modeled as a gen- (explore e t1) eral planning problem, ranging from transportation prob- lems and single-player games to general combinatorial prob- This, however, is not the best solution to the task. The first lems. In recent years, the Planning Problem Definition Lan- two actions are not dependent on each other, so they can be guage (PDDL) [Fox and Long, 2003] has been established executed in parallel. Temporal planning allows to express as the prevalent planning language. such concurrent actions and would result in the following To generate a PDDL task description, one needs to de- exemplary plan fine (i) the objects involved in the planning process, (ii) the 0.0: (clear c b1) [1.2] predicates that define the state of the planner, (iii) actions 0.0: (explore e t2) [2.0] that change the state, and (iv) a start state and a goal con- 2.0: (explore e t1) [1.8], dition. We will give several examples of PDDL statements in the following sections. The PDDL description then forms where the actions are preceded by the start time and the du- the input to symbolic planners. rations of actions are given in square brackets. For example, In our approach, PDDL is used to encode the coordi- (explore e t2) starts at time 0.0 and takes 2.0 time nation problem that has to be solved in a multi-robot sys- units to complete. tem. Based on this description, the temporal planner com- In our approach, we define temporal planning problems putes concurrent actions for the team of robots. We use using PDDL 2.1 [Fox and Long, 2003], which is a language PDDL/M [Dornhege et al., 2009], an extension to PDDL for defining planning problems and allows for durative ac- that allows for the definition of external module calls. Using tions. Durative actions are an extension of classical plan- this method, we combine symbolic planning and a robotic ning actions. In addition to conditions and effects they allow path planner in our approach (see Section 3.3). the problem description to specify an execution time. Note that solutions to temporal planning problems do not define a strict sequence of actions but merely a partially ordered set 3.2 The TFD/M Planning System and especially that actions might be executed in parallel. In realistic planning problems, there will be restrictions TFD/M is a domain-independent progression search planner on which actions can be executed in parallel and which ac- developed by Dornhege et al. [2009]. It extends the tempo- tions have to be completed before another action can be ral planner framework Temporal Fast Downward (TFD) by started. These restrictions are modeled as conditions. More Eyerich et al. [2009]. TFD in turn is based on a classical precisely, a condition is defined as a tuple (C⊢ ,C↔ ,C⊣ ), planning system called Fast Downward that was developed where C⊢ is a start condition that must hold at the start time by Helmert [2006]. TFD extends the original system to sup- of the action, C⊣ is an end condition that must hold at the port durative actions and numeric expressions while TFD/M end of the action and C↔ is an overall condition that must adds support for external modules. hold during the execution of the action. TFD/M solves a planning problem in three phases: First, The effects of durative actions are specified in a similar the PDDL planning task is translated from its original way. An effect is defined as a tuple (E⊢ , E⊣ ), where E⊢ is a encoding into a more concise representation using finite- start effect that is applied at the start of the action and E⊣ domain variables. This is used by the planner to guide is an end effect that is applied at the end of the action. For the search by employing hierarchical dependencies between
6 Kai M. Wurm et al. state variables and leads to an increased search performance. initial problem (see Fig. 1), for example, can be defined in In the second step, efficient internal data structures are the following way: generated that are used by the search component and the (:durative-action clear search guidance function. The most important ones are do- :parameters (?c - clearer ?t - blockade) main transition graphs for each variable that encode how :duration (= ?duration [costClear ?c ?t])... state variables can change their values and the causal graph that represents the hierarchical dependencies between dif- Note that the use of a semantic attachment is indicated by ferent state variables. Finally, a best-first progression search squared brackets in the definition. For further details on the is performed, guided by a numeric temporal variant of the implementation of semantic attachments in forward chain- context-enhanced additive method [Helmert and Geffner, ing state space planners such as TFD/M, we refer the reader 2008]. to our previous work [Dornhege et al., 2009]. When the planner expands actions in the planning phase, In contrast to several other temporal planning systems, it detects semantic attachments and executes the associated TFD/M does not decompose the search into an action selec- dynamic library calls. These external calls compute the ap- tion phase and a scheduling phase but searches directly in propriate action costs. They are made exaclty at the same the space of time-stamped states. This usually leads to plans time that a planner without modules would acquire the value of significantly higher quality [Eyerich et al., 2009]. Note, from the internal state. In our approach, for example, the however, that the first plan that is generated by this method cost of navigation actions are computed by a robot path plan- is not necessarily optimal. This is because the search guid- ner. ance function is inadmissible, in other words, it does not In the general case, those module calls can influence the guarantee an underestimation of the true execution cost of a complexity of the system. Wide classes of temporal plan- plan. ning problems lie in PSPACE, while temporal planning is TFD/M is implemented as an anytime algorithm that EXPSPACE-complete [Rintanen, 2007]. The scenarios in does not terminate after the first solution is generated. It pro- this paper belong to those classes, as they do not allow duces a potentially non-optimal solution quickly and then the same action to overlap with itself. This means, that as prunes the search space to those time-stamped states which long as the sub-problem solved by a single module call is can potentially be extended to solutions with a lower over- in PSPACE, the complete system will retain its complex- all duration. If all states in the resulting state space are ex- ity. This is due to the fact that module calls depend only panded, the produced solution is guaranteed to be optimal. on the planner state, but not on other module calls. Thus, once a module call is finished, its memory can be freed and therefore no space is accumulated in memory. The numeric planner used in this work fulfills this condition as it only accumulates a polynomial amount of states – the grid cells. 3.3 Semantic Attachments in TFD/M TFD/M features semantic attachments that are a means of 4 Coordination of Heterogenous Teams of Robots Using evaluating components of the planning task externally. This Temporal Planning is implemented as a module interface for predicates, numer- In the following, we will describe our coordination frame- ical effects, and durations. In our coordination algorithm, se- work for heterogeneous teams of robots. We assume that mantic attachments are used to specify durations of actions to solve a given task, the robots have to move in the envi- in the planning task description. Consider, for example, the ronment and additionally need to perform symbolic actions. following module specification: In this context, symbolic actions are defined as actions that (costClear ?r - robot ?b - blockade cost change the state of the overall system but whose primary costClear@libcost module.so) purpose is not to change the position of the robots. Numeric coordination approaches usually consist of a method to eval- This defines a semantic attachment for cost computation (in- uate the utility of navigation goals and a target assignment dicated by the cost keyword) and is used to specify the technique. Since it is unclear how we can translate symbolic duration of actions. It has two parameters, a robot r and a actions into a utility measure as it is used in numeric coordi- blockade b. To define a semantic attachment, a function call nation approaches, we instead treat navigating to an explo- is provided that performs the actual computation externally ration target as an action in a symbolic planning system. In (here, costClear@libcost module.so). our approach, the target evaluation and assignment modules We can now use the semantic attachment defined above of numeric approaches are replaced by a temporal symbolic to specify the duration of actions. The clear action in the planner, a method to generate a problem description for this
Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 7 sensor data map map coordination targets robot states Fig. 3 Example of a marsupial robot team. A versatile but slow legged platform is deployed by a faster wheeled robot. Courtesy of DFKI Robotics Innovation Center. PDDL actions description module action call costs Fig. 2 This figure shows an overview of the coordination system. Solid arrows mark the control flow and dashed arrows represent data passed between modules. planner and a numeric path planner that is used to estimate Fig. 4 An exploring robot (white circle) has to choose between three the path costs of navigation actions. possible actions: explore target t1 , explore target t3 , or deploy a smaller robot at m1 to let it explore t2 in the red area that it cannot explore The architecture of our system is illustrated in Fig. 2. In itself. our approach, we assume global and unlimited communica- tion between the robots and thus employ a centralized co- the mission goal. From the solution of the symbolic planner, ordination approach. Furthermore, all robots are assumed to we extract actions and send them to the robots. The loop know their individual position. The team of robots provides depicted in Fig. 2 is executed constantly: Whenever new in- the sensor data and states of the platforms, such as their posi- formation about the environment arrives, for example, new tions, current actions and navigation goals, to our centralized sensor data is perceived or one of the robots reaches a target, coordination system. We use the sensor measurements and we replan. poses of the robots to build a grid map of the environment. Some of the modules in our coordination framework are From this map, we extract locations that are relevant for co- specific to the application. Depending on which actions the ordination. In an exploration task, relevant locations would robots are able to perform, we need to adapt the PDDL de- be exploration targets that can, for example, be determined scription of the coordination problem. The possible actions using the frontiers approach [Yamauchi, 1997]. To execute furthermore influence which locations we need to extract symbolic actions we furthermore extract positions at which from the map. In the following, we will present two applica- the actions need to be executed. If some of the robots were tions and describe how to adapt the framework to them. able to open doors, for example, we would determine the positions of doors in the map. We solve the coordination problem of assigning actions 5 Coordination of Marsupial Teams to the robots using the temporal symbolic planning system TFD/M. From the current state of the robots and the lo- In the first application, we use our planning framework to cations we extracted from the map of the environment we coordinate a team of marsupial robots. In such a team, one generate a problem description in PDDL. This description group of robots, the carriers, are able to carry, deploy and serves as the input for the planner. Since the goal is to solve retrieve a group of other robots, the rovers. An example of the given task as fast as possible, we need to consider the a real carrier and rover robot is depicted in Fig. 3. Here, a costs of traveling to navigation goals. To this end, we make versatile but slow legged platform is deployed by a faster use of the modular interface of TFD/M to call a numeric wheeled robot. path planner for mobile robots. The numeric planner returns We assume that carriers and rovers have different navi- the estimated path cost of reaching a goal position from a gation capabilities and that certain areas of the environment given start position. Based on these estimates, the tempo- can only be explored by the rovers and others only by the ral planner is able to compute an efficient plan that solves carriers. We furthermore assume that the robots are able to
8 Kai M. Wurm et al. reaching t as: Ctype (x,t) (1) est. path cost(x,t) , if robots of type type = can reach t from x ∞ , otherwise. Finally, the exploration task is assumed to be completed as soon as the set of exploration targets T is empty. Fig. 5 Example of the costs that have to be considered. Dotted lines 5.2 Formulating the Exploration Problem as a Temporal illustrate the estimated path costs between the robot at position p and the different target positions ti , the costs between meeting points mi Planning Problem and robot or targets positions, and costs between target positions. For the sake of better visibility, we did not display all costs in this figure. To apply the coordination approach described above, we need to encode the coordination problem that arises in a given situation as a PDDL description. First, we define determine which areas are traversable by which robot based which types of objects are involved. In the exploration sce- on their sensor observations, for example based on tech- nario, possible objects are robots that can be either rovers niques developed in our previous work [Wurm et al., 2009]. or carriers and locations that can be meeting points or ex- In our experiments, a marsupial team consists of n ploration targets. The corresponding PDDL statements are carrier robots, where each carrier initially carries m rover given in Fig. 6 (left). Second, we specify the predicates that robots. The goal is to completely explore the environment, are used to define the state. The major predicates we use to that is, to cover the traversable area with the sensors of describe the exploration problem are the robots. Fig. 4 depicts a situation where a carrier has (at ?r - robot ?x - location), to choose between exploring the environment itself and de- ploying a rover in an area that it cannot reach. This illus- which describes that the robot r is at position x and trates the key challenges that the coordination method faces: (on ?e - rover ?c - carrier) It needs to generate exploration targets, to assign robots to which is used to determine whether a rover e is docked at a those targets, and to schedule deployment and retrieval ac- carrier c. For each target t ∈ T , we also define if it has been tions. explored (explored ?t - target). Additionally, we use a numeric state variable 5.1 Target Locations and Cost Estimation (num docked ?c - carrier) To identify potential target locations, a set of exploration tar- that keeps track of the number of rovers that are docked at a gets T is generated from the partially explored grid map. In carrier c. addition to this, a set of meeting points M is determined. Third, the actions that change the state are provided. These meeting points are situated at the border between We define four actions, namely dock, undock, move, and those parts of the environment that can only be traversed explore. The actions dock and undock are used to de- by the rovers and the parts that can only be traversed by ploy or pick up a rover. They require that the carrier and the carriers. They are used for deployment and retrieval of the rover are at the same meeting point, which is ensured the rovers (see Fig. 5 for an illustration). To determine the using the at predicate. For docking, the number of docked targets and meeting points a frontier extraction algorithm is rovers has to be lower than the carrier’s capacity and the ac- used as first described by Yamauchi [Yamauchi, 1997]. tion changes a rover’s state from being at a meeting point to There are two basic types of actions a carrier can per- being on a carrier (encoded using the on predicate). form: exploring a target or visiting a meeting point to de- The other two actions move and explore model the ploy or retrieve a rover (see Fig. 4). While deployment and possible motions of the robots. The move action moves a retrieval are assumed to have constant costs, the cost of trav- robot to a meeting point for deployment or retrieval while eling between two locations in the environment is defined as the explore action moves the robot to a target and ex- the estimated travel time of a given robot. This cost depends plores it. To define the duration of the move and explore on the path length as well as on the traversability constraints actions, we employ the module interface of the temporal and travel speed of the corresponding robot. Let type be a planner. Instead of specifying a constant duration or a fixed robot type (here: carrier or rover), x a location in the envi- formula, we call an external module that determines the du- ronment and t ∈ {T ∪ M} a target. We define the cost for ration of the action. In our setting, the external module is
Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 9 (:durative-action explore :parameters (?r - robot (:init (:types ?s - location ?g - target) (at robot0 p) robot :duration (= ?duration (on robot1 robot0) carrier rover - robot [pathCost ?r ?s ?g]) (can_explore robot0 t1) location :condition (and (at start (at ?r ?s)) (can_explore robot1 t2) target meeting - location ) (at start (not (explored ?g))) (can_explore robot0 t3) (:predicates (at start (can_explore ?r ?g)) ... ) ) (at ?r - robot ?x - location) :effect (:goal (and (on ?e - rover ?c - carrier) (and (explored t1) (explored ?t - target) (at start (not (at ?r ?s))) (explored t2) (can_explore ?r - robot ?t - target)) (at end (at ?r ?g)) (explored t3) (at start (explored ?g)) )) ... )) Fig. 6 Examples of PDDL definitions. Left: definition of the types and predicates used in the coordination domain. Middle: definition of the explore action. Right: example that shows how to specify the current state of the world for the TFD/M planner (see illustration shown in Fig. 4). realized by an efficient path planner for mobile robots that dination module and stored in a set T = {t1 , . . . ,tn }. When- plans the optimal trajectory of the robot to the given tar- ever a target is visited by one of the exploring robots, it is get location based on the current occupancy grid map con- removed from the set T and the mission map. structed by the robots so far. Fig. 6 (middle) depicts the While the robots navigate in the environment they up- PDDL statements that describe the action explore. The date their maps using their sensors. Whenever a blockade b term[pathCost ?r ?s ?g] represents the call to the is sensed, this information is distributed to all other robots. external module. In our current implementation, we use Di- The central coordinator keeps track of blockades in a set jkstra’s algorithm for cost estimation as it allows to effi- B = {b1 , . . . , bm } which is used to coordinate the clearing ciently compute the path cost from a given start position to robots. all goal positions. Finally, the initial state of the current plan- Furthermore, the coordination system maintains a repre- ning procedure and the goal state are specified. For the situ- sentation of the environment that is composed of the prior ation depicted in Fig. 4, this is exemplified in Fig. 6 (right). building map and the set of blockades sensed by the robots. This map is used by the robot path planner to plan collision free paths for the team of robots and to estimate travel costs 6 Coordinated Disaster Recovery during target assignment. It estimates path costs from a lo- cation x to a target t ∈ {T ∪ B} according to As a further application of the coordination approach de- scribed in Section 4, we investigate teams of robots that op- C(x,t) (2) erate in a disaster recovery scenario. More specifically, we est. path cost(x,t) , path from x to t traversable assume that a known building is partially collapsed so that = ∞ , path blocked. paths within the building are blocked at unknown positions. The goal is to visit a given set of targets in the building us- This is an optimistic estimate, as we do not assume any ing a team of robots. In a real world application, the robots knowledge about where blockades may appear. When a could, for instance, provide high-resolution views of the sit- probabilistic model of blockades is given the path planning uation, close a set of valves, or deploy a set of beacons. We problem can be formulated as the Canadian Traveller’s Prob- assume that there are two types of robots: exploring robots lem [Papadimitriou and Yannakakis, 1991]. There exist ef- that visit targets by performing a specific action at a given ficient methods to solve this problem [Eyerich et al., 2010] location and clearing robots that are able to clear blocked that can be used for the path cost estimation. Note, however, paths. Both types of robots are able to detect blockades using that in our formulation blockades can be cleared and thus their sensors. All robots are provided with a map of the en- this estimate would also not be optimal. vironment that includes the mission targets. This prior map A disaster recovery mission is considered completed as does not, however, include blocked paths. soon as the set of targets T is empty, that is, all of the targets The key challenge in this scenario is to coordinate the have been visited by an exploring robot. team of robots so that the exploring robots do not waste time by waiting for blocked paths to be cleared. An illustration of a typical situation in this problem domain is given in Fig. 1. 6.2 Formulation as a Temporal Planning Problem To apply our coordination approach to the disaster recov- 6.1 Target Locations and Cost Estimation ery problem described above, we encode the system’s state and possible actions using PDDL statements. This PDDL The initial mission map includes all target locations that description then serves as input to our coordination algo- have to be visited. These locations are extracted by the coor- rithm as depicted in the system overview in Fig. 2.
10 Kai M. Wurm et al. (:types (:durative-action explore-from-blockade robot - object :parameters (?r - explorer ?s - blockade ?g - mission_target) clearer explorer - robot :duration (= ?duration [pathCost ?r ?s ?g]) location - object :condition (and (at start (at ?r ?s)) (at start (not (explored ?g))) free_location - location (at start (not (= ?s ?g))) (at start (cleared ?s)) ) mission_target - free_location :effect blockade - location) (and (:predicates (at start (not (at ?r ?s))) (at ?r - robot ?x - location) (at end (at ?r ?g)) (cleared ?t - blockade) (at start (explored ?g)) )) (explored ?t - mission_target)) Fig. 7 Examples of PDDL definitions used in the disaster recovery domain. Left: definition of the types and predicates. Right: definition of the explore-from-blockade action. First, we define which types of objects are involved. In to clear the blockade. In our experiments we assume that this this scenario, the relevant objects are the robots that can be value depends on the size of the blockade. After the action is either exploring robots or clearing robots and the map loca- executed, the predicate (cleared b) is set to true in the tions that can be mission targets or blockades. Fig. 7 (left) current state. Finally, the initial and goal state are defined. shows the corresponding PDDL statements. Second, we In the initial state, no blockades have been discovered and specify the predicates that we use to define the problem none of the mission targets have been explored. The goal states. The state in the disaster recovery problem can be state is defined as the state that describes all mission targets specified using as explored. (at ?r - robot ?x - location) The coordination loop depicted in the system overview (see Fig. 2) is executed continuously until the goal state is which describes that the robot r is at position x, reached. While the description of the actions remain un- (cleared ?b - blockade) changed during consecutive planning cycles, the current which describes that blockade b has been cleared, and state of the system is updated to reflect the current state of (explored ?t - mission target) the system and the environment. The updated state descrip- tion defines all robots to be at their specific locations and which describes that mission target t has been visited by an blockades are added to the state whenever they are discov- exploring robot. ered. Previously cleared blockades or targets that have been Third, we provide the actions that change the state of explored are irrelevant to the current problem and thus are the system. There are four kinds of movement actions: We ignored in the problem description. distinguish between actions that explore a mission target and actions that have a goal position other than a mission target. We furthermore differentiate actions whose start lo- 7 Evaluation cation is a blockade and actions that start at a location in free space. As an example, the PDDL definition given in The approach described in this paper has been implemented Fig. 7 (right) defines a movement action that explores a tar- and evaluated thoroughly using a multi-robot simulation get starting from a blockade. It can be seen, that movement system. The experiments are designed to show that explicitly actions which start at a blockade require the blockade to be planning symbolic actions leads to a significantly more effi- cleared before execution. All movement actions require the cient coordination than using a heuristic extension of previ- robot to be at the start location and result in the robot be- ous coordination approaches. We evaluated our coordination ing at the goal location. The costs of these actions are de- framework in the two problem domains introduced in Sec- fined as the estimated path costs and are determined via the tion 5 and Section 6. In addition, we show that our approach modular interface much the same as in the marsupial explo- is able to produce efficient coordination plans in reasonable ration scenario. The goal location of all exploration actions time and thus can be employed in a real-world system. is a mission target t. Once such an action is executed, the current state is changed so that the corresponding predicate (explored t) is set to true. 7.1 Simulation System Movement actions that start at a blockade have the addi- tional precondition that the blockade is cleared. To evaluate our coordination approach quantitatively, we de- In addition to exploration actions, we define a clear veloped a simulation system that is able to simulate large action to coordinate the group of clearing robots. This action teams of heterogeneous robots. In our current system, we expresses that a given blockade b is cleared by a clearing also simulate laser range sensors. Sensor and odometry robot that is positioned at the corresponding location. The noise are not considered since we focus on the coordination duration of the action is defined by the time the robot takes aspects of the problems. The environment is modeled using
Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 11 generated by the simulator is depicted in Fig. 9. It shows a representative coordination problem where exploration tar- gets as well as symbolic actions need to be considered. To get an intuition of the extend of the environment, one can look at the ratio between the size of the environment and the maximum sensor range of the robots. In many real office Fig. 8 Simulated environments in the marsupial exploration experi- buildings, such as building 079 on the Freiburg campus, this ments: office (left) and maze (right). White areas can only be traversed by carriers while red (dark) areas can only be explored by rovers. ratio is around 10 for a maximum sensor range of 4 m. The ratio for the maze environment is 10.2 for rovers and 5.1 for carriers. In the considerably bigger office environment the ratio increases to 23.8 and 11.9 respectively. In both environments, we simulated 30 exploration runs with random initial robot positions. Exploration targets were determined using the frontiers approach and neighboring ex- ploration targets were clustered using visibility constraints similar to the approach proposed by Burgard et al. [2005]. Fig. 9 Visualization of a simulated run in the maze environment. 7.3 Baseline Approach Robots are depicted as disks with id numbers and carriers are marked by a white dot. Small circles depict exploration targets (red) and meet- ing points (green). In this situation, rover 3 could be retrieved by carrier We compared our coordination algorithm to a heuristic ex- 0 or carrier 2 at any of the three meeting points in its vicinity. tension of a method that assigns robots to target locations based on cost estimates [Stachniss, 2009]. In this approach, the Hungarian method is used to compute the cost-optimal a grid map with additional traversability information and se- assignment of robots to exploration targets. To adapt this mantic annotations such as mission targets or blockades. method to the problem of exploration with marsupial teams, we first assign the carriers to exploration targets independent of whether they can access those targets or not. The assign- 7.2 Exploration with Masupial Teams ment is solely based on the estimated travel cost of the carri- ers, ignoring traversability constraints in the estimation. The We simulated teams of marsupial robots to evaluate our ap- rovers are then deployed as follows: Whenever a carrier is proach introduced in Section 5. The simulated environments assigned to a target that it cannot explore itself, it will move contain areas that can only be traversed by rovers and areas to the nearest connecting meeting point and deploy a rover that can only be traversed by carriers. The rover robots in there. This rover then explores the targets reachable from the most real marsupial systems are considerably slower than meeting point. As soon as it has finished exploring them, it the carrier robots. In the evaluation we account for this dif- returns to the meeting point. In our experiments, we assume ference by simulating carrier robots that are twice as fast a limited number of rovers per carrier. This will lead to situ- as rovers. Furthermore, the maximum sensor range of the ations in which a carrier needs to deploy a rover but has none carriers is also twice as far as the sensor range of the rover available. The baseline approach then requires the carrier to robots. first retrieve a rover. We evaluated robot teams of varying sizes and differ- The baseline approach as described above closely re- ent environments have been used in the simulation. Two of sembles heuristic deployment rules that have been applied the environments we used in our experiments can be seen in previous marsupial systems [Murphy et al., 1999]. Note in Fig. 8. The office environment resembles a typical office that the baseline approach could be improved by introducing building with two corridors and a number of rooms. Some more complex reasoning. For example, it could anticipate of the rooms can only be explored by rovers, those are de- cases in which no rovers are available to a carrier and then picted as red areas in the maps. The second environment, in avoid assigning this carrier to targets that are only reach- the following referred to as the maze environment, features a able by rovers. This would introduce additional state vari- central area that can only be explored by rovers. In contrast ables that the coordination system would need to maintain. to the areas in the office environment, it has multiple meet- In fact, this kind of reasoning would approximate a planning ing points that can be used for deployment. A visualization method such as the one we apply in our approach.
12 Kai M. Wurm et al. 160 heuristic 140 our approach 120 time steps 100 80 60 40 20 0 #rovers 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 #carriers 1 2 3 4 5 6 350 heuristic 300 our approach 250 time steps 200 150 100 50 0 #rovers 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 #carriers 1 2 3 4 5 6 Fig. 10 Exploration time obtained with our approach compared to the ad-hoc method in the maze environment (top) and in the office environment (bottom) for varying team sizes (number of carriers and number of rovers per carrier). The error bars indicate the 95% confidence intervals. 7.4 Results 60 our approach heuristic 50 An overview of the results obtained in the experiments is exploration quality 40 given in Fig. 10. It can be seen that our approach explores the 30 environment significantly faster than the baseline method in all configurations. By considering deployment and pick-up 20 actions explicitly, our approach avoids long travel paths and 10 detours that result from the pick-up heuristic in the base- 0 line. In addition, it uses its larger planning horizon to plan 1 2 3 4 5 6 sequences of actions while the baseline approach computes number of carriers exactly one action per robot. Even though we execute only Fig. 11 Exploration quality in the office environment over 30 runs us- the first action in the sequence, robots often are in a good ing two rovers per carrier. The higher the value, the better the coordi- nation. The error bars indicate the 95% confidence intervals. Note that initial position for the next action. similar results were obtained for the maze environment. Especially smaller teams of up to six robots profit con- siderably from our coordination method. Larger teams can, to some degree, compensate inefficiencies of the coordina- where A is the total area of the environment and di denotes tion when a good distribution in the environment is achieved the distance traveled by robot i. This measure can intuitively and many targets can be approached simultaneously. In the be understood as the average area each robot explores per settings we evaluated, this effect can be noticed when more movement. than three carriers are used. The number of robots for which The results given in Fig. 11 show that our approach this effect occurs clearly depends on the structure and size reaches a significantly higher exploration quality. Especially of the environment. larger teams of robots are coordinated more efficiently, so To evaluate the amount of unnecessary movement in that unnecessary movements are avoided. The reason for this larger teams, we evaluated a further benchmark. We com- improvement lies in the larger planning horizon of our ap- puted the exploration quality according to [Zlot et al., 2002] proach compared to the baseline. Our approach anticipates as future states of the system, especially the position of the robots, while the baseline approach computes actions solely n 1 on the basis of the current state of the system. Avoiding un- Q= A ∑ di , (3) necessary movements is a significant advantage on its own i=1
You can also read