Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning

Page created by Dorothy Bowen

Science

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning

Autonomous Robots manuscript No.
(will be inserted by the editor)

Coordinating Heterogeneous Teams of Robots
using Temporal Symbolic Planning
Kai M. Wurm · Christian Dornhege · Bernhard Nebel
Wolfram Burgard · Cyrill Stachniss

Received: date / Accepted: date

Abstract The efficient coordination of a team of heteroge-           1 Introduction
neous robots is an important requirement for exploration,
rescue, and disaster recovery missions. In this paper, we            One of the fundamental problems in mobile robotics is the
present a novel approach to target assignment for hetero-            task of autonomously navigating in an environment. In most
geneous teams of robots. It goes beyond existing target as-          realistic settings, a robot only has partial knowledge about
signment algorithms in that it explicitly takes symbolic ac-         its environment. For example, the blueprint of a building
tions into account. Such actions include the deployment              may be known to the robot but furnishing and other obsta-
and retrieval of other robots or manipulation tasks. Our             cles are usually not known beforehand. Furthermore, there
method integrates a temporal planning approach with a tra-           are applications in which the environment is entirely un-
ditional cost-based planner. The proposed approach was im-           known to the robot, for example during planetary explo-
plemented and evaluated in two distinct settings. First, we          ration and disaster recovery missions.
coordinated teams of marsupial robots. Such robots are able               In many applications, a coordinated team of robots offers
to deploy and pickup smaller robots. Second, we simulated            advantages over a single robot. Multi-robot systems have the
a disaster scenario where the task is to clear blockades and         potential of being more fault tolerant and of reducing the
reach certain critical locations in the environment. A simi-         overall time to complete a given task [Dudek et al., 1996;
lar setting was also investigated using a team of real robots.       Cao et al., 1997]. A well-studied problem is the explo-
The results show that our approach outperforms ad-hoc ex-            ration of an unknown environment with a cooperative team
tensions of state-of-the-art cost-based coordination methods         of robots. Previous approaches often addressed the prob-
and that the approach is able to efficiently coordinate teams        lem of exploring an environment with groups of robots that
of heterogeneous robots and to consider symbolic actions.            have identical capabilities. To coordinate such homogeneous
                                                                     teams, popular approaches determine a set of exploration
                                                                     targets and assign robots to them numerically [Zlot et al.,
Keywords Multi-robot coordination · temporal planning                2002; Ko et al., 2003; Berhault et al., 2003; Burgard et al.,
                                                                     2005; Stachniss, 2009]. These approaches consider the cost
                                                                     and the expected information gain of each exploration tar-
This work was supported by the Deutsche Forschungsgemeinschaft       get.
(DFG) in the Transregional Collaborative Research Center SFB/TR8          In this paper, we address the problem of coordinating
Spatial Cognition projects A3-[MultiBot] and R7-[PlanSpace] and by
                                                                     exploring teams of robots with differing capabilities. The
Microsoft Research, Redmond.
                                                                     robots in such heterogeneous teams may differ in their phys-
K. Wurm, C. Stachniss, and W. Burgard                                ical properties such as their sensor setup, their size and pay-
University of Freiburg, Dept. of Computer Science, Georges-
Köhler-Allee   079,    79110     Freiburg,   Germany E-mail:
                                                                     load, their maximum traveling speed, or the type of terrain
(wurm|stachnis|burgard)@informatik.uni-freiburg.de                   they are able to traverse. In our approach, we especially con-
C. Dornhege, and B. Nebel
                                                                     sider robots that differ in the actions they are able to per-
University of Freiburg, Dept. of Computer Science, Georges-          form. For instance, robots might be equipped with manip-
Köhler-Allee   052,   79110      Freiburg, Germany E-mail:          ulators, they could be able to deploy localization beacons,
(dornhege|nebel)@informatik.uni-freiburg.de                          or they could even deploy other robots. Numeric, cost-based

2 Kai M. Wurm et al.

coordination approaches are able to consider differing prop- tions that have to be executed. This description serves as the
erties of robots that affect navigation costs. For example, the input to a symbolic planning system that solves the coordi-
cost of reaching a target depends on the travel speed of a nation problem. However, classical planning approaches do
robot and it is also possible to encode that a robot cannot not consider the execution cost of the solutions they gener-
reach a target due to constraints on the size of the robot or ate. Adding costs measures to actions turns the coordination
the type of terrain it can traverse. However, it is not straight- problem into a temporal planning problem and there exist
forward to take into account actions other than navigating efficient algorithms to solve such problems [Gerevini et al.,
to exploration targets since there is no efficient mapping of 2008; Eyerich et al., 2009]. We estimate execution costs for
such actions to cost or utility measures. While we can usu- each navigation action using a robotic path planner.
ally specify the time it takes to perform an action such as In contrast to cost-based coordination approaches, our
deploying a robot, it is not meaningful to compare this cost system is able to explicitly plan for the execution of sym-
to the cost of exploring a given frontier target. The reason bolic actions. Still, the use of action costs that are deter-
for this incompatibility lies in the fact that performing an mined by the robotic path planner allows us to generate
action that does not explore parts of the environment has no time-efficient solutions. In this way, our approach combines
apparent immediate reward to an exploration system but it the strength of cost-based coordination approaches with the
effects its performance in the future. flexibility of symbolic planning systems. To evaluate our co-
From a conceptual point of view, the ability to perform ordination approach, we apply our framework to two popu-
actions that go beyond navigating to goal positions intro- lar multi-robot applications: exploration of unknown envi-
duces corresponding symbolic actions. This term is com- ronments with heterogeneous teams of robots and disaster
monly used in classical planning approaches that encode the recovery with heterogeneous teams.
state of a system using logical symbols. Classical planning An interesting coordination problem arises when a team
approaches execute symbolic actions to change the state of of robots includes members that are able to deploy and
the system towards a pre-defined goal state. In the context retrieve other, smaller robots. For a task such as the au-
of multi-robot exploration, we define a symbolic action as tonomous exploration of lunar craters, one can imagine
any action that does not explore parts of the environment robots that approach the crater and then deploy a specialized
directly. Examples of such actions include opening doors, robot which descents into the crater [Cordes et al., 2011;
operating elevators, moving obstacles, and deploying other ESA, 2008]. Such systems are frequently referred to as mar-
robots. There exist relatively few approaches that coordinate supial robots [Murphy et al., 1999]. The coordination of
teams of robots and take into account symbolic actions. One marsupial teams generally requires to carefully plan deploy-
specific set of actions that has previously been considered ment and retrieval actions, both are symbolic actions ac-
is the deployment and retrieval of robots by other robots. cording to our definition. In addition, one has to take into
To execute symbolic actions, previous approaches rely on account the different properties of the robots such as their
manually designed strategies [Singh and Fujimura, 1993; sensor setup, their size and payload, their maximum travel-
Murphy et al., 1999; Dellaert et al., 2002]. One strategy, ing speed, or the type of terrain they are able to traverse. The
for example, is to deploy a smaller robot whenever a large coordination framework proposed in this paper is applied to
robot cannot reach a given goal. These strategies, however, the exploration of an unknown environment using marsupial
are specific to a certain type of robot and environment and teams of robots.
it is unclear whether they are able to efficiently coordinate A further class of applications for teams of heteroge-
large teams of robots. neous robots is centered around the scenario that important
The approach presented in this paper considers sym- parts of an environment have collapsed after a natural dis-
bolic actions explicitly by applying symbolic planning tech- aster. In such a setting, teams of robots can, for instance,
niques. In the context of multi-robot exploration our goal is be used to perform search and rescue missions. We consider
to explore all exploration targets as fast as possible. We as- the task of carrying out pre-defined actions at certain criti-
sume that to reach some targets it is necessary to execute cal locations in the collapsed structure, for example, to re-
additional symbolic actions such as removing an obstacle establish safe operation of a failed power station or chemical
or operating an elevator. The key idea of our approach is plant. Such a disaster recovery task requires robots with di-
to integrate a symbolic planning system and a robotic path verse capabilities. First of all, robots are required to clear
planner. Since it is unclear how we can translate symbolic collapsed paths throughout the structure. At the same time,
actions into a cost measure as it is used in numeric coordi- there is a need for robots with good sensors or precise ma-
nation approaches, we instead treat navigating to an explo- nipulation skills to open doors, operate valves, closely in-
ration target as an action in a symbolic planning system. We spect parts of the building, or repair damage. A heteroge-
generate a symbolic formulation of the coordination prob- neous team of specialized robots can be used to provide
lem that includes navigation goals as well as symbolic ac- these capabilities in a flexible way. We apply our proposed

Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 3

framework to coordinate a team of exploring and clearing To improve the overall performance, later approaches in-
robots in a simulated disaster scenario. In addition to ex- troduced techniques to avoid redundant work. Zlot and col-
ploratory navigation actions, we explicitly plan for symbolic leagues [2002] proposed an architecture in which the ex-
clearing actions. ploration is guided by a market economy. They consider se-
Both application scenarios serve as motivating examples quences of potential target locations for each robot and trade
for this work. We developed and implemented a coordina- tasks between the robots using single-item first-price sealed-
tion approach for both applications and compared our ap- bid auctions. Such auction-based techniques have also been
proach to ad-hoc extensions of cost-based numeric coordi- applied by Berhault et al. [2003] to assign robots to bun-
nation approaches. As we will show in the evaluation, our dles of targets so that synergy effects between targets are
approach produces significantly better plans leading to a de- exploited.
crease in both the overall runtime and the sum of traveled Burgard et al. [2005] presented an iterative assignment
distances. Additionally, we applied our approach to coordi- method based on the estimated cost of reaching a target. It
nate a real-world heterogeneous robotic system. additionally considers visibility constraints between targets.
This paper extends our previous work [Wurm et al., Ko et al. [2003] and Stachniss [2009] presented algorithms
2010] in several ways. First, it provides an additional appli- that use the Hungarian method to compute the assignments
cation of our approach in the context of disaster recovery. It of robots to exploration targets. In a previous work, we pre-
furthermore contains an extended experimental evaluation. sented approaches that use semantic information [Stachniss
Additionally, we describe an application realized with a real- et al., 2009] and segmentations of the environment [Wurm et
world heterogeneous robotic system. We finally include a al., 2008] for improving coordinated exploration. By assign-
description of temporal planning from a robotics perspec- ing robots to unexplored segments instead of frontier targets,
tive and discuss strengths and limitations of its application a more balanced distribution of the robots over the environ-
in robotics. ment is achieved and the overall exploration time is reduced.
The remainder of this paper is organized as follows. Af- Heterogeneous teams of robots consist of robots with
ter discussing related work, we give a short introduction to different abilities. An early approach towards coordination
temporal planning. In Section 4, we give a general descrip- of heterogeneous robot systems during exploration was pre-
tion of our coordination framework and its components. We sented by Singh and Fujimura [1993]. In this approach, if
describe a specific application for exploration with marsu- a robot is too big to pass through a narrow passage, other
pial robots in Section 5. In Section 6, we present a further ap- robots are informed about this task. A system for mapping
plication of our framework in the context of disaster recov- with heterogeneous robots was introduced by Grabowski
ery. Our experimental evaluation is presented in Section 7. and Navarro-Serment [2000]. Coordination, however, is per-
Finally, strengths and limitations of the proposed approach formed manually in their approach. Howard et al. [2006]
are discussed in Section 8. presented an approach to coordinate large teams of hetero-
geneous robots. In their experiments, a small number of so-
phisticated robots explored an environment and a large num-
2 Related Work ber of simple sensor robots was then deployed to serve as a
distributed sensor network.
When multiple robots explore an unknown environment the Whenever small robots with low traveling speeds or
coordination strategy has the biggest influence on the per- limited power resources are used in a heterogeneous robot
formance of the system. Coordination methods for homo- team, it is advantageous to have larger robots transport the
geneous teams, in which all members are equipped equally, smaller ones to avoid a serious penalty in exploration time or
have been studied extensively in the past. In those cases, the power consumption [Murphy et al., 1999]. The larger carrier
coordination task is often formulated as an assignment prob- robots are frequently referred to as marsupial robots. One
lem. Such approaches assign robots to exploration targets of the first physical implementation of a marsupial system
based on a suitable cost measure. was presented by Murphy et al. [1999] who also described
Several methods have been presented to determine an as- ad-hoc methods to deploy the smaller robot. Kadioglu and
signment of robots to exploration targets. Most methods de- Papanikolopoulos [2003] presented a further physical im-
termine targets by extracting the frontier between explored plementation of a marsupial system. Rybski et al. [2000]
and unexplored areas. This target generation approach was developed strategies to use marsupial systems for surveil-
introduced by Yamauchi [1997] who also proposed a decen- lance. In their approach, small scout robots are deployed by
tralized multi-robot coordination technique that assigns each the marsupial robot to monitor individual rooms. Dellaert et
robot to the nearest frontier [Yamauchi, 1998]. The draw- al. [2002] deployed a team of legged robots using a car-
back of this coordination method is that several robots can rier robot in a rescue scenario. Denner and Papanikolopou-
be assigned to the same target. los [2007] presented a deployment method for marsupial

4 Kai M. Wurm et al.

teams that explicitly considers power constraints. In all of
the previously described exploration systems, deployment
and retrieval of robots in marsupial teams is determined by
handcrafted methods. In contrast to that, the approach pre-
e
sented in this chapter explicitly takes these symbolic actions
c
into account when coordinating the exploration.
The approach presented in this paper employs domain Fig. 1 An exploring robot (e) has to explore both targets t1 and t2 .
independent planning techniques to coordinate multiple The path to t1 is blocked at b1 (dashed line). A clearing robot (c) can
robots. Domain independent planning is a sub-field of artifi- clear the blockade.
cial intelligence that has been investigated thoroughly [Fikes
and Nilsson, 1971; Bylander, 1994; Erol et al., 1995] using during planning. They use so called suggesters that provide,
methodes like action graphs [Gerevini and Serina, 1999], for example, new motion paths or goal locations. The work
planning as satisfiability [Kautz and Selman, 1992] and described in this chapter uses the planner TFD/M [Dornhege
heuristic state-space search [Bonet and Geffner, 2001]. et al., 2009], a variant of the temporal fast downward plan-
A classical planning problem consists of a set of state- ning system originally developed by Eyerich et al. [2009].
variables with finite domains, an initial state, a set of actions TFD/M provides a generic interface for the integration of
and a goal condition. An action is defined by a precondition external modules that compute the values of state variables,
and its effects, which is a set of variable assignments. A so- action costs, and numerical effects during the planning pro-
lution for a classical planning problem is then defined as a cess using sub-processes. By means of these sub-processes,
finite sequence of actions that leads from the initial state to a we combine temporal planning with path planners tradition-
goal state. There exist several efficient planning systems for ally used for robot navigation and coordination. The work by
classical planning problems [Bonet and Geffner, 2001; Hoff- Gregory et al. [2012] uses modules during planning to eval-
mann and Nebel, 2001; Chen et al., 2006; Helmert, 2006; uate not only predicates, but also terms in first order logic.
Richter and Westphal, 2010]. Autonomous disaster recovery in partially known envi-
In multi-robot coordination, actions need to be executed ronments, as introduced in this chapter, has not been studied
concurrently and they usually have variable durations. These in depth. An auction-based coordination method as well as a
temporal constraints cannot be handled by classical plan- method based on genetic algorithms is presented by Jones et
ning approaches and constitute a new class of problems. The al. [2011]. In their work, they coordinate a simulated team
task of finding solutions to these problems is known as tem- of fire trucks and bulldozers leading to a coordination prob-
poral planning and several efficient approaches have been lem that is similar to the disaster recovery domain consid-
presented [Gerevini et al., 2008; Eyerich et al., 2009]. The ered in this chapter. In contrast to our approach, however,
predominant approach of solving temporal planning prob- blockades are assumed to be known beforehand. It is un-
lems is forward search with a search guidance function us- clear, how this approach can be extended to partially known
ing A∗ or similar algorithms. Most approaches support the environments. Koes et al. [2005] described an approach that
use of numerical state variables. In contrast to binary and employs mixed integer linear programming (MILP) to coor-
multi-valued state variables, numeric state variables have an dinate a heterogeneous team in a search and rescue scenario.
infinite continuous value domain. While numeric state vari- In this domain, disaster victims need to be found and iden-
ables lead to undecidability even when used in a very lim- tified in a known environment. This task involves assigning
ited form [Helmert, 2002], they are considered to be of high robots with different capabilities to appropriate tasks. Simi-
importance when modeling real world domains. Temporal lar to our approach, path costs between target locations are
planning is more complex than classical planning, which explicitly taken into account using a robotic path planner.
is PSPACE-complete [Bylander, 1994]. Although the gen- The static nature of the environment, however, allows all
eral problem of temporal planning is EXPSPACE-complete, paths to be pre-computed for the entire runtime. Such a pre-
wide classes are still in PSPACE [Rintanen, 2007]. computation is not possible in the case of partially known
While temporal planners generate sequences of abstract environments.
actions, most real-world applications in robotics require ex-
plicit motion commands that can be executed by the robots.
For this reason, a number of approaches have been pro- 3 Temporal Planning
posed that integrate other reasoners, such as motion plan-
ners, to generate low-level actions. The work by Cam- Our approach employs a symbolic planning system to com-
bon et al. [2009] focuses on the integration of manipulation pute actions for a team of robots. Algorithms for classical
planning with a symbolic planner. Kaelbling and Lozano- domain independent planning generate plans that consist of
Perez [2011] address the problem of successor generation sequences of actions. There is a substantial body of literature

Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 5

on this topic and a description of planning algorithms can, more details on the definition of temporal planning tasks we
for example, be found in [Russell and Norvig, 2010]. In con- refer to the work of Eyerich et al. [2009].
trast to these classical planners, temporal planning systems When coordinating a cooperative team of robots, spe-
explicitly allow for concurrent actions. Consider the initial cific tasks are assigned to individual robots. In most do-
state of the simple planning problem shown in Fig. 1. The mains, robots work in parallel. Still, their actions might de-
environment contains two targets t1 and t2 and two robots pend on each other, for example, when a robot is clearing a
e and c. The task is to explore both targets using robot e. path for another robot. Coordination problems that include
Robot c can clear the blockade b1 using the action (clear such cases can only be solved efficiently when the concur-
c b1) and robot e can explore targets using, for example, rency of the domain, but also interdependencies of robot
the action (explore e t2). Target t1 is blocked by b1 tasks, are accounted for. Temporal planning allows for the
and cannot be explored initially. specification of such conditions and is therefore well suited
A classical planner is able to produce a sequence of ac- for multi-robot coordination.
tions that solves the problem of exploring all targets. For
example the following plan is a valid solution:
3.1 Planning Domain Definition Language
(clear c b1)
(explore e t2) A wide range of problem types can be modeled as a gen-
(explore e t1) eral planning problem, ranging from transportation prob-
lems and single-player games to general combinatorial prob-
This, however, is not the best solution to the task. The first
lems. In recent years, the Planning Problem Definition Lan-
two actions are not dependent on each other, so they can be
guage (PDDL) [Fox and Long, 2003] has been established
executed in parallel. Temporal planning allows to express
as the prevalent planning language.
such concurrent actions and would result in the following
To generate a PDDL task description, one needs to de-
exemplary plan
fine (i) the objects involved in the planning process, (ii) the
0.0: (clear c b1) [1.2] predicates that define the state of the planner, (iii) actions
0.0: (explore e t2) [2.0] that change the state, and (iv) a start state and a goal con-
2.0: (explore e t1) [1.8], dition. We will give several examples of PDDL statements
in the following sections. The PDDL description then forms
where the actions are preceded by the start time and the du- the input to symbolic planners.
rations of actions are given in square brackets. For example, In our approach, PDDL is used to encode the coordi-
(explore e t2) starts at time 0.0 and takes 2.0 time nation problem that has to be solved in a multi-robot sys-
units to complete. tem. Based on this description, the temporal planner com-
In our approach, we define temporal planning problems putes concurrent actions for the team of robots. We use
using PDDL 2.1 [Fox and Long, 2003], which is a language PDDL/M [Dornhege et al., 2009], an extension to PDDL
for defining planning problems and allows for durative ac- that allows for the definition of external module calls. Using
tions. Durative actions are an extension of classical plan- this method, we combine symbolic planning and a robotic
ning actions. In addition to conditions and effects they allow path planner in our approach (see Section 3.3).
the problem description to specify an execution time. Note
that solutions to temporal planning problems do not define a
strict sequence of actions but merely a partially ordered set 3.2 The TFD/M Planning System
and especially that actions might be executed in parallel.
In realistic planning problems, there will be restrictions TFD/M is a domain-independent progression search planner
on which actions can be executed in parallel and which ac- developed by Dornhege et al. [2009]. It extends the tempo-
tions have to be completed before another action can be ral planner framework Temporal Fast Downward (TFD) by
started. These restrictions are modeled as conditions. More Eyerich et al. [2009]. TFD in turn is based on a classical
precisely, a condition is defined as a tuple (C⊢ ,C↔ ,C⊣ ), planning system called Fast Downward that was developed
where C⊢ is a start condition that must hold at the start time by Helmert [2006]. TFD extends the original system to sup-
of the action, C⊣ is an end condition that must hold at the port durative actions and numeric expressions while TFD/M
end of the action and C↔ is an overall condition that must adds support for external modules.
hold during the execution of the action. TFD/M solves a planning problem in three phases: First,
The effects of durative actions are specified in a similar the PDDL planning task is translated from its original
way. An effect is defined as a tuple (E⊢ , E⊣ ), where E⊢ is a encoding into a more concise representation using finite-
start effect that is applied at the start of the action and E⊣ domain variables. This is used by the planner to guide
is an end effect that is applied at the end of the action. For the search by employing hierarchical dependencies between

6 Kai M. Wurm et al.

state variables and leads to an increased search performance. initial problem (see Fig. 1), for example, can be defined in
In the second step, efficient internal data structures are the following way:
generated that are used by the search component and the
(:durative-action clear
search guidance function. The most important ones are do- :parameters (?c - clearer ?t - blockade)
main transition graphs for each variable that encode how :duration (= ?duration [costClear ?c ?t])...
state variables can change their values and the causal graph
that represents the hierarchical dependencies between dif- Note that the use of a semantic attachment is indicated by
ferent state variables. Finally, a best-first progression search squared brackets in the definition. For further details on the
is performed, guided by a numeric temporal variant of the implementation of semantic attachments in forward chain-
context-enhanced additive method [Helmert and Geffner, ing state space planners such as TFD/M, we refer the reader
2008]. to our previous work [Dornhege et al., 2009].
When the planner expands actions in the planning phase,
In contrast to several other temporal planning systems,
it detects semantic attachments and executes the associated
TFD/M does not decompose the search into an action selec-
dynamic library calls. These external calls compute the ap-
tion phase and a scheduling phase but searches directly in
propriate action costs. They are made exaclty at the same
the space of time-stamped states. This usually leads to plans
time that a planner without modules would acquire the value
of significantly higher quality [Eyerich et al., 2009]. Note,
from the internal state. In our approach, for example, the
however, that the first plan that is generated by this method
cost of navigation actions are computed by a robot path plan-
is not necessarily optimal. This is because the search guid-
ner.
ance function is inadmissible, in other words, it does not
In the general case, those module calls can influence the
guarantee an underestimation of the true execution cost of a
complexity of the system. Wide classes of temporal plan-
plan.
ning problems lie in PSPACE, while temporal planning is
TFD/M is implemented as an anytime algorithm that EXPSPACE-complete [Rintanen, 2007]. The scenarios in
does not terminate after the first solution is generated. It pro- this paper belong to those classes, as they do not allow
duces a potentially non-optimal solution quickly and then the same action to overlap with itself. This means, that as
prunes the search space to those time-stamped states which long as the sub-problem solved by a single module call is
can potentially be extended to solutions with a lower over- in PSPACE, the complete system will retain its complex-
all duration. If all states in the resulting state space are ex- ity. This is due to the fact that module calls depend only
panded, the produced solution is guaranteed to be optimal. on the planner state, but not on other module calls. Thus,
once a module call is finished, its memory can be freed and
therefore no space is accumulated in memory. The numeric
planner used in this work fulfills this condition as it only
accumulates a polynomial amount of states – the grid cells.
3.3 Semantic Attachments in TFD/M

TFD/M features semantic attachments that are a means of 4 Coordination of Heterogenous Teams of Robots Using
evaluating components of the planning task externally. This Temporal Planning
is implemented as a module interface for predicates, numer-
In the following, we will describe our coordination frame-
ical effects, and durations. In our coordination algorithm, se-
work for heterogeneous teams of robots. We assume that
mantic attachments are used to specify durations of actions
to solve a given task, the robots have to move in the envi-
in the planning task description. Consider, for example, the
ronment and additionally need to perform symbolic actions.
following module specification:
In this context, symbolic actions are defined as actions that
(costClear ?r - robot ?b - blockade cost change the state of the overall system but whose primary
costClear@libcost module.so) purpose is not to change the position of the robots. Numeric
coordination approaches usually consist of a method to eval-
This defines a semantic attachment for cost computation (in- uate the utility of navigation goals and a target assignment
dicated by the cost keyword) and is used to specify the technique. Since it is unclear how we can translate symbolic
duration of actions. It has two parameters, a robot r and a actions into a utility measure as it is used in numeric coordi-
blockade b. To define a semantic attachment, a function call nation approaches, we instead treat navigating to an explo-
is provided that performs the actual computation externally ration target as an action in a symbolic planning system. In
(here, costClear@libcost module.so). our approach, the target evaluation and assignment modules
We can now use the semantic attachment defined above of numeric approaches are replaced by a temporal symbolic
to specify the duration of actions. The clear action in the planner, a method to generate a problem description for this

Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 7

sensor
data map

map
coordination
targets
robot
states Fig. 3 Example of a marsupial robot team. A versatile but slow legged
platform is deployed by a faster wheeled robot. Courtesy of DFKI
Robotics Innovation Center.
PDDL
actions
description module
action call
costs

Fig. 2 This figure shows an overview of the coordination system. Solid
arrows mark the control flow and dashed arrows represent data passed
between modules.

planner and a numeric path planner that is used to estimate Fig. 4 An exploring robot (white circle) has to choose between three
the path costs of navigation actions. possible actions: explore target t1 , explore target t3 , or deploy a smaller
robot at m1 to let it explore t2 in the red area that it cannot explore
The architecture of our system is illustrated in Fig. 2. In itself.
our approach, we assume global and unlimited communica-
tion between the robots and thus employ a centralized co-
the mission goal. From the solution of the symbolic planner,
ordination approach. Furthermore, all robots are assumed to
we extract actions and send them to the robots. The loop
know their individual position. The team of robots provides
depicted in Fig. 2 is executed constantly: Whenever new in-
the sensor data and states of the platforms, such as their posi-
formation about the environment arrives, for example, new
tions, current actions and navigation goals, to our centralized
sensor data is perceived or one of the robots reaches a target,
coordination system. We use the sensor measurements and
we replan.
poses of the robots to build a grid map of the environment.
Some of the modules in our coordination framework are
From this map, we extract locations that are relevant for co-
specific to the application. Depending on which actions the
ordination. In an exploration task, relevant locations would
robots are able to perform, we need to adapt the PDDL de-
be exploration targets that can, for example, be determined
scription of the coordination problem. The possible actions
using the frontiers approach [Yamauchi, 1997]. To execute
furthermore influence which locations we need to extract
symbolic actions we furthermore extract positions at which
from the map. In the following, we will present two applica-
the actions need to be executed. If some of the robots were
tions and describe how to adapt the framework to them.
able to open doors, for example, we would determine the
positions of doors in the map.
We solve the coordination problem of assigning actions 5 Coordination of Marsupial Teams
to the robots using the temporal symbolic planning system
TFD/M. From the current state of the robots and the lo- In the first application, we use our planning framework to
cations we extracted from the map of the environment we coordinate a team of marsupial robots. In such a team, one
generate a problem description in PDDL. This description group of robots, the carriers, are able to carry, deploy and
serves as the input for the planner. Since the goal is to solve retrieve a group of other robots, the rovers. An example of
the given task as fast as possible, we need to consider the a real carrier and rover robot is depicted in Fig. 3. Here, a
costs of traveling to navigation goals. To this end, we make versatile but slow legged platform is deployed by a faster
use of the modular interface of TFD/M to call a numeric wheeled robot.
path planner for mobile robots. The numeric planner returns We assume that carriers and rovers have different navi-
the estimated path cost of reaching a goal position from a gation capabilities and that certain areas of the environment
given start position. Based on these estimates, the tempo- can only be explored by the rovers and others only by the
ral planner is able to compute an efficient plan that solves carriers. We furthermore assume that the robots are able to

8 Kai M. Wurm et al.

reaching t as:
Ctype (x,t) (1)

 est. path cost(x,t) , if robots of type type
= can reach t from x
∞ , otherwise.


Finally, the exploration task is assumed to be completed as
soon as the set of exploration targets T is empty.

Fig. 5 Example of the costs that have to be considered. Dotted lines 5.2 Formulating the Exploration Problem as a Temporal
illustrate the estimated path costs between the robot at position p and
the different target positions ti , the costs between meeting points mi
Planning Problem
and robot or targets positions, and costs between target positions. For
the sake of better visibility, we did not display all costs in this figure. To apply the coordination approach described above, we
need to encode the coordination problem that arises in a
given situation as a PDDL description. First, we define
determine which areas are traversable by which robot based which types of objects are involved. In the exploration sce-
on their sensor observations, for example based on tech- nario, possible objects are robots that can be either rovers
niques developed in our previous work [Wurm et al., 2009]. or carriers and locations that can be meeting points or ex-
In our experiments, a marsupial team consists of n ploration targets. The corresponding PDDL statements are
carrier robots, where each carrier initially carries m rover given in Fig. 6 (left). Second, we specify the predicates that
robots. The goal is to completely explore the environment, are used to define the state. The major predicates we use to
that is, to cover the traversable area with the sensors of describe the exploration problem are
the robots. Fig. 4 depicts a situation where a carrier has
(at ?r - robot ?x - location),
to choose between exploring the environment itself and de-
ploying a rover in an area that it cannot reach. This illus- which describes that the robot r is at position x and
trates the key challenges that the coordination method faces: (on ?e - rover ?c - carrier)
It needs to generate exploration targets, to assign robots to which is used to determine whether a rover e is docked at a
those targets, and to schedule deployment and retrieval ac- carrier c. For each target t ∈ T , we also define if it has been
tions. explored
(explored ?t - target).
Additionally, we use a numeric state variable
5.1 Target Locations and Cost Estimation
(num docked ?c - carrier)
To identify potential target locations, a set of exploration tar- that keeps track of the number of rovers that are docked at a
gets T is generated from the partially explored grid map. In carrier c.
addition to this, a set of meeting points M is determined. Third, the actions that change the state are provided.
These meeting points are situated at the border between We define four actions, namely dock, undock, move, and
those parts of the environment that can only be traversed explore. The actions dock and undock are used to de-
by the rovers and the parts that can only be traversed by ploy or pick up a rover. They require that the carrier and
the carriers. They are used for deployment and retrieval of the rover are at the same meeting point, which is ensured
the rovers (see Fig. 5 for an illustration). To determine the using the at predicate. For docking, the number of docked
targets and meeting points a frontier extraction algorithm is rovers has to be lower than the carrier’s capacity and the ac-
used as first described by Yamauchi [Yamauchi, 1997]. tion changes a rover’s state from being at a meeting point to
There are two basic types of actions a carrier can per- being on a carrier (encoded using the on predicate).
form: exploring a target or visiting a meeting point to de- The other two actions move and explore model the
ploy or retrieve a rover (see Fig. 4). While deployment and possible motions of the robots. The move action moves a
retrieval are assumed to have constant costs, the cost of trav- robot to a meeting point for deployment or retrieval while
eling between two locations in the environment is defined as the explore action moves the robot to a target and ex-
the estimated travel time of a given robot. This cost depends plores it. To define the duration of the move and explore
on the path length as well as on the traversability constraints actions, we employ the module interface of the temporal
and travel speed of the corresponding robot. Let type be a planner. Instead of specifying a constant duration or a fixed
robot type (here: carrier or rover), x a location in the envi- formula, we call an external module that determines the du-
ronment and t ∈ {T ∪ M} a target. We define the cost for ration of the action. In our setting, the external module is

Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 9

(:durative-action explore
:parameters (?r - robot (:init
(:types ?s - location ?g - target) (at robot0 p)
robot :duration (= ?duration (on robot1 robot0)
carrier rover - robot [pathCost ?r ?s ?g]) (can_explore robot0 t1)
location :condition (and (at start (at ?r ?s)) (can_explore robot1 t2)
target meeting - location ) (at start (not (explored ?g))) (can_explore robot0 t3)
(:predicates (at start (can_explore ?r ?g)) ... ) )
(at ?r - robot ?x - location) :effect (:goal (and
(on ?e - rover ?c - carrier) (and (explored t1)
(explored ?t - target) (at start (not (at ?r ?s))) (explored t2)
(can_explore ?r - robot ?t - target)) (at end (at ?r ?g)) (explored t3)
(at start (explored ?g)) ))
... ))

Fig. 6 Examples of PDDL definitions. Left: definition of the types and predicates used in the coordination domain. Middle: definition of the
explore action. Right: example that shows how to specify the current state of the world for the TFD/M planner (see illustration shown in Fig. 4).

realized by an efficient path planner for mobile robots that dination module and stored in a set T = {t1 , . . . ,tn }. When-
plans the optimal trajectory of the robot to the given tar- ever a target is visited by one of the exploring robots, it is
get location based on the current occupancy grid map con- removed from the set T and the mission map.
structed by the robots so far. Fig. 6 (middle) depicts the While the robots navigate in the environment they up-
PDDL statements that describe the action explore. The date their maps using their sensors. Whenever a blockade b
term[pathCost ?r ?s ?g] represents the call to the is sensed, this information is distributed to all other robots.
external module. In our current implementation, we use Di- The central coordinator keeps track of blockades in a set
jkstra’s algorithm for cost estimation as it allows to effi- B = {b1 , . . . , bm } which is used to coordinate the clearing
ciently compute the path cost from a given start position to robots.
all goal positions. Finally, the initial state of the current plan- Furthermore, the coordination system maintains a repre-
ning procedure and the goal state are specified. For the situ- sentation of the environment that is composed of the prior
ation depicted in Fig. 4, this is exemplified in Fig. 6 (right). building map and the set of blockades sensed by the robots.
This map is used by the robot path planner to plan collision
free paths for the team of robots and to estimate travel costs
6 Coordinated Disaster Recovery
during target assignment. It estimates path costs from a lo-
cation x to a target t ∈ {T ∪ B} according to
As a further application of the coordination approach de-
scribed in Section 4, we investigate teams of robots that op- C(x,t) (2)
erate in a disaster recovery scenario. More specifically, we
est. path cost(x,t) , path from x to t traversable
assume that a known building is partially collapsed so that =
∞ , path blocked.
paths within the building are blocked at unknown positions.
The goal is to visit a given set of targets in the building us- This is an optimistic estimate, as we do not assume any
ing a team of robots. In a real world application, the robots knowledge about where blockades may appear. When a
could, for instance, provide high-resolution views of the sit- probabilistic model of blockades is given the path planning
uation, close a set of valves, or deploy a set of beacons. We problem can be formulated as the Canadian Traveller’s Prob-
assume that there are two types of robots: exploring robots lem [Papadimitriou and Yannakakis, 1991]. There exist ef-
that visit targets by performing a specific action at a given ficient methods to solve this problem [Eyerich et al., 2010]
location and clearing robots that are able to clear blocked that can be used for the path cost estimation. Note, however,
paths. Both types of robots are able to detect blockades using that in our formulation blockades can be cleared and thus
their sensors. All robots are provided with a map of the en- this estimate would also not be optimal.
vironment that includes the mission targets. This prior map A disaster recovery mission is considered completed as
does not, however, include blocked paths. soon as the set of targets T is empty, that is, all of the targets
The key challenge in this scenario is to coordinate the have been visited by an exploring robot.
team of robots so that the exploring robots do not waste time
by waiting for blocked paths to be cleared. An illustration of
a typical situation in this problem domain is given in Fig. 1. 6.2 Formulation as a Temporal Planning Problem

To apply our coordination approach to the disaster recov-
6.1 Target Locations and Cost Estimation ery problem described above, we encode the system’s state
and possible actions using PDDL statements. This PDDL
The initial mission map includes all target locations that description then serves as input to our coordination algo-
have to be visited. These locations are extracted by the coor- rithm as depicted in the system overview in Fig. 2.

10 Kai M. Wurm et al.

(:types
(:durative-action explore-from-blockade
robot - object
:parameters (?r - explorer ?s - blockade ?g - mission_target)
clearer explorer - robot
:duration (= ?duration [pathCost ?r ?s ?g])
location - object
:condition (and (at start (at ?r ?s)) (at start (not (explored ?g)))
free_location - location
(at start (not (= ?s ?g))) (at start (cleared ?s)) )
mission_target - free_location
:effect
blockade - location)
(and
(:predicates
(at start (not (at ?r ?s)))
(at ?r - robot ?x - location)
(at end (at ?r ?g))
(cleared ?t - blockade)
(at start (explored ?g)) ))
(explored ?t - mission_target))

Fig. 7 Examples of PDDL definitions used in the disaster recovery domain. Left: definition of the types and predicates. Right: definition of the
explore-from-blockade action.

First, we define which types of objects are involved. In to clear the blockade. In our experiments we assume that this
this scenario, the relevant objects are the robots that can be value depends on the size of the blockade. After the action is
either exploring robots or clearing robots and the map loca- executed, the predicate (cleared b) is set to true in the
tions that can be mission targets or blockades. Fig. 7 (left) current state. Finally, the initial and goal state are defined.
shows the corresponding PDDL statements. Second, we In the initial state, no blockades have been discovered and
specify the predicates that we use to define the problem none of the mission targets have been explored. The goal
states. The state in the disaster recovery problem can be state is defined as the state that describes all mission targets
specified using as explored.
(at ?r - robot ?x - location) The coordination loop depicted in the system overview
(see Fig. 2) is executed continuously until the goal state is
which describes that the robot r is at position x,
reached. While the description of the actions remain un-
(cleared ?b - blockade) changed during consecutive planning cycles, the current
which describes that blockade b has been cleared, and state of the system is updated to reflect the current state of
(explored ?t - mission target) the system and the environment. The updated state descrip-
tion defines all robots to be at their specific locations and
which describes that mission target t has been visited by an
blockades are added to the state whenever they are discov-
exploring robot.
ered. Previously cleared blockades or targets that have been
Third, we provide the actions that change the state of
explored are irrelevant to the current problem and thus are
the system. There are four kinds of movement actions: We
ignored in the problem description.
distinguish between actions that explore a mission target
and actions that have a goal position other than a mission
target. We furthermore differentiate actions whose start lo-
7 Evaluation
cation is a blockade and actions that start at a location in
free space. As an example, the PDDL definition given in The approach described in this paper has been implemented
Fig. 7 (right) defines a movement action that explores a tar- and evaluated thoroughly using a multi-robot simulation
get starting from a blockade. It can be seen, that movement system. The experiments are designed to show that explicitly
actions which start at a blockade require the blockade to be planning symbolic actions leads to a significantly more effi-
cleared before execution. All movement actions require the cient coordination than using a heuristic extension of previ-
robot to be at the start location and result in the robot be- ous coordination approaches. We evaluated our coordination
ing at the goal location. The costs of these actions are de- framework in the two problem domains introduced in Sec-
fined as the estimated path costs and are determined via the tion 5 and Section 6. In addition, we show that our approach
modular interface much the same as in the marsupial explo- is able to produce efficient coordination plans in reasonable
ration scenario. The goal location of all exploration actions time and thus can be employed in a real-world system.
is a mission target t. Once such an action is executed, the
current state is changed so that the corresponding predicate
(explored t) is set to true.
7.1 Simulation System
Movement actions that start at a blockade have the addi-
tional precondition that the blockade is cleared. To evaluate our coordination approach quantitatively, we de-
In addition to exploration actions, we define a clear veloped a simulation system that is able to simulate large
action to coordinate the group of clearing robots. This action teams of heterogeneous robots. In our current system, we
expresses that a given blockade b is cleared by a clearing also simulate laser range sensors. Sensor and odometry
robot that is positioned at the corresponding location. The noise are not considered since we focus on the coordination
duration of the action is defined by the time the robot takes aspects of the problems. The environment is modeled using

Coordinating Heterogeneous Teams of Robots using Temporal Symbolic Planning 11

generated by the simulator is depicted in Fig. 9. It shows a
representative coordination problem where exploration tar-
gets as well as symbolic actions need to be considered.
To get an intuition of the extend of the environment, one
can look at the ratio between the size of the environment and
the maximum sensor range of the robots. In many real office
Fig. 8 Simulated environments in the marsupial exploration experi- buildings, such as building 079 on the Freiburg campus, this
ments: office (left) and maze (right). White areas can only be traversed
by carriers while red (dark) areas can only be explored by rovers.
ratio is around 10 for a maximum sensor range of 4 m. The
ratio for the maze environment is 10.2 for rovers and 5.1 for
carriers. In the considerably bigger office environment the
ratio increases to 23.8 and 11.9 respectively.
In both environments, we simulated 30 exploration runs
with random initial robot positions. Exploration targets were
determined using the frontiers approach and neighboring ex-
ploration targets were clustered using visibility constraints
similar to the approach proposed by Burgard et al. [2005].

Fig. 9 Visualization of a simulated run in the maze environment. 7.3 Baseline Approach
Robots are depicted as disks with id numbers and carriers are marked
by a white dot. Small circles depict exploration targets (red) and meet-
ing points (green). In this situation, rover 3 could be retrieved by carrier We compared our coordination algorithm to a heuristic ex-
0 or carrier 2 at any of the three meeting points in its vicinity. tension of a method that assigns robots to target locations
based on cost estimates [Stachniss, 2009]. In this approach,
the Hungarian method is used to compute the cost-optimal
a grid map with additional traversability information and se- assignment of robots to exploration targets. To adapt this
mantic annotations such as mission targets or blockades. method to the problem of exploration with marsupial teams,
we first assign the carriers to exploration targets independent
of whether they can access those targets or not. The assign-
7.2 Exploration with Masupial Teams ment is solely based on the estimated travel cost of the carri-
ers, ignoring traversability constraints in the estimation. The
We simulated teams of marsupial robots to evaluate our ap- rovers are then deployed as follows: Whenever a carrier is
proach introduced in Section 5. The simulated environments assigned to a target that it cannot explore itself, it will move
contain areas that can only be traversed by rovers and areas to the nearest connecting meeting point and deploy a rover
that can only be traversed by carriers. The rover robots in there. This rover then explores the targets reachable from the
most real marsupial systems are considerably slower than meeting point. As soon as it has finished exploring them, it
the carrier robots. In the evaluation we account for this dif- returns to the meeting point. In our experiments, we assume
ference by simulating carrier robots that are twice as fast a limited number of rovers per carrier. This will lead to situ-
as rovers. Furthermore, the maximum sensor range of the ations in which a carrier needs to deploy a rover but has none
carriers is also twice as far as the sensor range of the rover available. The baseline approach then requires the carrier to
robots. first retrieve a rover.
We evaluated robot teams of varying sizes and differ- The baseline approach as described above closely re-
ent environments have been used in the simulation. Two of sembles heuristic deployment rules that have been applied
the environments we used in our experiments can be seen in previous marsupial systems [Murphy et al., 1999]. Note
in Fig. 8. The office environment resembles a typical office that the baseline approach could be improved by introducing
building with two corridors and a number of rooms. Some more complex reasoning. For example, it could anticipate
of the rooms can only be explored by rovers, those are de- cases in which no rovers are available to a carrier and then
picted as red areas in the maps. The second environment, in avoid assigning this carrier to targets that are only reach-
the following referred to as the maze environment, features a able by rovers. This would introduce additional state vari-
central area that can only be explored by rovers. In contrast ables that the coordination system would need to maintain.
to the areas in the office environment, it has multiple meet- In fact, this kind of reasoning would approximate a planning
ing points that can be used for deployment. A visualization method such as the one we apply in our approach.

12                                                                                                                                                    Kai M. Wurm et al.

                  160                                                                                                                               heuristic
                  140                                                                                                                           our approach

                  120
     time steps

                  100
                      80
                      60
                      40
                      20
                       0
                      #rovers     1   2   3   1   2   3    1    2     3                            1       2   3       1    2      3              1       2      3
                      #carriers       1           2             3                                          4                5                             6

                  350
                                                                                                                                                    heuristic
                  300                                                                                                                           our approach

                  250
     time steps

                  200

                  150

                  100

                      50

                       0
                      #rovers     1   2   3   1   2   3    1    2     3                            1       2   3       1    2      3              1       2      3
                      #carriers       1           2             3                                          4                5                             6

Fig. 10 Exploration time obtained with our approach compared to the ad-hoc method in the maze environment (top) and in the office environment
(bottom) for varying team sizes (number of carriers and number of rovers per carrier). The error bars indicate the 95% confidence intervals.

7.4 Results                                                                                        60
                                                                                                                                                 our approach
                                                                                                                                                     heuristic
                                                                                                   50
An overview of the results obtained in the experiments is
                                                                             exploration quality

                                                                                                   40
given in Fig. 10. It can be seen that our approach explores the
                                                                                                   30
environment significantly faster than the baseline method in
all configurations. By considering deployment and pick-up                                          20
actions explicitly, our approach avoids long travel paths and                                      10
detours that result from the pick-up heuristic in the base-
                                                                                                       0
line. In addition, it uses its larger planning horizon to plan                                                 1   2         3           4            5          6
sequences of actions while the baseline approach computes                                                                  number of carriers

exactly one action per robot. Even though we execute only                 Fig. 11 Exploration quality in the office environment over 30 runs us-
the first action in the sequence, robots often are in a good              ing two rovers per carrier. The higher the value, the better the coordi-
                                                                          nation. The error bars indicate the 95% confidence intervals. Note that
initial position for the next action.                                     similar results were obtained for the maze environment.
     Especially smaller teams of up to six robots profit con-
siderably from our coordination method. Larger teams can,
to some degree, compensate inefficiencies of the coordina-                where A is the total area of the environment and di denotes
tion when a good distribution in the environment is achieved              the distance traveled by robot i. This measure can intuitively
and many targets can be approached simultaneously. In the                 be understood as the average area each robot explores per
settings we evaluated, this effect can be noticed when more               movement.
than three carriers are used. The number of robots for which                  The results given in Fig. 11 show that our approach
this effect occurs clearly depends on the structure and size              reaches a significantly higher exploration quality. Especially
of the environment.                                                       larger teams of robots are coordinated more efficiently, so
     To evaluate the amount of unnecessary movement in                    that unnecessary movements are avoided. The reason for this
larger teams, we evaluated a further benchmark. We com-                   improvement lies in the larger planning horizon of our ap-
puted the exploration quality according to [Zlot et al., 2002]            proach compared to the baseline. Our approach anticipates
as                                                                        future states of the system, especially the position of the
                                                                          robots, while the baseline approach computes actions solely
                       n
                  1                                                       on the basis of the current state of the system. Avoiding un-
Q=
                  A   ∑ di ,                                    (3)
                                                                          necessary movements is a significant advantage on its own
                      i=1

You can also read