GraphDistNet: A Graph-based Collision-distance Estimator for Gradient-based Trajectory Optimization

Page created by Dana Caldwell

Careers

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

GraphDistNet: A Graph-based Collision-distance Estimator for
Gradient-based Trajectory Optimization
Yeseung Kim, Jinwoo Kim, and Daehyung Park†

Abstract— Trajectory optimization (TO) aims to find a se-
quence of valid states while minimizing costs. However, its
fine validation process is often costly due to computationally
expensive collision searches, otherwise coarse searches lower
the safety of the system losing a precise solution. To resolve
the issues, we introduce a new collision-distance estimator,
arXiv:2206.01517v1 [cs.RO] 3 Jun 2022

GraphDistNet, that can precisely encode the structural informa-
tion between two geometries by leveraging edge feature-based
convolutional operations, and also efficiently predict a batch
of collision distances and gradients through 25, 000 random
environments with a maximum of 20 unforeseen objects. Fur-
ther, we show the adoption of attention mechanism enables
our method to be easily generalized in unforeseen complex
geometries toward TO. Our evaluation show GraphDistNet
outperforms state-of-the-art baseline methods in both simulated Fig. 1: An example of the object fetch task. We represent a UR5e
and real world tasks. manipulator and obstacle geometries as graphs Gr and Go , respec-
I. I NTRODUCTION tively. Our graph-based collision-distance estimator, GraphDistNet,
enables a trajectory optimization method to find collision-free
Consider the problem of trajectory optimization (TO) in clut- motion by precisely estimating the collision distance and gradients.
ter for manipulation tasks. This requires finding a sequence
of feasible configurations with collision avoidance. However, trajectory planning. However, most detectors do not provide
the complexity of the real world reveals the necessity of collision gradients, where the lack of analytical gradients
more precise collision detection, though its computational often results in the use of unstable numerical gradients [15]
complexity may lower the usability of the robot. To resolve or stochastic gradients [16]. Otherwise, we can use inefficient
it, we need a highly-efficient collision detection method that gradient-free optimization methods. Recently, Zhi et al. intro-
is also robust to diverse environments toward planning of duced the first fully-differentiable detector that finds collision
safe and optimal manipulation trajectories. gradients from the distribution of collision scores [14].
Traditional detection approaches, such as Gilbert-Johnson- Although the approach could improve the optimization, the
Keerthi (GJK) algorithm [1], iteratively find the minimum gradient modeling is restricted to priory known geometries
distance between two sets of sub geometries [2]. There were modeling the gradients in configuration space.
also efforts to narrow down the search space [3], [4], sample In this work, we propose a new data-driven collision es-
collidable objects [5], [6], [7], or simplify the geometry [8] timator, GraphDistNet, that precisely estimates the collision
to minimize the iterations. However, simplifications often distance and gradients between two graphs (e.g., graphs of
degrade the completeness or the preciseness of a planning robot and obstacle geometries). GraphDistNet is the first
solution. Alternatively, researchers have adopted data-driven graph-based collision-distance estimator that can encode
models that approximate the detector by modeling the geo- the structural relation (i.e., edge feature) between point
metric proximity between two point sets [9], [10], [11]. Re- measures. By constructing a connected robot-object graph,
searchers have also extended the models by adopting incre- GraphDistNet can embed the geometric relations between
mental/active learning strategies [12], [13], [14]. The model graphs via message passing (i.e., convolutional operation)
approximation becomes more rapid at predicting collisions of edge features. In addition, to generalize GraphDistNet to
without exhaustive searches, though the approaches assume a new environment, we introduce an attention-based graph
a specific environment. Otherwise, they require additional construction scheme that drastically reduces the construction
explorations. time and helps to handle more complex geometries with-
A detector in TO has to be computationally efficient and out retraining. Finally, we enable GraphDistNet to provide
informative enough to guide the optimization process of collision gradients to guide TO by leveraging an auto-
differentiation tool from deep learning.
All authors are with the School of Computing, Korea Advanced Insti- We evaluate our approach on a 2-dimensional (2D) sim-
tute of Science and Technology, Korea ({rhgkrsus1, kjwoo31, ulated manipulation environment. Also, we show the TO
daehyung}@kaist.ac.kr). † D. Park is the corresponding au- performance given the graphized high-degree of freedom
thor. This work was supported by the National Research Foundation
of Korea(NRF) grants funded by the Korea government(MSIT) (No. manipulator and object geometries. The proposed detection
2021R1C1C1004368 and 2021R1A4A3032834). method is superior to state-of-the-art baseline methods with

respect to preciseness, robustness, and generalizability. We numbers or shapes of objects. Our method overcomes the
also demonstrate the detection and the improved planning generalization issue by looking at a region of interest from
performance using a mobile manipulator, Haetae, where we a connected graph of robot and object geometries.
show our method enables the robot to safely and also quickly The learned model-based detectors can be used for val-
reach a goal in clutter. idating candidate trajectories during TO. Major trajectory
Our main contributions are four folds: optimization methods often rely on the quality of collision
• We present the first graph-based collision-distance estima- gradients (i.e., smoothness and continuity) for numerical
tion network, GraphDistNet, that precisely regresses the optimization [22], [15]. Due to the lack of analytical collision
collision distance between objects. gradients, a major TO method, CHOMP [15], precomputes
• Our GraphDistNet provides accurate gradients and batch a simplified version of gradients by fitting objects with
computation results improving TO’s performance. simpler geometries such as spheres. Similarly, Trajopt [15]
• Our GraphDistNet is also robust to various environmental calculates numerical gradients between the swept-volumes
changes and unforeseen environments. of convex or non-convex shapes. However, the computation
• We demonstrate our method outperforms search and or the approximation of numerical gradients are costly and
learning-based state-of-the-art baseline methods in terms unstable, so Trajopt often leads to a local minimum or
of computational efficiency and preciseness via 2D sim- longer optimization time. Recently, the development of auto-
ulations. We also conduct a proof concept of robotic differentiation tools provides a new opportunity by adopt-
manipulation tasks in clutter using a real robot. ing differentiable collision detection. DiffCo [14] models
a robot’s collision and collision-free configuration spaces
II. R ELATED W ORK
by adopting a forward-kinematics kernel perceptron. Then,
Trajectory optimization primarily focuses on avoiding obsta- DiffCo generates collision gradients for TO by differentiating
cles by defining collision penalty or collision-free inequality its collision-score function. In addition, the batch prediction
constraint terms. To specify collision terms, researchers and gradient computation of ClearanceNet resulted in the ac-
have introduced three representation types of geometric celeration of sampling-based motion planning [11]. We also
detection approaches: point-based, voxel-based, and mesh- use neural network based gradient computation and a batch
based methods. The first two approaches are suited to non- prediction for thousands of collision checks. Furthermore, we
parametrically describing complex geometries with raw data improve preciseness and generalization capabilities by adopt-
but do not hold topological information [17]. On the other ing graph-based representation and attention techniques.
hand, mesh-based methods explicitly represent the surface
III. BACKGROUND : G RAPH N EURAL N ETWORKS
of a structure, e.g., triangular meshes [18]. In this field,
Gilbert–Johnson–Keerthi (GJK) [1] and expanding polytope We briefly review graph neural network (GNN) to regress
algorithm (EPA) [19] are representative types of convex poly- graph properties. Let consider a graph G = (V, E, X ) that is
tope detectors that iteratively find the minimum distance and a set of nodes vi ∈ V, edges eij ∈ E, node features xi ∈ X ,
the penetration depth between two convex sets, respectively. where the nodes, known as vertices, represent entities or their
To handle a wide variety of robotic applications, Pan et point measures in a real world. Edges represent vertices’
al. introduced a unified framework, flexible collision library topological relationships such as connectivity, adjacency, and
(FCL), that combines GJK, EPA, and other collision detec- enclosure. The subscripts, i and j, represent the indices
tion methods [2]. While these approaches have been widely of vertices in G. Likewise, eij is the edge between vi
used in this field, the intricate mesh design increases time and and vj . A node feature xi is a dv -dimensional feature
space complexities in the real world. Our method also uses a vector (∈ Rdv ) associated with vi (e.g., a point coordinate
graph representation to hold topological information, but we in pointcloud). However, the node feature often lacks the
adopt a data-driven prediction scheme for faster prediction. topological information defined between two vertices, so
The data-driven detection is now emerging as the method we can additionally define edge features xij = hθ (xi , xj )
that has effective representation learning and fast prediction as described in [23], where the function hθ is a nonlinear
capability [10], [9], [20], [13]. Major approaches often mapping function hθ : Rdv × Rdv → Rde , θ is a set of
perform kinematic-collision detection in a predetermined learnable parameters, and de is the dimension of edge feature
environment by modeling collision-free configurations. For xij . By associating each vertex and edge with features and
modeling, researchers leveraged a wide variety of classic aggregating them, we can predict node-, edge-, or graph-level
learning approaches: support vector machine [10], [9], Gaus- of properties [24].
sian mixture model [20], k-nearest neighbor [9], kernel per- A straightforward aggregation method is message pass-
ceptron [12], [13], [14], or CMA-ES [21]. For generalization ing [25], which consists of three steps toward regres-
in new environments, researchers have also adopted incre- sion/classification (see Fig. 2). First, each node vi in a graph
mental/adaptive learning strategies but required additional computes a message (e.g., node or edge features) to ex-
data collection [12], [14]. Alternatively, Kew et al. introduced change it with neighbors. Second, each node aggregates the
a neural network-based collision checker, ClearanceNet, that messages of the neighbors using sum or average operations.
predicts a collision distance given robot and object configura- Then, each node updates the node feature using a function
tions [11]. However, ClearanceNet still suffers from varying of the aggregated message and the current features. We

(a) Message
 Construction
 gradients for TO. In this section, we describe how to 1)
 construct a connected input graph, 2) encode the geometric
 relation via message passing, and 3) estimate the collision-
 distance and gradients toward TO in order. Finally, we
 discuss the collision distance that can improve the distance
 ( )-th layer

 regression performance.
 A. Initial Graph Construction
 Consider a pair of robot and object geometries that can be
 represented as undirected graphs, Gr = (Vr , Er , Xr ) and
 Go = (Vo , Eo , Xo ), respectively. Assuming the graphs are
 (b) Aggregation
 independent, Vr ∩ Vo = ∅, we construct a connected graph
 by adding a bipartite graph Gc that connects each node in
 ( + )-th layer

 Gr with all or part of the nodes in Go . We create a connected
 graph to exchange geometric relationships between two
 graphs via message passing over edges.
 For the connection, we can first use a fully-connected
 (c) Update
 bipartite graph Gc , where each node in geometry connects
Fig. 2: A message-passing scheme of convolutional operation at with all nodes in another geometry (see Fig. 3(a)). We can
a target node v0 with the node feature x0 given a graph G(k) . (a) then build a connected graph Gu = Gr ∪ Go ∪ Gc . However,
Message Construction: The operation first requires generating a construction has several drawbacks regarding construction
message at each edge around v0 using a function φ(xi , xj , xij ). time and space complexities. Given the large graphs due to
(b) Aggregation: The operation then aggregates the generated complex geometries or multiple objects, the space complex-
messages from (a), and makes an output node feature x0 for G(k+1)
via a function γ with previous feature x0 . (c) Update: Finally, the
 ity drastically increases to |Ec | = |Vr | × |Vo |, since the fully-
operation replaces the target node feature x0 with the output of γ. connected graph requires connecting all vertices between Vr
 and Vo . The enlargement of complexity causes a problem
 (a) Fully-connected bipartite graph (b) Sampling-based bipartite graph
 with a high-memory footprint and computationally expensive
 ‫ݔ‬௜௝,௥௘௟
 gather-scatter operations in the message passing [26].
 jc,o Alternatively, we define a sampling-based bipartite graph
 ‫ܩ‬௥ ‫ܩ‬௥
 as Gc = (Vc , Ec , Xc ), where Ec connects either the vertices
 ‫ܩ‬௢ ‫ܩ‬௢ of Gr with a vertex vjc,o of Go or the vertices of Go with a
 jc,r
 ‫ݔ‬௜௝,௟௢௖ vertex vjc,r of Gr (see Fig. 3(b)):
Fig. 3: Two ways of constructing a connected graph given Gr Ec = {eijc,o |(∀vi ∈ Vr , ∃vjc,o ∈ Vo )}
and Go . (a) We build a fully-connected bipartite graph by creating
 (2)
 ∪ {eijc,r |(∀vi ∈ Vo , ∃vjc,r ∈ Vr )},
a maximum number of edges between two graphs. (b) We build a
sampling-based bipartite graph by placing a single connection node where we randomly select (jc,r , jc,o ) and eij = eji since we
per graph (jc,r , jc,o ). use undirected graphs. For the sake of conciseness, we omit
 the symbol o and r behind so we use jc to indicate the index
describe a typical form of message passing scheme which of the selected informative node that can be connected to vi .
is a generalized convolutional operation [26] as following: Note that vi and vjc must not be on the same graph. With a
 (k+1)
 
 (k)
 
 (k) (k) (k)
  sampling-based bipartite graph, we can still build a connected
 xi = γ xi ,  φ xi , xj , xij , (1) graph Gu (see Fig. 4). Although we randomly select each jc
 j:(i,j)∈E
 for initial graph construction, jc is not necessarily a random
where γ and φ are multi-layer perceptrons (MLPs) for the index after the first message passing. In Sec. IV-B, we will
update and aggregation of features, respectively.
 P  is a discuss how to construct better message-passing structure by
symmetric aggregation operator (e.g., or max) and (k) introducing an attention-score based jc selection mechanism.
represents the k-th layer. Fig. 2 shows a visualization of To embed geometric information in a message, we define a
the message passing scheme. The convolutional operation node feature xi ∈ Xu as a Cartesian coordinate (xi , yi ) ∈ R2
aggregates the messages in the 1-hop neighborhood of a of the node vi in a graph Gu , where xi is not necessarily be
target node vj . Finally, we can collapse the aggregated a 2D coordinate and its dimension varies depending on the
messages to a vector for regression/classification. Note that convolutional operation. In this work, we gradually update
k = 0 means the input layer and K is the number of total the node features over layers in GraphDistNet. To distinguish
 (0)
layers. each, we define the initial graph as Gu and represent each
 (k)
 IV. M ETHOD : G RAPH D IST N ET updated graph as Gu , where k = 1, ..., K.
We introduce a graph-based collision-distance estimator, B. Graph Update via Attention-based Message Passing
GraphDistNet, that encodes the geometric relation between We propose a new message passing method for GraphDistNet
two graphs and predicts their minimum distance and collision that gradually aggregates the information of the minimum

New node features and attention-based
 Max Pooling Readout
 Input Graphs
 Convolution Update
 Random

 …
 Obstacle
 0.1
 0
 0.7 0.8

 MLP
 0.1
 0.2
 0.1
 0

 Robot

 …
 0
 0
 0

 Graph Construction Graph Update via Attention-based Message Passing Distance Regression

Fig. 4: Illustration of a graph-based collision-distance estimation network (GraphDistNet). Given robot and object graphs, our GraphDistNet
constructs an initial connected graph Gu , and then updates its node features via graph convolutional operations. By selecting the maximally
attentive node, which has the highest attention score in each graph, we construct a new connected graph for the convolutional operations
on the next layer. Finally, given an iteratively updated graph, our GraphDistNet regresses a collision distance value by aggregating the
node features of two graphs and taking the aggregated information into a readout network.

distance between Gr and Go via the K number of message- message passing, we use an improved version of the edge
passing layers. The aggregation is a process of convolutional feature that is a concatenated vector of the current difference
operations, where each node vi exchanges node/edge features and a shortcut connection difference from the initial graph
 (0)
with its 1-hop neighbor nodes. We define the procedure at a Gu :
target node vi as following:  
  x(0) − x(0) if k = 0,
 (k+1)
 X (k)
 
 (k) (k)
  (k) j i
 xi = hθk xij,loc , xijc ,rel , (3) xij =  (0)  (6)
  x − x(0) k x(k) − x(k) otherwise,
 j∈Nro (i)∪{i} j i j i

where Nro (i) is the index set of 1-hop neighborhood of the where j represents a node index in 1-hop neighbors of the
target node vi in Gro (= Gr ∪ Go ) and jc is a random or target node vi , the operator k represents the concatenation
 (k) (k) (0)
the best node index that satisfies eijc ∈ Ec and i 6= jc . of two vectors, and xij ∈ R2 .
 (k) (k) (k) (k) The convolutional operation in Eq. (3) aggregates edge
xij,loc ∈ Xloc and xijc ,rel ∈ Xrel are two types of edge
features, local-edge and relative-edge features, respectively. features and updates node features via an MLP-based en-
 (k)
We will describe the details later. The convolution encodes coder hθk at the k-th layer:
both local-edge and relative-edge features via an MLP-based (
 (k)
encoder hθk with a set of trainable parameters θ k and (k) R4 → Rd h if k = 0,
 hθk : 2dh +4 dh
 (7)
aggregates the features over the neighborhood of the target R →R otherwise,
node vi . Updatingnode features per layer, we iteratively
 (1) (2)
obtain new graphs Gu , Gu , ..., Gu
 (K)
 in order. Note that where dh = 32. In this work, we use a (32, 32, 32) size of 3-
 hidden layer MLPs with a Leaky ReLU activation function
the structure of graphs varies since we sample jc per layer.
 where its slope parameter is 0.2.
We describe the details below.
 The proposed convolution operation particularly focuses After each convolution, we find the most informative
on capturing the local structure among vertices (i.e., edge vjc in each graph Gr and Go by introducing an attention-
features), since the distance estimation does not require the based selection method. The informativeness of relative-
information of global structure such as the global coordi- edge features depends on the selection of jc , which enables
nates of objects. Assuming each object is a collection of GraphDistNet to exchange the collision-distance information
small patches [23], GraphDistNet encodes two types of edge between Gr and Go . However, the uniform random selection
features: local-edge feature xij,loc ∈ Xloc and relative-edge may degrade the message-passing performance by finding
feature xij,rel ∈ Xrel associated with edges in Er ∪ Eo and far-distance edges. Further, a large number of obstacles or
Ec , respectively: complex geometries require a large number of layers to pass
 necessary features. Therefore, we adopt graph attention [27]
 (k) (k)
 xij,loc = xij if eij ∈ Er ∪ Eo , (4) and sampling [28] schemes, where the attention mechanism
 (k) (k) helps finding the most relevant node to predict the collision
 xij,rel = xij if eij ∈ Ec . (5)
 distance by computing an attention score per node. To do
Both features represent the difference between two node that, our method first computes an attention score per node
 (k)
features and are used for updating previous node features via feature in each graph as a(k)T xi , where i is the node index
 (k) (k)
message passing. To minimize the degradation problem of in Gu and a is a vector of trainable attention values
 (k) (k)
losing the original geometric information during the iterative (∈ Rdh ). By using the attention values, ar for Gr and

(k) (k)
ao for Go , we can determine the attention distribution of collision avoidance, gradient-based trajectory optimization
 (k) (k)
node features in each graph Gr and Go as follows: solvers can be faster than using conventional numerical
 (k)T (k) gradients such as finite differences.
 (k) exp(ar xi )
 αir = P (k)T (k)
 , (8) D. Collision Distance
 j exp(ar xj )
 The training of our GraphDistNet requires a pair of input
 (k)T (k)
 (k) exp(ao xi ) graphs and output distance. The input is a set of a robot graph
 αio =P , (9)
 (k)T (k) Gr and a union of object graphs, Go = Go,1 ∪ ... ∪ Go,n ,
 j exp(ao xj )
 where n is the number of objects, and the output can be
 (k) (k) (k) (k) (k) (k)
where vir ∈ Vr and vio ∈ Vo . αir and αio are the minimum distance between Gr and the nearest object
 (k) (k) Go,i ⊂ Go . However, the minimum distance is always
attention weights for Gr and Go , respectively. The node
index with the maximum attention weight indicates jc in each zero when two target geometries overlap regardless of the
graph (e.g., jc,r = arg maxir αir and jc,o = arg maxio αio ). degree of penetration. This causes non-smooth regression
 that degrades the estimation performance near the collision
C. Collision-distance and -gradient Estimation
 status. To resolve the problem, we use a modified collision-
We regress a collision distance via a readout network distance function fcd similar to [14]:
by
 adopting the node features of layerwise output graphs (
 (0) (0) (K) (K)
 (Gr , Go ), ..., (Gr , Go ) . To do that, we compute mini fd (Gr , Go,i ), if Gr ∩ Go = ∅,
 fcd (Gr , Go ) =
the attention-weighted features to extract the informative fea- − maxi fpd (Gr , Go,i ), otherwise,
tures from each graph by applying an attention mechanism: where fd (·) returns the minimum distance between input ge-
 ometries and fpd (·) returns the local translational-penetration
 !
 (k)
 X
 (k)
 yr = fLR αir xir ∈ Rdh , (10) depth computed from flexible collision library (FCL) [2].
 ir
 ! V. E VALUATION S ETUP
 (k)
 X
 yo(k) = fLR αio xio ∈ Rd h , (11) A. Collision Distance and Gradient Estimation
 io We first aimed to evaluate the accuracy and the robustness
 (k) (k) of the estimated collision distance and gradients via planar
where x ir ∈ Rdh , ∈ Rdh , and fLR is a LeakyReLU
 xio
 arm environments. Fig. 5 shows five types of environments
activation function with a slope parameter 0.2. After stack-
 
 (1)T (K)T
  where we placed either 2 or 7 degree-of-freedom (DoF)
ing the attention-weighted features yr , ..., yr ∈ planar arms with obstacles. We measured the distance and
 dh ×K
R , we perform element-wise max pooling to hold trans- gradient estimation performance by randomly placing single
lation invariance. We then obtain a dh -dimensional vector, or multiple objects. We describe each setting as follows:
yf,r ∈ Rdh . Likewise, we also compute yf,o ∈ Rdh . Finally, • 2-DoF Arm & 1-fixed Obj.: We placed a 2-DoF planar
we estimate a collision distance, arm and a fixed square object, as shown in Fig. 5(a).
  dnh dnh  • 2-DoF Arm & 1-random Obj.: We placed a 2-DoF planar
 dˆ = M LP max yrk (i) k max yok (i) , (12) arm and a random position and orientation of the square
 k k
 i=1 i=1 object, as shown in Fig. 5(b).
 • 7-DoF Arm & 20-fixed Objs.: We placed a 7-DoF planar
 | {z } | {z }
 yf,r yf,o
 arm and a fixed distribution of 20 square objects, as shown
where we use a (32, 32, 32) size of 3-hidden layer MLPs. in Fig. 5(c).
The output dˆ indicates an estimated collision distance. • 7-DoF Arm & 20-random Objs.: We placed a 7-DoF
 We can then derive additional tools from GraphDistNet planar arm and randomly distributed and rotated 20 square
that can benefit trajectory planning. One is a binary collision objects, as shown in Fig. 5(d)
checker setting a safety distance-margin dmargin : • 7-DoF Arm & 5 random α-shapes: We placed a 7-DoF
 planar arm and 5 randomly-created α-shape [29] objects,
 (
 True if dˆ ≤ dmargin ,
 Is collision? = as shown in Fig. 5(e). To create an α-shape, we sampled
 False otherwise,
 20 points uniformly in 2D space and connected maximum
where the margin dmargin varies depending on the robot 15 points with α = 0.6 to make a closed loop of points
and objects. We can say a robot is in a collision when the as an object. We then removed the remaining.
collision distance dˆ is within the margin. Another tool is a We trained our method and two learning-based baselines
collision-gradient estimator that returns a gradient of distance (i.e., DiffCo [14] and ClearanceNet [11]) with 50, 000 num-
measure with respect to a configuration of the robot. We ber of random configurations of the robot and objects. We
can compute the gradient using any automatic differentia- then tested methods with 5, 000 random configurations of
tion tool in deep learning libraries since GraphDistNet is environments. Note that DiffCo requires training in a fixed
analytically differentiable. Thus, we can obtain a gradient environment so we trained DiffCo on Fig. 5(a) and Fig. 5(c)
as ∂GraphDistN et(θ)/∂θ given a joint configuration θ for the evaluations on Fig. 5(b) and Fig. 5(d), respectively.
that can build a robot graph Gr . Providing the gradients for For Fig. 5(e), we randomly sampled a configuration of

Distance Regression & Classification Gradient Est.
 Env. Method
 Elapsed MAE
 AUC ACD
 Time (s) with p-value
 Fixed Obstacles Random Obstacles
 (a) (b) Fig. 5
 FCL 1.0653 - - -
 DiffCo 0.0048 N/A  1.0000 0.3575
 (a) ClearanceNet 0.0009 0.0082 1.0000 0.0255
 6 × 10−20
 GraphDistNet 0.0115 0.0031 1.0000 0.0054
 FCL 1.1822 - - -
 Fig. 5 DiffCo (w/o active learning) 0.0054 N/A 0.5031 1.0270
 DiffCo (w/ active learning) 2058.2 N/A  0.9948 0.4221
 (b)
 ClearanceNet 0.0009 0.1732 0.9986 0.2853
 (c) (d) 2 × 10−12
 GraphDistNet 0.0115 0.0390 0.9999 0.1748
 FCL 3.1573 - - -
 Fig. 5 DiffCo 0.0740 N/A  0.9843 0.3202
Y (c) ClearanceNet 0.0012 0.2360 0.9111 0.5933
 6 × 10−60
 GraphDistNet 0.0574 0.0253 0.9990 0.1458
 FCL 4.4467 - - -
 Fig. 5 DiffCo (w/o active learning) 0.0742 N/A 0.5239 0.9450
 DiffCo (w/ active learning) 75129.5 N/A  0.9798 0.3968
 (e) (d)
 X ClearanceNet* 0.0012 0.7725 0.5691 0.9491
 1 × 10−10
 GraphDistNet 0.0546 0.1614 0.9842 0.4152
 FCL 6.8432 - - -
 Obstacles DiffCo (w/o active learning) 0.0350 N/A 0.5200 0.9612
 Fig. 5
 Robot DiffCo (w/ active learning) (Failure:
  Out of Memory)
 (e)
 ClearanceNet* 0.0011 0.7479 0.8882 0.9767
 3 × 10−2
 GraphDistNet 0.1251 0.2446 0.9925 0.8421

Fig. 5: Examples of 2D planar arm envi- TABLE I: Comparison of the GraphDistNet and baseline methods on planar arm envi-
ronments. Robots and objects are red and ronments. We evaluated each method with 5000 random samples, where the elapsed time
blue, respectively. For (a) and (c), we use represents the average computation time of the 5000 estimation samples. MAE represents
the same objects for both training and test the mean absolute error between the estimated collision distance and its ground truth from
settings. For (b), (d), and (e), we vary the FCL. p-value represents the statistical significance of error distributions via Welch’s t-test.
pose and shape of objects. AUC is the area under the receiver operating characteristic (ROC) curve metric for the binary
 collision classification problems. The ACD is the average of cosine distances between the
 estimated gradient fields and their ground truths from FCL. Note that MAE is not available in
 DiffCo as DiffCo returns a score instead of a distance. (*we applied dropouts with p = 0.5.)
objects and then trained DiffCo on it. Then, when we similar to DiffCo [14]. We performed 20 random reaching
test the DiffCo, we performed active learning that updates evaluations per setting, where we used three types of random
the learned model to adapt to the unforeseen environment. environments such as Fig. 5(b), Fig. 5(d), and Fig. 1. Note
Further, to obtain ground truths, we used FCL as a baseline that we perform the iterations until the cost falls below a
method. certain threshold.
 To compare the performance, we use three metrics (i.e., To measure the TO performance, we used three metrics:
MAE, AUC, and ACD) in addition to the elapsed time. The average elapsed time, average path cost, and reaching success
MAE represents the mean absolute error between the esti- rate. The path costP is the sum of distances between end-
mated collision distance and its ground truth from FCL. AUC effector waypoints: t ||θ t+1 − θ t ||2 . For a fair comparison,
is the area under the receiver operating characteristic (ROC) we considered the cost data collected from environments
curve metric for the binary collision classification problems. where all evaluation methods succeeded.
The ACD is the average of cosine distances between the C. Demonstration with Manipulation Tasks
estimated gradient fields and their ground truths from FCL
(see Fig 6). To obtain gradient value per environment, we We then demonstrated the feasibility of our method via
randomly sampled 100 gradients over the space. real-world experiment. We set an object-fetch scenario, in
 which a robot has to reach and grasp a target object
B. Trajectory Optimization with Gradients avoiding any environmental collisions. We used a robot
We then evaluated the planning performance with our that is a 6-DoF manipulator (i.e., UR5e from Universal
GraphDistNet. Our differential collision detection can be Robots) with a parallel-jaw gripper (i.e., 2F-85 from Robo-
combined with any TO algorithms, such as CHOMP [22] tiq). To simplify the demonstration task, we used the second,
and TrajOpt [30], that require a collision cost or constraint third, and fourth joints (i.e., shoulder lift joint, elbow joint,
function as well as its Jacobian. We defined a distance cost wrist 1 joint) only, otherwise we fixed the remained joints
 c
function C(d) = ||d−dmargin ||2 + , where c is an constant to make the end effector move on a vertical plane.
and d is the collision distance that is a function of joint For training GraphDistNet and another baseline method,
angles θ. By obtaining the gradient ∂C(d)/∂θ via the auto DiffCo, we performed three steps of training processes. 1)
differentiation in Pytorch [31], we performed trajectory opti- We first represented the complex geometry of the manipu-
mization of 20 waypoints given a linearly interpolated joint lator as a graph Gr by projecting the link meshes on the
trajectory. In this evaluation, we used an Adam optimizer vertical plane and manually extracting the boundaries of

Collision Avg. Elapsed Avg. Path Success
Env.
Checker Time (s) Cost Rate (%)

Fig. 5 DiffCo 0.9304 23.7657 0.95
ClearanceNet 1.7300 24.9018 1.0
(b)
GraphDistNet 1.9522 24.4297 1.0
Fig. 5 DiffCo 28.8247 99.0850 0.95
ClearanceNet 10.9816 72.3153 0.4
(d)
GraphDistNet 16.2682 65.0297 0.9
DiffCo 0.6364 3.9×105 0.4
Fig. 1 ClearanceNet 1.9603 5.5×105 0.3
GraphDistNet 3.2982 9.0×105 0.7

TABLE II: Comparison of trajectory optimization (TO) perfor-
mance. We performed 20 random evaluations per method that is
combined with a TO method.
anceNet also resulted in large accuracy drops in complex
environments such as Fig. 5 (c), (d), and (e).
In this work, we particularly compared the gradient esti-
mation performance toward TO. Fig. 6 shows the exemplar
visualizations of gradient fields from FCL, DiffCo, and
GraphDistNet. Unlike DiffCo, the field of GraphDistNet
is indistinguishable from that of ground truth from FCL.
Fig. 6: Exemplar plots of the minimum distance (or collision Thus, we can expect that our method is potentially useful
score) field given a 2-DoF planar arm and an obstacle. We use for precise optimization by guiding the trajectory (green) to
FCL, DiffCo, and GraphDistNet to generate each field in order.
The contour lines represent the change of distance with respect to avoid collision precisely following the collision gradients.
the configuration of the arm (i.e., gradient), where the blue and On the other hand, as shown in Fig. 6 (c), the inaccurate or
red spaces represent the likelihood of collision and collision-free drastically changing gradients of DiffCo may lead to unstable
spaces, respectively. We projected the trail of arms in (a) as an optimization reaching local minima.
end-effector trajectory (green) in (b), (c), and (d).
Table II shows the comparison of TO performance given
three challenging environments. In simple environments (i.e.,
them as polygons by using a point-based annotation method
Fig. 5 (b)), differences between algorithms appeared less
(see Fig. 1). We extracted multiple polygons and combined
though DiffCo was able to return a quick solution. Given
them as a Gr based on joint angles considering the revolute
more complex environments, our method showed consis-
joints. Likewise, we also created Go . 2) We then prepared a
tently high success rates in finding the best available paths.
collision-distance dataset with 50, 000 samples that consist of
Although ClearanceNet also resulted in low path costs, its
graphs (Gr , Go ) and their distances. For graphs, we sampled
success rate is significantly low. As mentioned before, the
joint angles from a uniform random distribution and also
fixed representation of the input in ClearanceNet restricts
varied the gripper polygon considering the size of an object
the use in a new complex environment. The restriction also
in hand. For distance computation, we used FCL. 3) Finally,
affects the TO performance. Note that our method takes a
we generated an optimal joint trajectory via TO given initial
comparably long time since we did not utilize the batch
and final joint angles for fetching. For the optimization, we
prediction capability of GraphDistNet. If we adopt batch
used a sum of three costs: position error, joint-limit violation,
gradient optimization methods in TO, we expect to show
and collision distance. We provided a linearly interpolated
superior efficiency in the future.
joint trajectory as an initial guess of TO.
We finally demonstrated our GraphDistNet-based TO
VI. E VALUATION method with a real UR5e manipulator. Fig. 7 shows an object
We first compared the estimation performance of GraphDis- fetch scenario, where we generated joint trajectories from
tNet with that of three baseline methods through five types DiffCo and GraphDistNet combining a TO approach. Given
of clutter environments. Table I shows our GraphDistNet GraphDistNet, the manipulator could successfully reach a
resulted in the lowest distance and gradient estimation errors target object (i.e., a wood stick) avoiding collisions, par-
(i.e., MAE and ACD) as well as superior collision-detection ticularly around the small entrance of the storage (orange).
performances (i.e., AUC) in most environments, except the Further, our method was able to safely bring out the object in
environment (d). In environment (d), the DiffCo with an the gripper by training the GraphDistNet with random object
active learning scheme could show the lowest ACD value. geometries. On the other hand, the trajectory generation with
However, the method required significantly longer adaptation DiffCo could precisely reach the target after active learning.
time (i.e., about 15 s per environment) and failed to run However, the trajectory from DiffCo failed to bring the target
in the most complex environment (e). Such disadvantages object back without risk since the simple perceptron model is
will largely degrade the usability of potential applications. not enough to estimate the accurate collision scores required
In addition, another neural network based approach, Clear- for planning in a narrow passage. The limitation of DiffCo
anceNet, could show the second best performance but Clear- significantly degrades the usability of the system, pausing

DiffCo
 GraphDistNet
 (Ours)

Fig. 7: Demonstrations on an object-fetch task. Each row shows a UR5e robot tracking a joint space trajectory computed via trajectory
optimization with either DiffCo or GraphDistNet collision checkers. The result of DiffCo shows collisions with the storage (orange).
the manipulation process. Overall, the demonstration shows [12] N. Das, N. Gupta, and M. Yip, “Fastron: An online learning-based
our robot achieves effective trajectory optimization in a real- model and active learning strategy for proxy collision detection,” in
 Proc. Conf. on Robot Learning, PMLR, 2017.
world manipulation scenario. Note that we demonstrated a [13] N. Das and M. Yip, “Learning-based proxy collision detection for
planar TO problem with 2D coordinate features aligning robot motion planning applications,” Trans. on Robotics, 2020.
the robot and obstacles on the vertical plane. However, [14] Y. Zhi, N. Das, and M. Yip, “Diffco: Autodifferentiable proxy collision
 detection with multiclass labels for safety-aware trajectory optimiza-
GraphDistNet can be extended to more complex scenarios tion,” IEEE Transactions on Robotics, 2022.
adopting 3D features or graphs from point clouds. [15] J. Schulman, Y. Duan, J. Ho, A. Lee, I. Awwal, H. Bradlow, J. Pan,
 S. Patil, K. Goldberg, and P. Abbeel, “Motion planning with sequen-
 VII. C ONCLUSION tial convex optimization and convex collision checking,” Int’l J. of
We introduced a graph-based collision-distance estimation Robotics Research, 2014.
 [16] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal,
network, GraphDistNet, that encodes the geometric relation “Stomp: Stochastic trajectory optimization for motion planning,” in
between two graphs and predicts their minimum distance and Proc. Int’l Conf. on Robotics and Automation, 2011.
gradients to optimize a collision-free trajectory in clutter. Our [17] J. Pan, I. A. Şucan, S. Chitta, and D. Manocha, “Real-time collision
 detection and distance computation on point cloud sensor data,” in
method provides efficient and precise estimation performance Proc. Int’l Conf. on Robotics and Automation, 2013.
and generalization capabilities in unforeseen environments [18] S. Kockara, T. Halic, K. Iqbal, C. Bayrak, and R. Rowe, “Collision
by leveraging an attention mechanism. The statistical evalua- detection: A survey,” in Proc. Int’l Conf. on Systems, Man and
 Cybernetics, 2007.
tion of GraphDistNet shows superior performance compared [19] G. Van Den Bergen, “Proximity queries and penetration depth com-
to other baseline methods in various simulation environ- putation on 3d game objects,” in Game developers conference, 2001.
ments. We also demonstrated the applicability in the real [20] J. Huh and D. D. Lee, “Learning high-dimensional mixture models
 for fast collision detection in rapidly-exploring random trees,” in Proc.
world using a UR5e manipulator. Int’l Conf. on Robotics and Automation, 2016.
 [21] A. Kapusta and C. C. Kemp, “Task-centric optimization of configura-
 R EFERENCES tions for assistive robots,” Auto. Robots, 2019.
 [1] E. G. Gilbert, D. W. Johnson, and S. S. Keerthi, “A fast procedure for [22] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith,
 computing the distance between complex objects in three-dimensional C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “CHOMP: Covariant
 space,” IEEE Journal on Robotics and Automation, 1988. hamiltonian optimization for motion planning,” Int’l J. of Robotics
 [2] J. Pan, S. Chitta, and D. Manocha, “Fcl: A general purpose library for Research, 2013.
 collision and proximity queries,” in Proc. Int’l Conf. on Robotics and [23] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M.
 Automation, 2012. Solomon, “Dynamic graph cnn for learning on point clouds,” ACM
 [3] J. D. Cohen, M. C. Lin, D. Manocha, and M. Ponamgi, “I-collide: Transactions On Graphics (TOG), 2019.
 An interactive and exact collision detection system for large-scale [24] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li,
 environments,” in Proceedings of the symposium on Interactive 3D and M. Sun, “Graph neural networks: A review of methods and
 graphics, 1995. applications,” AI Open, 2020.
 [4] J. Bialkowski, M. Otte, S. Karaman, and E. Frazzoli, “Efficient [25] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl,
 collision checking in sampling-based motion planning via safety “Neural message passing for quantum chemistry,” in Proc. Int’l Conf.
 certificates,” Int’l J. of Robotics Research, 2016. on Machine Learning, PMLR, 2017.
 [5] S. Kumar, S. Choudhary, and S. Srinivasa, “Learning configuration [26] M. Fey and J. E. Lenssen, “Fast graph representation learning with
 space belief model from collision checks for motion planning,” arXiv pytorch geometric,” ICLR Workshop on Representation Learning on
 preprint arXiv:1901.07646, 2019. Graphs and Manifolds, 2019.
 [6] N. Das and M. C. Yip, “Stochastic modeling of distance to collision [27] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and
 for robot manipulators,” Robotics and Automation Letters, 2020. Y. Bengio, “Graph Attention Networks,” Proceedings of the Interna-
 [7] J. J. Johnson and M. C. Yip, “Chance-constrained motion planning tional Conference on Learning Representations, 2018.
 using modeled distance-to-collision functions,” in Proc. Int’l Conf. on [28] M. Serafini and H. Guan, “Scalable graph neural network training: The
 Automation Science and Engineering, 2021. case for sampling,” ACM SIGOPS Operating Systems Review, 2021.
 [8] S. Liu, X. Jin, C. C. Wang, and K.-c. Hui, “Ellipsoidal-blob approx- [29] H. Edelsbrunner and E. P. Mücke, “Three-dimensional alpha shapes,”
 imation of 3d models and its applications,” Computers & Graphics, ACM Transactions on Graphics (TOG), 1994.
 2007. [30] J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel,
 [9] J. Pan and D. Manocha, “Fast probabilistic collision checking for “Finding locally optimal, collision-free trajectories with sequential
 sampling-based motion planning using locality-sensitive hashing,” Int’l convex optimization,” in Proc. Robotics: Science and Systems, 2013.
 J. of Robotics Research, 2016. [31] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan,
[10] J. Pan and D. Manocha, “Efficient configuration space construction T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, and A. e. a. Desmai-
 and optimization for motion planning,” Engineering, 2015. son, “Pytorch: An imperative style, high-performance deep learning
[11] J. Chase Kew, B. Ichter, M. Bandari, T.-W. E. Lee, and A. Faust, library,” in Conf. on Neural Information Processing Systems, 2019.
 “Neural collision clearance estimator for batched motion planning,”
 in Int’l Workshop on the Algorithmic Foundations of Robotics, 2020.