GraphDistNet: A Graph-based Collision-distance Estimator for Gradient-based Trajectory Optimization
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
GraphDistNet: A Graph-based Collision-distance Estimator for Gradient-based Trajectory Optimization Yeseung Kim, Jinwoo Kim, and Daehyung Park† Abstract— Trajectory optimization (TO) aims to find a se- quence of valid states while minimizing costs. However, its fine validation process is often costly due to computationally expensive collision searches, otherwise coarse searches lower the safety of the system losing a precise solution. To resolve the issues, we introduce a new collision-distance estimator, arXiv:2206.01517v1 [cs.RO] 3 Jun 2022 GraphDistNet, that can precisely encode the structural informa- tion between two geometries by leveraging edge feature-based convolutional operations, and also efficiently predict a batch of collision distances and gradients through 25, 000 random environments with a maximum of 20 unforeseen objects. Fur- ther, we show the adoption of attention mechanism enables our method to be easily generalized in unforeseen complex geometries toward TO. Our evaluation show GraphDistNet outperforms state-of-the-art baseline methods in both simulated Fig. 1: An example of the object fetch task. We represent a UR5e and real world tasks. manipulator and obstacle geometries as graphs Gr and Go , respec- I. I NTRODUCTION tively. Our graph-based collision-distance estimator, GraphDistNet, enables a trajectory optimization method to find collision-free Consider the problem of trajectory optimization (TO) in clut- motion by precisely estimating the collision distance and gradients. ter for manipulation tasks. This requires finding a sequence of feasible configurations with collision avoidance. However, trajectory planning. However, most detectors do not provide the complexity of the real world reveals the necessity of collision gradients, where the lack of analytical gradients more precise collision detection, though its computational often results in the use of unstable numerical gradients [15] complexity may lower the usability of the robot. To resolve or stochastic gradients [16]. Otherwise, we can use inefficient it, we need a highly-efficient collision detection method that gradient-free optimization methods. Recently, Zhi et al. intro- is also robust to diverse environments toward planning of duced the first fully-differentiable detector that finds collision safe and optimal manipulation trajectories. gradients from the distribution of collision scores [14]. Traditional detection approaches, such as Gilbert-Johnson- Although the approach could improve the optimization, the Keerthi (GJK) algorithm [1], iteratively find the minimum gradient modeling is restricted to priory known geometries distance between two sets of sub geometries [2]. There were modeling the gradients in configuration space. also efforts to narrow down the search space [3], [4], sample In this work, we propose a new data-driven collision es- collidable objects [5], [6], [7], or simplify the geometry [8] timator, GraphDistNet, that precisely estimates the collision to minimize the iterations. However, simplifications often distance and gradients between two graphs (e.g., graphs of degrade the completeness or the preciseness of a planning robot and obstacle geometries). GraphDistNet is the first solution. Alternatively, researchers have adopted data-driven graph-based collision-distance estimator that can encode models that approximate the detector by modeling the geo- the structural relation (i.e., edge feature) between point metric proximity between two point sets [9], [10], [11]. Re- measures. By constructing a connected robot-object graph, searchers have also extended the models by adopting incre- GraphDistNet can embed the geometric relations between mental/active learning strategies [12], [13], [14]. The model graphs via message passing (i.e., convolutional operation) approximation becomes more rapid at predicting collisions of edge features. In addition, to generalize GraphDistNet to without exhaustive searches, though the approaches assume a new environment, we introduce an attention-based graph a specific environment. Otherwise, they require additional construction scheme that drastically reduces the construction explorations. time and helps to handle more complex geometries with- A detector in TO has to be computationally efficient and out retraining. Finally, we enable GraphDistNet to provide informative enough to guide the optimization process of collision gradients to guide TO by leveraging an auto- differentiation tool from deep learning. All authors are with the School of Computing, Korea Advanced Insti- We evaluate our approach on a 2-dimensional (2D) sim- tute of Science and Technology, Korea ({rhgkrsus1, kjwoo31, ulated manipulation environment. Also, we show the TO daehyung}@kaist.ac.kr). † D. Park is the corresponding au- performance given the graphized high-degree of freedom thor. This work was supported by the National Research Foundation of Korea(NRF) grants funded by the Korea government(MSIT) (No. manipulator and object geometries. The proposed detection 2021R1C1C1004368 and 2021R1A4A3032834). method is superior to state-of-the-art baseline methods with
respect to preciseness, robustness, and generalizability. We numbers or shapes of objects. Our method overcomes the also demonstrate the detection and the improved planning generalization issue by looking at a region of interest from performance using a mobile manipulator, Haetae, where we a connected graph of robot and object geometries. show our method enables the robot to safely and also quickly The learned model-based detectors can be used for val- reach a goal in clutter. idating candidate trajectories during TO. Major trajectory Our main contributions are four folds: optimization methods often rely on the quality of collision • We present the first graph-based collision-distance estima- gradients (i.e., smoothness and continuity) for numerical tion network, GraphDistNet, that precisely regresses the optimization [22], [15]. Due to the lack of analytical collision collision distance between objects. gradients, a major TO method, CHOMP [15], precomputes • Our GraphDistNet provides accurate gradients and batch a simplified version of gradients by fitting objects with computation results improving TO’s performance. simpler geometries such as spheres. Similarly, Trajopt [15] • Our GraphDistNet is also robust to various environmental calculates numerical gradients between the swept-volumes changes and unforeseen environments. of convex or non-convex shapes. However, the computation • We demonstrate our method outperforms search and or the approximation of numerical gradients are costly and learning-based state-of-the-art baseline methods in terms unstable, so Trajopt often leads to a local minimum or of computational efficiency and preciseness via 2D sim- longer optimization time. Recently, the development of auto- ulations. We also conduct a proof concept of robotic differentiation tools provides a new opportunity by adopt- manipulation tasks in clutter using a real robot. ing differentiable collision detection. DiffCo [14] models a robot’s collision and collision-free configuration spaces II. R ELATED W ORK by adopting a forward-kinematics kernel perceptron. Then, Trajectory optimization primarily focuses on avoiding obsta- DiffCo generates collision gradients for TO by differentiating cles by defining collision penalty or collision-free inequality its collision-score function. In addition, the batch prediction constraint terms. To specify collision terms, researchers and gradient computation of ClearanceNet resulted in the ac- have introduced three representation types of geometric celeration of sampling-based motion planning [11]. We also detection approaches: point-based, voxel-based, and mesh- use neural network based gradient computation and a batch based methods. The first two approaches are suited to non- prediction for thousands of collision checks. Furthermore, we parametrically describing complex geometries with raw data improve preciseness and generalization capabilities by adopt- but do not hold topological information [17]. On the other ing graph-based representation and attention techniques. hand, mesh-based methods explicitly represent the surface III. BACKGROUND : G RAPH N EURAL N ETWORKS of a structure, e.g., triangular meshes [18]. In this field, Gilbert–Johnson–Keerthi (GJK) [1] and expanding polytope We briefly review graph neural network (GNN) to regress algorithm (EPA) [19] are representative types of convex poly- graph properties. Let consider a graph G = (V, E, X ) that is tope detectors that iteratively find the minimum distance and a set of nodes vi ∈ V, edges eij ∈ E, node features xi ∈ X , the penetration depth between two convex sets, respectively. where the nodes, known as vertices, represent entities or their To handle a wide variety of robotic applications, Pan et point measures in a real world. Edges represent vertices’ al. introduced a unified framework, flexible collision library topological relationships such as connectivity, adjacency, and (FCL), that combines GJK, EPA, and other collision detec- enclosure. The subscripts, i and j, represent the indices tion methods [2]. While these approaches have been widely of vertices in G. Likewise, eij is the edge between vi used in this field, the intricate mesh design increases time and and vj . A node feature xi is a dv -dimensional feature space complexities in the real world. Our method also uses a vector (∈ Rdv ) associated with vi (e.g., a point coordinate graph representation to hold topological information, but we in pointcloud). However, the node feature often lacks the adopt a data-driven prediction scheme for faster prediction. topological information defined between two vertices, so The data-driven detection is now emerging as the method we can additionally define edge features xij = hθ (xi , xj ) that has effective representation learning and fast prediction as described in [23], where the function hθ is a nonlinear capability [10], [9], [20], [13]. Major approaches often mapping function hθ : Rdv × Rdv → Rde , θ is a set of perform kinematic-collision detection in a predetermined learnable parameters, and de is the dimension of edge feature environment by modeling collision-free configurations. For xij . By associating each vertex and edge with features and modeling, researchers leveraged a wide variety of classic aggregating them, we can predict node-, edge-, or graph-level learning approaches: support vector machine [10], [9], Gaus- of properties [24]. sian mixture model [20], k-nearest neighbor [9], kernel per- A straightforward aggregation method is message pass- ceptron [12], [13], [14], or CMA-ES [21]. For generalization ing [25], which consists of three steps toward regres- in new environments, researchers have also adopted incre- sion/classification (see Fig. 2). First, each node vi in a graph mental/adaptive learning strategies but required additional computes a message (e.g., node or edge features) to ex- data collection [12], [14]. Alternatively, Kew et al. introduced change it with neighbors. Second, each node aggregates the a neural network-based collision checker, ClearanceNet, that messages of the neighbors using sum or average operations. predicts a collision distance given robot and object configura- Then, each node updates the node feature using a function tions [11]. However, ClearanceNet still suffers from varying of the aggregated message and the current features. We
(a) Message Construction gradients for TO. In this section, we describe how to 1) construct a connected input graph, 2) encode the geometric relation via message passing, and 3) estimate the collision- distance and gradients toward TO in order. Finally, we discuss the collision distance that can improve the distance ( )-th layer regression performance. A. Initial Graph Construction Consider a pair of robot and object geometries that can be represented as undirected graphs, Gr = (Vr , Er , Xr ) and Go = (Vo , Eo , Xo ), respectively. Assuming the graphs are (b) Aggregation independent, Vr ∩ Vo = ∅, we construct a connected graph by adding a bipartite graph Gc that connects each node in ( + )-th layer Gr with all or part of the nodes in Go . We create a connected graph to exchange geometric relationships between two graphs via message passing over edges. For the connection, we can first use a fully-connected (c) Update bipartite graph Gc , where each node in geometry connects Fig. 2: A message-passing scheme of convolutional operation at with all nodes in another geometry (see Fig. 3(a)). We can a target node v0 with the node feature x0 given a graph G(k) . (a) then build a connected graph Gu = Gr ∪ Go ∪ Gc . However, Message Construction: The operation first requires generating a construction has several drawbacks regarding construction message at each edge around v0 using a function φ(xi , xj , xij ). time and space complexities. Given the large graphs due to (b) Aggregation: The operation then aggregates the generated complex geometries or multiple objects, the space complex- messages from (a), and makes an output node feature x0 for G(k+1) via a function γ with previous feature x0 . (c) Update: Finally, the ity drastically increases to |Ec | = |Vr | × |Vo |, since the fully- operation replaces the target node feature x0 with the output of γ. connected graph requires connecting all vertices between Vr and Vo . The enlargement of complexity causes a problem (a) Fully-connected bipartite graph (b) Sampling-based bipartite graph with a high-memory footprint and computationally expensive ݔ, gather-scatter operations in the message passing [26]. jc,o Alternatively, we define a sampling-based bipartite graph ܩ ܩ as Gc = (Vc , Ec , Xc ), where Ec connects either the vertices ܩ ܩ of Gr with a vertex vjc,o of Go or the vertices of Go with a jc,r ݔ, vertex vjc,r of Gr (see Fig. 3(b)): Fig. 3: Two ways of constructing a connected graph given Gr Ec = {eijc,o |(∀vi ∈ Vr , ∃vjc,o ∈ Vo )} and Go . (a) We build a fully-connected bipartite graph by creating (2) ∪ {eijc,r |(∀vi ∈ Vo , ∃vjc,r ∈ Vr )}, a maximum number of edges between two graphs. (b) We build a sampling-based bipartite graph by placing a single connection node where we randomly select (jc,r , jc,o ) and eij = eji since we per graph (jc,r , jc,o ). use undirected graphs. For the sake of conciseness, we omit the symbol o and r behind so we use jc to indicate the index describe a typical form of message passing scheme which of the selected informative node that can be connected to vi . is a generalized convolutional operation [26] as following: Note that vi and vjc must not be on the same graph. With a (k+1) (k) (k) (k) (k) sampling-based bipartite graph, we can still build a connected xi = γ xi , φ xi , xj , xij , (1) graph Gu (see Fig. 4). Although we randomly select each jc j:(i,j)∈E for initial graph construction, jc is not necessarily a random where γ and φ are multi-layer perceptrons (MLPs) for the index after the first message passing. In Sec. IV-B, we will update and aggregation of features, respectively. P is a discuss how to construct better message-passing structure by symmetric aggregation operator (e.g., or max) and (k) introducing an attention-score based jc selection mechanism. represents the k-th layer. Fig. 2 shows a visualization of To embed geometric information in a message, we define a the message passing scheme. The convolutional operation node feature xi ∈ Xu as a Cartesian coordinate (xi , yi ) ∈ R2 aggregates the messages in the 1-hop neighborhood of a of the node vi in a graph Gu , where xi is not necessarily be target node vj . Finally, we can collapse the aggregated a 2D coordinate and its dimension varies depending on the messages to a vector for regression/classification. Note that convolutional operation. In this work, we gradually update k = 0 means the input layer and K is the number of total the node features over layers in GraphDistNet. To distinguish (0) layers. each, we define the initial graph as Gu and represent each (k) IV. M ETHOD : G RAPH D IST N ET updated graph as Gu , where k = 1, ..., K. We introduce a graph-based collision-distance estimator, B. Graph Update via Attention-based Message Passing GraphDistNet, that encodes the geometric relation between We propose a new message passing method for GraphDistNet two graphs and predicts their minimum distance and collision that gradually aggregates the information of the minimum
New node features and attention-based Max Pooling Readout Input Graphs Convolution Update Random … Obstacle 0.1 0 0.7 0.8 MLP 0.1 0.2 0.1 0 Robot … 0 0 0 Graph Construction Graph Update via Attention-based Message Passing Distance Regression Fig. 4: Illustration of a graph-based collision-distance estimation network (GraphDistNet). Given robot and object graphs, our GraphDistNet constructs an initial connected graph Gu , and then updates its node features via graph convolutional operations. By selecting the maximally attentive node, which has the highest attention score in each graph, we construct a new connected graph for the convolutional operations on the next layer. Finally, given an iteratively updated graph, our GraphDistNet regresses a collision distance value by aggregating the node features of two graphs and taking the aggregated information into a readout network. distance between Gr and Go via the K number of message- message passing, we use an improved version of the edge passing layers. The aggregation is a process of convolutional feature that is a concatenated vector of the current difference operations, where each node vi exchanges node/edge features and a shortcut connection difference from the initial graph (0) with its 1-hop neighbor nodes. We define the procedure at a Gu : target node vi as following: x(0) − x(0) if k = 0, (k+1) X (k) (k) (k) (k) j i xi = hθk xij,loc , xijc ,rel , (3) xij = (0) (6) x − x(0) k x(k) − x(k) otherwise, j∈Nro (i)∪{i} j i j i where Nro (i) is the index set of 1-hop neighborhood of the where j represents a node index in 1-hop neighbors of the target node vi in Gro (= Gr ∪ Go ) and jc is a random or target node vi , the operator k represents the concatenation (k) (k) (0) the best node index that satisfies eijc ∈ Ec and i 6= jc . of two vectors, and xij ∈ R2 . (k) (k) (k) (k) The convolutional operation in Eq. (3) aggregates edge xij,loc ∈ Xloc and xijc ,rel ∈ Xrel are two types of edge features, local-edge and relative-edge features, respectively. features and updates node features via an MLP-based en- (k) We will describe the details later. The convolution encodes coder hθk at the k-th layer: both local-edge and relative-edge features via an MLP-based ( (k) encoder hθk with a set of trainable parameters θ k and (k) R4 → Rd h if k = 0, hθk : 2dh +4 dh (7) aggregates the features over the neighborhood of the target R →R otherwise, node vi . Updatingnode features per layer, we iteratively (1) (2) obtain new graphs Gu , Gu , ..., Gu (K) in order. Note that where dh = 32. In this work, we use a (32, 32, 32) size of 3- hidden layer MLPs with a Leaky ReLU activation function the structure of graphs varies since we sample jc per layer. where its slope parameter is 0.2. We describe the details below. The proposed convolution operation particularly focuses After each convolution, we find the most informative on capturing the local structure among vertices (i.e., edge vjc in each graph Gr and Go by introducing an attention- features), since the distance estimation does not require the based selection method. The informativeness of relative- information of global structure such as the global coordi- edge features depends on the selection of jc , which enables nates of objects. Assuming each object is a collection of GraphDistNet to exchange the collision-distance information small patches [23], GraphDistNet encodes two types of edge between Gr and Go . However, the uniform random selection features: local-edge feature xij,loc ∈ Xloc and relative-edge may degrade the message-passing performance by finding feature xij,rel ∈ Xrel associated with edges in Er ∪ Eo and far-distance edges. Further, a large number of obstacles or Ec , respectively: complex geometries require a large number of layers to pass necessary features. Therefore, we adopt graph attention [27] (k) (k) xij,loc = xij if eij ∈ Er ∪ Eo , (4) and sampling [28] schemes, where the attention mechanism (k) (k) helps finding the most relevant node to predict the collision xij,rel = xij if eij ∈ Ec . (5) distance by computing an attention score per node. To do Both features represent the difference between two node that, our method first computes an attention score per node (k) features and are used for updating previous node features via feature in each graph as a(k)T xi , where i is the node index (k) (k) message passing. To minimize the degradation problem of in Gu and a is a vector of trainable attention values (k) (k) losing the original geometric information during the iterative (∈ Rdh ). By using the attention values, ar for Gr and
(k) (k) ao for Go , we can determine the attention distribution of collision avoidance, gradient-based trajectory optimization (k) (k) node features in each graph Gr and Go as follows: solvers can be faster than using conventional numerical (k)T (k) gradients such as finite differences. (k) exp(ar xi ) αir = P (k)T (k) , (8) D. Collision Distance j exp(ar xj ) The training of our GraphDistNet requires a pair of input (k)T (k) (k) exp(ao xi ) graphs and output distance. The input is a set of a robot graph αio =P , (9) (k)T (k) Gr and a union of object graphs, Go = Go,1 ∪ ... ∪ Go,n , j exp(ao xj ) where n is the number of objects, and the output can be (k) (k) (k) (k) (k) (k) where vir ∈ Vr and vio ∈ Vo . αir and αio are the minimum distance between Gr and the nearest object (k) (k) Go,i ⊂ Go . However, the minimum distance is always attention weights for Gr and Go , respectively. The node index with the maximum attention weight indicates jc in each zero when two target geometries overlap regardless of the graph (e.g., jc,r = arg maxir αir and jc,o = arg maxio αio ). degree of penetration. This causes non-smooth regression that degrades the estimation performance near the collision C. Collision-distance and -gradient Estimation status. To resolve the problem, we use a modified collision- We regress a collision distance via a readout network distance function fcd similar to [14]: by adopting the node features of layerwise output graphs ( (0) (0) (K) (K) (Gr , Go ), ..., (Gr , Go ) . To do that, we compute mini fd (Gr , Go,i ), if Gr ∩ Go = ∅, fcd (Gr , Go ) = the attention-weighted features to extract the informative fea- − maxi fpd (Gr , Go,i ), otherwise, tures from each graph by applying an attention mechanism: where fd (·) returns the minimum distance between input ge- ometries and fpd (·) returns the local translational-penetration ! (k) X (k) yr = fLR αir xir ∈ Rdh , (10) depth computed from flexible collision library (FCL) [2]. ir ! V. E VALUATION S ETUP (k) X yo(k) = fLR αio xio ∈ Rd h , (11) A. Collision Distance and Gradient Estimation io We first aimed to evaluate the accuracy and the robustness (k) (k) of the estimated collision distance and gradients via planar where x ir ∈ Rdh , ∈ Rdh , and fLR is a LeakyReLU xio arm environments. Fig. 5 shows five types of environments activation function with a slope parameter 0.2. After stack- (1)T (K)T where we placed either 2 or 7 degree-of-freedom (DoF) ing the attention-weighted features yr , ..., yr ∈ planar arms with obstacles. We measured the distance and dh ×K R , we perform element-wise max pooling to hold trans- gradient estimation performance by randomly placing single lation invariance. We then obtain a dh -dimensional vector, or multiple objects. We describe each setting as follows: yf,r ∈ Rdh . Likewise, we also compute yf,o ∈ Rdh . Finally, • 2-DoF Arm & 1-fixed Obj.: We placed a 2-DoF planar we estimate a collision distance, arm and a fixed square object, as shown in Fig. 5(a). dnh dnh • 2-DoF Arm & 1-random Obj.: We placed a 2-DoF planar dˆ = M LP max yrk (i) k max yok (i) , (12) arm and a random position and orientation of the square k k i=1 i=1 object, as shown in Fig. 5(b). • 7-DoF Arm & 20-fixed Objs.: We placed a 7-DoF planar | {z } | {z } yf,r yf,o arm and a fixed distribution of 20 square objects, as shown where we use a (32, 32, 32) size of 3-hidden layer MLPs. in Fig. 5(c). The output dˆ indicates an estimated collision distance. • 7-DoF Arm & 20-random Objs.: We placed a 7-DoF We can then derive additional tools from GraphDistNet planar arm and randomly distributed and rotated 20 square that can benefit trajectory planning. One is a binary collision objects, as shown in Fig. 5(d) checker setting a safety distance-margin dmargin : • 7-DoF Arm & 5 random α-shapes: We placed a 7-DoF planar arm and 5 randomly-created α-shape [29] objects, ( True if dˆ ≤ dmargin , Is collision? = as shown in Fig. 5(e). To create an α-shape, we sampled False otherwise, 20 points uniformly in 2D space and connected maximum where the margin dmargin varies depending on the robot 15 points with α = 0.6 to make a closed loop of points and objects. We can say a robot is in a collision when the as an object. We then removed the remaining. collision distance dˆ is within the margin. Another tool is a We trained our method and two learning-based baselines collision-gradient estimator that returns a gradient of distance (i.e., DiffCo [14] and ClearanceNet [11]) with 50, 000 num- measure with respect to a configuration of the robot. We ber of random configurations of the robot and objects. We can compute the gradient using any automatic differentia- then tested methods with 5, 000 random configurations of tion tool in deep learning libraries since GraphDistNet is environments. Note that DiffCo requires training in a fixed analytically differentiable. Thus, we can obtain a gradient environment so we trained DiffCo on Fig. 5(a) and Fig. 5(c) as ∂GraphDistN et(θ)/∂θ given a joint configuration θ for the evaluations on Fig. 5(b) and Fig. 5(d), respectively. that can build a robot graph Gr . Providing the gradients for For Fig. 5(e), we randomly sampled a configuration of
Distance Regression & Classification Gradient Est. Env. Method Elapsed MAE AUC ACD Time (s) with p-value Fixed Obstacles Random Obstacles (a) (b) Fig. 5 FCL 1.0653 - - - DiffCo 0.0048 N/A 1.0000 0.3575 (a) ClearanceNet 0.0009 0.0082 1.0000 0.0255 6 × 10−20 GraphDistNet 0.0115 0.0031 1.0000 0.0054 FCL 1.1822 - - - Fig. 5 DiffCo (w/o active learning) 0.0054 N/A 0.5031 1.0270 DiffCo (w/ active learning) 2058.2 N/A 0.9948 0.4221 (b) ClearanceNet 0.0009 0.1732 0.9986 0.2853 (c) (d) 2 × 10−12 GraphDistNet 0.0115 0.0390 0.9999 0.1748 FCL 3.1573 - - - Fig. 5 DiffCo 0.0740 N/A 0.9843 0.3202 Y (c) ClearanceNet 0.0012 0.2360 0.9111 0.5933 6 × 10−60 GraphDistNet 0.0574 0.0253 0.9990 0.1458 FCL 4.4467 - - - Fig. 5 DiffCo (w/o active learning) 0.0742 N/A 0.5239 0.9450 DiffCo (w/ active learning) 75129.5 N/A 0.9798 0.3968 (e) (d) X ClearanceNet* 0.0012 0.7725 0.5691 0.9491 1 × 10−10 GraphDistNet 0.0546 0.1614 0.9842 0.4152 FCL 6.8432 - - - Obstacles DiffCo (w/o active learning) 0.0350 N/A 0.5200 0.9612 Fig. 5 Robot DiffCo (w/ active learning) (Failure: Out of Memory) (e) ClearanceNet* 0.0011 0.7479 0.8882 0.9767 3 × 10−2 GraphDistNet 0.1251 0.2446 0.9925 0.8421 Fig. 5: Examples of 2D planar arm envi- TABLE I: Comparison of the GraphDistNet and baseline methods on planar arm envi- ronments. Robots and objects are red and ronments. We evaluated each method with 5000 random samples, where the elapsed time blue, respectively. For (a) and (c), we use represents the average computation time of the 5000 estimation samples. MAE represents the same objects for both training and test the mean absolute error between the estimated collision distance and its ground truth from settings. For (b), (d), and (e), we vary the FCL. p-value represents the statistical significance of error distributions via Welch’s t-test. pose and shape of objects. AUC is the area under the receiver operating characteristic (ROC) curve metric for the binary collision classification problems. The ACD is the average of cosine distances between the estimated gradient fields and their ground truths from FCL. Note that MAE is not available in DiffCo as DiffCo returns a score instead of a distance. (*we applied dropouts with p = 0.5.) objects and then trained DiffCo on it. Then, when we similar to DiffCo [14]. We performed 20 random reaching test the DiffCo, we performed active learning that updates evaluations per setting, where we used three types of random the learned model to adapt to the unforeseen environment. environments such as Fig. 5(b), Fig. 5(d), and Fig. 1. Note Further, to obtain ground truths, we used FCL as a baseline that we perform the iterations until the cost falls below a method. certain threshold. To compare the performance, we use three metrics (i.e., To measure the TO performance, we used three metrics: MAE, AUC, and ACD) in addition to the elapsed time. The average elapsed time, average path cost, and reaching success MAE represents the mean absolute error between the esti- rate. The path costP is the sum of distances between end- mated collision distance and its ground truth from FCL. AUC effector waypoints: t ||θ t+1 − θ t ||2 . For a fair comparison, is the area under the receiver operating characteristic (ROC) we considered the cost data collected from environments curve metric for the binary collision classification problems. where all evaluation methods succeeded. The ACD is the average of cosine distances between the C. Demonstration with Manipulation Tasks estimated gradient fields and their ground truths from FCL (see Fig 6). To obtain gradient value per environment, we We then demonstrated the feasibility of our method via randomly sampled 100 gradients over the space. real-world experiment. We set an object-fetch scenario, in which a robot has to reach and grasp a target object B. Trajectory Optimization with Gradients avoiding any environmental collisions. We used a robot We then evaluated the planning performance with our that is a 6-DoF manipulator (i.e., UR5e from Universal GraphDistNet. Our differential collision detection can be Robots) with a parallel-jaw gripper (i.e., 2F-85 from Robo- combined with any TO algorithms, such as CHOMP [22] tiq). To simplify the demonstration task, we used the second, and TrajOpt [30], that require a collision cost or constraint third, and fourth joints (i.e., shoulder lift joint, elbow joint, function as well as its Jacobian. We defined a distance cost wrist 1 joint) only, otherwise we fixed the remained joints c function C(d) = ||d−dmargin ||2 + , where c is an constant to make the end effector move on a vertical plane. and d is the collision distance that is a function of joint For training GraphDistNet and another baseline method, angles θ. By obtaining the gradient ∂C(d)/∂θ via the auto DiffCo, we performed three steps of training processes. 1) differentiation in Pytorch [31], we performed trajectory opti- We first represented the complex geometry of the manipu- mization of 20 waypoints given a linearly interpolated joint lator as a graph Gr by projecting the link meshes on the trajectory. In this evaluation, we used an Adam optimizer vertical plane and manually extracting the boundaries of
Collision Avg. Elapsed Avg. Path Success Env. Checker Time (s) Cost Rate (%) Fig. 5 DiffCo 0.9304 23.7657 0.95 ClearanceNet 1.7300 24.9018 1.0 (b) GraphDistNet 1.9522 24.4297 1.0 Fig. 5 DiffCo 28.8247 99.0850 0.95 ClearanceNet 10.9816 72.3153 0.4 (d) GraphDistNet 16.2682 65.0297 0.9 DiffCo 0.6364 3.9×105 0.4 Fig. 1 ClearanceNet 1.9603 5.5×105 0.3 GraphDistNet 3.2982 9.0×105 0.7 TABLE II: Comparison of trajectory optimization (TO) perfor- mance. We performed 20 random evaluations per method that is combined with a TO method. anceNet also resulted in large accuracy drops in complex environments such as Fig. 5 (c), (d), and (e). In this work, we particularly compared the gradient esti- mation performance toward TO. Fig. 6 shows the exemplar visualizations of gradient fields from FCL, DiffCo, and GraphDistNet. Unlike DiffCo, the field of GraphDistNet is indistinguishable from that of ground truth from FCL. Fig. 6: Exemplar plots of the minimum distance (or collision Thus, we can expect that our method is potentially useful score) field given a 2-DoF planar arm and an obstacle. We use for precise optimization by guiding the trajectory (green) to FCL, DiffCo, and GraphDistNet to generate each field in order. The contour lines represent the change of distance with respect to avoid collision precisely following the collision gradients. the configuration of the arm (i.e., gradient), where the blue and On the other hand, as shown in Fig. 6 (c), the inaccurate or red spaces represent the likelihood of collision and collision-free drastically changing gradients of DiffCo may lead to unstable spaces, respectively. We projected the trail of arms in (a) as an optimization reaching local minima. end-effector trajectory (green) in (b), (c), and (d). Table II shows the comparison of TO performance given three challenging environments. In simple environments (i.e., them as polygons by using a point-based annotation method Fig. 5 (b)), differences between algorithms appeared less (see Fig. 1). We extracted multiple polygons and combined though DiffCo was able to return a quick solution. Given them as a Gr based on joint angles considering the revolute more complex environments, our method showed consis- joints. Likewise, we also created Go . 2) We then prepared a tently high success rates in finding the best available paths. collision-distance dataset with 50, 000 samples that consist of Although ClearanceNet also resulted in low path costs, its graphs (Gr , Go ) and their distances. For graphs, we sampled success rate is significantly low. As mentioned before, the joint angles from a uniform random distribution and also fixed representation of the input in ClearanceNet restricts varied the gripper polygon considering the size of an object the use in a new complex environment. The restriction also in hand. For distance computation, we used FCL. 3) Finally, affects the TO performance. Note that our method takes a we generated an optimal joint trajectory via TO given initial comparably long time since we did not utilize the batch and final joint angles for fetching. For the optimization, we prediction capability of GraphDistNet. If we adopt batch used a sum of three costs: position error, joint-limit violation, gradient optimization methods in TO, we expect to show and collision distance. We provided a linearly interpolated superior efficiency in the future. joint trajectory as an initial guess of TO. We finally demonstrated our GraphDistNet-based TO VI. E VALUATION method with a real UR5e manipulator. Fig. 7 shows an object We first compared the estimation performance of GraphDis- fetch scenario, where we generated joint trajectories from tNet with that of three baseline methods through five types DiffCo and GraphDistNet combining a TO approach. Given of clutter environments. Table I shows our GraphDistNet GraphDistNet, the manipulator could successfully reach a resulted in the lowest distance and gradient estimation errors target object (i.e., a wood stick) avoiding collisions, par- (i.e., MAE and ACD) as well as superior collision-detection ticularly around the small entrance of the storage (orange). performances (i.e., AUC) in most environments, except the Further, our method was able to safely bring out the object in environment (d). In environment (d), the DiffCo with an the gripper by training the GraphDistNet with random object active learning scheme could show the lowest ACD value. geometries. On the other hand, the trajectory generation with However, the method required significantly longer adaptation DiffCo could precisely reach the target after active learning. time (i.e., about 15 s per environment) and failed to run However, the trajectory from DiffCo failed to bring the target in the most complex environment (e). Such disadvantages object back without risk since the simple perceptron model is will largely degrade the usability of potential applications. not enough to estimate the accurate collision scores required In addition, another neural network based approach, Clear- for planning in a narrow passage. The limitation of DiffCo anceNet, could show the second best performance but Clear- significantly degrades the usability of the system, pausing
DiffCo GraphDistNet (Ours) Fig. 7: Demonstrations on an object-fetch task. Each row shows a UR5e robot tracking a joint space trajectory computed via trajectory optimization with either DiffCo or GraphDistNet collision checkers. The result of DiffCo shows collisions with the storage (orange). the manipulation process. Overall, the demonstration shows [12] N. Das, N. Gupta, and M. Yip, “Fastron: An online learning-based our robot achieves effective trajectory optimization in a real- model and active learning strategy for proxy collision detection,” in Proc. Conf. on Robot Learning, PMLR, 2017. world manipulation scenario. Note that we demonstrated a [13] N. Das and M. Yip, “Learning-based proxy collision detection for planar TO problem with 2D coordinate features aligning robot motion planning applications,” Trans. on Robotics, 2020. the robot and obstacles on the vertical plane. However, [14] Y. Zhi, N. Das, and M. Yip, “Diffco: Autodifferentiable proxy collision detection with multiclass labels for safety-aware trajectory optimiza- GraphDistNet can be extended to more complex scenarios tion,” IEEE Transactions on Robotics, 2022. adopting 3D features or graphs from point clouds. [15] J. Schulman, Y. Duan, J. Ho, A. Lee, I. Awwal, H. Bradlow, J. Pan, S. Patil, K. Goldberg, and P. Abbeel, “Motion planning with sequen- VII. C ONCLUSION tial convex optimization and convex collision checking,” Int’l J. of We introduced a graph-based collision-distance estimation Robotics Research, 2014. [16] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal, network, GraphDistNet, that encodes the geometric relation “Stomp: Stochastic trajectory optimization for motion planning,” in between two graphs and predicts their minimum distance and Proc. Int’l Conf. on Robotics and Automation, 2011. gradients to optimize a collision-free trajectory in clutter. Our [17] J. Pan, I. A. Şucan, S. Chitta, and D. Manocha, “Real-time collision detection and distance computation on point cloud sensor data,” in method provides efficient and precise estimation performance Proc. Int’l Conf. on Robotics and Automation, 2013. and generalization capabilities in unforeseen environments [18] S. Kockara, T. Halic, K. Iqbal, C. Bayrak, and R. Rowe, “Collision by leveraging an attention mechanism. The statistical evalua- detection: A survey,” in Proc. Int’l Conf. on Systems, Man and Cybernetics, 2007. tion of GraphDistNet shows superior performance compared [19] G. Van Den Bergen, “Proximity queries and penetration depth com- to other baseline methods in various simulation environ- putation on 3d game objects,” in Game developers conference, 2001. ments. We also demonstrated the applicability in the real [20] J. Huh and D. D. Lee, “Learning high-dimensional mixture models for fast collision detection in rapidly-exploring random trees,” in Proc. world using a UR5e manipulator. Int’l Conf. on Robotics and Automation, 2016. [21] A. Kapusta and C. C. Kemp, “Task-centric optimization of configura- R EFERENCES tions for assistive robots,” Auto. Robots, 2019. [1] E. G. Gilbert, D. W. Johnson, and S. S. Keerthi, “A fast procedure for [22] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, computing the distance between complex objects in three-dimensional C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “CHOMP: Covariant space,” IEEE Journal on Robotics and Automation, 1988. hamiltonian optimization for motion planning,” Int’l J. of Robotics [2] J. Pan, S. Chitta, and D. Manocha, “Fcl: A general purpose library for Research, 2013. collision and proximity queries,” in Proc. Int’l Conf. on Robotics and [23] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Automation, 2012. Solomon, “Dynamic graph cnn for learning on point clouds,” ACM [3] J. D. Cohen, M. C. Lin, D. Manocha, and M. Ponamgi, “I-collide: Transactions On Graphics (TOG), 2019. An interactive and exact collision detection system for large-scale [24] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, environments,” in Proceedings of the symposium on Interactive 3D and M. Sun, “Graph neural networks: A review of methods and graphics, 1995. applications,” AI Open, 2020. [4] J. Bialkowski, M. Otte, S. Karaman, and E. Frazzoli, “Efficient [25] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, collision checking in sampling-based motion planning via safety “Neural message passing for quantum chemistry,” in Proc. Int’l Conf. certificates,” Int’l J. of Robotics Research, 2016. on Machine Learning, PMLR, 2017. [5] S. Kumar, S. Choudhary, and S. Srinivasa, “Learning configuration [26] M. Fey and J. E. Lenssen, “Fast graph representation learning with space belief model from collision checks for motion planning,” arXiv pytorch geometric,” ICLR Workshop on Representation Learning on preprint arXiv:1901.07646, 2019. Graphs and Manifolds, 2019. [6] N. Das and M. C. Yip, “Stochastic modeling of distance to collision [27] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and for robot manipulators,” Robotics and Automation Letters, 2020. Y. Bengio, “Graph Attention Networks,” Proceedings of the Interna- [7] J. J. Johnson and M. C. Yip, “Chance-constrained motion planning tional Conference on Learning Representations, 2018. using modeled distance-to-collision functions,” in Proc. Int’l Conf. on [28] M. Serafini and H. Guan, “Scalable graph neural network training: The Automation Science and Engineering, 2021. case for sampling,” ACM SIGOPS Operating Systems Review, 2021. [8] S. Liu, X. Jin, C. C. Wang, and K.-c. Hui, “Ellipsoidal-blob approx- [29] H. Edelsbrunner and E. P. Mücke, “Three-dimensional alpha shapes,” imation of 3d models and its applications,” Computers & Graphics, ACM Transactions on Graphics (TOG), 1994. 2007. [30] J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel, [9] J. Pan and D. Manocha, “Fast probabilistic collision checking for “Finding locally optimal, collision-free trajectories with sequential sampling-based motion planning using locality-sensitive hashing,” Int’l convex optimization,” in Proc. Robotics: Science and Systems, 2013. J. of Robotics Research, 2016. [31] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, [10] J. Pan and D. Manocha, “Efficient configuration space construction T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, and A. e. a. Desmai- and optimization for motion planning,” Engineering, 2015. son, “Pytorch: An imperative style, high-performance deep learning [11] J. Chase Kew, B. Ichter, M. Bandari, T.-W. E. Lee, and A. Faust, library,” in Conf. on Neural Information Processing Systems, 2019. “Neural collision clearance estimator for batched motion planning,” in Int’l Workshop on the Algorithmic Foundations of Robotics, 2020.
You can also read