A Robot Cluster for Reproducible Research in Dexterous Manipulation
Manuel Wüthrich∗1, Felix Widmaier∗1, Stefan Bauer∗2, Niklas Funk†3, Julen Urain De Jesus†3, Jan Peters†3, Joe Watson†3, Claire Chen†4, Krishnan Srinivasan†4, Junwu Zhang†4, Jeffrey Zhang†4, Matthew R. Walter†5, Rishabh Madan†9, Charles Schaff†5, Takahiro Maeda†5, Takuma Yoneda†5, Denis Yarats†6, Arthur Allshire†7, Ethan K. Gordon†8, Tapomayukh Bhattacharjee†9, Siddhartha S. Srinivasa†8, Animesh Garg†7, Annika Buchholz1, Sebastian Stark1, Thomas Steinbrenner1, Joel Akpo1, Shruti Joshi1, Vaibhav Agrawal1, and Bernhard Schölkopf1

∗ Equal contribution, † Challenge participant
1 Max-Planck-Institute for Intelligent Systems, 2 KTH Stockholm, 3 TU Darmstadt, 4 Stanford University, 5 TTI Chicago, 6 New York University, 7 University of Toronto, 8 University of Washington, 9 Cornell University

arXiv:2109.10957v1 [cs.RO] 22 Sep 2021

Abstract— Dexterous manipulation remains an open problem in robotics. To coordinate the efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at the MPI-IS and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks, ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects.

I. INTRODUCTION

Dexterous manipulation is humans' interface to the physical world. Our ability to manipulate objects around us in a creative and precise manner is one of the most apparent distinctions between human and animal intelligence. The impact robots with a similar level of dexterity would have on our society cannot be overstated. They would likely replace humans in most tasks that are primarily physical, such as working at production lines, packaging, constructing houses, agriculture, cooking, and cleaning. Yet, robotic manipulation is still far from the level of dexterity attained by humans, as witnessed by the fact that such tasks are still mostly carried out by humans. This problem has been remarkably resistant to the rapid progress of machine learning over the past years. A factor that has been crucial for progress in machine learning, but is nonexistent in real-world robotic manipulation, is a shared benchmark. Benchmarks allow different labs to coordinate efforts, reproduce results and measure progress. Most notably, in the area of image processing, such benchmarks were crucial for the rapid progress of deep learning. More recently, simulation benchmarks have been proposed in reinforcement learning (RL) [8], [25]. However, methods that are successful in simulators transfer only to a limited degree to real robots. Therefore, the robotics community has recently proposed a number of open-source platforms for robotic manipulation [28], [1], [27]. These robots can be built by any lab to reproduce the results of other labs. While this is a large step towards a shared benchmark, it requires effort by the researchers to set up and maintain the system, and it is nontrivial to ensure a fully standardized setup.

Therefore, we provide remote access to dexterous manipulation platforms hosted at MPI-IS, see Figure 1 (interested researchers can contact us to request access). This allows for an objective evaluation of robot-learning algorithms on real-world platforms with minimal effort for the researchers. In addition, we publish a large dataset of these platforms interacting with objects, which we collected during a recent competition we hosted. During this competition, teams from across the world developed algorithms for challenging object manipulation tasks, which yielded a very diverse dataset containing meaningful interactions. Finally, we also provide a simulation of the robotic setup, allowing for research into sim-to-real transfer. All the code is open-source.

In the rest of the paper, we describe the robotic hardware and the software interface which allows easy robot access, similar to a computational cluster. We also describe the robot competition we hosted and the data we collected in the process.

II. RELATED WORK

In the past years, a large part of the RL community has focused on simulation benchmarks, such as the DeepMind Control Suite [25] or OpenAI Gym [8] and extensions thereof [30]. These benchmarks internally use physics simulators, typically MuJoCo [26] or PyBullet [12]. These commonly accepted benchmarks allowed researchers from different labs to compare their methods, reproduce results, and hence build on each other's work. Very impressive results have been obtained through this coordinated effort [17], [14], [24], [20], [18], [13], [19].

In contrast, no such coordinated effort has been possible on real robotic systems, since there is no shared benchmark.
Fig. 1: (a)-(d) The dexterous robotic platforms hosted at MPI-IS. Users can submit code which is then executed automatically.

This lack of standardized real-world benchmarks has been recognized by the robotics and RL community [6], [7], [9], [10], [4], [21]. Recently, there have been renewed efforts to alleviate this problem:

a) Affordable Open-Source Platforms: The robotics community recently proposed affordable open-source robotic platforms that can be built by users. For instance, [28] propose Replab, a simple, low-cost manipulation platform that is suitable for benchmarking RL algorithms. Similarly, [1] propose a simple robotic hand and quadruped that can be built from commercially available modules. CMU designed LoCoBot, a low-cost open-source platform for mobile manipulation. [16] propose an open-source quadruped consisting of off-the-shelf parts and 3D-printed shells. Based on this design, [27] developed an open-source manipulation platform consisting of three fingers capable of complex dexterous manipulation (here, we use an industrial-grade adaptation of this design).

Such platforms are beneficial for collaboration and reproducibility across labs. However, setting up and maintaining such platforms often requires hardware experts and is time-intensive. Furthermore, there are necessarily small variations across labs that may harm reproducibility. To overcome these limitations, the robotics community has proposed a number of remote robotics benchmarks.

b) Remote Benchmarks: For mobile robotics, [23] propose the Robotarium, a remotely accessible swarm robotics research platform. Similarly, Duckietown [22] hosts the AI Driving Olympics [2] twice per year. However, a remote benchmark for robotic manipulation accessible to researchers around the world is still missing. Therefore, we propose such a system herein.

III. ROBOTIC PLATFORMS

We host 8 robotic platforms at MPI-IS (see Figure 1). Remote users can submit code which is then assigned to a platform and executed automatically, akin to a computational cluster. Users have access to the data collected during execution of their code. Submission and data retrieval can be automated to allow for RL methods that alternate between policy evaluation and policy improvement.

The platforms we use here are based on an open-source design that was published recently [27]. The benefits of this design are:
• Dexterity: The robot design consists of three fingers and has the mechanical and sensorial capabilities necessary for complex object manipulation beyond grasping.
• Safe Unsupervised Operation: The combination of robust hardware and safety checks in the software allows users to run even unpredictable algorithms without supervision. This enables, for instance, training of deep neural networks directly on the real robot.
• Ease of Use: The C++ and Python interfaces are simple and well-suited for RL as well as optimal control at rates of up to 1 kHz. For convenience, we also provide a simulation (PyBullet) environment of the robot.
Here, we use an industrial-grade adaptation of this hardware (see Figure 1) to guarantee an even longer lifetime and higher reproducibility. In addition, we developed a submission system that allows researchers from anywhere in the world to submit code with ease.

A. Observations and Actions

Actions: The platform consists of 3 fingers, each with 3 joints, yielding a total of 9 degrees of freedom (and 9 corresponding motors). There are two ways of controlling the robot: One can send 9-dimensional torque-actions which are directly executed by the motors. Alternatively, we provide the option of using position-actions (9-dimensional as well), which are then translated to torques by an internal controller. The native control rate of the system is 1 kHz, but one can control at a lower rate, if so desired.

Observations: An observation consists of proprioceptive measurements, images and the object pose inferred using an object tracker. The proprioceptive measurements are provided at a rate of 1 kHz and contain the joint angles, joint velocities and joint torques (each 9-dimensional) as well as the finger tip forces (3-dimensional). There are three cameras placed around the robot. The camera images are provided at 10 Hz and have a resolution of 270x270 pixels. In addition, our system contains an object tracker, which provides the pose of the object being manipulated along with the images.

B. Submitting Code

We use HTCondor (https://research.cs.wisc.edu/htcondor) to provide a cluster-like system where users can submit jobs which are then automatically executed on a randomly-selected robot. During execution, a backend process automatically starts the robot, monitors execution and records all actions and observations. The users only use the simple frontend interface to send actions to and retrieve observations from the robot (see [27] for more details).

Below is a minimal example of actual user code in Python. It creates a frontend interface to the robot and uses it to send torque commands that are computed by some control policy based on the observations:

    import robot_fingers
    import robot_interfaces

    # Initialise front end to interact with the robot.
    robot = robot_fingers.TriFingerPlatformFrontend()

    # Create a zero-torque action to start with.
    action = robot_interfaces.trifinger.Action()

    while True:
        # Append action to the "action queue".  Returns the time step
        # at which the given action will be executed.
        t = robot.append_desired_action(action)

        # Get observations of time step t.  Will block and wait if t
        # is in the future.
        robot_obs = robot.get_robot_observation(t)
        camera_obs = robot.get_camera_observation(t)

        # Compute the next action using some control policy.  The
        # different observation values are listed separately for
        # illustration.
        torque = control_policy(
            robot_obs.position,
            robot_obs.velocity,
            robot_obs.torque,
            robot_obs.tip_force,
            camera_obs.cameras[0].image,
            camera_obs.cameras[1].image,
            camera_obs.cameras[2].image,
            camera_obs.object_pose.position,
            camera_obs.object_pose.orientation,
        )
        action = robot_interfaces.trifinger.Action(torque=torque)

At the end of each job, the recorded data is stored and provided to the user, who can then analyse it and use it, for example, to train a better policy. Users can automate submissions and data retrieval to run RL algorithms directly on the robots.
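The example above uses torque actions. As a complement, the following is a minimal sketch of the position-control mode described in Section III-A, commanding at an effective rate of 50 Hz instead of the native 1 kHz. It reuses the frontend interface from the example above; the exact keyword argument of the Action type and the policy function are assumptions made for illustration, not part of the example in the paper.

    import numpy as np
    import robot_fingers
    import robot_interfaces

    robot = robot_fingers.TriFingerPlatformFrontend()

    # One target angle per joint (3 fingers x 3 joints).
    target_joint_angles = np.zeros(9)

    while True:
        # Position actions are translated to torques by the internal
        # controller (keyword name is an assumption).
        action = robot_interfaces.trifinger.Action(position=target_joint_angles)

        # Repeating the same action for 20 consecutive 1 kHz time steps
        # corresponds to commanding at an effective rate of 50 Hz.
        for _ in range(20):
            t = robot.append_desired_action(action)

        robot_obs = robot.get_robot_observation(t)
        # Hypothetical user-defined policy mapping observations to joint targets.
        target_joint_angles = position_policy(robot_obs)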
IV. THE REAL ROBOT CHALLENGE

Using the robotic platforms described in Section III, we organized the "Real Robot Challenge 2020", a competition which ran from August to December 2020. The participants were able to access the robots remotely via the submission system described in Section III-B. Thus, teams from all around the world were able to participate, allowing them to work with real robots even if they did not have access to one locally (or were locked out of their labs due to the Covid pandemic). Their algorithms generated a large dataset rich in contact interactions, which we make publicly available. Currently, we are hosting another edition of this challenge.

A. Phases and Tasks

The challenge was split into three phases:
• Phase 1: In the first phase, the participants had to manipulate a 65 mm cube in simulation. There was no restriction on who could participate; this phase served as a qualification round before giving the participants access to the real robots.
• Phase 2: The teams that achieved promising results in the first phase were given access to the real robots, where the same task had to be solved.
• Phase 3: For the last phase, the cube was replaced by a smaller, elongated cuboid (20x20x80 mm) that is more difficult to grasp and manipulate.
Figure 2 shows pictures of the different phases and the objects that were used.

In all three phases, the task was to move the object from its initial position at the center of the workspace to a randomly sampled goal. In each phase, there were four levels of difficulty corresponding to different goal distributions:
• Level 1: The goal is randomly sampled on the table. The orientation is not considered. For this level it is not necessary to lift the object, so it can be solved by pushing.
• Level 2: The object has to be lifted to a fixed goal position 8 cm above the table center. The orientation is not considered.
• Level 3: The goal is randomly sampled somewhere within the arena with a height of up to 10 cm. The orientation is not considered.
• Level 4: As level 3, but in addition to the position, a goal orientation is sampled uniformly.

Fig. 2: Illustration of the different challenge phases and the objects that were used. (a) Phase 1 - Simulation. (b) Phase 2 - Cube. (c) Phase 3 - Cuboid. (d) Objects used in phases 2 and 3.

B. Evaluation

In phase 1, we used an episode length of 15 s (3750 steps at 250 Hz). In phases 2 and 3, this was increased to 2 min (120000 steps at 1 kHz) so that the users had enough time on the real robots. We defined the performance of an episode as the cumulative reward, i.e. the sum of the rewards across all time steps, $R = \sum_t r_t$. In the following, we describe the reward functions (for a single time step) used in the different levels.

1) Reward for Levels 1-3: For difficulty levels 1-3, we only considered the position error (i.e. orientation is ignored). We used a weighted sum of the Euclidean distance on the x/y-plane and the absolute distance along the z-axis. Both components are scaled based on their expected range. Since the z-range is smaller, this means that the height has a higher weight. The sum is again rescaled so that the total error is in the interval [0, 1]. Given goal position $p_g = (x_g, y_g, z_g)$, actual position $p_a = (x_a, y_a, z_a)$, arena diameter $d$ and maximum expected height $h$, the position error $e_{pos}$ is computed as

    $e_{pos} = \frac{1}{2} \left( \frac{\sqrt{(x_g - x_a)^2 + (y_g - y_a)^2}}{d} + \frac{|z_g - z_a|}{h} \right)$    (1)

The arena diameter is $d = 0.39$, matching the inner diameter of the arena boundary, and the maximum height is $h = 0.1$. We compute the reward $r$ by simply negating the error, $r = -e_{pos}$.

2) Reward for Level 4:

a) Phases 1 and 2: For level 4, we considered both position and orientation. We compute the position error $e_{pos}$ as for the previous levels, according to (1). For the orientation, we first compute the rotation $q = (q_x, q_y, q_z, q_w)$ (given as a quaternion) that would have to be applied to the actual orientation to obtain the goal orientation. We then compute the angle of this rotation, divided by $\pi$ to normalize to the interval [0, 1]:

    $e_{rot} = \frac{2 \cdot \mathrm{atan2}(\|(q_x, q_y, q_z)\|, |q_w|)}{\pi}$    (2)

For the total reward $r$ we sum the two errors, rescale again to the interval [0, 1] and negate, so that a higher reward means a lower error:

    $r = -\frac{e_{pos} + e_{rot}}{2}$    (3)

b) Phase 3: We found that for the narrow cuboid used in phase 3, our object tracking method was unreliable with respect to the rotation around the long axis of the cuboid. To prevent this from affecting the reward computation, we changed the computation of the rotation error to use only the absolute angle between the long axes of the cuboid in goal and actual pose. This is again divided by $\pi$ for normalisation. The reward is then computed in the same way as above, using (3).
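For concreteness, the following is a minimal sketch of the per-timestep reward of equations (1)-(3) for level 4, assuming positions as numpy arrays and orientations as (x, y, z, w) quaternions. It is an illustration only, not the challenge's reference implementation.

    import numpy as np
    from scipy.spatial.transform import Rotation

    ARENA_DIAMETER = 0.39  # d, inner diameter of the arena boundary
    MAX_HEIGHT = 0.1       # h, maximum expected object height

    def position_error(goal_pos, actual_pos):
        # Equation (1): weighted sum of x/y-plane distance and z distance.
        xy_dist = np.linalg.norm(goal_pos[:2] - actual_pos[:2])
        z_dist = abs(goal_pos[2] - actual_pos[2])
        return 0.5 * (xy_dist / ARENA_DIAMETER + z_dist / MAX_HEIGHT)

    def rotation_error(goal_quat, actual_quat):
        # Equation (2): angle of the rotation from actual to goal
        # orientation, normalised to [0, 1] by dividing by pi.
        q_diff = (Rotation.from_quat(goal_quat)
                  * Rotation.from_quat(actual_quat).inv()).as_quat()
        angle = 2.0 * np.arctan2(np.linalg.norm(q_diff[:3]), abs(q_diff[3]))
        return angle / np.pi

    def reward_level_4(goal_pos, goal_quat, actual_pos, actual_quat):
        # Equation (3): mean of the two errors, negated.
        return -0.5 * (position_error(goal_pos, actual_pos)
                       + rotation_error(goal_quat, actual_quat))

For levels 1-3, the reward is simply the negated position error, r = -position_error(goal_pos, actual_pos).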
3) Evaluating Submissions: Since there is randomness in each execution due to the randomly-sampled goal and small changes in the initial configuration of the object and robot, we need to average over a number of episodes to approximate the expected reward. At the end of each real-world phase, the code of each team was executed for multiple goals of each difficulty level (we use the same goals across all users to reduce variance). The average cumulative rewards $R_i$ for the different levels $i$ are then aggregated by $\sum_{i=1}^{4} i \cdot R_i$. This weighted sum gives a higher weight to higher (more difficult) levels, encouraging teams to solve them.

C. Challenge Results

After the simulation stage, seven teams with excellent performance qualified for the real-robot phases. These teams made thousands of submissions to the robots, corresponding to approximately 250 hours of robot run-time (see Figure 3 for submission statistics). Tables I and II show the final evaluation results of phases 2 and 3 (the scores in these tables are computed as described in Section IV-B). The top teams in both phases found solutions that successfully grasp the object and move it to the goal position. Videos published by the winning teams are available on YouTube2. They also published reports describing their methods [29], [5], [11] and open-sourced their code3. We collected all the data produced during the challenge and aggregated it into a dataset, which is described in Section IV-D.

2 ardentstork: https://youtube.com/playlist?list=PLBUWL2_ywUvE_czrinTTRqqzNu86mYuOV
  troubledhare: https://youtube.com/playlist?list=PLEYg4qhK8iUaXVb1ij18pVwnzeowMCpRc
  sombertortoise: https://youtu.be/I65Kwu9PGmg
3 ardentstork: https://github.com/ripl-ttic/real-robot-challenge
  troubledhare: https://github.com/madan96/rrc_example_package
  sombertortoise: https://github.com/stanford-iprl-lab/rrc_package

1) The Winning Policies: The winning teams used similar methods for solving the task: they made use of motion primitives that are sequenced using state machines. For difficulty level 4, where orientation is important, they typically first perform a sequence of motions to rotate the object to be roughly in the right orientation before lifting it to the goal position. For details regarding their implementations, please refer to [29], [5], [11], [3]. Furthermore, [15] contains a more detailed description of the solutions of some of the teams.

D. The Dataset

The dataset contains the recorded data of all jobs that were executed during phases 2 and 3 of the challenge. Combined with the runs from the weekly evaluation rounds, this results in 2856 episodes of phase 2 and 7422 episodes of phase 3. The data of each episode can be downloaded individually. The compressed file size of one episode is about 250 MB. We expect that users of the dataset will typically not use all episodes (which would be a large amount of data), but select those that are interesting for their project. To help with this, we provide a database containing the metadata and metrics of all episodes. This allows users to filter the data before downloading. The dataset itself, the tools for filtering the episodes, as well as a more technical description of the data format can be found at https://people.tuebingen.mpg.de/mpi-is-software/data/rrc2020.

1) Data Description: For each episode, the following information is stored:
• all robot and camera observations (including object pose information) as well as all actions (see Section III-A)
• camera calibration parameters
• the goal pose that was used in this episode
• metadata such as timestamp, challenge phase, etc.
• several metrics that can be used to filter the dataset:
  – cumulative reward
  – baseline reward obtained if the object were not moved during the whole episode
  – initial distance to goal
  – closest distance to goal throughout the episode
  – maximum height of the object throughout the episode
  – largest distance to initial position throughout the episode
Any user-specific information, such as names or custom output of the user code, is not included in the dataset.

2) Data Format: The data is provided in two different formats:
1) The original log files as they were generated by the robot. This includes files in a custom binary format that requires the use of our software to read them (available for C++ and Python).
2) Zarr (https://zarr.readthedocs.io) storages created from the original files. These can be easily read in Python using the Zarr package.
For a more technical description, please refer to the dataset website.
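As an illustration of the second format, the following is a minimal sketch of opening one episode with the Zarr Python package. The file name and array paths are hypothetical; the actual layout is documented on the dataset website.

    import zarr

    # Open one downloaded episode (file name is hypothetical).
    episode = zarr.open("rrc2020_episode_000123.zarr", mode="r")

    # Print the hierarchy of groups and arrays stored in the episode.
    print(episode.tree())

    # Array paths below are assumptions, not the documented layout:
    # joint_positions = episode["robot_observations/position"][:]
    # torques = episode["actions/torque"][:]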
V. CONCLUSION

We designed and built a robot cluster to facilitate reproducible research into dexterous robotic manipulation. We believe that this cluster can greatly enhance coordination and collaboration between researchers all across the world. We hosted competitions on these robots to advance the state of the art and to produce publicly-available datasets that contain rich interactions between the robots and external objects. In addition, we provide scientists with access to the platforms for their own research projects. This is especially beneficial for researchers who would otherwise not have access to such robots.
TABLE I: Final Evaluation of Phase 2

 #  Username           Level 1   Level 2   Level 3   Level 4   Total Score
 1. ardentstork          -5472     -2898     -9080    -21428      -124221
 2. troubledhare         -3927     -4144     -4226    -48572      -219182
 3. sombertortoise       -8544    -15199    -14075    -44989      -261123
 4. sincerefish          -6278    -13738    -17927    -49491      -285500
 5. hushedtomatoe       -17976    -41389    -41832    -60815      -469509
 6. giddyicecream       -22379    -46650    -41655    -61845      -488023

TABLE II: Final Evaluation of Phase 3

 #  Username           Level 1   Level 2   Level 3   Level 4   Total Score
 1. ardentstork          -9239     -4040     -6525    -25625      -139394
 2. sombertortoise       -5461     -8522    -10323    -36135      -198016
 3. sincerefish          -7428    -25291    -26768    -52311      -347560
 4. innocenttortoise    -16872    -31977    -33357    -55611      -403344
 5. hushedtomatoe       -18304    -31917    -36835    -60219      -433521
 6. troubledhare        -18742    -42831    -36272    -56503      -439233
 7. giddyicecream       -33329    -57372    -53694    -59734      -548090

Fig. 3: Number of submissions over time (left, one bar corresponds to one day) and per team (right). (a) Phase 2. (b) Phase 3.
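The Total Score columns in Tables I and II follow the weighted aggregation described in Section IV-B.3. A minimal check against the first row of Table I (values taken from the table; the small discrepancy is due to rounding of the per-level averages):

    # Total score = sum over difficulty levels i = 1..4 of i * R_i.
    level_rewards = [-5472, -2898, -9080, -21428]  # ardentstork, Table I
    total_score = sum(i * r for i, r in enumerate(level_rewards, start=1))
    print(total_score)  # -124220, vs. -124221 reported (rounding)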
REFERENCES

[1] Michael Ahn, Henry Zhu, Kristian Hartikainen, Hugo Ponte, Abhishek Gupta, Sergey Levine, and Vikash Kumar. Robel: Robotics benchmarks for learning with low-cost robots. arXiv preprint arXiv:1909.11639, 2019.
[2] AI-DO: AI Driving Olympics – Duckietown. Website. https://www.duckietown.org/research/ai-driving-olympics. Accessed: 2021-8-27.
[3] Arthur Allshire, Mayank Mittal, Varun Lodaya, Viktor Makoviychuk, Denys Makoviichuk, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Ankur Handa, and Animesh Garg. Transferring dexterous manipulation from gpu simulation to a remote real-world trifinger. arXiv preprint arXiv:2108.09779, 2021.
[4] F Amigoni, E Bastianelli, J Berghofer, A Bonarini, G Fontana, N Hochgeschwender, L Iocchi, G Kraetzschmar, P Lima, M Matteucci, P Miraldo, D Nardi, and V Schiaffonati. Competitions for Benchmarking: Task and Functionality Scoring Complete Performance Assessment. IEEE Robotics & Automation Magazine, 22(3):53–61, September 2015.
[5] Anonymous. Real robot challenge phase 2: Manipulating objects using high-level coordination of motion primitives. In Submitted to Real Robot Challenge 2020, 2020. Under review.
[6] Sven Behnke. Robot competitions - ideal benchmarks for robotics research. In Proc. of IROS-2006 Workshop on Benchmarks in Robotics Research. Institute of Electrical and Electronics Engineers (IEEE), 2006.
[7] F Bonsignorio and A P del Pobil. Toward Replicable and Measurable Robotics Research [From the Guest Editors]. IEEE Robotics & Automation Magazine, 22(3):32–35, September 2015.
[8] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
[9] B Calli, A Walsman, A Singh, S Srinivasa, P Abbeel, and A M Dollar. Benchmarking in Manipulation Research: Using the Yale-CMU-Berkeley Object and Model Set. IEEE Robotics & Automation Magazine, 22(3):36–52, September 2015.
[10] Berk Calli, Aaron Walsman, Arjun Singh, Siddhartha Srinivasa, Pieter Abbeel, and Aaron M Dollar. Benchmarking in Manipulation Research: The YCB Object and Model Set and Benchmarking Protocols. February 2015.
[11] Claire Chen, Krishnan Srinivasan, Jeffrey Zhang, Junwu Zhang, Lin Shao, Shenli Yuan, Preston Culbertson, Hongkai Dai, Mac Schwager, and Jeannette Bohg. Dexterous manipulation primitives for the real robot challenge, 2021.
[12] Erwin Coumans and Yunfei Bai. Pybullet, a python module for physics simulation for games, robotics and machine learning. GitHub repository, 2016.
[13] Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning, pages 1329–1338, 2016.
[14] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing Function Approximation Error in Actor-Critic Methods. February 2018.
[15] Niklas Funk, Charles Schaff, Rishabh Madan, Takuma Yoneda, Julen Urain De Jesus, Joe Watson, Ethan K. Gordon, Felix Widmaier, Stefan Bauer, Siddhartha S. Srinivasa, Tapomayukh Bhattacharjee, Matthew R. Walter, and Jan Peters. Benchmarking structured policies and policy optimization for real-world dexterous object manipulation, 2021.
[16] Felix Grimminger, Avadesh Meduri, Majid Khadiv, Julian Viereck, Manuel Wüthrich, Maximilien Naveau, Vincent Berenz, Steve Heim, Felix Widmaier, Jonathan Fiene, Alexander Badri-Spröwitz, and Ludovic Righetti. An Open Torque-Controlled Modular Robot Architecture for Legged Locomotion Research. In International Conference on Robotics and Automation (ICRA), 2020.
[17] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. January 2018.
[18] Nicolas Heess, T B Dhruva, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S M Ali Eslami, Martin Riedmiller, and David Silver. Emergence of Locomotion Behaviours in Rich Environments. July 2017.
[19] Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[20] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous Methods for Deep Reinforcement Learning. In Maria Florina Balcan and Kilian Q Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1928–1937, New York, New York, USA, 2016. PMLR.
[21] Adithyavairavan Murali, Tao Chen, Kalyan Vasudev Alwala, Dhiraj Gandhi, Lerrel Pinto, Saurabh Gupta, and Abhinav Gupta. PyRobot: An Open-source Robotics Framework for Research and Benchmarking. June 2019.
[22] Liam Paull, Jacopo Tani, Heejin Ahn, Javier Alonso-Mora, Luca Carlone, Michal Cap, Yu Fan Chen, Changhyun Choi, Jeff Dusek, Yajun Fang, Daniel Hoehener, Shih-Yuan Liu, Michael Novitzky, Igor Franzoni Okuyama, Jason Pazis, Guy Rosman, Valerio Varricchio, Hsueh-Cheng Wang, Dmitry Yershov, Hang Zhao, Michael Benjamin, Christopher Carr, Maria Zuber, Sertac Karaman, Emilio Frazzoli, Domitilla Del Vecchio, Daniela Rus, Jonathan How, John Leonard, and Andrea Censi. Duckietown: An open, inexpensive and flexible platform for autonomy education and research. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1497–1504, May 2017.
[23] Daniel Pickem, Paul Glotfelter, Li Wang, Mark Mote, Aaron Ames, Eric Feron, and Magnus Egerstedt. The robotarium: A remotely accessible swarm robotics research testbed. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1699–1706. IEEE, 2017.
[24] Ivaylo Popov, Nicolas Heess, Timothy Lillicrap, Roland Hafner, Gabriel Barth-Maron, Matej Vecerik, Thomas Lampe, Yuval Tassa, Tom Erez, and Martin Riedmiller. Data-efficient Deep Reinforcement Learning for Dexterous Manipulation. April 2017.
[25] Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al. Deepmind control suite. arXiv preprint arXiv:1801.00690, 2018.
[26] Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012.
[27] Manuel Wüthrich, Felix Widmaier, Felix Grimminger, Joel Akpo, Shruti Joshi, Vaibhav Agrawal, Bilal Hammoud, Majid Khadiv, Miroslav Bogdanovic, Vincent Berenz, Julian Viereck, Maximilien Naveau, Ludovic Righetti, Bernhard Schölkopf, and Stefan Bauer. TriFinger: An Open-Source Robot for Learning Dexterity. In Proceedings of the Conference on Robot Learning (CoRL), August 2020.
[28] Brian Yang, Jesse Zhang, Vitchyr Pong, Sergey Levine, and Dinesh Jayaraman. Replab: A reproducible low-cost arm benchmark platform for robotic learning. arXiv preprint arXiv:1905.07447, 2019.
[29] Takuma Yoneda, Charles Schaff, Takahiro Maeda, and Matthew Walter. Grasp and motion planning for dexterous manipulation for the real robot challenge, 2021.
[30] Iker Zamora, Nestor Gonzalez Lopez, Victor Mayoral Vilches, and Alejandro Hernandez Cordero. Extending the openai gym for robotics: a toolkit for reinforcement learning using ros and gazebo. arXiv preprint arXiv:1608.05742, 2016.