A Robot Cluster for Reproducible Research in Dexterous Manipulation

Manuel Wüthrich∗1, Felix Widmaier∗1, Stefan Bauer∗2, Niklas Funk†3, Julen Urain De Jesus†3, Jan Peters†3, Joe Watson†3, Claire Chen†4, Krishnan Srinivasan†4, Junwu Zhang†4, Jeffrey Zhang†4, Matthew R. Walter†5, Rishabh Madan†9, Charles Schaff†5, Takahiro Maeda†5, Takuma Yoneda†5, Denis Yarats†6, Arthur Allshire†7, Ethan K. Gordon†8, Tapomayukh Bhattacharjee†9, Siddhartha S. Srinivasa†8, Animesh Garg†7, Annika Buchholz1, Sebastian Stark1, Thomas Steinbrenner1, Joel Akpo1, Shruti Joshi1, Vaibhav Agrawal1, and Bernhard Schölkopf1

∗ Equal contribution, † Challenge participant
1 Max-Planck-Institute for Intelligent Systems, 2 KTH Stockholm, 3 TU Darmstadt, 4 Stanford University, 5 TTI Chicago, 6 New York University, 7 University of Toronto, 8 University of Washington, 9 Cornell University

arXiv:2109.10957v1 [cs.RO] 22 Sep 2021

Abstract— Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at the MPI-IS and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks, ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects.

I. INTRODUCTION

Dexterous manipulation is humans' interface to the physical world. Our ability to manipulate objects around us in a creative and precise manner is one of the most apparent distinctions between human and animal intelligence. The impact robots with a similar level of dexterity would have on our society cannot be overstated. They would likely replace humans in most tasks that are primarily physical, such as working at production lines, packaging, constructing houses, agriculture, cooking, and cleaning. Yet, robotic manipulation is still far from the level of dexterity attained by humans, as witnessed by the fact that these tasks are still mostly carried out by humans. This problem has been remarkably resistant to the rapid progress of machine learning over the past years. A factor that has been crucial for progress in machine learning, but is nonexistent in real-world robotic manipulation, is a shared benchmark. Benchmarks allow different labs to coordinate efforts, reproduce results and measure progress. Most notably, in the area of image processing, such benchmarks were crucial for the rapid progress of deep learning. More recently, simulation benchmarks have been proposed in reinforcement learning (RL) [8], [25]. However, methods that are successful in simulators transfer only to a limited degree to real robots. Therefore, the robotics community has recently proposed a number of open-source platforms for robotic manipulation [28], [1], [27]. These robots can be built by any lab to reproduce results of other labs. While this is a large step towards a shared benchmark, it requires effort by the researchers to set up and maintain the system, and it is nontrivial to ensure a fully standardized setup.

Therefore, we provide remote access to dexterous manipulation platforms hosted at MPI-IS, see Figure 1 (interested researchers can contact us to request access). This allows for an objective evaluation of robot-learning algorithms on real-world platforms with minimal effort for the researchers. In addition, we publish a large dataset of these platforms interacting with objects, which we collected during a recent competition we hosted. During this competition, teams from across the world developed algorithms for challenging object manipulation tasks, which yielded a very diverse dataset containing meaningful interactions. Finally, we also provide a simulation of the robotic setup, allowing for research into sim-to-real transfer. All the code is open-source.

In the rest of the paper, we describe the robotic hardware and the software interface which allows easy robot access, similar to a computational cluster. We also describe the robot competition we hosted and the data we collected in the process.

II. RELATED WORK

In the past years, a large part of the RL community has focused on simulation benchmarks, such as the DeepMind Control Suite [25] or OpenAI Gym [8] and extensions thereof [30]. These benchmarks internally use physics simulators, typically MuJoCo [26] or PyBullet [12].

These commonly accepted benchmarks allowed researchers from different labs to compare their methods, reproduce results, and hence build on each other's work. Very impressive results have been obtained through this coordinated effort [17], [14], [24], [20], [18], [13], [19].
Fig. 1: The dexterous robotic platforms hosted at MPI-IS. Users can submit code which is then executed automatically.

In contrast, no such coordinated effort has been possible on real robotic systems, since there is no shared benchmark. This lack of standardized real-world benchmarks has been recognized by the robotics and RL community [6], [7], [9], [10], [4], [21]. Recently, there have been renewed efforts to alleviate this problem:

a) Affordable Open-Source Platforms: The robotics community recently proposed affordable open-source robotic platforms that can be built by users. For instance, [28] propose Replab, a simple, low-cost manipulation platform that is suitable for benchmarking RL algorithms. Similarly, [1] propose a simple robotic hand and quadruped that can be built from commercially available modules. CMU designed LoCoBot, a low-cost open-source platform for mobile manipulation. [16] propose an open-source quadruped consisting of off-the-shelf parts and 3D-printed shells. Based on this design, [27] developed an open-source manipulation platform consisting of three fingers capable of complex dexterous manipulation (here, we use an industrial-grade adaptation of this design).

Such platforms are beneficial for collaboration and reproducibility across labs. However, setting up and maintaining such platforms often requires hardware experts and is time-intensive. Furthermore, there are necessarily small variations across labs that may harm reproducibility. To overcome these limitations, the robotics community has proposed a number of remote robotics benchmarks.

b) Remote Benchmarks: For mobile robotics, [23] propose the Robotarium, a remotely accessible swarm-robotics research platform. Similarly, Duckietown [22] hosts the AI Driving Olympics [2] twice per year. However, a remote benchmark for robotic manipulation accessible to researchers around the world is still missing. Therefore, we propose such a system herein.

III. ROBOTIC PLATFORMS

We host 8 robotic platforms at MPI-IS (see Figure 1); remote users can submit code which is then assigned to a platform and executed automatically, akin to a computational cluster. Users have access to the data collected during the execution of their code. Submission and data retrieval can be automated to allow for RL methods that alternate between policy evaluation and policy improvement.

The platforms we use here are based on an open-source design that was published recently [27]. The benefits of this design are:

• Dexterity: The robot design consists of three fingers and has the mechanical and sensorial capabilities necessary for complex object manipulation beyond grasping.
• Safe Unsupervised Operation: The combination of robust hardware and safety checks in the software allows users to run even unpredictable algorithms without supervision. This enables, for instance, training of deep neural networks directly on the real robot.
• Ease of Use: The C++ and Python interfaces are simple and well-suited for RL as well as optimal control at
rates up to 1 kHz. For convenience, we also provide a simulation (PyBullet) environment of the robot.

Here, we use an industrial-grade adaptation of this hardware (see Figure 1) to guarantee an even longer lifetime and higher reproducibility. In addition, we developed a submission system to allow researchers from anywhere in the world to submit code with ease.

A. Observations and Actions

Actions: This platform consists of 3 fingers, each with 3 joints, yielding a total of 9 degrees of freedom (and 9 corresponding motors). There are two ways of controlling the robot: one can send 9-dimensional torque actions, which are directly executed by the motors. Alternatively, we provide the option of using position actions (9-dimensional as well), which are then translated to torques by an internal controller. The native control rate of the system is 1 kHz, but one can control at a lower rate, if so desired.
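For illustration, a position action could be constructed as sketched below. This is a minimal sketch: it assumes that the Action type accepts a position keyword analogous to the torque keyword used in the example in Section III-B, and the all-zeros target is purely a placeholder; please consult the software documentation for the actual interface.

    import numpy as np
    import robot_fingers
    import robot_interfaces

    robot = robot_fingers.TriFingerPlatformFrontend()

    # 9-dimensional vector of target joint angles (3 fingers x 3 joints).
    # All zeros is used here purely as a placeholder target.
    target_joint_positions = np.zeros(9)

    # Assumption: the Action type accepts a `position` keyword, in
    # analogy to the `torque` keyword shown in Section III-B.
    action = robot_interfaces.trifinger.Action(position=target_joint_positions)
    t = robot.append_desired_action(action)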
Observations: An observation consists of proprioceptive measurements, images and the object pose inferred using an object tracker. The proprioceptive measurements are provided at a rate of 1 kHz and contain the joint angles, joint velocities and joint torques (each 9-dimensional) as well as the fingertip forces (3-dimensional). There are three cameras placed around the robot. The camera images are provided at 10 Hz and have a resolution of 270x270 pixels. In addition, our system contains an object tracker, which provides the pose of the object being manipulated along with the images.
B. Submitting Code

We use HTCondor (https://research.cs.wisc.edu/htcondor) to provide a cluster-like system where users can submit jobs which are then automatically executed on a randomly-selected robot. During execution, a backend process automatically starts the robot, monitors execution and records all actions and observations. The users only use the simple frontend interface to send actions to and retrieve observations from the robot (see [27] for more details).

Below is a minimal example of actual user code in Python. It creates a frontend interface to the robot and uses it to send torque commands that are computed by some control policy based on the observations:
    # Initialise front end to interact with the robot.
    robot = robot_fingers.TriFingerPlatformFrontend()

    # Create a zero-torque action to start with.
    action = robot_interfaces.trifinger.Action()

    while True:
        # Append action to the "action queue". Returns the time step
        # at which the given action will be executed.
        t = robot.append_desired_action(action)

        # Get observations of time step t. Will block and wait if t
        # is in the future.
        robot_obs = robot.get_robot_observation(t)
        camera_obs = robot.get_camera_observation(t)

        # Compute the next action using some control policy. The
        # different observation values are listed separately for
        # illustration.
        torque = control_policy(
            robot_obs.position,
            robot_obs.velocity,
            robot_obs.torque,
            robot_obs.tip_force,
            camera_obs.cameras[0].image,
            camera_obs.cameras[1].image,
            camera_obs.cameras[2].image,
            camera_obs.object_pose.position,
            camera_obs.object_pose.orientation,
        )
        action = robot_interfaces.trifinger.Action(torque=torque)

At the end of each job, the recorded data is stored and provided to the user, who can then analyse it and use it, for example, to train a better policy. Users can automate submissions and data retrieval to run RL algorithms directly on the robots.
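Such an automated loop might be structured as sketched below. The three functions are placeholders (marked with `...`) for the user's own submission, retrieval and learning code; they are not part of the provided software.

    def submit_job(policy):
        """Placeholder: submit the policy for execution, return a job handle."""
        ...

    def download_data(job):
        """Placeholder: retrieve the actions/observations recorded for the job."""
        ...

    def improve_policy(policy, data):
        """Placeholder: update the policy using the newly collected data."""
        ...

    policy = ...  # initial policy
    for _ in range(100):
        job = submit_job(policy)               # policy evaluation on the robot
        data = download_data(job)              # fetch the recorded episode data
        policy = improve_policy(policy, data)  # policy improvement off-robot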
IV. THE REAL ROBOT CHALLENGE

Using the robotic platforms described in Section III, we organized the "Real Robot Challenge 2020", a competition which ran from August to December 2020. The participants were able to access the robots remotely via the submission system described in Section III-B. Thus, teams from all around the world were able to participate, allowing them to work with real robots even if they did not have access to one locally (or were locked out of their labs due to the Covid pandemic). Their algorithms generated a large dataset rich in contact interactions, which we make publicly available. Currently, we are hosting another edition of this challenge.

A. Phases and Tasks

The challenge was split into three phases:

• Phase 1: In the first phase, the participants had to manipulate a 65 mm cube in simulation. There was no restriction on who could participate; this phase served as a qualification round before giving the participants access to the real robots.
• Phase 2: The teams that achieved promising results in the first phase were given access to the real robots, where the same task had to be solved.
• Phase 3: For the last phase, the cube was replaced by a smaller, elongated cuboid (20x20x80 mm) that is more difficult to grasp and manipulate.

Figure 2 shows pictures of the different phases and the objects that were used.

In all three phases, the task was to move the object from its initial position at the center of the workspace to a randomly sampled goal. In each phase, there were four levels of difficulty corresponding to different goal distributions:

• Level 1: The goal is randomly sampled on the table. The orientation is not considered. For this level it is not necessary to lift the object, so it can be solved by pushing.
• Level 2: The object has to be lifted to a fixed goal position 8 cm above the table center. The orientation is not considered.
• Level 3: The goal is randomly sampled somewhere within the arena, with a height of up to 10 cm. The orientation is not considered.
• Level 4: As level 3, but in addition to the position, a goal orientation is sampled uniformly.

Fig. 2: Illustration of the different challenge phases and the objects that were used: (a) Phase 1 - Simulation, (b) Phase 2 - Cube, (c) Phase 3 - Cuboid, (d) the objects used in phases 2 and 3.

B. Evaluation

In phase 1, we used an episode length of 15 s (3750 steps at 250 Hz). In phases 2 and 3, this was increased to 2 min (120000 steps at 1 kHz) so that the users had enough time on the real robots.

We defined the performance of an episode as the cumulative reward, i.e. the sum of rewards across all time steps, R = \sum_t r_t. In the following, we describe the reward functions (for a single time step) used in the different levels.

1) Reward for Levels 1-3: For difficulty levels 1-3, we only considered the position error (i.e. orientation is ignored). We used a weighted sum of the Euclidean distance on the x/y-plane and the absolute distance along the z-axis. Both components are scaled based on their expected range. Since the z-range is smaller, this means that the height has a higher weight. The sum is again rescaled so that the total error is in the interval [0, 1].

Given goal position p_g = (x_g, y_g, z_g), actual position p_a = (x_a, y_a, z_a), arena diameter d and maximum expected height h, the position error e_pos is computed as

    e_{pos} = \frac{1}{2} \left( \frac{\sqrt{(x_g - x_a)^2 + (y_g - y_a)^2}}{d} + \frac{|z_g - z_a|}{h} \right)    (1)

The arena diameter is d = 0.39, matching the inner diameter of the arena boundary, and the maximum height is h = 0.1.

We compute the reward r by simply negating the error, r = -e_pos.

2) Reward for Level 4:

a) Phases 1 and 2: For level 4, we considered both position and orientation. We compute the position error e_pos as in the previous levels, according to (1). For the orientation, we first compute the rotation q = (q_x, q_y, q_z, q_w) (given as a quaternion) that would have to be applied to the actual orientation to obtain the goal orientation. We then compute the angle of this rotation, divided by π to normalize it to the interval [0, 1]:

    e_{rot} = \frac{2 \cdot \mathrm{atan2}(\|(q_x, q_y, q_z)\|, |q_w|)}{\pi}    (2)

For the total reward r, we sum the two errors, rescale again to the interval [0, 1], and negate so that a higher reward means a lower error:

    r = -\frac{e_{pos} + e_{rot}}{2}    (3)

b) Phase 3: We found that, for the narrow cuboid used in phase 3, our object-tracking method was unreliable with respect to the rotation around the long axis of the cuboid. To prevent this from affecting the reward computation, we changed the computation of the rotation error to use only the absolute angle between the long axes of the cuboid in the goal and actual poses. This is again divided by π for normalisation. The reward is then computed in the same way as above, using (3).
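To make the reward definitions concrete, here is a minimal Python sketch of equations (1)-(3). It follows the definitions above (quaternion in (x, y, z, w) order); the function names are ours and the code is illustrative, not taken from the challenge software.

    import math
    import numpy as np

    ARENA_DIAMETER = 0.39  # inner diameter of the arena boundary (d)
    MAX_HEIGHT = 0.1       # maximum expected height (h)

    def position_error(goal_pos, actual_pos):
        """Equation (1): scaled x/y distance plus scaled z distance, in [0, 1]."""
        xy_dist = math.hypot(goal_pos[0] - actual_pos[0],
                             goal_pos[1] - actual_pos[1])
        z_dist = abs(goal_pos[2] - actual_pos[2])
        return 0.5 * (xy_dist / ARENA_DIAMETER + z_dist / MAX_HEIGHT)

    def rotation_error(q):
        """Equation (2): angle of the rotation q = (x, y, z, w) that maps
        the actual orientation to the goal orientation, normalised by pi."""
        angle = 2.0 * math.atan2(np.linalg.norm(q[:3]), abs(q[3]))
        return angle / math.pi

    def reward_levels_1_to_3(goal_pos, actual_pos):
        """Reward for levels 1-3: negated position error."""
        return -position_error(goal_pos, actual_pos)

    def reward_level_4(goal_pos, actual_pos, q):
        """Equation (3): negated mean of position and rotation error."""
        return -(position_error(goal_pos, actual_pos) + rotation_error(q)) / 2.0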
3) Evaluating Submissions: Since there is randomness in each execution due to the randomly-sampled goal and small changes in the initial configuration of the object and robot, we need to average a number of episodes to approximate the expected reward. At the end of each real-world phase, the code of each team was executed for multiple goals of each difficulty level (we use the same goals across all users to reduce variance). The average cumulative rewards R_i for the different levels i are then aggregated by \sum_{i=1}^{4} i \cdot R_i. This weighted sum gives a higher weight to higher (more difficult) levels, encouraging teams to solve them.
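For illustration, this aggregation could be computed as follows; the per-level reward values are hypothetical, not actual results.

    # Average cumulative rewards R_i per difficulty level
    # (hypothetical values for illustration).
    avg_rewards = {1: -5000.0, 2: -10000.0, 3: -15000.0, 4: -30000.0}

    # Weighted sum: level i is weighted by i, so more difficult
    # levels contribute more to the total score.
    total_score = sum(i * R_i for i, R_i in avg_rewards.items())
    print(total_score)  # -190000.0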
C. Challenge Results

After the simulation stage, seven teams with excellent performance qualified for the real-robot phases. These teams made thousands of submissions to the robots, corresponding to approximately 250 hours of robot run-time (see Figure 3 for submission statistics). Tables I and II show the final evaluation results of phases 2 and 3 (the scores in these tables are computed as described in Section IV-B). The top teams in both phases found solutions that successfully grasp the object and move it to the goal position. Videos published by the winning teams are available on YouTube (ardentstork: https://youtube.com/playlist?list=PLBUWL2_ywUvE_czrinTTRqqzNu86mYuOV; troubledhare: https://youtube.com/playlist?list=PLEYg4qhK8iUaXVb1ij18pVwnzeowMCpRc; sombertortoise: https://youtu.be/I65Kwu9PGmg). They also published reports describing their methods [29], [5], [11] and open-sourced their code (ardentstork: https://github.com/ripl-ttic/real-robot-challenge; troubledhare: https://github.com/madan96/rrc_example_package; sombertortoise: https://github.com/stanford-iprl-lab/rrc_package).

We collected all the data produced during the challenge and aggregated it into a dataset, which is described in Section IV-D.

1) The Winning Policies: The winning teams used similar methods for solving the task: they made use of motion primitives that are sequenced using state machines. For difficulty level 4, where orientation is important, they typically first perform a sequence of motions to rotate the object to be roughly in the right orientation before lifting it to the goal position. For details regarding their implementations, please refer to [29], [5], [11], [3]. Furthermore, [15] contains a more detailed description of the solutions of some of the teams.
D. The Dataset

The dataset contains the recorded data of all jobs that were executed during phases 2 and 3 of the challenge. Combined with the runs from the weekly evaluation rounds, this results in 2856 episodes of phase 2 and 7422 episodes of phase 3.

The data of each episode can be downloaded individually. The compressed file size of one episode is about 250 MB.

We expect that users of the dataset will typically not use all episodes (which would be a large amount of data), but select those that are interesting for their project. To help with this, we provide a database containing the metadata and metrics of all episodes. This allows users to filter the data before downloading.

The dataset itself, the tools for filtering the episodes, as well as a more technical description of the data format can be found at https://people.tuebingen.mpg.de/mpi-is-software/data/rrc2020.

1) Data Description: For each episode, the following information is stored:

• all robot and camera observations (including object pose information) as well as all actions (see Section III-A)
• camera calibration parameters
• the goal pose that was used in this episode
• metadata such as timestamp, challenge phase, etc.
• several metrics that can be used to filter the dataset:
  - cumulative reward
  - baseline reward obtained if the object were not moved during the whole episode
  - initial distance to goal
  - closest distance to goal throughout the episode
  - maximum height of the object throughout the episode
  - largest distance to initial position throughout the episode

Any user-specific information, such as names or custom output of the user code, is not included in the dataset.

2) Data Format: The data is provided in two different formats:

1) The original log files as they were generated by the robot. This includes files in a custom binary format that requires the use of our software to read them (available for C++ and Python).
2) Zarr (https://zarr.readthedocs.io) storages created from the original files. These can easily be read in Python using the Zarr package, as sketched below.

For a more technical description, please refer to the dataset website.
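As an illustration of the second format, the following minimal sketch opens an episode with the Zarr Python package. The file name and the array path are hypothetical; the actual layout is documented on the dataset website.

    import zarr

    # Open the Zarr storage of a downloaded episode (read-only).
    # The file name is hypothetical.
    episode = zarr.open("rrc2020_episode.zarr", mode="r")

    # List the arrays and groups contained in the episode.
    print(list(episode.keys()))

    # Access one of the stored arrays. The path
    # "robot_observations/position" is only an example; see the
    # dataset website for the actual names.
    positions = episode["robot_observations/position"][:]
    print(positions.shape)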
V. CONCLUSION

We designed and built a robot cluster to facilitate reproducible research into dexterous robotic manipulation. We believe that this cluster can greatly enhance coordination and collaboration between researchers all across the world. We hosted competitions on these robots to advance the state of the art and to produce publicly available datasets that contain rich interactions between the robots and external objects. In addition, we provide scientists with access to the platforms for their own research projects. This is especially beneficial for researchers who would otherwise not have access to such robots.
TABLE I: Final Evaluation of Phase 2

    #   Username          Level 1   Level 2   Level 3   Level 4   Total Score
    1.  ardentstork         -5472     -2898     -9080    -21428       -124221
    2.  troubledhare        -3927     -4144     -4226    -48572       -219182
    3.  sombertortoise      -8544    -15199    -14075    -44989       -261123
    4.  sincerefish         -6278    -13738    -17927    -49491       -285500
    5.  hushedtomatoe      -17976    -41389    -41832    -60815       -469509
    6.  giddyicecream      -22379    -46650    -41655    -61845       -488023

TABLE II: Final Evaluation of Phase 3

    #   Username           Level 1   Level 2   Level 3   Level 4   Total Score
    1.  ardentstork          -9239     -4040     -6525    -25625       -139394
    2.  sombertortoise       -5461     -8522    -10323    -36135       -198016
    3.  sincerefish          -7428    -25291    -26768    -52311       -347560
    4.  innocenttortoise    -16872    -31977    -33357    -55611       -403344
    5.  hushedtomatoe       -18304    -31917    -36835    -60219       -433521
    6.  troubledhare        -18742    -42831    -36272    -56503       -439233
    7.  giddyicecream       -33329    -57372    -53694    -59734       -548090

                                                400
                                                               ardentstork
                                                               giddyicecream                                                                                                                          troubledhare
                                                               sincerefish
                                                               troubledhare                                                                                                                                                 giddyicecream
Number of submissions

                                                300            hushedtomatoe                                                                                                                       493
                                                               sombertortoise                                                                                                                                   220             sombertortoise
                                                                                                                                                                ardentstork
                                                                                                                                                                                                                  180
                                                                                                                                                                                     837                                         hushedtomatoe
                                                200                                                                                                                                                                   87

                                                                                                                                                                                                   1249
                                                100

                                                  0                                                                                                                                                         sincerefish
                                                                   4            9            4          9           3          8            3
                                                               0-1          0-1          0-2         0-2         1-0        1-0         1-1
                                                           0-1          0-1          0-1          0-1        0-1         0-1        0-1
                                                       202          202          202          202        202         202        202

                                                                                                                                                (a) Phase 2
                                                                     ardentstork
                                                      500            giddyicecream                                                                                        giddyicecream
                                                                     sincerefish                                                                                                                                  ardentstork
                                                                     innocenttortoise
                        Number of submissions

                                                      400            troubledhare
                                                                                                                                                                                             863          748                  sombertortoise
                                                                     hushedtomatoe
                                                                     sombertortoise                                                                                                                               271            troubledhare
                                                      300
                                                                                                                                                                                                                      242
                                                                                                                                                                                                                       85
                                                                                                                                                                                                                        2        innocenttortoise
                                                                                                                                                                                                                                 hushedtomatoe
                                                      200
                                                                                                                                                                                               3530

                                                      100

                                                        0                                                                                                                     sincerefish
                                                                       6            1            6           1           6            1
                                                                   1-1          1-2          1-2         2-0         2-0          2-1
                                                               0-1          0-1          0-1          0-1         0-1         0-1
                                                            202         202          202          202         202         202

                                                                                                                                                (b) Phase 3
Fig. 3: Number of submissions over time (left, one bar corresponds to one day) and per team (right).
REFERENCES

[1] Michael Ahn, Henry Zhu, Kristian Hartikainen, Hugo Ponte, Abhishek Gupta, Sergey Levine, and Vikash Kumar. ROBEL: Robotics benchmarks for learning with low-cost robots. arXiv preprint arXiv:1909.11639, 2019.
[2] AI-DO: AI Driving Olympics – Duckietown. Website. https://www.duckietown.org/research/ai-driving-olympics. Accessed: 2021-8-27.
[3] Arthur Allshire, Mayank Mittal, Varun Lodaya, Viktor Makoviychuk, Denys Makoviichuk, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Ankur Handa, and Animesh Garg. Transferring dexterous manipulation from GPU simulation to a remote real-world TriFinger. arXiv preprint arXiv:2108.09779, 2021.
[4] F. Amigoni, E. Bastianelli, J. Berghofer, A. Bonarini, G. Fontana, N. Hochgeschwender, L. Iocchi, G. Kraetzschmar, P. Lima, M. Matteucci, P. Miraldo, D. Nardi, and V. Schiaffonati. Competitions for benchmarking: Task and functionality scoring complete performance assessment. IEEE Robotics & Automation Magazine, 22(3):53–61, September 2015.
[5] Anonymous. Real robot challenge phase 2: Manipulating objects using high-level coordination of motion primitives. In Submitted to Real Robot Challenge 2020, 2020. Under review.
[6] Sven Behnke. Robot competitions: ideal benchmarks for robotics research. In Proc. of IROS-2006 Workshop on Benchmarks in Robotics Research. IEEE, 2006.
[7] F. Bonsignorio and A. P. del Pobil. Toward replicable and measurable robotics research [from the guest editors]. IEEE Robotics & Automation Magazine, 22(3):32–35, September 2015.
[8] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
[9] B. Calli, A. Walsman, A. Singh, S. Srinivasa, P. Abbeel, and A. M. Dollar. Benchmarking in manipulation research: Using the Yale-CMU-Berkeley object and model set. IEEE Robotics & Automation Magazine, 22(3):36–52, September 2015.
[10] Berk Calli, Aaron Walsman, Arjun Singh, Siddhartha Srinivasa, Pieter Abbeel, and Aaron M. Dollar. Benchmarking in manipulation research: The YCB object and model set and benchmarking protocols. February 2015.
[11] Claire Chen, Krishnan Srinivasan, Jeffrey Zhang, Junwu Zhang, Lin Shao, Shenli Yuan, Preston Culbertson, Hongkai Dai, Mac Schwager, and Jeannette Bohg. Dexterous manipulation primitives for the real robot challenge, 2021.
[12] Erwin Coumans and Yunfei Bai. PyBullet, a Python module for physics simulation for games, robotics and machine learning. GitHub repository, 2016.
[13] Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning, pages 1329–1338, 2016.
[14] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. February 2018.
[15] Niklas Funk, Charles Schaff, Rishabh Madan, Takuma Yoneda, Julen Urain De Jesus, Joe Watson, Ethan K. Gordon, Felix Widmaier, Stefan Bauer, Siddhartha S. Srinivasa, Tapomayukh Bhattacharjee, Matthew R. Walter, and Jan Peters. Benchmarking structured policies and policy optimization for real-world dexterous object manipulation, 2021.
[16] Felix Grimminger, Avadesh Meduri, Majid Khadiv, Julian Viereck, Manuel Wüthrich, Maximilien Naveau, Vincent Berenz, Steve Heim, Felix Widmaier, Jonathan Fiene, Alexander Badri-Spröwitz, and Ludovic Righetti. An open torque-controlled modular robot architecture for legged locomotion research. In International Conference on Robotics and Automation (ICRA), 2020.
[17] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. January 2018.
[18] Nicolas Heess, T. B. Dhruva, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, and David Silver. Emergence of locomotion behaviours in rich environments. July 2017.
[19] Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[20] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1928–1937, New York, New York, USA, 2016. PMLR.
[21] Adithyavairavan Murali, Tao Chen, Kalyan Vasudev Alwala, Dhiraj Gandhi, Lerrel Pinto, Saurabh Gupta, and Abhinav Gupta. PyRobot: An open-source robotics framework for research and benchmarking. June 2019.
[22] Liam Paull, Jacopo Tani, Heejin Ahn, Javier Alonso-Mora, Luca Carlone, Michal Cap, Yu Fan Chen, Changhyun Choi, Jeff Dusek, Yajun Fang, Daniel Hoehener, Shih-Yuan Liu, Michael Novitzky, Igor Franzoni Okuyama, Jason Pazis, Guy Rosman, Valerio Varricchio, Hsueh-Cheng Wang, Dmitry Yershov, Hang Zhao, Michael Benjamin, Christopher Carr, Maria Zuber, Sertac Karaman, Emilio Frazzoli, Domitilla Del Vecchio, Daniela Rus, Jonathan How, John Leonard, and Andrea Censi. Duckietown: An open, inexpensive and flexible platform for autonomy education and research. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1497–1504, May 2017.
[23] Daniel Pickem, Paul Glotfelter, Li Wang, Mark Mote, Aaron Ames, Eric Feron, and Magnus Egerstedt. The Robotarium: A remotely accessible swarm robotics research testbed. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1699–1706. IEEE, 2017.
[24] Ivaylo Popov, Nicolas Heess, Timothy Lillicrap, Roland Hafner, Gabriel Barth-Maron, Matej Vecerik, Thomas Lampe, Yuval Tassa, Tom Erez, and Martin Riedmiller. Data-efficient deep reinforcement learning for dexterous manipulation. April 2017.
[25] Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al. DeepMind control suite. arXiv preprint arXiv:1801.00690, 2018.
[26] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012.
[27] Manuel Wüthrich, Felix Widmaier, Felix Grimminger, Joel Akpo, Shruti Joshi, Vaibhav Agrawal, Bilal Hammoud, Majid Khadiv, Miroslav Bogdanovic, Vincent Berenz, Julian Viereck, Maximilien Naveau, Ludovic Righetti, Bernhard Schölkopf, and Stefan Bauer. TriFinger: An open-source robot for learning dexterity. In Proceedings of the Conference on Robot Learning (CoRL), August 2020.
[28] Brian Yang, Jesse Zhang, Vitchyr Pong, Sergey Levine, and Dinesh Jayaraman. REPLAB: A reproducible low-cost arm benchmark platform for robotic learning. arXiv preprint arXiv:1905.07447, 2019.
[29] Takuma Yoneda, Charles Schaff, Takahiro Maeda, and Matthew Walter. Grasp and motion planning for dexterous manipulation for the real robot challenge, 2021.
[30] Iker Zamora, Nestor Gonzalez Lopez, Victor Mayoral Vilches, and Alejandro Hernandez Cordero. Extending the OpenAI Gym for robotics: A toolkit for reinforcement learning using ROS and Gazebo. arXiv preprint arXiv:1608.05742, 2016.