PyTouch: A Machine Learning Library for Touch Processing
Mike Lambeta1, Huazhe Xu2, Jingwei Xu3, Po-Wei Chou1, Shaoxiong Wang4, Trevor Darrell2, and Roberto Calandra1

1 Facebook AI Research, Menlo Park, USA
2 University of California, Berkeley, USA
3 Shanghai Jiao Tong University, China
4 Massachusetts Institute of Technology, USA

arXiv:2105.12791v1 [cs.RO] 26 May 2021

Abstract— With the increased availability of rich tactile sensors, there is an equally proportional need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can be used for control and decision-making. In this paper, we present PyTouch – the first machine learning library dedicated to the processing of touch sensing signals. PyTouch is designed to be modular and easy to use, and provides state-of-the-art touch processing capabilities as a service, with the goal of unifying the tactile sensing community by providing a library of scalable, proven, and performance-validated modules over which applications and research can be built. We evaluate PyTouch on real-world data from several tactile sensors on touch processing tasks such as touch detection, slip and object pose estimation. PyTouch is open-sourced at https://github.com/facebookresearch/pytouch.

I. INTRODUCTION

A fundamental challenge in robotics is to process raw sensor measurements into high-level features that can be used for control and decision-making. For the sense of vision, the field of computer vision has been dedicated to studying algorithmic and programmatic methods of understanding images and videos. In this field, open-source libraries such as PyTorch [1], CAFFE [2], and OpenCV [3] have enabled the acceleration of research and collectively brought together commonly used techniques in their respective domains by providing unified interfaces, algorithms, and platforms. PyTorch and CAFFE have enabled researchers to develop and scale neural networks by reducing the amount of groundwork required for implementing algorithms such as backpropagation or convolution functions. Furthermore, OpenCV provides a collection of commonly used, tested and optimized functions. These frameworks, libraries and platforms all share an underlying goal of enabling further developments in their respective fields by abstracting lower-level processes into building blocks for further applications.

With the increased availability of rich tactile sensors [4], [5], [6], [7], the sense of touch is becoming a new and important sensor modality in robotics and machine learning. Sensing the world through touch opens exciting new challenges and opportunities to measure, understand and interact with the world around us. However, the availability of ready-to-use touch processing software is extremely limited – this results in a high entry bar for new practitioners that want to make use of tactile sensors, who are forced to implement their own touch processing routines. We believe that, similarly to computer vision, the availability of open-source and maintained software libraries for processing touch readings would lessen the barrier of entry to tactile-based tasks, experimentation, and research in the touch sensing domain.

To address this challenge we introduce PyTouch – the first open-source library for touch processing that enables the machine learning and robotics communities to process raw touch data from tactile sensors through abstractions which focus on the experiment instead of the low-level details of elementary concepts. PyTouch is designed to be a high-performance, modular, and easy-to-use library that provides touch processing functionalities "as a service" through a collection of pre-trained models capable of kick-starting initial experiments, while allowing end applications that use new sensors, or variations of existing ones, to apply transfer learning and leverage the performance of the library. The software library modularizes a set of commonly used tactile-processing functions valuable for various down-stream tasks, such as tactile manipulation, slip detection, object recognition based on touch, etc. Furthermore, the library aims to standardize the way touch-based experiments are designed, reducing the amount of individual software developed for one-off experiments by using the PyTouch library as a foundation which can be expanded upon for future experimental and research applications.

While tools such as PyTorch and CAFFE currently exist and can be applied to touch processing, precursor development is required to support the algorithms needed for experiments and research; PyTouch provides an entry point using such tools specifically catered towards the goals of touch processing. PyTouch was designed to support both entry-level users taking their first steps in tactile sensing and power users well versed in this domain. PyTouch aims to generalize to diverse tactile input devices that might differ from each other in many design choices. This effort joins recent efforts to standardize robotics research for better benchmarks and more reproducible results. Finally, hand in hand with the framework, we release a set of pre-trained models which are used by PyTouch in the background for tactile-based tasks.

In this paper, we describe the architectural choices of the library and demonstrate some of its capabilities and benefits through several experiments. Notably, for the first time we evaluate the performance of a machine learning model trained across different models of vision-based tactile sensors, and show that this improves performance compared to models trained on single tactile sensors, using PyTouch itself to rapidly prototype these experiments.
The future-facing goal of PyTouch is to create an extendable library for touch processing, analogous to what PyTorch and OpenCV are for computer vision, in a way which allows reliable and rapid prototyping of experiments and applications. Additionally, through the use of PyTouch as a set of touch processing building blocks and released pre-trained models, we lessen the barrier to entry into touch processing while allowing researchers and experimenters to focus on the end goal of their application through proven algorithms and models supplied by the expansion of this library. We believe that this will beneficially impact the robotics and machine learning communities by enabling new capabilities and accelerating research.

II. RELATED WORK

Developed fields in computer science build applications and research upon frameworks. Many frameworks exist today which bring together common paradigms used in their respective domains while providing benefit to the community. One example of such a library is OpenCV [3] for the computer vision community, which is a well-known and probably the most widely-used data processing library supporting computer vision topics.

Frameworks which are applicable to the domain of tactile touch sensing are few and far between, or not available through open-source channels. Observing public repositories and open-source networks, we see very little open-source work pertaining to building a tactile processing framework. Although there is work in isolated instances which provides insight into specific problems, such as interfacing with robotic skin and providing interfaces for applications [8], [9], there are no works which bring commonly used paradigms within the field of tactile touch sensing together or that provide general-purpose tools. Previous attempts include studying friction models [10], physical property understanding [11], [12], [13], in-hand manipulation, and determining physical manipulation characteristics and grasp success. [14] demonstrates individual capabilities related to tactile touch which allow for robotic learning in applying given forces to an object, determining density and texture through stirring actions, and various types of in-hand or arm rotation of objects. Other work such as [15] learns robotic grasp success through an end-to-end action-conditional model based on raw tactile sensor input data. [16] introduces TactileGCN, which estimates grasp stability through a graph convolutional network. PyTouch aims at being a general-purpose and open-source solution that combines useful computational tools and capabilities into a single framework, and thereby proposes an agnostic library for tactile sensors regardless of their hardware or sensing modalities.

Another contribution of PyTouch is to push towards hardware standardization by providing flexible interfaces. Observing the tactile touch community, we saw common trends amongst the hardware used for acquiring touch input: a high barrier to entry due to cost, sparseness of available hardware arising from single one-off builds, and a lack of reproducibility due to semi-closed or closed-source designs. The DIGIT sensor [5] aimed to resolve these problems by providing a low-cost, scalable and easily machine-manufacturable tactile touch sensor. Amongst this, we see a pattern of individualized pieces of work in the tactile sensing field, but a lack of unification that brings this work together into a library that future work can build upon. PyTouch is a step towards unifying the tactile processing field to create a proven, tested and modular open-source software library available to all researchers and practitioners.

III. A LIBRARY FOR TOUCH PROCESSING

We now describe the desired specifications considered during the design of our touch processing framework, and the corresponding design choices adopted. In addition, we describe the software architecture of PyTouch, and several of its key features.

A. Desired Specifications

PyTouch aims to provide the same function in the domain of tactile touch processing through the introduction of an open-source library in which proven, tested and performance-validated modules allow for scaling touch-based applications, research and experimentation. This work aims to set forth a need for the unification of tactile-processing software, which not only strengthens the field but also decreases the barrier to entry in designing applications and research which utilize tactile touch. As such, the main goals of PyTouch are to provide a library that is powered by pre-trained models through the "as a service" paradigm, enables transfer learning through PyTouch for extending models to new hardware sensors, and deploys joint models which work across a variety of vision-based tactile sensors.

By centralizing touch processing tasks into a software library, and through the models provided through PyTouch, we provide a path forward for sensor generalization. This is important due to the variance in produced hardware, image transfer layers and mechanical construction of lab-built sensors.

PyTouch is designed to cater to both first-time and power users of the tactile touch community. High-level abstractions of common goals are provided such as, but not limited to: "am I touching an object", "is the object slipping", or "what is the touch contact area of the object". Diving deeper, the library can be used to perform analysis techniques such as augmenting the input sensor data, modifying the models used for inference, providing motion and optical flow outputs, or transfer learning techniques for extending the library to non-vision based sensors.
[Fig. 1 diagram: Live, Static, and Video inputs feed an Input Capture stage, followed by Initialization (Input Selection, Model Pre-Loading) and Pre-Processing (Normalization, Data Transformation). The PyTouch Model Library serves tasks such as Touch Detection, Slip Detection, Contact Area, Sensor Experiments, and Training as packaged models (hyper-parameters and weights), exposed through the PyTouch Interface to Applications, Experiments, and Research.]

Fig. 1: PyTouch high level architecture where tactile touch processing is delivered to the end application "as a service" through released pre-trained models.

B. PyTouch Architecture

Pre-trained Models. PyTouch is built upon a library of pre-trained models which provide real-time touch processing functionalities. The pre-trained models are downloaded after initializing the library with the desired sensor; an example follows in Lst. 1. Currently, the library supports the DIGIT [5], OmniTact [6] and GelSight [4] sensors, but the library can be used to train models using data from other vision or non-vision based tactile sensors. At the current state, PyTouch provides the following functions: contact classification, slip detection, contact area estimation, and interfaces for training and transfer learning. By developing this library with modularity and scalability in mind, we plan to extend features and provide new models "as a service" to the community while supporting new additions from the community itself. Additionally, deploying these models "as a service" aids in the experimentation of different approaches to the same task by swapping one pre-trained model for another without changing high-level software. This allows for performance benchmarking of real-world experiments for the purpose of creating a tactile task baseline.

Listing 1: Instantiating PyTouch with the DIGIT sensor to detect touch and slip. PyTouch reduces the amount of code required to do advanced tasks such as these.

import pytouch as pt

digit_pt = pt.init(pt.sensors.DigitSensor, tasks={pt.tasks.TouchDetect, pt.tasks.SlipDetect})

digit = ...  # Connect and initialize DIGIT sensor
touch, certainty = digit_pt.is_touched(digit.get_frame())
slipping, _ = digit_pt.is_slipping(digit.get_frames())
print(f"Touching object: {touch}, with certainty: {certainty}; Object Slipping: {slipping}")
                                                                       appropriate class is instantiated in the background. The reg-
   library can be used to train models using data from other           istered input device provides defaults as to which pre-trained
vision or non-vision based tactile sensors. At the current             models to use, and defines the internal data transformation
state, PyTouch provides the following functions: contact               to the models. Additionally, PyTouch supports extending
classification, slip detection, contact area estimation, and           the input devices for custom hardware through adding an
interfaces for training and transfer learning. By developing           additional input device module, or by specifying the custom
this library with modularity and scalability in mind, we               device module which allows the user to provide custom pre-
plan to extend features and provide new models ”as a                   trained models.
service” to the community and supporting new additions                    PyTouch is built upon a proven and widely used machine
from the community itself. Additionally, deploying these               learning framework, PyTorch [1].
models ”as a service” aids in the experimentation of different            Model and Network Agnostic. One of the crucial benefits
approaches to the same task by swapping pre-trained models             of PyTouch is the abstraction of high-level function calls
for another without changing high-level software. This allows          which allows end-users to integrate their own models specific
for performance benchmarking of real-world experiments for             to experiments, sensors, or use our pre-trained models and
the purposes of creating a tactile task baseline.                      perform fine-tuning. PyTouch introduces standardization of
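As a rough illustration of this second, module-level interface, the sketch below instantiates a single task directly using the names that appear in Lst. 1; the constructor arguments and the run() method are assumptions made for illustration, not the documented PyTouch API.

import pytouch as pt

# Hypothetical module-level usage: construct one task for one sensor type
# instead of initializing every task through pt.init() as in Lst. 1.
touch_detect = pt.tasks.TouchDetect(pt.sensors.DigitSensor)  # assumed constructor

digit = ...  # connect and initialize a DIGIT sensor, as in Lst. 1
is_touching, certainty = touch_detect.run(digit.get_frame())  # assumed method name
print(f"Touching: {is_touching} (certainty: {certainty})")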
With each interface, an input device is registered which loads the appropriate pre-trained models. These pre-trained models are downloaded during initialization or when the appropriate class is instantiated in the background. The registered input device provides defaults as to which pre-trained models to use, and defines the internal data transformations applied before data reaches the models. Additionally, PyTouch supports extending the input devices for custom hardware by adding an additional input device module, or by specifying the custom device module, which allows the user to provide custom pre-trained models.

PyTouch is built upon a proven and widely used machine learning framework, PyTorch [1].

Model and Network Agnostic. One of the crucial benefits of PyTouch is the abstraction of high-level function calls, which allows end-users to integrate their own models specific to experiments or sensors, or to use our pre-trained models and perform fine-tuning. PyTouch introduces standardization of learning-based methods through an application programming interface in which models can be swapped as needed, or changed through initialization parameters. PyTouch abstracts the low-level details in order to allow new users to experiment with models served by the PyTouch team or by the community. This ultimately allows PyTouch to be agnostic to the model used, supporting conventional deep CNN models as well as non-conventional learning-based methods. Furthermore, this abstraction enables users new to touch processing to begin experimentation without having knowledge of model, network and parameter design choices, and rather to focus on the experimentation itself.
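As an aside on what a "packaged model" (hyper-parameters plus weights, cf. Fig. 1) can look like in practice, the following is a minimal PyTorch sketch of how an application might restore such a checkpoint; the key names and the use of a ResNet-18 backbone here are assumptions for illustration, not PyTouch's released checkpoint format.

import torch
import torch.nn as nn
from torchvision import models

def load_packaged_model(path: str) -> nn.Module:
    # A packaged model is assumed to bundle hyper-parameters and weights in one file.
    checkpoint = torch.load(path, map_location="cpu")
    hparams = checkpoint["hyper_parameters"]          # assumed key, e.g. {"num_classes": 2}
    model = models.resnet18(weights=None)
    model.fc = nn.Linear(model.fc.in_features, hparams["num_classes"])
    model.load_state_dict(checkpoint["state_dict"])   # assumed key
    return model.eval()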
IV. EXPERIMENTAL RESULTS

We now provide several examples which outline the advantages of using the PyTouch library: predominantly, the flexibility of the library to work with different vision-based tactile sensors such as DIGIT [5], OmniTact [6] and GelSight [4]; secondly, the inherent design of the library to enable capabilities through the "as a service" paradigm. PyTouch releases pre-trained models which are tested and validated, and supports additions from the community to extend library capabilities.

A. Touch Detection

A first functionality that PyTouch provides is touch detection. This is a basic functionality which is often used as a subroutine in more complicated manipulation sequences. However, training high-accuracy touch detection models can be difficult and require extensive data collections. Here we show that using one DIGIT sensor, we can determine if an object is being touched across multiple variations of that same sensor. The robustness of the library depends on the generalization of the model, and due to manufacturing variances in the sensors and image transfer layers, no one sensor is exactly the same.

Formulation. Touch detection is formulated as an image classification task where, given an input image X, we assign a label y which is 0 for data without touch and 1 for data with registered touch.

Datasets. The dataset for DIGITs was collected by touching random household and office objects of various spatial resolutions. This dataset was split by the unique serial numbers of the devices in contact, with overall 2278 samples from 3 different sensors. The OmniTact dataset was provided by the authors of [6] and consisted of 1632 random presses of a single object. The GelSight samples were extracted from the dataset provided in [15], which consisted of before and after grasp images from two GelSight sensors. We used only a subset of the total dataset, for a total of approximately 1000 images. The input images collected for DIGIT are of size 240 × 320 × 3, OmniTact images are 480 × 640 × 3 and GelSight images are 1280 × 960 × 3.

Fig. 2: Examples of data used to train our touch prediction models. The dataset includes data across several DIGITs, OmniTact, and GelSight sensors showing different lighting conditions and objects of various spatial resolutions.

Training and Architecture. The touch detection capability is based on a pre-trained ResNet-18 model. Two modalities are used for training: the first is where images are provided to the model without a reference, and the second is where the image is concatenated with a reference no-touch image. The reference is unique to each sensor and is a frame where the sensor is not touching any object. The input data is normalized and down-sampled to 64 × 64. The data was split to ensure an equal distribution of device serial numbers. For training with a reference no-touch image, a unique reference image was provided to the model and concatenated to each input image. Finally, a joint dataset was created to include all devices in order to show model robustness and generalization.
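To make this setup concrete, the following is a minimal PyTorch sketch of a touch classifier in the spirit of the description above; it is not the released PyTouch model, and the channel handling for the reference image is an assumption.

import torch
import torch.nn as nn
from torchvision import models

def make_touch_detector(with_reference: bool = False) -> nn.Module:
    # ImageNet-pretrained ResNet-18 backbone with a binary head: 0 = no touch, 1 = touch.
    model = models.resnet18(weights="IMAGENET1K_V1")
    if with_reference:
        # Accept the input frame concatenated with the no-touch reference (6 channels).
        # The new stem is randomly initialized since the pretrained one expects 3 channels.
        model.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, 2)
    return model

# Example forward pass on a normalized, down-sampled batch (64 x 64).
frames = torch.randn(8, 3, 64, 64)     # sensor frames
reference = torch.randn(8, 3, 64, 64)  # per-sensor no-touch reference frame
detector = make_touch_detector(with_reference=True)
logits = detector(torch.cat([frames, reference], dim=1))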
Results and Ablative Study. A model specific to each sensor, and a joint model comprising all sensor data across each unique sensor, were trained. We show in Table I that the joint model performs better across a 5-fold cross-validation test. The model trained for the OmniTact sensor shows higher performance than the DIGIT model; however, observing the data in the OmniTact dataset, sample images shown in Fig. 2, we see that the breadth of sample uniqueness is minimal. We also show, in Fig. 3, the effect of constraining the train dataset size from 10% to 100% of the total dataset size: creating a joint model which uses multiple sensors results in better performance while using a similar amount of data in the training dataset. To our knowledge, this is the first study on multiple sensor manufacturers and sensor variants to show the benefit of a touch processing library.

Model \ Sensor                    DIGIT [5]     OmniTact [6]   GelSight [4]
Single models (no reference)      95.5 ± 1.2    98.4 ± 1.1     93.7 ± 1.2
Single models (with reference)    95.8 ± 1.3    98.5 ± 1.1     N/A
Joint model (no reference)        96.1 ± 1.0    99.1 ± 0.4     98.3 ± 0.6
Joint model (with reference)      96.2 ± 1.1    99.5 ± 0.3     N/A

TABLE I: Classification accuracy [%] of touch detection (mean and std) using cross-validation (k = 5). The joint models are trained with data from all three sensors: DIGIT, OmniTact and GelSight.

Fig. 3: Cross validation (k = 5) accuracy with varying train dataset size for single and joint models. With the same amount of data, training a joint model using data across multiple sensors (i.e., DIGITs, OmniTact and GelSight) results in better model performance compared to training from a single sensor.
Furthermore, we show a direct benefit of using PyTouch when fine-tuning the joint model on a smaller dataset acquired from a new sensor. A large dataset was collected from a new sensor and then constrained from 50% to 10% of the initial size. The results on the touch detection task show that starting from our pre-trained model and then fine-tuning can both increase the final performance and reduce the training wall-clock time, compared to a model trained exclusively from new data. We ablate this experiment over different dataset sizes to show that the advantages are more pronounced the smaller the dataset used for fine-tuning is. In Fig. 4 we show the results of the fine-tuned model compared to the model trained from the dataset alone for the same wall-clock time of 30 s. Furthermore, we show in Fig. 5 that introducing new sensors using PyTouch as a base results in better performance with fewer samples through transfer learning from the released pre-trained models PyTouch provides.

Fig. 4: Cross validation (k = 5) accuracy with varying train dataset size when training a model from scratch or starting from the pre-trained joint model. The validation is performed on a dataset of size 482 from a DIGIT sensor. Using the joint model to fine-tune against a new dataset results in better model performance at lower dataset sizes compared to training the model from scratch for the same wall-clock time of 30 s. Thus PyTouch can provide benefit to end applications when using the "as a service" pre-trained models.

Fig. 5: Comparison of a model trained just from the dataset collected from the GelSight [4] sensor ("Train") against models trained from the pre-trained joint model by fine-tuning the last 3 layers ("Fine Tune Unfrozen") or just the last layer ("Fine Tune Frozen"). The results show that using the pre-trained model and fine-tuning the last 3 layers significantly improves the performance of the models. This suggests that the pre-trained model learned useful features that can generalize to different sensors.
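The two fine-tuning regimes compared in Fig. 5 can be sketched in PyTorch as follows; the exact layer grouping ("last layer" vs. "last 3 layers") is an assumption about how the ResNet-18 blocks are counted, not PyTouch's training code.

import torch
import torch.nn as nn
from torchvision import models

def prepare_finetune(model: nn.Module, unfreeze_last_n: int) -> nn.Module:
    # Freeze everything, then re-enable gradients for the last N top-level blocks
    # (ResNet-18 children: conv1, bn1, relu, maxpool, layer1..layer4, avgpool, fc).
    for p in model.parameters():
        p.requires_grad = False
    for block in list(model.children())[-unfreeze_last_n:]:
        for p in block.parameters():
            p.requires_grad = True
    return model

joint_model = models.resnet18(weights=None)                # stand-in for the pre-trained joint model
joint_model.fc = nn.Linear(joint_model.fc.in_features, 2)
model = prepare_finetune(joint_model, unfreeze_last_n=3)   # "Fine Tune Unfrozen"; use 1 for "Fine Tune Frozen"
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)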
The PyTouch library is designed with the goal of supporting real-time experimentation: inference with the touch detection model results in an average runtime of 5.89 ± 0.46 ms per frame. Tactile input sensors most commonly have capture rates of 24 to 60 frames per second, which places each frame between 17 and 41 ms. This enables PyTouch for use in real-time applications which require high frame rates of up to 140 fps.

B. Contact Area Estimation

When grasping an object with a single sensor or multiple sensors, it is useful to know the surface contact area of the object against the sensor. We show an example of this in Fig. 6, where a number of object contact areas are estimated with single and multiple point features using a DIGIT [5] tactile sensor. The contact area estimated is not referenced to a ground truth, but rather provides the areas of contact and estimates the direction of contact with respect to the surface of the touch sensor.

PyTouch provides the contact area estimation capability for many input formats such as raw sensor input, video and static images. Additionally, it provides centroid information and the estimated size of the semi-major and semi-minor axes of the fitted ellipse.

Fig. 6: Examples of contact area estimations using 4 objects: glass ball, pen, cable and fork.
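A minimal sketch of ellipse-based contact area estimation in this spirit is shown below using OpenCV; the differencing and thresholding steps are assumptions for illustration and not PyTouch's exact implementation.

import cv2
import numpy as np

def estimate_contact(frame: np.ndarray, reference: np.ndarray, threshold: int = 20):
    # Difference the current frame against a no-touch reference and keep the largest blob.
    diff = cv2.cvtColor(cv2.absdiff(frame, reference), cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    blob = max(contours, key=cv2.contourArea)
    if len(blob) < 5:  # cv2.fitEllipse needs at least 5 points
        return None
    # Fit an ellipse to obtain the centroid and the semi-major/semi-minor axes.
    (cx, cy), (d1, d2), angle = cv2.fitEllipse(blob)
    return {"centroid": (cx, cy),
            "semi_major": max(d1, d2) / 2.0,
            "semi_minor": min(d1, d2) / 2.0,
            "angle_deg": angle}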
C. Library Experimentation

There are a number of parameters available in the physical hardware design of sensors used in the field of tactile touch sensing, and due to the cumbersome nature of hardware development, progress and iteration are often slow and costly. Often, it is valuable to experiment with physical parameters prior to building a full prototype, to reduce the number of iteration cycles. PyTouch proposes a module which allows for the experimentation of these physical parameters against previous benchmarks obtained by the joint and single-sensor models. We show in Fig. 7 that for the DIGIT [5] sensor, which relies on RGB lighting, a physical hardware change to monochromatic light results in lower performance compared to chromatic light.

Fig. 7: DIGIT [5] sensor loss in performance when comparing monochromatic light to RGB lighting.
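One simple way to emulate such a hardware change in software (here, monochromatic versus RGB illumination) is to collapse the training frames to grayscale before training, as sketched below; this is an assumed preprocessing shortcut for illustration rather than the PyTouch sensor-experiments module itself.

from torchvision import transforms

# Baseline pipeline: keep the RGB frames as captured by the sensor.
rgb_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

# "Monochromatic" variant: drop the color information while keeping 3 channels,
# so the same touch-detection architecture can be trained on both variants and
# their validation accuracies compared (cf. Fig. 7).
mono_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
])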

D. Slip Detection

Slip detection is crucial for many robotics tasks, e.g. in-hand manipulation and grasping. In PyTouch, we incorporate a slip detection module as a pre-trained model. We also provide APIs for training customized models. The module design and implementation are described as follows.

Formulation. We formulate this task as a video classification task: given an image sequence X = {x1, · · · , xT}, we assign a label y from a discrete set Y. We additionally label the slip starting and ending timestamps ts and te for extracting shorter sub-sequences. Specifically, xt is an RGB image of size 240 × 320 × 3, and the label y is a binary label with 1 representing slip and 0 representing non-slip. We note that both rotational and translational slip are labeled as slip without distinction.

Dataset. We collected data from 40 objects with a large spectrum of surface finishes, roughness, and features. This can be thought of as "texture-ness", from a heavily textured pen holder to an almost textureless plastic bottle. For each object, we collected 10 slip sequences and 10 non-slip sequences. Each sequence consists of 128 frames in total. All the data are collected by using a DIGIT sensor [5] to slide against or touch an object at a 30 Hz frequency.

Training and Architecture. We used ResNet-18 and a ResNet-18-based 3D convolutional neural network as our architectures. For both architectures, we concatenate all the images as input. We use the cross-entropy loss as the objective of the binary classification task. We follow the standard approach of normalizing the input data, downsampling the images to 112 × 112, and using BatchNorm for better performance. We use the Adam optimizer with learning rate 0.001. Since this is video classification, we compare the results in two data split settings. The first setting is "split by sequence", where we use 20% of the data from each object as test sequences. The second split is "split by objects", where we withhold 20% of objects as test data. We note that the latter is much harder than the former due to the need for generalization.
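A compact PyTorch sketch of the frame-concatenation variant described above is given below; the clip length and stem modification are assumptions for illustration, and the released PyTouch slip model may differ.

import torch
import torch.nn as nn
from torchvision import models

T = 12  # frames per clip (the shorter sub-sequence setting)

# ResNet-18 whose stem accepts T concatenated RGB frames (3*T channels),
# with a binary head: 0 = non-slip, 1 = slip.
model = models.resnet18(weights=None)
model.conv1 = nn.Conv2d(3 * T, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

clips = torch.randn(4, 3 * T, 112, 112)    # batch of normalized 112 x 112 clips
labels = torch.randint(0, 2, (4,))         # slip / non-slip labels
optimizer.zero_grad()
loss = criterion(model(clips), labels)
loss.backward()
optimizer.step()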
Results and Ablative Study. In Table II, we show the classification accuracy with all 128 frames or 12 frames as inputs. We notice that the original ResNet-18 [17] outperforms the 3D conv net. We attribute this to the high variance of slip velocity in the dataset, which confuses the 3D convolutional kernel. We also find that the accuracy on "split by objects" is lower than on "split by sequences", which confirms our claim about the requirement for better generalizability in the former split. In the same table, we also extract a shorter sequence with 12 frames (approximately 400 milliseconds), as in [18], from the starting timestamp ts to ts+12. We find that the performance drops slightly, to around 90%.

             Split by Sequences [%]        Split by Objects [%]
             len = 12      len = 128       len = 12      len = 128
ResNet-18    90.6          97.9            89.2          96.3
3D Conv      N/A           91.7            N/A           81.2

TABLE II: Slip detection accuracy averaged over all trials with different architectures, number of frames and dataset split methods.

V. CONCLUSION

As the field of touch sensing evolves, a need arises for advanced touch processing functionalities capable of efficiently extracting information from raw sensor measurements. Prior work provided some of these functionalities, but typically ad-hoc – for single sensors, single applications, and not necessarily with reusable code (e.g., open-source). To better support and grow the tactile sensing community, we believe that it is necessary to create a software library which brings together advancements in the touch processing field through a common interface. To address this need, we introduce PyTouch, a unified library for touch processing. Our goal with this library is to enable academics, researchers and experimenters to rapidly scale, modify and deploy new machine learning models for touch sensing. In this paper, we detailed the design choices and the corresponding software architecture of this library. PyTouch is designed with open-source at the core, enabling tactile processing "as-a-service" through distributed pre-trained models, and a modular and expandable library architecture. We demonstrate several of the features of PyTouch on touch detection, slip and object pose estimation experiments using 3 different tactile sensors, and we show that a unified tactile touch library provides benefit to rapid experimentation in robotic tactile processing. We open-source PyTouch at https://github.com/facebookresearch/pytouch, and aim to continue the evolution of this library to further the field of touch processing.

ACKNOWLEDGMENT

We thank Omry Yadan for insightful discussions and assistance in integrating Hydra [19], Frederik Ebert, Stephen Tian and Akhil Padmanabha for providing the OmniTact dataset, and Joe Spisak for helping with community building and project vision.
REFERENCES

[1] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "Pytorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, 2019, pp. 8026–8037.
[2] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint arXiv:1408.5093, 2014.
[3] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.
[4] W. Yuan, S. Dong, and E. H. Adelson, "Gelsight: High-resolution robot tactile sensors for estimating geometry and force," Sensors, 2017.
[5] M. Lambeta, P.-W. Chou, S. Tian, B. Yang, B. Maloon, V. R. Most, D. Stroud, R. Santos, A. Byagowi, G. Kammerer, D. Jayaraman, and R. Calandra, "DIGIT: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation," IEEE Robotics and Automation Letters (RA-L), vol. 5, no. 3, pp. 3838–3845, 2020.
[6] A. Padmanabha, F. Ebert, S. Tian, R. Calandra, C. Finn, and S. Levine, "Omnitact: A multi-directional high-resolution touch sensor," IEEE International Conference on Robotics and Automation (ICRA), 2020.
[7] A. C. Abad and A. Ranasinghe, "Visuotactile sensors with emphasis on gelsight sensor: A review," IEEE Sensors Journal, vol. 20, no. 14, pp. 7628–7638, 2020.
[8] S. Youssefi, S. Denei, F. Mastrogiovanni, and G. Cannata, "Skinware 2.0: a real-time middleware for robot skin," SoftwareX, vol. 3, pp. 6–12, 2015.
[9] ——, "A real-time data acquisition and processing framework for large-scale robot skin," Robotics and Autonomous Systems, vol. 68, pp. 86–103, 2015.
[10] H. Culbertson, J. J. Lopez Delgado, and K. J. Kuchenbecker, "The Penn haptic texture toolkit for modeling, rendering, and evaluating haptic virtual textures," 2014.
[11] Y. Gao, L. A. Hendricks, K. J. Kuchenbecker, and T. Darrell, "Deep learning for tactile understanding from visual and haptic data," in IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 536–543.
[12] A. Burka, S. Hu, S. Helgeson, S. Krishnan, Y. Gao, L. A. Hendricks, T. Darrell, and K. J. Kuchenbecker, "Proton: A visuo-haptic data acquisition system for robotic learning of surface properties," in IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), 2016, pp. 58–65.
[13] A. Burka, S. Hu, S. Krishnan, K. J. Kuchenbecker, L. A. Hendricks, Y. Gao, and T. Darrell, "Toward a large-scale visuo-haptic dataset for robotic learning," in Proceedings of the Workshop on the Future of Datasets in Vision at Computer Vision and Pattern Recognition (CVPR), 2015.
[14] B. Belousov, A. Sadybakasov, B. Wibranek, F. Veiga, O. Tessmann, and J. Peters, "Building a library of tactile skills based on fingervision," 2019.
[15] R. Calandra, A. Owens, D. Jayaraman, W. Yuan, J. Lin, J. Malik, E. H. Adelson, and S. Levine, "More than a feeling: Learning to grasp and regrasp using vision and touch," IEEE Robotics and Automation Letters (RA-L), vol. 3, no. 4, pp. 3300–3307, 2018.
[16] A. Garcia-Garcia, B. S. Zapata-Impata, S. Orts-Escolano, P. Gil, and J. Garcia-Rodriguez, "Tactilegcn: A graph convolutional network for predicting grasp stability with tactile sensors," in International Joint Conference on Neural Networks (IJCNN), 2019, pp. 1–8.
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[18] J. Li, S. Dong, and E. Adelson, "Slip detection with combined tactile and visual information," in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 7772–7777.
[19] O. Yadan, "Hydra - a framework for elegantly configuring complex applications," Github, 2019. [Online]. Available: https://github.com/facebookresearch/hydra