PYTOUCH: A MACHINE LEARNING LIBRARY FOR TOUCH PROCESSING
Mike Lambeta¹, Huazhe Xu², Jingwei Xu³, Po-Wei Chou¹, Shaoxiong Wang⁴, Trevor Darrell², and Roberto Calandra¹

¹ Facebook AI Research, Menlo Park, USA; ² University of California, Berkeley, USA; ³ Shanghai Jiao Tong University, China; ⁴ Massachusetts Institute of Technology, USA

arXiv:2105.12791v1 [cs.RO] 26 May 2021

Abstract— With the increased availability of rich tactile sensors, there is an equally proportional need for open-source and integrated software capable of efficiently and effectively processing raw touch measurements into high-level signals that can be used for control and decision-making. In this paper, we present PyTouch – the first machine learning library dedicated to the processing of touch sensing signals. PyTouch is designed to be modular and easy to use, and provides state-of-the-art touch processing capabilities as a service, with the goal of unifying the tactile sensing community by providing a library of scalable, proven, and performance-validated modules upon which applications and research can be built. We evaluate PyTouch on real-world data from several tactile sensors on touch processing tasks such as touch detection, slip, and object pose estimation. PyTouch is open-sourced at https://github.com/facebookresearch/pytouch.

I. INTRODUCTION

A fundamental challenge in robotics is to process raw sensor measurements into high-level features that can be used for control and decision-making. For the sense of vision, the field of computer vision has been dedicated to providing algorithmic and programmatic methods for understanding images and videos. In this field, open-source libraries such as PyTorch [1], CAFFE [2], and OpenCV [3] have accelerated research and collectively brought together commonly used techniques in their respective domains by providing unified interfaces, algorithms, and platforms. PyTorch and CAFFE have enabled researchers to develop and scale neural networks by reducing the amount of groundwork required to implement algorithms such as backpropagation or convolution. Furthermore, OpenCV provides a collection of commonly used, tested, and optimized functions. These frameworks, libraries, and platforms share an underlying goal: enabling further developments in their respective fields by abstracting lower-level processes into building blocks for further applications.

With the increased availability of rich tactile sensors [4], [5], [6], [7], the sense of touch is becoming a new and important sensor modality in robotics and machine learning. Sensing the world through touch opens exciting new challenges and opportunities to measure, understand, and interact with the world around us. However, the availability of ready-to-use touch processing software is extremely limited – this results in a high barrier to entry for new practitioners who want to make use of tactile sensors and are forced to implement their own touch processing routines. We believe that, similarly to computer vision, the availability of open-source and maintained software libraries for processing touch readings would lower the barrier to entry for tactile-based tasks, experimentation, and research in the touch sensing domain.

To address this challenge we introduce PyTouch – the first open-source library for touch processing that enables the machine learning and robotics communities to process raw touch data from tactile sensors through abstractions which focus on the experiment instead of the low-level details of elementary concepts. PyTouch is designed to be a high-performance, modular, and easy-to-use library that provides touch processing functionalities "as a service" through a collection of pre-trained models capable of kick-starting initial experiments, while allowing end applications that use new sensors, or variations of existing ones, to apply transfer learning to leverage the performance of the library. The software library modularizes a set of commonly used tactile-processing functions valuable for various downstream tasks, such as tactile manipulation, slip detection, and object recognition based on touch. Furthermore, the library aims to standardize the way touch-based experiments are designed, reducing the amount of individual software developed for one-off experiments by using the PyTouch library as a foundation which can be expanded upon for future experimental and research applications.

While tools such as PyTorch and CAFFE currently exist and can be applied to touch processing, precursor development is required to support the algorithms needed for experimental and research work; PyTouch provides an entry point built on such tools and specifically catered to the goals of touch processing. PyTouch was designed to support both entry-level users taking their first steps in tactile sensing and power users well versed in this domain. PyTouch aims to generalize to diverse tactile input devices that might differ from each other in many design choices. This effort joins the recent efforts to standardize robotics research for better benchmarks and more reproducible results. Finally, hand in hand with the framework, we release a set of pre-trained models which are used by PyTouch in the background for tactile-based tasks.

In this paper, we describe the architectural choices of the library and demonstrate some of its capabilities and benefits through several experiments. Notably, for the first time we evaluate the performance of a machine learning model trained across different models of vision-based tactile sensors, and show that this improves performance compared to models trained on a single tactile sensor, using PyTouch itself to rapidly prototype these experiments.
The future-facing goal of PyTouch is to create an extendable library for touch processing, analogous to what PyTorch and OpenCV are for computer vision, in a way which allows reliable and rapid prototyping of experiments and applications. Additionally, through the use of PyTouch as a set of touch processing building blocks and released pre-trained models, we lower the barrier to entry into touch processing while allowing researchers and experimenters to focus on the end goal of their application through proven algorithms and models supplied by the expansion of this library. We believe that this would beneficially impact the robotics and machine learning communities by enabling new capabilities and accelerating research.

II. RELATED WORK

Developed fields in computer science build applications and research upon frameworks. Many frameworks exist today which bring together common paradigms used in their respective domains while providing benefit to the community. One example of such a library is OpenCV [3] for the computer vision community, which is a well-known and probably the most widely used data processing library supporting computer vision topics.

Frameworks applicable to the domain of tactile touch sensing are few and far between, or not available through open-source channels. Observing public repositories and open-source networks, we see very little open-source work pertaining to building a tactile processing framework. Although there is work in isolated instances which provides insight into specific problems, such as interfacing with robotic skin and providing interfaces for applications [8], [9], there are no works which bring commonly used paradigms within the field of tactile touch sensing together or provide general purpose tools. Previous attempts include studying friction models [10], physical property understanding [11], [12], [13], in-hand manipulation, and determining physical manipulation characteristics and grasp success. [14] demonstrates individual capabilities related to tactile touch which allow for robotic learning in applying given forces to an object, determining density and texture through stirring actions, and various types of in-hand or arm rotation of objects. Other work such as [15] learns robotic grasp success through an end-to-end action-conditional model based on raw tactile sensor input data. [16] introduces TactileGCN, which estimates grasp stability through a graph convolutional network. PyTouch aims at being a general purpose and open-source solution that combines useful computational tools and capabilities into a single framework, and thereby proposes an agnostic library for tactile sensors regardless of their hardware or sensing modalities.

Another contribution of PyTouch is to push towards hardware standardization by providing flexible interfaces. Observing the tactile touch community, we saw common trends amongst the hardware used for acquiring touch input: a barrier to entry due to cost, sparseness of available hardware arising from single one-off builds, and a lack of reproducibility due to semi-closed or closed-source designs. The DIGIT sensor [5] aimed to resolve these problems by providing a low-cost, scalable, and easily manufacturable tactile touch sensor. Alongside this, we see a pattern of individualized pieces of work in the tactile sensing field but a lack of unification that brings this work together into a library that future work can build upon. PyTouch is a step towards unifying the tactile processing field to create a proven, tested, and modular open-source software library available to all researchers and practitioners.

III. A LIBRARY FOR TOUCH PROCESSING

We now describe the desired specifications considered during the design of our touch processing framework, and the corresponding design choices adopted. In addition, we describe the software architecture of PyTouch and several of its key features.

A. Desired Specifications

PyTouch aims to provide the same function in the domain of tactile touch processing through the introduction of an open-source library in which proven, tested, and performance-validated modules allow for scaling touch-based applications, research, and experimentation. This work aims to set forth a need for the unification of tactile-processing software which not only strengthens the field but also decreases the barrier to entry in designing applications and research which utilize tactile touch. As such, the main goals of PyTouch are to provide a library that is powered by pre-trained models through the "as a service" paradigm, enables transfer learning through PyTouch for extending models to new hardware sensors, and deploys joint models which work across a variety of vision-based tactile sensors.

By centralizing touch processing tasks into a software library, and through the models provided through PyTouch, we provide a path forward for sensor generalization. This is important due to the variance in produced hardware, image transfer layers, and mechanical construction of lab-built sensors.

PyTouch is designed with both first-time and power users of the tactile touch community in mind. High-level abstractions of common goals are provided such as, but not limited to: "am I touching an object", "is the object slipping", or "what is the touch contact area of the object". Diving deeper, the library can be used to perform analysis techniques such as augmenting the input sensor data, modifying the models used for inference, providing motion and optical flow outputs, or transfer learning techniques for extending the library to non-vision-based sensors.

B. PyTouch Architecture
Pre-trained Models. PyTouch is built upon a library of pre-trained models which provide real-time touch processing functionalities. The pre-trained models are downloaded after initializing the library with the desired sensor; an example follows in Lst. 1. Currently, the library supports the DIGIT [5], OmniTact [6] and GelSight [4] sensors, but the library can be used to train models using data from other vision or non-vision based tactile sensors. At the current state, PyTouch provides the following functions: contact classification, slip detection, contact area estimation, and interfaces for training and transfer learning. By developing this library with modularity and scalability in mind, we plan to extend features and provide new models "as a service" to the community, while supporting new additions from the community itself. Additionally, deploying these models "as a service" aids in the experimentation of different approaches to the same task by swapping one pre-trained model for another without changing high-level software. This allows for performance benchmarking of real-world experiments for the purposes of creating a tactile task baseline.

Fig. 1: PyTouch high-level architecture, where tactile touch processing is delivered to the end application "as a service" through released pre-trained models.

Listing 1: Instantiating PyTouch with the DIGIT sensor to detect touch and slip. PyTouch reduces the amount of code required to do advanced tasks such as these.

    import pytouch as pt

    digit_pt = pt.init(
        pt.sensors.DigitSensor,
        tasks={pt.tasks.TouchDetect, pt.tasks.SlipDetect},
    )

    digit = ...  # Connect and initialize DIGIT sensor

    touch, certainty = digit_pt.is_touched(digit.get_frame())
    slipping, _ = digit_pt.is_slipping(digit.get_frames())
    print(f"Touching object: {touch}, with certainty: {certainty}; "
          f"Object Slipping: {slipping}")

Sensor Abstraction. PyTouch can be used in a number of ways, such as with pre-captured video streams, static images, or live video captured from an input device. The library is instantiated with a known hardware device which, in the background, provides inference and functionality based on the hardware-specific pre-trained models. The set of supported devices is extendable through the library as well, and devices may be initialized by supplying a custom configuration file with models respective to that device. PyTouch provides the training routines for each library capability. Once trained, the weights can then be used through PyTouch for different hardware input devices without changing the application software.
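Fig. 1's notion of a "packaged model" (weights, hyper-parameters, and data transformation bundled together) can be illustrated with the following sketch; the checkpoint file name and key layout here are assumptions for illustration, not the released PyTouch package format.

    # Illustrative sketch only: the checkpoint keys and file name are assumed,
    # not the actual PyTouch model-package format.
    import torch
    import torch.nn as nn
    from torchvision import models

    checkpoint = torch.load("touch_detect_digit.pt", map_location="cpu")  # hypothetical package

    model = models.resnet18()
    model.fc = nn.Linear(model.fc.in_features, 2)        # binary touch / no-touch head
    model.load_state_dict(checkpoint["weights"])          # packaged weights
    model.eval()

    hyper_params = checkpoint.get("hyper_parameters", {})  # e.g. input size, normalization
    # The packaged data transformation would be rebuilt from hyper_params before inference.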
Tactile Task Abstractions. PyTouch provides implementations of commonly used features in the tactile touch domain. The aim of the library is to reduce friction when experimenting with hardware implementations and robotic workspaces utilizing touch. Two interfaces are provided for initializing the library for use in applications. The first interface, Lst. 1, abstracts all internal modules into a single module, while the second interface allows individual module-level access to each task for power users.

With each interface, an input device is registered which loads the appropriate pre-trained models. These pre-trained models are downloaded in the background during initialization, or when the appropriate class is instantiated. The registered input device provides defaults as to which pre-trained models to use, and defines the internal data transformation to the models. Additionally, PyTouch supports extending the input devices for custom hardware by adding an additional input device module, or by specifying a custom device module which allows the user to provide custom pre-trained models.

PyTouch is built upon a proven and widely used machine learning framework, PyTorch [1].

Model and Network Agnostic. One of the crucial benefits of PyTouch is the abstraction of high-level function calls, which allows end-users to integrate their own models specific to experiments or sensors, or to use our pre-trained models and perform fine-tuning. PyTouch introduces standardization of learning-based methods through an application programming interface in which models can be swapped as needed, or changed through initialization parameters. PyTouch abstracts the low-level details in order to allow new users to experiment with models served by the PyTouch team or by the community. This ultimately allows PyTouch to be agnostic to the model used, supporting conventional deep CNN models as well as non-conventional learning-based methods. Furthermore, this abstraction enables users new to touch processing to begin experimentation without having knowledge of model, network, and parameter design choices, and rather to focus on the experimentation itself.
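For power users, the second interface described above exposes each task as its own module. The snippet below is a hypothetical sketch that only reuses the sensor and task names from Lst. 1; the actual constructor arguments and call signatures should be taken from the repository.

    import pytouch as pt

    # Hypothetical module-level access; signatures are assumptions,
    # not the verified PyTouch API.
    touch_detect = pt.tasks.TouchDetect(pt.sensors.DigitSensor)
    slip_detect = pt.tasks.SlipDetect(pt.sensors.DigitSensor)

    frame = ...    # a single frame captured from the sensor
    frames = ...   # a short sequence of frames

    is_touching, certainty = touch_detect(frame)
    is_slipping, _ = slip_detect(frames)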
Sensor                           DIGIT [5]     OmniTact [6]   GelSight [4]
Single models (no reference)     95.5 ± 1.2    98.4 ± 1.1     93.7 ± 1.2
Single models (with reference)   95.8 ± 1.3    98.5 ± 1.1     N/A
Joint model (no reference)       96.1 ± 1.0    99.1 ± 0.4     98.3 ± 0.6
Joint model (with reference)     96.2 ± 1.1    99.5 ± 0.3     N/A

TABLE I: Classification accuracy [%] of touch detection (mean and std) using cross-validation (k = 5). The joint models are trained with data from all three sensors: DIGIT, OmniTact and GelSight.

Fig. 2: Examples of data used to train our touch prediction models. The dataset includes data across several DIGIT, OmniTact, and GelSight sensors, showing different lighting conditions and objects of various spatial resolutions.

Fig. 3: Cross-validation (k = 5) accuracy with varying train dataset size for single and joint models. With the same amount of data, training a joint model using data across multiple sensors (i.e., DIGITs, OmniTact and GelSight) results in better model performance compared to training from a single sensor.

IV. EXPERIMENTAL RESULTS

We now provide several examples which outline the advantages of using the PyTouch library: predominantly, the flexibility of the library to work with different vision-based tactile sensors such as DIGIT [5], OmniTact [6] and GelSight [4], and secondly, the inherent design of the library to enable capabilities through the "as a service" paradigm. PyTouch releases pre-trained models which are tested and validated, and supports additions via the community to extend library capabilities.

A. Touch Detection

A first functionality that PyTouch provides is touch detection. This is a basic functionality which is often used as a subroutine in more complicated manipulation sequences. However, training high-accuracy touch detection models can be difficult and requires extensive data collection. Here we show that using one DIGIT sensor, we can determine if an object is being touched across multiple variations of that same sensor. The robustness of the library depends on the generalization of the model: due to manufacturing variances in the sensors and image transfer layers, no two sensors are exactly the same.
Formulation. Touch detection is formulated as an image classification task where, given an input image X, we assign a label y which is 0 for data without touch and 1 for data with registered touch.

Datasets. The dataset for DIGITs was collected by touching random household and office objects of various spatial resolutions. This dataset was split by unique device serial numbers, with 2,278 samples overall from 3 different sensors. The OmniTact dataset was provided by the authors of [6] and consisted of 1,632 random presses of a single object. The GelSight samples were extracted from the dataset provided in [15], which consisted of before and after grasp images from two GelSight sensors; we used only a subset of the total dataset, for a total of approximately 1,000 images. The input images collected for DIGIT have size 240 × 320 × 3, OmniTact images are 480 × 640 × 3, and GelSight images are 1280 × 960 × 3.

Training and Architecture. The touch detection capability is based on a pre-trained ResNet-18 model. Two modalities are used for training: in the first, images are provided to the model without a reference, and in the second, the image is concatenated with a reference no-touch image. The reference, unique to each sensor, is a frame where the sensor is not touching any object. The input data is normalized and down-sampled to 64 × 64. The data was split to ensure an equal distribution of device serial numbers. For training with a reference no-touch image, a unique reference image was provided to the model and concatenated to each input image. Finally, a joint dataset was created to include all devices in order to show model robustness and generalization.
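As an illustrative sketch of this setup (not the released PyTouch training code), the reference-conditioned variant can be obtained by widening the first convolution of a torchvision ResNet-18 to accept the channel-wise concatenation of the image and its no-touch reference; the normalization statistics below are assumptions.

    import torch
    import torch.nn as nn
    from torchvision import models, transforms

    class TouchDetectNet(nn.Module):
        def __init__(self, use_reference: bool = False):
            super().__init__()
            self.use_reference = use_reference
            self.backbone = models.resnet18(pretrained=True)
            if use_reference:
                # 6 input channels: RGB image concatenated with the RGB no-touch reference.
                self.backbone.conv1 = nn.Conv2d(6, 64, kernel_size=7,
                                                stride=2, padding=3, bias=False)
            # Binary head: touch / no-touch.
            self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 2)

        def forward(self, image, reference=None):
            if self.use_reference:
                image = torch.cat([image, reference], dim=1)
            return self.backbone(image)

    # Normalization and down-sampling to 64 x 64, as described above
    # (the mean/std values are assumed, not taken from the paper).
    preprocess = transforms.Compose([
        transforms.Resize((64, 64)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    ])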
Results and Ablative Study. A model specific to each sensor, and a joint model comprising all sensor data across each unique sensor, were trained. We show in Table I that the joint model performs better across a 5-fold cross-validation test. The model trained for the OmniTact sensor shows higher performance than the DIGIT model; however, observing the data in the OmniTact dataset (sample images shown in Fig. 2), we see that the breadth of sample uniqueness is minimal. We also show, in Fig. 3, the effect of constraining the train dataset size from 10% to 100% of the total: a joint model which uses multiple sensors results in better performance while using a similar amount of training data. To our knowledge, this is the first study on multiple sensor manufacturers and sensor variants to show the benefit of a touch processing library.

Fig. 4: Cross-validation (k = 5) accuracy with varying train dataset size when training a model from scratch or starting from the pre-trained joint model. The validation is performed on a dataset of size 482 from a DIGIT sensor. Using the joint model to fine-tune against a new dataset results in better model performance at lower dataset sizes compared to training the model from scratch for the same wall-clock time of 30 s. Thus PyTouch can provide benefit to end applications when using the "as a service" pre-trained models.

Furthermore, we show a direct benefit of using PyTouch with regards to fine-tuning the joint model on a smaller dataset acquired from a new sensor. A large dataset was collected from a new sensor and then constrained from 50% to 10% of its initial size. The results on the touch detection task show that using our pre-trained model and then fine-tuning can both increase the final performance and reduce the training wall-clock time, compared to a model trained exclusively on the new data. We ablate this experiment over different dataset sizes to show that the advantages are more pronounced the smaller the dataset used for fine-tuning is. In Fig. 4 we show the results of the fine-tuned model compared to the model trained from the dataset alone for the same wall-clock time of 30 s. Furthermore, we show in Fig. 5 that introducing new sensors using PyTouch as a base results in better performance with fewer samples, through transfer learning from the released pre-trained models PyTouch provides.

The PyTouch library is designed with the goal of supporting real-time experimentation: a forward pass of the touch detection model on an input frame takes on average 5.89 ± 0.46 ms. Tactile input sensors most commonly capture at rates of 24 to 60 frames per second, which places each frame between 17 and 41 ms. This enables the use of PyTouch in real-time applications requiring frame rates of up to 140 fps.

B. Contact Area Estimation

When grasping an object with a single sensor or multiple sensors, it is useful to know the surface contact area of the object against the sensor. We show an example of this in Fig. 6, where a number of object contact areas are estimated with single and multiple point features using a DIGIT [5] tactile sensor. The contact area estimate is not referenced to a ground truth, but rather provides the areas of contact and estimates the direction of contact with respect to the surface of the touch sensor.

PyTouch provides the contact area estimation capability for many input formats such as raw sensor input, video, and static images. Additionally, it provides centroid information and the estimated sizes of the semi-major and semi-minor axes of the fitted ellipse.
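One way such an ellipse-based estimate can be computed from a frame and a no-touch reference is sketched below with OpenCV; this illustrates the idea and is not necessarily the implementation used inside PyTouch (the threshold value is an assumption).

    import cv2
    import numpy as np

    def estimate_contact(frame_bgr, reference_bgr, diff_threshold=25):
        # Change with respect to the no-touch reference frame.
        diff = cv2.absdiff(frame_bgr, reference_bgr)
        gray = cv2.GaussianBlur(cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY), (5, 5), 0)
        _, mask = cv2.threshold(gray, diff_threshold, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None                     # no contact detected
        largest = max(contours, key=cv2.contourArea)
        if len(largest) < 5:
            return None                     # fitEllipse needs at least 5 points
        (cx, cy), (major, minor), angle = cv2.fitEllipse(largest)
        return {
            "centroid": (cx, cy),
            "semi_major": major / 2.0,
            "semi_minor": minor / 2.0,
            "angle_deg": angle,
            "area_px": np.pi * (major / 2.0) * (minor / 2.0),
        }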
C. Library Experimentation

There are a number of parameters available in the physical design of tactile touch sensors, and due to the cumbersome nature of hardware development, progress and iteration are often slow and costly. It is often valuable to experiment with physical design parameters prior to building a full prototype in order to reduce the number of iteration cycles. PyTouch proposes a module which allows the experimentation of these physical parameters against previous benchmarks obtained by the joint and single-sensor models. We show in Fig. 7 that, for the DIGIT [5] sensor which relies on RGB lighting, a physical hardware change to monochromatic light results in lower performance compared to chromatic light.

D. Slip Detection

Slip detection is crucial for many robotics tasks, e.g. in-hand manipulation and grasping. In PyTouch, we incorporate a slip detection module as a pre-trained model. We also provide APIs for training customized models. The module design and implementation are described as follows.

Formulation. We formulate this task as a video classification task: given an image sequence X = {x_1, ..., x_T}, we assign a label y from a discrete set Y. We additionally label the slip starting and ending timestamps t_s and t_e for extracting shorter sub-sequences. Specifically, x_t is an RGB image with size 240 × 320 × 3, and the label y is binary, with 1 representing slip and 0 representing non-slip. We note that both rotational and translational slip are labeled as slip without distinction.

Dataset. We collected 40 objects with a large spectrum of surface finishes, roughness, and features. This can be thought of as "texture-ness", ranging from a heavily textured pen holder to an almost textureless plastic bottle. For each object, we collected 10 slip sequences and 10 non-slip sequences. Each sequence consists of 128 frames in total. All the data are collected by using a DIGIT sensor to slide on or touch an object at 30 Hz [5].

Training and Architecture. We used ResNet-18 and a ResNet-18-based 3D convolutional neural network as our architectures. For both architectures, we concatenate all the images as input. We use cross-entropy loss as the objective of the binary classification task. We follow the standard approach of normalizing the input data, downsample the images to 112 × 112, and use BatchNorm for better performance. We use the Adam optimizer with learning rate 0.001. Since this is video classification, we compare the results in two data split settings. The first setting is "split by sequence", where we use 20% of the data from each object as test sequences. The second split is "split by objects", where we withhold 20% of the objects as test data. We note that the latter is much harder than the former due to the need for generalization.
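A minimal sketch of the 2D variant follows, under the assumption that "concatenate all the images" refers to stacking the T frames along the channel dimension before the ResNet-18 (the ResNet-18-based 3D convolutional variant is not shown); the loss and optimizer settings are those listed above.

    import torch
    import torch.nn as nn
    from torchvision import models

    class SlipDetectNet(nn.Module):
        def __init__(self, num_frames: int = 12):
            super().__init__()
            self.backbone = models.resnet18(pretrained=True)
            # Accept T RGB frames stacked along the channel dimension (3*T channels).
            self.backbone.conv1 = nn.Conv2d(3 * num_frames, 64, kernel_size=7,
                                            stride=2, padding=3, bias=False)
            self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 2)  # slip / non-slip

        def forward(self, frames):
            # frames: (batch, T, 3, 112, 112) -> (batch, 3*T, 112, 112)
            b, t, c, h, w = frames.shape
            return self.backbone(frames.reshape(b, t * c, h, w))

    model = SlipDetectNet(num_frames=12)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)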
Fig. 5: Comparison of a model trained just from the dataset collected from the GelSight [4] sensor ("Train") against models trained from the pre-trained joint model by fine-tuning the last 3 layers ("Fine Tune Unfrozen") or just the last layer ("Fine Tune Frozen"), for dataset sizes of 67, 135, 203, 271 and 339 samples. The results show that using the pre-trained model and fine-tuning the last 3 layers significantly improves the performance of the models. This suggests that the pre-trained model learned useful features that can generalize to different sensors.

Fig. 6: Examples of contact area estimations using 4 objects: glass ball, pen, cable and fork.

Fig. 7: DIGIT [5] sensor loss in performance when comparing monochromatic light to RGB lighting.

Results and Ablative Study. In Table II, we show the classification accuracy with all 128 frames or 12 frames as input. We notice that the original ResNet-18 [17] outperforms the 3D conv net. We attribute this to the high variance of slip velocity in the dataset, which confuses the 3D convolutional kernels. We also find that the accuracy on "split by object" is lower than on "split by sequence", which confirms our claim about the requirement for better generalizability in the former split. In the same table, we also extract a shorter sequence of 12 frames (approximately 400 milliseconds), as in [18], from the starting timestamp t_s to t_s+12. We find the performance drops slightly, to around 90%.

             Split by Sequences [%]      Split by Objects [%]
             len = 12     len = 128      len = 12     len = 128
ResNet-18    90.6         97.9           89.2         96.3
3D Conv      N/A          91.7           N/A          81.2

TABLE II: Slip detection accuracy averaged over all trials with different architectures, number of frames, and dataset split methods.
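The two fine-tuning regimes compared in Fig. 5 can be sketched as follows, with a torchvision ResNet-18 standing in for the joint pre-trained backbone; which modules count as "the last 3 layers" is an assumption of this sketch rather than the exact recipe used.

    import torch.nn as nn
    from torchvision import models

    def build_finetune_model(freeze_all_but_last: bool = True):
        model = models.resnet18(pretrained=True)   # stand-in for the pre-trained joint model
        model.fc = nn.Linear(model.fc.in_features, 2)
        if freeze_all_but_last:
            # "Fine Tune Frozen": only the final classification layer is updated.
            trainable_prefixes = ("fc",)
        else:
            # "Fine Tune Unfrozen": keep roughly the last 3 layers trainable (assumed split).
            trainable_prefixes = ("layer3", "layer4", "fc")
        for name, param in model.named_parameters():
            param.requires_grad = name.startswith(trainable_prefixes)
        return model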
V. CONCLUSION

As the field of touch sensing evolves, a need arises for advanced touch processing functionalities capable of efficiently extracting information from raw sensor measurements. Prior work provided some of these functionalities, but typically in an ad-hoc manner – for single sensors, for single applications, and not necessarily with reusable code (e.g., open-source). To better support and grow the tactile sensing community, we believe it is necessary to create a software library which brings together advancements in the touch processing field through a common interface. To address this need, we introduce PyTouch, a unified library for touch processing. Our goal with this library is to enable academics, researchers, and experimenters to rapidly scale, modify, and deploy new machine learning models for touch sensing. In this paper, we detailed the design choices and the corresponding software architecture of this library. PyTouch is designed with open source at its core, enabling tactile processing "as a service" through distributed pre-trained models and a modular, expandable library architecture. We demonstrate several of the features of PyTouch on touch detection, slip, and object pose estimation experiments using 3 different tactile sensors, and we show that a unified tactile touch library provides benefit to rapid experimentation in robotic tactile processing. We open-source PyTouch at https://github.com/facebookresearch/pytouch, and aim to continue the evolution of this library to further the field of touch processing.

ACKNOWLEDGMENT

We thank Omry Yadan for insightful discussions and assistance in integrating Hydra [19], Frederik Ebert, Stephen Tian and Akhil Padmanabha for providing the OmniTact dataset, and Joe Spisak for helping with community building and project vision.
REFERENCES

[1] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, 2019, pp. 8026–8037.
[2] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint arXiv:1408.5093, 2014.
[3] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.
[4] W. Yuan, S. Dong, and E. H. Adelson, "GelSight: High-resolution robot tactile sensors for estimating geometry and force," Sensors, 2017.
[5] M. Lambeta, P.-W. Chou, S. Tian, B. Yang, B. Maloon, V. R. Most, D. Stroud, R. Santos, A. Byagowi, G. Kammerer, D. Jayaraman, and R. Calandra, "DIGIT: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation," IEEE Robotics and Automation Letters (RA-L), vol. 5, no. 3, pp. 3838–3845, 2020.
[6] A. Padmanabha, F. Ebert, S. Tian, R. Calandra, C. Finn, and S. Levine, "OmniTact: A multi-directional high-resolution touch sensor," IEEE International Conference on Robotics and Automation (ICRA), 2020.
[7] A. C. Abad and A. Ranasinghe, "Visuotactile sensors with emphasis on GelSight sensor: A review," IEEE Sensors Journal, vol. 20, no. 14, pp. 7628–7638, 2020.
[8] S. Youssefi, S. Denei, F. Mastrogiovanni, and G. Cannata, "Skinware 2.0: A real-time middleware for robot skin," SoftwareX, vol. 3, pp. 6–12, 2015.
[9] ——, "A real-time data acquisition and processing framework for large-scale robot skin," Robotics and Autonomous Systems, vol. 68, pp. 86–103, 2015.
[10] H. Culbertson, J. J. Lopez Delgado, and K. J. Kuchenbecker, "The Penn haptic texture toolkit for modeling, rendering, and evaluating haptic virtual textures," 2014.
[11] Y. Gao, L. A. Hendricks, K. J. Kuchenbecker, and T. Darrell, "Deep learning for tactile understanding from visual and haptic data," in IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 536–543.
[12] A. Burka, S. Hu, S. Helgeson, S. Krishnan, Y. Gao, L. A. Hendricks, T. Darrell, and K. J. Kuchenbecker, "Proton: A visuo-haptic data acquisition system for robotic learning of surface properties," in IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), 2016, pp. 58–65.
[13] A. Burka, S. Hu, S. Krishnan, K. J. Kuchenbecker, L. A. Hendricks, Y. Gao, and T. Darrell, "Toward a large-scale visuo-haptic dataset for robotic learning," in Proceedings of the Workshop on the Future of Datasets in Vision at Computer Vision and Pattern Recognition (CVPR), 2015.
[14] B. Belousov, A. Sadybakasov, B. Wibranek, F. Veiga, O. Tessmann, and J. Peters, "Building a library of tactile skills based on FingerVision," 2019.
[15] R. Calandra, A. Owens, D. Jayaraman, W. Yuan, J. Lin, J. Malik, E. H. Adelson, and S. Levine, "More than a feeling: Learning to grasp and regrasp using vision and touch," IEEE Robotics and Automation Letters (RA-L), vol. 3, no. 4, pp. 3300–3307, 2018.
[16] A. Garcia-Garcia, B. S. Zapata-Impata, S. Orts-Escolano, P. Gil, and J. Garcia-Rodriguez, "TactileGCN: A graph convolutional network for predicting grasp stability with tactile sensors," in International Joint Conference on Neural Networks (IJCNN), 2019, pp. 1–8.
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[18] J. Li, S. Dong, and E. Adelson, "Slip detection with combined tactile and visual information," in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 7772–7777.
[19] O. Yadan, "Hydra – a framework for elegantly configuring complex applications," GitHub, 2019. [Online]. Available: https://github.com/facebookresearch/hydra