Augmented and Virtual Reality - From Daily Life to Assisted Robotics Seminar Media Technology - TUM
Augmented and Virtual Reality: From Daily Life to Assisted Robotics
Seminar Media Technology, SS 2021
Chair of Media Technology, Technical University of Munich
Seminar Topics

1. AR/VR-based Human-Robot Interfaces
2. VR and AR Interfaces for Robot Learning from Demonstration
3. Intuitive Teleoperation using Augmented Reality
4. Improving Teleoperated Driving Using an Augmented Reality Representation
5. Deep Learning for 6D Pose Estimation and its Applications in AR
6. Deformable Object Tracking for AR
7. Marker-based Augmented Reality
8. Visual Place Recognition
9. Activity Recognition for Augmented Reality and Virtual Reality
10. Augmented and Virtual Haptics
11. Generation of Realistic Virtual Views
12. Video Coding Optimization of Virtual Reality 360-degree Video
AR/VR-based Human-Robot Interfaces
Supervision: Furkan Kaynar (furkan.kaynar@tum.de)

Augmented, virtual and mixed reality applications attract interest in many fields, including robotics. The design of a human-robot interface is of great importance, as it determines the capacity and limitations of the interaction between the human and the robot, and the type of human input is defined by the user interface itself. Due to their intuitive nature, AR/VR-based interfaces may improve human demonstrations for robotic tasks; the demonstrations can be provided via virtual tools rendered in a real or simulated scene. AR/VR-based interfaces may also improve the human's understanding of the planned robotic tasks. In this topic, we will investigate the methods and applications of AR/VR-based interfaces for robotic applications, including but not limited to semi-autonomous teleoperation and robot programming.

References:
[1] Gadre, Samir Yitzhak, et al. "End-user robot programming using mixed reality." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019.
[2] Rosen, Eric, et al. "Communicating and controlling robot arm motion intent through mixed-reality head-mounted displays." The International Journal of Robotics Research 38.12-13 (2019): 1513-1526.
[3] Walker, Michael E., Hooman Hedayati, and Daniel Szafir. "Robot teleoperation with augmented reality virtual surrogates." 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2019.
[4] Makhataeva, Zhanat, and Huseyin Atakan Varol. "Augmented reality for robotics: a review." Robotics 9.2 (2020): 21.
[5] Kent, David, Carl Saldanha, and Sonia Chernova. "Leveraging depth data in remote robot teleoperation interfaces for general object manipulation." The International Journal of Robotics Research 39.1 (2020): 39-53.
VR and AR Interfaces for Robot Learning from Demonstration
Supervision: Basak Guelecyuez (basak.guelecyuez@tum.de)

In the context of robotics and automation, learning from demonstration (LfD) refers to the framework of endowing robots with autonomy in a variety of tasks by making use of demonstrations provided by a human teacher. A classical approach for interacting with the robot during demonstrations is kinesthetic teaching, where the human teacher physically guides the robot. In this topic we would like to investigate novel approaches with VR and AR interfaces for human-robot interaction in LfD. For example, VR interfaces are used to teleoperate a robot when providing demonstrations, and AR interfaces are further employed to assess the learned task and determine failure cases of the robot autonomy (a minimal LfD sketch follows the references below).

References:
[1] M. Diehl, A. Plopski, H. Kato and K. Ramirez-Amaro, "Augmented Reality interface to verify Robot Learning," 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Naples, Italy, 2020, pp. 378-383.
[2] H. Liu, Y. Zhang, W. Si, X. Xie, Y. Zhu and S. Zhu, "Interactive Robot Knowledge Patching Using Augmented Reality," 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 2018, pp. 1947-1954.
[3] Luebbers, M. B., Brooks, C., Kim, M. J., Szafir, D., and Hayes, B. "Augmented reality interface for constrained learning from demonstration." Proceedings of the 2nd International Workshop on Virtual, Augmented and Mixed Reality for HRI (VAM-HRI), 2019.
[4] D. Bambušek, Z. Materna, M. Kapinus, V. Beran and P. Smrž, "Combining Interactive Spatial Augmented Reality with Head-Mounted Display for End-User Collaborative Robot Programming," 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), New Delhi, India, 2019, pp. 1-8.
[5] T. Zhang et al., "Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation," 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 2018, pp. 5628-5635.
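To make the LfD idea concrete, the following is a toy baseline, not the method of any of the cited papers: several end-effector trajectories, recorded for instance with a VR controller during teleoperated demonstrations, are resampled to a common length and averaged into a nominal trajectory, with the per-timestep spread indicating where the demonstrations disagree. All values are placeholders.

# Toy LfD baseline: average several demonstrated trajectories (illustrative only).
import numpy as np

def resample(traj, n=100):
    """Linearly resample a (T, 3) trajectory to n timesteps."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(t_new, t_old, traj[:, d]) for d in range(traj.shape[1])], axis=1)

def learn_from_demonstrations(demos, n=100):
    """demos: list of (T_i, 3) arrays of demonstrated end-effector positions."""
    aligned = np.stack([resample(d, n) for d in demos])  # (num_demos, n, 3)
    mean_traj = aligned.mean(axis=0)                     # nominal trajectory for reproduction
    std_traj = aligned.std(axis=0)                       # demonstration variability per timestep
    return mean_traj, std_traj

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demos = []
    for _ in range(3):                                   # three noisy demonstrations of one reach
        T = int(rng.integers(80, 120))
        demo = np.linspace([0.0, 0.0, 0.0], [0.4, 0.2, 0.3], T)
        demo += 0.005 * rng.standard_normal(demo.shape)
        demos.append(demo)
    mean_traj, std_traj = learn_from_demonstrations(demos)
    print(mean_traj.shape, float(std_traj.max()))

Approaches in the references, such as Gaussian-mixture encodings or deep imitation learning [5], go far beyond this averaging step, but the sketch shows the basic input/output structure of LfD from teleoperated demonstrations.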
Intuitive Teleoperation using Augmented Reality
Supervision: Edwin Babaians (edwin.babaians@tum.de)

Teleoperation remains a dominant control paradigm for human interaction with robotic systems. However, teleoperation can be quite challenging, especially for novice users. Even experienced users may face difficulties or inefficiencies when operating a robot with unfamiliar and/or complex dynamics, such as industrial manipulators or aerial robots, because teleoperation forces users to focus on low-level aspects of robot control rather than on higher-level goals regarding task completion, data analysis and problem solving. We will explore how advances in augmented reality (AR) may enable the design of novel teleoperation interfaces that increase operation effectiveness, support the user in conducting concurrent work, and decrease stress. In addition, AR could help shorten the learning curve, so that operators become proficient with the teleoperation setup and can perform well after only a short familiarization with the system.

References:
[1] Birkenkampf, Peter, Daniel Leidner, and Christoph Borst. "A knowledge-driven shared autonomy human-robot interface for tablet computers." 2014 IEEE-RAS International Conference on Humanoid Robots, pp. 152-159. IEEE, 2014.
[2] Brizzi, Filippo, Lorenzo Peppoloni, Alessandro Graziano, Erika Di Stefano, Carlo Alberto Avizzano, and Emanuele Ruffaldi. "Effects of augmented reality on the performance of teleoperated industrial assembly tasks in a robotic embodiment." IEEE Transactions on Human-Machine Systems 48, no. 2 (2017): 197-206.
[3] Livatino, Salvatore, Dario C. Guastella, Giovanni Muscato, Vincenzo Rinaldi, Luciano Cantelli, Carmelo D. Melita, Alessandro Caniglia, Riccardo Mazza, and Gianluca Padula. "Intuitive Robot Teleoperation through Multi-Sensor Informed Mixed Reality Visual Aids." IEEE Access 9 (2021): 25795-25808.
[4] Walker, Michael E., Hooman Hedayati, and Daniel Szafir. "Robot teleoperation with augmented reality virtual surrogates." 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 202-210. IEEE, 2019.
[5] Hernández, Juan David, Shlok Sobti, Anthony Sciola, Mark Moll, and Lydia E. Kavraki. "Increasing robot autonomy via motion planning and an augmented reality interface." IEEE Robotics and Automation Letters 5, no. 2 (2020): 1017-1023.
Improving Teleoperated Driving Using an Augmented Reality Representation
Supervision: Markus Hofbauer (markus.hofbauer@tum.de)

Teleoperated driving is a possible fallback to resolve failures of autonomous vehicles. One of the main problems is representing the sensor data transmitted from the autonomous vehicle to the human operator in a way that allows the operator to understand the current traffic situation [1]. While regular displays are often the first choice, the immersive telepresence of head-mounted displays allows for further mixed reality representations that improve the operator's situation awareness [2], [3]. The task of this project is to analyze different augmented/virtual reality approaches for improving the sensor representation in teleoperated driving.

References:
[1] J.-M. Georg and F. Diermeyer, "An Adaptable and Immersive Real Time Interface for Resolving System Limitations of Automated Vehicles with Teleoperation," 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, Oct. 2019, pp. 2659-2664, doi: 10.1109/SMC.2019.8914306.
[2] A. Hosseini and M. Lienkamp, "Enhancing telepresence during the teleoperation of road vehicles using HMD-based mixed reality," 2016 IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, June 2016, pp. 1366-1373, doi: 10.1109/IVS.2016.7535568.
[3] P. Gomes, C. Olaverri-Monreal, and M. Ferreira, "Making Vehicles Transparent Through V2V Video Streaming," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 2, pp. 930-938, June 2012, doi: 10.1109/TITS.2012.2188289.
Deep Learning for 6D Pose Estimation and its Applications in AR
Supervision: Diego Prado (diego.prado@tum.de)

Estimating the 6D pose of an object from camera images can be very valuable for robotic assembly and maintenance applications. Together with augmented reality, step-by-step guidance can be provided to the user when assembling an object consisting of multiple components. In recent years, neural networks have been applied with great success and have proven to provide fast and robust results. The goal of this seminar topic is to study state-of-the-art deep learning techniques for 6D pose estimation and their possible applications in AR (a sketch of how an estimated pose is used for an AR overlay follows the references below).

References:
[1] Su, Y., Rambach, J., Minaskan, N., Lesur, P., Pagani, A., and Stricker, D. "Deep multi-state object pose estimation for augmented reality assembly." ISMAR-Adjunct, 2019.
[2] He, Z., Feng, W., Zhao, X., and Lv, Y. "6D Pose Estimation of Objects: Recent Technologies and Challenges." Applied Sciences, 2021.
[3] Su, Y., Rambach, J., Pagani, A., and Stricker, D. "SynPo-Net—Accurate and Fast CNN-Based 6DoF Object Pose Estimation Using Synthetic Training." Sensors, 2021.
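As a minimal illustration of the AR side of this topic (not taken from the cited papers): once a network has produced a 6D pose, i.e. a rotation R and translation t of the object in the camera frame, rendering guidance reduces to projecting 3D model points into the image with the pinhole model. The intrinsics, pose and cube model below are made-up placeholder values.

# Sketch: use an estimated 6D pose to place an AR overlay (placeholder values).
import numpy as np

def project_points(points_3d, K, R, t):
    """Project (N, 3) object-frame points into (N, 2) pixel coordinates."""
    cam = points_3d @ R.T + t          # object frame -> camera frame
    pix = cam @ K.T                    # apply camera intrinsics
    return pix[:, :2] / pix[:, 2:3]    # perspective division

if __name__ == "__main__":
    K = np.array([[600.0, 0.0, 320.0],
                  [0.0, 600.0, 240.0],
                  [0.0, 0.0, 1.0]])     # placeholder intrinsics
    R = np.eye(3)                       # placeholder pose: object axis-aligned with camera
    t = np.array([0.0, 0.0, 0.5])       # 0.5 m in front of the camera
    # corners of a 4 cm cube model, centered at the object origin
    cube = 0.02 * np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], float)
    print(project_points(cube, K, R, t))  # pixel positions at which to draw the overlay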
Deformable Object Tracking for AR
Supervision: Michael Adam (michael.adam@tum.de)

Before augmenting reality, one needs to understand the environment that should be changed. Understanding static scenes and their objects has become easier with the introduction of deep neural networks. However, when interacting with the scene, objects may change size or even appearance; this is the case with deformable objects. Tracking deformable objects can be tricky, and several techniques have been developed in order to augment them over time. During the seminar you should first make yourself familiar with object tracking in the scope of AR, for instance [1, 2], and then find different techniques for tracking deformable objects. Physics-based simulations [3, 4] as well as methods with specialized setups [5] exist (a toy physics-based deformation model is sketched after the references below).

References:
[1] Park, Youngmin, Vincent Lepetit, and Woontack Woo. "Multiple 3D object tracking for augmented reality." 2008 7th IEEE/ACM International Symposium on Mixed and Augmented Reality. IEEE, 2008.
[2] Tsoli, Aggeliki, and Antonis A. Argyros. "Joint 3D tracking of a deformable object in interaction with a hand." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[3] Haouchine, Nazim, et al. "Physics-based augmented reality for 3D deformable object." Eurographics Workshop on Virtual Reality Interaction and Physical Simulation. 2012.
[4] Paulus, Christoph J., et al. "Augmented reality during cutting and tearing of deformable objects." 2015 IEEE International Symposium on Mixed and Augmented Reality. IEEE, 2015.
[5] Fujimoto, Yuichiro, et al. "Geometrically-correct projection-based texture mapping onto a deformable object." IEEE Transactions on Visualization and Computer Graphics 20.4 (2014): 540-549.
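The following toy snippet illustrates the physics-based idea behind [3, 4], without claiming to reproduce them: the deformable object is represented as particles connected by springs, and the simulated shape is what the AR content would be warped with. Real systems couple such a model to image measurements; all constants here are arbitrary.

# Toy mass-spring model of a deformable object (illustrative only).
import numpy as np

def mass_spring_step(pos, vel, springs, rest_len, k=50.0, damping=0.98, dt=0.01):
    """One explicit-Euler step for particles (N, 2) linked by springs (M pairs)."""
    forces = np.zeros_like(pos)
    for (i, j), L0 in zip(springs, rest_len):
        d = pos[j] - pos[i]
        L = np.linalg.norm(d) + 1e-9
        f = k * (L - L0) * d / L            # Hooke's law along the spring
        forces[i] += f
        forces[j] -= f
    vel = damping * (vel + dt * forces)     # unit mass per particle
    return pos + dt * vel, vel

if __name__ == "__main__":
    # three particles in a line; the middle one is displaced and relaxes back
    pos = np.array([[0.0, 0.0], [0.5, 0.2], [1.0, 0.0]])
    vel = np.zeros_like(pos)
    springs = [(0, 1), (1, 2)]
    rest_len = [0.5, 0.5]
    for _ in range(200):
        pos, vel = mass_spring_step(pos, vel, springs, rest_len)
    print(pos.round(3))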
Marker-based Augmented Reality
Supervision: Martin Oelsch (martin.Oelsch@tum.de)

Augmented reality (AR) employs computer vision, image processing and computer graphics techniques to merge digital content into the real world. It enables real-time interaction between the user, real objects and virtual objects. AR can, for example, be used to embed 3D graphics into a video in such a way as if the virtual elements were part of the real environment [1]. One of the challenges of AR is to align virtual data with the environment. A marker-based approach solves this problem using visual markers, e.g. 2D barcodes, that are detectable with computer vision methods. Once the markers are detected, the geometry of the currently viewed part of the environment can be estimated by computing the pose of the marker (a minimal detection-and-pose sketch follows the references below). The student is required to give a good overview of marker-based augmented reality and to explain the methodology of marker detection and pose estimation mathematically in the context of augmented reality applications.

References:
[1] Siltanen, Sanni. "Theory and applications of marker based augmented reality." 2012.
[2] Sadeghi-Niaraki, A., and Choi, S.-M. "A Survey of Marker-Less Tracking and Registration Techniques for Health & Environmental Applications to Augmented Reality and Ubiquitous Geospatial Information Systems." Sensors, 2020.
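A minimal sketch of the pipeline described above, assuming opencv-contrib-python with the pre-4.7 ArUco API (newer OpenCV versions expose cv2.aruco.ArucoDetector instead). The camera intrinsics and marker size are placeholder values, and "frame.png" is a hypothetical input image.

# Sketch: ArUco marker detection followed by PnP pose estimation (assumed setup).
import cv2
import numpy as np

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])          # placeholder camera intrinsics
dist = np.zeros(5)                        # assume no lens distortion
MARKER_SIZE = 0.05                        # marker side length in meters (assumed)

# 3D corner positions of the marker in its own coordinate frame
half = MARKER_SIZE / 2.0
obj_pts = np.array([[-half,  half, 0], [ half,  half, 0],
                    [ half, -half, 0], [-half, -half, 0]], dtype=np.float32)

img = cv2.imread("frame.png")             # hypothetical camera frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

for marker_corners, marker_id in zip(corners or [], ids if ids is not None else []):
    # PnP: recover rotation and translation of the marker w.r.t. the camera
    ok, rvec, tvec = cv2.solvePnP(obj_pts,
                                  marker_corners.reshape(4, 2).astype(np.float32),
                                  K, dist, flags=cv2.SOLVEPNP_IPPE_SQUARE)
    if ok:
        print(f"marker {int(marker_id)}: t = {tvec.ravel()}")  # pose for placing virtual content

The recovered rotation and translation define the transformation between the marker frame and the camera frame, which is exactly the alignment of virtual data with the environment that the description refers to.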
Visual Place Recognition
Supervision: Sebastian Eger (sebastian.eger@tum.de)

A major task for realistic AR/VR applications is to retrieve an accurate pose (position and orientation) estimate for the device (e.g. a smartphone). Most commonly, sensors like cameras and inertial measurement units (IMUs) are used to run visual-inertial odometry (VIO) [1]. Based on the pose estimate of each frame, a dense 3D map of the environment can be built [2]; this procedure is called simultaneous localization and mapping (SLAM). The AR/VR application can then render and place realistic augmented objects into the scene. However, sometimes a large-scale but sparse map of the environment already exists. If we want to display position-based information, we first need a pose estimate in this global map; since the global map does not contain very detailed (local) information, we can then initialize the local SLAM system at the global pose estimate. In this seminar, the student shall research state-of-the-art visual place recognition methods and algorithms (a retrieval-style sketch follows the references below). If desired, we can provide indoor image sequences to test and evaluate different methods. A good starting point for the literature research is: https://paperswithcode.com/task/visual-place-recognition

References:
[1] Tong Qin, Peiliang Li, and Shaojie Shen. "VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator." IEEE Transactions on Robotics 34, no. 4 (August 2018): 1004-1020.
[2] Richard A. Newcombe et al. "KinectFusion: Real-Time Dense Surface Mapping and Tracking." 2011 10th IEEE International Symposium on Mixed and Augmented Reality, 2011, 127-136.
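Visual place recognition is commonly posed as image retrieval: every database image in the global map is summarized by a global descriptor (for example from NetVLAD or another learned model), and a query image is matched to its nearest neighbors. The sketch below uses random placeholder vectors in place of such learned embeddings.

# Sketch: place recognition as global-descriptor retrieval (placeholder descriptors).
import numpy as np

def l2_normalize(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)

def retrieve(query_desc, db_descs, top_k=3):
    """Return indices and scores of the top_k database images most similar to the query."""
    sims = l2_normalize(db_descs) @ l2_normalize(query_desc)   # cosine similarity
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.standard_normal((1000, 256))              # 1000 mapped places, 256-D descriptors
    query = db[42] + 0.1 * rng.standard_normal(256)    # revisit of place 42, slightly changed view
    idx, scores = retrieve(query, db)
    print(idx, scores.round(3))                        # place 42 should rank first

The retrieved place then provides the coarse global pose estimate at which the local SLAM or VIO system can be initialized, as described above.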
Activity Recognition for Augmented Reality and Virtual Reality
Supervision: Marsil Zakour (marsil.zakour@tum.de), Yuankai Wu (yuankai.wu@tum.de)

AR/VR-based methods are very useful for collecting datasets of human daily living and for predicting human activity. Capturing real human daily-living data is critical for learning models of human activity that can later be transferred to robots. Recent improvements in virtual reality (VR) head-mounted displays provide a viable method for collecting data on human activity without the difficulties often encountered when capturing performance in a physical environment [1]. Furthermore, [2] uses a conventional AR-based method to make predictions about human activity. Another line of work is driven by the idea of moving from traditional augmented reality (AR) systems, which are typically limited to visualization and tracking components, to augmented reality cognitive systems, which possess or gradually build knowledge about the user's situation and intent [3]. You need to explore state-of-the-art approaches to predicting human activity using augmented or virtual reality by understanding the techniques mentioned above, and describe their methodology.

References:
[1] T. Bates, K. Ramirez-Amaro, T. Inamura and G. Cheng, "On-line simultaneous learning and recognition of everyday activities from virtual reality performances," 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 2017, pp. 3510-3515, doi: 10.1109/IROS.2017.8206193.
[2] Schröder, Matthias, and Helge Ritter. "Deep learning for action recognition in augmented reality assistance systems." ACM SIGGRAPH 2017 Posters. 2017. 1-2.
[3] Stricker, Didier, and Gabriele Bleser. "From interactive to adaptive augmented reality." 2012 International Symposium on Ubiquitous Virtual Reality. IEEE, 2012.
Augmented and Virtual Haptics
Supervision: Andreas Noll (andreas.noll@tum.de)

For the simulation and augmentation of reality, not only audio-visual perception should be addressed; the tactile sense, for instance, also needs to be stimulated. However, as of today hardly any consumer products are available for haptics in VR and AR scenarios. Hence, in the seminar you should search for devices that are able to produce haptic feedback in the scope of AR/VR. Further, you should explain the different modalities that need to be addressed (force feedback, material properties, etc.), discuss which problems need to be solved, and present how the devices can be used in an augmented setup or in a virtual environment (a minimal force-feedback rendering sketch follows the references below).

References:
[1] Shi, Yuxiang, et al. "Self-powered electro-tactile system for virtual tactile experiences." Science Advances 7.6 (2021): eabe2943.
[2] Ichikari, Ryosuke, Tenshi Yanagimachi, and Takeshi Kurata. "Augmented reality tactile map with hand gesture recognition." International Conference on Computers Helping People with Special Needs. Springer, Cham, 2016.
[3] Kaul, Oliver Beren, and Michael Rohs. "HapticHead: A spherical vibrotactile grid around the head for 3D guidance in virtual and augmented reality." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 2017.
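The following is a minimal sketch, not tied to any specific device, of penalty-based force feedback, one of the simplest ways a kinesthetic haptic device renders contact with a virtual surface: when the device's proxy point penetrates the surface, a spring-like force proportional to the penetration depth pushes it back out. Stiffness and geometry values are arbitrary placeholders.

# Sketch: penalty-based force feedback against a virtual plane (illustrative only).
import numpy as np

def contact_force(proxy_pos, plane_point, plane_normal, stiffness=500.0):
    """Force (in N) to command on the haptic device for contact with a virtual plane."""
    n = plane_normal / np.linalg.norm(plane_normal)
    penetration = np.dot(plane_point - proxy_pos, n)    # > 0 when the proxy is below the surface
    if penetration <= 0.0:
        return np.zeros(3)                               # free space: render no force
    return stiffness * penetration * n                   # push the proxy out along the normal

if __name__ == "__main__":
    plane_point = np.array([0.0, 0.0, 0.0])              # virtual table top at z = 0
    plane_normal = np.array([0.0, 0.0, 1.0])
    for z in (0.01, 0.0, -0.005):                        # proxy approaching and entering the table
        f = contact_force(np.array([0.0, 0.0, z]), plane_point, plane_normal)
        print(f"z = {z:+.3f} m -> force {f} N")

Vibrotactile devices such as the grid in [3] use entirely different actuation, but the same question arises of mapping virtual events to a signal the device can render.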
Generation of Realistic Virtual Views
Supervision: Martin Piccolrovazzi (martin.piccolrovazzi@tum.de)

Generating realistic virtual views of a scene based on 2D or 3D data is a challenging topic with many possible applications in VR/AR. In recent years, neural rendering has emerged as an active research area, combining machine learning techniques with computer graphics for virtual view generation. In this topic, we review the latest approaches in neural rendering, focusing on different applications or modalities (the volume-rendering step at the core of [1] is sketched after the references below).

References:
[1] Mildenhall et al. "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis." https://arxiv.org/pdf/2003.08934.pdf
[2] Gafni et al. "Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction." https://arxiv.org/pdf/2012.03065.pdf
[3] Wang et al. "NeRF−−: Neural Radiance Fields Without Known Camera Parameters." https://arxiv.org/pdf/2102.07064.pdf
[4] Tewari et al. "State of the Art on Neural Rendering." https://arxiv.org/pdf/2004.03805.pdf
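As a pointer to what "combining machine learning with computer graphics" means in practice, here is a sketch of the volume rendering used by NeRF [1]: a network predicts a density sigma and a color c at sample points along each camera ray, and the pixel color is the alpha-composite of those samples. In this standalone snippet the network outputs are replaced by random placeholder values.

# Sketch: NeRF-style volume rendering along one camera ray (placeholder network outputs).
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """sigmas: (S,) densities, colors: (S, 3), deltas: (S,) distances between samples."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]   # transmittance T_i up to sample i
    weights = trans * alphas                                         # contribution of each sample
    rgb = (weights[:, None] * colors).sum(axis=0)                    # expected color along the ray
    return rgb, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = 64                                        # samples per ray
    sigmas = rng.uniform(0.0, 2.0, S)             # stand-in for the MLP's density output
    colors = rng.uniform(0.0, 1.0, (S, 3))        # stand-in for the MLP's color output
    deltas = np.full(S, 4.0 / S)                  # uniform sampling over a 4 m ray
    rgb, weights = composite_ray(sigmas, colors, deltas)
    print(rgb.round(3), weights.sum().round(3))   # weights sum to at most 1

Because this compositing step is differentiable, the scene representation can be optimized directly from posed photographs, which is the key idea the seminar topic builds on.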
Video Coding Optimization of Virtual Reality 360-degree Video
Supervision: Kai Cui (kai.cui@tum.de)

Virtual reality (VR) creates an immersive experience of the real world in a virtual environment, and thanks to the technological advancements of recent years VR technology is growing very fast. Since VR visualizes a real-world experience, the image or video content that is used must represent the characteristics of the whole 3D world. 360-degree videos exhibit such characteristics and are hence used in VR applications. However, this content is not directly suitable for conventional video coding standards, which are designed for 2D video formats. Therefore, the focus of 360-degree video compression is to find a proper projection that transforms a 360-degree frame into a rectangular planar image that achieves a high compression ratio (a sketch of the common equirectangular mapping follows the references below). In this seminar topic, we will investigate the tools and algorithms designed to optimize 360-degree (spherical) video compression performance in state-of-the-art video coding standards (e.g., HEVC, AV1, VVC), and compare the advantages and disadvantages of the different approaches.

References:
[1] Zhou, Yimin, Ling Tian, Ce Zhu, Xin Jin, and Yu Sun. "Video coding optimization for virtual reality 360-degree source." IEEE Journal of Selected Topics in Signal Processing 14, no. 1 (2020): 118-129.
[2] Xu, Mai, Chen Li, Shanyi Zhang, and Patrick Le Callet. "State-of-the-art in 360 video/image processing: Perception, assessment and compression." IEEE Journal of Selected Topics in Signal Processing 14, no. 1 (2020): 5-26.
[3] Wien, Mathias, Jill M. Boyce, Thomas Stockhammer, and Wen-Hsiao Peng. "Standardization status of immersive video coding." IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, no. 1 (2019): 5-17.
[4] Adhuran, Jayasingam, Gosala Kulupana, Chathura Galkandage, and Anil Fernando. "Multiple Quantization Parameter Optimization in Versatile Video Coding for 360° Videos." IEEE Transactions on Consumer Electronics 66, no. 3 (2020): 213-222.
[5] Lin, Jian-Liang, Ya-Hsuan Lee, Cheng-Hsuan Shih, Sheng-Yen Lin, Hung-Chih Lin, Shen-Kai Chang, Peng Wang, Lin Liu, and Chi-Cheng Ju. "Efficient projection and coding tools for 360 video." IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, no. 1 (2019): 84-97.
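To make the projection step concrete, here is a sketch of the equirectangular projection (ERP), the most common way to unwrap a spherical 360-degree frame into the rectangular image a 2D codec expects: longitude maps linearly to the horizontal axis and latitude to the vertical axis. Conventions (axis orientation, pixel centers) vary between implementations; this follows one common choice and uses a typical placeholder resolution.

# Sketch: map a viewing direction on the sphere to equirectangular pixel coordinates.
import numpy as np

def direction_to_erp_pixel(d, width, height):
    """Map a 3D viewing direction to (u, v) pixel coordinates in an ERP frame."""
    x, y, z = d / np.linalg.norm(d)
    lon = np.arctan2(x, z)             # longitude in (-pi, pi], 0 at the +z (forward) axis
    lat = np.arcsin(y)                 # latitude in [-pi/2, pi/2], +pi/2 at the north pole
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v

if __name__ == "__main__":
    W, H = 3840, 1920                  # typical 2:1 ERP resolution for 360-degree video
    for d in ([0, 0, 1], [1, 0, 0], [0, 1, 0]):   # front, right, straight up
        print(d, "->", direction_to_erp_pixel(np.array(d, float), W, H))

The strong oversampling of ERP near the poles is exactly what motivates the alternative projections and spherically weighted optimization tools studied in [1], [4] and [5].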