Lecture Notes in Computer Science
Lecture Notes in Computer Science 12572

Founding Editors
Gerhard Goos, Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis, Cornell University, Ithaca, NY, USA

Editorial Board Members
Elisa Bertino, Purdue University, West Lafayette, IN, USA
Wen Gao, Peking University, Beijing, China
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Gerhard Woeginger, RWTH Aachen, Aachen, Germany
Moti Yung, Columbia University, New York, NY, USA
More information about this subseries at http://www.springer.com/series/7409
Jakub Lokoč • Tomáš Skopal • Klaus Schoeffmann • Vasileios Mezaris • Xirong Li • Stefanos Vrochidis • Ioannis Patras (Eds.)

MultiMedia Modeling
27th International Conference, MMM 2021
Prague, Czech Republic, June 22–24, 2021
Proceedings, Part I
Editors
Jakub Lokoč, Charles University, Prague, Czech Republic
Tomáš Skopal, Charles University, Prague, Czech Republic
Klaus Schoeffmann, Klagenfurt University, Klagenfurt, Austria
Vasileios Mezaris, CERTH-ITI, Thessaloniki, Greece
Xirong Li, Renmin University of China, Beijing, China
Stefanos Vrochidis, CERTH-ITI, Thessaloniki, Greece
Ioannis Patras, Queen Mary University of London, London, UK

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-67831-9
ISBN 978-3-030-67832-6 (eBook)
https://doi.org/10.1007/978-3-030-67832-6
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

These two-volume proceedings contain the papers accepted at MMM 2021, the 27th International Conference on MultiMedia Modeling. Organized for more than 25 years, MMM has become a respected and well-established international conference bringing together excellent researchers from academic and industrial areas. During the conference, novel research works from MMM-related areas (especially multimedia content analysis; multimedia signal processing and communications; and multimedia applications and services) are shared along with practical experiences, results, and exciting demonstrations. The 27th instance of the conference was organized in Prague, Czech Republic on June 22–24, 2021. Due to the COVID-19 pandemic, the conference date was shifted by five months; however, the proceedings were published in January in accordance with the original plan.

Despite the pandemic, MMM 2021 received a large number of submissions organized in different tracks. Specifically, 211 papers were submitted to seven MMM 2021 tracks. Each paper was reviewed by at least two reviewers (but mostly three) from the Program Committee, while the TPC chairs and special event organizers acted as meta-reviewers. Out of 166 regular papers, 73 were accepted for the proceedings. In particular, 40 papers were accepted for oral presentation and 33 papers for poster presentation. Regarding the remaining tracks, 16 special session papers were accepted as well as 2 papers for a demo presentation and 17 papers for participation at the Video Browser Showdown 2021. Overall, the MMM 2021 program comprised 108 papers from the seven tracks with the following acceptance rates:

Track                          #Papers   Acceptance rate
Full papers (oral)             40        24%
Full papers (oral + poster)    73        44%
Demos                          2         67%
SS1: MAPTA                     4         50%
SS2: MDRE                      5         71%
SS3: MMARSat                   3         100%
SS4: MULTIMED                  4         67%
Video Browser Showdown         17        94%

The special sessions are traditionally organized to extend the program with novel challenging problems and directions. The MMM 2021 program included four special sessions:

– SS1: Multimedia Analytics: Perspectives, Tools, and Applications (MAPTA)
– SS2: Multimedia Datasets for Repeatable Experimentation (MDRE)
– SS3: Multimodal Analysis and Retrieval of Satellite Images (MMARSat)
– SS4: Multimedia and Multimodal Analytics in the Medical Domain and Pervasive Environments (MULTIMED)
Besides the four special sessions, the 10th anniversary edition of the Video Browser Showdown represented an important highlight of MMM 2021, with a record number of 17 participating systems in this exciting (and challenging!) competition. In addition, two highly respected speakers were invited to MMM 2021 to present their impressive talks and results in multimedia-related topics. Specifically, we would like to thank Cees Snoek from the University of Amsterdam, and Pavel Zezula from Masaryk University.

Last but not least, we would like to thank all members of the MMM community who contributed to the MMM 2021 event. We also thank all authors of submitted papers, all reviewers, and all members of the MMM 2021 organization team for their great work and support. They all helped MMM 2021 to be an exciting and inspiring international event for all participants!

January 2021

Jakub Lokoč
Tomáš Skopal
Klaus Schoeffmann
Vasileios Mezaris
Xirong Li
Stefanos Vrochidis
Ioannis Patras
Organization

Organizing Committee

General Chairs
Jakub Lokoč, Charles University, Prague
Tomáš Skopal, Charles University, Prague

Program Chairs
Klaus Schoeffmann, Klagenfurt University
Vasileios Mezaris, CERTH-ITI, Thessaloniki
Xirong Li, Renmin University of China

Special Session and Tutorial Chairs
Werner Bailer, Joanneum Research
Marta Mrak, BBC Research & Development

Panel Chairs
Giuseppe Amato, ISTI-CNR, Pisa
Fabrizio Falchi, ISTI-CNR, Pisa

Demo Chairs
Cathal Gurrin, Dublin City University
Jan Zahálka, Czech Technical University in Prague

Video Browser Showdown Chairs
Klaus Schoeffmann, Klagenfurt University
Werner Bailer, Joanneum Research
Jakub Lokoč, Charles University, Prague
Cathal Gurrin, Dublin City University

Publicity Chairs
Phoebe Chen, La Trobe University
Chong-Wah Ngo, City University of Hong Kong
Bing-Kun Bao, Nanjing University of Posts and Telecommunications

Publication Chairs
Stefanos Vrochidis, CERTH-ITI, Thessaloniki
Ioannis Patras, Queen Mary University of London
Steering Committee
Phoebe Chen, La Trobe University
Tat-Seng Chua, National University of Singapore
Kiyoharu Aizawa, University of Tokyo
Cathal Gurrin, Dublin City University
Benoit Huet, Eurecom
Klaus Schoeffmann, Klagenfurt University
Richang Hong, Hefei University of Technology
Björn Þór Jónsson, IT University of Copenhagen
Guo-Jun Qi, University of Central Florida
Wen-Huang Cheng, National Chiao Tung University
Peng Cui, Tsinghua University

Web Chair
František Mejzlík, Charles University, Prague

Organizing Agency
Conforg, s.r.o.

Special Session Organizers

Multimedia Datasets for Repeatable Experimentation (MDRE)
Cathal Gurrin, Dublin City University, Ireland
Duc-Tien Dang-Nguyen, University of Bergen, Norway
Björn Þór Jónsson, IT University of Copenhagen, Denmark
Klaus Schoeffmann, Klagenfurt University, Austria

Multimedia Analytics: Perspectives, Tools and Applications (MAPTA)
Björn Þór Jónsson, IT University of Copenhagen, Denmark
Stevan Rudinac, University of Amsterdam, The Netherlands
Xirong Li, Renmin University of China, China
Cathal Gurrin, Dublin City University, Ireland
Laurent Amsaleg, CNRS-IRISA, France

Multimodal Analysis and Retrieval of Satellite Images
Ilias Gialampoukidis, Centre for Research and Technology Hellas, Information Technologies Institute, Greece
Stefanos Vrochidis, Centre for Research and Technology Hellas, Information Technologies Institute, Greece
Ioannis Papoutsis, National Observatory of Athens, Greece
Guido Vingione, Serco Italy, Italy
Ioannis Kompatsiaris, Centre for Research and Technology Hellas, Information Technologies Institute, Greece

MULTIMED: Multimedia and Multimodal Analytics in the Medical Domain and Pervasive Environments
Georgios Meditskos, Centre for Research and Technology Hellas, Information Technologies Institute, Greece
Klaus Schoeffmann, Klagenfurt University, Austria
Leo Wanner, ICREA – Universitat Pompeu Fabra, Spain
Stefanos Vrochidis, Centre for Research and Technology Hellas, Information Technologies Institute, Greece
Athanasios Tzioufas, Medical School of the National and Kapodistrian University of Athens, Greece

MMM 2021 Program Committees and Reviewers

Regular and Special Sessions Program Committee
Olfa Ben Ahmed, EURECOM
Laurent Amsaleg, CNRS-IRISA
Evlampios Apostolidis, CERTH ITI
Ognjen Arandjelović, University of St Andrews
Devanshu Arya, University of Amsterdam
Nathalie Aussenac, IRIT CNRS
Esra Açar, Middle East Technical University
Werner Bailer, JOANNEUM RESEARCH
Bing-Kun Bao, Nanjing University of Posts and Telecommunications
Ilaria Bartolini, University of Bologna
Christian Beecks, University of Munster
Jenny Benois-Pineau, LaBRI, UMR CNRS 5800 CNRS, University of Bordeaux
Roberto Di Bernardo, Engineering Ingegneria Informatica S.p.A.
Antonis Bikakis, University College London
Josep Blat, Universitat Pompeu Fabra
Richard Burns, West Chester University
Benjamin Bustos, University of Chile
K. Selçuk Candan, Arizona State University
Ying Cao, City University of Hong Kong
Annalina Caputo, University College Dublin
Savvas Chatzichristofis, Neapolis University Pafos
Angelos Chatzimichail, Centre for Research and Technology Hellas
Edgar Chavez, CICESE
Mulin Chen, Northwestern Polytechnical University
Zhineng Chen, Institute of Automation, Chinese Academy of Sciences
Zhiyong Cheng, Qilu University of Technology
Wei-Ta Chu, National Cheng Kung University
Andrea Ciapetti, Innovation Engineering
Kathy Clawson, University of Sunderland
Claudiu Cobarzan, Klagenfurt University
Rossana Damiano, Università di Torino
Mariana Damova, Mozaika
Minh-Son Dao, National Institute of Information and Communications Technology
Petros Daras, Information Technologies Institute
Mihai Datcu, DLR
Mathieu Delalandre, Université de Tours
Begum Demir, Technische Universität Berlin
Francois Destelle, Dublin City University
Cem Direkoğlu, Middle East Technical University – Northern Cyprus Campus
Jianfeng Dong, Zhejiang Gongshang University
Shaoyi Du, Xi’an Jiaotong University
Athanasios Efthymiou, University of Amsterdam
Lianli Gao, University of Science and Technology of China
Dimos Georgiou, Catalink EU
Negin Ghamsarian, Klagenfurt University
Ilias Gialampoukidis, CERTH ITI
Nikolaos Gkalelis, CERTH ITI
Nuno Grosso
Ziyu Guan, Northwest University of China
Gylfi Gudmundsson, Reykjavik University
Silvio Guimaraes, Pontifícia Universidade Católica de Minas Gerais
Cathal Gurrin, Dublin City University
Pål Halvorsen, SimulaMet
Graham Healy, Dublin City University
Shintami Chusnul Hidayati, Institute of Technology Sepuluh Nopember
Dennis Hoppe, High Performance Computing Center Stuttgart
Jun-Wei Hsieh, National Taiwan Ocean University
Min-Chun Hu, National Tsing Hua University
Zhenzhen Hu, Nanyang Technological University
Jen-Wei Huang, National Cheng Kung University
Lei Huang, Ocean University of China
Ichiro Ide, Nagoya University
Konstantinos Ioannidis, CERTH ITI
Bogdan Ionescu, University Politehnica of Bucharest
Adam Jatowt, Kyoto University
Peiguang Jing, Tianjin University
Hyun Woo Jo, Korea University
Björn Þór Jónsson, IT-University of Copenhagen
Yong Ju Jung, Gachon University
Anastasios Karakostas, Aristotle University of Thessaloniki
Ari Karppinen, Finnish Meteorological Institute
Jiro Katto, Waseda University
Junmo Kim, Korea Advanced Institute of Science and Technology
Sabrina Kletz, Klagenfurt University
Ioannis Kompatsiaris, CERTH ITI
Haris Kontoes, National Observatory of Athens
Efstratios Kontopoulos, Elsevier Technology
Markus Koskela, CSC – IT Center for Science Ltd.
Yu-Kun Lai, Cardiff University
Woo Kyun Lee, Korea University
Jochen Laubrock, University of Potsdam
Khiem Tu Le, Dublin City University
Andreas Leibetseder, Klagenfurt University
Teng Li, Anhui University
Xirong Li, Renmin University of China
Yingbo Li, Eurecom
Wu Liu, JD AI Research of JD.com
Xueting Liu, The Chinese University of Hong Kong
Jakub Lokoč, Charles University
José Lorenzo, Atos
Mathias Lux, Klagenfurt University
Ioannis Manakos, CERTH ITI
José M. Martinez, Universidad Autònoma de Madrid
Stephane Marchand-Maillet, Viper Group – University of Geneva
Ernesto La Mattina, Engineering Ingegneria Informatica S.p.A.
Thanassis Mavropoulos, CERTH ITI
Kevin McGuinness, Dublin City University
Georgios Meditskos, CERTH ITI
Robert Mertens, HSW University of Applied Sciences
Vasileios Mezaris, CERTH ITI
Weiqing Min, ICT
Wolfgang Minker, University of Ulm
Marta Mrak, BBC
Phivos Mylonas, National Technical University of Athens
Henning Muller, HES-SO
Duc Tien Dang Nguyen, University of Bergen
Liqiang Nie, Shandong University
Tu Van Ninh, Dublin City University
Naoko Nitta, Osaka University
Noel E. O’Connor, Dublin City University
Neil O’Hare, Yahoo Research
Jean-Marc Ogier, University of La Rochelle
Vincent Oria, NJIT
Tse-Yu Pan, National Cheng Kung University
Ioannis Papoutsis, National Observatory of Athens
Cecilia Pasquini, Universität Innsbruck
Ladislav Peška, Charles University
Yannick Prie, LINA – University of Nantes
Manfred Jürgen Primus, Klagenfurt University
Athanasios Psaltis, Centre for Research and Technology Hellas, Thessaloniki
Georges Quénot, Laboratoire d’Informatique de Grenoble, CNRS
Miloš Radovanović, University of Novi Sad
Amon Rapp, University of Torino
Stevan Rudinac, University of Amsterdam
Borja Sanz, University of Deusto
Shin’ichi Satoh, National Institute of Informatics
Gabriella Scarpino, Serco Italia S.p.A.
Simon Scerri, Fraunhofer IAIS, University of Bonn
Klaus Schoeffmann, Klagenfurt University
Matthias Schramm, TU Wien
John See, Multimedia University
Jie Shao, University of Science and Technology of China
Wen-Ze Shao, Nanjing University of Posts and Telecommunications
Xi Shao, Nanjing University of Posts and Telecommunications
Ujjwal Sharma, University of Amsterdam
Dongyu She, Nankai University
Xiangjun Shen, Jiangsu University
Koichi Shinoda, Tokyo Institute of Technology
Hong-Han Shuai, National Chiao Tung University
Mei-Ling Shyu, University of Miami
Vasileios Sitokonstantinou, National Observatory of Athens
Tomáš Skopal, Charles University
Alan Smeaton, Dublin City University
Natalia Sokolova, Klagenfurt University
Gjorgji Strezoski, University of Amsterdam
Li Su, UCAS
Lifeng Sun, Tsinghua University
Machi Symeonidou, DRAXIS Environmental SA
Daniel Stanley Tan, De La Salle University
Mario Taschwer, Klagenfurt University
Georg Thallinger, JOANNEUM RESEARCH
Christian Timmerer, Klagenfurt University
Athina Tsanousa, CERTH ITI
Athanasios Tzioufas, NKUA
Shingo Uchihashi, Fuji Xerox Co., Ltd.
Tiberio Uricchio, University of Florence
Guido Vingione, Serco
Stefanos Vrochidis, CERTH ITI
Qiao Wang, Southeast University
Qifei Wang, Google
Xiang Wang, National University of Singapore
Xu Wang, Shenzhen University
Zheng Wang, National Institute of Informatics
Leo Wanner, ICREA/UPF
Wolfgang Weiss, JOANNEUM RESEARCH
Lai-Kuan Wong, Multimedia University
Tien-Tsin Wong, The Chinese University of Hong Kong
Marcel Worring, University of Amsterdam
Xiao Wu, Southwest Jiaotong University
Sen Xiang, Wuhan University of Science and Technology
Ying-Qing Xu, Tsinghua University
Toshihiko Yamasaki, The University of Tokyo
Keiji Yanai, The University of Electro-Communications
Gang Yang, Renmin University of China
Yang Yang, University of Science and Technology of China
You Yang, Huazhong University of Science and Technology
Zhaoquan Yuan, Southwest Jiaotong University
Jan Zahálka, Czech Technical University in Prague
Hanwang Zhang, Nanyang Technological University
Sicheng Zhao, University of California, Berkeley
Lei Zhu, Huazhong University of Science and Technology

Additional Reviewers
Hadi Amirpour
Eric Arazo
Gibran Benitez-Garcia
Adam Blažek
Manliang Cao
Ekrem Çetinkaya
Long Chen
Přemysl Čech
Julia Dietlmeier
Denis Dresvyanskiy
Negin Ghamsarian
Panagiotis Giannakeris
Socratis Gkelios
Tomáš Grošup
Steven Hicks
Milan Hladik
Wenbo Hu
Debesh Jha
Omar Shahbaz Khan
Chengze Li
Hanyuan Liu
Katrinna Macfarlane
Danila Mamontov
Thanassis Mavropoulos
Anastasia Moumtzidou
Vangelis Oikonomou
Jesus Perez-Martin
Zhaobo Qi
Tomas Soucek
Vajira Thambawita
Athina Tsanousa
Chenglei Wu
Menghan Xia
Minshan Xie
Cai Xu
Gang Yang
Yaming Yang
Jiang Zhou
Haichao Zhu
Zirui Zhu
Contents – Part I

Crossed-Time Delay Neural Network for Speaker Recognition
    Liang Chen, Yanchun Liang, Xiaoshu Shi, You Zhou, and Chunguo Wu
An Asymmetric Two-Sided Penalty Term for CT-GAN
    Huan Zhao, Yu Wang, Tingting Li, and Yuqing Zhao
Fast Discrete Matrix Factorization Hashing for Large-Scale Cross-Modal Retrieval
    Huan Zhao, Xiaolin She, Song Wang, and Kaili Ma
Fast Optimal Transport Artistic Style Transfer
    Ting Qiu, Bingbing Ni, Ziang Liu, and Xuanhong Chen
Stacked Sparse Autoencoder for Audio Object Coding
    Yulin Wu, Ruimin Hu, Xiaochen Wang, Chenhao Hu, and Gang Li
A Collaborative Multi-modal Fusion Method Based on Random Variational Information Bottleneck for Gesture Recognition
    Yang Gu, Yajie Li, Yiqiang Chen, Jiwei Wang, and Jianfei Shen
Frame Aggregation and Multi-modal Fusion Framework for Video-Based Person Recognition
    Fangtao Li, Wenzhe Wang, Zihe Liu, Haoran Wang, Chenghao Yan, and Bin Wu
An Adaptive Face-Iris Multimodal Identification System Based on Quality Assessment Network
    Zhengding Luo, Qinghua Gu, Guoxiong Su, Yuesheng Zhu, and Zhiqiang Bai
Thermal Face Recognition Based on Multi-scale Image Synthesis
    Wei-Ta Chu and Ping-Shen Huang
Contrastive Learning in Frequency Domain for Non-I.I.D. Image Classification
    Huan Shao, Zhaoquan Yuan, Xiao Peng, and Xiao Wu
Group Activity Recognition by Exploiting Position Distribution and Appearance Relation
    Duoxuan Pei, Annan Li, and Yunhong Wang
Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization
    Fan Zhang, Meng Li, Guisheng Zhai, and Yizhao Liu
Dense Attention-Guided Network for Boundary-Aware Salient Object Detection
    Zhe Zhang, Junhui Ma, Panpan Xu, and Wencheng Wang
Generative Image Inpainting by Hybrid Contextual Attention Network
    Zhijiao Xiao and Donglun Li
Atypical Lyrics Completion Considering Musical Audio Signals
    Kento Watanabe and Masataka Goto
Improving Supervised Cross-modal Retrieval with Semantic Graph Embedding
    Changting Feng, Dagang Li, and Jingwei Zheng
Confidence-Based Global Attention Guided Network for Image Inpainting
    Zhilin Huang, Chujun Qin, Lei Li, Ruixin Liu, and Yuesheng Zhu
Multi-task Deep Learning for No-Reference Screen Content Image Quality Assessment
    Rui Gao, Ziqing Huang, and Shiguang Liu
Language Person Search with Pair-Based Weighting Loss
    Peng Zhang, Deqiang Ouyang, Chunlin Jiang, and Jie Shao
DeepFusion: Deep Ensembles for Domain Independent System Fusion
    Mihai Gabriel Constantin, Liviu-Daniel Ştefan, and Bogdan Ionescu
Illuminate Low-Light Image via Coarse-to-fine Multi-level Network
    Yansheng Qiu, Jun Chen, Xiao Wang, and Kui Jang
MM-Net: Learning Adaptive Meta-metric for Few-Shot Biometric Recognition
    Qinghua Gu, Zhengding Luo, Wanyu Zhao, and Yuesheng Zhu
A Sentiment Similarity-Oriented Attention Model with Multi-task Learning for Text-Based Emotion Recognition
    Yahui Fu, Lili Guo, Longbiao Wang, Zhilei Liu, Jiaxing Liu, and Jianwu Dang
Locating Visual Explanations for Video Question Answering
    Xuanwei Chen, Rui Liu, Xiaomeng Song, and Yahong Han
Global Cognition and Local Perception Network for Blind Image Deblurring
    Chuanfa Zhang, Wei Zhang, Feiyu Chen, Yiting Cheng, Shuyong Gao, and Wenqiang Zhang
Multi-grained Fusion for Conditional Image Retrieval
    Yating Liu and Yan Lu
A Hybrid Music Recommendation Algorithm Based on Attention Mechanism
    Weite Feng, Tong Li, Haiyang Yu, and Zhen Yang
Few-Shot Learning with Unlabeled Outlier Exposure
    Haojie Wang, Jieya Lian, and Shengwu Xiong
Fine-Grained Video Deblurring with Event Camera
    Limeng Zhang, Hongguang Zhang, Chenyang Zhu, Shasha Guo, Jihua Chen, and Lei Wang
Discriminative and Selective Pseudo-Labeling for Domain Adaptation
    Fei Wang, Youdong Ding, Huan Liang, and Jing Wen
Multi-level Gate Feature Aggregation with Spatially Adaptive Batch-Instance Normalization for Semantic Image Synthesis
    Jia Long and Hongtao Lu
Robust Multispectral Pedestrian Detection via Uncertainty-Aware Cross-Modal Learning
    Sungjune Park, Jung Uk Kim, Yeon Gyun Kim, Sang-Keun Moon, and Yong Man Ro
Time-Dependent Body Gesture Representation for Video Emotion Recognition
    Jie Wei, Xinyu Yang, and Yizhuo Dong
MusiCoder: A Universal Music-Acoustic Encoder Based on Transformer
    Yilun Zhao and Jia Guo
DANet: Deformable Alignment Network for Video Inpainting
    Xutong Lu and Jianfu Zhang
Deep Centralized Cross-modal Retrieval
    Zhenyu Wen and Aimin Feng
Shot Boundary Detection Through Multi-stage Deep Convolution Neural Network
    Tingting Wang, Na Feng, Junqing Yu, Yunfeng He, Yangliu Hu, and Yi-Ping Phoebe Chen
Towards Optimal Multirate Encoding for HTTP Adaptive Streaming
    Hadi Amirpour, Ekrem Çetinkaya, Christian Timmerer, and Mohammad Ghanbari
Fast Mode Decision Algorithm for Intra Encoding of the 3rd Generation Audio Video Coding Standard
    Shengyuan Wu, Zhenyu Wang, Yangang Cai, and Ronggang Wang
Graph Structure Reasoning Network for Face Alignment and Reconstruction
    Xing Wang, Xinyu Li, and Suping Wu
Game Input with Delay – A Model of the Time Distribution for Selecting a Moving Target with a Mouse
    Shengmei Liu and Mark Claypool
Unsupervised Temporal Attention Summarization Model for User Created Videos
    Min Hu, Ruimin Hu, Xiaocheng Wang, and Rui Sheng
Learning from the Negativity: Deep Negative Correlation Meta-Learning for Adversarial Image Classification
    Wenbo Zheng, Lan Yan, Fei-Yue Wang, and Chao Gou
Learning 3D-Craft Generation with Predictive Action Neural Network
    Ze-yu Liu, Jian-wei Liu, Xin Zuo, and Weimin Li
Unsupervised Multi-shot Person Re-identification via Dynamic Bi-directional Normalized Sparse Representation
    Xiaobao Li, Wen Wang, Qingyong Li, and Lijun Guo
Classifier Belief Optimization for Visual Categorization
    Gang Yang and Xirong Li
Fine-Grained Generation for Zero-Shot Learning
    Weimin Sun, Jieping Xu, and Gang Yang
Fine-Grained Image-Text Retrieval via Complementary Feature Learning
    Min Zheng, Yantao Jia, and Huajie Jiang
Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations
    Luca Rossetto, Werner Bailer, and Abraham Bernstein
Learning Multi-level Interaction Relations and Feature Representations for Group Activity Recognition
    Lihua Lu, Yao Lu, and Shunzhou Wang
A Structured Feature Learning Model for Clothing Keypoints Localization
    Ruhan He, Yuyi Su, Tao Peng, Jia Chen, Zili Zhang, and Xinrong Hu
Automatic Pose Quality Assessment for Adaptive Human Pose Refinement
    Gang Chu, Chi Xie, and Shuang Liang
Deep Attributed Network Embedding with Community Information
    Li Xue, Wenbin Yao, Yamei Xia, and Xiaoyong Li
An Acceleration Framework for Super-Resolution Network via Region Difficulty Self-adaption
    Zhenfang Guo, Yuyao Ye, Yang Zhao, and Ronggang Wang
Spatial Gradient Guided Learning and Semantic Relation Transfer for Facial Landmark Detection
    Jian Wang, Yaoyi Li, and Hongtao Lu
DVRCNN: Dark Video Post-processing Method for VVC
    Donghui Feng, Yiwei Zhang, Chen Zhu, Han Zhang, and Li Song
An Efficient Image Transmission Pipeline for Multimedia Services
    Zeyu Wang
Gaussian Mixture Model Based Semi-supervised Sparse Representation for Face Recognition
    Xinxin Shan and Ying Wen
Author Index
Contents – Part II MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Yani Zhang, Huailin Zhao, Fangbo Zhou, Qing Zhang, Yanjiao Shi, and Lanjun Liang Tropical Cyclones Tracking Based on Satellite Cloud Images: Database and Comprehensive Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 Cheng Huang, Sixian Chan, Cong Bai, Weilong Ding, and Jinglin Zhang Image Registration Improved by Generative Adversarial Networks . . . . . . . . 26 Shiyan Jiang, Ci Wang, and Chang Huang Deep 3D Modeling of Human Bodies from Freehand Sketching . . . . . . . . . . 36 Kaizhi Yang, Jintao Lu, Siyu Hu, and Xuejin Chen Two-Stage Real-Time Multi-object Tracking with Candidate Selection. . . . . . 49 Fan Wang, Lei Luo, and En Zhu Tell as You Imagine: Sentence Imageability-Aware Image Captioning . . . . . . 62 Kazuki Umemura, Marc A. Kastner, Ichiro Ide, Yasutomo Kawanishi, Takatsugu Hirayama, Keisuke Doman, Daisuke Deguchi, and Hiroshi Murase Deep Face Swapping via Cross-Identity Adversarial Training . . . . . . . . . . . . 74 Shuhui Yang, Han Xue, Jun Ling, Li Song, and Rong Xie Res2-Unet: An Enhanced Network for Generalized Nuclear Segmentation in Pathological Images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Shuai Zhao, Xuanya Li, Zhineng Chen, Chang Liu, and Changgen Peng Automatic Diagnosis of Glaucoma on Color Fundus Images Using Adaptive Mask Deep Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Gang Yang, Fan Li, Dayong Ding, Jun Wu, and Jie Xu Initialize with Mask: For More Efficient Federated Learning . . . . . . . . . . . . 111 Zirui Zhu and Lifeng Sun Unsupervised Gaze: Exploration of Geometric Constraints for 3D Gaze Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Yawen Lu, Yuxing Wang, Yuan Xin, Di Wu, and Guoyu Lu
xxii Contents – Part II Median-Pooling Grad-CAM: An Efficient Inference Level Visual Explanation for CNN Networks in Remote Sensing Image Classification . . . . 134 Wei Song, Shuyuan Dai, Dongmei Huang, Jinling Song, and Liotta Antonio Multi-granularity Recurrent Attention Graph Neural Network for Few-Shot Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Xu Zhang, Youjia Zhang, and Zuyu Zhang EEG Emotion Recognition Based on Channel Attention for E-Healthcare Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Xu Zhang, Tianzhi Du, and Zuyu Zhang The MovieWall: A New Interface for Browsing Large Video Collections. . . . 170 Marij Nefkens and Wolfgang Hürst Keystroke Dynamics as Part of Lifelogging . . . . . . . . . . . . . . . . . . . . . . . . 183 Alan F. Smeaton, Naveen Garaga Krishnamurthy, and Amruth Hebbasuru Suryanarayana HTAD: A Home-Tasks Activities Dataset with Wrist-Accelerometer and Audio Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196 Enrique Garcia-Ceja, Vajira Thambawita, Steven A. Hicks, Debesh Jha, Petter Jakobsen, Hugo L. Hammer, Pål Halvorsen, and Michael A. Riegler MNR-Air: An Economic and Dynamic Crowdsourcing Mechanism to Collect Personal Lifelog and Surrounding Environment Dataset. A Case Study in Ho Chi Minh City, Vietnam. . . . . . . . . . . . . . . . . . . . . . . 206 Dang-Hieu Nguyen, Tan-Loc Nguyen-Tai, Minh-Tam Nguyen, Thanh-Binh Nguyen, and Minh-Son Dao Kvasir-Instrument: Diagnostic and Therapeutic Tool Segmentation Dataset in Gastrointestinal Endoscopy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 Debesh Jha, Sharib Ali, Krister Emanuelsen, Steven A. Hicks, Vajira Thambawita, Enrique Garcia-Ceja, Michael A. Riegler, Thomas de Lange, Peter T. Schmidt, Håvard D. Johansen, Dag Johansen, and Pål Halvorsen CatMeows: A Publicly-Available Dataset of Cat Vocalizations . . . . . . . . . . . 
230 Luca A. Ludovico, Stavros Ntalampiras, Giorgio Presti, Simona Cannas, Monica Battini, and Silvana Mattiello Search and Explore Strategies for Interactive Analysis of Real-Life Image Collections with Unknown and Unique Categories . . . . . . . . . . . . . . . . . . . 244 Floris Gisolf, Zeno Geradts, and Marcel Worring
Graph-Based Indexing and Retrieval of Lifelog Data . . . 256
    Manh-Duy Nguyen, Binh T. Nguyen, and Cathal Gurrin
On Fusion of Learned and Designed Features for Video Data Analytics . . . 268
    Marek Dobranský and Tomáš Skopal
XQM: Interactive Learning on Mobile Phones . . . 281
    Alexandra M. Bagi, Kim I. Schild, Omar Shahbaz Khan, Jan Zahálka, and Björn Þór Jónsson
A Multimodal Tensor-Based Late Fusion Approach for Satellite Image Search in Sentinel 2 Images . . . 294
    Ilias Gialampoukidis, Anastasia Moumtzidou, Marios Bakratsas, Stefanos Vrochidis, and Ioannis Kompatsiaris
Canopy Height Estimation from Spaceborne Imagery Using Convolutional Encoder-Decoder . . . 307
    Leonidas Alagialoglou, Ioannis Manakos, Marco Heurich, Jaroslav Červenka, and Anastasios Delopoulos
Implementation of a Random Forest Classifier to Examine Wildfire Predictive Modelling in Greece Using Diachronically Collected Fire Occurrence and Fire Mapping Data . . . 318
    Alexis Apostolakis, Stella Girtsou, Charalampos Kontoes, Ioannis Papoutsis, and Michalis Tsoutsos
Mobile eHealth Platform for Home Monitoring of Bipolar Disorder . . . 330
    Joan Codina-Filbà, Sergio Escalera, Joan Escudero, Coen Antens, Pau Buch-Cardona, and Mireia Farrús
Multimodal Sensor Data Analysis for Detection of Risk Situations of Fragile People in @home Environments . . . 342
    Thinhinane Yebda, Jenny Benois-Pineau, Marion Pech, Hélène Amieva, Laura Middleton, and Max Bergelt
Towards the Development of a Trustworthy Chatbot for Mental Health Applications . . . 354
    Matthias Kraus, Philip Seldschopf, and Wolfgang Minker
Fusion of Multimodal Sensor Data for Effective Human Action Recognition in the Service of Medical Platforms . . . 367
    Panagiotis Giannakeris, Athina Tsanousa, Thanasis Mavropoulos, Georgios Meditskos, Konstantinos Ioannidis, Stefanos Vrochidis, and Ioannis Kompatsiaris
SpotifyGraph: Visualisation of User’s Preferences in Music . . . 379
    Pavel Gajdusek and Ladislav Peska
A System for Interactive Multimedia Retrieval Evaluations . . . 385
    Luca Rossetto, Ralph Gasser, Loris Sauter, Abraham Bernstein, and Heiko Schuldt
SQL-Like Interpretable Interactive Video Search . . . 391
    Jiaxin Wu, Phuong Anh Nguyen, Zhixin Ma, and Chong-Wah Ngo
VERGE in VBS 2021 . . . 398
    Stelios Andreadis, Anastasia Moumtzidou, Konstantinos Gkountakos, Nick Pantelidis, Konstantinos Apostolidis, Damianos Galanopoulos, Ilias Gialampoukidis, Stefanos Vrochidis, Vasileios Mezaris, and Ioannis Kompatsiaris
NoShot Video Browser at VBS2021 . . . 405
    Christof Karisch, Andreas Leibetseder, and Klaus Schoeffmann
Exquisitor at the Video Browser Showdown 2021: Relationships Between Semantic Classifiers . . . 410
    Omar Shahbaz Khan, Björn Þór Jónsson, Mathias Larsen, Liam Poulsen, Dennis C. Koelma, Stevan Rudinac, Marcel Worring, and Jan Zahálka
VideoGraph – Towards Using Knowledge Graphs for Interactive Video Retrieval . . . 417
    Luca Rossetto, Matthias Baumgartner, Narges Ashena, Florian Ruosch, Romana Pernisch, Lucien Heitz, and Abraham Bernstein
IVIST: Interactive Video Search Tool in VBS 2021 . . . 423
    Yoonho Lee, Heeju Choi, Sungjune Park, and Yong Man Ro
Video Search with Collage Queries . . . 429
    Jakub Lokoč, Jana Bátoryová, Dominik Smrž, and Marek Dobranský
Towards Explainable Interactive Multi-modal Video Retrieval with Vitrivr . . . 435
    Silvan Heller, Ralph Gasser, Cristina Illi, Maurizio Pasquinelli, Loris Sauter, Florian Spiess, and Heiko Schuldt
Competitive Interactive Video Retrieval in Virtual Reality with vitrivr-VR . . . 441
    Florian Spiess, Ralph Gasser, Silvan Heller, Luca Rossetto, Loris Sauter, and Heiko Schuldt
An Interactive Video Search Tool: A Case Study Using the V3C1 Dataset . . . 448
    Abdullah Alfarrarjeh, Jungwon Yoon, Seon Ho Kim, Amani Abu Jabal, Akarsh Nagaraj, and Chinmayee Siddaramaiah
Less is More - diveXplore 5.0 at VBS 2021 . . . 455
    Andreas Leibetseder and Klaus Schoeffmann
SOMHunter V2 at Video Browser Showdown 2021 . . . 461
    Patrik Veselý, František Mejzlík, and Jakub Lokoč
W2VV++ BERT Model at VBS 2021 . . . 467
    Ladislav Peška, Gregor Kovalčík, Tomáš Souček, Vít Škrhák, and Jakub Lokoč
VISIONE at Video Browser Showdown 2021 . . . 473
    Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo, and Claudio Vairo
IVOS - The ITEC Interactive Video Object Search System at VBS2021 . . . 479
    Anja Ressmann and Klaus Schoeffmann
Video Search with Sub-Image Keyword Transfer Using Existing Image Archives . . . 484
    Nico Hezel, Konstantin Schall, Klaus Jung, and Kai Uwe Barthel
A VR Interface for Browsing Visual Spaces at VBS2021 . . . 490
    Ly-Duyen Tran, Manh-Duy Nguyen, Thao-Nhu Nguyen, Graham Healy, Annalina Caputo, Binh T. Nguyen, and Cathal Gurrin
Author Index . . . 497