Lec 01: Introduction to Computer Vision - ECE/CS 5582/479 Computer Vision
Fall 2021: Zoom@533 999 8759, pwd: mcc2020, Thu 5:30pm-8:15pm
ECE/CS 5582/479 Computer Vision
Lec 01: Introduction to Computer Vision
Zhu Li, Dept of CSEE, UMKC
Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346. http://l.web.umkc.edu/lizhu
Slides created with WPS Office Linux and EqualX LaTeX equation editor
Z. Li: ECE 5582 Computer Vision, 2021 p.1
Outline
• Background
• Objective of the class
• Prerequisite
• Lecture Plan
• Course Project
• Q&A
An image is worth a thousand words…
• What we observe are pixels…
• The story: the train wreck at La Gare Montparnasse, 1895
• What computers can do these days: figure out the building, the train, and the people walking around
• Still a long way to go to figure out the semantics: the train crashed, and it is an abnormal event (context)
Figure: La Gare Montparnasse, 1895
Advances in Image Sensors: pixels and voxels
• Hyperspectral image sensor: I(x, y) ∈ R^D, e.g. D = 48
• 3D/depth sensor (LiDAR, stereo capture): I(x, y, z) ∈ R
• Panoramic video cameras: I(θ, φ), with θ, φ ∈ [0, 2π]
• Lightfield capture: lenslet images
More than 25 years of Image Retrieval Research…
• IEEE Computer 1995 Special Issue on Content Based Image Retrieval (CBIR)
• Dr. Vijay Raghavan, Distinguished Professor, Center for Adv. Computer Studies, Univ of Louisiana at Lafayette
NSF Digital Libraries Initiative
• Relevance Feedback in CBIR
MPEG-7 Visual Features (Circa 2003)
Color, Shape, Texture Features for Image Search
• Color: 1. Histogram (Scalable Color, Color Structure, GOF/GOP); 2. Dominant Color; 3. Color Layout
• Texture: Texture Browsing, Homogeneous Texture, Edge Histogram
• Shape: Contour Shape, Region Shape
• Motion: Camera Motion, Motion Trajectory, Parametric Motion, Motion Activity
ImageNet - Deep Learning Classification (2013)
• Tasks: Image Classification, Object Detection & Localization
• 2012: Fisher Vector (ECCV test of time award, 2020)
• 2013: Deep Learning ~ Convolutional Neural Networks (CNN), e.g. AlexNet
• 2016: (Very) Deep Learning ~ Residual Neural Networks (ResNet), K. He, MSRA/FAIR
MPEG CDVS (2015) - Identification
• Compact Descriptor for Visual Search (CDVS): object re-identification
• Applications: navigation, query by capture, AR/VR
• Technology: key point (SIFT) detection; Fisher Vector aggregation and hashing (for shortlisting); SIFT compression
• Performance: verification at 90+% precision on 1% recall; retrieval mAP in 80~90%
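The retrieval performance on this slide is quoted as mAP (mean average precision). As a rough illustration of how that metric is commonly computed, here is a minimal sketch. The function names and the toy ranked lists are hypothetical and are not taken from the CDVS evaluation framework.

```python
def average_precision(ranked_relevance, num_relevant):
    """Average precision for one query.

    ranked_relevance: list of booleans, True where the item returned at
    that rank is relevant to the query (rank 1 first).
    num_relevant: total number of relevant items in the database.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # precision measured at each rank where a relevant item appears
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0


def mean_average_precision(queries):
    """queries: list of (ranked_relevance, num_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)


if __name__ == "__main__":
    # Two toy queries (made-up data, for illustration only)
    toy_queries = [([True, False, True], 2), ([False, True], 1)]
    print(mean_average_precision(toy_queries))
```

Ranking a relevant item lower reduces the precision measured at its rank, so mAP rewards systems that place all correct matches near the top of the shortlist.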
Point Cloud Detection and Segmentation
Key problems for self-driving cars
• Depth from Stereo Images
• Optical Flow
• Scene Flow
• 2D/3D data fusion and registration
• Image/3D features for SLAM
• Higher level syntactic object/event recognition
Image Recognition Pipeline - Handcrafted
Handcrafted feature based pipeline: Image Formation → Feature Computing → Feature Aggregation → Classification
• Image formation: homography, color space
• Feature computing: color histogram, filtering, edge detection, HoG, Harris detector, SIFT
• Feature aggregation: BoW, VLAD, Fisher Vector, Supervector
• Knowledge/Data Base
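To make the "feature computing" stage concrete, here is a minimal sketch of one of the handcrafted features listed on this slide, a color histogram, together with a histogram-intersection similarity. The bin count, the function names, and the choice of histogram intersection as the matching score are illustrative assumptions, not the course's reference implementation.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Joint RGB histogram, L1-normalized.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Returns a list of length bins_per_channel ** 3 that sums to 1.
    """
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    count = 0
    for r, g, b in pixels:
        # map each 0..255 channel value to a bin index in 0..n-1
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[rb * n * n + gb * n + bb] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two L1-normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    red_patch = [(255, 0, 0)] * 4
    blue_patch = [(0, 0, 255)] * 4
    h_red = color_histogram(red_patch)
    h_blue = color_histogram(blue_patch)
    print(histogram_intersection(h_red, h_red))
    print(histogram_intersection(h_red, h_blue))
```

A histogram discards spatial layout, which is why the pipeline also lists edge, HoG, and key point features.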
Image Recognition Pipeline - Holistic/Deep Learning
• Holistic Image Analysis
• Direction Pixel Projection
• Subspace Models: Y = AX, where X ∈ R^(h×w) is an image of height h and width w
• Convolutional Neural Networks
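The subspace model Y = AX on this slide can be read as a linear projection of the whole image onto a few basis vectors. The sketch below assumes that the h×w image is first flattened into a vector of length h·w and that each row of A is one basis vector. Both the flattening convention and the example matrix are assumptions made for illustration, and in practice A would be learned (for example by PCA) rather than written by hand.

```python
def flatten(image):
    """Turn an h x w image (list of rows) into a vector of length h * w."""
    return [pixel for row in image for pixel in row]


def project(A, x):
    """Compute y = A x, where A is a list of d rows, each of length len(x)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


if __name__ == "__main__":
    image = [[1, 2],
             [3, 4]]                      # toy 2 x 2 "image"
    A = [[0.25, 0.25, 0.25, 0.25],        # hypothetical basis 1: mean intensity
         [0.5, 0.5, -0.5, -0.5]]          # hypothetical basis 2: top minus bottom
    y = project(A, flatten(image))
    print(y)                              # 2-dimensional holistic feature
```

The output has one coefficient per basis vector, so a 4-pixel image is reduced here to a 2-dimensional feature.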
Prerequisite & Textbook
• Prerequisite: for senior and graduate students in EE/CS
• Good Matlab/C programming skills; some Python is also desirable
• Have taken Signals & Systems or Digital Signal Processing, or have the consent of the instructor
• Different expectations and evaluation schemes apply to MS/PhD and undergraduate students
• Textbook: none required (saving $$); relevant chapters, papers, and notes will be distributed
• Key references:
  R. Szeliski, Computer Vision: Algorithms and Applications, Springer, 2014. URL: http://szeliski.org/Book/
  J. E. Solem, Programming Computer Vision with Python, O’Reilly, 2015. URL: http://programmingcomputervision.com/downloads/ProgrammingComputerVision_CCdraft.pdf
Tentative Lecture Plan
• Image Processing Basics
  Lectures: Camera model and image formation; Image filtering
  HW 1: Image Filtering and Features
• Image Features for Retrieval
  Lectures: Color Features; Texture and Shape Features; Basic Image Retrieval System and Metrics
  HW 2: Image Retrieval System
• Object Identification in Image
  Lectures: Key Point Detection; Key Point Feature Description; Fisher Vector Aggregation; MPEG Mobile Visual Search Technology and Standard
  HW 3: Keypoint Feature Aggregation
• Holistic Approach in Image Understanding
  Lectures: Subspace methods for face recognition: Eigenface, Fisherface, Laplacianface
  HW 4: Subspace method for face recognition
• Deep Learning in Image Classification
  Lectures: SoftMax and Triplet Loss networks
  HW 5: Deep learning methods in Aerial Image Classification
Potential Course/MS thesis Project
• Resources from last year: https://sce.umkc.edu/faculty-sites/lizhu/teaching/2019.spring.vision/main-cv.html
• Potential projects with 25% bonus points:
  Google Landmark Grand Challenge - Identification/Recognition (CDVS baseline, U of Surrey)
  Aerial Image Classification with blur & noise (AFRL project)
  VisDrone - UAV vision and object recognition (Pengfei)
  FlatCam Lensless Camera Face Verification Challenge (Salman)
  Real world smart phone image super resolution (NTIRE 2020)
  Fast Face Detection in compressed video (OpenCV)
Working with NSF Center for Big Learning
Short bio and research interests:
• Immersive visual communication: light field, point cloud and 360 video coding and low latency streaming
• Low Light, Res and Quality Image Understanding
• What DL can do for compression (intra, ibc, sr, inter, end2end)
• What compression can do for DL (compression, acceleration)
Multimedia Computing & Communication Lab, Univ of Missouri, Kansas City: signal processing and learning, image understanding, visual communication, mobile edge computing & communication
Dark Image Enhancement
• Design a network to denoise the low-light image in the Bayer domain
• Use wavelet decomposition to divide and conquer the problem by learning sensor field sub-images using separate networks
Figure 4: [a] Extreme low-light image from a Sony a7S II exposed for 1/25 second. [b] 250x intensity scaling of the image in [a]. [c] Ground truth image captured with a 10 second exposure time. [d] Output from SID[]. SID introduced some artifacts around the edge of the chair, as shown by the green arrow. [e] Output from ResLearning[]. The white region indicated by the arrow is not properly reconstructed as white compared to the ground truth image. [f] Our result.
Decomposition based residual learning from sensor field
• Decomposition of the target image via wavelet
• Adaptive loss functions for different subbands to exploit the strong texture prior
Figure 12: Overview of our wavelet decomposition based network. The first stage learns the decomposed image and uses the inverse wavelet transform to reconstruct the denoised 4-channel image. The second stage uses an off-the-shelf ISP to enhance the image and convert it into a 3-channel sRGB image.
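To illustrate what a wavelet decomposition into subbands looks like, here is a minimal sketch of a one-level 2D Haar transform and its inverse on a single channel. The slides do not say which wavelet the network uses, so the choice of Haar, the function names, and the toy input are all assumptions for illustration only.

```python
def haar_dwt2(channel):
    """One-level 2D Haar transform of a channel with even height and width.

    Returns four half-size subbands: approximation, column-difference
    detail, row-difference detail, and diagonal detail.
    """
    approx, col_detail, row_detail, diag_detail = [], [], [], []
    for i in range(0, len(channel), 2):
        ll, lh, hl, hh = [], [], [], []
        for j in range(0, len(channel[0]), 2):
            a, b = channel[i][j], channel[i][j + 1]
            c, d = channel[i + 1][j], channel[i + 1][j + 1]
            ll.append((a + b + c + d) / 2)   # local average (low-pass)
            lh.append((a - b + c - d) / 2)   # difference between columns
            hl.append((a + b - c - d) / 2)   # difference between rows
            hh.append((a - b - c + d) / 2)   # diagonal difference
        approx.append(ll)
        col_detail.append(lh)
        row_detail.append(hl)
        diag_detail.append(hh)
    return approx, col_detail, row_detail, diag_detail


def haar_idwt2(approx, col_detail, row_detail, diag_detail):
    """Inverse of haar_dwt2: rebuild the full-size channel from the subbands."""
    out = []
    for ll_row, lh_row, hl_row, hh_row in zip(approx, col_detail, row_detail, diag_detail):
        top, bottom = [], []
        for ll, lh, hl, hh in zip(ll_row, lh_row, hl_row, hh_row):
            top += [(ll + lh + hl + hh) / 2, (ll - lh + hl - hh) / 2]
            bottom += [(ll + lh - hl - hh) / 2, (ll - lh - hl + hh) / 2]
        out += [top, bottom]
    return out


if __name__ == "__main__":
    channel = [[1, 2, 3, 4],
               [5, 6, 7, 8]]
    subbands = haar_dwt2(channel)
    print(subbands)
    print(haar_idwt2(*subbands))
```

Because the transform is invertible, a network can predict each subband separately, with its own loss, and the denoised channel is recovered by the inverse transform.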
Experimental Results
Remote Sensing & Vision Highlights (AFOSR)
"Hyperspectral Image Classification with Attention Aided CNNs", IEEE Trans. on Geoscience & Remote Sensing (T-GRS), 2020.
Attention CNN for Hyperspectral Image Classification
• Introduces a dual-stream network architecture with separate attention models for spatial and spectral feature maps
• Achieves SOTA performance
"PRINET: A Prior Driven Spectral Super-Resolution Network", IEEE International Conf on Multimedia & Expo (ICME), London, 2020.
PRINET: Spectral Super Resolution
• Super-resolves hyperspectral information from RGB inputs
• A dual-loss network that learns correlation-decomposed HSI images
• Achieves new SOTA performance
Deep Guided Filtering Deblocking
• The residual frame can be used as the guidance for the in-loop filter of the reconstructed frame
• Larger residuals indicate larger reconstruction errors
Coding-prior-based in-loop filter
• The residual frame is used as the additional input
• Specific networks for reconstruction and residual
• Residual Network: residual blocks
• Reconstruction Network: down-sampling and up-sampling
Experimental results
Comparison with VRCNN
• Intra: 2.1% improvement
• Inter: 0.7% improvement
Radar Signal Learning for Privacy Preserving Fall Detection
• Use case: seniors assisted living, fall detection
• Approach: 77 GHz portable radar array sensor setup with horizontal and vertical scanning, 4x2 Tx/Rx
• Radar signal low-dimension embedding + LSTM action recognition
Figure 1. mmWave radar based fall detector (panels over time: RGB images from RealSense and range-angle reflection heatmaps, for non-falls and falls)
Neural network processing
• Human activities are continuous dynamic patterns that can be recognized through both spatial and temporal dependencies.
• We use successive radar reflection heatmaps to represent human activities.
• PCA is adopted as the RLDE algorithm to project the reflection heatmaps {H, V} onto a low-dimensional subspace P, eliminating spatial redundancies.
• The proposed RNN with LSTM units exploits the changes of motion in the temporal domain.
• The softmax layer operates as a classifier, and the cross-entropy function is adopted as the objective function.
Figure 3. Architecture of RNN with LSTM units (heatmaps H_t, V_t pass through RLDE to S_t, then through the LSTM cells to a softmax layer)
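The classifier head described on this slide combines a softmax layer with a cross-entropy objective. A minimal sketch of those two pieces is shown below, using only the standard library. The logit values and the class index are made up for illustration and have nothing to do with the trained model.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the true class under the predicted probabilities."""
    return -math.log(probs[target_index])


if __name__ == "__main__":
    # Hypothetical scores for 7 activity classes from the last LSTM state
    logits = [0.2, 2.5, -1.0, 0.0, 0.3, 1.1, -0.4]
    probs = softmax(logits)
    print(probs)
    print(cross_entropy(probs, target_index=1))
```

The loss is small when the probability assigned to the true class is close to 1 and grows without bound as that probability approaches 0.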
Extensive experiment
• Multiple human activity detection: 7 categories of human activities are labeled: Boxing, Falling, Jogging, Jump, Pick up, Stand up & Walking.
• Average inference time complexity: RLDE + LSTM: 0.06042 sec; 3DCNN: 7.336 sec
Figure 4. Accuracy of Multiple Human Activities Detecting
Confusion matrix of multiple human activities (rows: true class; nonzero entries listed in predicted-class order boxing, falling, jogging, jump, pickup, standup, walking):
  boxing: 97.7%, 2.3%
  falling: 1.2%, 69.4%, 1.2%, 1.2%, 3.5%, 15.3%, 8.2%
  jogging: 100.0%
  jump: 1.8%, 96.4%, 1.8%
  pickup: 5.9%, 91.2%, 2.9%
  standup: 32.1%, 5.7%, 49.1%, 13.2%
  walking: 0.7%, 99.3%
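Each row of a confusion matrix like the one on this slide is obtained by dividing the prediction counts for one true class by the number of test samples of that class. The sketch below shows that normalization. The counts are hypothetical: they were chosen so that the result matches the percentages reported for the "falling" row, but the actual sample counts are not given in the slides.

```python
CLASSES = ["boxing", "falling", "jogging", "jump", "pickup", "standup", "walking"]


def row_percentages(counts):
    """Normalize one confusion-matrix row of counts to percentages (1 decimal)."""
    total = sum(counts)
    return [round(100.0 * c / total, 1) for c in counts]


if __name__ == "__main__":
    # Hypothetical counts of predictions for samples whose true class is "falling"
    falling_counts = [1, 59, 1, 1, 3, 13, 7]
    for name, pct in zip(CLASSES, row_percentages(falling_counts)):
        print(f"falling -> {name}: {pct}%")
```

The diagonal entry of a row is the per-class recall, which is why "falling" and "standup" stand out as the hardest classes in the figure.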
Internship Opportunities
• Industry Partners
• US Citizens (send me your contact if interested): AFRL, JAIC
Course Outcome
Upon completion of the course you will be able to:
• Understand the basic operations in image formation and filtering
• Understand basic image features for retrieval: color, shape, texture
• Understand key point features and aggregation in object identification
• Understand the holistic appearance modeling approach in image understanding
• Understand the latest image analysis and understanding techniques like deep learning
• Apply the knowledge and algorithms to solve real world image understanding and retrieval problems
• Be well prepared for conducting advanced research and pursuing a career/PhD in this topic area (PhD qualifier required course)
Grading (total 100pts + bonus)
• 5 Homeworks (50pts)
  Image Filtering and Basic Features
  Image Retrieval System and Performance Metrics
  Key Point Feature and Fisher Vector Aggregation in Object Identification
  Subspace Models in Image Understanding
  Deep Learning Aggregation in Classification
• 2 Quizzes (20pts): relax, the quiz is also on me, to see where you guys stand
  Quiz-1: Part I and II
  Quiz-2: Part III and IV
• Project (30pts)
  Original work leading to publication: discuss with me by mid-October (up to 15 bonus pts)
  Regular project: assigned papers to read, implement a certain aspect, and do a presentation
Logistics
• Office Hour: Thu 2:30-4:30pm on Zoom, or by appointment
• TA: Rijun Liao
  Lab sessions are planned to cover certain aspects of the software tools.
  Office Hour: TBA
• Course Resources: Box folder with slides, lecture video, references, data set, and software (Password: ECE5582CV): https://umkc.box.com/s/zwj3nxrjbh1qzjctp7qhoru044grv5zf
• Main communication: class emails, homework submission via Canvas, Zoom meetings/office hours
• Additional references, software, and data sets will be announced.
Q&A