Lec 18 - MPEG System - ISOBMF and DASH - ECE 5578 Multimedia Communication

Page created by Randy Young
 
ECE 5578 Multimedia Communication
        Lec 18 - MPEG System - ISOBMF and DASH

                                                 Zhu Li
                                         Dept of CSEE, UMKC
                       Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346.
                                         http://l.web.umkc.edu/lizhu

                                                        slides created with WPS Office Linux and EqualX LaTeX equation editor

ECE 5578 Multimedia Communication, 2021
Outline
     Recap QoE
     MPEG File Format – ISOBMFF/Mp4
     MPEG DASH – Dynamic Adaptive Streaming over HTTP
    Summary

Learning based Point Cloud Compression

     Network Details
          •    Basic Unit: Inception Residual Network (IRN)
          •    Down-scaling: Convolution with a stride of two
          •    Up-scaling: Transpose convolution with a stride of two

                                   Detailed illustration of multiscale PCGC
      Open source at:
     https://github.com/NJUVISION/PCGCv2

Recap - QoE
     MOS scores
           user mean opinion score
           requires statistical control of score quality
           rather expensive to generate

     objective scores: MSE/PSNR

      [Figure: MOS vs. PSNR for JPEG and JPEG2000 images with a logistic-function fit; MOS axis from 0 to 100, PSNR axis from 15 to 50]
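The objective scores above are straightforward to compute. A minimal Python sketch, assuming 8-bit samples (peak value 255) given as flat lists; `logistic_mos` is a hypothetical four-parameter logistic of the kind used for the fitted curve, and its parameters `b1`..`b4` would have to be fitted to subjective data (no fitted values are given here).

```python
import math

def mse(x, y):
    """Mean squared error between two equal-length pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, peak=255.0):
    """PSNR in dB; peak=255 assumes 8-bit samples."""
    e = mse(x, y)
    if e == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / e)

def logistic_mos(p, b1, b2, b3, b4):
    """Hypothetical logistic mapping from PSNR p to predicted MOS.

    b1: upper asymptote, b2: lower asymptote, b3: midpoint (dB), b4: slope scale.
    These parameters are placeholders that would be fitted to MOS data.
    """
    return b2 + (b1 - b2) / (1.0 + math.exp(-(p - b3) / b4))
```

For example, a uniform error of 15 gray levels gives MSE = 225 and a PSNR of about 24.6 dB, regardless of how visible that error is; this is the gap that the logistic fit and SSIM try to address.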

SSIM
     SSIM - Structural Similarity Index

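           $\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma}$

           $l(x,y) = \dfrac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad c(x,y) = \dfrac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$

           $s(x,y) = \dfrac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}, \qquad \sigma_{xy} = \dfrac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$

           where $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$ are small stabilizing constants ($L$: dynamic range of the pixel values) and $C_3$ is commonly set to $C_2/2$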
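The formulas above can be evaluated over a single window as a sketch. This assumes α = β = γ = 1, C3 = C2/2, and the commonly used K1 = 0.01, K2 = 0.03; practical implementations compute SSIM over local (often Gaussian-weighted) windows and average the results into the mean SSIM (MSSIM).

```python
import math

def ssim_global(x, y, L=255.0, K1=0.01, K2=0.03):
    """Single-window SSIM with alpha = beta = gamma = 1 and C3 = C2 / 2."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences of at least 2 samples")
    mx, my = sum(x) / n, sum(y) / n
    # Sample (N-1) statistics, matching the sigma_xy definition above
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    lum = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)   # luminance term l(x, y)
    con = (2 * sx * sy + C2) / (vx + vy + C2)             # contrast term c(x, y)
    stru = (cxy + C3) / (sx * sy + C3)                    # structure term s(x, y)
    return lum * con * stru
```

Identical inputs give 1; a pure brightness shift leaves contrast and structure unchanged and lowers only the luminance term, while reversing the signal makes the structure term, and therefore SSIM, negative.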

SSIM Performance
     better correlates with human perception

          MSE=0, MSSIM=1                   MSE=225, MSSIM=0.949   MSE=225, MSSIM=0.989

      MSE=215, MSSIM=0.671                 MSE=225, MSSIM=0.688    MSE=225, MSSIM=0.723
Outline
     Recap HEVC Rate Control
     MPEG File Format – ISOBMFF/Mp4
     MPEG DASH – Dynamic Adaptive Streaming over HTTP
    Summary

MPEG File Format (FF)
     The problem
           Different audio/video compression technologies produce different
            bitstreams
           How can we create a format on which different applications can build
            uniform APIs to interpret the bitstream and access the audio/visual content?
     The Solution – ISOBMFF
           ISO Base Media File Format (ISO/IEC 14496-12), a.k.a. MP4

ISOBMFF Design Philosophy
     Key Features
             Object Oriented
             Binary, Compact Representation
             Separates logical structure from physical storage
             Hierarchical structure and layers of abstraction for easy access
             Extensible: forward and backward compatible, supporting many
              storage and streaming applications, e.g., .mp4, DASH, MMT, Flash
    Main Applications
           Storage:
              o Store audio/visual streams as separate tracks for ease of access and
                editing
              o Minimize disk access by providing a structure for random access
           Local Playback and Remote Streaming:
              o Necessary timing information for playback
              o Media data encapsulation designed for transport-agnostic
                streaming, e.g., over RTP, HTTP, or WebSocket

ISOBMFF File Logical Structure
    A file contains
           Streamable timed data in tracks of a movie
           Other data (untimed or non streamable) in items
           Or a combination of both
    Defines a common timeline for all tracks for synchronization
           Important for audio/visual content playback
     Declares its type and its compatibility
           For interoperability

ISOBMFF items
     Untimed data, items:
           Like a global variable: consumed as a whole and valid for the
            entire presentation time span of the media data
               o E.g., copyright information
               o E.g., logos
           With multiple items, the primary item is the entry point for access
           Can be encrypted (a key is required to access them)
           Can be compressed

ISOBMFF track
     A track
           Carries timed data of a specific type
               o Audio track of AAC media type
               o Video track of AVC media type
               o Video track of HEVC media type
               o A hint track carries no media data of its own
           Is decomposed into samples
               o Each sample is associated with a unique timestamp,
                 e.g., a frame in a video sequence
           Is usually associated with a single decoder
               o The codec must be specified, e.g., AVC, HEVC, AAC, or MP3
           Carries prescribed decoder configuration information
           Can be linked to, grouped with, or declared an alternative to other tracks, e.g.,
               o multiple audio tracks for a movie in different languages

ISOBMFF sample
     Track sample:
           Represents timed data used by a decoder at a given time on the
            common media timeline
              o DTS: Decoding Time Stamp
              o CTS: Composition Time Stamp (display time)
           Sample Properties
              o Size, position, random access, decoder config
           Supports sub-samples
              o E.g., slices and tiles in HEVC
           May be associated with sample groups
              o E.g., temporal scalability layers: all I frames, all B frames
           Contiguous samples may be grouped into chunks
           May have sample-specific auxiliary information:
              o E.g., samples using the same encryption key

ISOBMFF Physical Structure - Boxes
     ISOBMFF is physically organized into boxes
           Byte-oriented stream; all data in the file is contained in boxes
     Box Structure
           Box header length: 8 bytes (32-bit size + 4-character type), or 16 bytes
            when a 64-bit size is used
           The type is 4 printable ASCII characters, e.g., trak, mdat, moov, moof
     Hierarchical/Extensible
           The root of the metadata hierarchy is usually moov or meta
           Unknown/undefined boxes can be skipped by the application
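The box layout above can be walked with a few lines of code. A minimal Python sketch: each box starts with a 32-bit big-endian size (header included) and a 4-character type; size 1 means a 64-bit size follows, and size 0 means the box runs to the end of the enclosing data. The `CONTAINERS` set is an assumption covering only common pure container boxes (e.g., `meta` is left out because it carries version/flags before its children), and `uuid` user types are not decoded.

```python
import struct

# Pure container boxes whose payload is only child boxes (assumed, partial list).
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl", "dinf", "edts", "mvex", "moof", "traf"}

def parse_boxes(buf, start=0, end=None):
    """Yield (type, offset, size, header_len) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, btype = struct.unpack_from(">I4s", buf, pos)  # 32-bit size + 4-char type
        header = 8
        if size == 1:  # a 64-bit 'largesize' follows the type field
            if pos + 16 > end:
                raise ValueError(f"truncated largesize at offset {pos}")
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"bad box size {size} at offset {pos}")
        yield btype.decode("latin-1"), pos, size, header
        pos += size  # unknown boxes are skipped simply by jumping over them

def dump(buf, start=0, end=None, depth=0):
    """Return indented 'type (size bytes)' lines, descending into container boxes."""
    lines = []
    for btype, pos, size, header in parse_boxes(buf, start, end):
        lines.append("  " * depth + f"{btype} ({size} bytes)")
        if btype in CONTAINERS:
            lines.extend(dump(buf, pos + header, pos + size, depth + 1))
    return lines
```

Calling `dump()` on the bytes of an .mp4 file returns an indented tree of its boxes, similar in spirit to the MP4Box dump described later.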

Important Boxes
     Boxes in a typical media (.mp4) file

MP4 box example
     1-track media file – timed data

Untimed data example
     Typical box structure for untimed data

Separation of Meta data from Media Data
     Allows ease of access

Media Data Box Hierarchy
  Audio/Visual File Structure:

Media Data Box (mdat)
     Unstructured: a bag of bytes
           Track information is needed to locate the byte range of a sample
    Data bytes are stored in one or more boxes of a specific type
           Mainly “mdat” boxes
           Sometimes “idat” for item data
           The byte content is compression specific
     Sample storage
           Bytes belonging to the same sample (e.g., a video frame) are stored
            contiguously
           Samples can be organized into sample groups
           Runs of contiguous samples are organized into chunks
              o Supports progressive download by interleaving chunks of
                audio/video data
    Item data storage
           Item and media data can be interleaved
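Locating a sample inside “mdat” combines three sample-table boxes from the specification: “stsz” (sample sizes), “stsc” (samples per chunk, run-length coded by first chunk) and “stco” (chunk byte offsets; “co64” in files needing 64-bit offsets). A minimal Python sketch, assuming the tables are already parsed into lists and ignoring the sample description index carried in “stsc”:

```python
def sample_offsets(sample_sizes, stsc, chunk_offsets):
    """Absolute file offset of every sample.

    sample_sizes:  per-sample byte sizes (as in 'stsz')
    stsc:          (first_chunk, samples_per_chunk) runs; first_chunk is 1-based
                   and a run lasts until the next run's first_chunk
    chunk_offsets: absolute byte offset of each chunk (as in 'stco'/'co64')
    """
    offsets = []
    sample = 0
    for i, (first_chunk, per_chunk) in enumerate(stsc):
        last_chunk = stsc[i + 1][0] - 1 if i + 1 < len(stsc) else len(chunk_offsets)
        for chunk in range(first_chunk, last_chunk + 1):
            pos = chunk_offsets[chunk - 1]
            for _ in range(per_chunk):
                if sample == len(sample_sizes):
                    return offsets
                offsets.append(pos)          # samples within a chunk are contiguous
                pos += sample_sizes[sample]
                sample += 1
    return offsets
```

Because chunk offsets are absolute file positions, the chunks of an audio track and a video track can be interleaved freely inside “mdat”.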

ISOBMFF Open Source Tool
     MP4Box – ParisTech/ENST
           https://gpac.wp.mines-telecom.fr/mp4box/
     To inspect a media file:
           MP4Box -diso foreman_320kpbs.mp4
           This dumps the entire box structure to an .xml file

Fragmentation for Progressive Download
     The initial design allowed only one “moov” box per file
           The moov box can only be written after all frames have been written
           Not good for progressive download
    Introducing “moof” (movie fragment) boxes
           A file can now contain multiple fragments, each a “moof” with n “traf”
            (track fragment) boxes
           Media data is still stored in “mdat”

Fragmented ISOBMFF File
     Fragments structure (figure):
           Initial moov box
           Seg 1 moof box
           Seg 2 moof box
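The fragment layout above can be illustrated by serializing empty boxes. This is a hypothetical, non-playable skeleton in Python: a real fragmented file also needs `mvhd`/`trak` in “moov” and `mfhd`/`tfhd`/`tfdt`/`trun` inside each “moof”; only the top-level ordering (one “moov”, then a “moof” + “mdat” pair per fragment) is meant to be accurate.

```python
import struct

def box(btype, payload=b""):
    """Serialize one box: 32-bit big-endian size (header included), 4-char type, payload."""
    return struct.pack(">I4s", 8 + len(payload), btype.encode("ascii")) + payload

def fragmented_skeleton(mdat_payloads):
    """Hypothetical, NON-playable layout: ftyp, moov, then one moof + mdat pair per fragment.

    Omitted on purpose: mvhd/trak inside moov, and mfhd/tfhd/tfdt/trun inside each moof.
    """
    # ftyp payload: major brand, minor version, one compatible brand
    ftyp = box("ftyp", b"iso6" + struct.pack(">I", 0) + b"iso6")
    moov = box("moov", box("mvex"))  # 'mvex' announces that movie fragments follow
    out = ftyp + moov
    for payload in mdat_payloads:
        out += box("moof", box("traf"))  # fragment metadata (empty in this sketch)
        out += box("mdat", payload)      # fragment media data
    return out
```

For two fragments the top-level order is ftyp, moov, moof, mdat, moof, mdat, so a writer can flush each fragment as soon as it has been encoded instead of waiting for the whole recording.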

Hint Track
     Purpose
           Dedicated to interface with streaming protocols, e.g, RTP.
           Provides additional information to assist the streaming protocol's
            packetization process
           Linked to the media data track
           Examples: hint track for streaming MP4 using RTP, hint track for
            packetization for FLUTE (rateless erasure correction)

ISOBMFF Media Timeline
     Boxes involved
           “mdhd”: gives the media timescale (ticks per second) and duration
           “stts”: provides the Decoding Time Stamp (DTS) of each sample as sample deltas
           “ctts”: provides the Composition (display) Time Stamp (CTS) of each
            sample as an offset from its DTS
           “cslg”: additional info for specific CTS/DTS configurations
           “tfdt”: time anchor for movie fragments
           “trun”: timing for movie fragments relative to “tfdt”
     “stts”/“trun” coding:
           $\mathrm{DTS}(0) = 0, \qquad \mathrm{DTS}(k) = \sum_{i=0}^{k-1} \mathrm{sample\_delta}(i)$
    “ctts”/“trun” coding:
           $\mathrm{CTS}(k) = \mathrm{DTS}(k) + \mathrm{composition\_offset}(k)$
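The DTS/CTS rules above can be applied to run-length coded “stts” and “ctts” entries. A minimal Python sketch with a hypothetical four-frame I P B B sequence in decoding order at a timescale of 30 ticks per second; for a movie fragment the first DTS would come from “tfdt” instead of 0 (the `base` argument).

```python
def decode_times(stts, base=0):
    """stts: (sample_count, sample_delta) runs. DTS(0) = base, DTS(k) = base + sum of deltas."""
    dts, t = [], base
    for count, delta in stts:
        for _ in range(count):
            dts.append(t)
            t += delta
    return dts

def composition_times(dts, ctts):
    """ctts: (sample_count, sample_offset) runs. CTS(k) = DTS(k) + offset(k)."""
    offsets = [off for count, off in ctts for _ in range(count)]
    if len(offsets) != len(dts):
        raise ValueError("ctts must cover every sample")
    return [d + o for d, o in zip(dts, offsets)]

TIMESCALE = 30  # hypothetical mdhd timescale (ticks per second)
dts = decode_times([(4, 1)])                            # I P B B in decoding order
cts = composition_times(dts, [(1, 1), (1, 3), (2, 0)])
print(dts)  # [0, 1, 2, 3]
print(cts)  # [1, 4, 2, 3], i.e. display order I B B P
print([t / TIMESCALE for t in sorted(cts)])             # display times in seconds
```

The positive offset on the I frame keeps every CTS at or after its DTS; “cslg” exists to describe cases where offsets may be negative.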

Movie Timeline Mapping
     Track sample presentation time mapping

Random Access Point (RAP) Support
    “stss” (sync sample) box
           If present, lists the RAP samples, e.g., I frames
           If absent, every sample is a RAP, e.g., audio frames (20 ms)
    Sample groups
           “rap”: non-IDR intra frames usable as random access points
           “roll”: signals the number of samples that must be decoded until
            perfect reconstruction is achieved
           “prol”: for audio, the number of samples that must be decoded before
            perfect reconstruction is achieved (pre-roll)
     Independent and Disposable Samples
           is_leading: marks leading samples in an open GOP (samples that follow the
            I frame in decoding order but precede it in display order)
           sample_depends_on: whether the sample depends on other samples (I frame or not)
           sample_has_redundancy: signals redundant coding
     “tref”: Track Reference
           Track n uses or refers to another track k
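A player seeking to a given sample must start decoding at a random access point. A minimal Python sketch, assuming the “stss” entries (1-based sample numbers) have already been parsed; `None` stands for an absent “stss” box, in which case every sample is a RAP. Open-GOP “rap” groups and roll distances are not handled.

```python
import bisect

def rap_for_seek(target_sample, sync_samples):
    """Return the 1-based number of the last RAP at or before target_sample.

    sync_samples: sorted 1-based sample numbers from 'stss', or None when the
    box is absent (then every sample is a RAP).
    """
    if sync_samples is None:
        return target_sample
    i = bisect.bisect_right(sync_samples, target_sample)
    if i == 0:
        raise ValueError("no sync sample at or before the target")
    return sync_samples[i - 1]
```

Decoding then starts at the returned sample and runs forward to the target, discarding decoded frames that precede it.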

ISOBMFF file types
     Plain File:
           Simple recording of plain media data, data first, header last
           “mdat” box, then “moov” box, e.g, foreman_320kbps_nf120.mp4
    Progressive File:
           For progressive download or streaming
           Header first then media data
           Interleaved Chunks
    Fragmented File:
           A “moov” box followed by multiple “moof” fragments
           Good for continuous recording
    Segmented File:
           Self-contained, playable fragments in a single file or in separate files
           For HTTP streaming (DASH)
           Tools: segment file, indexing
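The file types above can be told apart, heuristically, from the order of top-level boxes. A minimal Python sketch; the labels are this sketch's own, it checks only box order (not chunk interleaving), and it treats a leading “styp” (segment type) box as a DASH-style media segment.

```python
import struct

def top_level_types(buf):
    """Return the 4-character types of the top-level boxes, in file order."""
    types, pos = [], 0
    while pos + 8 <= len(buf):
        size, btype = struct.unpack_from(">I4s", buf, pos)
        if size == 1:    # 64-bit largesize follows the type field
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
        elif size == 0:  # box extends to the end of the file
            size = len(buf) - pos
        if size < 8:
            raise ValueError(f"invalid box size at offset {pos}")
        types.append(btype.decode("latin-1"))
        pos += size
    return types

def classify(buf):
    """Heuristic label for the ISOBMFF file types (top-level box order only)."""
    types = top_level_types(buf)
    if types and types[0] == "styp":
        return "segment"          # DASH-style media segment
    if "moof" in types:
        return "fragmented"       # moov followed by moof/mdat fragments
    if "moov" in types and "mdat" in types:
        if types.index("moov") < types.index("mdat"):
            return "progressive"  # header first, then media data
        return "plain"            # media data first, header last
    return "unknown"
```

A “plain” recording can usually be turned into a “progressive” one by rewriting it with “moov” moved to the front, which requires updating the chunk offsets it contains.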

Software tool: FFMPEG
     Manipulate different video file formats:
           download:
          youtube-dl https://www.youtube.com/watch?v=67OX7DisQdg
           get file info:
          ffmpeg -i Kyoto*.mkv

         Input #0, matroska,webm, from 'Kyoto - Tango Chirimen - Journeys in Japan-67OX7DisQdg.mkv':
           Metadata:
            COMPATIBLE_BRANDS: iso6avc1mp41
            MAJOR_BRAND : dash
            MINOR_VERSION : 0
            ENCODER         : Lavf57.83.100
           Duration: 00:28:05.02, start: -0.007000, bitrate: 2242 kb/s
            Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
            Metadata:
             HANDLER_NAME : ISO Media file produced by Google Inc.
             DURATION         : 00:28:05.000000000
            Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
            Metadata:
             DURATION         : 00:28:05.021000000

Convert for mobile device use
     Media file conversion:
           Reduce the resolution from 1080p to 360p
           video codec: h264, audio codec: aac
           ffmpeg -i Kyoto\ -\ Tango\ Chirimen\ -\ Journeys\ in\ Japan-67OX7DisQdg.mkv -vf
            scale=640:360 -vcodec h264 -b:v 380k -acodec aac -b:a 64k kyoto-360p.mp4

        frame=50550 fps=136 q=-1.0 Lsize= 91404kB time=00:28:05.03 bitrate= 444.4kbits/s speed=4.52x
        video:76382kB audio:13253kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.973595%
        [libx264 @ 0x55a3fa106be0] frame I:372 Avg QP:23.00 size: 24264
        [libx264 @ 0x55a3fa106be0] frame P:18657 Avg QP:25.58 size: 2490
        [libx264 @ 0x55a3fa106be0] frame B:31521 Avg QP:30.59 size: 721
        [libx264 @ 0x55a3fa106be0] consecutive B-frames: 13.5% 8.1% 6.1% 72.3%
        [libx264 @ 0x55a3fa106be0] mb I I16..4: 10.7% 59.5% 29.8%
        [libx264 @ 0x55a3fa106be0] mb P I16..4: 1.8% 4.2% 0.7% P16..4: 25.0% 8.4% 4.8% 0.0% 0.0% skip:55.2%
        [libx264 @ 0x55a3fa106be0] mb B I16..4: 0.4% 0.7% 0.1% B16..8: 26.8% 3.0% 0.5% direct: 0.8% skip:67.6% L0:40.8% L1:54.5% BI: 4.7%
        [libx264 @ 0x55a3fa106be0] final ratefactor: 25.85
        [libx264 @ 0x55a3fa106be0] 8x8 transform intra:62.2% inter:73.7%
        [libx264 @ 0x55a3fa106be0] coded y,uvDC,uvAC intra: 48.9% 50.5% 15.4% inter: 7.1% 5.1% 0.5%
        [libx264 @ 0x55a3fa106be0] kb/s:371.35
        [aac @ 0x55a3fa0bab00] Qavg: 906.779
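The summary figures in this log can be checked by hand. The sketch below assumes that ffmpeg's “kB” here means 1024 bytes while “kbits/s” means 1000 bits per second, which is consistent with the reported numbers: 91404 kB over 28:05.03 gives about 444.4 kbit/s, and the container adds about 1.97% on top of the 76382 kB video and 13253 kB audio payloads.

```python
def parse_hms(t):
    """'HH:MM:SS.ss' -> seconds."""
    h, m, s = t.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def avg_bitrate_kbps(size_kib, duration_s):
    # Assumption: log sizes are in 1024-byte kB; bitrate is in 1000-bit kbit/s
    return size_kib * 1024 * 8 / duration_s / 1000

def muxing_overhead_pct(total_kib, stream_kib):
    """Container bytes beyond the elementary-stream payloads, in percent of the payload."""
    payload = sum(stream_kib)
    return 100.0 * (total_kib - payload) / payload

duration = parse_hms("00:28:05.03")
print(round(avg_bitrate_kbps(91404, duration), 1))              # 444.4
print(round(muxing_overhead_pct(91404, [76382, 13253]), 2))     # 1.97
```

The video stream alone (libx264 reports about 371 kbit/s against the requested 380k) accounts for most of the total; audio at 64k and the MP4 container overhead make up the rest.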

Outline
     Recap Lecture 14
     MPEG File Format – ISOBMFF/Mp4
     MPEG DASH – Dynamic Adaptive Streaming over HTTP
    Summary

Internet Multimedia Protocol Stack
 Where is DASH?
    [Figure: Internet multimedia protocol stack; application space on top, kernel from the transport layer down]
       Application: DASH, synchronization service, media encapsulation (H.264, MPEG-4)
       Layer 5 (Session): SIP, RTSP, RSVP, RTCP; HTTP, RTP
       Layer 4 (Transport): TCP, DCCP, UDP
       Layer 3 (Network): IP Version 4, IP Version 6
       Layer 2 (Link/MAC): AAL3/4, AAL5, MPLS; ATM/Fiber Optics, Ethernet/WiFi

Challenges with Internet Video Delivery
 Video not accessible
          Video behind firewall
          Plugins not available
          Bandwidth not sufficient
          Wrong or untrusted device
          Wrong format
 [Figure: 3 DoF in the video delivery problem]
 Low Quality of Experience
          Long start-up latency
          Frequent rebuffering
          Low playback quality
          No lip-sync

DASH in a Nutshell
Dynamic, Adaptive, Streaming over HTTP:
       An over-the-top (OTT) solution
       Runs over HTTP, the de facto Internet transport infrastructure

DASH Key Features
    Imitation of Streaming via Short Downloads
           Downloads the desired portion in small chunks to minimize bandwidth waste
           Enables monitoring consumption and tracking clients
    Adaptation to Dynamic Conditions and Device Capabilities
           Adapts to dynamic conditions anywhere on the path through the Internet
            and/or home network
           Adapts to display resolution, CPU and memory resources of the client
           Facilitates “any device, anywhere, anytime” paradigm
     Improved Quality of Experience (QoE)
           Enables faster start-up and seeking (compared to progressive download), and
            quicker buffer fills
           Reduces skips, freezes and stutters
    Use of HTTP – de-facto Internet transport
           Well-understood naming/addressing approach, and
            authentication/authorization infrastructure
           Provides easy traversal for all kinds of middleboxes (e.g., NATs, firewalls)
           Enables cloud access, leverages existing HTTP caching infrastructure
            (Cheaper CDN costs)

Rate Adaptation in DASH
     Multiple rate representation of content
           Different frame size
           Different quality
           Different bit rate

Media Presentation Data Model
   MPD (Media Presentation Description): a manifest of the content available on an HTTP server
          Lists the accessible segments and their timing
          An XML file retrieved by the client at the start of a DASH session
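Since the MPD is plain XML, a client can list what it may request with a standard XML parser. A minimal Python sketch using `xml.etree.ElementTree`; the MPD below is a hypothetical example written for illustration (IDs, bitrates, and codec strings are made up), following the MPD, Period, AdaptationSet, Representation hierarchy and the `urn:mpeg:dash:schema:mpd:2011` namespace.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical MPD for illustration only (IDs, bitrates, and codecs are made up).
EXAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT60S" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period id="p0">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v360" bandwidth="400000" width="640" height="360" codecs="avc1.64001e"/>
      <Representation id="v1080" bandwidth="2000000" width="1920" height="1080" codecs="avc1.640028"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" lang="en">
      <Representation id="a64" bandwidth="64000" codecs="mp4a.40.2"/>
    </AdaptationSet>
  </Period>
</MPD>"""

def list_representations(mpd_xml):
    """Return (period id, mimeType, representation id, bandwidth) tuples.

    Simplification: mimeType is read from the AdaptationSet only, although it
    may also appear on individual Representations.
    """
    root = ET.fromstring(mpd_xml)
    rows = []
    for period in root.findall("mpd:Period", NS):
        for aset in period.findall("mpd:AdaptationSet", NS):
            for rep in aset.findall("mpd:Representation", NS):
                rows.append((period.get("id"), aset.get("mimeType"),
                             rep.get("id"), int(rep.get("bandwidth"))))
    return rows
```

The `bandwidth` values listed this way are what a rate-adaptation client compares against its throughput estimate.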

 Credits to figures on following slides:
 Christian and Ali, Over the Top Content Delivery: State of the Art and Challenges Ahead, ICME 2015 Tutorial

DASH Temporal Model
   Playback time is broken up into periods
          Each period has multiple adaptation sets
          Each adaptation set has multiple representations/subrepresentations

                                                               Iraj: DASH Tutorial
DASH Representations
    A Representation:
           One of the alternative choices of the media content, typically differing
            in encoding parameters such as bitrate, resolution, language, or
            codec
           Aligned within the period’s boundaries
           Consists of one or more Segments
              o Either contains an initialisation segment, or all segments are self-initialising
                   May contain zero or more Sub-Representations

    A Sub-Representation:
           Provides access to a lower-quality version of the
            Representation. Examples:
               Audio track in a multiplexed Representation
               Lower frame rate for efficient fast-forward

DASH Segments
     A Segment is a unit that can be referenced by an HTTP-URL
     included in the MPD
           An “http://” URL, optionally with a byte range
     Segments’ availability duration:
           the time window during which the Segments can be accessed via the
            HTTP-URL
    Each representation has at most one SegmentInfo element which
     provides:
             Presence or absence of Initialisation and Index Segment information.
             HTTP-URL and byte range for each segment.
             Segment availability start time and availability end time for live case.
             Approximated media start time and duration of each segment.
             Fixed or variable duration.
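In the published standard, per-segment addressing is expressed with `SegmentBase`, `SegmentList`, or `SegmentTemplate` elements. A minimal Python sketch of `SegmentTemplate` URL expansion for the `$RepresentationID$`, `$Number$`, `$Bandwidth$`, and `$Time$` identifiers (with an optional `%0Nd` width) and `$$` as an escaped dollar sign, plus an HTTP `Range` header for byte-range requests; the template strings are hypothetical examples, and not every rule of the specification is covered.

```python
import re

# '$$', '$RepresentationID$', or '$Number|Bandwidth|Time$' with optional '%0Nd' width
_TOKEN = re.compile(r"\$\$|\$RepresentationID\$|\$(Number|Bandwidth|Time)(?:%0(\d+)d)?\$")

def expand_template(template, rep_id, number=None, bandwidth=None, time=None):
    """Substitute SegmentTemplate identifiers to build one segment URL."""
    ints = {"Number": number, "Bandwidth": bandwidth, "Time": time}

    def sub(m):
        token = m.group(0)
        if token == "$$":
            return "$"
        if token == "$RepresentationID$":
            return rep_id
        value = ints[m.group(1)]
        if value is None:
            raise ValueError(f"no value supplied for {token}")
        width = int(m.group(2)) if m.group(2) else 1
        return f"{value:0{width}d}"  # zero-padded to the requested width

    return _TOKEN.sub(sub, template)

def range_header(first_byte, last_byte):
    """HTTP/1.1 byte-range request for a (sub)segment; last_byte is inclusive."""
    return {"Range": f"bytes={first_byte}-{last_byte}"}
```

For instance, the hypothetical template `$RepresentationID$/seg-$Number%05d$.m4s` with representation `v360` and segment number 7 expands to `v360/seg-00007.m4s`; byte ranges are used instead when all segments live in one file and are located through a segment index.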

Segment Initialization
ISOBMFF-based box representation

DASH Segment Indexing
 Provides information in ISO box structure on

DASH Adaptation Scenario
     Client Driven Operation
           The client measures throughput and retrieves the next segment that fits
            the channel condition and display requirements
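Because the adaptation logic is left to the client, many heuristics exist. A minimal Python sketch of one simple throughput-based rule (not an algorithm from the standard): estimate throughput from each segment download, smooth it with an exponentially weighted moving average, then pick the highest representation bandwidth below a safety margin, optionally capped by what the display needs. The 0.8 margin, the 0.3 smoothing factor, and the bitrate ladder are arbitrary assumptions.

```python
def measure_throughput_bps(segment_bytes, download_seconds):
    """Throughput observed while downloading one segment, in bit/s."""
    return segment_bytes * 8 / download_seconds

def ewma(prev, sample, alpha=0.3):
    """Exponentially weighted moving average; the first sample seeds the estimate."""
    return sample if prev is None else alpha * sample + (1 - alpha) * prev

def select_representation(bandwidths_bps, throughput_bps, safety=0.8, max_bps=None):
    """Highest @bandwidth not above safety * throughput (and max_bps); else the lowest."""
    budget = safety * throughput_bps
    if max_bps is not None:   # e.g., no point fetching 1080p for a 360p screen
        budget = min(budget, max_bps)
    candidates = [b for b in bandwidths_bps if b <= budget]
    return max(candidates) if candidates else min(bandwidths_bps)

LADDER = [400_000, 1_000_000, 2_000_000]  # hypothetical @bandwidth values (bit/s)
estimate = None
for seg_bytes, secs in [(250_000, 1.0), (60_000, 1.2), (250_000, 0.8)]:
    estimate = ewma(estimate, measure_throughput_bps(seg_bytes, secs))
    print(round(estimate), select_representation(LADDER, estimate))
```

Smoothing keeps a single slow download from forcing an immediate quality drop; buffer-based rules, which also look at how many seconds are already buffered, are a common alternative.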

DASH Scope
    What needs to be specified?
           MPD – the core of the DASH data formats
           Client behavior is outside the normative part of DASH, but is covered
            by implementation guidelines and amendments

DASH standardization effort
   DASH technology source and history

DASH Software Tools
     MP4Box DASH content preparation
             How to create MPD
             How to create media Segments/Sub-segments
             How to create Segment Index
             Example:
               o MP4Box -dash 10000 -frag 1000 -rap -segment-name myDash -subsegs-per-sidx 5 -url-template test.mp4
     dash.js JavaScript client
           http://dashif.org/reference/players/javascript/1.4.0/samples/dash-if-reference-player/

Summary
     ISOBMFF File Formats (aka, MP4)
           A universal abstraction and access tool for audio/visual media files
            created by different compression technologies
           Supports disk storage and streaming over the network
           Very successful; the basis for a variety of technologies in use today
           Logical design based on separating logical description from physical
            box structure
     DASH
             Addresses challenges in OTT video delivery
             Utilizing the de-facto Internet infrastructure, HTTP transport
             Adaptation through multiple rate representation
             Dynamically driven by client
             Content Pieces: MPD, Media Segments, Segment Index.
