Lec 18 - MPEG System - ISOBMF and DASH - ECE 5578 Multimedia Communication
ECE 5578 Multimedia Communication
Lec 18 - MPEG System - ISOBMFF and DASH
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346
http://l.web.umkc.edu/lizhu
Slides created with WPS Office Linux and EqualX LaTeX equation editor
ECE 5578 Multimedia Communication, 2021
Outline
  • Recap: QoE
  • MPEG File Format - ISOBMFF/MP4
  • MPEG DASH - Dynamic Adaptive Streaming over HTTP
  • Summary
Learning-based Point Cloud Compression
  Network details
  • Basic unit: Inception Residual Network (IRN)
  • Down-scaling: convolution with a stride of two
  • Up-scaling: transpose convolution with a stride of two
  [Figure: detailed illustration of Multiscale PCGC]
  Open source at: https://github.com/NJUVISION/PCGCv2
Recap - QoE
  • MOS scores: user mean opinion score
    o Need to control the statistical quality of the scores
    o Rather expensive to generate
  • Objective scores: MSE/PSNR
  [Figure: MOS (0 to 100) versus PSNR (15 to 50) for JPEG images and JPEG2000 images, fitted with a logistic function]
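The MSE-to-PSNR relation behind the objective scores can be sketched as follows. This is a minimal illustration, assuming 8-bit images (peak value 255); the function name `psnr` is a hypothetical helper, not something from the slides.

```python
import math

def psnr(mse: float, peak: float = 255.0) -> float:
    """PSNR in dB for a given mean squared error.

    peak is the maximum pixel value (255 is an assumption for 8-bit images).
    """
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak * peak / mse)

# Example: the MSE value used on the SSIM comparison slide.
print(f"{psnr(225.0):.2f} dB")
```

Because PSNR is a fixed function of MSE, images with the same MSE always receive the same PSNR, regardless of how the distortion looks to a viewer.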
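SSIM
  SSIM - Structural Similarity Index, built from luminance $l(x,y)$, contrast $c(x,y)$, and structure $s(x,y)$ terms:
  $$ l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} $$
  $$ \sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y) $$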
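A minimal sketch of SSIM over a single window, assuming the common simplifications that are not stated on the slide: equal exponents on the three terms, $C_3 = C_2/2$, and constants $C_1 = (0.01 \cdot 255)^2$, $C_2 = (0.03 \cdot 255)^2$. Under those assumptions the product $l \cdot c \cdot s$ collapses to the two-factor form used below. The function name `ssim_window` is hypothetical.

```python
from statistics import mean

def ssim_window(x, y, peak=255.0, k1=0.01, k2=0.03):
    """SSIM of two equal-length flat pixel lists treated as one window.

    Assumes C3 = C2 / 2 and unit exponents, so l*c*s simplifies to
    (2*mu_x*mu_y + C1)(2*sigma_xy + C2) / ((mu_x^2 + mu_y^2 + C1)(var_x + var_y + C2)).
    """
    n = len(x)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (var_x + var_y + c2)
    )

ramp = [10, 20, 30, 40]
print(ssim_window(ramp, ramp))        # identical windows
print(ssim_window(ramp, ramp[::-1]))  # same pixels, reversed structure
```

A mean SSIM (MSSIM) for a whole image would average this value over many local windows; that step is omitted here.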
SSIM Performance
  • SSIM correlates better with human perception than MSE
  [Figure: six versions of one image, labeled (MSE, MSSIM): (0, 1), (225, 0.949), (225, 0.989), (215, 0.671), (225, 0.688), (225, 0.723)]
Outline
  • Recap: HEVC Rate Control
  • MPEG File Format - ISOBMFF/MP4
  • MPEG DASH - Dynamic Adaptive Streaming over HTTP
  • Summary
MPEG File Format (FF)
  The problem
  • Different audio/video compression technologies produce different bitstreams
  • How can we create a format on which different applications can build a uniform API to interpret the bitstream and access the audio/visual content?
  The solution - ISOBMFF
  • ISO Base Media File Format, aka MP4
ISOBMFF Design Philosophy
  Key features
  • Object oriented
  • Binary, compact representation
  • Separates logical structure from physical storage
  • Hierarchical structure and layers of abstraction for easy access
  • Extensible: forward and backward compatible, supporting many storage and streaming applications (.mp4, DASH, MMT, Flash, etc.)
  Main applications
  • Storage
    o Stores audio/visual streams as separate tracks for ease of access and editing
    o Minimizes disk access by having a structure for random access
  • Local playback and remote streaming
    o Carries the timing information necessary for playback
    o Encapsulates media data for transport-agnostic streaming, e.g., over RTP, HTTP, or WebSocket
ISOBMFF File Logical Structure
  A file
  • Contains
    o Streamable timed data in tracks of a movie
    o Other data (untimed or non-streamable) in items
    o Or a combination of both
  • Defines a common timeline for all tracks for synchronization
    o Important for audio/visual content playback
  • Declares its type and its compatibility
    o For interoperability
ISOBMFF Items
  Untimed data (items)
  • Act like a global variable: data consumed as a whole, valid for the whole presentation time frame of the media data
    o E.g., copyright info
    o E.g., logos
  • If there are multiple items, the primary item is the entry point for access
  • Can be encrypted, requiring a key to access
  • Can be compressed
ISOBMFF Track
  A track
  • Corresponds to timed data of a specific type
    o Audio track of aac media type
    o Video track of avc media type
    o Video track of hevc media type
    o A hint track does not have media data
  • Is decomposed into samples
    o Each sample is associated with a unique timestamp
    o E.g., a frame in a video sequence
  • Is usually associated with a single decoder
    o The codec must be specified, e.g., avc, hevc, aac, or mp3
  • Has prescribed decoder configuration info
  • Can be linked to, grouped with, or an alternative to other tracks
    o E.g., multiple audio tracks of a movie for different languages
ISOBMFF Sample
  Track sample
  • Represents timed data used by a decoder at a given time on the common media timeline
    o DTS: Decoding Time Stamp
    o CTS: Composition Time Stamp (display time)
  • Sample properties
    o Size, position, random access, decoder config
  • Supports sub-samples
    o E.g., slices and tiles in HEVC
  • May be associated into sample groups
    o E.g., temporal scalable layers: all I frames, all B frames
  • Contiguous samples may be grouped into chunks
  • May have sample-specific auxiliary info
    o E.g., samples using the same crypto key
ISOBMFF Physical Structure - Boxes
  • ISOBMFF is physically organized into boxes
    o Byte-based stream; no data is outside a box
  • Box structure
    o Box header: a length field of 4 or 8 bytes, plus a type of 4 printable ASCII characters, e.g., trak, mdat, moov, moof
  • Hierarchical/extensible
    o The root is usually moov or meta
    o An unknown/undefined box can be skipped by the application
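The box layout described above can be walked with a few lines of code. The following is a minimal sketch, assuming the standard header layout (32-bit big-endian size, 4-character type, a 64-bit size following the type when the 32-bit size is 1, and size 0 meaning "to the end of the container"). The helpers `iter_boxes` and `make_box` and the tiny in-memory sample are hypothetical, for illustration only.

```python
import struct

def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:            # 64-bit size follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:          # box extends to the end of its container
            size = end - offset
        if size < header or offset + size > end:
            break                # malformed or truncated box: stop walking
        yield box_type.decode("ascii", "replace"), offset + header, offset + size
        offset += size           # unknown box types are skipped by size alone

def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (test helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Synthetic file: ftyp, then moov containing one trak, then mdat.
sample = (
    make_box(b"ftyp", b"isom\x00\x00\x00\x00")
    + make_box(b"moov", make_box(b"trak"))
    + make_box(b"mdat", b"\x00" * 4)
)
top_level = [t for t, _, _ in iter_boxes(sample)]
moov_children = [
    child
    for t, start, stop in iter_boxes(sample)
    if t == "moov"
    for child, _, _ in iter_boxes(sample, start, stop)
]
print(top_level, moov_children)
```

Note that the walker never needs to understand a box to move past it, which is what makes the format extensible.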
Important Boxes
  [Figure: boxes of a typical media (.mp4) file]
MP4 Box Example
  [Figure: 1-track media file - timed data]
Untimed Data Example
  [Figure: typical box structure for untimed data]
Separation of Metadata from Media Data
  • Allows ease of access
Media Data Box Hierarchy
  [Figure: audio/visual file structure]
Media Data Box (mdat)
  • Unstructured bag of bytes
    o Track info is needed to access the byte range
  • Data bytes are stored in one or more boxes of a specific type
    o Mainly the "mdat" box type
    o Sometimes "idat" for item data
  • This part is the compression-specific storage of samples
    o Bytes belonging to the same sample (e.g., a video frame) are stored contiguously
    o Samples can be organized into sample groups
    o Contiguous sample groups can be organized into chunks
    o Supports progressive download by interleaving chunks of audio/video data
  • Item data storage
    o Item and media data can be interleaved
ISOBMFF Open Source Tool
  • MP4Box - ParisTech/ENST
    o https://gpac.wp.mines-telecom.fr/mp4box/
  • To access a media file:
      MP4Box -diso foreman_320kpbs.mp4
    o This dumps an .xml file containing the whole box structure
Fragmentation for Progressive Download
  • The initial design allowed only one "moov" box per file
    o The writer has to wait for all frames to be written before saving the moov box
    o Not good for progressive download
  • Introducing "moof", the movie fragment box
    o A file can now have multiple segments of "moof" + n "traf" (track fragments)
    o Data is still in "mdat"
Fragmented ISOBMFF File
  Fragment structure:
  [Figure: initial moov box, followed by segment 1 moof box, segment 2 moof box]
Hint Track
  Purpose
  • Dedicated to interfacing with streaming protocols, e.g., RTP
  • Provides additional info to assist the streaming protocol's packetization process
  • Linked to the media data track
  Examples
  • Hint track for streaming MP4 using RTP
  • Hint track for packetization for FLUTE (rateless erasure correction)
ISOBMFF Media Timeline
  Boxes involved
  • "mdhd": gives the time scale, sample delta
  • "stts": provides the Decode Time Stamp (DTS) for each sample
  • "ctts": provides the Composition (display) Time Stamp (CTS) for each sample
  • "cslg": additional info for specific CTS/DTS configurations
  • "tfdt": time anchor for movie fragments
  • "trun": timing for movie fragments relative to "tfdt"
  Coding
  • "stts"/"trun": DTS(0) = 0, DTS(k) = sum(sample_delta, 0, k-1)
  • "ctts"/"trun": CTS = DTS + composition offset
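The DTS/CTS rules above can be sketched as follows, assuming the "stts" and "ctts" tables have already been parsed into run-length lists of (sample_count, value) pairs. The function names and the four-sample example (25 fps at a timescale of 1000, decode order I, P, B, B) are hypothetical illustrations.

```python
def expand(entries):
    """Expand run-length (sample_count, value) pairs into one value per sample."""
    return [value for count, value in entries for _ in range(count)]

def sample_times(stts, ctts=None):
    """Return (dts, cts) lists in timescale ticks.

    DTS(0) = 0 and DTS(k) is the sum of the first k sample deltas;
    CTS = DTS + composition offset (offset 0 when there is no ctts table).
    """
    deltas = expand(stts)
    offsets = expand(ctts) if ctts else [0] * len(deltas)
    dts, t = [], 0
    for delta in deltas:
        dts.append(t)
        t += delta
    cts = [d + o for d, o in zip(dts, offsets)]
    return dts, cts

timescale = 1000                                  # ticks per second (from "mdhd")
stts = [(4, 40)]                                  # four samples, 40 ticks each
ctts = [(1, 40), (1, 120), (2, 0)]                # offsets for I, P, B, B
dts, cts = sample_times(stts, ctts)
display_order = sorted(range(len(cts)), key=cts.__getitem__)
print(dts, cts, display_order)
print([c / timescale for c in cts])               # CTS in seconds
```

Sorting by CTS recovers the display order from the decode order, which is why both tables are needed once B frames are present.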
Movie Timeline Mapping
  [Figure: track sample presentation time mapping]
Random Access Point (RAP) Support
  • Sync sample box ("stss")
    o If present, RAPs are signaled at I frames
    o If absent, all samples are RAPs, e.g., audio frames (20 ms)
  • Sample groups
    o "rap": non-IDR intra frames
    o "roll": signals n_bytes samples to decode until the next perfect reconstruction can be achieved
    o "prol": for audio, n_bytes samples before perfect reconstruction can be achieved
  • Independent and disposable samples
    o is_leading: for open GoP, only the first is an I frame, the rest are all P frames
    o sample_depends_on: I frame or not
    o sample_has_redundancy: signals a redundant representation
  • "tref": track reference
    o Track n uses or refers to another track k
ISOBMFF File Types
  • Plain file
    o Simple recording of plain media data: data first, header last
    o "mdat" box, then "moov" box, e.g., foreman_320kbps_nf120.mp4
  • Progressive file
    o For progressive download or streaming
    o Header first, then media data
    o Interleaved chunks
  • Fragmented file
    o An initial moov box followed by multiple moof segments
    o Good for continuous recording
  • Segmented file
    o Self-contained, playable fragments in a single file or in separate files
    o For HTTP streaming (DASH)
    o Tools: segment file, indexing
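As a rough illustration of the file types above, the order of top-level boxes is often enough to tell the layouts apart. This is a heuristic sketch only: the function `classify_layout` and its labels are hypothetical, and it takes a list of top-level box types that is assumed to have been extracted already.

```python
def classify_layout(top_level_types):
    """Guess the ISOBMFF layout from the ordered list of top-level box types."""
    types = list(top_level_types)
    if "moof" in types:
        return "fragmented"            # movie fragments present
    if "moov" in types and "mdat" in types:
        if types.index("moov") < types.index("mdat"):
            return "progressive"       # header first, then media data
        return "plain"                 # data first, header last
    return "unknown"

print(classify_layout(["ftyp", "mdat", "moov"]))
print(classify_layout(["ftyp", "moov", "mdat"]))
print(classify_layout(["ftyp", "moov", "moof", "mdat", "moof", "mdat"]))
```

A segmented (DASH) file cannot be recognized this way alone, since it depends on how the fragments are split and indexed.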
Software Tool: FFMPEG
  Manipulate different video file formats
  • Download:
      youtube-dl https://www.youtube.com/watch?v=67OX7DisQdg
  • Get file info:
      ffmpeg -i kyoto*.mkv
      Input #0, matroska,webm, from 'Kyoto - Tango Chirimen - Journeys in Japan-7OX7DisQdg.mkv':
        Metadata:
          COMPATIBLE_BRANDS: iso6avc1mp41
          MAJOR_BRAND     : dash
          MINOR_VERSION   : 0
          ENCODER         : Lavf57.83.100
        Duration: 00:28:05.02, start: -0.007000, bitrate: 2242 kb/s
          Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn, 60 tbc (default)
          Metadata:
            HANDLER_NAME    : ISO Media file produced by Google Inc.
            DURATION        : 00:28:05.000000000
          Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
          Metadata:
            DURATION        : 00:28:05.021000000
Convert for Mobile Device Use
  Media file conversion
  • Reduce the size from 1080p to 360p
  • Video codec: h264, audio codec: aac
      ffmpeg -i Kyoto\ -\ Tango\ Chirimen\ -\ Journeys\ in\ Japan-67OX7DisQdg.mkv -vf scale=640:360 -vcodec h264 -b:v 380k -acodec aac -b:a 64k kyoto-360p.mp4
      frame=50550 fps=136 q=-1.0 Lsize= 91404kB time=00:28:05.03 bitrate= 444.4kbits/s speed=4.52x
      video:76382kB audio:13253kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.973595%
      [libx264 @ 0x55a3fa106be0] frame I:372 Avg QP:23.00 size: 24264
      [libx264 @ 0x55a3fa106be0] frame P:18657 Avg QP:25.58 size: 2490
      [libx264 @ 0x55a3fa106be0] frame B:31521 Avg QP:30.59 size: 721
      [libx264 @ 0x55a3fa106be0] consecutive B-frames: 13.5% 8.1% 6.1% 72.3%
      [libx264 @ 0x55a3fa106be0] mb I I16..4: 10.7% 59.5% 29.8%
      [libx264 @ 0x55a3fa106be0] mb P I16..4: 1.8% 4.2% 0.7% P16..4: 25.0% 8.4% 4.8% 0.0% 0.0% skip:55.2%
      [libx264 @ 0x55a3fa106be0] mb B I16..4: 0.4% 0.7% 0.1% B16..8: 26.8% 3.0% 0.5% direct: 0.8% skip:67.6% L0:40.8% L1:54.5% BI: 4.7%
      [libx264 @ 0x55a3fa106be0] final ratefactor: 25.85
      [libx264 @ 0x55a3fa106be0] 8x8 transform intra:62.2% inter:73.7%
      [libx264 @ 0x55a3fa106be0] coded y,uvDC,uvAC intra: 48.9% 50.5% 15.4% inter: 7.1% 5.1% 0.5%
      [libx264 @ 0x55a3fa106be0] kb/s:371.35
      [aac @ 0x55a3fa0bab00] Qavg: 906.779
Outline
  • Recap: Lecture 14
  • MPEG File Format - ISOBMFF/MP4
  • MPEG DASH - Dynamic Adaptive Streaming over HTTP
  • Summary
Internet Multimedia Protocol Stack
  Where is DASH?
  [Figure: protocol stack with side labels APPLICATION and KERNEL; layers from top to bottom]
  • Application: media encapsulation (H.264, MPEG-4), synchronization service, DASH, SIP, RTSP, RSVP, RTCP
  • Layer 5 (Session): HTTP, RTP
  • Layer 4 (Transport): TCP, DCCP, UDP
  • Layer 3 (Network): IP version 4, IP version 6
  • Layer 2 (Link/MAC): AAL3/4, AAL5, MPLS; ATM/fiber optics, Ethernet/WiFi
Challenges with Internet Video Delivery
  3 DoF in the video delivery problem
  • Video not accessible
    o Video behind firewall
    o Plugins not available
    o Bandwidth not sufficient
  • Wrong and non-trusted device
    o Wrong format
  • Low Quality of Experience
    o Long start-up latency
    o Frequent rebuffering
    o Low playback quality
    o No lip-sync
DASH in a Nutshell
  • Dynamic Adaptive Streaming over HTTP: an OTT solution
  • HTTP: the de facto Internet transport infrastructure
DASH Key Features
  • Imitation of streaming via short downloads
    o Downloads the desired portion in small chunks to minimize bandwidth waste
    o Enables monitoring consumption and tracking clients
  • Adaptation to dynamic conditions and device capabilities
    o Adapts to dynamic conditions anywhere on the path through the Internet and/or home network
    o Adapts to the display resolution, CPU, and memory resources of the client
    o Facilitates the "any device, anywhere, anytime" paradigm
  • Improved Quality of Experience (QoE)
    o Enables faster start-up and seeking (compared to progressive download), and quicker buffer fills
    o Reduces skips, freezes, and stutters
  • Use of HTTP, the de facto Internet transport
    o Well-understood naming/addressing approach and authentication/authorization infrastructure
    o Provides easy traversal of all kinds of middleboxes (e.g., NATs, firewalls)
    o Enables cloud access and leverages existing HTTP caching infrastructure (cheaper CDN costs)
Rate Adaptation in DASH
  Multiple-rate representations of the content
  • Different frame size
  • Different quality
  • Different bit rate
Media Presentation Data Model
  • MPD (Media Presentation Description): a manifest of the content available on the HTTP server
    o Lists the accessible segments and their timing
    o An .xml file retrieved by clients at the start of a DASH session
  Credits for figures on the following slides: Christian and Ali, Over the Top Content Delivery: State of the Art and Challenges Ahead, ICME 2015 Tutorial
DASH Temporal Model
  • Playback time is broken up into periods
  • Each period has multiple adaptation sets
  • Each adaptation set has multiple representations/sub-representations
  Credit: Iraj, DASH Tutorial
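The period / adaptation set / representation hierarchy can be read straight out of an MPD with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the embedded MPD is a hypothetical, heavily trimmed example (ids and bitrates are made up), and `list_representations` is an illustrative helper rather than part of any DASH library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal MPD for illustration; a real MPD also carries segment information.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" mediaPresentationDuration="PT30S">
  <Period id="p0">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v360" bandwidth="380000" width="640" height="360"/>
      <Representation id="v1080" bandwidth="2200000" width="1920" height="1080"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" lang="en">
      <Representation id="a64" bandwidth="64000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

def list_representations(mpd_xml):
    """Return (period id, mimeType, representation id, bandwidth in bit/s) tuples."""
    root = ET.fromstring(mpd_xml)
    found = []
    for period in root.findall("mpd:Period", NS):
        for aset in period.findall("mpd:AdaptationSet", NS):
            for rep in aset.findall("mpd:Representation", NS):
                found.append(
                    (period.get("id"), aset.get("mimeType"),
                     rep.get("id"), int(rep.get("bandwidth")))
                )
    return found

representations = list_representations(MPD_XML)
for row in representations:
    print(row)
```

A client would typically pick one representation per adaptation set (for example one video and one audio) and may switch between them at segment boundaries.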
DASH Representations
  A Representation
  • Is one of the alternative choices of the media content, typically differing by encoding parameters such as bitrate, resolution, language, codec, etc.
  • Is aligned within the period's boundaries
  • Consists of one or more Segments
    o Contains an initialisation segment, or all segments are self-initialising
  • May contain zero or more Sub-Representations
  A Sub-Representation
  • Provides the ability to access a lower-quality version of the Representation
  • Examples:
    o Audio track in a multiplexed Representation
    o Lower frame rate for efficient fast-forward
DASH Segments
  • A Segment is a unit that can be referenced by an HTTP-URL included in the MPD
    o An "http://" URL, optionally with a byte range
  • Segment availability duration: the time window during which the Segment can be accessed via its HTTP-URL
  • Each Representation has at most one SegmentInfo element, which provides:
    o Presence or absence of Initialisation and Index Segment information
    o HTTP-URL and byte range for each segment
    o Segment availability start time and availability end time for the live case
    o Approximate media start time and duration of each segment
    o Fixed or variable duration
Segment Initialization
  [Figure: ISOBMFF-based box representation]
DASH Segment Indexing
  [Figure: information provided by the segment index, in ISO box structure]
DASH Adaptation Scenario
  Client-driven operation
  • The client measures throughput and retrieves the next segment that fits the channel condition and display requirements
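One simple way a client could implement this is a throughput-based rule: measure the rate of the last download, then pick the highest-bitrate representation that fits within a safety margin. This is a sketch of one possible heuristic, not the behavior defined by the standard (adaptation logic is left to the client); the function names, the 0.8 safety factor, and the bitrate ladder are assumptions for illustration.

```python
def throughput_bps(num_bytes, seconds):
    """Measured throughput of one segment download, in bits per second."""
    return num_bytes * 8 / seconds

def pick_representation(bandwidths_bps, measured_bps, safety=0.8):
    """Pick the highest bitrate not exceeding safety * measured throughput.

    Falls back to the lowest available bitrate when nothing fits.
    """
    budget = safety * measured_bps
    fitting = [b for b in bandwidths_bps if b <= budget]
    return max(fitting) if fitting else min(bandwidths_bps)

ladder = [380_000, 1_000_000, 2_200_000]       # hypothetical bitrate ladder (bit/s)
measured = throughput_bps(187_500, 1.0)        # 187,500 bytes downloaded in 1 s
print(measured, pick_representation(ladder, measured))
```

Real players usually smooth the throughput estimate over several segments and also consider buffer level and display size before switching.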
DASH Scope
  What needs to be specified?
  • MPD: the core of the DASH data formats
  • DASH behavior is outside the normative part, but is covered by implementation guidelines and amendments
DASH Standardization Effort
  [Figure: DASH technology sources and history]
DASH Software Tools
  • MP4Box: DASH content preparation
    o How to create the MPD
    o How to create media Segments/Sub-segments
    o How to create the Segment Index
    o Example:
        MP4Box -dash 10000 -frag 1000 -rap -segment-name myDash -subsegs-per-sidx 5 -url-template test.mp4
  • DASH.js: JavaScript client
    o http://dashif.org/reference/players/javascript/1.4.0/samples/dash-if-reference-player/
Summary
  • ISOBMFF file format (aka MP4)
    o A universal abstraction and access tool for audio/visual media files created by different compression technologies
    o Supports disk storage and over-the-network streaming
    o Very successful, and the basis for a variety of technologies we use these days
    o Design based on separating the logical description from the physical box structure
  • DASH
    o Addresses challenges in OTT video delivery
    o Utilizes the de facto Internet infrastructure: HTTP transport
    o Adaptation through multiple-rate representations
    o Dynamically driven by the client
    o Content pieces: MPD, Media Segments, Segment Index