GET TO KNOW THE NVIDIA GRID™ SDK - Shounak Deshpande, NVIDIA - GTC On-Demand

 
April 4-7, 2016 | Silicon Valley

GET TO KNOW
THE NVIDIA GRID™ SDK
Shounak Deshpande, NVIDIA
AGENDA

Background
NVIDIA GRID SDK
Measuring Performance
Maximizing Performance
Interactive Question-Answer Session
CLOUD/REMOTE GRAPHICS

VDI Enterprise, Remote Workstation
    VMware, Citrix, Dassault, and more

Game streaming
    GeForce NOW

    Windows DirectX / OpenGL

    Linux OpenGL
REMOTE GRAPHICS ECOSYSTEM

[Diagram: the client sends user input over the IP network to the remote graphics server, then decodes and renders the returned stream. On the server (NIC, CPU), frames are rendered, captured, and encoded before being sent back.]
GRID SW AND HW STACK COMPONENTS

Streaming
    Capture (pixel grabbing)
    HW-accelerated video compression
    HW-accelerated video decoding

Virtualization
    Graphics shim layers (app streaming)
    Platform virtualization (VDI)
    Hypervisors (VDI)
    Full virtualization (VDI)

HW Platforms
    Server: AWS G2 instance; GRID K520, M30 GPU; Tesla M60 GPU; NVIDIA Quadro GPUs
    Client: Anything
NVIDIA GRID SDK

NVIDIA CAPTURE SDK
(Formerly known as NVIDIA GRID SDK)

Goal: Enable low-latency remote graphics solutions by harnessing NVIDIA GPUs

OS: Windows 7+, Linux (CentOS, Debian, Red Hat, more)

Download: https://developer.nvidia.com/grid-app-game-streaming

Support: GRID-devtech-support@nvidia.com
NVIDIA CAPTURE SDK COMPONENTS

Interface Definitions
    NVFBC API: Low Latency Desktop Capture
    NVIFR API: Low Latency Render Target Capture
Sample Code
Documentation
NVENC: Low Latency Hardware Encoder
GPU Driver: NVFBC library, NVIFR library
NVIDIA CAPTURE SDK: THE “CAPTURE” PART

NVFBC
    Brute force, captures everything on screen
    Orthogonal to graphics APIs
    Easy to integrate with NVENC API
    Easy onboarding, no process injection
    More efficient than GDI-based screen scraping
    One session per display

NVIFR
    No-frills RenderTarget capture
    Supports DirectX 9, 10, 11 and OpenGL APIs
    Easy to integrate with NVENC API
    Needs to be injected into the target process
    One session per target window
    Enables higher density of streamed apps
NVIDIA CAPTURE SDK: INTERFACES

NVFBC: NVIDIA Frame Buffer Capture
NVIFR: NVIDIA In-band Frame Readback

Windows
    NVFBC: NVFBCToSys, NVFBCCuda, NVFBCToHWEnc, NVFBCToDX9Vid
    NVIFR - DirectX: NVIFRToSys, NVIFRToHWEnc

Linux
    NVFBC: NVFBCToSys, NVFBCToCuda, NVFBCToHWEnc
    NVIFR - OpenGL: NVIFRToSys, NVIFRToHWEnc

The -ToHWEnc interfaces internally invoke the NVENC API (part of the NVIDIA Video Codec SDK).
EVOLUTION OF NVIDIA CAPTURE SDK

Timeline: Legacy, 2014, 2015, 2016
SDK versions: Legacy, 2.3, 3.0, 4.0, 5.0

Windows
    Legacy
        • GRID K340, K520, K1, K2, Quadro 4000+ support
        • H.264 encode support
        • Windows 7, 8, 8.1 support
    SDK 2.3
        • GRID M30 limited support
        • Maxwell NVENC enhancements: quarter-res first pass; lossless encoding; 4:4:4 encoding
    SDK 3.0
        • GRID M30 full support
        • NVENC RC 2.0
    SDK 4.0
        • HEVC support
        • Tesla M60 support
        • New unified codec-agnostic interface for HW encoder
        • Driver support for H.264 YUV 4:4:4 NVIFR capture+encode for DX10/DX11 applications
    SDK 5.0
        • Enable NVFBC without driver reload
        • Windows 10 support
        • New NVFBC interface to capture desktop to DirectX 9 video memory surface, along with diffmap support
        • Timeout API for NVFBC blocking mode capture
        • Separate thread mouse capture for all NVFBC interfaces
        • Propagate frame timestamp through NvIFRHWEncode

Linux
    Legacy
        • GRID K340, K520, K1, K2, Quadro 4000+ support
        • H.264 encode support
    SDK 3.0
        • GRID M30 full support
        • NvIFR full parity for NVENC features with Windows
        • NVENC RC 2.0
    SDK 4.0
        • HEVC support
        • Tesla M60 support
        • New unified codec-agnostic interface for HW encoder

HW (earliest to latest)
    GRID K340, K520, K1, K2, Quadro K2000+
    GRID M30, Quadro M6000
    Tesla M60
USING NVFBC API

USING NVFBC FOR DESKTOP CAPTURE

Enable NVFBC
Create NVFBC capture session object
Set up NVFBC capture session object
Capture
Release NVFBC capture session object
CAPTURING A SCREENSHOT WITH NVFBC

[Code listing; callouts:]
    Create NVFBC session object
    Set up NVFBC session
    “Capture” starts here
    Read grabbed buffer
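CAPTURING USING NVFBC

[Flowchart:]
Begin
1. NvFBCGetStatusEx(): check NVFBC status.
     NVFBC enabled, not in use: go to step 3.
     NVFBC not enabled: go to step 2.
     NVFBC already in use: exit.
2. NvFBCEnable(): enable NVFBC. On success, return to step 1; on failure, exit.
3. NvFBCCreateEx(): create NVFBC session. On success, go to step 4; on failure, exit.
4. Set up NVFBC session. On success, go to step 5; on failure, go to step 6.
5. Grab(). On success, grab again; on failure or terminate, go to step 6.
6. Release NVFBC session.
Exit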
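The control flow in the flowchart can be sketched as below. This is a minimal illustration, not the SDK's API: the `backend` and session objects, their method names, and the `Status` values are hypothetical stand-ins for the calls named on the slide (NvFBCGetStatusEx, NvFBCEnable, NvFBCCreateEx, Grab), whose real signatures are not shown in the deck. The fake classes exist only to exercise the flow.

```python
from enum import Enum, auto


class Status(Enum):
    """Hypothetical status values mirroring the three flowchart branches."""
    NOT_ENABLED = auto()
    IN_USE = auto()
    AVAILABLE = auto()  # enabled, not in use


def run_capture(backend, frames_wanted):
    """Walk the flowchart once and return the list of grabbed frames.

    `backend` is a hypothetical stand-in for the driver; none of its
    methods are real NVFBC signatures.
    """
    status = backend.get_status()
    if status is Status.NOT_ENABLED:
        if not backend.enable():        # enabling failed: exit
            return []
        status = backend.get_status()   # success loops back to the status check
    if status is not Status.AVAILABLE:  # already in use: exit
        return []

    session = backend.create_session()
    if session is None:                 # session creation failed: exit
        return []
    frames = []
    try:
        if session.setup():
            while len(frames) < frames_wanted:  # the "terminate" condition
                frame = session.grab()
                if frame is None:               # grab failed
                    break
                frames.append(frame)
    finally:
        session.release()               # a created session is always released
    return frames


class FakeSession:
    """In-memory session used only to exercise the control flow."""

    def __init__(self):
        self.released = False
        self._count = 0

    def setup(self):
        return True

    def grab(self):
        self._count += 1
        return f"frame-{self._count}"

    def release(self):
        self.released = True


class FakeBackend:
    """In-memory driver stand-in that starts with capture disabled."""

    def __init__(self, status=Status.NOT_ENABLED):
        self.status = status
        self.session = None

    def get_status(self):
        return self.status

    def enable(self):
        self.status = Status.AVAILABLE
        return True

    def create_session(self):
        self.session = FakeSession()
        return self.session
```

Calling `run_capture(FakeBackend(), 3)` walks the enable, create, set up, grab, and release path; a backend that reports `Status.IN_USE` returns an empty list without creating a session.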
DESKTOP REMOTING USING NVFBC + NVENC HW ENCODER

[Diagram: desktop composition (a system process) renders through the NV GPU driver on the 3D HW. In the NVFBC capture process, the capture thread uses NVFBC to grab the desktop into IDirect3DSurface9* captured buffers, and the encode thread feeds them to the NVENC API and NVENC HW, which emit video bitstream packets.]

Approximate latency*: < 1 ms (desktop composition), ~2 ms (NVFBC capture), ~4 ms (NVENC encode)
* Approximate latency for a 1080p desktop streamed as 720p video
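A quick budget check on those figures, as a sketch: it sums the per-stage latencies and compares the total with one display interval. The 60 Hz refresh rate is an assumption (the slide does not state one), the "< 1 ms" figure is taken at its upper bound, and the stage labels are a reading of the diagram columns rather than SDK names.

```python
def latency_budget(stage_ms, refresh_hz):
    """Return (total_ms, frame_interval_ms, fits) for per-stage latencies."""
    total_ms = sum(stage_ms.values())
    frame_interval_ms = 1000.0 / refresh_hz
    return total_ms, frame_interval_ms, total_ms <= frame_interval_ms


# Approximate figures from the slide; "< 1 ms" is taken as 1 ms.
SLIDE_STAGES_MS = {
    "desktop composition": 1.0,
    "NVFBC capture": 2.0,
    "NVENC encode": 4.0,
}

total, interval, fits = latency_budget(SLIDE_STAGES_MS, refresh_hz=60)
print(f"server-side total {total:.1f} ms of a {interval:.2f} ms frame interval, fits={fits}")
```

Under these assumptions the server-side stages take about 7 ms, which leaves the rest of the 16.67 ms interval for the network and the client.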
USING NVIFR API

USING NVIFR FOR APPLICATION STREAMING

Write a shim layer to host NVIFR
Inject the shim layer into the target application
Fetch the rendering graphics context
Create an NVIFR session object using the context
Set up the NVIFR session object
Capture
Release the NVIFR session object
APP STREAMING USING HW ENCODER

[Diagram: App Render() or Present() -> Shim -> NVIFR -> DX/OGL Runtime -> NV GPU (3D HW, NVENC HW). The compressed video bitstream is handed to the streaming component.]

NVIFR is injected into the application before the graphics runtime, using an app-level shim layer.
DIRECTX APP STREAMING USING NVIFR HW ENCODER

[Code listing; callouts:]
    Application allocates output buffers and event handles.
    Select the rate control mode and encoder preset according to use case.

DIRECTX APP STREAMING USING NVIFR HW ENCODER

[Code listing; callout:]
    The event handles passed to NvIFRSetupHWEncoder will be signaled when NVENC has finished the work submitted by the NvIFRTransferRenderTargetToHWEncoder API.
OPENGL APP STREAMING USING NVIFR HW ENCODER

[Code listing; callouts:]
    Create session
    Create TransferObject

OPENGL APP STREAMING USING NVIFR HW ENCODER

[Code listing; callouts:]
    Capture + Encode
    Retrieve output bitstream
    Release buffers for re-use
MEASURING PERFORMANCE

MEASURING PERFORMANCE
Guidelines

Use high-precision timers.

In-process performance measurement is suitable only for generating average numbers.

Measure GPU utilization (GPU-Z, NVIDIA SMI, etc.).

Note GPU clock values during measurement.
MEASURING PERFORMANCE

[Code listing; callout:]
    Use a high-performance multimedia timer for accuracy.

MEASURING PERFORMANCE

[Code listing; callouts:]
    Start measurement before the capture loop.
    Run through the capture/encode loop.
    Stop measurement here.
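The measurement pattern in the callouts can be sketched as follows. This is an illustration only: `capture_and_encode` is a hypothetical callable standing in for one capture/encode iteration, and Python's `time.perf_counter_ns` stands in for the platform's high-precision timer. In line with the guidelines, it reports averages rather than per-frame numbers.

```python
import time


def measure_loop(capture_and_encode, iterations):
    """Time `iterations` calls and return (average_ms, frames_per_second)."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start_ns = time.perf_counter_ns()       # start measurement before the loop
    for _ in range(iterations):
        capture_and_encode()                # one capture/encode pass
    elapsed_ns = time.perf_counter_ns() - start_ns  # stop measurement after the loop
    average_ms = elapsed_ns / iterations / 1e6
    fps = iterations / (elapsed_ns / 1e9) if elapsed_ns else float("inf")
    return average_ms, fps
```

Timing the whole loop once, instead of wrapping each call, keeps timer overhead out of the per-frame figure. That is also why the result is only meaningful as an average.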
MAXIMIZING QUALITY & PERFORMANCE

MAXIMIZING QUALITY & PERFORMANCE
Goals & Challenges

Goals:
- Low latency
- Smooth playback of streamed video
- Minimum impact on target application/system performance
Challenge:
- Finding the right balance to get maximum CPU-GPU utilization without negative impact

                                                                     NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.   29
MAXIMIZING QUALITY & PERFORMANCE
Guidelines

Know the system’s limits.
Memory management: Ensure no time is lost to paging.
Resource utilization: GPU-intensive applications need frame-rate throttling, while lightweight applications need pipelining and multithreading of the capture and encode/post-process tasks.
Timing: Ensure the capture rate matches the display rate.
Impact on target: Use parallelism.
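One way to keep the capture rate aligned with the display rate is to schedule captures against absolute deadlines. The sketch below assumes a fixed refresh rate known to the caller. The `capture` callable and the injectable `clock` and `sleep` parameters are hypothetical, chosen so the loop can be exercised without real hardware; this is not how the SDK itself paces capture.

```python
import time


def paced_capture(capture, refresh_hz, frames,
                  clock=time.perf_counter, sleep=time.sleep):
    """Call `capture(index)` once per display interval.

    Deadlines are absolute (start + k * interval), so small sleep
    overshoots do not accumulate into drift.
    """
    interval = 1.0 / refresh_hz
    next_deadline = clock()
    for index in range(frames):
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)            # wait for the next display interval
        capture(index)
        next_deadline += interval
        now = clock()
        if next_deadline < now:         # fell behind by more than one interval
            next_deadline = now         # resynchronise instead of bursting
    return frames
```

The same loop doubles as a simple frame-rate throttle: passing a `refresh_hz` lower than the display rate caps how often capture work is issued.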
MAXIMIZING QUALITY & PERFORMANCE
Memory management

Ensure no paging.
- Choose optimal rendering quality settings
- Choose optimal desktop or application window resolution

[Figure: loss due to paging (insufficient video memory); labeled regions: paging work, encoder idle.]
MAXIMIZING QUALITY & PERFORMANCE
Resource Utilization: Multithreading

Capture and encode/post-process should run on different threads.
Constraints:
   Multiple threads must not concurrently access the same DirectX context.
   The NVIFR capture thread should never stall.
   The NVFBC capture thread should never miss a display refresh.
MAXIMIZING QUALITY & PERFORMANCE
Resource Utilization: Pipelining

Goal: Minimize the time the encode thread spends waiting for capture to complete, and vice versa.

Benefit: Control over the timing of capture calls; less impact on application rendering performance.

Triple buffering is sufficient in most cases.

[Diagram: the capture thread writes to buffer i of the buffer queue while the encode/post-process thread reads from buffer (i-1) % N.]
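The buffer-queue hand-off can be sketched with two threads and a ring of N buffers. The `capture` and `encode` callables are hypothetical placeholders for the real capture and encode steps, and the queues of buffer indices are one possible way to implement the hand-off, not the SDK's mechanism. Python threads are used here only to show the pattern.

```python
import queue
import threading


def run_pipeline(capture, encode, frame_count, buffer_count=3):
    """Capture into a ring of buffers on one thread, encode on another."""
    buffers = [None] * buffer_count
    free = queue.Queue()     # buffer indices the capture thread may overwrite
    filled = queue.Queue()   # buffer indices ready for the encode thread
    for index in range(buffer_count):
        free.put(index)
    encoded = []

    def capture_worker():
        for frame_no in range(frame_count):
            index = free.get()           # waits only if encode lags by N frames
            buffers[index] = capture(frame_no)
            filled.put(index)
        filled.put(None)                 # sentinel: no more frames

    def encode_worker():
        while True:
            index = filled.get()
            if index is None:
                break
            encoded.append(encode(buffers[index]))
            free.put(index)              # hand the buffer back for re-use

    workers = [threading.Thread(target=capture_worker),
               threading.Thread(target=encode_worker)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return encoded
```

With `buffer_count=3` (triple buffering), the capture thread can run up to three frames ahead before it has to wait. A real-time capture thread that must never stall would more likely drop or overwrite a frame than block on `free.get()`; blocking keeps this sketch lossless and easy to verify.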
MAXIMIZING PERFORMANCE
Resource Utilization: Multiple Contexts with NVIFR

Why use multiple contexts?
    NVIFR capture happens in-band and shares the DirectX/OGL context used by the target application. Any GPU work scheduled by NVIFR on this context shows up as a drop in rendering frame rate.

Solution:
    Use shared buffers to hold captured output, for processing through a separate DirectX/OGL context running on a separate thread.

Game’s D3D Context (writes to the shared surface)
    DX9: NvIFRCopyToSharedSurface
    DX9Ex: StretchRect to a shared surface
    DX1x: ResourceCopyRegion to a shared surface

Encoder’s D3D Context (reads from the shared surface)
    DX9: NvIFRCopyFromSharedSurface
    DX9Ex: StretchRect from a shared surface
    DX1x: ResourceCopyRegion from a shared surface
MAXIMIZING QUALITY & PERFORMANCE
NUMA

NUMA: Non-Uniform Memory Access

Create resources in the memory that is local to the bus hosting the GPU; this reduces contention for bus bandwidth.
MAXIMIZING QUALITY & PERFORMANCE
QoS

Network bandwidth control:
    NV_HW_ENC_PARAMS_RC_2PASS_FRAMESIZE_CAP

Recovering from packet loss: reference frame invalidation
    NV_HW_ENC_PIC_PARAMS::bInvalidateReferenceFrames
    NV_HW_ENC_PIC_PARAMS::ulInvalidFrameTimeStamps[]

Avoiding insertion of IDR frames: intra-refresh
    NV_HW_ENC_PIC_PARAMS::bStartIntraRefresh
    NV_HW_ENC_PIC_PARAMS::dwIntraRefreshCnt

Dynamic bitrate change:
    NV_HW_ENC_PIC_PARAMS::bDynamicBitRate
    NV_HW_ENC_PIC_PARAMS::dwNewAvgBitrate, dwNewPeakBitRate, dwNewVBVBufferSize, dwNewVBVInitialDelay
COMPATIBILITY

NVIDIA CAPTURE SDK: DRIVER COMPATIBILITY

The GPU driver maintains backward compatibility with earlier NVIDIA Capture SDK versions.
An upgraded application (built against new SDK interfaces) that runs on already-deployed older GPU drivers needs special handling in the application.
MANAGING SDK UPGRADES

Compile for multiple interface versions; select based on the highest supported version at run time.

[Diagram: the app queries NvFBCGetGRIDSDKVersion()* and uses either IFBC_v1 or IFBC_v2 to drive the NVFBC session object.]

* A similar API is available for NVIFR.
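The run-time selection can be sketched as a lookup over the interface versions the application was compiled with. The integer version numbers and the `interfaces` mapping are hypothetical. The slide names NvFBCGetGRIDSDKVersion() as the query but does not show its signature, so the sketch takes the reported version as a plain argument.

```python
def select_interface(driver_version, interfaces):
    """Return the newest interface that the installed driver supports.

    `interfaces` maps the minimum SDK version an interface requires to
    the object (or factory) that implements it.
    """
    supported = [version for version in interfaces if version <= driver_version]
    if not supported:
        raise RuntimeError(
            f"driver reports version {driver_version}; "
            f"oldest compiled interface needs {min(interfaces)}"
        )
    return interfaces[max(supported)]


# Hypothetical wiring: two compiled-in interface versions.
COMPILED_INTERFACES = {1: "IFBC_v1", 2: "IFBC_v2"}
```

An application built this way keeps working on an older deployed driver by falling back to `IFBC_v1`, and picks up `IFBC_v2` once the driver reports a version that supports it.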
QUESTIONS?
REFERENCES

Past GTC talks about related topics available here.

Resources
https://developer.nvidia.com/grid-app-game-streaming
http://www.nvidia.com/object/cloud-get-started.html
http://www.nvidia.com/object/enterprise-virtualization.html
NVIDIA VIDEO SDK: HW VIDEO ENCODING

Video compression for game recording, remote desktop streaming

NVENC HW Encoder
• H.264 support
• HEVC (H.265) support
• Optimized encode settings for low latency streaming

NVIDIA Capture SDK enables easy integration with NVENC API
• NVIFRToHWEnc
• NVFBCToDX9Vid, NVFBCCuda, NVFBCToHWEnc
WELCOME TO THE NVIDIA VMWARE COMMUNITY
A community dedicated to NVIDIA and VMware solutions

Web portal with discussions, solution updates and basic sales support
Interact with peers, learn tips / tricks and accelerate NVIDIA GRID vGPU deployment on VMware
Available to any customer who completes a brief questionnaire
Join us today: www.nvidia.com/nvc


For Queries related to NVIDIA Capture SDK, get in touch with us at:
GRID-devtech-support@nvidia.com

THANK YOU

JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join