XBOX Performances XNA - Part of the slide set taken from: "Understanding XNA Framework Performance", Shawn Hargreaves, GameFest 2007
XBOX Performances
• Overview
  – XNA Architecture
    • Context Switch and Batches
  – XBOX CPUs
    • Limitations and Threading
  – XBOX GPU
  – About Profiling
CGL slideset 2

Windows Architecture
User programs cannot directly access hardware
[Diagram] User mode: Game Executable, D3D, D3DX. Supervisor mode: Operating System, Graphics Driver. Underneath: Graphics Hardware.

XBOX Architecture
Consoles typically just run everything directly in supervisor mode
[Diagram] Game Executable, Operating System and Graphics Driver all sit directly above the Graphics Hardware.
• No mode transitions = reduced overhead
• Small batches are less expensive than on Windows

Xbox 360 Architecture
Xbox 360 hypervisor enforces security
[Diagram] Supervisor mode: Game Executable, D3D, D3DX. Underneath: Hypervisor, then Graphics Hardware.
• Hypervisor ensures only signed memory pages can execute
• Games are signed during certification
If only signed code can execute, how is a dynamically jitted runtime even possible?

Xbox 360 Architecture
Xbox 360 hypervisor enforces security
[Diagram] User mode: Managed Game, Managed Graphics Device. Supervisor mode: XNA Framework, D3D, D3DX. Underneath: Hypervisor, then Graphics Hardware.
• Managed code cannot directly call D3D or D3DX
• User to supervisor transitions are expensive
  – 4 microseconds per system call
• Command buffer batches up API calls

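To put the 4 microsecond figure in perspective, here is a rough back-of-the-envelope sketch. Only the per-call cost comes from the slide; the 60 frames-per-second target, the class name and the call counts are assumptions chosen for illustration.

```csharp
using System;

// Rough cost model for user-to-supervisor transitions.
// Only the 4 microsecond figure is from the slide; everything else is assumed.
public static class SystemCallBudget
{
    // Cost of one system call, as quoted on the slide.
    public const double MicrosecondsPerSystemCall = 4.0;

    // Assumed target of 60 frames per second (about 16,667 microseconds per frame).
    public const double FrameBudgetMicroseconds = 1000000.0 / 60.0;

    // Fraction of one frame spent purely on mode transitions.
    public static double FrameFraction(int systemCallsPerFrame)
    {
        return systemCallsPerFrame * MicrosecondsPerSystemCall / FrameBudgetMicroseconds;
    }
}
```

Under these assumptions, 1000 unbatched calls per frame cost 4 ms, or about 24% of the frame, while 10 calls per frame cost well under 1%. This is why the command buffer tries to fold many API calls into a single transition.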
Batchable APIs
These APIs are currently batched into a single system call
Assigning to:
• VertexShader
• PixelShader
• VertexDeclaration
• IndexBuffer
• RenderState
• SamplerStates
• Textures
• DepthStencilBuffer
• Viewport
• ScissorRectangle
• ClipPlanes
• Effect.CurrentTechnique
Calling:
• Effect Begin/End
• EffectPass Begin/End
• Effect.CommitChanges
• EffectParameter.SetValue
• VertexStream.SetSource
• Set*ShaderConstant
• StateBlock Capture/Apply
• SetRenderTarget
• Draw[Indexed]Primitives
• DrawUser[Indexed]Primitives
  – If the primitive count is small
• Clear
• Resolve

Nasty Unbatchable APIs
These APIs currently require one system call each
• Present
• Creating or destroying graphics resources
• *.SetData, *.GetData
• DrawUser[Indexed]Primitives
  – If the primitive count is large
• Reading from:
  – VertexShader
  – PixelShader
  – RenderState
  – SamplerStates
  – Textures
• Get*ShaderConstant
• EffectParameter.GetValue

Cached Managed State
These can be read without any system call at all
• DisplayMode
• Viewport
• VertexDeclaration
• VertexStream
• IndexBuffer
• Effect.CurrentTechnique

XBOX CPUs
• XBOX will run your code 4 to 6 times slower than on PC…
  – JIT compiler can't reorder instructions
    • Stalls are added to maintain sync
    • A cache miss might cost thousands of cycles
  – Floating point is new to the .NET Compact Framework (.NET CF)
    • JIT compiler can't use AltiVec instructions
  – Operators are 'pass-by-value'
    • Function call + overhead due to the copy!
  – Garbage collector is not generational
    • Runs after every 1 MB allocated
    • Also runs when an out-of-memory condition occurs

XBOX CPUs
• Almost no inlining
  – Automated, fixed rules:
    • 16 bytes of IL or less
    • No branching (typically an "if")
    • No local variables
    • No exception handlers
    • No 32-bit floating point arguments or return value
    • If the method has more than one argument, the arguments must be accessed in order from lowest to highest (as seen in the IL)
    • Virtual methods are never inlined
  – Solution? Manual inlining...

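A minimal sketch of what manual inlining can look like. The Particle struct and both methods are hypothetical names made up for this example. The helper takes and returns 32-bit floats, so under the rules listed above the JIT would leave it as a real function call on every iteration.

```csharp
// Hypothetical particle type, used only for this illustration.
public struct Particle
{
    public float X, Y;
}

public static class ManualInlining
{
    // Takes and returns 32-bit floats, so by the rules on the slide
    // the Xbox 360 JIT would not inline this helper.
    public static float LengthSquared(float x, float y)
    {
        return x * x + y * y;
    }

    // Readable version: one function call per particle.
    public static float SumWithCalls(Particle[] particles)
    {
        float sum = 0f;
        for (int i = 0; i < particles.Length; i++)
            sum += LengthSquared(particles[i].X, particles[i].Y);
        return sum;
    }

    // Manually inlined version: same arithmetic, no call in the hot loop.
    public static float SumInlined(Particle[] particles)
    {
        float sum = 0f;
        for (int i = 0; i < particles.Length; i++)
        {
            float x = particles[i].X;
            float y = particles[i].Y;
            sum += x * x + y * y;
        }
        return sum;
    }
}
```

Both versions compute the same result. The inlined one trades readability for fewer calls, so it is probably only worth doing in loops that profiling shows to be hot.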
XBOX CPUs
• 3 hardware cores
  – Cache + registers
• 6 hardware threads
  – 0 and 2 are reserved for XNA
  – 1, 3, 4, 5 are free
  – 4 and 5 are on the same core (shared cache!)

XBOX Multithreading
• Xbox 360 does not automatically schedule threads across multiple cores
• You must explicitly assign threads to cores
  – Thread.SetProcessorAffinity()… see the twiki
• Current Xbox 360 ThreadPool is not optimized

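A minimal sketch of pinning a worker to one hardware thread. WorkerThreads and StartOnHardwareThread are hypothetical names. The sketch assumes the XBOX conditional symbol that XNA Game Studio defines for Xbox 360 projects, and it calls Thread.SetProcessorAffinity from inside the worker itself, which is the commonly shown pattern. On other platforms the affinity call is compiled out and the worker simply runs wherever the OS schedules it.

```csharp
using System.Threading;

public static class WorkerThreads
{
    // hardwareThread: index of the hardware thread to run on.
    // The slides list 1, 3, 4 and 5 as free for game code.
    public static Thread StartOnHardwareThread(int hardwareThread, ThreadStart work)
    {
        Thread thread = new Thread(delegate()
        {
#if XBOX
            // Xbox 360 only (assumption: XBOX symbol is defined by the project).
            // Called from the worker so it applies to this thread.
            Thread.CurrentThread.SetProcessorAffinity(new int[] { hardwareThread });
#endif
            work();
        });
        thread.Start();
        return thread;
    }
}
```

Since hardware threads 4 and 5 share a core, two workers that touch the same data might be placed there, while unrelated workloads could be spread over 1, 3 and one of 4 or 5. That split is a design suggestion, not something the slides prescribe.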
XBOX Multithreading
• GraphicsDevice is somewhat thread-safe
  – Cannot render from more than one thread at a time
  – Can create resources and SetData while another thread renders
• ContentManager is not thread-safe
  – OK to have multiple instances, but only one per thread
• Input is not threadable
  – Windows games must read input on the main game thread
• Audio and networking are thread-safe

XBOX GPU
• GPU is the only piece of hardware you have real access to
• Expensive computations should be done with it

Profiling on Xbox 360
XNA Framework Remote Performance Monitor for Xbox 360
• Provides basic garbage collector information
  – Can tell if you have a GC problem, but not usually enough to diagnose the cause
• Shows the number of system calls
• Not much help for identifying computational bottlenecks

Tricks
• Don't use class-defined operators
  – Use ref/out methods (.Add(ref a, out b))
  – Do your computations component-wise
• Do not instantiate classes at runtime!
  – Use structs instead
• Consider moving floating-point-heavy tasks to the GPU
• Pay attention to API calls
• Consider GPU instancing (rendering of multiple small models)

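The first two tricks can be sketched as follows. Vec3 and Physics are hypothetical stand-ins written for this example so that it runs without the XNA assemblies. The ref/out method has the same shape as the XNA Framework's own Vector3.Add(ref, ref, out) overload.

```csharp
// Hypothetical 3-component vector, standing in for an XNA math type.
public struct Vec3
{
    public float X, Y, Z;

    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    // Operator form: both operands are copied in and the result is copied out.
    public static Vec3 operator +(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    // ref/out form: operands are passed by reference, result is written in place.
    public static void Add(ref Vec3 a, ref Vec3 b, out Vec3 result)
    {
        result.X = a.X + b.X;
        result.Y = a.Y + b.Y;
        result.Z = a.Z + b.Z;
    }
}

public static class Physics
{
    // Component-wise update over preallocated struct arrays:
    // no operator calls and no heap allocations inside the loop.
    public static void Integrate(Vec3[] positions, Vec3[] velocities, float dt)
    {
        for (int i = 0; i < positions.Length; i++)
        {
            positions[i].X += velocities[i].X * dt;
            positions[i].Y += velocities[i].Y * dt;
            positions[i].Z += velocities[i].Z * dt;
        }
    }
}
```

A call site for the ref/out form looks like `Vec3.Add(ref a, ref b, out sum);`. Because the arrays hold structs and are allocated once up front, the per-frame update gives the garbage collector nothing to do.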