- Added stats for non-nanite VSM instance culling (moved VSM stats functionality into own file).
- r.Shadow.Virtual.NonNanite.UseHZB == 2 (default) uses the current-frame Nanite VSM HZB as this enables correct culling for camera cuts & light movement and contains most of the occluding geometry.
#rb Andrew.Lauritzen
#preflight 6138a6582d09b90001568819
#ROBOMERGE-OWNER: jon.nabozny
#ROBOMERGE-AUTHOR: ola.olsson
#ROBOMERGE-SOURCE: CL 17470225 via CL 17923040 via CL 18360986 via CL 18361244
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18361409 by jon nabozny in ue5-release-engine-test branch]
Add static_assert to prevent the creation of new ones moving forward.
Used SHADER_PARAMETER_SCALAR_ARRAY/GET_SCALAR_ARRAY_ELEMENT for single parameters, or packed them with surrounding parameters when possible.
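For readers unfamiliar with the packing idea: under HLSL constant buffer rules each element of a scalar array occupies its own 16-byte slot, so storing four scalars per float4 saves space. The following is a minimal standalone sketch of that indexing scheme, not the engine's macros. `Float4`, `PackScalars` and `GetScalarElement` are hypothetical names used for illustration only.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for one float4 register in a constant buffer.
using Float4 = std::array<float, 4>;

// Number of float4 registers needed to hold Count scalars (rounded up).
constexpr std::size_t PackedVectorCount(std::size_t Count)
{
    return (Count + 3) / 4;
}

// Packs N scalars tightly, four per float4; unused tail lanes stay zero.
template <std::size_t N>
std::array<Float4, PackedVectorCount(N)> PackScalars(const std::array<float, N>& Scalars)
{
    std::array<Float4, PackedVectorCount(N)> Packed{};
    for (std::size_t Index = 0; Index < N; ++Index)
    {
        Packed[Index >> 2][Index & 3] = Scalars[Index];
    }
    return Packed;
}

// Reads scalar Index back: vector = Index / 4, lane = Index % 4.
template <std::size_t M>
float GetScalarElement(const std::array<Float4, M>& Packed, std::size_t Index)
{
    return Packed[Index >> 2][Index & 3];
}
```

With this layout six scalars occupy two float4 registers instead of six padded slots.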
#rb Guillaume.Abadie,Daniel.Wright,Charles.deRousiers
#preflight 61577bf15631d900011d59a1
#ROBOMERGE-AUTHOR: jeannoe.morissette
#ROBOMERGE-SOURCE: CL 17707027 in //UE5/Main/...
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v879-17706426)
#ROBOMERGE[STARSHIP]: UE5-Release-Engine-Staging Release-5.0
[CL 17707037 by jeannoe morissette in ue5-release-engine-test branch]
- Preparatory step to enable HZB culling of invalidations in a uniform way.
- Also adds an FComputeShaderUtils helper to set up an indirect dispatch.
#rb andrew.lauritzen
#preflight 6130818017a8610001b0cfc7
#ROBOMERGE-SOURCE: CL 17400532 via CL 17400838
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v865-17346139)
[CL 17400862 by ola olsson in ue5-release-engine-test branch]
Page allocation part is a fairly minor benefit in most scenes but can occasionally make a big difference.
Projection early-out is a pretty uniform benefit all the time, especially with SMRT.
#preflight 612d1b836a14cc000118f03a
#rb ola.olsson
#ROBOMERGE-SOURCE: CL 17388425 via CL 17389280
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v865-17346139)
[CL 17389330 by andrew lauritzen in ue5-release-engine-test branch]
Add cvar for including non-nanite geometry in coarse pages (so that one can disable for performance reasons). Default on.
Some general unused flags cleanup.
#rb rune.stubbe
#preflight 610ccc0e6c6eb0000196b59e
#ROBOMERGE-SOURCE: CL 17085350 via CL 17095081
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v853-17066230)
[CL 17095354 by andrew lauritzen in ue5-release-engine-test branch]
Run a compute job that packs the most commonly used instance data (LocalToWorld matrix and some other bits, 80 bytes) into a per-instance vertex buffer. The vertex shader does not have access to GPUScene and instead loads instance data from that per-instance vertex buffer. If it needs more primitive/instance data than is available there, it loads it from the Primitive UB, which binds a unique uniform buffer and breaks auto-instancing. The pixel shader has full access to GPUScene.
There are three ways FSceneDataIntermediates gets populated:
1. PrimitiveId + GPUScene (Desktop)
2. Per-Instance data + Primitive UB (Mobile)
3. Primitive UB (auto-instancing disabled)
Details of GPUScene-specific vertex inputs and access to FSceneDataIntermediates are hidden behind two macros:
VF_GPUSCENE_DECLARE_INPUT_BLOCK
VF_GPUSCENE_GET_INTERMEDIATES
FSceneDataIntermediates is now stored in FVertexFactoryIntermediates and FMaterialVertexParameters. Added a few GetPrimitiveData() overloads that allow you to access PrimitiveData depending on the current context. Removed most of the cases where GetPrimitiveData() was used with PrimitiveId.
#rb Ola.Ollson
#ROBOMERGE-SOURCE: CL 17093848 in //UE5/Main/...
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v853-17066230)
[CL 17093856 by dmitriy dyomin in ue5-release-engine-test branch]
- Eliminates the previous frame texture (~128MB in default config) and copy
- Cached pages are kept in place, with a quick pass to find and allocate free pages following
- Experimented with a persistent free list but for the small numbers of pages we are talking about (2k-4k ish standard), recomputing the free list each frame was cheap and robust
- Main cost is synchronization/barriers between the new passes. Fairly low measured overhead compared to the much larger wins, but can revisit later if need be.
- Removed a few things that added complexity but were not making a big difference to performance, especially with the changes:
- No longer support "panning" the directional light cascade cached pages. With the changes to the snapping and depth range invalidations this was no longer making much of a difference anyway.
- No longer do the hierarchical "eliminate parent if all 4 mip children are marked" for local lights. Tested a bit and it was not making a measurable difference in most cases with caching. Can be revived if useful later.
- Moved ownership of the now persistent physical pool to CacheManager
- A bunch of related cleanup and fixes
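The "recompute the free list each frame" approach described above can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the actual work happens in GPU passes, and `PhysicalPage`, `BuildFreeList` and `AllocatePages` are hypothetical names, not engine types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-page state; a real pool would track more than one flag.
struct PhysicalPage
{
    bool bInUse = false; // true for cached pages that stay in place this frame
};

// Rebuilds the free list from scratch by scanning the whole pool.
std::vector<uint32_t> BuildFreeList(const std::vector<PhysicalPage>& Pool)
{
    std::vector<uint32_t> FreeList;
    for (uint32_t Index = 0; Index < Pool.size(); ++Index)
    {
        if (!Pool[Index].bInUse)
        {
            FreeList.push_back(Index);
        }
    }
    return FreeList;
}

// Hands out up to RequestCount free pages and marks them as in use.
// Returns fewer pages than requested when the pool is exhausted.
std::vector<uint32_t> AllocatePages(std::vector<PhysicalPage>& Pool, std::size_t RequestCount)
{
    const std::vector<uint32_t> FreeList = BuildFreeList(Pool);
    const std::size_t Granted = RequestCount < FreeList.size() ? RequestCount : FreeList.size();

    std::vector<uint32_t> Allocated(FreeList.begin(), FreeList.begin() + Granted);
    for (uint32_t Index : Allocated)
    {
        Pool[Index].bInUse = true;
    }
    return Allocated;
}
```

Because nothing persists between frames except the per-page flags, there is no free-list state that can drift out of sync with the pool, which is the robustness argument made above. The scan is linear in the pool size, which stays cheap for a few thousand pages.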
#rb ola.olsson
#preflight 61008a655938f90001f04260
#ROBOMERGE-OWNER: andrew.lauritzen
#ROBOMERGE-AUTHOR: andrew.lauritzen
#ROBOMERGE-SOURCE: CL 16985509 via CL 16987414
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v838-16927207)
[CL 16987415 by andrew lauritzen in ue5-release-engine-test branch]
#ROBOMERGE-SOURCE: CL 16917622 in //UE5/Main/...
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v836-16769935)
[CL 16917631 by andrew lauritzen in ue5-release-engine-test branch]
- Add VirtualShadowMapId to forward light data, removing the per-view remapping table. Should fix a few multi-view/split-screen bugs.
- Minor cleanup of PCF subsurface path; remove dead/broken code.
- Fix up blending of light attenuation into screen shadow mask; disable CSM fading when VSMs are enabled.
- Fix OnePassProjection flag when VSMs are disabled
#rb none
[FYI] ola.olsson, brian.karis
#ROBOMERGE-SOURCE: CL 16852407 via CL 16852415
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v836-16769935)
[CL 16852422 by andrew lauritzen in ue5-release-engine-test branch]
- Uses the InstanceCullingLoadBalancer to pre-distribute the work on the CPU to ensure even load.
- Make instance culling use the instance data offset in MDC instead of translating primitive IDs.
- Track single-instance draws separately from instanced to optimize handling (disable culling for single-instance primitives).
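One way to picture CPU-side pre-distribution is to split each draw's instance range into work items of bounded size, so that no GPU thread group receives a disproportionately large range. The sketch below assumes that scheme. It is not the InstanceCullingLoadBalancer implementation, and `InstanceRange`, `WorkItem` and `DistributeWork` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical input: a contiguous run of instances belonging to one draw.
struct InstanceRange
{
    uint32_t Offset = 0;
    uint32_t Count = 0;
};

// Hypothetical output: a bounded chunk of work for one group of GPU threads.
struct WorkItem
{
    uint32_t InstanceOffset = 0;
    uint32_t Count = 0;
    uint32_t SourceRange = 0; // index of the range this chunk came from
};

// Splits every range into chunks of at most MaxPerItem instances.
std::vector<WorkItem> DistributeWork(const std::vector<InstanceRange>& Ranges, uint32_t MaxPerItem)
{
    std::vector<WorkItem> Items;
    if (MaxPerItem == 0)
    {
        return Items; // nothing sensible to emit
    }
    for (uint32_t RangeIndex = 0; RangeIndex < Ranges.size(); ++RangeIndex)
    {
        uint32_t Offset = Ranges[RangeIndex].Offset;
        uint32_t Remaining = Ranges[RangeIndex].Count;
        while (Remaining > 0)
        {
            const uint32_t Count = Remaining < MaxPerItem ? Remaining : MaxPerItem;
            Items.push_back({Offset, Count, RangeIndex});
            Offset += Count;
            Remaining -= Count;
        }
    }
    return Items;
}
```

A draw with 150 instances and a limit of 64 becomes three items (64, 64, 22), while a single-instance draw stays a single item.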
#rb Graham.wihlidal,andrew.lauritzen
[FYI] dmitriy.dyomin
#preflight 60d0eafa2ab2180001269160
#ROBOMERGE-SOURCE: CL 16733827 in //UE5/Main/...
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v835-16672529)
[CL 16733837 by ola olsson in ue5-release-engine-test branch]
- Enables overlapping batched and non-batched instance culling (needed for batching work).
- Removes some explicit transitions & minor cleanup.
- Added tracking the required number of instances (fixes non-nanite VSM for large ISMs)
#rb graham.wihlidal,jian.ru,yujiang.wang,zach.bethel
#preflight 60b73f38107dc600017d931b
[CL 16544217 by Ola Olsson in ue5-main branch]
- Instead, create a reference depth range with a guard band that we reuse frame to frame until we get too close to the edge, at which point we invalidate that clipmap level's cached pages (if any)
- Does not significantly affect invalidations, even when moving quickly near surfaces, since those cases would already get invalidated when those pixels moved to a different clipmap level
- Sets up for not re-copying cached VSM pages every frame
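The guard-band scheme above can be sketched roughly as follows, assuming a simple test of whether the depth range needed this frame still fits inside the stored reference range. `DepthRange`, `CachedDepthRange` and `UpdateReferenceRange` are hypothetical names, and the real edge test ("too close to the edge") may use a different threshold.

```cpp
// Hypothetical depth interval for one clipmap level.
struct DepthRange
{
    float Min = 0.0f;
    float Max = 0.0f;
};

// Hypothetical cached state kept from frame to frame.
struct CachedDepthRange
{
    DepthRange Reference;
    bool bValid = false;
};

// Reuses the reference range while Needed still fits inside it.
// Otherwise builds a new reference padded by GuardBand on both sides.
// Returns true when previously cached pages for this level must be invalidated.
bool UpdateReferenceRange(CachedDepthRange& Cache, const DepthRange& Needed, float GuardBand)
{
    const bool bFits = Cache.bValid
        && Needed.Min >= Cache.Reference.Min
        && Needed.Max <= Cache.Reference.Max;
    if (bFits)
    {
        return false; // keep the reference range and the cached pages
    }

    const bool bHadCache = Cache.bValid;
    Cache.Reference.Min = Needed.Min - GuardBand;
    Cache.Reference.Max = Needed.Max + GuardBand;
    Cache.bValid = true;
    return bHadCache; // nothing to invalidate on the very first frame
}
```

A larger guard band means fewer invalidations, at the cost of spreading depth precision over a wider range.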
#rb ola.olsson
[FYI] brian.karis
#ROBOMERGE-SOURCE: CL 16492432 in //UE5/Private-Frosty/...
#ROBOMERGE-BOT: STARSHIP (Private-Frosty -> Main) (v823-16466674)
[CL 16492445 by andrew lauritzen in ue5-main branch]