This field is no longer used, which frees up 4 bytes in the payload and in the path state.
#rb Patrick.Kelly
#jira none
#preflight 636d6ab185629dff9115b77c
[CL 23088414 by chris kulla in ue5-main branch]
*** This change will incur a full shader invalidation across all platforms ***
Issues:
- Some platforms require async compute dispatch indirect arguments to not cross specific memory boundaries
- This places restrictions on the valid sizes for a dispatch indirect argument set. We were not conforming to these restrictions, which could result in GPU crashes on these async passes
Fixes:
- FRHIDispatchIndirectParameters is padded out to meet per-platform memory boundary restrictions
- This is driven via new per-platform preprocessor define PLATFORM_DISPATCH_INDIRECT_ARGUMENT_BOUNDARY_SIZE
- Some platforms require FRHIDispatchIndirectParameters to align with their internal structure, hence we cannot universally size it to meet all platforms' requirements
- Introduce new FRHIDispatchIndirectParametersNoPadding for uses when we explicitly do not want the padding and otherwise avoid the memory boundary restrictions
- Revise and expand indirect argument validation code to catch further such issues in the future
- Update shaders which write to dispatch indirect argument buffers to account for optional per-platform padding
- New utility function WriteDispatchIndirectArgs introduced to facilitate this
- Platforms which require something other than the default non-padded dispatch indirect arguments must define DISPATCH_INDIRECT_UINT_COUNT and their own WriteDispatchIndirectArgs in their CommonPlatform.ush
- Move creation of the DispatchIndirectGraphicsCommandSignature command signature to be per-platform
- DispatchIndirectGraphicsCommandSignature and DispatchIndirectComputeCommandSignature stride changed to account for additional padding on impacted platforms
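The padding rule in the fixes above can be illustrated with a minimal C++ sketch. Everything suffixed `Sketch`, the 64-byte boundary, and the 4-uint stride are assumptions chosen for illustration; they are not the engine's actual definitions of FRHIDispatchIndirectParameters, PLATFORM_DISPATCH_INDIRECT_ARGUMENT_BOUNDARY_SIZE, or WriteDispatchIndirectArgs. The sketch only shows why a 12-byte argument set can straddle a boundary in a tightly packed buffer while a 16-byte one cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical boundary size; on a real platform this would come from the
// per-platform PLATFORM_DISPATCH_INDIRECT_ARGUMENT_BOUNDARY_SIZE define.
constexpr std::size_t SketchBoundarySize = 64;

// Unpadded argument set: three thread-group counts (12 bytes).
struct FDispatchArgsNoPaddingSketch
{
    std::uint32_t GroupCountX, GroupCountY, GroupCountZ;
};

// Padded argument set: one extra uint brings the size to 16 bytes,
// which divides the assumed 64-byte boundary evenly.
struct FDispatchArgsPaddedSketch
{
    std::uint32_t GroupCountX, GroupCountY, GroupCountZ;
    std::uint32_t Padding;
};

static_assert(sizeof(FDispatchArgsNoPaddingSketch) == 12, "three uints expected");
static_assert(sizeof(FDispatchArgsPaddedSketch) == 16, "four uints expected");
static_assert(SketchBoundarySize % sizeof(FDispatchArgsPaddedSketch) == 0,
              "padded size must divide the boundary");

// True if the byte range [Offset, Offset + Size) straddles a boundary.
constexpr bool CrossesBoundary(std::size_t Offset, std::size_t Size, std::size_t Boundary)
{
    return (Offset / Boundary) != ((Offset + Size - 1) / Boundary);
}

// True if any of Count tightly packed elements straddles a boundary.
inline bool AnyElementCrossesBoundary(std::size_t ElementSize, std::size_t Count, std::size_t Boundary)
{
    assert(ElementSize > 0 && Boundary > 0);
    for (std::size_t Index = 0; Index < Count; ++Index)
    {
        if (CrossesBoundary(Index * ElementSize, ElementSize, Boundary))
        {
            return true;
        }
    }
    return false;
}

// CPU-side analogue of a shader write helper: stores the three group counts
// for argument set Index, using a stride expressed in uints. Any padding
// uints beyond the first three are left untouched.
inline void WriteDispatchArgsSketch(std::vector<std::uint32_t>& Buffer, std::size_t Index,
                                    std::size_t UintStride,
                                    std::uint32_t X, std::uint32_t Y, std::uint32_t Z)
{
    assert(UintStride >= 3);
    const std::size_t Base = Index * UintStride;
    assert(Base + UintStride <= Buffer.size());
    Buffer[Base + 0] = X;
    Buffer[Base + 1] = Y;
    Buffer[Base + 2] = Z;
}
```

With these assumed sizes, the sixth unpadded element starts at byte 60 and ends at byte 71, so it crosses the 64-byte boundary; padded elements always start on a multiple of 16 and never do.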
Testing:
- ran Lyra with and without async compute Lumen on impacted platforms as well as Win64
- ran FN replay on impacted platforms
#rb Krzysztof.Narkowicz, Ben.Woodhouse, Benjamin.Rouveyrol
#jira UE-167950
#preflight 6359563b2e6690262a11bc06
[CL 22862498 by eric mcdaniel in ue5-main branch]
This is meant to help avoid inconsistencies between GetRayTracingPayloadType() and the shader code.
As an example of the usage, convert RayTracingDebug related shaders to only enable the payload conditionally. Fix the current incorrect mixing of payload uses in the same shader by introducing a permutation for the material version of the debug shader vs. the one that uses the debug payload (this was previously a runtime shader parameter only which implied two different payloads could theoretically be compiled into one RTPSO).
Also conditionally enable some of the simpler payload types, like the Niagara and decal ones, that are only used in a few places. Other payloads will be handled in future refactors.
Split RayTracingDebug.usf into smaller files that each have only one raytracing entry in them to reduce the amount of conditional logic needed around payloads.
#rb Yuriy.ODonnell
#jira none
#preflight 635c49e9ae6840072d4df82f
[CL 22849513 by chris kulla in ue5-main branch]
This persistent atlas holds IES textures in a unique texture array which is used by all systems (raster/RT/PT/Lumen), and unifies IES profile rendering.
#rb chris.kulla, sebastien.hillaire, krzysztof.narkowicz
#jira UE-167618
#preflight 6356fd907261e565c436a0fb
[CL 22806455 by Charles deRousiers in ue5-main branch]
- Also moved Single Layer Water tile classification pass to the depth pre-pass and generated a mask to let the VSM page marking skip fetching water data for every pixel.
#rb tim.doerries,wouter.dek
[FYI] andrew.lauritzen
#preflight 635150d7047f3570ade21f92
[CL 22726871 by ola olsson in ue5-main branch]
* Collect instance AABBs when building RayTracingScene
* Output depth from RayTracingDebug pass for depth testing the instance AABBs
* Render instance AABBs as boxes with depth testing to accumulate the overlap which is then converted to heatmap and blended on top of RayTracingDebug output
* Non-GPUScene instances are not yet supported in this mode.
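As a rough CPU-side illustration of the overlap accumulation listed above, the following C++ sketch counts how many instance AABBs contain a sample point and maps that count to a normalized heat value. All names (`FAabbSketch`, `CountOverlapsSketch`, `OverlapToHeatSketch`) and the linear mapping are assumptions for illustration; the changelist itself renders the boxes on the GPU with depth testing, which this sketch does not reproduce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned bounding box for one ray tracing instance.
struct FAabbSketch
{
    float Min[3];
    float Max[3];
};

// True if point P lies inside Box (inclusive on all faces).
inline bool ContainsSketch(const FAabbSketch& Box, const float P[3])
{
    for (int Axis = 0; Axis < 3; ++Axis)
    {
        if (P[Axis] < Box.Min[Axis] || P[Axis] > Box.Max[Axis])
        {
            return false;
        }
    }
    return true;
}

// Accumulates the number of boxes that overlap at point P.
inline int CountOverlapsSketch(const std::vector<FAabbSketch>& Boxes, const float P[3])
{
    int Count = 0;
    for (const FAabbSketch& Box : Boxes)
    {
        if (ContainsSketch(Box, P))
        {
            ++Count;
        }
    }
    return Count;
}

// Maps an overlap count to a heat value in [0, 1]; MaxCount is the count
// that should map to the hottest color (an assumed tuning parameter).
inline float OverlapToHeatSketch(int Count, int MaxCount)
{
    assert(MaxCount > 0);
    const float Heat = static_cast<float>(Count) / static_cast<float>(MaxCount);
    return std::min(1.0f, std::max(0.0f, Heat));
}
```

The resulting heat value would then be fed into whatever color ramp the visualization uses before blending over the debug output.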
#rb yuriy.odonnell, tiago.costa
#preflight 63485810ad0f7e2f20eb5db2
[CL 22524168 by aleksander netzel in ue5-main branch]
Compile DXR shaders using the same profile when targeting PCD3D_SM5 and PCD3D_SM6 (i.e. lib_6_6 or lib_6_5 based on USE_SHADER_MODEL_6_6 define).
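A minimal sketch of the selection described above, assuming a hypothetical helper named `SelectDxrLibraryProfileSketch`; the real shader compiler code path is not shown in this description. The point is that the target profile depends only on the shader model 6.6 switch, not on which of the two shader platforms is being compiled.

```cpp
#include <cassert>
#include <string>

// Returns the DXR library target profile. The choice depends only on the
// (assumed) USE_SHADER_MODEL_6_6 setting, so both shader platforms end up
// with the same profile.
inline std::string SelectDxrLibraryProfileSketch(bool bUseShaderModel66)
{
    return bUseShaderModel66 ? "lib_6_6" : "lib_6_5";
}
```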
#rb christopher.waters
#preflight 63376cc6466fb43669315f62
[CL 22286556 by yuriy odonnell in ue5-main branch]
- Also add the ability for clusters to be sorted into up to two bins so that WPO-disabled clusters render with the most optimal shaders possible.
#rb rune.stubbe
#preflight skip
[CL 22193070 by jamie hayes in ue5-main branch]
This makes the dbuffer pass 0.05ms faster, but makes the classification slower by 0.1ms. So no win for now, but further optimizations are coming. Disabled for now.
#rb none
#jira none
#preflight 632979ec9840225da24fc245
#fyi sebastien.hillaire
[CL 22091302 by Charles deRousiers in ue5-main branch]
* This CL adds dedicated legacy material compaction and writes 3 uints through MRT (for simple clear coat/SSS wrap/cloth) instead of 4 uints with 2 through UAV ops. This allows a faster base pass, in particular on slower platforms.
* Activating DBuffer makes the entire base pass slower with this change list, but this will be addressed with subsequent changes.
Work done by Seb, submitted on his behalf.
#rb charles.derousiers
#jira none
#preflight 631f8804065c4ac9ce61e81d
#fyi sebastien.hillaire
[CL 22002833 by Charles deRousiers in ue5-main branch]
- bMaterialMayModifyPosition -> bMaterialUsesWorldPositionOffset for non-nanite, we don't want to invalidate due to PDO
- Nanite instance culling records static primitives that invalidate the VSM (so they get cached as dynamic)
- Dirty page flags now store the invalidation as well (static & dynamic) so 3x in size
- Nanite VSM instance/cluster culling uses PRIMITIVE_SCENE_DATA_FLAG_EVALUATE_WORLD_POSITION_OFFSET to drive bHasMoved and invalidation.
- Non-nanite instance culling outputs dirty page flags for invalidating instances, with in-group load balancing for large footprints
- Store invalidation flags in physical page metadata flags (removed the cumulative dirty flags buffer).
- Added bAnyMaterialHasWorldPositionOffset (accessor AnyMaterialHasWorldPositionOffset()) to FPrimitiveSceneProxy.
- Driving PRIMITIVE_SCENE_DATA_FLAG_EVALUATE_WORLD_POSITION_OFFSET from bAnyMaterialHasWorldPositionOffset in addition to EvaluateWorldPositionOffset.
- Removed near clip permutation for non-nanite VSM culling shader.
- Non-nanite VSM raster passes are all batched in a single RDG pass to better allow overlap between draws and to lower pass overhead.
- Removed old GPU->GPU invalidation logic.
- Removed dynamic caster flags and update the physical page metadata directly
- Renamed PRIMITIVE_SCENE_DATA_FLAG_SHOULD_CACHE_SHADOW -> PRIMITIVE_SCENE_DATA_FLAG_SHOULD_CACHE_SHADOW
#rb andrew.lauritzen,jamie.hayes
#jira UE-147061
#preflight 631b0a18a60c539c98cf1308
[CL 21973831 by ola olsson in ue5-main branch]
Fixed a bug where loading the fast material would not be done correctly inside the classification pass for the shared memory path.
#rb none
#jira UE-161415
#preflight https://horde.devtools.epicgames.com/job/630909b9987e7155b1af0aec
#fyi charles.derousiers
[CL 21661992 by Sebastien Hillaire in ue5-main branch]
Implemented single-page page table support for distant lights
- store only a single page table entry for distant lights
- modify page lookup logic in various places to handle this
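One way to picture the modified lookup is the C++ sketch below. The structure `FLightPageTableSketch`, its fields, and the row-major indexing are all assumptions made for illustration; the actual page table layout is not described here. The sketch only shows the idea that a single-page light resolves every page coordinate to its one stored entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-light page table. A regular light stores one entry per
// virtual page; a single-page (distant) light stores exactly one entry.
struct FLightPageTableSketch
{
    bool bSinglePage;
    std::uint32_t PagesPerRow;           // only meaningful when !bSinglePage
    std::vector<std::uint32_t> Entries;  // physical page ids
};

// Resolves a virtual page coordinate to a physical page id.
inline std::uint32_t LookupPageSketch(const FLightPageTableSketch& Table,
                                      std::uint32_t PageX, std::uint32_t PageY)
{
    if (Table.bSinglePage)
    {
        // Every coordinate maps to the single stored entry.
        assert(Table.Entries.size() == 1);
        return Table.Entries[0];
    }

    const std::size_t Index = static_cast<std::size_t>(PageY) * Table.PagesPerRow + PageX;
    assert(Index < Table.Entries.size());
    return Table.Entries[Index];
}
```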
Implemented override behavior to render everything to dynamic pages for a light that always invalidates using r.Shadow.Virtual.Cache.ForceInvalidateClipmaps (behaves as uncached, despite caching being enabled).
This brings performance on par with uncached rendering by removing various overheads that are not achieving anything for this case.
- Added a new flag to the nanite view, VSM_PROJ_FLAG_UNCACHED, to indicate whether it is uncached; currently driven by the cvar r.Shadow.Virtual.Cache.ForceInvalidateClipmaps
- If this flag is set on a view, ShouldCacheInstanceAsStatic (which now takes a nanite view) returns false, causing all rendering to go to the dynamic pages.
- To preserve HZB functionality, the HZB build is modified to load from the dynamic depth pages (normally it uses the static)
- The page initializer (that clears depth) skips static pages for uncached views as they will not be used.
- Finally the page merge pass that combines static & dynamic depth into the dynamic page also skips pages from uncached views.
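The flag check in the list above can be sketched in C++ as follows. The bit value of the flag, the `FNaniteViewSketch` structure, and the `bInstanceIsStatic` input are assumptions for illustration; only the behavior (an uncached view forces the function to return false) is taken from the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit value; the real VSM_PROJ_FLAG_UNCACHED value is not
// given in this description.
constexpr std::uint32_t VSM_PROJ_FLAG_UNCACHED_SKETCH = 1u << 0;

// Minimal stand-in for a nanite view carrying projection flags.
struct FNaniteViewSketch
{
    std::uint32_t Flags;
};

// Returns false for any instance seen through an uncached view, so that
// all of its rendering goes to the dynamic pages. Otherwise falls back to
// the instance's own static/dynamic classification (assumed input).
inline bool ShouldCacheInstanceAsStaticSketch(bool bInstanceIsStatic, const FNaniteViewSketch& View)
{
    if ((View.Flags & VSM_PROJ_FLAG_UNCACHED_SKETCH) != 0u)
    {
        return false;
    }
    return bInstanceIsStatic;
}
```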
Optimized page allocation pass by storing the actual pages needing allocation from the cache init pass.
Optimized hierarchical page flag generation by dispatching over physical pages instead of virtual.
Fixed dynamic primitive cache invalidation logic.
#jira UE-122102
#rb andrew.lauritzen
#preflight 63087c2592620e5ec3aa3f2f
[CL 21590421 by ola olsson in ue5-main branch]
We are no longer fighting the Strata tree but just using it now, since the LegacyConversion node can only output a single BSDF at any time. Conversion to decal parameter blending will now work correctly.
#rb charles.derousiers
#jira UE-160739
#fyi charles.derousiers
#preflight none
[CL 21423524 by Sebastien Hillaire in ue5-main branch]