- Decal materials are evaluated using callable shaders in PathTracingKernel.
- Decals are culled using a 2D grid similar to the existing light grid.
- In order to correctly handle decal blending order, decals are sorted using the same logic as the rasterizer on CPU. The compute shader that builds the decal grid maintains the correct order.
- Decal materials are wrapped in FRayTracingDecalMaterialShader. The instance parameters of each decal are bound using uniform buffers.
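
The ordering guarantee described above can be sketched on the CPU as follows. This is only an illustration, not the engine code: the `Decal` struct, `SortOrder` field, and `BuildDecalGrid` function are hypothetical names, and the real grid is built in a compute shader. The point is that sorting once and then visiting decals in sorted order gives every cell a list that already follows the blending order.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decal record: a 2D bounding rectangle in grid space plus a sort key.
struct Decal {
    float MinX, MinY, MaxX, MaxY;
    int32_t SortOrder; // assumption: lower values blend first
};

// Builds a GridSize x GridSize grid of decal index lists (row-major cells).
// Decals are stable-sorted once and then visited in that order, so each
// cell's list inherits the global blending order.
std::vector<std::vector<uint32_t>> BuildDecalGrid(const std::vector<Decal>& Decals, int GridSize)
{
    std::vector<uint32_t> Order(Decals.size());
    for (uint32_t i = 0; i < Order.size(); ++i) { Order[i] = i; }
    std::stable_sort(Order.begin(), Order.end(), [&Decals](uint32_t A, uint32_t B) {
        return Decals[A].SortOrder < Decals[B].SortOrder;
    });

    // Simplification: bounds are clamped to the grid, so a decal that lies
    // outside the grid is conservatively assigned to the border cells.
    auto ToCell = [GridSize](float V) {
        return std::clamp(static_cast<int>(std::floor(V)), 0, GridSize - 1);
    };

    std::vector<std::vector<uint32_t>> Grid(static_cast<std::size_t>(GridSize) * GridSize);
    for (uint32_t Index : Order) {
        const Decal& D = Decals[Index];
        for (int Y = ToCell(D.MinY); Y <= ToCell(D.MaxY); ++Y) {
            for (int X = ToCell(D.MinX); X <= ToCell(D.MaxX); ++X) {
                Grid[static_cast<std::size_t>(Y) * GridSize + X].push_back(Index);
            }
        }
    }
    return Grid;
}
```

A lookup then only needs to walk the list of the cell containing the shading point, front to back, to apply decals in the same order as the rasterizer.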
#preflight 628f3fed2f2409bc1e7a6414
#rb Yuriy.ODonnell, chris.kulla, Jeremy.Moore
[CL 20377336 by tiago costa in ue5-main branch]
Hooked the value used by instanced static meshes into it to get nanite ISMs to respect the cull distance.
#rb brian.karis
#preflight 6287d8a21e478b95c7345866
#ROBOMERGE-AUTHOR: jamie.hayes
#ROBOMERGE-SOURCE: CL 20301693 via CL 20301776 via CL 20301792
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v948-20297126)
[CL 20305738 by jamie hayes in ue5-main branch]
* Added RaytracingTraversalStatistics to create, capture and print TraceRayInline traversal statistics.
* In inline raytracing shaders you only need to call TraceRayInlineAccumulateStatistics() which will gather the results of the most recent TraceRayInline call.
* New debug visualization mode 'Traversal Statistics' that will print to the screen the traversal statistics for primary rays.
#rb Yuriy.Odonnell
#preflight 62860caa2b53e2be4c8ceee2
[CL 20277869 by aleksander netzel in ue5-main branch]
Bake viewport and subpixel transforms into matrix to save ALU and VGPRs
Changed group size back to 64 for non-masked
#rb graham.wihlidal
#preflight 627a9b2fe713fc6e2c52c9bf
#ROBOMERGE-AUTHOR: rune.stubbe
#ROBOMERGE-SOURCE: CL 20125493 via CL 20125504 via CL 20125508
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v943-19904690)
[CL 20128642 by rune stubbe in ue5-main branch]
If we want to re-introduce this notion, we should be able to handle this directly in the material given the world position and camera matrix.
This frees up 4 bytes (in preparation for Strata support) and also reduces the path state structure size by 4 bytes.
I've measured around 3% speedup from this removal.
#rb Yuriy.ODonnell,Charles.deRousiers
#preflight 626ae4926461dd769ffe4394
[CL 19966893 by chris kulla in ue5-main branch]
Remove coherent sampler since it would be complicated to support in the mGPU case and isn't really necessary for performance anymore
#rb Jason.Hoerner
#preflight 624b79f09f404234149aec8e
[CL 19617353 by chris kulla in ue5-main branch]
This change also remaps the original bEvaluateWorldPositionOffset on SMC into bEvaluateWorldPositionOffsetInRayTracing, because this var was only ever driven by ray tracing specific methods. The original bEvaluateWorldPositionOffset is now used by this more generic API.
Lastly, a new cvar (r.OptimizedWPO) has been added that indicates if the hint should be respected or not (default is false, which means WPO is always active, regardless of hint)
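
The interaction between the cvar and the hint, as described above, reduces to a single predicate. The sketch below is illustrative only; `ShouldEvaluateWPO` and its parameter names are hypothetical and are not the engine's API.

```cpp
// bOptimizedWPO: value of r.OptimizedWPO (the default, false, ignores the hint).
// bEvaluateWPOHint: the per-component hint (bEvaluateWorldPositionOffset).
// Returns true when world position offset should be evaluated.
bool ShouldEvaluateWPO(bool bOptimizedWPO, bool bEvaluateWPOHint)
{
    // With the cvar off, WPO is always active; with it on, the hint decides.
    return !bOptimizedWPO || bEvaluateWPOHint;
}
```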
#rb rune.stubbe, marc.audy, derek.ehrman
[FYI] brian.karis, jamie.hayes, ola.olsson, andrew.lauritzen, jian.ru
#preflight 6244a8dcdc6183e3f5f8de98
#ROBOMERGE-AUTHOR: graham.wihlidal
#ROBOMERGE-SOURCE: CL 19564957 via CL 19564973 via CL 19564978
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v937-19513599)
[CL 19566743 by graham wihlidal in ue5-main branch]
For pixels with complex/multi-BSDF materials, compute the BSDF material byte/index offsets and the 'overflowing' tile.
This will be used by Lumen for handling/parallelizing lighting computation.
#rb none
#jira none
#preflight 6234c38848746817f13c87bb
#fyi sebastien.hillaire
[CL 19438082 by Charles deRousiers in ue5-main branch]
- Add slope-based depth extrapolation which improves the quality of penumbras on angled receivers. Costs ~10% performance in some cases so maintaining a permutation/cvar (default on) for scalability.
- Change screen ray trace to be a simple "space skipping" ray that terminates as soon as it goes behind geometry; the VSM trace then continues from that distance. This avoids various contact-shadow-like artifacts and undesirable/inconsistent contact shadows from things that aren't in the VSM. If regular contact shadows are desired on top of VSM, the engine contact shadows can be enabled, as with CSMs.
- Remove many uses of "half" types in the shaders, as they cause some extra ALU on some platforms and no longer appear to help with occupancy
- Small bump to minimum normal bias clamp (only affects things very close to the camera)
#rb brian.karis
[FYI] ola.olsson
#ROBOMERGE-AUTHOR: andrew.lauritzen
#ROBOMERGE-SOURCE: CL 19411300 via CL 19411627
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v928-19376421)
[CL 19413239 by andrew lauritzen in ue5-main branch]
Tiles with SSS but no rough refraction are also added onto the scene color buffer separately from the blur process.
#rb none
#preflight
#fyi charles.derousiers
[CL 19404162 by Sebastien Hillaire in ue5-main branch]
This simplifies the Strata Slab UI and makes it easier to thin-film coat many BSDFs at the same time.
#rb none
#jira none
#preflight 622fa337c51b66df4c248174
#fyi sebastien.hillaire
[CL 19378172 by Charles deRousiers in ue5-main branch]
Reductions *per material*:
SM5
--
FHWRasterizeVS: 832 -> 21
FHWRasterizePS: 104 -> 39
SM6
--
FHWRasterizeVS: 320 -> 9
FHWRasterizeMS: 640 -> 9
FHWRasterizePS: 120 -> 30
Vulkan
--
FHWRasterizeVS: 320 -> 9
FHWRasterizePS: 40 -> 15
Other platforms redacted =)
-- Details
* CLUSTER_PER_PAGE has been fully removed (since we no longer ever run CLUSTER_PER_PAGE=0), which now makes it mutually inclusive with VIRTUAL_TEXTURE_TARGET
* HAS_RASTER_BIN has been replaced with a dynamic branch, since this is just a per cluster index offset based on a simple uniform buffer load
* ADD_CLUSTER_OFFSET has been replaced with a dynamic branch, since this is just a per cluster index offset based on a simple uniform buffer load
* HAS_PREV_DRAW_DATA has been replaced with a dynamic branch, since this is just a per cluster index offset based on a simple uniform buffer load
* NEAR_CLIP (only change to significantly affect codegen) has been turned into a dynamic branch based on FNaniteView - this lets us merge depth clip/clamp rasterizer calls in VSM together instead of relying on HAS_PREV_DRAW_DATA, and a future optimization can now be done to merge local and directional light full Nanite pipeline calls together.
* VISUALIZE permutation removed from VS/MS since it only loaded uniform values that were passed down per-vertex into the fragment stage as nointerpolation parameters. Pixel shader now constructs this uint2 directly under the VISUALIZE permutation
* NANITE_MESH_SHADER_INTERP removed by default but still left in the code, since it is a work in progress potential optimization for DX12 mesh shaders
* Removed explicit Lumen and VSM usage of NANITE_RENDER_FLAG_HAVE_PREV_DRAW_DATA (now the dynamic branch path is only taken if CullRasterizeMultiPass implicitly breaks the rasterization into multiple calls due to NANITE_MAX_VIEWS_PER_CULL_RASTERIZE_PASS overflow)
Performance was tested on a 2080Ti in AncientGame, and the delta is effectively noise (tested cached and uncached VSM). Further testing on other platforms will occur, but important to get this change in for all the benefits and easy to tweak things later if needed.
#rb rune.stubbe
#fyi brian.karis, ola.olsson, andrew.lauritzen, jamie.hayes, daniel.wright, krzysztof.narkowicz
#preflight 622e684c7e2e35638c96a16a
#robomerge FNNC
[CL 19370372 by graham wihlidal in ue5-main branch]
* Moller-Trumbore and Watertight triangle intersections in RayTriangleIntersect.h
* AABB intersection
* Nanite cluster intersection
* Nanite hierarchy intersection using a stack.
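
For reference, the first two intersection tests listed above can be sketched as below. This is a generic textbook version written for this note, not the contents of RayTriangleIntersect.h; `Vec3`, `RayTriangle`, and `RayAABB` are hypothetical names. The AABB test uses the common slab method and assumes IEEE-754 floats, so that dividing by a zero direction component yields infinity.

```cpp
#include <algorithm>
#include <cmath>
#include <optional>

struct Vec3 { float X, Y, Z; };

static Vec3 operator-(Vec3 A, Vec3 B) { return {A.X - B.X, A.Y - B.Y, A.Z - B.Z}; }
static float Dot(Vec3 A, Vec3 B) { return A.X * B.X + A.Y * B.Y + A.Z * B.Z; }
static Vec3 Cross(Vec3 A, Vec3 B)
{
    return {A.Y * B.Z - A.Z * B.Y, A.Z * B.X - A.X * B.Z, A.X * B.Y - A.Y * B.X};
}

// Moller-Trumbore ray/triangle test (two-sided).
// Returns the hit distance along Dir, or nullopt on a miss.
std::optional<float> RayTriangle(Vec3 Origin, Vec3 Dir, Vec3 V0, Vec3 V1, Vec3 V2)
{
    const float Epsilon = 1e-7f;
    const Vec3 E1 = V1 - V0;
    const Vec3 E2 = V2 - V0;
    const Vec3 P = Cross(Dir, E2);
    const float Det = Dot(E1, P);
    if (std::fabs(Det) < Epsilon) { return std::nullopt; } // ray parallel to the triangle plane
    const float InvDet = 1.0f / Det;
    const Vec3 S = Origin - V0;
    const float U = Dot(S, P) * InvDet;                    // first barycentric coordinate
    if (U < 0.0f || U > 1.0f) { return std::nullopt; }
    const Vec3 Q = Cross(S, E1);
    const float V = Dot(Dir, Q) * InvDet;                  // second barycentric coordinate
    if (V < 0.0f || U + V > 1.0f) { return std::nullopt; }
    const float T = Dot(E2, Q) * InvDet;
    if (T < 0.0f) { return std::nullopt; }                 // intersection is behind the origin
    return T;
}

// Slab test: intersects the ray's parametric range [0, TMax] with the three
// axis-aligned slabs. Does not handle the NaN case where the origin lies
// exactly on a slab plane and the direction component on that axis is zero.
bool RayAABB(Vec3 Origin, Vec3 Dir, Vec3 BoxMin, Vec3 BoxMax, float TMax)
{
    const float O[3] = {Origin.X, Origin.Y, Origin.Z};
    const float D[3] = {Dir.X, Dir.Y, Dir.Z};
    const float Lo[3] = {BoxMin.X, BoxMin.Y, BoxMin.Z};
    const float Hi[3] = {BoxMax.X, BoxMax.Y, BoxMax.Z};
    float TEnter = 0.0f;
    float TExit = TMax;
    for (int Axis = 0; Axis < 3; ++Axis) {
        const float Inv = 1.0f / D[Axis]; // +/-infinity when D[Axis] == 0 (IEEE-754)
        const float T1 = (Lo[Axis] - O[Axis]) * Inv;
        const float T2 = (Hi[Axis] - O[Axis]) * Inv;
        TEnter = std::max(TEnter, std::min(T1, T2));
        TExit = std::min(TExit, std::max(T1, T2));
    }
    return TEnter <= TExit;
}
```

The watertight variant and the Nanite cluster and hierarchy traversals are not covered by this sketch.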
#rb Tiago.Costa
#preflight none
[CL 19196324 by aleksander netzel in ue5-main branch]