* Decreased Global Distance Field border from full voxel to half voxel and decreased page size from 16 to 8. This removes ~30% of pages, saving memory and improving update speed
* Half texel border requires to compute gradients using 8 page table fetches, but it didn�t influence Lumen tracing performance on console
* Smaller pages put some pressure on culling and other passes. Culling grid was switch from per page to per 4x4x4 pages. Overall full GDF update speed in FortGPUTestBed decreased from 5.89ms to 4.57ms
* Switched page table format to UInt32 in order to support larger Global Distance Field resolutions
* Moved all GlobalDistanceField shaders into /DistanceField/* folder, in future we should move other distance field shaders there
* Moved GlobalDistanceField page stats from r.Lumen.Visualize.Stats in Lumen code to r.GlobalDistanceField.Debug in GlobalDistanceField code
* Fixed Lumen normal visualization when only Global Distance Field tracing is enabled
#preflight 62630427006fa20b683b405a
[FYI] Tiago.Costa, Daniel.Wright
#ROBOMERGE-AUTHOR: krzysztof.narkowicz
#ROBOMERGE-SOURCE: CL 19873116 via CL 19873429 via CL 19874251
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v940-19807014)
[CL 19878047 by krzysztof narkowicz in ue5-main branch]
- Added LWCHackToFloat to higher level code.
- Converted a few simple usages to work correctly in LWC.
#rb none
#preflight none
#jira none
[CL 19256116 by tiago costa in ue5-main branch]
- Added LWCHackToFloat to higher level code.
- Converted a few simple usages to work correctly in LWC.
#rb none
#preflight none
#jira none
[CL 19255315 by tiago costa in ue5-main branch]
* Removed culled DF object copies during culling. Instead now only indices to culled objects are stored
* Refactored DF heightfield object loads into FHeightfieldObjectBounds and FHeightfieldObjectData
This is a step towards optimizing DF culling and reusing this code for Lumen Landscape culling
Perf Reverb on 2080:
* CullMeshSDFObjectsToFrustum 0.04ms->0.03ms (removed DF object copies)
* Other passes didn't change
#preflight 61f5a7b7694910780bd91918
#rb Tiago.Costa
#ROBOMERGE-AUTHOR: krzysztof.narkowicz
#ROBOMERGE-SOURCE: CL 18789232 in //UE5/Release-5.0/... via CL 18789258 via CL 18789368
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v908-18788545)
[CL 18789821 by krzysztof narkowicz in ue5-main branch]
Basic approach is to add HLSL types FLWCScalar, FLWCMatrix, FLWCVector, etc. Inside shaders, absolute world space position values should be represented as FLWCVector3. Matrices that transform *into* absolute world space become FLWCMatrix. Matrices that transform *from* world space become FLWCInverseMatrix. Generally LWC values work by extending the regular 'float' value with an additional tile coordinate. Final tile size will be a trade-off between scale/accuracy; I'm using 256k for now, but may need to be adjusted. Value represented by a FLWCVector thus becomes V.Tile * TileSize + V.Offset. Most operations can be performed directly on LWC values. There are HLSL functions like LWCAdd, LWCSub, LWCMultiply, LWCDivide (operator overloading would be really nice here). The goal is to stay with LWC values for as long as needed, then convert to regular float values when possible. One thing that comes up a lot is working in translated (rather than absolute) world space. WorldSpace + View.PrevPreViewTranslation = TranslatedWorldspace. Except 'View.PrevPreViewTranslation' is now a FLWCVector3, and WorldSpace quantities should be as well. So that becomes LWCAdd(WorldSpace, View.PrevPreViewTranslation) = TranslatedWorldspace. Assuming that we're talking about a position that's "reasonably close" to the camera, it should be safe to convert the translated WS value to float. The 'tile' coordinate of the 2 LWC values should cancel out when added together in this case. I've done some work throughout the shader code to do this. Materials are fully supporting LWC-values as well. Projective texturing and vertex animation materials that I've tested work correctly even when positioned "far away" from the origin.
Lots of work remains to fully convert all of our shader code. There's a function LWCHackToFloat(), which is a simple wrapper for LWCToFloat(). The idea of HackToFloat is to mark places that need further attention, where I'm simply converting absolute WS positions to float, to get shaders to compile. Shaders converted in this way should continue to work for all existing content (without LWC-scale values), but they will break if positions get too large.
General overview of changed files:
LargeWorldCoordinates.ush - This defines the FLWC types and operations
GPUScene.cpp, SceneData.ush - Primitives add an extra 'float3' tile coordinate. Instance data is unchanged, so instances need to stay within single-precision range of the primitive origin. Could potentially split instances behind the scenes (I think) if we don't want this limitation
HLSLMaterialDerivativeAutogen.cpp, HLSLMaterialTranslator.cpp, Preshader.cpp - Translated materials to use LWC values
SceneView.cpp, SceneRelativeViewMatrices.cpp, ShaderCompiler.cpp, InstancedStereo.ush - View uniform buffer includes LWC values where appropriate
#jira UE-117101
#rb arne.schober, Michael.Galetzka
#ROBOMERGE-AUTHOR: ben.ingram
#ROBOMERGE-SOURCE: CL 17787435 in //UE5/Main/...
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v881-17767770)
[CL 17787478 by ben ingram in ue5-release-engine-test branch]
#jira UE-111998
#rb rolando.caloca
#ROBOMERGE-SOURCE: CL 15805569 in //UE5/Release-5.0-EarlyAccess/...
#ROBOMERGE-BOT: STARSHIP (Release-5.0-EarlyAccess -> Main) (v783-15756269)
[CL 15810213 by brian white in ue5-main branch]
#ROBOMERGE-SOURCE: CL 15793012 in //UE5/Release-5.0-EarlyAccess/...
#ROBOMERGE-BOT: STARSHIP (Release-5.0-EarlyAccess -> Main) (v783-15756269)
[CL 15793027 by daniel wright in ue5-main branch]
* SDFs are now generated, allocated from the atlas and uploaded in 8^3 bricks (7^3 unique data, half voxel padding).
* Tracing must load the brick index from the indirection table, and only bricks near the surface are stored
* 3 mips are now generated, with the lowest resolution always loaded and the other 2 streamed
* SDFs are now G8 narrow band. Lower resolution mips must be traversed when querying distance to nearest surface far away from the surface
* The Distance Field Brick Atlas is now stored for each FScene and dynamically resized based on needs with a GPU memcopy
* Brick atlas uses a 1d pooled allocator which has no fragmentation and greatly reduces packing waste over the 3d allocator
* Added new indirection for Distance Field Asset data, so that only a single entry needs to be updated when a mip is streamed in or out in scenes with millions of instances
* Compute shaders operating on distance field instances generate streaming requests, which are async read back to CPU, turned into IO requests, which are polled and when complete uploaded to atlases
* Any mesh instance inside the Global SDF extent (200m) requests mip1, and at 50m requests mip2
* Now using a batched compute scatter to upload to the distance field atlas instead of RHIUpdateTexture3d, to bypass alignment restrictions and per-upload overhead
* Distance Field streaming uses an async task to move Memcpy and IO request overhead off of the Rendering Thread
* Distance Field Visualization now computes a normal from the SDF gradient and does simple lighting to better visualize the scene representation
* Increased r.DistanceFields.MaxPerMeshResolution from 128 to 512, to better represent large objects
* Mesh SDF generation now uses an Embree point query to calculate closest unsigned distance, and then a much smaller set of rays to count backfaces for negative region determination, for a 11x speedup
* Upgraded mesh utilities to Embree 3.12.2 to get point queries
* Fixed wrong transform used for SDF normals in Lumen, causing non-uniformly scaled meshes to have incorrect Surface Cache interpolation
* Fixed Static Mesh materials not getting PostLoaded before SDF build, causing their blend modes to be wrong for the build, which corrupts the DDC. Also included those blend modes in the DDC key.
Original costs on 1080 GTX (full updates on everything and no screen traces)
10.60ms UpdateGlobalDistanceField
3.62ms LumenReflectiveTest.DirectionalLight_1 Shadowmap 1
1.73ms VoxelizeCards Clipmaps=[0,1,2,3]
0.38ms TraceCards 1 dispatch 1 groups
0.51ms TraceCards 1 dispatch 1 groups
Sparse SDF costs
12.06ms UpdateGlobalDistanceField
4.35ms LumenReflectiveTest.DirectionalLight_1 Shadowmap 1
2.30ms VoxelizeCards Clipmaps=[0,1,2,3]
0.69ms TraceCards 1 dispatch 1 groups
0.77ms TraceCards 1 dispatch 1 groups
Tested: TopazEntry PC, Reverb PC and PS5, EngineTests, QAGame, Rift, Frosty P_Construct_WP, FortGPUTestbed
#rb Krzysztof.Narkowicz
#ROBOMERGE-OWNER: Daniel.Wright
#ROBOMERGE-AUTHOR: daniel.wright
#ROBOMERGE-SOURCE: CL 15784493 in //UE5/Release-5.0-EarlyAccess/...
#ROBOMERGE-BOT: STARSHIP (Release-5.0-EarlyAccess -> Main) (v783-15756269)
#ROBOMERGE-CONFLICT from-shelf
[CL 15790658 by Daniel Wright in ue5-main branch]
#ROBOMERGE-SOURCE: CL 15348315 in //UE5/Release-5.0-EarlyAccess/...
#ROBOMERGE-BOT: STARSHIP (Release-5.0-EarlyAccess -> Main) (v771-15082668)
[CL 15359818 by daniel wright in ue5-main branch]
Contains all work on Lumen since shipping Reverb in March:
* New Opaque Final Gather with higher quality in clean architecture and better performance
* Improved global tracing which reduces leaking
* New basic Reflections pipeline which respects material Roughness
* New HZB traces that are more accurate at tracing against the depth buffer
* Hardware Ray Tracing paths for Lumen shadowing and Final Gather
#rb none
[CL 14274844 by Daniel Wright in ue5-main branch]