Cost is 0.03ms on a 2080 Ti at 1080p
Fixed Lumen Reflection Screen Traces outputting too large a distance when fading out at the edges of the screen
Added r.AOGlobalDistanceFieldForceUpdateOnce to allow working around Mesh SDF streaming latency
[CL 23147835 by daniel wright in ue5-main branch]
* We update primitives once per scene inside a SceneRender()
* GlobalSDF is cached per view
* GlobalSDF update is based on whether current primitive is in UpdateTrackingBounds, where UpdateTrackingBounds are built from current views
* If a render is called with a non-main view (like a cubemap capture), the scene primitives will be updated, but the GlobalSDF for the main view won't
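The interaction described in the bullets above can be reproduced with a toy model. This is only a sketch under assumptions: `Bounds`, `ViewCache`, `Scene` and `MainViewUpdates` are hypothetical 1D stand-ins, not engine types, and the model shows the missed update rather than any particular fix.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 1D stand-ins for illustration; none of these are engine types.
struct Bounds {
    float Min, Max;
    bool Contains(float X) const { return X >= Min && X <= Max; }
};

struct ViewCache {
    Bounds TrackingBounds; // region covered by this view's cached Global SDF
    int PendingUpdates;    // primitive changes this view's cache has recorded
};

struct Scene {
    std::vector<float> ModifiedPrimitives; // positions changed since the last render

    // Models one SceneRender(): primitive updates are consumed once per scene,
    // but only the views being rendered get a chance to record them.
    void Render(const std::vector<ViewCache*>& CurrentViews) {
        for (float Position : ModifiedPrimitives) {
            for (ViewCache* View : CurrentViews) {
                assert(View != nullptr);
                if (View->TrackingBounds.Contains(Position)) {
                    ++View->PendingUpdates;
                }
            }
        }
        ModifiedPrimitives.clear(); // consumed even if a cached view never saw them
    }
};

// Returns how many updates the main view recorded for one modified primitive
// that lies inside the main view's tracking bounds.
int MainViewUpdates(bool bCaptureRendersFirst) {
    Scene S;
    ViewCache Main{{0.f, 100.f}, 0};
    ViewCache Capture{{200.f, 300.f}, 0}; // e.g. a cubemap capture elsewhere
    S.ModifiedPrimitives.push_back(50.f);
    if (bCaptureRendersFirst) {
        S.Render({&Capture});
        S.Render({&Main});
    } else {
        S.Render({&Main});
        S.Render({&Capture});
    }
    return Main.PendingUpdates;
}
```

In this model the main view records the change only when it renders before the capture; when the capture renders first, the update is consumed and the main view's cache stays stale.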
[FYI] Tiago.Costa
[CL 22710474 by krzysztof narkowicz in ue5-main branch]
Only build shaders that use ShaderPrint if ShaderPrint is supported.
#preflight 634ee698e746026e48eb17e1
[CL 22628292 by jeremy moore in ue5-main branch]
Resolves the following ensure:
Resource GlobalDistanceField.PageObjectGridBuffer is in external access mode and is valid for access on the following pipelines: Graphics, but is being used on the AsyncCompute pipe in pass Scene.DiffuseIndirectAndAO.LumenScreenProbeGather.UpdateRadianceCaches.TraceFromProbes Res=16x16.
#rb Jian.Ru
#preflight 634dffb1d737d61a2f2c6c8e
[CL 22601928 by eric mcdaniel in ue5-main branch]
* Memory usage didn't change as Global SDF occupancy was lowered to 0.3. It can now be lowered thanks to the previous changes to smaller pages and the Landscape page fix.
* Performance is almost the same (<0.05ms change on console)
[CL 22390125 by krzysztof narkowicz in ue5-main branch]
During page composite pass, pages containing coverage values below 1 are marked using a single bit inside PageTable. This bit is then checked during the tracing pass to decide whether we should load a sparse coverage page or not.
Previously we were marking empty areas as coverage 0, but it had some side effects and isn't compatible with this marking scheme. Now all empty space is marked as coverage=1 and mobility cache composition was accordingly modified.
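A minimal sketch of the single-bit marking scheme, for illustration only. The bit position, the 31-bit page index and the names `PackPageTableEntry`, `HasSparseCoverage` and `SampleCoverage` are assumptions; the actual PageTable layout is not given in this description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: top bit flags "page has coverage below 1",
// the remaining 31 bits hold the page index.
constexpr std::uint32_t kCoverageBit = 1u << 31;
constexpr std::uint32_t kPageIndexMask = kCoverageBit - 1u;

// Composite pass: pack the page index and mark pages containing coverage < 1.
std::uint32_t PackPageTableEntry(std::uint32_t PageIndex, float MinCoverage) {
    assert(PageIndex <= kPageIndexMask);
    std::uint32_t Entry = PageIndex;
    if (MinCoverage < 1.0f) {
        Entry |= kCoverageBit;
    }
    return Entry;
}

std::uint32_t PageIndexOf(std::uint32_t Entry) { return Entry & kPageIndexMask; }
bool HasSparseCoverage(std::uint32_t Entry) { return (Entry & kCoverageBit) != 0u; }

// Tracing pass: the coverage page is only read when the bit is set. Unmarked
// pages, including empty space, are treated as coverage = 1.
// CoveragePages is a hypothetical flat array with one value per page.
float SampleCoverage(std::uint32_t Entry, const float* CoveragePages) {
    if (!HasSparseCoverage(Entry)) {
        return 1.0f;
    }
    return CoveragePages[PageIndexOf(Entry)];
}
```

The point of the bit is that the tracing pass can skip the coverage load entirely for unmarked pages, which is where the saving comes from in levels with little foliage.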
Performance wins on FortGPUTestBed, all tracing passes, current gen console, 1080p
Epic - 0.17ms
High - 0.1ms
Unfortunately this doesn't fully remove coverage overhead in levels without foliage. Overhead after this optimization:
Epic - 0.1ms
High - 0.06ms
[CL 22165335 by krzysztof narkowicz in ue5-main branch]
* Higher resolution reflections
* No leaking from voxel resampling
* Better emissive representation as previously voxel lighting could miss those
* Allows resolving hits below the voxel interpolation footprint, which previously had to be black to avoid sampling the surface from which the current ray originated and causing multibounce to explode.
Distance field object grid is now a part of the global distance field update, which allows reusing the page table, update and separate mobility caches, and makes sure that the global distance field matches hit lighting (previously Voxel Lighting was updated in a different place and didn't always match). The object grid is half resolution and stores up to 4 objects closest to the cell center, sorted by distance. On every global distance field hit, the appropriate cell is loaded. Objects are processed in order (sorted by distance to cell center) and their surface cache is applied to hit points until a valid coverage is found.
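The per-cell object list can be sketched as a small fixed-size array kept sorted by distance. This is an illustration under assumptions, not the shader code: `ObjectGridCell`, `InsertObject` and `FindFirstValidObject` are hypothetical names, and the validity test is passed in as a callback because the surface cache lookup is not described here.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kMaxObjectsPerCell = 4; // "up to 4 closest objects" per cell

// Hypothetical cell layout: object indices sorted by distance to the cell center.
struct ObjectGridCell {
    std::uint32_t ObjectIndex[kMaxObjectsPerCell];
    float Distance[kMaxObjectsPerCell];
    int Count;
};

// Inserts an object while keeping the cell sorted nearest-first.
// When the cell is full, the farthest entry is dropped.
void InsertObject(ObjectGridCell& Cell, std::uint32_t Index, float DistanceToCenter) {
    assert(DistanceToCenter >= 0.0f);
    if (Cell.Count == kMaxObjectsPerCell &&
        DistanceToCenter >= Cell.Distance[kMaxObjectsPerCell - 1]) {
        return; // farther than everything already stored
    }
    int Slot = (Cell.Count < kMaxObjectsPerCell) ? Cell.Count++ : kMaxObjectsPerCell - 1;
    while (Slot > 0 && Cell.Distance[Slot - 1] > DistanceToCenter) {
        Cell.ObjectIndex[Slot] = Cell.ObjectIndex[Slot - 1];
        Cell.Distance[Slot] = Cell.Distance[Slot - 1];
        --Slot;
    }
    Cell.ObjectIndex[Slot] = Index;
    Cell.Distance[Slot] = DistanceToCenter;
}

// On a hit, objects are visited nearest-first until one yields valid coverage.
// Returns that object's index, or -1 if none does.
template <typename Predicate>
int FindFirstValidObject(const ObjectGridCell& Cell, Predicate HasValidCoverage) {
    for (int I = 0; I < Cell.Count; ++I) {
        if (HasValidCoverage(Cell.ObjectIndex[I])) {
            return static_cast<int>(Cell.ObjectIndex[I]);
        }
    }
    return -1;
}
```

Keeping the list sorted at build time means the hit path can stop at the first valid object without any per-hit sorting.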
Smaller changes:
* Removed LumenVoxelLighting
* Prefixed Global Distance Field resources with "GlobalDistanceField."
* Distance field object buffers are now properly tracked by RDG
* Distance field will bind empty buffers if there are no distance field objects
* Moved global distance field object composite shaders to GlobalDistanceFieldCompositeObjects.usf in order to speed up shader compilation
* Removed mesh SDF change tracking through AddModifiedBoundsForLumen
* Removed more unused parts of Distance Scene
* Fixed ComputeSurfaceCacheSample sampling outside of valid bilinear region
Memory:
Memory increased from 72 MB to 112 MB. This could be optimized in the future by moving the object cull grid to a separate page table to leverage extra sparseness, or by simply reducing the max number of elements per grid cell to 3, which would bring the memory overhead to 82 MB.
Performance:
Console, FortGPUTestBed, 1080p
Lumen High: 3.4ms -> 3.4ms
Lumen Epic: 7.9ms -> 8.1ms
[FYI] Daniel.Wright, Tiago.Costa
[CL 21825874 by krzysztof narkowicz in ue5-main branch]
- Clipmap center is used as origin of translated world space during update.
- Modified HeightfieldDescriptions to store LWC matrices.
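Why a nearby origin helps can be shown with a small sketch. It assumes double-precision world positions that are narrowed to float only after subtracting the clipmap center; `Double3`, `Float3` and `ToTranslatedWorld` are hypothetical names, not the engine's LWC types.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical vector types for illustration.
struct Double3 { double X, Y, Z; };
struct Float3 { float X, Y, Z; };

// Subtract the origin in double precision first, then narrow to float.
// Near the clipmap center the result is small, so float keeps fine detail.
Float3 ToTranslatedWorld(const Double3& WorldPosition, const Double3& ClipmapCenter) {
    assert(std::isfinite(WorldPosition.X) && std::isfinite(ClipmapCenter.X));
    return Float3{
        static_cast<float>(WorldPosition.X - ClipmapCenter.X),
        static_cast<float>(WorldPosition.Y - ClipmapCenter.Y),
        static_cast<float>(WorldPosition.Z - ClipmapCenter.Z)};
}
```

For example, with a world position of 10000000.25 and a clipmap center of 10000000.0 on one axis, the translated value is exactly 0.25, whereas converting both to float first and then subtracting loses the fraction, because adjacent floats at that magnitude are 1.0 apart.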
#rb Krzysztof.Narkowicz
#preflight 63068184c744dac9673b04ba
[CL 21551226 by tiago costa in ue5-main branch]
- Remove dependency on ViewUniformBuffer.
- Changed DistanceFieldSampler to use wrap addressing so that it matches the global sampler addressing mode.
#rb aleksander.netzel
#preflight 63063a1ac744dac967282411
[CL 21541307 by tiago costa in ue5-main branch]
* Global SDF tracing uses a per-step noise comparison to implement semi-transparency (r.LumenScene.GlobalSDF.DitheredTransparencyStepThreshold), with a high min step size to regain performance
* Fixed Global SDF Coverage getting stomped when Movable objects were nearby, which was lowering its effective resolution. Empty space is now marked as not covered, which requires MinStepSizes to be kept smaller than or equal to ONE_SIDED_BAND_SIZE.
* Mesh SDF tracing now expands based on coverage and implements its own semi-transparency (r.Lumen.DiffuseIndirect.MeshSDF.DitheredTransparencyStepThreshold)
* Only Screen Probe and Radiance Cache traces enable dithered transparency to combat foliage over-occlusion. Visualize matches reflection behavior by default, so it has to have TraceInput.bDitheredTransparency = true to visualize what diffuse rays see.
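One way a per-step noise comparison can produce semi-transparency is sketched below. This is a guess at the general idea, not the shader: how coverage and the threshold are actually combined is not stated above, and `StepNoise`, `FieldSample` and `TraceDithered` are hypothetical names. Here a partially covered surface stops the ray only when that step's noise falls below the threshold, and the minimum step size bounds how many tests a ray can make.

```cpp
#include <cassert>
#include <cstdint>

// Cheap deterministic integer hash mapped to [0, 1); stands in for per-step noise.
float StepNoise(std::uint32_t Seed) {
    Seed ^= Seed >> 16; Seed *= 0x7feb352du;
    Seed ^= Seed >> 15; Seed *= 0x846ca68bu;
    Seed ^= Seed >> 16;
    return static_cast<float>(Seed >> 8) * (1.0f / 16777216.0f);
}

// Hypothetical field sample: signed distance plus a coverage value in [0, 1].
struct FieldSample { float Distance; float Coverage; };

// Marches along a ray. Fully covered surfaces always stop the ray; partially
// covered ones stop it only when the per-step noise is below Threshold.
// Returns the hit distance along the ray, or -1 if nothing was hit.
template <typename Field>
float TraceDithered(Field SampleField, float MaxT, float MinStep,
                    float Threshold, std::uint32_t RaySeed) {
    assert(MinStep > 0.0f); // guarantees the loop terminates
    float T = 0.0f;
    for (std::uint32_t Step = 0; T < MaxT; ++Step) {
        const FieldSample S = SampleField(T);
        if (S.Distance <= 0.0f) {
            if (S.Coverage >= 1.0f || StepNoise(RaySeed + Step) < Threshold) {
                return T;
            }
        }
        // A larger MinStep means fewer noise tests inside partially covered volumes.
        T += (S.Distance > MinStep) ? S.Distance : MinStep;
    }
    return -1.0f;
}
```

With a threshold of 0 a partially covered surface never stops the ray, and with a threshold of 1 it always does; values in between let a fraction of rays pass through, which is the behavior wanted for foliage.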
#ROBOMERGE-AUTHOR: daniel.wright
#ROBOMERGE-SOURCE: CL 21019034 via CL 21019035 via CL 21019036
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v972-20964824)
[CL 21023956 by daniel wright in ue5-main branch]
* Each cube map face gets its own FSceneViewState. The downside is that this could add a lot of extra memory, so some memory optimizations were made to compensate.
* Global distance field data is shared for all cube map faces. This data is only dependent on the view origin, which is invariant for the cube map faces. A mechanism was added to allow cross view state data sharing for shared origin views. This saves 8.5 MB, and improves perf.
* Disabled distance field lighting temporal AO for cube map capture. The AO history texture in the view state is dependent on the Scene texture extents, which is typically the front buffer size, which then gets multiplied by 6. For a 1080p front buffer, this is 24 MB, while for a 4K front buffer, this is 96 MB, regardless of the resolution of the cube map capture itself. A 256x256 cube map capture, including all the auxiliary state, is otherwise only around 9 MB, so this is an order of magnitude increase. A comment in the function "DistanceFieldAOUseHistory" has more details on how this could be fixed, by using the view rect rather than Scene texture extents for the history texture -- for the 256x256 case, it would reduce the incremental memory cost to 768K.
With the memory savings from disabling distance field lighting temporal AO, this change is more or less memory neutral if the Scene texture extent (front buffer) is 1080p: a net increase of 590K. With a higher resolution front buffer, it's a substantial memory win.
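The history texture figures above are consistent with a simple width x height x faces calculation. The sketch below assumes 2 bytes per pixel, a value inferred from the 768K figure for the 256x256 case; the actual texture format is not stated here, and `HistoryBytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 2 bytes per pixel, inferred from the quoted 768K figure.
constexpr std::uint64_t kBytesPerPixel = 2;
constexpr std::uint64_t kCubeFaces = 6;

// AO history memory for a cube map capture when the history texture is
// allocated at the given extent for each of the six faces.
constexpr std::uint64_t HistoryBytes(std::uint64_t Width, std::uint64_t Height) {
    return Width * Height * kBytesPerPixel * kCubeFaces;
}

// 256x256 view rect: exactly 768 KiB.
static_assert(HistoryBytes(256, 256) == 768 * 1024, "view rect sized history");
// 1080p scene texture extent: 24,883,200 bytes, roughly the 24 MB quoted.
static_assert(HistoryBytes(1920, 1080) == 24883200, "1080p sized history");
// 4K scene texture extent: four times the 1080p cost, in line with the 96 MB quoted.
static_assert(HistoryBytes(3840, 2160) == 4 * HistoryBytes(1920, 1080), "4K sized history");
```

Under this assumption, sizing the history by view rect instead of Scene texture extents cuts the 1080p case by a factor of about 32.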
#jira UE-151717
#rnx
#rb tiago.costa daniel.wright
#preflight 629f6a27233ae0a8f8f99932
[CL 20614815 by jason hoerner in ue5-main branch]