This is a fixed version of CL30977048, which had an issue compiling shaders at SM5 and was backed out
#jira UE-204417
#rb Alejandro.Arango, Guillaume.Abadie
#virtualized
[CL 31037789 by trystan binkley-jone in ue5-main branch]
Prevents VRS from being disabled in areas with 2x4, 4x2, or 4x4 rates.
#jira UE-192098
#rb rune.stubbe
[CL 31023301 by christopher fiala in ue5-main branch]
Only normals should be accessible using the deferred decals (reconstructed from depth buffer).
[FYI] charles.derousiers
[CL 30996047 by sebastien hillaire in ue5-main branch]
[FYI] trystan.binkley-Jone
Original CL Desc
-----------------------------------------------------------------
nDisplay: Changed the dynamic depth of field compensation from a hard-coded curve to a LUT, using a texture that can be provided through the ICVFX camera properties
#jira UE-204417
#rb Alejandro.Arango, Guillaume.Abadie
#virtualized
[CL 30988344 by bob tellez in ue5-main branch]
Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists.
See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows:
This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this:
- The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit().
- A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job.
- Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition).
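As a rough illustration of the batching decision described above, the sketch below groups consecutive command lists into translate jobs. It is not the engine's implementation: `CommandList`, `TranslateJob`, `BuildJobs` and the `MaxPerJob` limit are hypothetical names, and the policy (keep submission order, start a new job whenever the parallel-translate flag changes) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RHI command list; only the flag matters here.
struct CommandList {
    int Id;
    bool bAllowParallelTranslate; // false => must run on the RHI thread
};

// One unit of translation work: either a parallel job or RHI-thread work.
struct TranslateJob {
    bool bParallel;
    std::vector<int> CommandListIds;
};

// Walks the lists in submission order and batches consecutive lists that
// share the same flag, up to MaxPerJob lists per job (assumed policy).
std::vector<TranslateJob> BuildJobs(const std::vector<CommandList>& Lists,
                                    std::size_t MaxPerJob) {
    assert(MaxPerJob > 0);
    std::vector<TranslateJob> Jobs;
    for (const CommandList& List : Lists) {
        const bool bCanAppend =
            !Jobs.empty() &&
            Jobs.back().bParallel == List.bAllowParallelTranslate &&
            Jobs.back().CommandListIds.size() < MaxPerJob;
        if (!bCanAppend) {
            Jobs.push_back({List.bAllowParallelTranslate, {}});
        }
        Jobs.back().CommandListIds.push_back(List.Id);
    }
    return Jobs;
}
```

With this policy, a single command list that disables parallel translate splits the surrounding parallel work into two jobs, which is why keeping non-thread-safe operations rare matters for parallelism.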
One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations:
- RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism.
- Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) were not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes.
A summary of the new RHI breadcrumb system is as follows:
- A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging.
- The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes.
- RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress.
- Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs).
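The breadcrumb tree described above can be pictured with a minimal sketch. All names here (`BreadcrumbNode`, `BreadcrumbTree`, `FormatStack`) are hypothetical and do not mirror the engine's types; the ref-counted allocators and per-node GPU IDs are omitted, and a single `int` argument stands in for the by-value format arguments.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node: stores the format string and its argument by value,
// so no string formatting happens while the tree is being built.
struct BreadcrumbNode {
    const char* Format = "";
    int Arg = 0;
    BreadcrumbNode* Parent = nullptr;
    BreadcrumbNode* Next = nullptr; // next node in depth-first order

    // Formatting is deferred until the name is actually needed.
    std::string GetName() const {
        char Buffer[128];
        std::snprintf(Buffer, sizeof(Buffer), Format, Arg);
        return Buffer;
    }
};

class BreadcrumbTree {
public:
    // Opens a scope under the current one and links it into the list.
    BreadcrumbNode* Begin(const char* Format, int Arg) {
        Nodes.push_back(std::make_unique<BreadcrumbNode>());
        BreadcrumbNode* Node = Nodes.back().get();
        Node->Format = Format;
        Node->Arg = Arg;
        Node->Parent = Current;
        if (Last) { Last->Next = Node; }
        Last = Node;
        Current = Node;
        return Node;
    }

    // Closes the current scope.
    void End() {
        assert(Current != nullptr);
        Current = Current->Parent;
    }

private:
    std::vector<std::unique_ptr<BreadcrumbNode>> Nodes;
    BreadcrumbNode* Current = nullptr;
    BreadcrumbNode* Last = nullptr;
};

// Walks the parent chain, as a crash handler might, formatting lazily.
std::string FormatStack(const BreadcrumbNode* Node) {
    std::string Result;
    for (; Node; Node = Node->Parent) {
        Result = Result.empty() ? Node->GetName()
                                : Node->GetName() + " > " + Result;
    }
    return Result;
}
```

Because nodes are linked in creation order, following `Next` visits the tree depth-first, while following `Parent` from the last node the GPU reached yields the breadcrumb stack.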
RenderGraph scopes have been simplified:
- The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types.
- Each RDG pass holds a pointer to the scope it was created under.
- BeginCPU / EndCPU are called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark up each worker thread with the relevant Unreal Insights scopes.
Other changes include:
- Fixes for bugs uncovered when parallel translate was enabled.
- Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer.
- Refactored RHI draw call stats to better fit the new pipeline design.
#rb jeannoe.morissette, zach.bethel
#jira UE-139543
[CL 30973133 by Luke Thatcher in ue5-main branch]
#rb charles.derousiers
[FYI] florin.pascu, henry.falconer, per.karefelt
#ushell-cherrypick of 30955220 by Charles.deRousiers
[CL 30955660 by charles derousiers in ue5-main branch]
We now use accurate luminance coefficients provided by the working color space. Furthermore, the maximum nit value is read from the calculated luminance, not the RGB channels directly.
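To illustrate the difference between the two measurements, here is a small sketch. The Rec. 709 coefficients are used only as an example working color space, pixel values are assumed to already be scaled to nits, and `LinearColor`, `MaxLuminanceNits` and `MaxChannelNits` are hypothetical names rather than engine API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct LinearColor { double R, G, B; };

// Example coefficients: Rec. 709 / sRGB primaries. A different working
// color space would supply different factors.
const LinearColor Rec709LuminanceFactors{0.2126, 0.7152, 0.0722};

// Luminance is the dot product of the color with the space's coefficients.
double Luminance(const LinearColor& C, const LinearColor& Factors) {
    return C.R * Factors.R + C.G * Factors.G + C.B * Factors.B;
}

// New behavior as described: maximum taken over the computed luminance.
double MaxLuminanceNits(const std::vector<LinearColor>& Pixels,
                        const LinearColor& Factors) {
    assert(!Pixels.empty());
    double Max = 0.0;
    for (const LinearColor& P : Pixels) {
        Max = std::max(Max, Luminance(P, Factors));
    }
    return Max;
}

// Old behavior for comparison: maximum taken over raw RGB channels.
double MaxChannelNits(const std::vector<LinearColor>& Pixels) {
    assert(!Pixels.empty());
    double Max = 0.0;
    for (const LinearColor& P : Pixels) {
        Max = std::max({Max, P.R, P.G, P.B});
    }
    return Max;
}
```

A saturated blue pixel shows why this matters: its blue channel can be very large while its luminance is small, so reading the channels directly overstates the maximum nit value.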
#jira UE-204922
#rb chris.kulla
[CL 30919831 by eric renaudhoude in ue5-main branch]
Revert back to manual interpolation of depth in HW raster to improve shadow rendering performance.
Get rid of unused interpolator for visibility buffer rendering.
#jira UE-204555
#rb Ola.Olsson
[FYI] brian.karis, graham.wihlidal, jamie.hayes
[CL 30918596 by rune stubbe in ue5-main branch]
Also simplify which instance mask is used for mesh decals by deriving it inside EvaluateDecals instead of passing it from outside the call.
This saves ~0.2ms on one benchmark scene without decals, and does not measurably affect scenes with decals.
#jira UE-202786
[CL 30914979 by chris kulla in ue5-main branch]
-Implemented vertex caching to evaluate ~1 vertex per triangle instead of 3
-Separate immediate mode tessellation table that is constrained to one vert + one tri per lane
-Disable material range merging in raster binning, so UVDensities can be made scalar in immediate mode rasterizer
-Use wide LDS loads instead of permutes for some properties to reduce DS pressure
Patch rasterizer optimizations:
-Reduced max tess factor from 16 to 14 to increase occupancy of patch rasterizer
-Move UVDensities to per-patch work in patch rasterizer
Packed FTessellatedPatch data to fit in fewer registers to reduce VGPR/DS pressure
Added debug code to output SVG of tessellation pattern
#jira UE-197833
#rb brian.karis
[FYI] graham.wihlidal, jamie.hayes
[CL 30896683 by rune stubbe in ue5-main branch]
* Use explicit FShaderParameter-s instead of shader parameter struct macros, as this allows the shader to be more easily dispatched from low-level RHI code
* Add total record count parameter to ProcessShaderBundleRecord
* Add CFLAG_ForceBindful to the shader when required by a specific platform
#jira none
#rb christopher.waters, zach.bethel
[CL 30878011 by Yuriy ODonnell in ue5-main branch]
Ex:
position    PF_A32B32G32R32F  1024x1024 (16MB)
velocity    PF_FloatRGBA      1024x1024 (8MB)
Attribute1  PF_B8G8R8A8       1024x1024 (4MB)
Attribute2  PF_B8G8R8A8       1024x1024 (4MB)
Those textures are all double buffered, so at full size they represent 64 MB of render targets.
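The 64 MB figure follows from the per-texel sizes of the listed formats. The sketch below reproduces the arithmetic; the struct and function names are hypothetical, and the bytes-per-texel values (16, 8, 4, 4) are inferred from the format names and the sizes quoted in the list.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one simulation render target.
struct RenderTargetDesc {
    const char* Name;
    std::uint64_t BytesPerTexel;
    std::uint32_t Width;
    std::uint32_t Height;
};

// The four targets from the example above, at full size.
std::vector<RenderTargetDesc> CascadeGpuSimTargets() {
    return {
        {"position",   16, 1024, 1024}, // PF_A32B32G32R32F: 4 x 32-bit float
        {"velocity",    8, 1024, 1024}, // PF_FloatRGBA:     4 x 16-bit float
        {"Attribute1",  4, 1024, 1024}, // PF_B8G8R8A8:      4 x 8-bit
        {"Attribute2",  4, 1024, 1024}, // PF_B8G8R8A8:      4 x 8-bit
    };
}

// Sums the size of every target and multiplies by the number of buffers.
std::uint64_t TotalBytes(const std::vector<RenderTargetDesc>& Targets,
                         std::uint32_t BufferCount) {
    assert(BufferCount > 0);
    std::uint64_t Total = 0;
    for (const RenderTargetDesc& T : Targets) {
        Total += T.BytesPerTexel * T.Width * T.Height;
    }
    return Total * BufferCount;
}
```

One set of targets is 16 + 8 + 4 + 4 = 32 MB, and double buffering brings the total to 64 MB.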
When GFXCascadeGpuSpriteAllowDynAllocs is enabled, resizing happens when the TileAllocator of the FParticleSimulationResources runs out of tiles. We then allocate new tiles and resize the GPU simulation render targets, each time by a factor of 2 in X and Y. This introduces a TileZ index (TilePageIndex), which allows us to compute a tile page index at runtime and retrieve the resized tile coordinates. Resizing is limited by the maximum possible Morton index (65535) and by the original resource sizes provided by the device profile (fx.GPUSimulationTextureSizeX, fx.GPUSimulationTextureSizeY).
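For context on the 65535 limit: a 16-bit Morton index interleaves the bits of two 8-bit coordinates, so it can address at most 256 x 256 tiles. The sketch below shows the standard encode/decode; the function names are hypothetical, and the assumption that X occupies the even bits is illustrative and may not match the engine's actual tile layout.

```cpp
#include <cassert>
#include <cstdint>

// Spreads the low 8 bits of V so they occupy the even bit positions.
std::uint32_t SpreadBits8(std::uint32_t V) {
    V &= 0x00FF;
    V = (V | (V << 4)) & 0x0F0F;
    V = (V | (V << 2)) & 0x3333;
    V = (V | (V << 1)) & 0x5555;
    return V;
}

// Inverse of SpreadBits8: gathers the even bits back into the low 8 bits.
std::uint32_t CompactBits8(std::uint32_t V) {
    V &= 0x5555;
    V = (V | (V >> 1)) & 0x3333;
    V = (V | (V >> 2)) & 0x0F0F;
    V = (V | (V >> 4)) & 0x00FF;
    return V;
}

// Interleaves tile X (even bits) and tile Y (odd bits) into a 16-bit index.
std::uint16_t MortonEncode(std::uint32_t TileX, std::uint32_t TileY) {
    assert(TileX < 256 && TileY < 256);
    return static_cast<std::uint16_t>(SpreadBits8(TileX) |
                                      (SpreadBits8(TileY) << 1));
}

// Recovers the tile coordinates from a 16-bit Morton index.
void MortonDecode(std::uint16_t Index, std::uint32_t& TileX,
                  std::uint32_t& TileY) {
    TileX = CompactBits8(Index);
    TileY = CompactBits8(static_cast<std::uint32_t>(Index) >> 1);
}
```

The largest encodable tile, (255, 255), maps to index 65535, which is the ceiling mentioned above.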
The resizing feature can be enabled with the fx.Cascade.GpuSpriteDynamicAllocations cvar, and the initial size is controlled by fx.GPUSimulationDynTextureSizeXY.
Depending on the platform and map content, this can save a lot of memory, ranging from 7MB to 56MB.
*The resizing feature currently doesn't support GPU sorting mode. The radix sort shader will need to support keys/values in formats other than 32 bits to work with the dynamic resizing of the GPU sim resources. To work around this limitation, we currently ignore GPU sorting mode on Cascade emitters when resizing is enabled. A Jira was created to track this issue and potentially add support in the future.
#reviewer [at]stu.meckenna, [at]rob.krajcarski
#rb rob.krajcarski, Stu.McKenna
[CL 30857374 by serge bernier in ue5-main branch]