* Feature that merges command lists from QueueAsyncCommandListSubmit into a single Payload where possible (r.D3D12.AllowPayloadMerge, default enabled). Saves around 0.5 ms per async command list batch, with total savings from 1 to 4 ms on VP test scenes.
* Remove unnecessary command flush before Present on D3D12. Added a virtual function "NeedFlushBeforeEndDrawing" that can return false if the platform function RHIEndDrawingViewport function flushes commands, and the calling function doesn't need to.
* Wrapped three scene rendering flushes, plus BeginFlushResourcesRHI with "RHIIncludeOptionalFlushes()", which returns false on D3D12. These were a net perf loss on D3D12, as they wrap hardly any rendering. I didn't want to change behavior on other platforms, hence the conditional.
* Overall the above changes remove 8 command list flushes in a two view scene, which typically cost in the range of 0.06 ms each, saving 0.48 ms CPU total. Perf win increases with more views, and removing the flushes also helps with GPU bubbles.
* Added a flush in a strategic location at the end of Shadows and Lumen. In a sample scene, saved 1.5 ms GPU eliminating a bubble.
#jira UE-167553
#rb luke.thatcher
#rnx
#preflight 635547519e14ee3c790cc14e
#lockdown mihnea.balta
[CL 22728231 by jason hoerner in ue5-main branch]
1. Spatial denoising only plugin.
2. Spatial and temporal denoising plugin.
When the user enables both plugins:
1. The spatial denoising only plugin + builtin temporal denoiser can be selected as (default):
`r.PathTracing.SpatialDenoiser.Type 0`
2. The spatial and temporal denoser plugin can be selected as:
`r.PathTracing.SpatialDenoiser.Type 1`
Note:
* There is a modification in FRDGBuilder::IsTransient() to make TexCreate_Shared return false to make it committed resource. Otherwise, the interop will fail when trying to get the device E_Invalidarg (0x80070057).
#jira UE-158838
#preflight 633f20f72a0a2c1ead3a6184
#rb Juan.Canada, Chris.Kulla
#ushell-cherrypick of 21960393 by Tiantian.Xie
[CL 22405452 by tiantian xie in ue5-main branch]
This change includes significant refactor work performed in //UE5/Dev-ParallelRendering. A brief summary of the work is as follows:
Refactored RHI command lists
- Removal of the "immediate" async compute command list
- Introduced an "active pipe" on each command list, allowing RHICmdLists to record work for either graphics or async compute. Pipes can be selected using the SwitchPipeline() function, or the FRHICommandListScopedPipeline helper.
- New explicit command list submission RHI API (RHIFinalizeContext, RHISubmitCommandLists). The IRHICommandContextContainer type has been removed.
- Explicit GPU submission is automatically appended to the immediate command list when it is dispatched to the RHI thread.
Platform RHI implementations
- The new submission API has been implemented across all platforms. Some platforms required a significant refactor.
#rb Mihnea.Balta,Kenzo.Terelst
#jira UE-139550
#preflight 6332e3641003050806d802ef
[CL 22239063 by luke thatcher in ue5-main branch]
- Passes are added to a queue that is consumed by a task to perform pass setup actions.
- Moved CompilePassBarriers to overlap with resource collection.
- Converted most tasks to new task system.
#preflight 630e5e2e660db81edb9a0562
[CL 21710710 by zach bethel in ue5-main branch]
[FYI] zach.bethel
Original CL Desc
-----------------------------------------------------------------
Move RDG buffer uploads off the render thread.
Maybe be related to gpu crash in EMT (UE-157850)
#preflight 62b9f1ec4209c7c579df8fc2
#ushell-cherrypick of 20839110 by zach.bethel
#ROBOMERGE-OWNER: ben.woodhouse
#ROBOMERGE-AUTHOR: serge.bernier
#ROBOMERGE-COMMAND: _robomerge ue5-main
#ROBOMERGE-SOURCE: CL 21010071 via CL 21010165 via CL 21010189
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v972-20964824)
[CL 21020108 by serge bernier in ue5-main branch]
- Removed scene render mem-mark among others. MemStack usage is now restricted to local scopes with known marks.
- Render resources with destructors are allocated using the FSceneRenderingBulkObjectAllocator on FSceneRenderer, which is deleted when the scene render is.
#preflight 62b266e20d4d6228de97babe
#rb mihnea.balta, yuriy.odonnell
[CL 20907647 by zach bethel in ue5-main branch]
This allows to define a new dynamic scaling in the renderer with low amount of boiler plate:
DynamicRenderScaling::FHeuristicSettings GetDynamicTranslucencyResolutionSettings()
{
RenderingDynamicScaling::FHeuristicSettings BucketSetting;
BucketSetting.Model = RenderingDynamicScaling::EHeuristicModel::Quadratic;
BucketSetting.bModelScalesWithPrimaryScreenPercentage = true;
BucketSetting.MinResolutionFraction = ...
...
return BucketSetting;
}
DynamicRenderScaling::FBudget GDynamicTranslucencyResolution(TEXT("DynamicTranslucencyResolution"), &GetDynamicTranslucencyResolutionSettings);
And then simply define a scope to measure the GPU timing as such:
{
DynamicRenderScaling::FRDGScope DynamicTranslucencyResolutionScope(GraphBuilder, GDynamicTranslucencyResolution);
// add passes to GraphBuilder
}
#rb zach.bethel
#jira UE-152561
#preflight 628f1219bb14235aa38c904c
[CL 20376428 by Guillaume Abadie in ue5-main branch]