330 Commits

Author SHA1 Message Date
daniele pieroni
c72246904e Workaround for RHI validation error due to prologue pass issuing a fence along with all its transitions.
#jira UE-219098
#rnx

[CL 37168030 by daniele pieroni in 5.5 branch]
2024-10-16 05:44:42 -04:00
zach bethel
3e78b68d24 Added explicit Pipe wait inside of RDG to avoid speculative race condition between task completion and clearing the active task in the pipe.
#rb none
#jira UE-226893

[CL 37052593 by zach bethel in 5.5 branch]
2024-10-11 11:51:53 -04:00
zach bethel
badbe7829a Reworked RDG dispatch logic to cut a new parallel pass set when a dispatch is requested. Also reworked the Slate 8 swapchain limit workaround so we only dispatch after 8 instead of every time.
[CL 36749528 by zach bethel in 5.5 branch]
2024-10-01 18:06:24 -04:00
luke thatcher
7ec6ef81f5 New GPU profiler improvements. The TStatId on FRHIBreadcrumb has been replaced with a FRHIBreadcrumbData struct that holds additional profiling related data, which includes:
 - The TStatId for "stat gpu" stats.
 - The FName required by the CSV profiler for GPU stats.
 - The source file and line number to allow breadcrumbs shown in Insights to link back to their original source location.

Additional changes:
 - Added temporary support for the Insights GPU track. This is guarded by RHI_TEMP_USE_GPU_TRACE until we have a newer, more capable API.
 - Simplified FMeshDrawEvent into a standard RHI breadcrumb in FMeshDrawCommand::SubmitDraw().
 - Moved "r.GPUCsvStatsEnabled" cvar into GPUProfiler.cpp, so it is accessible to both old and new profilers.

#jira UE-177299
#rb mihnea.balta

[CL 35973862 by luke thatcher in ue5-main branch]
2024-09-03 14:10:13 -04:00
zach bethel
443f6d3331 Implemented asynchronous task support for RDG execution lambdas. This will eventually allow RDG to avoid syncing parallel execution tasks, improving pipelining of the rendering frame.
The behavior is opt-in by adding a FRDGAsyncTask member to the lambda args like so:

[...] (FRDGAsyncTask, FRHICommandList& RHICmdList) {}

This API design choice is for two reasons.

1. Visually localize the tag near the lambda capture args. Capture arg lifetime must be valid for async access; e.g. by value or referencing memory that is tied to RDG lifetime or the scene renderer lifetime (async tasks are synced by the scene renderer just like mesh pass tasks are). This responsibility is up to the user, so it should be obvious at first glance when a pass can run on an async task.
2. Enforce a compile-time trait on the lambda without requiring multiple AddPass function variants. This allows for utility functions to continue to work as is.
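
A minimal sketch of how a tag parameter can act as a compile-time trait on a lambda, so that a single AddPass overload serves both forms. This is not the engine's implementation: AsyncTaskTag, CommandList, IsAsyncLambda and this AddPass are hypothetical stand-ins, and the "async" branch simply calls the lambda inline where a real scheduler would launch a task.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for illustration; these are NOT the engine's types.
struct AsyncTaskTag {};
struct CommandList { int ExecutedPasses = 0; };

// True when the lambda opts in by taking the tag as its first parameter.
template <typename LambdaType>
constexpr bool IsAsyncLambda = std::is_invocable_v<LambdaType, AsyncTaskTag, CommandList&>;

// Single entry point: the trait selects the call form at compile time, so no
// second AddPass variant is needed. Returns true when the async form was used.
template <typename LambdaType>
bool AddPass(CommandList& CmdList, LambdaType&& Lambda)
{
    if constexpr (IsAsyncLambda<LambdaType>)
    {
        // A real scheduler would hand this to a task; the sketch calls inline.
        std::forward<LambdaType>(Lambda)(AsyncTaskTag{}, CmdList);
        return true;
    }
    else
    {
        std::forward<LambdaType>(Lambda)(CmdList);
        return false;
    }
}
```

Utility functions that forward a lambda to AddPass need no changes under this scheme, because the tag travels with the lambda's own signature.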

#rb Luke.Thatcher

[CL 35969245 by zach bethel in ue5-main branch]
2024-09-03 11:39:06 -04:00
luke thatcher
40a158ad93 Prepare RHI breadcrumbs and GPU event stream for "stat gpu" support
 - Added a TStatId field on breadcrumbs. When set, this indicates the breadcrumb should write its computed GPU duration to the given stat.
 - Implemented "stat gpu" alongside the "stat unit" GPU event stream sink. Times in "stat gpu" are now taken as the union of busy time across all GPU queues, in the same way we compute the "stat unit" GPU time.
 - RDG support is handled via the new RHI_EVENT_SCOPE_STAT macro, allowing us to tag RDG scopes with GPU stats. When the new GPU profiler is enabled, RDG_GPU_STAT_SCOPE and SCOPED_GPU_STAT are empty and will eventually be deprecated and removed.
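
The "union of busy time across all GPU queues" can be illustrated with a short interval-merge sketch. The BusyInterval type, the microsecond units and the function name are assumptions made for illustration, not the profiler's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type: a half-open [Begin, End) span of GPU busy time, in microseconds.
struct BusyInterval { uint64_t Begin; uint64_t End; };

// Returns the length of the union of all intervals, so work that overlaps on
// different queues is counted once rather than summed.
uint64_t UnionBusyTime(std::vector<BusyInterval> Intervals)
{
    std::sort(Intervals.begin(), Intervals.end(),
        [](const BusyInterval& A, const BusyInterval& B) { return A.Begin < B.Begin; });

    uint64_t Total = 0;
    uint64_t CoveredUntil = 0; // End of the time range already counted.
    for (const BusyInterval& Interval : Intervals)
    {
        assert(Interval.Begin <= Interval.End);
        // Skip whatever part of this interval was already counted.
        const uint64_t Begin = std::max(Interval.Begin, CoveredUntil);
        if (Interval.End > Begin)
        {
            Total += Interval.End - Begin;
            CoveredUntil = Interval.End;
        }
    }
    return Total;
}
```

With a graphics queue busy over [0, 10) and an async compute queue busy over [5, 15), the union is 15 microseconds, not the 20 a plain sum would report.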

Cleanup of breadcrumb macros
 - Remove unnecessary "Name" arguments.
 - Require the user to wrap the format string in quotes, rather than stringizing the format arg with the preprocessor.
 - Removed "F" version of macros. Both string literal and formatted strings can be handled with the varargs macro, since the varargs are simply empty when using only a string literal.
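
A sketch of why one varargs macro can cover both call styles: when only a string literal is passed, the argument pack is simply empty. EVENT_NAME and FormatEventName are hypothetical names for illustration and are not the RHI breadcrumb macros.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical formatter standing in for whatever consumes the breadcrumb name.
template <typename... ArgTypes>
std::string FormatEventName(const char* Format, ArgTypes... Args)
{
    assert(Format != nullptr);
    if constexpr (sizeof...(Args) == 0)
    {
        // Plain string literal: returned verbatim, no formatting pass.
        return std::string(Format);
    }
    else
    {
        // Formatted name; output longer than the buffer is truncated.
        char Buffer[128];
        std::snprintf(Buffer, sizeof(Buffer), Format, Args...);
        return std::string(Buffer);
    }
}

// Hypothetical macro: the caller supplies the quoted format string, and the
// same macro accepts it with or without trailing format arguments.
#define EVENT_NAME(...) FormatEventName(__VA_ARGS__)
```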

Added GetTypeHash function for TStatId
 - This didn't exist before, and allows use of TStatId in TMaps etc.

#jira UE-177299
#rb zach.bethel

[CL 35953056 by luke thatcher in ue5-main branch]
2024-09-02 06:24:43 -04:00
nicholas howe
35538a6f6b PipelineStateCache Asynchronous Consolidation
[REVIEW] [at]zach.bethel [at]Luke.Thatcher
#tests ReplayRun in Dev-PerfB-Main

[CL 35865429 by nicholas howe in ue5-main branch]
2024-08-28 09:52:06 -04:00
zach bethel
2e27983d3b Fixed crash due to D3D12 recording commands into multiple swap chains. Dispatch work to the RHI after each window renders.
[CL 35804249 by zach bethel in ue5-main branch]
2024-08-26 13:52:42 -04:00
bob tellez
b2b2878f51 [Backout] - CL35793735
[FYI] zach.bethel
Original CL Desc
-----------------------------------------------------------------
Removed legacy dispatch after execute calls before / after async compute.

[CL 35803085 by bob tellez in ue5-main branch]
2024-08-26 13:15:07 -04:00
zach bethel
a34fb1665d Removed legacy dispatch after execute calls before / after async compute.
[CL 35793738 by zach bethel in ue5-main branch]
2024-08-25 21:20:03 -04:00
zach bethel
232141149b Reintroduced flush of RHI command lists if bDispatchAfterExecute is true on a pass. Without it, the RHI thread spends too much time processing large batches of work, causing GPU bubbles.
[CL 35766261 by zach bethel in ue5-main branch]
2024-08-23 00:55:44 -04:00
zach bethel
9cb08ca715 Introduced 'AddDispatchPass' to RDG which allows mesh passes to be scheduled along with other parallel RDG passes, reducing the number of sync points. Dispatch passes support raster or compute and take a special FRDGDispatchPassBuilder& as an argument to the lambda instead of a command list. This builder enables creating sub-command lists and launching tasks to process them.
#jira UE-222176

[CL 35637714 by zach bethel in ue5-main branch]
2024-08-19 16:29:07 -04:00
zach bethel
3b5e32dcac Refactored reserved buffer commit logic in RDG to flush commits on the next use of the buffer in a pass.
#rb Yuriy.ODonnell
#jira UE-219937

[CL 35628659 by zach bethel in ue5-main branch]
2024-08-19 12:34:52 -04:00
christopher waters
c2560b57fc Changing SetExternalPooledBufferRHI to take a TRefCountPtr to reduce reference count traffic.
#rb zach.bethel

[CL 35033494 by christopher waters in ue5-main branch]
2024-07-23 16:55:08 -04:00
zach bethel
27bc0ce82c Refactored RDG to allow certain setup tasks to run in parallel with RDG compilation. Disabled render pass merging if the underlying GPU is not tile based, as this prevents large render pass ranges from being batched into a single async task.
#jira UE-211739

[CL 34859015 by zach bethel in ue5-main branch]
2024-07-16 20:58:37 -04:00
zach bethel
2af557f110 Fixed various RHI validation errors and inefficiencies due to async compute.
 - Fixed shader validation to handle buffer types for texture metadata.
 - RDG external access mode will now leave the resource in its external state / pipelines.
 - Modified the RHI validation layer to allow transitioning All -> One pipe without a fence if the NoFence flag is specified. Added fence validation to emit an error if an external fence is not used, which might introduce a race condition.
 - Fixed VT / GPU Skin cache buffers to use read states on all pipes.
 - Batched multiple VT external access calls together to reduce passes in RDG.
 - Fixed IES texture manager validation error.
 - Fixed sky cube texture map validation error.
 - Fixed virtual shadow debug shader that was attempting to read and write using the same texture.
 - Fixed render thread CVar access on game thread.
 - Fixed up -ForceRHIBypass to not crash.

#rb mihnea.balta
#jira UE-210930,

[CL 34043166 by zach bethel in ue5-main branch]
2024-05-31 16:26:19 -04:00
pr0-zac
ce0d74a2a8 Adding HW Depth Resolve for Vulkan.
#jira UE-203788
#rb Dmitriy.Dyomin, jeannoe.morissette

[CL 33195788 by pr0-zac in ue5-main branch]
2024-04-24 06:43:38 -04:00
zach bethel
0674d30d69 Added SRVNonPixel, SHADER_PARAMETER_RDG_NON_PIXEL_SRV, and modified RDG_TEXTURE_ACCESS to support texture subresources.
 - SRVNonPixel is needed by mobile to insert a barrier between fragment -> vertex texture fetch, but since this is a heavyweight barrier, it is opt-in with SHADER_PARAMETER_RDG_NON_PIXEL_SRV.
 - Small refactor to FRDGTextureAccess to allow for arbitrary subresources, as the current model only allows full resource transitions.

#rb mihnea.balta, luke.thatcher, serge.bernier
#jira UE-211883

[CL 33179861 by zach bethel in ue5-main branch]
2024-04-23 17:02:48 -04:00
luke thatcher
ef68543d9f Fix circular reference in RHI breadcrumb allocators causing a memory leak
 - RDG uses its own breadcrumb allocator for RDG scopes. When the graph executes, this allocator is attached to the parent one owned by the immediate command list.
 - Later, when a pass makes use of the immediate command list, a new breadcrumb is inserted on that command list's allocator, which gets reattached to the RDG allocator, forming a circular reference.
 - The circular reference causes a memory leak, and a crash on shutdown when the underlying mempage allocators are destroyed.
 - The reference only occurs in bypass mode, and when platform RHIs skip the DispatchToRHIThread inside FRDGBuilder::BeginFlushResourcesRHI().
 - Fix is to swap out the immediate command list's allocator for the RDG one when the graph executes. This is now done in FRHICommandListBase::AttachBreadcrumbSubTree(...).

Also added RHI validation to check for circular references in the RHI breadcrumb allocators.
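
A minimal sketch of the kind of check that can catch such a circular reference before it forms. It assumes each allocator has at most one parent, which is a simplification; BreadcrumbAllocator, WouldCreateCycle and TryAttach are hypothetical names, not the RHI's types or functions.

```cpp
#include <cassert>

// Hypothetical allocator node; only the parent link matters for this check.
struct BreadcrumbAllocator
{
    const BreadcrumbAllocator* Parent = nullptr;
};

// Returns true if attaching Child under NewParent would close a cycle, i.e.
// Child is already reachable by walking up the chain from NewParent.
bool WouldCreateCycle(const BreadcrumbAllocator& Child, const BreadcrumbAllocator& NewParent)
{
    for (const BreadcrumbAllocator* Node = &NewParent; Node != nullptr; Node = Node->Parent)
    {
        if (Node == &Child)
        {
            return true;
        }
    }
    return false;
}

// Attaches only when the result stays acyclic; reports whether it attached.
bool TryAttach(BreadcrumbAllocator& Child, const BreadcrumbAllocator& NewParent)
{
    if (WouldCreateCycle(Child, NewParent))
    {
        return false;
    }
    // Single-parent sketch: re-attaching would silently drop the old link.
    assert(Child.Parent == nullptr);
    Child.Parent = &NewParent;
    return true;
}
```

In the scenario described above, attaching the graph's allocator to the immediate command list's allocator succeeds, and the later attempt to attach the immediate allocator back under the graph's allocator is rejected.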

#jira UE-212035
#rb mihnea.balta

[CL 33101846 by luke thatcher in ue5-main branch]
2024-04-19 10:02:21 -04:00
josh adams
eca6d6b781 - Benign engine changes that came from new platform experimentation
#rb Chris.Babcock

[CL 33037334 by josh adams in ue5-main branch]
2024-04-17 11:17:07 -04:00
zach bethel
4655163d7a Fixed shadowed local variable in RDG builder.
#jira UE-211759

[CL 32976798 by zach bethel in ue5-main branch]
2024-04-15 17:09:14 -04:00
zach bethel
d6bfab4abe Removed deferral of RDG command list setup task submission, as it causes RHI work to execute in a different order than in serial mode, causing validation errors.
#jira UE-210959

[CL 32975224 by zach bethel in ue5-main branch]
2024-04-15 15:13:58 -04:00
zach bethel
9ed7fb21c9 Fixed missing transient discard for transient extracted textures. Also properly disabled transient async compute aliasing on platforms that use the page allocator.
#jira UE-210114

[CL 32767787 by zach bethel in ue5-main branch]
2024-04-05 14:34:09 -04:00
luke thatcher
ea788945c1 Work around performance issues with the GPU profiler by batching timestamp queries.
 - Extended RHI[Begin|End]OcclusionQueryBatch to support timestamp queries, and renamed to RHI[Begin|End]RenderQueryBatch.
 - The GPU profiler starts a query batch in BeginFrame, which persists on the immediate command list until EndFrame.
 - RDG forwards the current batch to the parallel command lists it creates.
 - Platform RHIs use the active batch to group completion events / sync points to reduce overhead. If no batch is active, they fall back to how they worked before, creating individual sync points per query.
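
The saving from batching can be sketched by counting sync points. QueryRecorder and its methods are hypothetical and only model the bookkeeping: one shared sync point per active batch, and one per query when no batch is active.

```cpp
#include <cassert>

// Hypothetical recorder that counts the sync points a platform RHI would create.
class QueryRecorder
{
public:
    void BeginBatch()
    {
        assert(!bBatchActive); // This sketch does not support nested batches.
        bBatchActive = true;
        bBatchHasQueries = false;
    }

    void EndBatch()
    {
        assert(bBatchActive);
        // One sync point covers every query issued inside the batch.
        if (bBatchHasQueries)
        {
            ++SyncPointCount;
        }
        bBatchActive = false;
    }

    void IssueTimestampQuery()
    {
        if (bBatchActive)
        {
            bBatchHasQueries = true;
        }
        else
        {
            // Fallback path: an individual sync point per query.
            ++SyncPointCount;
        }
    }

    int GetSyncPointCount() const { return SyncPointCount; }

private:
    bool bBatchActive = false;
    bool bBatchHasQueries = false;
    int SyncPointCount = 0;
};
```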

#rb christopher.waters

[CL 32763061 by luke thatcher in ue5-main branch]
2024-04-05 12:29:03 -04:00
zach bethel
a095bbf035 Added AddPostExecuteCallback to RDG to support deferring an operation until after graph execution.
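
A minimal sketch of deferring work until after graph execution. GraphBuilder is a hypothetical class, and the std::function signatures are assumptions; only the name AddPostExecuteCallback comes from the commit above.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical builder: passes run in submission order, then deferred callbacks.
class GraphBuilder
{
public:
    void AddPass(std::function<void()> Pass)
    {
        Passes.push_back(std::move(Pass));
    }

    // Queues an operation that must not run until every pass has executed.
    void AddPostExecuteCallback(std::function<void()> Callback)
    {
        PostExecuteCallbacks.push_back(std::move(Callback));
    }

    void Execute()
    {
        assert(!bExecuted); // A graph executes once in this sketch.
        bExecuted = true;
        for (std::function<void()>& Pass : Passes)
        {
            Pass();
        }
        for (std::function<void()>& Callback : PostExecuteCallbacks)
        {
            Callback();
        }
    }

private:
    std::vector<std::function<void()>> Passes;
    std::vector<std::function<void()>> PostExecuteCallbacks;
    bool bExecuted = false;
};
```

A callback registered before any pass is added still runs last, which is the point of the deferral.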
[CL 32654692 by zach bethel in ue5-main branch]
2024-04-01 18:51:58 -04:00