297 Commits

Author SHA1 Message Date
jason hoerner
2b0f208938 Visualize Texture: Performance and feature upgrades.
* Visualize texture system starts out in an inactive state until a command is issued, avoiding overhead of tracking views and scene textures, saving 1.4% on the render thread.
* Visualization overhead eliminated for views besides the one currently being visualized.
* Support for visualization of textures from scene captures, via "view=N" option (specifying the unique ID of the view), with "view=?" displaying a list of views for reference.
* Improved visualization for cube maps.  PIP uses 2:1 aspect for the longitudinal render to match resource viewer display, and pixel perfect option shows tiled flat cube map faces (actual pixels) rather than running a projection.
* Padding for scene or screen pass textures is removed in the visualization -- the padding otherwise shows up as garbage or blank space.

To remove scene texture padding, it's necessary to add a field to RDG textures to provide an option to track the viewport sizes that were rendered for a given texture.  If not set, the assumption is the whole texture was rendered.  The field is set for FSceneTextures and FScreenPassTexture, covering the vast majority of cases, plus the denoiser was spot fixed -- worst case if any other cases are missed, you still see the padding.  You can tell padding was present when visualizing by contrasting the texture size with the viewport size.

Padding was always a potential issue for the visualizer, but is exacerbated by scene captures, as the padded scene textures are set to a size that's a union of the main view and any scene captures.  Padding is also exacerbated by dynamic resolution scaling, as the buffers will be padded to the maximum resolution.  For example, a cube map rendering at 512x512 will have 93% of the pixel area as padding if the front buffer is at 1440p, or the default dynamic resolution setup will have 70% of the pixels as padding at minimum res.
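
The 93% figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming "1440p" means a 2560x1440 padded scene texture; `PaddingFraction` is a hypothetical helper written for illustration, not an engine function.

```cpp
// Fraction of a padded texture's pixel area that lies outside the viewport
// that was actually rendered. Hypothetical helper, for illustration only.
constexpr double PaddingFraction(int ViewWidth, int ViewHeight, int TextureWidth, int TextureHeight)
{
    return 1.0 - (static_cast<double>(ViewWidth) * ViewHeight)
               / (static_cast<double>(TextureWidth) * TextureHeight);
}

// 512x512 cube face rendered into a 2560x1440 padded texture (assumed size):
// 1 - 262144 / 3686400 is roughly 0.929, i.e. about 93% padding.
static_assert(PaddingFraction(512, 512, 2560, 1440) > 0.92, "expected ~93% padding");
static_assert(PaddingFraction(512, 512, 2560, 1440) < 0.93, "expected ~93% padding");
```

The same helper gives 0 when the viewport covers the whole texture, which matches the "if not set, the whole texture was rendered" default described above.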

#rb Jason.Nadro

[CL 31160232 by jason hoerner in ue5-main branch]
2024-02-03 16:07:46 -05:00
zach bethel
75e48eba41 Removed RDG wait stat to work around crash in CSV profiler
[CL 31067782 by zach bethel in ue5-main branch]
2024-01-31 17:19:44 -05:00
zach bethel
1218f41c3a Fixed RHI validation error when transitioning back to ERHIAccess::Discard.
#jira UE-205031

[CL 31052471 by zach bethel in ue5-main branch]
2024-01-31 12:08:00 -05:00
zach bethel
ed52df29e8 Fixed bad robomerge.
[CL 30984434 by zach bethel in ue5-main branch]
2024-01-29 21:37:12 -05:00
zach bethel
1747ae9c34 Fixed build break.
[CL 30983529 by zach bethel in ue5-main branch]
2024-01-29 20:59:31 -05:00
zach bethel
f238fac4d9 Fixed race condition between RDG builder and async destruction task. Added a sync point with scene render clean-up to fix race condition with scene extension context.
#jira UE-204879

[CL 30983204 by zach bethel in ue5-main branch]
2024-01-29 20:46:43 -05:00
zach bethel
4f9d3be993 Fixed race condition between RDG builder and async destruction task. Added a sync point with scene render clean-up to fix race condition with scene extension context.
#jira UE-204879

[CL 30981309 by zach bethel in ue5-main branch]
2024-01-29 19:13:40 -05:00
Luke Thatcher
10cdd4a111 Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637)
Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists.
See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows:

This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this:
 - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit().
 - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job.
 - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition).
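
As a rough illustration of the batching decision described in the list above, the sketch below groups consecutive parallel-capable command lists into jobs and gives each list that disabled parallel translate its own RHI-thread job. All types and the `MaxBatch` limit are hypothetical; the commit does not describe the actual batching heuristics.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RHI command list awaiting translation.
struct FCmdListSketch
{
    int  Id;
    bool bAllowParallelTranslate; // false after e.g. a buffer lock
};

// One unit of translation work produced by the dispatch step.
struct FTranslateJobSketch
{
    bool bOnRHIThread;            // true: must run serially on the RHI thread
    std::vector<int> CmdListIds;  // command lists translated by this job, in order
};

// Walks command lists in submission order. Consecutive parallel-capable lists
// are grouped (up to MaxBatch per job); a list that disabled parallel
// translate always starts a new job flagged for the RHI thread.
inline std::vector<FTranslateJobSketch> BuildTranslateJobs(
    const std::vector<FCmdListSketch>& Lists, std::size_t MaxBatch)
{
    std::vector<FTranslateJobSketch> Jobs;
    for (const FCmdListSketch& List : Lists)
    {
        const bool bCanAppend = List.bAllowParallelTranslate
            && !Jobs.empty()
            && !Jobs.back().bOnRHIThread
            && Jobs.back().CmdListIds.size() < MaxBatch;
        if (!bCanAppend)
        {
            Jobs.push_back(FTranslateJobSketch{ !List.bAllowParallelTranslate, {} });
        }
        Jobs.back().CmdListIds.push_back(List.Id);
    }
    return Jobs;
}
```

Submission order is preserved across jobs, so a serial RHI-thread job splits the parallel work before and after it rather than being reordered.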

One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations:
 - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism.
 - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) were not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes.

A summary of the new RHI breadcrumb system is as follows:
 - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging.
 - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes.
 - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress.
 - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs).
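
The node layout and deferred formatting described in the list above might look roughly like the following. This is a sketch under stated assumptions: `FBreadcrumbNodeSketch` and `BuildCrashStack` are hypothetical names, and `std::function` is used only for brevity (the commit does not say how the arguments are stored, and this version may allocate).

```cpp
#include <cstdio>
#include <functional>
#include <string>

// Hypothetical breadcrumb node: links to parent and next node, plus a
// deferred formatter whose arguments were captured by value at creation.
struct FBreadcrumbNodeSketch
{
    FBreadcrumbNodeSketch* Parent = nullptr;
    FBreadcrumbNodeSketch* Next = nullptr;    // next node in depth-first order (unused in this sketch)
    std::function<std::string()> FormatName;  // runs the string formatting only when called

    template <typename... TArgs>
    FBreadcrumbNodeSketch(FBreadcrumbNodeSketch* InParent, const char* Format, TArgs... Args)
        : Parent(InParent)
        , FormatName([Format, Args...]() {
              char Buffer[128];
              std::snprintf(Buffer, sizeof(Buffer), Format, Args...);
              return std::string(Buffer);
          })
    {
    }
};

// Formats names lazily while walking from a node up to the root, as would be
// needed when reporting the active breadcrumb stack after a GPU crash.
inline std::string BuildCrashStack(const FBreadcrumbNodeSketch* Node)
{
    std::string Stack;
    for (; Node != nullptr; Node = Node->Parent)
    {
        const std::string Name = Node->FormatName();
        Stack = Stack.empty() ? Name : Name + "/" + Stack;
    }
    return Stack;
}
```

Building a node only copies the format pointer and its arguments; `snprintf` runs when `BuildCrashStack` (or any other consumer) asks for the name.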

RenderGraph scopes have been simplified:
 - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types.
 - Each RDG pass holds a pointer to the scope it was created under.
 - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes.

Other changes include:
 - Fixes for bugs uncovered when parallel translate was enabled.
 - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer.
 - Refactored RHI draw call stats to better fit the new pipeline design.

#rb jeannoe.morissette, zach.bethel
#jira UE-139543

[CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
jon olick
3a2895f321 Fixed null pointer deref crash on Android. DepthBeforeState was NULL.
#rb Tiantian.Xie

[CL 30893588 by jon olick in ue5-main branch]
2024-01-25 13:42:52 -05:00
zach bethel
253c7eea71 Submit RDG setup command lists before flushing the RHI thread to avoid PSO refcounting issue.
[CL 30851378 by zach bethel in ue5-main branch]
2024-01-24 12:51:27 -05:00
steve robb
fde2961f55 Fixed up a lot of bool-taking container resize functions to take EAllowShrinking instead.
[CL 30803608 by steve robb in ue5-main branch]
2024-01-23 09:51:10 -05:00
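
The motivation for the bool-to-EAllowShrinking change in fde2961f55 can be pictured with a small standalone sketch. The enum and container below are local stand-ins written for illustration, not the engine definitions, and only the `No`/`Yes` values are modeled.

```cpp
#include <cstddef>
#include <vector>

// Local stand-in for the engine enum (assumption: two values, No and Yes).
enum class EAllowShrinking { No, Yes };

// Hypothetical container wrapper; the engine's containers are not shown here.
class FIntArraySketch
{
public:
    // The enum makes the call site self-describing, where a bare
    // SetNum(10, false) would leave the meaning of 'false' unclear.
    void SetNum(std::size_t NewNum, EAllowShrinking AllowShrinking)
    {
        Data.resize(NewNum);
        if (AllowShrinking == EAllowShrinking::Yes)
        {
            Data.shrink_to_fit(); // non-binding request to release spare capacity
        }
    }

    std::size_t Num() const { return Data.size(); }
    std::size_t Max() const { return Data.capacity(); }

private:
    std::vector<int> Data;
};
```

A call such as `Array.SetNum(10, EAllowShrinking::No)` keeps the existing allocation, which is the intent a bool parameter tends to hide.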
zach bethel
afa7e5d622 Fixed PSO cache refcount check. Flush RDG command list setup tasks to the RHI thread prior to flushing the PSO cache.
[CL 30795393 by zach bethel in ue5-main branch]
2024-01-22 22:42:55 -05:00
jason hoerner
4843583c83 Render Graph: A couple of simple development build RDG optimizations.
* Skip contents of Begin/EndUAVOverlap when GRHIValidationEnabled is false, avoiding expensive GatherPassUAVsForOverlapValidation call, saving around 1.4% frame wide.  Lower level function does nothing when validation isn't enabled.
* Optimize trivial RDG event names with no format specifiers.  Detects at compile time that there are no variadic parameters and calls a non-variadic version of the constructor, which references the string instead of generating a dynamically allocated copy through FCString::GetVarArgs.  Covers around 40% of RDG event names in my test case, saving around 1.2% frame wide.
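
The second optimization can be sketched with plain overload resolution: a non-variadic constructor is selected when there are no format arguments, so the string is referenced rather than formatted into a copy. `FEventNameSketch` is a hypothetical class, and `std::snprintf` stands in for the engine's formatting path.

```cpp
#include <cstdio>
#include <string>

// Hypothetical event-name holder, for illustration only.
class FEventNameSketch
{
public:
    // Chosen at compile time when there are no format arguments:
    // stores the pointer, no formatting and no copy.
    explicit FEventNameSketch(const char* Literal)
        : StaticName(Literal)
    {
    }

    // Chosen when at least one format argument is present:
    // formats into an owned string.
    template <typename TFirst, typename... TRest>
    FEventNameSketch(const char* Format, TFirst First, TRest... Rest)
    {
        char Buffer[128];
        std::snprintf(Buffer, sizeof(Buffer), Format, First, Rest...);
        FormattedName = Buffer;
    }

    const char* Get() const { return StaticName ? StaticName : FormattedName.c_str(); }
    bool IsStatic() const { return StaticName != nullptr; }

private:
    const char* StaticName = nullptr; // set only by the non-variadic path
    std::string FormattedName;        // set only by the variadic path
};
```

The non-variadic path assumes the string outlives the event name, which holds for string literals.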

#rnx
#rb mihnea.balta

[CL 30781413 by jason hoerner in ue5-main branch]
2024-01-22 13:51:08 -05:00
zach bethel
522b0e9af3 RDG Refactor to Improve Parallelism
 - Subresource state tracking manipulates less data by always using pointers instead of copying full states.
 - Offloaded pooled texture / buffer allocations to an async task.
 - Removed dependency between pooled resource allocations and collecting resource barriers. The 'finalize' phase now gathers both first and last barriers for a resource.
 - Refactored RDG allocator implementation to make things simpler.
 - Moved async tasks within RDG to use the setup task API.
 - Moved culling into the async setup queue.

[CL 30774932 by zach bethel in ue5-main branch]
2024-01-22 11:04:48 -05:00
steve robb
f029468598 Fixed up a lot of bool-taking container resize functions to take EAllowShrinking instead.
[CL 30729174 by steve robb in ue5-main branch]
2024-01-19 16:41:35 -05:00
dmitriy dyomin
1c99a294d8 Make sure memoryless textures are discarded by RDG at the end of the pass
#rb mihnea.balta, zach.bethel

[CL 30257193 by dmitriy dyomin in ue5-main branch]
2023-12-11 23:34:38 -05:00
mihnea balta
3bdadfd747 Use FOptionalTaskTagScope instead of FTaskTagScope to avoid overwriting the current tag when task retraction kicks in.
This fixes some invalid tag ensures in FD3D11DynamicRHI::RHICreateUniformBuffer.

#rnx
#rb zach.bethel

[CL 29783616 by mihnea balta in ue5-main branch]
2023-11-16 13:12:57 -05:00
yuriy odonnell
c3ad0622c8 Add support for reserved buffer commit operation to RDG
* Add QueueCommitReservedBuffer() to FRDGBuilder
* Add GetCommittedSize() to FRDGPooledBuffer, to query the physical size of the buffer
* Change ResizeBufferIfNeeded() to issue a commit operation instead of copying when possible
* Add r.Nanite.Streaming.ReservedResources CVar (default: false) to allocate Nanite ClusterPageData.DataBuffer as a reserved resource of maximum size, which is committed in ResizeBufferIfNeeded()
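
The commit-instead-of-copy decision listed above could be modeled along these lines. This is a sketch under stated assumptions: the struct, enum, and function are hypothetical, sizes are in bytes, and the real ResizeBufferIfNeeded() may apply conditions that are not described in this commit message.

```cpp
#include <cstdint>

// Hypothetical model of a pooled buffer that may be a reserved resource.
struct FPooledBufferSketch
{
    std::uint64_t ReservedSize = 0;  // virtual size; 0 means not a reserved resource
    std::uint64_t CommittedSize = 0; // physical memory currently backing the buffer
};

enum class EResizeActionSketch { None, Commit, ReallocateAndCopy };

// Grows the buffer to RequiredSize, preferring a commit of more physical
// memory within the reservation over allocating a new buffer and copying.
inline EResizeActionSketch ResizeBufferIfNeededSketch(FPooledBufferSketch& Buffer, std::uint64_t RequiredSize)
{
    if (RequiredSize <= Buffer.CommittedSize)
    {
        return EResizeActionSketch::None; // already large enough
    }
    if (RequiredSize <= Buffer.ReservedSize)
    {
        Buffer.CommittedSize = RequiredSize; // back more of the reserved range
        return EResizeActionSketch::Commit;
    }
    // Not reserved, or the reservation is too small: fall back to the copy path.
    Buffer.CommittedSize = RequiredSize;
    return EResizeActionSketch::ReallocateAndCopy;
}
```

With a reservation of maximum size, as the new CVar describes for the Nanite data buffer, every growth request in this model takes the `Commit` branch.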

Based on implementation by Zach Bethel.

#rb Zach.Bethel

[CL 29626308 by yuriy odonnell in ue5-main branch]
2023-11-09 21:34:39 -05:00
guillaume abadie
89c07ef689 Deduplicates RDG resource pointers across graph builders in DumpGPU to handle r.DumpGPU.FrameCount with a large number of frames
#jira UE-192501, UE-179496
[FYI] zach.bethel

[CL 28631536 by guillaume abadie in ue5-main branch]
2023-10-10 14:50:30 -04:00
zach bethel
1adbcfc6f2 Fixed assert in RDG task queue flush due to waiting on task while in a task pipe.
- Converted lock free list to locked array.

[CL 28322428 by zach bethel in ue5-main branch]
2023-09-28 13:09:55 -04:00
kenzo terelst
0b678d4693 Add SupportsResourceType to transient allocation because certain RHIs might not support all resource types
#rb Luke.Thatcher

[CL 28266473 by kenzo terelst in ue5-main branch]
2023-09-27 04:23:25 -04:00
zach bethel
df8c177da3 Removed PSO cache flush from command list flush method and manually placed it to happen after each scene render.
#rb graham.wihlidal
#jira UE-196266

[CL 28238003 by zach bethel in ue5-main branch]
2023-09-26 13:56:11 -04:00
zach bethel
7156e6f9d1 Fixed RDG AddCommandListSetupTask to support single tasks or task collections. Added IsParallelSetupEnabled() external call.
[CL 28213694 by zach bethel in ue5-main branch]
2023-09-25 20:37:06 -04:00
dmitriy dyomin
61d30a34c9 Removed UBMT_RDG_UNIFORM_BLOCK_SRV and added a UB usage flag that can be used to identify "UniformView" buffers
#rb none

[CL 27943664 by dmitriy dyomin in ue5-main branch]
2023-09-15 23:24:28 -04:00
zach bethel
1c6ffb85f7 Defer submission of AddCommandListSetupTask to batch calls to AsyncCommandListSubmit.
[CL 27833869 by zach bethel in ue5-main branch]
2023-09-13 11:48:20 -04:00