55 Commits

Author SHA1 Message Date
luke thatcher
808b695e4f Replace use of FRHICommandListExecutor::GetImmediateCommandList() with FRHICommandListImmediate::Get()
 - Only in places where it is trivially proven the call is only made on the render thread, due to an existing check(IsInRenderingThread()) assert somewhere in the function.
 - FRHICommandListImmediate::Get() itself contains a check(IsInRenderingThread()), so this enforces correct threading, and removes the need for extra checks at the call sites.
 - Remaining uses of FRHICommandListExecutor::GetImmediateCommandList() need investigation. Some may be bugs.
 - Also some changes to make use of the passed-in RHICmdList where possible (e.g. render commands that are given the immediate command list, but call the global getter rather than using the argument they were given).
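
The pattern described above can be illustrated with a minimal, self-contained sketch. Everything in the `sketch` namespace is a hypothetical stand-in, not the engine's real types: the thread check throws instead of asserting so that it can be exercised in a test, and `RenderThreadId()` is an invented helper for marking which thread counts as the render thread.

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical stand-ins for illustration only; not the engine's types.
namespace sketch {

// Records which thread is treated as the render thread.
inline std::thread::id& RenderThreadId() {
    static std::thread::id Id;
    return Id;
}

inline bool IsInRenderingThread() {
    return std::this_thread::get_id() == RenderThreadId();
}

struct CommandListImmediate {
    int NumCommands = 0;

    // Thread-checked accessor: the check lives in one place,
    // so call sites no longer need their own assert.
    static CommandListImmediate& Get() {
        if (!IsInRenderingThread()) {
            throw std::logic_error("immediate command list used off the render thread");
        }
        static CommandListImmediate Instance;
        return Instance;
    }
};

// Uses the command list it was given rather than calling the global getter again.
inline void RecordDraw(CommandListImmediate& RHICmdList) {
    ++RHICmdList.NumCommands;
}

} // namespace sketch
```

Calling `sketch::CommandListImmediate::Get()` from any thread other than the one stored in `RenderThreadId()` fails immediately, which is the property the commit relies on.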

#rb zach.bethel

[CL 31699633 by luke thatcher in ue5-main branch]
2024-02-21 17:26:04 -05:00
guillaume abadie
e9eb11ef4e Comments out an ensure() in DumpGPU to unblock automated tests
#jira UE-205749

[CL 31131433 by guillaume abadie in ue5-main branch]
2024-02-02 13:01:51 -05:00
guillaume abadie
b7425a4813 Fix DumpGPU scopes
[FYI] Luke.Thatcher

[CL 31102738 by guillaume abadie in ue5-main branch]
2024-02-01 16:27:42 -05:00
Luke Thatcher
10cdd4a111 Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637)
Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists.
See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows:

This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this:
 - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit().
 - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job.
 - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition).
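
As a rough illustration of the batching decision described above, the following sketch groups consecutive command lists into translate jobs. The types, the `MaxCommandsPerJob` budget, and the batching policy are all assumptions made for this example; the commit message does not describe the actual heuristics used by the dispatch thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model for illustration only; not the engine's types or policy.
namespace sketch {

struct CommandList {
    int NumCommands;
    bool bAllowParallelTranslate; // false once a non-thread-safe operation was recorded
};

struct TranslateJob {
    std::vector<std::size_t> ListIndices; // command lists translated together, in order
    bool bOnRHIThread;                    // true when the job must run on the RHI thread
};

// Walks the command lists in submission order. Consecutive lists that allow
// parallel translate share a job until the (assumed) command budget is reached.
// A list that disabled parallel translate gets its own RHI-thread job.
inline std::vector<TranslateJob> BuildJobs(const std::vector<CommandList>& Lists,
                                           int MaxCommandsPerJob) {
    std::vector<TranslateJob> Jobs;
    int CommandsInOpenJob = 0;
    for (std::size_t Index = 0; Index < Lists.size(); ++Index) {
        const CommandList& List = Lists[Index];
        if (!List.bAllowParallelTranslate) {
            Jobs.push_back({{Index}, true});
            CommandsInOpenJob = 0;
            continue;
        }
        const bool bCanExtend = !Jobs.empty() && !Jobs.back().bOnRHIThread &&
                                CommandsInOpenJob + List.NumCommands <= MaxCommandsPerJob;
        if (bCanExtend) {
            Jobs.back().ListIndices.push_back(Index);
            CommandsInOpenJob += List.NumCommands;
        } else {
            Jobs.push_back({{Index}, false});
            CommandsInOpenJob = List.NumCommands;
        }
    }
    return Jobs;
}

} // namespace sketch
```

Submission order is preserved across jobs, so a list forced onto the RHI thread splits the parallel work on either side of it into separate jobs.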

One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations:
 - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism.
 - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) were not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes.

A summary of the new RHI breadcrumb system is as follows:
 - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging.
 - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes.
 - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress.
 - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs).
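
The node layout described above can be sketched as follows. This is a simplified model under stated assumptions, not the engine's implementation: all names are hypothetical, a `std::deque` stands in for the ref-counted allocators, and a label plus one integer stands in for a format string with by-value arguments.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model for illustration only; not the engine's types.
namespace sketch {

struct BreadcrumbNode {
    const char* Label = "";            // static label, stored without formatting
    int Value = 0;                     // argument captured by value
    BreadcrumbNode* Parent = nullptr;  // enclosing scope
    BreadcrumbNode* Next = nullptr;    // next node in depth-first order
    unsigned Id = 0;                   // unique ID a GPU write could record

    // Formatting is deferred until the name is actually needed.
    std::string GetName() const {
        return std::string(Label) + " " + std::to_string(Value);
    }
};

class BreadcrumbTree {
public:
    // Enters a scope. Nodes are created in depth-first order, so linking
    // each new node after the previous one yields the depth-first list.
    BreadcrumbNode* Begin(const char* Label, int Value) {
        Nodes.emplace_back(); // std::deque keeps earlier node addresses stable
        BreadcrumbNode* Node = &Nodes.back();
        Node->Label = Label;
        Node->Value = Value;
        Node->Parent = Current;
        Node->Id = NextId++;
        if (Last != nullptr) {
            Last->Next = Node;
        }
        Last = Node;
        Current = Node;
        return Node;
    }

    // Leaves the current scope.
    void End() {
        if (Current != nullptr) {
            Current = Current->Parent;
        }
    }

    // Walks the depth-first list to find the node with a given ID.
    const BreadcrumbNode* FindById(unsigned Id) const {
        for (const BreadcrumbNode* Node = Nodes.empty() ? nullptr : &Nodes.front();
             Node != nullptr; Node = Node->Next) {
            if (Node->Id == Id) {
                return Node;
            }
        }
        return nullptr;
    }

private:
    std::deque<BreadcrumbNode> Nodes;
    BreadcrumbNode* Current = nullptr;
    BreadcrumbNode* Last = nullptr;
    unsigned NextId = 1;
};

// Builds the breadcrumb stack for a node, innermost scope first.
inline std::vector<std::string> BreadcrumbStack(const BreadcrumbNode* Node) {
    std::vector<std::string> Stack;
    for (; Node != nullptr; Node = Node->Parent) {
        Stack.push_back(Node->GetName());
    }
    return Stack;
}

} // namespace sketch
```

In this model, a crash handler that knows the last ID written by the GPU would call `FindById()` and then `BreadcrumbStack()`; strings are only formatted at that point.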

RenderGraph scopes have been simplified:
 - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types.
 - Each RDG pass holds a pointer to the scope it was created under.
 - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes.

Other changes include:
 - Fixes for bugs uncovered when parallel translate was enabled.
 - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer.
 - Refactored RHI draw call stats to better fit the new pipeline design.

#rb jeannoe.morissette, zach.bethel
#jira UE-139543

[CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
guillaume abadie
8f89b67394 Changes DumpGPU textures to be top left cornered
#tests win64
[FYI] aleksander.netzel

[CL 29729213 by guillaume abadie in ue5-main branch]
2023-11-14 18:14:49 -05:00
guillaume abadie
89c07ef689 Deduplicates RDG resource pointers across graph builders in DumpGPU to handle r.DumpGPU.FrameCount with many, many frames
#jira UE-192501, UE-179496
[FYI] zach.bethel

[CL 28631536 by guillaume abadie in ue5-main branch]
2023-10-10 14:50:30 -04:00
guillaume abadie
079ed4f3a4 Implements r.DumpGPU.CameraCut
#jira UE-192501, UE-179496

[CL 28620797 by guillaume abadie in ue5-main branch]
2023-10-10 11:13:43 -04:00
wojciech krywult
572fd58df2 RenderCore: Fixed crashes that could occur while using DumpGPU console command if writes to the drive are slow.
#rb David.Harvey
#jira UE-160213
#rnx

[CL 27107689 by wojciech krywult in ue5-main branch]
2023-08-15 12:55:53 -04:00
mihnea balta
79ba042011 Fix out of memory crashes when using DumpGPU.
The views made by DumpGPU must be created with lifetime extension disabled, so they release the underlying resources immediately after each dump pass.

#rnx
#jira UE-157708
#rb Guillaume.Abadie, Zach.Bethel

[CL 26769211 by mihnea balta in ue5-main branch]
2023-08-02 09:06:32 -04:00
steve robb
94b8262dab Replaced operator new TArray calls with emplacement.
#rb none

[CL 26395743 by steve robb in ue5-main branch]
2023-07-13 19:17:12 -04:00
zach bethel
aa1b0c680f Deprecated non-command list RHI methods.
 - RHICreate{Vertex, Index, Structured}Buffer
 - RHICreate{ShaderResource, UnorderedAccess}View
 - RHIUpdateUniformBuffer
 - Various initialization / locking methods for helper buffer types in RHIUtilities.h

The goal is to continue to force resource creation through command lists to avoid surprises with moving things off the render thread.

#rb christopher.waters

[CL 26183242 by zach bethel in ue5-main branch]
2023-06-22 11:08:27 -04:00
carl lloyd
8b05dfb4f2 Fixed Nanite bugs with Atomic64Compatible on Mac
Fixed missing AtomicCompatible flags in NaniteCullRaster
Fixed incorrect image transition when using DumpGPU

#rb Luke.Thatcher

[CL 26125943 by carl lloyd in ue5-main branch]
2023-06-20 12:38:45 -04:00
Guillaume Abadie
3e85b2179c Implements GetConsoleVariableSetByName()
#rb trivial
#jira UE-184651

[CL 25849305 by Guillaume Abadie in ue5-main branch]
2023-06-07 13:01:48 -04:00
Guillaume Abadie
8c8f521883 Fixes resource transition bug in DumpGPU
#rb trivial
#jira UE-187344
#preflight 64790a0f7a6aeda41b646b96

[CL 25744068 by Guillaume Abadie in ue5-main branch]
2023-06-01 17:36:08 -04:00
Guillaume Abadie
b374168d80 Adds a LogDumpGPU category for improved log search
#rb trivial
#jira none
#preflight 647604654b0d5a1eb1cf537e

[CL 25677309 by Guillaume Abadie in ue5-main branch]
2023-05-30 10:33:20 -04:00
Guillaume Abadie
481856a89f Implements DumpGPU streaming to better diagnose temporal problems on isolated features like TSR without tanking frame rate
With r.DumpGPU.Stream=1, DumpGPU instead allocates and reuses staging resources from its own pool, issuing only copy-to-staging and GPU fence RHI commands.
Every frame after that, the render thread polls the GPU fence to check whether the staging resource is lockable.
When it is ready, the render thread kicks off a background task that takes care of CPU post-processing of the resource and writing it to disk.
The render thread then polls an FEvent every frame to learn when the disk write is complete, and unlocks the staging resource.
The staging resource is then released to the staging resource pool, ready to be reused for dumping another resource of the current frame.
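
The per-frame polling described above amounts to a small state machine per staging resource. The sketch below models it under loud assumptions: all names are hypothetical, the GPU fence and the FEvent are replaced by plain flags that the caller sets, and no real threads or RHI calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model for illustration only; not the engine's types.
namespace sketch {

enum class SlotState { Free, WaitingForGPU, WritingToDisk };

struct StagingSlot {
    SlotState State = SlotState::Free;
    bool bFenceSignaled = false; // stand-in for the GPU fence
    bool bWriteDone = false;     // stand-in for the FEvent set by the background task
};

class StreamingDumper {
public:
    explicit StreamingDumper(std::size_t PoolSize) : Pool(PoolSize) {}

    // Takes a free staging slot and models issuing the copy and fence.
    // Returns the slot index, or -1 when the pool is exhausted.
    int EnqueueCopy() {
        for (std::size_t Index = 0; Index < Pool.size(); ++Index) {
            if (Pool[Index].State == SlotState::Free) {
                Pool[Index].State = SlotState::WaitingForGPU;
                Pool[Index].bFenceSignaled = false;
                Pool[Index].bWriteDone = false;
                return static_cast<int>(Index);
            }
        }
        return -1;
    }

    // Called once per frame: polls each slot and advances it by at most one state.
    void Tick() {
        for (StagingSlot& Slot : Pool) {
            if (Slot.State == SlotState::WaitingForGPU && Slot.bFenceSignaled) {
                Slot.State = SlotState::WritingToDisk; // background task would start here
            } else if (Slot.State == SlotState::WritingToDisk && Slot.bWriteDone) {
                Slot.State = SlotState::Free;          // unlock and return to the pool
            }
        }
    }

    StagingSlot& Slot(int Index) { return Pool[static_cast<std::size_t>(Index)]; }

private:
    std::vector<StagingSlot> Pool;
};

} // namespace sketch
```

Because the render thread only polls, a slow GPU copy or disk write delays reuse of that slot but never blocks the frame. A real implementation would need proper synchronization for the flag written by the background task.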

#rb none
#jira UE-179496
#preflight 64713ebcb310540a8d8e7da3

[CL 25657285 by Guillaume Abadie in ue5-main branch]
2023-05-26 21:01:27 -04:00
Guillaume Abadie
30c80b4d2e Fixes missing RHIUnmapStagingSurface() call in DumpGPU caused by 25447955
#rb trivial
#jira UE-186281
#preflight 647135490515781578e40efc

[CL 25656365 by Guillaume Abadie in ue5-main branch]
2023-05-26 19:06:04 -04:00
Guillaume Abadie
23c87be73c Allows DumpGPU to capture GSystemTextures that have SkipTracking
#rb trivial
#jira none
#preflight 6470f740b20adf94d795b672

[CL 25650053 by Guillaume Abadie in ue5-main branch]
2023-05-26 14:42:30 -04:00
Guillaume Abadie
cde79087c6 Moves DumpGPU's texture post processing into its own PostProcessTexture()
This notably makes the distinction between a PreprocessedPixelFormat and a PostprocessedPixelFormat in
FTextureSubresourceDumpDesc, to handle the special logic for GL not being able to write a UAV with less than 4 bytes
directly in TranslateSubresourceDumpDesc().

The motivation for this change is to have a very modular PostProcessTexture() that is easily reusable from
asynchronous streaming of GPU resource captures to disk.


#rb dmitriy.dyomin
#jira none
#preflight 645bc04d8f4d53ff22a289fa

[CL 25447955 by Guillaume Abadie in ue5-main branch]
2023-05-12 09:55:33 -04:00
mihnea balta
5c24794294 Fix crashes caused by processing the deferred deletion queue at the wrong time, especially when running without an RHI thread.
Since FRHICommandListImmediate::ImmediateFlush() was calling FlushPendingDeletes() after executing the current list of commands, the lambda enqueued by that function was actually running as part of the next command list. Besides unnecessarily extending resource lifetimes, this also meant that the new command list started by flushing the deferred deletion queue, so anything added directly in there (instead of via another enqueued command) would be deleted before the commands were executed on the GPU. This was very easy to reproduce with -norhithread on DX12, because texture unlock operations add the staging buffer to the deferred deletion queue immediately, instead of enqueuing a command to do it, so the staging buffers were gone by the time the GPU tried to copy from them.

This changelist adds a boolean to RHISubmitCommandLists() which is true when we're flushing the immediate command list with the FlushRHIThreadFlushResources mode, so that the RHI can process the deletion queue internally after submission, instead of doing it in RHIPerFrameRHIFlushComplete(). I didn't want to move RHIPerFrameRHIFlushComplete() itself to another point in the timeline, because old RHIs (D3D11 and OpenGL) use that for other purposes, and it seems unwise to alter their behavior (e.g. D3D11 wants to resolve timing queries in there with a blocking wait, and running that at the end of the current command list would introduce a CPU/GPU sync point).
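
The ordering problem and the fix can be seen in a toy model. This is only a sketch: `ToyRHI` and its members are hypothetical, "executing" a command here collapses CPU submission and GPU execution into one step, and the log strings exist purely to make the ordering visible.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model for illustration only; not the engine's types.
namespace sketch {

struct ToyRHI {
    std::vector<std::string> Log;               // order in which things happened
    std::vector<std::string> DeletionQueue;     // resources awaiting deferred deletion
    std::vector<std::function<void()>> Pending; // commands recorded but not yet executed

    void Enqueue(std::function<void()> Command) {
        Pending.push_back(std::move(Command));
    }

    void ProcessDeletionQueue() {
        for (const std::string& Name : DeletionQueue) {
            Log.push_back("delete " + Name);
        }
        DeletionQueue.clear();
    }

    void Execute() {
        std::vector<std::function<void()>> Commands = std::move(Pending);
        Pending.clear();
        for (std::function<void()>& Command : Commands) {
            Command();
        }
    }

    // Old behaviour: the flush is enqueued after the current commands have run,
    // so it ends up at the front of the NEXT batch of commands.
    void FlushOld() {
        Execute();
        Enqueue([this] { ProcessDeletionQueue(); });
    }

    // New behaviour: a flag on submission lets the deletion queue be
    // processed right after the current commands.
    void SubmitNew(bool bFlushResources) {
        Execute();
        if (bFlushResources) {
            ProcessDeletionQueue();
        }
    }
};

} // namespace sketch
```

With `FlushOld()`, a resource pushed directly into `DeletionQueue` is deleted before a later-recorded command that still reads it; with `SubmitNew(true)`, the command runs first.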

Also:
* deprecated FRHIResource::FlushPendingDeletes(), since all it does is call FlushPendingDeletes() on the command list being passed in, and code outside of the RHI really shouldn't be doing that.
* made FlushPendingDeleteRHIResources_RenderThread() flush the immediate command list instead of calling FlushPendingDeletes() directly

#jira UE-184426
#rnx
#preflight https://horde.devtools.epicgames.com/job/6455194d023fe5d3ad8faa64
#rb Luke.Thatcher

[CL 25423894 by mihnea balta in ue5-main branch]
2023-05-11 06:25:32 -04:00
Guillaume Abadie
706885ec10 Implements r.DumpGPU.FixedTickRate to tick the engine at fixed delta time when dumping many frames
#rb none
#jira none
#fyi jason.hoerner
#preflight 645bb49d7e39940634a7b8cd

[CL 25406503 by Guillaume Abadie in ue5-main branch]
2023-05-10 11:41:42 -04:00
Guillaume Abadie
fecf7b32f5 Fixes an error in DumpGPU viewer when the log's name is not ProjectName.log
#preflight 6459360e2d27fa25b3a3a2f4

[CL 25373783 by Guillaume Abadie in ue5-main branch]
2023-05-08 14:01:36 -04:00
jeannoe morissette
1a99ad4e08 VulkanRHI: Use the DimensionOverride in DumpGPU when dumping slices of arrays individually.
#rb Guillaume.Abadie
#preflight 64540c32fd4b8f4e0d8b1d4c
#rnx

[CL 25341401 by jeannoe morissette in ue5-main branch]
2023-05-04 16:04:03 -04:00
Guillaume Abadie
5bc5350126 Fixes a bug on DumpGPU with Texture2DArray bound to FRenderTargetBinding with ArraySlice==-1
#rb trivial
#jira UE-181710
#preflight 642af2ccce01db47aca7f812

[CL 24947763 by Guillaume Abadie in ue5-main branch]
2023-04-06 13:26:04 -04:00
Guillaume Abadie
77df14a99e Fixes ERHIAccess::CopyDest not being in DumpGPU's output resources
#rb trivial
#jira none
#preflight 6425f79c50546ea3365d02ce

[CL 24860823 by Guillaume Abadie in ue5-main branch]
2023-03-30 17:28:52 -04:00