on shutdown, AppPreExit() is covered by StartRenderCommandFenceBundler()/StoptRenderCommandFenceBundler() to minimise the number of RT tasks as there're lots of render fences in this region. waiting on any of fences stops bundling, e.g. by FlushRenderingCommands(). As this happens early in AppPreExit(), this effectively disables bundling and we get a ton of RT tasks that drives mem usage up.
This CL adds FlushRenderCommandFenceBundler() and uses it in FRenderCommandFence::Wait() instead of StopRenderCommandFenceBundler() to keep bundling enabled
#rb danny.couture
#preflight 647a11ea8417d79259a9944b
[CL 25764394 by Andriy Tylychko in ue5-main branch]
Since FRHICommandListImmediate::ImmediateFlush() was calling FlushPendingDeletes() after executing the current list of commands, the lambda enqueued by that function was actually running as part of the next command list. Besides unnecessarily extending resource lifetimes, this also meant that the new command list started by flushing the deferred deletion queue, so anything added directly in there (instead of via another enqueued command) would be deleted before the commands were executed on the GPU. This was very easy to reproduce with -norhithread on DX12, because texture unlock operations add the staging buffer to the deferred deletion queue immediately, instead of enqueuing a command to do it, so the staging buffers were gone by the time the GPU tried to copy from them.
This changelist adds a boolean to RHISubmitCommandLists() which is true when we're flushing the immediate command list with the FlushRHIThreadFlushResources mode, so that the RHI can process the deletion queue internally after submission, instead of doing it in RHIPerFrameRHIFlushComplete(). I didn't want to move RHIPerFrameRHIFlushComplete() itself to another point in the timeline, because old RHIs (D3D11 and OpenGL) use that for other purposes, and it seems unwise to alter their behavior (e.g. D3D11 wants to resolve timing queries in there with a blocking wait, and running that at the end of the current command list would introduce a CPU/GPU sync point).
Also:
* deprecated FRHIResource::FlushPendingDeletes(), since all it does is call FlushPendingDeletes() on the command list being passed in, and code outside of the RHI really shouldn't be doing that.
* made FlushPendingDeleteRHIResources_RenderThread() flush the immediate command list instead of calling FlushPendingDeletes() directly
#jira UE-184426
#rnx
#preflight https://horde.devtools.epicgames.com/job/6455194d023fe5d3ad8faa64
#rb Luke.Thatcher
[CL 25423894 by mihnea balta in ue5-main branch]
an excessive FGraphEvent instance is still used in `bSyncToRHIAndGPU` branch, and can be similarly improved. though this would require massive changes and won't bring any measurable perf improvements as this happens once a frame.
#preflight 6453d0234d593c0b428d7888
#rb danny.couture, luke.thatcher
[CL 25336380 by Andriy Tylychko in ue5-main branch]
This change includes significant refactor work performed in //UE5/Dev-ParallelRendering. A brief summary of the work is as follows:
Refactored RHI command lists
- Removal of the "immediate" async compute command list
- Introduced an "active pipe" on each command list, allowing RHICmdLists to record work for either graphics or async compute. Pipes can be selected using the SwitchPipeline() function, or the FRHICommandListScopedPipeline helper.
- New explicit command list submission RHI API (RHIFinalizeContext, RHISubmitCommandLists). The IRHICommandContextContainer type has been removed.
- Explicit GPU submission is automatically appended to the immediate command list when it is dispatched to the RHI thread.
Platform RHI implementations
- The new submission API has been implemented across all platforms. Some platforms required a significant refactor.
#rb Mihnea.Balta,Kenzo.Terelst
#jira UE-139550
#preflight 6332e3641003050806d802ef
[CL 22239063 by luke thatcher in ue5-main branch]
The implementation extends the existing FThreadIdleStats to track critical path vs non-critical path waits for a thread.
The cvar r.RenderThreadTimeIncludesDependentWaits forces the main renderthread stat to use the new behavior, but this is disabled by default for historical tracking reasons.
#rb Nuno Leiria
#ROBOMERGE-AUTHOR: ben.woodhouse
#ROBOMERGE-SOURCE: CL 21386702 via CL 21387159 via CL 21387356 via CL 21391039 via CL 21391830
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v975-21357124)
[CL 21394348 by ben woodhouse in ue5-main branch]
-Allow the frame sync to be kicked from the render thread when using the gtsync==2 (low input latency mode). On some platforms, the RHI thread will block on present. Putting the RHI as the trigger index will block the GameThread until the present is finished (vblank). In low input latency mode, some consoles uses the RHIOffsetThread to kick of the Gamethread, so we dont want it to block on present.
-Use the present mode to submit frame tokens in low input latency mode. Using GT_wait or GT_early introduce stuttering in the frame time since the gamethread will sync on the present when its double buffered. Even if we increase the frame interval (3 or 4), the WaitFrameEventX will eventually block the gamethread when the second display buffer is in used. Note that we delay the barrier transition(RT->Display) until we really use the backbuffer, so we dont technically need to throttle on the frame token, since we are in pseudo triple buffer mode. Issuing GT_Wait/GT_early mode in double buffer mode will always produce bad frame pacing when we use the FRHIFrameOffsetThread approach.
-Set the frame offset interval to zero to sync the os input pooling with the vblank.
[REVIEW] [at]Eric.McDaniel
#ROBOMERGE-OWNER: serge.bernier
#ROBOMERGE-AUTHOR: serge.bernier
#ROBOMERGE-SOURCE: CL 20749857 via CL 20750200 via CL 20750276 via CL 20750599 via CL 20750604
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v970-20704180)
[CL 20751490 by serge bernier in ue5-main branch]
This change allows users to opt-out of the engines default crash handling. Currently only implemented on Windows where the SEH code forwards the exception to externally registered handlers.
Thanks to Kristj�n Valur J�nsson for PR.
#rb patrick.laflamme
#preflight 62977779101592b80503b809
[CL 20452469 by Johan Berg in ue5-main branch]
### Notes
The check() message and callstack is actually printed in FWindowsErrorOutputDevice::HandleError which is called in the __except block after we RaiseException() inside ReportAssert(). However, RenderingThread's __except doesn't do this, so the contents of GErrorHist are lost to everyone except for CrashReporter. This change adds this necessary logging.
Also, add Windows Launch support for -IgnoreDebugger when choosing __try/__except or not (this precedes initialization of GIgnoreDebugger). This makes debugging easier for this class of issue.
### Testing
Repro steps are in the JIRA. Tested at that change (which has a reliable RenderingThread check() failure), and at #head where no defect exists. Tests fail and pass, respectively, as expected.
#rnx
#rb francis.hurteau
#jira UE-151056
#preflight 627b0b4ac42338be653745e6
[CL 20140980 by geoff evans in ue5-main branch]