Commit Graph

283 Commits

Author SHA1 Message Date
danny couture
71ebc3cb5c [TaskGraph]
- Fix use-after-free in both WaitForAnyTaskCompleted and UE::Tasks::WaitAny

#rnx
#jira UE-208191
#rb kevin.macaulayvacher
#tests Stress FTaskGraphWaitForAnyTask under ASAN/TSAN for both TASKGRAPH_NEW_FRONTEND on/off

[CL 31870683 by danny couture in ue5-main branch]
2024-02-28 09:48:11 -05:00
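The commit message does not describe the root cause, but "wait for any" primitives are prone to one particular use-after-free. The waiter returns as soon as the first task completes. Completion callbacks from the remaining tasks can still fire later and touch state that lived on the waiter's stack. The sketch below uses standard C++ rather than the TaskGraph API. It keeps the wait state in a `std::shared_ptr` that every callback co-owns, so late callbacks only touch live memory. `FakeTask`, `WaitAnyState` and `WaitAny` are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Shared state for one WaitAny call. It is heap-allocated and reference-counted
// so that callbacks firing after WaitAny has returned still see valid memory.
struct WaitAnyState {
    std::mutex Mutex;
    std::condition_variable Cv;
    int FirstCompleted = -1;  // index of the first finished task, -1 if none yet
};

// Hypothetical task: runs registered callbacks when Complete() is called.
class FakeTask {
public:
    // Registers a callback; runs it immediately if the task already finished.
    void OnCompleted(std::function<void()> Callback) {
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            if (!bDone) {
                Callbacks.push_back(std::move(Callback));
                return;
            }
        }
        Callback();
    }

    // Marks the task finished and runs every callback outside the task's lock.
    void Complete() {
        std::vector<std::function<void()>> ToRun;
        {
            std::lock_guard<std::mutex> Lock(Mutex);
            assert(!bDone && "Complete() called twice");
            bDone = true;
            ToRun.swap(Callbacks);
        }
        for (auto& Callback : ToRun) Callback();
    }

private:
    std::mutex Mutex;
    bool bDone = false;
    std::vector<std::function<void()>> Callbacks;
};

// Blocks until any task completes and returns its index (-1 for an empty list).
int WaitAny(const std::vector<FakeTask*>& Tasks) {
    if (Tasks.empty()) return -1;
    auto State = std::make_shared<WaitAnyState>();
    for (std::size_t i = 0; i < Tasks.size(); ++i) {
        // Capturing the shared_ptr by value keeps State alive for late callbacks.
        Tasks[i]->OnCompleted([State, Index = static_cast<int>(i)] {
            std::lock_guard<std::mutex> Lock(State->Mutex);
            if (State->FirstCompleted < 0) {
                State->FirstCompleted = Index;
                State->Cv.notify_one();
            }
        });
    }
    std::unique_lock<std::mutex> Lock(State->Mutex);
    State->Cv.wait(Lock, [&] { return State->FirstCompleted >= 0; });
    return State->FirstCompleted;
}
```

If the state were a local variable captured by reference, completing the other tasks after `WaitAny` returns would write through a dangling reference. With shared ownership, the last callback to run frees the state instead.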
dmytro vovk
2c523f4899 Fixed default initialization of TLS slots to 0 and invalidity check against 0 as 0 is a valid TLS slot index
Attempt no. 2
#rb Francis.Hurteau, Matt.Peters

[CL 31123996 by dmytro vovk in ue5-main branch]
2024-02-02 09:59:30 -05:00
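The TLS fix depends on one fact: 0 is a valid slot index, so "not yet allocated" needs a value the OS never hands out. The sketch below assumes a sentinel of `0xFFFFFFFF`, the same value Windows uses for `TLS_OUT_OF_INDEXES`. It does not claim this is the constant the engine chose. `TlsSlot` is a hypothetical wrapper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around an OS TLS slot index.
struct TlsSlot {
    // Assumed sentinel; 0 cannot be used because the OS may return slot 0.
    static constexpr std::uint32_t InvalidIndex = 0xFFFFFFFFu;

    // Default-initialize to the sentinel, not to 0.
    std::uint32_t Index = InvalidIndex;

    // Slot 0 counts as valid; only the sentinel means "unallocated".
    bool IsValid() const { return Index != InvalidIndex; }

    void Set(std::uint32_t NewIndex) {
        assert(NewIndex != InvalidIndex && "a real slot index must differ from the sentinel");
        Index = NewIndex;
    }
};
```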
sean boocock
5a5ab2b7cd [Backout] - 31084550 - Blocking launching FN editor
[FYI] dmytro.vovk
Original CL Desc
-----------------------------------------------------------------
Fixed default initialization of TLS slots to 0 and invalidity check against 0 as 0 is a valid TLS slot index
#rb Francis.Hurteau, Matt.Peters

[CL 31088717 by sean boocock in ue5-main branch]
2024-02-01 11:00:01 -05:00
dmytro vovk
08c3036b27 Fixed default initialization of TLS slots to 0 and invalidity check against 0 as 0 is a valid TLS slot index
#rb Francis.Hurteau, Matt.Peters

[CL 31084584 by dmytro vovk in ue5-main branch]
2024-02-01 08:33:08 -05:00
Luke Thatcher
10cdd4a111 Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637)
Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists.
See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows:

This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this:
 - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit().
 - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job.
 - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition).
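The dispatch decision described in these bullets can be sketched as an in-order batching pass. Consecutive command lists that allow parallel translation are grouped into one translate job. A list with parallel translation disabled closes the current batch and is executed on the RHI thread on its own. `CmdList`, `Job` and `BatchCommandLists`, and the fixed `MaxBatch` policy, are all assumptions. The commit does not say how the dispatch thread sizes its batches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct CmdList {
    std::string Name;
    bool bAllowParallelTranslate = true;  // cleared after a non-thread-safe operation
};

enum class JobKind { ParallelTranslate, RHIThread };

struct Job {
    JobKind Kind;
    std::vector<std::string> Lists;  // names of the command lists in this job
};

// Groups submitted command lists in submission order. Order must be preserved,
// so a list that cannot be parallel-translated ends the current batch.
std::vector<Job> BatchCommandLists(const std::vector<CmdList>& Lists, std::size_t MaxBatch) {
    assert(MaxBatch > 0);
    std::vector<Job> Jobs;
    Job Current{JobKind::ParallelTranslate, {}};
    auto Flush = [&] {
        if (!Current.Lists.empty()) {
            Jobs.push_back(Current);
            Current.Lists.clear();
        }
    };
    for (const CmdList& List : Lists) {
        if (!List.bAllowParallelTranslate) {
            Flush();
            Jobs.push_back(Job{JobKind::RHIThread, {List.Name}});
            continue;
        }
        Current.Lists.push_back(List.Name);
        if (Current.Lists.size() == MaxBatch) Flush();
    }
    Flush();
    return Jobs;
}
```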

One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations:
 - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism.
 - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) were not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes.

A summary of the new RHI breadcrumb system is as follows:
 - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging.
 - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes.
 - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress.
 - Format string arguments are captured by value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs).
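The node layout described above can be illustrated with a minimal sketch that assumes a simple begin/end builder. Nodes are created in begin order, so linking each new node after the previously created one threads the tree in depth-first pre-order. A crash handler can then walk every node without recursion and follow `Parent` pointers to print the stack of whichever node ID the GPU last wrote. `BreadcrumbNode`, `BreadcrumbBuilder` and `StackFor` are hypothetical names. The ref-counted allocator and by-value argument capture are replaced by a `std::deque` and plain strings.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

struct BreadcrumbNode {
    std::uint32_t Id;                 // unique ID a GPU write would record
    std::string Name;
    BreadcrumbNode* Parent = nullptr;
    BreadcrumbNode* Next = nullptr;   // next node in depth-first order
};

class BreadcrumbBuilder {
public:
    BreadcrumbNode* Begin(const std::string& Name) {
        Storage.push_back(BreadcrumbNode{NextId++, Name, Current, nullptr});
        BreadcrumbNode* Node = &Storage.back();  // deque::push_back keeps element addresses stable
        if (Last) Last->Next = Node; else Head = Node;
        Last = Node;
        Current = Node;
        return Node;
    }

    void End() {
        assert(Current && "End() without matching Begin()");
        Current = Current->Parent;
    }

    BreadcrumbNode* First() const { return Head; }

private:
    std::deque<BreadcrumbNode> Storage;  // stands in for the ref-counted allocator
    BreadcrumbNode* Head = nullptr;
    BreadcrumbNode* Last = nullptr;
    BreadcrumbNode* Current = nullptr;
    std::uint32_t NextId = 1;
};

// Builds the breadcrumb stack for a node, outermost scope first.
std::string StackFor(const BreadcrumbNode* Node) {
    std::vector<const BreadcrumbNode*> Chain;
    for (; Node; Node = Node->Parent) Chain.push_back(Node);
    std::string Out;
    for (auto It = Chain.rbegin(); It != Chain.rend(); ++It) {
        if (!Out.empty()) Out += " / ";
        Out += (*It)->Name;
    }
    return Out;
}
```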

RenderGraph scopes have been simplified:
 - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types.
 - Each RDG pass holds a pointer to the scope it was created under.
 - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark up each worker thread with the relevant Unreal Insights scopes.
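The BeginCPU / EndCPU bookkeeping can be sketched as follows, assuming each worker thread remembers the scope of the last pass it executed. Moving to the next pass ends every scope below the common ancestor of the two scopes. It then begins every scope on the way down to the new one. `Scope`, `PathFromRoot` and `SwitchScope` are hypothetical. The calls are only recorded as strings and are not emitted as Unreal Insights events.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Scope {
    std::string Name;
    Scope* Parent = nullptr;
};

// Chain of scopes from the tree root down to S (inclusive); empty for nullptr.
std::vector<Scope*> PathFromRoot(Scope* S) {
    std::vector<Scope*> Path;
    for (; S; S = S->Parent) Path.push_back(S);
    std::reverse(Path.begin(), Path.end());
    return Path;
}

// Records the EndCPU/BeginCPU calls a thread would make when moving from the
// scope of its previous pass (From) to the scope of its next pass (To).
void SwitchScope(Scope* From, Scope* To, std::vector<std::string>& Log) {
    const std::vector<Scope*> A = PathFromRoot(From);
    const std::vector<Scope*> B = PathFromRoot(To);
    std::size_t Common = 0;
    while (Common < A.size() && Common < B.size() && A[Common] == B[Common]) ++Common;
    // Leave scopes innermost first, down to (not including) the common ancestor.
    for (std::size_t i = A.size(); i > Common; --i) Log.push_back("EndCPU " + A[i - 1]->Name);
    // Enter scopes outermost first, from below the common ancestor to the target.
    for (std::size_t i = Common; i < B.size(); ++i) Log.push_back("BeginCPU " + B[i]->Name);
}
```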

Other changes include:
 - Fixes for bugs uncovered when parallel translate was enabled.
 - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer.
 - Refactored RHI draw call stats to better fit the new pipeline design.

#rb jeannoe.morissette, zach.bethel
#jira UE-139543

[CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
steve robb
66266c6a11 Fixed up DerivedDataCache, DesktopPlatform, ApplicationCore, AssetRegistry, Core, CoreUObject, Projects, Sockets code to use EAllowShrinking instead of bools.
[CL 30676428 by steve robb in ue5-main branch]
2024-01-17 19:51:06 -05:00
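Replacing a trailing `bool` parameter with a scoped enum makes each call site self-describing: `RemoveAt(i, EAllowShrinking::No)` reads clearly where `RemoveAt(i, false)` does not. The sketch below works over `std::vector` and defines its own stand-in enum with `No`/`Yes` enumerators. It does not reproduce the engine's `TArray` signatures. `RemoveAtIndex` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the engine enum. A scoped enum is not implicitly constructible
// from bool or int, so RemoveAtIndex(V, 1, false) fails to compile.
enum class EAllowShrinking : unsigned char { No, Yes };

// Removes one element and, if allowed, asks the vector to release spare capacity.
template <typename T>
void RemoveAtIndex(std::vector<T>& Array, std::size_t Index, EAllowShrinking AllowShrinking) {
    assert(Index < Array.size());
    Array.erase(Array.begin() + static_cast<std::ptrdiff_t>(Index));
    if (AllowShrinking == EAllowShrinking::Yes) {
        Array.shrink_to_fit();  // a non-binding request in standard C++
    }
}
```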
steve robb
7da84c1d1b Replaced UE_NODISCARD with [[nodiscard]].
[CL 30593744 by steve robb in ue5-main branch]
2024-01-12 10:47:04 -05:00
johan berg
56c3d1a953 Allow applications to limit the total number of worker threads in TaskGraph
#rb danny.couture

[CL 30565011 by johan berg in ue5-main branch]
2024-01-11 06:52:25 -05:00
danny couture
3b347cfcfb [TaskGraph]
- Fix new frontend to use ConcurrentLinearAllocator for normal tasks like the old frontend.
  - Store subsequents and prerequisites in a more cache-friendly way to significantly improve performance.

#jira UE-117550
#rb Francis.Hurteau

[CL 30478814 by danny couture in ue5-main branch]
2024-01-08 07:32:21 -05:00
danny couture
2222619119 [Scheduler]
- Fix DoNotRunInsideBusyWait tasks that ended up being picked up during busy waiting anyway.
  - Change internal API of ExecuteTask to make it evident it can actually return a new task.
  - Make sure that busy wait will dequeue tasks from local/global queue in priority order.
  - Remove unused local queue functions.

#jira UE-199870
#rb JeanFrancois.Dube

[CL 29908002 by danny couture in ue5-main branch]
2023-11-23 07:04:09 -05:00
danny couture
f0d2935d27 [Scheduler]
- Remove costly insight trace that should not have been submitted

[CL 29880057 by danny couture in ue5-main branch]
2023-11-21 19:44:40 -05:00
danny couture
deb6fca216 [Scheduler]
- Fix deadlock and indeterminism caused by not always signaling a thread after launching a task
- Fix deadlock that happened when trying to launch blocking tasks on all workers
- Use a PreWait/CancelWait/CommitWait semantic to close the gap where a drowsing thread could pick a task without the signaling thread knowing about it.
- Deprecate TryLaunchAffinity workaround now that the deadlocks above have been fixed
- Get rid of task-stealing throttling to reduce latency, as stealing is only involved when searching for work before going to sleep
- Fewer spurious wake-ups, as a worker will only wake up another one when picking up a task if CancelWait has consumed a signal
- Workers will now honor task priorities between local and global queues
- Add blocked workers benchmark test

#jira UE-199959
#rb JeanFrancois.Dube
#tests TSAN / UnitTests / StressTests / ReplayRuns

[CL 29744016 by danny couture in ue5-main branch]
2023-11-15 08:20:09 -05:00
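The PreWait/CancelWait/CommitWait sequence follows the "eventcount" pattern. A worker first announces that it is about to sleep and snapshots an epoch. It then re-checks the queues. If it found work, it cancels. Otherwise it commits to sleeping, but only if no notification has arrived since the snapshot, which closes the lost-wake-up window. The sketch below uses `std::mutex` and `std::condition_variable`. The method names mirror the commit message, but the implementation is an assumption and not the scheduler's code. It also omits the signal accounting that lets CancelWait pass a consumed signal on to another worker.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

class EventCount {
public:
    using Key = std::uint64_t;

    // Step 1: register as a waiter and snapshot the current epoch.
    Key PreWait() {
        std::lock_guard<std::mutex> Lock(Mutex);
        ++Waiters;
        return Epoch;
    }

    // Step 2a: the caller re-checked its queues and found work, so it will not sleep.
    void CancelWait() {
        std::lock_guard<std::mutex> Lock(Mutex);
        assert(Waiters > 0);
        --Waiters;
    }

    // Step 2b: sleep until Notify() has moved the epoch past the snapshot.
    // A notification sent after PreWait() makes this return immediately.
    void CommitWait(Key Snapshot) {
        std::unique_lock<std::mutex> Lock(Mutex);
        Cv.wait(Lock, [&] { return Epoch != Snapshot; });
        assert(Waiters > 0);
        --Waiters;
    }

    // Called by a producer after it has published a task.
    void Notify() {
        std::lock_guard<std::mutex> Lock(Mutex);
        ++Epoch;
        if (Waiters > 0) Cv.notify_one();
    }

private:
    std::mutex Mutex;
    std::condition_variable Cv;
    Key Epoch = 0;
    std::uint32_t Waiters = 0;
};
```

A worker loop would call `PreWait()`, look for work once more, and then call either `CancelWait()` or `CommitWait(Key)`. Because the epoch is read before the final check, a task published during that check is never missed.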
jared cotton
936c3772e6 SOL-4839 - Use free bits in the VCell header to create mutexes in subclasses that require them, instead of having them carry their own mutex members.
- Add UE::FTwoBitMutex, which takes a pointer to external state and uses only its two LSBs.
- Adds tests for FMutex and FExternalMutex

#rb devin.doucette
#rb yiliang.siew

[CL 29024883 by jared cotton in ue5-main branch]
2023-10-23 19:30:12 -04:00
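The idea behind FTwoBitMutex is that an object such as a VCell can reuse spare low bits of a word it already has instead of carrying a separate mutex member. The sketch below uses `std::atomic`. Bit 0 is the lock bit, and every compare-exchange carries the owner's other bits over unchanged. This sketch leaves bit 1 unused and makes contending threads simply yield. The commit does not describe how the engine uses the second bit, so `TwoBitMutexSketch` should not be read as its implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical lock that lives in the low bits of a word owned by someone else.
// The owner must also update its own bits atomically, preserving the low two bits.
class TwoBitMutexSketch {
public:
    static constexpr std::uint32_t LockBit = 1u << 0;
    static constexpr std::uint32_t ReservedBit = 1u << 1;  // unused in this sketch

    explicit TwoBitMutexSketch(std::atomic<std::uint32_t>& InState) : State(InState) {}

    bool TryLock() {
        std::uint32_t Expected = State.load(std::memory_order_relaxed);
        while (!(Expected & LockBit)) {
            // Set only the lock bit; on failure Expected is refreshed and re-checked.
            if (State.compare_exchange_weak(Expected, Expected | LockBit,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;
    }

    void Lock() {
        while (!TryLock()) std::this_thread::yield();
    }

    void Unlock() {
        // Clear the lock bit atomically without touching any other bit.
        const std::uint32_t Previous = State.fetch_and(~LockBit, std::memory_order_release);
        assert((Previous & LockBit) && "Unlock() on an unlocked mutex");
        (void)Previous;
    }

private:
    std::atomic<std::uint32_t>& State;
};
```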
louisphilippe seguin
6a0e046774 Fix missing include
#jira UE-197898

[CL 28990108 by louisphilippe seguin in ue5-main branch]
2023-10-20 19:42:24 -04:00
dpull
628190e29d Allow OnlineSubsystemNull and ReserveScheduler to start threads in a forked multithread environment
#jira UE-197898
#10979 tag
#rnx

[CL 28986073 by dpull in ue5-main branch]
2023-10-20 18:20:06 -04:00
andriy tylychko
0f84b2f46f removed a member that is not used anywhere, a remnant of old versions
#rb cosmetic
[FYI] danny.couture

[CL 28449088 by andriy tylychko in ue5-main branch]
2023-10-04 06:45:08 -04:00
andriy tylychko
58db28f5d2 "any task" support
#rb luke.thatcher, danny.couture

[CL 28232583 by andriy tylychko in ue5-main branch]
2023-09-26 12:17:47 -04:00
ionut matasaru
f476922a6e Fixed usage of CpuProfiler trace in TaskTrace.
FCpuProfilerTrace::OutputEventType() and FCpuProfilerTrace::OutputBeginDynamicEvent() need to be called only if the CpuChannel is enabled; otherwise the "spec" trace event is not emitted, which results in "<unknown>" CPU timers. The issue can be easily reproduced with -trace=tasks (i.e. without enabling the CPU channel).

#jira UE-196134
#rb Andriy.Tylychko

[CL 28185939 by ionut matasaru in ue5-main branch]
2023-09-25 09:26:45 -04:00
andriy tylychko
bf42c10569 disable reserve scheduler by default as it seems to be causing deadlocks on Windows machines with low core counts
#rb francis.hurteau

[CL 27598266 by andriy tylychko in ue5-main branch]
2023-09-05 11:41:26 -04:00
andriy tylychko
a0b79d2c56 removed tasks LLM instrumentation because it is not useful and steals data from other tags. Tasks are a low-level concept that shouldn't be represented in a high-level memory usage overview
#rb trivial

[CL 27380009 by andriy tylychko in ue5-main branch]
2023-08-25 12:25:28 -04:00
andriy tylychko
8d1ba56905 TaskGraph: fixed a race reported by TSAN - StartWorkers() adds to the ReserveEvents array while a newly created worker accesses it in CreateWorker() via EventStack
#rb danny.couture

[CL 26635216 by andriy tylychko in ue5-main branch]
2023-07-27 05:35:47 -04:00
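The commit does not say how this race was fixed. A common way to remove this kind of race (an array grown on one thread while a freshly started worker reads it) is to build the whole array before any worker starts, so nothing mutates it afterwards. The sketch below shows that approach with standard threads. `WorkerPool` and `WorkerEvent` are hypothetical stand-ins for the ReserveEvents / EventStack machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Stands in for a per-worker platform event.
struct WorkerEvent {
    bool bStarted = false;
};

class WorkerPool {
public:
    explicit WorkerPool(std::size_t NumWorkers) {
        assert(NumWorkers > 0);
        // Phase 1: build every per-worker event while no worker thread exists yet.
        Events.reserve(NumWorkers);
        for (std::size_t i = 0; i < NumWorkers; ++i) {
            Events.push_back(std::make_unique<WorkerEvent>());
        }
        // Phase 2: start the workers. Creating a std::thread happens-before its body
        // runs, so each worker sees the finished vector, which is never resized again.
        for (std::size_t i = 0; i < NumWorkers; ++i) {
            Threads.emplace_back([this, i] { Events[i]->bStarted = true; });
        }
    }

    ~WorkerPool() { Join(); }

    void Join() {
        for (std::thread& T : Threads) {
            if (T.joinable()) T.join();
        }
    }

    // Only meaningful after Join(), which makes each worker's write visible here.
    bool AllStarted() const {
        for (const auto& E : Events) {
            if (!E->bStarted) return false;
        }
        return true;
    }

private:
    std::vector<std::unique_ptr<WorkerEvent>> Events;
    std::vector<std::thread> Threads;
};
```

Each worker writes only its own `WorkerEvent`, and the vector itself is only read once the threads exist, so there is no concurrent write for TSAN to report.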
andriy tylychko
ab120cd9e3 adjusted LLM tracing for tasks
#rb francis.hurteau

[CL 26230373 by andriy tylychko in ue5-main branch]
2023-06-26 06:01:47 -04:00
Devin Doucette
ed92f77023 ParkingLot: Gracefully support waiting even after FThreadLocalData is destroyed
#jira none
#preflight 64652cd3f033744ae6a9cca1
#rb Zousar.Shaker
#rnx

[CL 25514848 by Devin Doucette in ue5-main branch]
2023-05-17 15:55:21 -04:00
Andriy Tylychko
28e264700e LLM tags for tasks
#preflight 645cd2872c180971eefa0322
#rb none

[CL 25424808 by Andriy Tylychko in ue5-main branch]
2023-05-11 07:52:16 -04:00
Andriy Tylychko
623986fd4d TaskGraph: fixed tracing of standalone graph events that don't have nested tasks at the moment of dispatching subsequents. Because of this issue, many graph events weren't shown in TasksInsights
This is a rather "hacky" fix. The old task implementation (TaskGraph API) is on life support, so there's no point spending too much time on this
#preflight 644be99f1c2846595c45a4f2
#rb danny.couture

[CL 25235003 by Andriy Tylychko in ue5-main branch]
2023-04-28 13:09:16 -04:00