Commit Graph

419 Commits

Author SHA1 Message Date
denys mentiei
0dc15c6085 [EventCount] Race fix
There was a race possible between FPipe::WaitUntilEmpty and FPipe::ClearTask() which was caused by a race in EventCount between Notify & Wait.

That led to a dead-lock on IOS.

Fixed by adding a StoreLoad-style barrier to Notify to ensure proper memory ordering for the Counter value.
An RMW operation is used as the barrier. The problem could also be solved with a few seq_cst fences, but those are considered less optimal and are not supported by TSan.

More details:

Thread 1:
- ClearTask: decrements TaskCount to 0 with fetch_sub and calls Notify
- Notify: reads count as 0 and exits without waking anyone (first bit is not set -> nobody is waiting)
Thread 2:
- WaitUntilEmpty: first check fails, second check fails, calls PrepareWait (sets count to 1, returns 0), calls WaitFor with 0
- WaitFor: reads count as 1; count without the first bit is 0, which matches the token -> waits for a Notify that will never happen.

One thread was reading stale data due to memory reordering.
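The interleaving above is the classic store-buffering pattern. Below is a minimal C++ sketch of the fix, assuming a simplified event count where bit 0 of the counter is the waiter flag (the "first bit" above) and the remaining bits form an epoch. EventCountSketch and WaitUntilZeroSketch are hypothetical names, not the engine's FEventCount or FPipe, and a std::mutex/std::condition_variable pair stands in for the real sleep mechanism. Notify reads the counter with fetch_add(0) instead of a plain load, which is the RMW-as-barrier approach the message describes.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical, simplified event count; not the engine's FEventCount.
// Bit 0 of Counter means "a waiter may be sleeping"; the upper bits are an
// epoch that Notify advances whenever it wakes waiters.
class EventCountSketch {
public:
    using Token = std::uint64_t;

    // Announce the intent to wait and capture the current epoch.
    // The acq_rel RMW keeps the caller's re-check from moving above it.
    Token PrepareWait() {
        return Counter.fetch_or(1, std::memory_order_acq_rel) & ~Token(1);
    }

    // Sleep until Notify advances the epoch past the token.
    void Wait(Token token) {
        std::unique_lock<std::mutex> lock(Mutex);
        Cv.wait(lock, [&] {
            return (Counter.load(std::memory_order_acquire) & ~Token(1)) != token;
        });
    }

    void Notify() {
        // The fix: read Counter with an RMW instead of a plain load. An RMW
        // reads the value immediately before its own write in Counter's
        // modification order, and with acq_rel it synchronizes with
        // PrepareWait's fetch_or. So either this call sees the waiter bit,
        // or the waiter sees the caller's earlier store (TaskCount == 0).
        Token value = Counter.fetch_add(0, std::memory_order_acq_rel);
        if ((value & 1) == 0) {
            return; // nobody announced a wait
        }
        // Clear the waiter bit and bump the epoch in one step (+1 carries
        // bit 0 into the epoch). Stop if another Notify already did it.
        while ((value & 1) != 0 &&
               !Counter.compare_exchange_weak(value, value + 1,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire)) {
        }
        std::lock_guard<std::mutex> lock(Mutex); // pairs with Wait's predicate check
        Cv.notify_all();
    }

private:
    std::atomic<Token> Counter{0};
    std::mutex Mutex;
    std::condition_variable Cv;
};

// Hypothetical waiter mirroring the two-check WaitUntilEmpty pattern above.
inline void WaitUntilZeroSketch(const std::atomic<int>& count, EventCountSketch& ec) {
    while (count.load(std::memory_order_acquire) != 0) {   // first check
        const EventCountSketch::Token token = ec.PrepareWait();
        if (count.load(std::memory_order_acquire) == 0) {  // second check
            return;
        }
        ec.Wait(token);
    }
}
```

With a plain acquire load in Notify, both threads may read stale values (Notify sees no waiter bit, the waiter still sees a non-zero TaskCount), which is exactly the dead-lock interleaving listed above.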

#rb anderson.ramos, danny.couture, Devin.Doucette
#rnx

[CL 36755848 by denys mentiei in 5.5 branch]
2024-10-01 19:11:19 -04:00
marc audy
35f9ed49f3 Eliminate MSVC vtable padding and other packing cleanup in the associated classes
#rnx

[CL 35238395 by marc audy in ue5-main branch]
2024-08-01 01:01:38 -04:00
kevin macaulayvacher
1f8425a0c1 Reduce the maximum number of in-flight tasks allowed in a worker thread queue when AGGRESSIVE_MEMORY_SAVING is enabled. Saves ~320 KB per worker thread.
#rb danny.couture
#rnx
[FYI] Francis.Hurteau

[CL 35217734 by kevin macaulayvacher in ue5-main branch]
2024-07-31 12:53:53 -04:00
danny couture
4df97af9ae [Scheduler]
- Add a delegate fired when the scheduler reaches its maximum oversubscription capacity
- Add a function to determine when the scheduler has exceeded its oversubscription capacity
- Add a cumulative CSV stat incremented each time the scheduler reaches its maximum oversubscription capacity

[CL 35125841 by danny couture in ue5-main branch]
2024-07-26 20:52:39 -04:00
danny couture
aadbf4e833 [Scheduler/TaskGraph]
- Deprecate BusyWait APIs

#rb kevin.macaulayvacher

[CL 34342021 by danny couture in ue5-main branch]
2024-06-13 11:34:29 -04:00
danny couture
6a8068c714 [LocalWorkQueue]
- Replace busy-waiting with an approach that doesn't waste CPU cycles

#rnx
#rb kevin.macaulayvacher

[CL 34337670 by danny couture in ue5-main branch]
2024-06-13 09:26:25 -04:00
danny couture
3998c8615c [TaskGraph]
- Activate the new frontend by default, allowing the old TaskGraph to benefit from retraction

#jira UE-117550

[CL 33543508 by danny couture in ue5-main branch]
2024-05-09 08:47:18 -04:00
danny couture
75f532dd16 [TaskGraph]
- Remove old task graph backend
- Remove old deprecations

#rb kevin.macaulayvacher

[CL 33099915 by danny couture in ue5-main branch]
2024-04-19 08:16:10 -04:00
danny couture
0ef7c73833 [EventCount]
- Prevent PrepareWait from being reordered with memory operations around it
  - This fixes a potential deadlock on some platforms

#rb kevin.macaulayvacher

[CL 32896177 by danny couture in ue5-main branch]
2024-04-11 13:24:12 -04:00
danny couture
93e9f7dbe3 [TaskGraph]
- Change include name so it's more future-proof.

[CL 32886665 by danny couture in ue5-main branch]
2024-04-11 09:11:52 -04:00
danny couture
a46455e27b [TaskGraph]
- Pass the thread index, including the queue index, to fix some checks performed by the RHI

[CL 32885257 by danny couture in ue5-main branch]
2024-04-11 08:22:40 -04:00
jane haslam
adafa45441 fix compilation error
#rb aleksandr.cicenkov, danny.couture

[CL 32875530 by jane haslam in ue5-main branch]
2024-04-11 04:13:50 -04:00
danny couture
b33a592a64 [Scheduler]
- Fix a potential deadlock by always enqueueing to the global queue and performing a wakeup for standby workers in case they go to sleep unconditionally.

#rb kevin.macaulayvacher

[CL 32397515 by danny couture in ue5-main branch]
2024-03-21 09:15:22 -04:00
kevin macaulayvacher
4af66860c2 Moves the SyncCompletion Trace marker into the waiting block. Otherwise, code that polls IsDone creates sync trace markers for already-completed tasks, adding unnecessary overhead to something that can be called often and should be very quick.
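A minimal sketch of the idea, assuming a task whose completion is a single atomic flag: the fast path returns before any marker is emitted, so repeatedly polling a finished task costs only a load. TracedTaskSketch is a hypothetical name, and the SyncMarkers counter merely stands in for the real trace marker.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical task; SyncMarkers stands in for the trace marker.
struct TracedTaskSketch {
    std::atomic<bool> Done{false};
    int SyncMarkers = 0; // touched only by the thread calling SyncCompletion

    bool IsDone() const { return Done.load(std::memory_order_acquire); }

    void SyncCompletion() {
        if (IsDone()) {
            return; // fast path: polling a finished task emits nothing
        }
        ++SyncMarkers; // the marker lives inside the waiting block only
        while (!IsDone()) {
            std::this_thread::yield();
        }
    }
};
```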
#rb danny.couture
[FYI] Francis.Hurteau

[CL 32327594 by kevin macaulayvacher in ue5-main branch]
2024-03-19 11:53:07 -04:00
danny couture
b016769edc [TaskGraph]
- Add missing CORE_API export for WaitForAnyTaskCompleted and AnyTaskCompleted

#rb kevin.macaulayvacher

[CL 32320774 by danny couture in ue5-main branch]
2024-03-19 05:47:35 -04:00
danny couture
873580af56 [Scheduler/Taskgraph]
- Replace busy-wait logic with oversubscription using a pool of standby threads.
  - This fixes a whole class of deadlocks because of dependency inversion between tasks on the same callstack.
  - Using a simple oversubscription scope, we can now improve concurrency on any long IO or wait operations.
  - Oversubscription has already been applied to many wait APIs, so it works without developers needing to know it's there.
  - Standby threads go back to sleep as soon as the oversubscription period finishes and they finish their current task.
  - Standby threads never busy yield, they go to sleep when no more work is available, giving time slice back for real work.
  - Standby threads are only active when all normal threads are busy and some oversubscription scopes are active.
  - Add dynamic thread creation support so we only create what's needed (especially good on high core count machines).
  - For platforms where static thread creation is preferred, TaskGraph.UseDynamicThreadCreation = 0 can be used.
  - The maximum number of standby threads is controlled by TaskGraph.OversubscriptionRatio (current default: 2x).
  - Deprecate reserve workers as they are now superseded by this feature.
  - The BusyWait API has been reimplemented using oversubscription and will be deprecated in a separate CL to keep this one focused.
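The activation rule in the bullets above can be sketched as a small policy object plus an RAII scope. This is a minimal sketch under stated assumptions: OversubscriptionPolicySketch and its Scope are hypothetical names, not engine API, and the standby cap is assumed to be ratio × worker count (the message does not say whether the ratio also counts the normal workers).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical policy object; names are illustrative, not engine API.
// Assumes the standby cap is OversubscriptionRatio x worker count.
class OversubscriptionPolicySketch {
public:
    OversubscriptionPolicySketch(std::uint32_t numWorkers, float ratio)
        : NumWorkers(numWorkers),
          MaxStandby(static_cast<std::uint32_t>(numWorkers * ratio)) {}

    // RAII scope wrapped around a blocking wait or a long IO operation.
    class Scope {
    public:
        explicit Scope(OversubscriptionPolicySketch& policy) : Policy(policy) {
            Policy.ActiveScopes.fetch_add(1, std::memory_order_relaxed);
        }
        ~Scope() { Policy.ActiveScopes.fetch_sub(1, std::memory_order_relaxed); }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        OversubscriptionPolicySketch& Policy;
    };

    // A standby thread may run only when every normal worker is busy, at
    // least one scope is active, and the standby cap has not been reached.
    bool CanActivateStandby(std::uint32_t busyWorkers, std::uint32_t activeStandby) const {
        return busyWorkers >= NumWorkers &&
               ActiveScopes.load(std::memory_order_relaxed) > 0 &&
               activeStandby < MaxStandby;
    }

private:
    const std::uint32_t NumWorkers;
    const std::uint32_t MaxStandby;
    std::atomic<std::uint32_t> ActiveScopes{0};
};
```

Because the scope only bumps a counter, wrapping wait APIs in it is cheap, which is consistent with the message's point that oversubscription works without callers knowing it's there.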

#jira UE-209887
#rb kevin.macaulayvacher

[CL 32298767 by danny couture in ue5-main branch]
2024-03-18 09:21:17 -04:00
danny couture
9d8b8be95e [Scheduler]
- Fix race where multiple threads trying to steal from the same local queue could miss a task

#rb kevin.macaulayvacher

[CL 32168260 by danny couture in ue5-main branch]
2024-03-11 18:29:09 -04:00
matt peters
1d5fe6d2ab LLM - Changed bIsDisabled into a multi-state variable that includes the state NotYetKnown. During the NotYetKnown state, the multithreaded synchronization strategy is different, because some values that are normally immutable have not yet been initialized.
#jira UE-208554
#rb Jeff.Fisher
#rnx

[CL 32089115 by matt peters in ue5-main branch]
2024-03-07 11:42:48 -05:00
matt peters
65511c6a9b Fix crash when running game with legacy pakfiles (-skipiostore). FPakReadRequestBase and all subclasses of IAsyncReadRequest need a default implementation for the new EnsureCompletion function rather than a fatal error if called.
#rnx
#rb danny.couture

[CL 31567969 by matt peters in ue5-main branch]
2024-02-16 11:40:30 -05:00
tiago costa
6810836c3a Parallelize cached static raytracing primitives processing in FRayTracingSceneAddInstancesTask.
Context:
- FRayTracingSceneAddInstancesTask processed cached and non-cached raytracing primitives in a single-threaded loop
- Non-cached primitives cannot be efficiently parallelized due to auto-instancing logic.
- Cached primitive processing, on the other hand, has no dependencies between primitives, so it can be split efficiently into parallel tasks.

Change:
- Modified GatherRayTracingRelevantPrimitives_ComputeLOD to output separate arrays for cached and non-cached static raytracing primitives.
- Also moved logic that filtered out cached static primitives depending on ShowFlags/cvars/etc out of RayTracingSceneStaticInstanceTask and into GatherRayTracingRelevantPrimitives_ComputeLOD.
- Preallocate range of instances in RayTracingScene and VisibleMeshCommands for cached static instances.
- Fill RayTracingScene instances and VisibleMeshCommands for cached static instances in parallel.
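The preallocate-then-fill-in-parallel step can be sketched generically: reserve the whole range for cached instances once, on one thread, then let each thread write a disjoint slice so no locking is needed. InstanceSketch and AddCachedInstancesSketch are hypothetical, plain std::thread stands in for the engine's task system, and nothing here models the actual RayTracingScene or VisibleMeshCommands types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct InstanceSketch {
    int PrimitiveId = -1;
};

// Hypothetical: append cached instances by preallocating a contiguous range,
// then filling disjoint slices of it from several threads without locking.
inline void AddCachedInstancesSketch(std::vector<InstanceSketch>& instances,
                                     const std::vector<int>& cachedPrimitiveIds,
                                     unsigned numThreads) {
    const std::size_t base = instances.size();
    const std::size_t count = cachedPrimitiveIds.size();
    instances.resize(base + count); // preallocate once, before any thread starts

    numThreads = std::max(1u, numThreads);
    const std::size_t chunk = (count + numThreads - 1) / numThreads;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(count, begin + chunk);
        if (begin >= end) {
            break; // no work left for the remaining threads
        }
        threads.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                instances[base + i].PrimitiveId = cachedPrimitiveIds[i];
            }
        });
    }
    for (std::thread& thread : threads) {
        thread.join();
    }
}
```

Resizing before spawning threads matters: no thread may grow the array while others hold indices into it.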

#rb aleksander.netzel

[CL 31422467 by tiago costa in ue5-main branch]
2024-02-13 08:42:23 -05:00
kevin macaulayvacher
e897ddf724 AsyncCompilationHelper - Remove 16ms main thread sleep that happens per incomplete task and instead favours polling all tasks and sleeping once if not all tasks are complete.
ICompilable::WaitCompletionWithTimeout() is now documented to poll when given a TimeLimitSeconds of 0. To adjust for this, implementers of ICompilable now poll before waiting if TimeLimitSeconds is 0. A few implementers sleep instead of waiting on an event, which is not ideal, but they now at least poll again after sleeping to avoid another round trip to find out whether the task is complete.

FAsyncTaskBase::WaitCompletionWithTimeout now polls for completion when given a time limit of 0 seconds. This simplifies use, and avoids unintended yielding.

Before PIE.Startup
FAsyncTask::SyncCompletion	Total Inclusive Time 5.25s  (40606 calls)

After PIE.Startup
FAsyncTask::SyncCompletion	Total Inclusive Time 195ms (39504 calls)
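A minimal sketch of the two behaviors described above, assuming a task whose completion is an atomic flag: a time limit of 0 means "poll once", and the caller polls every task before sleeping once per pass rather than sleeping per incomplete task. PollableTaskSketch and WaitForAllSketch are hypothetical stand-ins, not FAsyncTaskBase, ICompilable, or AsyncCompilationHelper.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical task; not FAsyncTaskBase or ICompilable.
struct PollableTaskSketch {
    std::atomic<bool> Done{false};

    bool IsDone() const { return Done.load(std::memory_order_acquire); }

    // A time limit of 0 means "poll once": no yield, no sleep.
    bool WaitCompletionWithTimeout(double timeLimitSeconds) {
        if (timeLimitSeconds <= 0.0) {
            return IsDone();
        }
        const auto deadline = std::chrono::steady_clock::now() +
            std::chrono::duration_cast<std::chrono::steady_clock::duration>(
                std::chrono::duration<double>(timeLimitSeconds));
        while (!IsDone()) {
            if (std::chrono::steady_clock::now() >= deadline) {
                return false;
            }
            std::this_thread::yield();
        }
        return true;
    }
};

// Poll every task, then sleep once per pass instead of once per
// incomplete task.
inline void WaitForAllSketch(const std::vector<PollableTaskSketch*>& tasks,
                             std::chrono::milliseconds sleepPerPass) {
    for (;;) {
        bool allDone = true;
        for (PollableTaskSketch* task : tasks) {
            if (!task->WaitCompletionWithTimeout(0.0)) {
                allDone = false;
            }
        }
        if (allDone) {
            return;
        }
        std::this_thread::sleep_for(sleepPerPass);
    }
}
```

With N incomplete tasks, a per-task sleep costs up to N sleep intervals per pass, while this loop costs one, which is the kind of saving the before/after numbers above reflect.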

#jira UE-204061
#rb Francis.Hurteau, danny.couture

[CL 30922445 by kevin macaulayvacher in ue5-main branch]
2024-01-26 09:34:10 -05:00
steve robb
3d35619b76 Deprecated IfAThenAOrB, IfPThenAOrB and XOR.
#rb devin.doucette

[CL 30838783 by steve robb in ue5-main branch]
2024-01-24 06:25:47 -05:00
peter engstrom
48efd40908 Prevent internal compile error.
#rnx

[FYI] andriy.tylychko

[CL 30711936 by peter engstrom in ue5-main branch]
2024-01-19 03:47:02 -05:00
danny couture
68fbfa68cc [ParallelFor]
- Restore the ParallelWithPreWork behavior of always calling the prework, even when the number of tasks to execute is 0.
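A minimal sketch of the restored contract, assuming the prework runs on the calling thread while other threads start on the loop body. ParallelForWithPreWorkSketch is a hypothetical name, and a single helper std::thread stands in for the engine's task system.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical helper; not the engine's ParallelFor API.
inline void ParallelForWithPreWorkSketch(int num,
                                         const std::function<void(int)>& body,
                                         const std::function<void()>& preWork) {
    if (num <= 0) {
        preWork(); // restored behavior: prework runs even with zero tasks
        return;
    }
    std::atomic<int> next{0};
    auto drain = [&] {
        for (int i = next.fetch_add(1); i < num; i = next.fetch_add(1)) {
            body(i);
        }
    };
    std::thread helper(drain); // a helper starts on the indices right away
    preWork();                 // the caller does its prework in the meantime
    drain();                   // then joins in on whatever is left
    helper.join();
}
```

The early return for an empty range is where this kind of regression hides: returning before calling preWork would silently drop work the caller expects to happen.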

#rb kevin.macaulayvacher

[CL 30657555 by danny couture in ue5-main branch]
2024-01-17 09:05:37 -05:00
steve robb
7da84c1d1b Replaced UE_NODISCARD with [[nodiscard]].

[CL 30593744 by steve robb in ue5-main branch]
2024-01-12 10:47:04 -05:00