Clang's __builtin_frame_address(0) does not behave like MSVC's _AddressOfReturnAddress() on all Windows platforms, which causes issues in stack trace tracking.
More investigation is needed to see if we can use the fast unwind path on Clang as well.
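As a rough illustration of why the two intrinsics are not interchangeable: _AddressOfReturnAddress() yields the address of the stack slot that holds the return address, while __builtin_frame_address(0) yields the current function's frame address, which need not be that slot. The helper name CaptureStackAnchor and the SKETCH_NOINLINE macro below are hypothetical and are not the engine's code; this is only a sketch of selecting one intrinsic per compiler.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Hypothetical helper: returns a stack address belonging to this call's frame.
// The two branches do NOT return the same location, which is the mismatch
// the changelist description refers to.
SKETCH_NOINLINE std::uintptr_t CaptureStackAnchor()
{
#if defined(_MSC_VER) && !defined(__clang__)
    // MSVC: address of the slot holding this function's return address.
    return reinterpret_cast<std::uintptr_t>(_AddressOfReturnAddress());
#else
    // Clang/GCC: frame address of the current function.
    return reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
#endif
}
```

Code that derives unwind information from the MSVC value cannot assume the Clang value points at the same slot.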
#jira UE-222509
#rb ben.woodhouse, Wojciech.Krywult
[CL 35796224 by daniele pieroni in ue5-main branch]
Note:
- not enabled in Shipping; currently only used to compute the input latency debug stat, as in other RHIs
- computed stats currently differ from NVIDIA Reflex PC latency timing reports
- needs extra work to support G-SYNC/FreeSync monitors
#rb ionut.matasaru, Luke.Thatcher
#jira none
[CL 35053471 by daniele pieroni in ue5-main branch]
The Warning:
warning C6393: A lookup table of size 365 is not sufficient to handle leap years.
This warning fires for any static array that is 365 elements long. Its intent is to catch the common misuse of static arrays as day-of-year lookup tables.
In practice, someone hitting this could disable the warning locally to approve the array. In our case, though, the code generation step can easily generate arrays that are 365 elements long, and there is no (good) way to fix that.
So turning it off globally makes the most sense for us.
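For reference, a local suppression of the kind described above might look like the following sketch. The table name GGeneratedTable is hypothetical and stands in for generated code; the pragmas are guarded so the snippet still compiles on non-MSVC toolchains.

```cpp
#include <cstddef>

#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 6393) // lookup table of size 365 is not sufficient to handle leap years
#endif

// Hypothetical generated table: it has 365 entries but nothing to do with days of the year.
static const int GGeneratedTable[365] = {};

#if defined(_MSC_VER)
#pragma warning(pop)
#endif

// Element count, computed from the array type rather than hard-coded.
constexpr std::size_t GeneratedTableSize = sizeof(GGeneratedTable) / sizeof(GGeneratedTable[0]);
```

Wrapping every generated array this way is impractical when the generator cannot know which arrays will happen to be 365 elements long.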
#rb Marc.Audy, Rob.Cannaday
[CL 33697690 by thomas mauer in ue5-main branch]
Now -processaffinity=<number of logical cores> can be used to restrict the process to use a specific number of logical cores, and -processaffinityphysical=<number of physical cores> can be used to restrict the process to use a specific number of physical (non-hyperthreaded) cores.
When using this option, explicitly setting thread affinities is disabled for simplicity.
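One way to picture the two options is as a process affinity bitmask with one bit per logical core. The sketch below is not the engine's implementation: the function names are hypothetical, and the physical-core variant assumes two adjacent logical cores per physical core, which is not true on every CPU.

```cpp
#include <cstdint>

// Hypothetical helper: mask selecting the first NumCores logical cores.
constexpr std::uint64_t MaskForLogicalCores(unsigned NumCores)
{
    return NumCores >= 64 ? ~std::uint64_t{0}
                          : ((std::uint64_t{1} << NumCores) - 1);
}

// Hypothetical helper: mask selecting one logical core per physical core.
// ASSUMPTION: sibling hyperthreads occupy adjacent bit positions.
constexpr std::uint64_t MaskForPhysicalCores(unsigned NumPhysical, unsigned ThreadsPerCore = 2)
{
    std::uint64_t Mask = 0;
    for (unsigned Core = 0; Core < NumPhysical; ++Core)
    {
        const unsigned Bit = Core * ThreadsPerCore;
        if (Bit >= 64)
        {
            break; // a single 64-bit mask cannot address more cores
        }
        Mask |= (std::uint64_t{1} << Bit);
    }
    return Mask;
}
```

Under these assumptions, four logical cores give the mask 0xF, while four physical cores give 0x55.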
#rb Arciel.Rekman, danny.couture
[CL 33691407 by daniele vettorel in ue5-main branch]
- Keep the generic FPlatformMemory implementation of those callbacks in a common place to avoid duplication
- Note that the APIs are defined in each eos_<Platform>.h header separately (they're passed in the platform-specific Initialization struct), but our implementation is the same on all but one platform, so we keep it in a common place.
- Update ApiVersion in various places in line with EOSSDK 1.16.2 update
#jira UE-209192 UE-209549
[REVIEW] [at]Alejandro.Aguilar
#rb alejandro.aguilar
#tests local builds, preflight, tested on console
[CL 32403264 by chris varnsverry in ue5-main branch]
Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists.
See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows:
This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this:
- The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit().
- A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job.
- Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition).
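The batching decision described above can be pictured with a small sketch. All types and names here (FSketchCommandList, FSketchBatch, BuildBatches, MaxPerBatch) are hypothetical and much simpler than FRHICommandListExecutor; the sketch only shows the idea of grouping consecutive parallel-capable command lists while keeping submission order, and of isolating lists that must run on the RHI thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RHI command list.
struct FSketchCommandList
{
    int  Id;
    bool bAllowParallelTranslate; // false: must be executed on the RHI thread
};

// Hypothetical batch: either one parallel translate job or RHI-thread work.
struct FSketchBatch
{
    bool bParallel;
    std::vector<int> CommandListIds;
};

// Groups consecutive command lists of the same kind, preserving submission order.
// Parallel batches are capped at MaxPerBatch command lists.
std::vector<FSketchBatch> BuildBatches(const std::vector<FSketchCommandList>& Lists, std::size_t MaxPerBatch)
{
    std::vector<FSketchBatch> Batches;
    for (const FSketchCommandList& List : Lists)
    {
        const bool bNeedNewBatch =
            Batches.empty() ||
            Batches.back().bParallel != List.bAllowParallelTranslate ||
            (List.bAllowParallelTranslate && Batches.back().CommandListIds.size() >= MaxPerBatch);

        if (bNeedNewBatch)
        {
            Batches.push_back(FSketchBatch{List.bAllowParallelTranslate, {}});
        }
        Batches.back().CommandListIds.push_back(List.Id);
    }
    return Batches;
}
```

In this model, a single command list that disables parallel translate splits the surrounding parallel work into two jobs, which is why thread-unsafe operations reduce the available parallelism.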
One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations:
- RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism.
- Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) were not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context, given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes.
A summary of the new RHI breadcrumb system is as follows:
- A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging.
- The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes.
- RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress.
- Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs).
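A minimal sketch of the node layout described above, with parent and next pointers, is shown below. The types FSketchBreadcrumbNode and FSketchBreadcrumbTree and the function BreadcrumbStack are hypothetical; the real system also involves ref-counted allocators, GPU writes, and deferred string formatting, none of which are modeled here.

```cpp
#include <algorithm>
#include <deque>
#include <string>
#include <vector>

// Hypothetical breadcrumb node: a name, a parent link, and a link to the next
// node in depth-first (creation) order.
struct FSketchBreadcrumbNode
{
    std::string Name;
    FSketchBreadcrumbNode* Parent = nullptr;
    FSketchBreadcrumbNode* Next = nullptr;
    unsigned Id = 0; // unique ID; a platform RHI would write this from the GPU
};

class FSketchBreadcrumbTree
{
public:
    // Opens a new breadcrumb under the currently open one.
    FSketchBreadcrumbNode* Begin(const std::string& Name)
    {
        Nodes.emplace_back(); // std::deque keeps earlier element addresses stable
        FSketchBreadcrumbNode* Node = &Nodes.back();
        Node->Name = Name;
        Node->Parent = Current;
        Node->Id = static_cast<unsigned>(Nodes.size());
        if (Last != nullptr)
        {
            Last->Next = Node; // chain nodes in depth-first order
        }
        Last = Node;
        Current = Node;
        return Node;
    }

    // Closes the currently open breadcrumb.
    void End()
    {
        if (Current != nullptr)
        {
            Current = Current->Parent;
        }
    }

private:
    std::deque<FSketchBreadcrumbNode> Nodes;
    FSketchBreadcrumbNode* Current = nullptr;
    FSketchBreadcrumbNode* Last = nullptr;
};

// Walks parent links from the last node the GPU reached; returns names root first.
std::vector<std::string> BreadcrumbStack(const FSketchBreadcrumbNode* Node)
{
    std::vector<std::string> Stack;
    for (; Node != nullptr; Node = Node->Parent)
    {
        Stack.push_back(Node->Name);
    }
    std::reverse(Stack.begin(), Stack.end());
    return Stack;
}
```

Given the ID of the last breadcrumb the GPU wrote before a crash, a lookup to the corresponding node followed by BreadcrumbStack would reconstruct the active scope stack in this simplified model.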
RenderGraph scopes have been simplified:
- The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types.
- Each RDG pass holds a pointer to the scope it was created under.
- BeginCPU / EndCPU are called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark up each worker thread with the relevant Unreal Insights scopes.
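Because each pass points at its scope in a single tree, the enter / exit events a thread needs when moving from one pass to the next can be derived from the two scopes' common ancestor. The sketch below is an assumption about how such a transition could be computed, not the RDG implementation; FSketchScope and ScopeTransitions are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical RDG scope: a name and a pointer to the enclosing scope.
struct FSketchScope
{
    std::string Name;
    const FSketchScope* Parent;
};

// Returns the chain of scopes from the root down to Scope.
static std::vector<const FSketchScope*> ChainFromRoot(const FSketchScope* Scope)
{
    std::vector<const FSketchScope*> Chain;
    for (; Scope != nullptr; Scope = Scope->Parent)
    {
        Chain.push_back(Scope);
    }
    std::reverse(Chain.begin(), Chain.end());
    return Chain;
}

// Lists the EndCPU / BeginCPU events needed when a thread moves from a pass
// created under From to a pass created under To.
std::vector<std::string> ScopeTransitions(const FSketchScope* From, const FSketchScope* To)
{
    const std::vector<const FSketchScope*> FromChain = ChainFromRoot(From);
    const std::vector<const FSketchScope*> ToChain = ChainFromRoot(To);

    // Length of the shared prefix, i.e. the scopes that stay open.
    std::size_t Common = 0;
    while (Common < FromChain.size() && Common < ToChain.size() && FromChain[Common] == ToChain[Common])
    {
        ++Common;
    }

    std::vector<std::string> Events;
    for (std::size_t Index = FromChain.size(); Index > Common; --Index)
    {
        Events.push_back("EndCPU " + FromChain[Index - 1]->Name); // innermost first
    }
    for (std::size_t Index = Common; Index < ToChain.size(); ++Index)
    {
        Events.push_back("BeginCPU " + ToChain[Index]->Name); // outermost first
    }
    return Events;
}
```

Passing a null From models a worker thread that starts in the middle of the graph and has to open every enclosing scope before it runs its first pass.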
Other changes include:
- Fixes for bugs uncovered when parallel translate was enabled.
- Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer.
- Refactored RHI draw call stats to better fit the new pipeline design.
#rb jeannoe.morissette, zach.bethel
#jira UE-139543
[CL 30973133 by Luke Thatcher in ue5-main branch]
Deprecated CONSTEXPR, PLATFORM_COMPILER_HAS_DECLTYPE_AUTO, PLATFORM_COMPILER_HAS_FOLD_EXPRESSIONS, and UE_NORETURN.
Made UE_NODISCARD mandatory (not yet deprecated as fixup is needed).
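Assuming the intended replacements are the standard C++17 spellings (the changelist description does not say so explicitly), fixed-up code might look like this sketch. The function names are hypothetical.

```cpp
#include <cstdlib>
#include <utility>

// Previously: CONSTEXPR int SquareSketch(int X), optionally with UE_NODISCARD.
[[nodiscard]] constexpr int SquareSketch(int X)
{
    return X * X;
}

// Previously: UE_NORETURN void FailSketch().
[[noreturn]] inline void FailSketch()
{
    std::abort();
}

// Fold expressions used unconditionally, with no PLATFORM_COMPILER_HAS_FOLD_EXPRESSIONS check.
template <typename... Ts>
constexpr auto SumSketch(Ts... Values)
{
    return (Values + ... + 0);
}

// decltype(auto) used unconditionally, with no PLATFORM_COMPILER_HAS_DECLTYPE_AUTO check.
template <typename FuncType>
decltype(auto) InvokeSketch(FuncType&& Func)
{
    return std::forward<FuncType>(Func)();
}
```

Once every supported compiler provides these features, the feature-test macros only ever take one value, which would explain why they can be deprecated.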
#rb devin.doucette
[CL 30617451 by steve robb in ue5-main branch]
When we encounter an OOM crash in UEFN's cloud cooking, we would like to be able to quickly rule out different parts of the cooking process. One major part that we currently have poor visibility into is shader compilation. This CL adds a codepath that prints out the memory usage of the shader compile processes when such a crash occurs.
#rb Laura.Hermanns
[CL 30479044 by sebastian schoner in ue5-main branch]