This represents UE4/Main @18073326, Release-5.0 @18081140 and Dev-PerfTest @18045971
[CL 18081471 by aurel cordonnier in ue5-release-engine-test branch]
This represents UE4/Main @17911760, Release-5.0 @17915875 and Dev-PerfTest @17914035
[CL 17918595 by aurel cordonnier in ue5-release-engine-test branch]
This represents UE4/Main @17774255, Release-5.0 @17791557 and Dev-PerfTest @17789485
[CL 17794212 by aurel cordonnier in ue5-release-engine-test branch]
* Introduce the concept of heaps to memory tracing. A heap is defined as a block of memory which can be used to host allocations. Heaps can be given descriptive names and belong to a logical hierarchy. At the top of the hierarchy is one of 16 possible "root heaps". Currently the only two root heaps are system memory (virtual memory) and video memory (VRAM). If no root heap is specified for an allocation, system memory is assumed.
* Remove the CoreAdd/CoreRemove events and replace them with Alloc + MarkAllocAsHeap events.
* Introduce a public interface for tracing memory allocations.
* Reduce the memory footprint of allocation events by changing the "Owner" field (hash or return address) into a "CallstackId" (running unique callstack counter). Additionally, introduce specialized AllocSystem/AllocVideo events which encode the root heap in the message type. Total size of the event went from 21 bytes to 17 or 18 bytes.
* Remove realloc scopes. These scopes comprised 50% of events during memory tracing, but for reallocations the information can be reconstructed during analysis. The only other use case (in Level.cpp) used the scope to maintain the tag while copying from one array to another. For this use case we add a specific scope for pushing the tag of a specific pointer.
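A minimal sketch of the two size reductions described above, under stated assumptions: every name here (EAllocEvent, SelectAllocEvent, FCallstackIdTable, the root heap ids) is hypothetical and is not the engine's trace API. It shows a running counter replacing a per-event owner value, and the root heap being implied by the event type for the two known root heaps.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// All names below are hypothetical; they are not the engine's trace API.
enum class EAllocEvent : uint8_t { Alloc, AllocSystem, AllocVideo };

constexpr uint8_t RootHeapSystem = 0;  // assumed id for system memory
constexpr uint8_t RootHeapVideo  = 1;  // assumed id for video memory
constexpr uint8_t MaxRootHeaps   = 16; // "one of 16 possible root heaps"

// Choose a specialized event for the two known root heaps so that the heap
// is implied by the event type; any other root heap uses the generic event.
inline EAllocEvent SelectAllocEvent(uint8_t RootHeap)
{
    assert(RootHeap < MaxRootHeaps);
    switch (RootHeap)
    {
    case RootHeapSystem: return EAllocEvent::AllocSystem;
    case RootHeapVideo:  return EAllocEvent::AllocVideo;
    default:             return EAllocEvent::Alloc;
    }
}

// Map each distinct callstack (identified here by a 64-bit hash) to a small
// running id, so an event can carry the id instead of a wider owner value.
class FCallstackIdTable
{
public:
    uint32_t GetOrAssign(uint64_t CallstackHash)
    {
        auto [It, bInserted] = Ids.try_emplace(CallstackHash, NextId);
        if (bInserted)
        {
            ++NextId;
        }
        return It->second;
    }

private:
    std::unordered_map<uint64_t, uint32_t> Ids;
    uint32_t NextId = 1; // 0 is reserved here for "no callstack" (an assumption)
};
```

The table above is single-threaded for brevity; a real tracer would need a thread-safe lookup.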
#rb ionut.matasaru
#preflight 614b0058a3310f000101cc42
#ROBOMERGE-AUTHOR: johan.berg
#ROBOMERGE-SOURCE: CL 17594699 in //UE5/Main/...
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v871-17566257)
[CL 17594714 by johan berg in ue5-release-engine-test branch]
Make memory tag specs important so they don't get dropped before a trace connection is made.
Prevent Insights from crashing when opening a trace missing memory scopes or with missing scope specs.
Add a tooltip to symbol resolution noting the environment variable for symbol paths.
Allow Insights to capture full callstacks (including system modules) with the define UE_CALLSTACK_TRACE_FULL_CALLSTACKS or the command line argument -tracefullcallstacks on Windows.
#jira none
#rb ionut.matasaru, martin.ridgers
#ROBOMERGE-AUTHOR: robert.millar
#ROBOMERGE-SOURCE: CL 17461782 in //UE5/Main/...
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v870-17433530)
#ROBOMERGE[bot1]: Dev-EngineMerge
[CL 17461823 by robert millar in ue5-release-engine-test branch]
#rb matt.peters johan.berg ionut.matasaru brandon.dawson jason.nadro
#rnx
Over half the perf gain was from reducing critical section locks, with the rest of the gains split between an optimization to call stack traversal (caching return address lookups in a lock-free map) and enabling TLS-related optimizations for MallocBinned2, which were inadvertently disabled when tracing was enabled (this also saves some locks). Multiple lock types had to be optimized before measurable perf gains were observed, because contention on the remaining locks increased as the first few locks were removed. A total of 6 types of locks were removed or reduced: AnnounceFNameTag, MallocBinned2, TagDataNameMap, Enum tag data lookup, Windows callstack trace, LLMMap.
* Added hash table utility class TGrowOnlyLockFreeHash, which supports lock-free reads, writes guarded by a critical section, and no deletion. This is intended for classes that represent caches of one-time initialized debug data used during memory tracing, which never need to be freed.
* TGrowOnlyLockFreeHash applied to 4 sets/maps used when memory tracing is enabled, including tag data, announced tag names, call stack functions, and call stack IDs.
* Lock striping added to LLMMap. This isn't a good candidate for a lock-free approach: it is modified heavily, lock-free approaches tend to perform worse than lock striping in that case, and making it lock-free would be a massive undertaking.
* A candidate for future removal is the lock in "FLowLevelAllocInfo::GetTag", which protects the "FLowLevelMemTracker::TagDatas" array; removing it would save a couple percent more perf.
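The grow-only hash idea described above (lock-free reads, writes under a lock, no deletion) can be sketched as follows. This is not the TGrowOnlyLockFreeHash implementation; FGrowOnlyMapSketch and all of its members are hypothetical, keys are assumed to be non-zero 64-bit values, and std::mutex stands in for a critical section. Old tables are kept alive until the map is destroyed so that readers still probing them stay valid.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical sketch: grow-only map from non-zero 64-bit keys to 32-bit values.
class FGrowOnlyMapSketch
{
    struct FSlot
    {
        std::atomic<uint64_t> Key{0}; // 0 means "empty"
        uint32_t Value = 0;           // written before Key is published
    };
    struct FTable
    {
        explicit FTable(size_t InCapacity)
            : Capacity(InCapacity), Slots(new FSlot[InCapacity]) {}
        size_t Capacity; // always a power of two
        std::unique_ptr<FSlot[]> Slots;
    };

public:
    FGrowOnlyMapSketch() { Current.store(AddTable(8), std::memory_order_release); }

    // Lock-free read: probes whichever table was current when the call began.
    bool Find(uint64_t Key, uint32_t& OutValue) const
    {
        const FTable* Table = Current.load(std::memory_order_acquire);
        const size_t Mask = Table->Capacity - 1;
        for (size_t i = Hash(Key) & Mask;; i = (i + 1) & Mask)
        {
            const uint64_t Found = Table->Slots[i].Key.load(std::memory_order_acquire);
            if (Found == 0) { return false; }
            if (Found == Key) { OutValue = Table->Slots[i].Value; return true; }
        }
    }

    // Writers are serialized by a mutex. The first value stored for a key wins.
    void Emplace(uint64_t Key, uint32_t Value)
    {
        assert(Key != 0);
        std::lock_guard<std::mutex> Lock(WriteMutex);
        uint32_t Existing;
        if (Find(Key, Existing)) { return; }
        FTable* Table = Current.load(std::memory_order_relaxed);
        if ((Count + 1) * 2 > Table->Capacity) // keep load factor at or below 1/2
        {
            FTable* Bigger = AddTable(Table->Capacity * 2);
            for (size_t i = 0; i < Table->Capacity; ++i)
            {
                const uint64_t Old = Table->Slots[i].Key.load(std::memory_order_relaxed);
                if (Old != 0) { Insert(*Bigger, Old, Table->Slots[i].Value); }
            }
            Current.store(Bigger, std::memory_order_release); // old table stays alive
            Table = Bigger;
        }
        Insert(*Table, Key, Value);
        ++Count;
    }

private:
    static size_t Hash(uint64_t Key)
    {
        return static_cast<size_t>((Key * 0x9E3779B97F4A7C15ull) >> 32);
    }
    static void Insert(FTable& Table, uint64_t Key, uint32_t Value)
    {
        const size_t Mask = Table.Capacity - 1;
        size_t i = Hash(Key) & Mask;
        while (Table.Slots[i].Key.load(std::memory_order_relaxed) != 0) { i = (i + 1) & Mask; }
        Table.Slots[i].Value = Value;                              // write the payload first,
        Table.Slots[i].Key.store(Key, std::memory_order_release);  // then publish the key
    }
    FTable* AddTable(size_t Capacity)
    {
        Tables.push_back(std::make_unique<FTable>(Capacity));
        return Tables.back().get();
    }

    std::atomic<FTable*> Current{nullptr};
    std::vector<std::unique_ptr<FTable>> Tables; // every table ever published
    std::mutex WriteMutex;
    size_t Count = 0;
};
```

Because nothing is ever deleted and superseded tables are retained, a reader needs no lock: at worst it misses a key that is being inserted concurrently. The cost is memory that is only released when the map itself is destroyed, which matches the stated use for debug data that never needs to be freed.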
#preflight 6132a3a6bf137d00019a8c97
#ROBOMERGE-SOURCE: CL 17428762 via CL 17429451
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v865-17346139)
[CL 17429480 by jason hoerner in ue5-release-engine-test branch]
Use Perf trace for context switches and stack sampling on PS4 and PS5.
#rb Ionut.Matasaru
#ROBOMERGE-SOURCE: CL 17277312 in //UE5/Main/...
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v858-17259218)
[CL 17277323 by martins mozeiko in ue5-release-engine-test branch]