This improves performance of FPrimitiveSceneInfo::CacheNaniteMeshCommands from worst-case measurements of ~9ms to ~3ms
#rb ola.olsson, brandon.dawson
#lockdown michal.valient
#preflight 616d8af408cf4d00013f7a6a
#ROBOMERGE-AUTHOR: jamie.hayes
#ROBOMERGE-SOURCE: CL 17846248 via CL 18003588 via CL 18369608 via CL 18369702
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v895-18170469)
[CL 18369790 by jamie hayes in ue5-release-engine-test branch]
#rb matt.peters johan.berg ionut.matasaru brandon.dawson jason.nadro
#rnx
More than half of the perf gain came from reducing critical section locks. The rest was split between an optimization to call stack traversal (caching return address lookups in a lock-free map) and enabling TLS-related optimizations for MallocBinned2, which were inadvertently disabled when tracing was enabled (this also saves some locks). Several lock types had to be optimized before any measurable perf gain appeared, because contention on the remaining locks increased as the first few were removed. In total, 6 types of locks were removed or reduced: AnnounceFNameTag, MallocBinned2, TagDataNameMap, Enum tag data lookup, Windows callstack trace, LLMMap.
* Added hash table utility class TGrowOnlyLockFreeHash, which supports lock-free reads, performs writes under a critical section, and does not support deletion. It is intended for caches of one-time initialized debug data used during memory tracing, which never need to be freed.
* Applied TGrowOnlyLockFreeHash to the 4 sets/maps used when memory tracing is enabled: tag data, announced tag names, call stack functions, and call stack IDs.
* Added lock striping to LLMMap. LLMMap isn't a good candidate for a lock-free approach: it is modified heavily, lock-free approaches tend to perform worse than lock striping in that case, and making it lock-free would be a massive undertaking.
* A candidate for future removal is the lock in "FLowLevelAllocInfo::GetTag", which protects the "FLowLevelMemTracker::TagDatas" array. Removing it would save a couple percent more perf.
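For readers unfamiliar with the pattern, the following is a minimal standalone sketch of a grow-only hash with lock-free reads and mutex-guarded writes. It is not the TGrowOnlyLockFreeHash implementation: the class name GrowOnlyHashSketch, its methods, the fixed 64-bit key/value types, and the convention that key 0 means "empty slot" are all assumptions made for illustration. Old tables are kept alive after a resize so that readers still probing them remain safe, which is only reasonable because nothing is ever deleted.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical sketch, not the engine class. Key 0 is reserved to mark an empty slot.
class GrowOnlyHashSketch {
public:
    explicit GrowOnlyHashSketch(std::size_t initialCapacity = 16) {
        current_.store(NewTable(initialCapacity), std::memory_order_release);
    }

    // Lock-free read. A reader holding an older table may miss a very recent
    // insert, which is acceptable for a cache: the caller falls back to Emplace.
    bool Find(std::uint64_t key, std::uint64_t& outValue) const {
        if (key == 0) return false;
        return FindIn(*current_.load(std::memory_order_acquire), key, outValue);
    }

    // Write under a mutex. Returns false if the key is already present (or is 0).
    bool Emplace(std::uint64_t key, std::uint64_t value) {
        if (key == 0) return false;
        std::lock_guard<std::mutex> lock(writeMutex_);
        Table* table = current_.load(std::memory_order_relaxed);
        std::uint64_t existing = 0;
        if (FindIn(*table, key, existing)) return false;
        if ((count_ + 1) * 2 > table->capacity) {  // keep load factor <= 0.5
            Table* bigger = NewTable(table->capacity * 2);
            for (std::size_t i = 0; i < table->capacity; ++i) {
                const std::uint64_t k = table->slots[i].key.load(std::memory_order_relaxed);
                if (k != 0) {
                    InsertInto(*bigger, k, table->slots[i].value.load(std::memory_order_relaxed));
                }
            }
            // Publish the fully built table; the old one stays owned by tables_.
            current_.store(bigger, std::memory_order_release);
            table = bigger;
        }
        InsertInto(*table, key, value);
        ++count_;
        return true;
    }

private:
    struct Slot {
        std::atomic<std::uint64_t> key{0};
        std::atomic<std::uint64_t> value{0};
    };
    struct Table {
        std::size_t capacity = 0;
        std::unique_ptr<Slot[]> slots;
    };

    static std::size_t Hash(std::uint64_t key) {
        return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ull) >> 32);
    }

    static bool FindIn(const Table& t, std::uint64_t key, std::uint64_t& outValue) {
        const std::size_t mask = t.capacity - 1;
        for (std::size_t i = Hash(key) & mask;; i = (i + 1) & mask) {
            const std::uint64_t k = t.slots[i].key.load(std::memory_order_acquire);
            if (k == key) {
                outValue = t.slots[i].value.load(std::memory_order_relaxed);
                return true;
            }
            if (k == 0) return false;  // an empty slot ends the probe sequence
        }
    }

    // Caller must hold writeMutex_ (or the table must not be published yet).
    static void InsertInto(Table& t, std::uint64_t key, std::uint64_t value) {
        const std::size_t mask = t.capacity - 1;
        std::size_t i = Hash(key) & mask;
        while (t.slots[i].key.load(std::memory_order_relaxed) != 0) i = (i + 1) & mask;
        // Store the value first, then release-store the key so that a reader
        // that observes the key also observes the value.
        t.slots[i].value.store(value, std::memory_order_relaxed);
        t.slots[i].key.store(key, std::memory_order_release);
    }

    Table* NewTable(std::size_t capacity) {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);  // power of two
        auto table = std::make_unique<Table>();
        table->capacity = capacity;
        table->slots = std::make_unique<Slot[]>(capacity);
        tables_.push_back(std::move(table));
        return tables_.back().get();
    }

    std::mutex writeMutex_;
    std::vector<std::unique_ptr<Table>> tables_;  // every table ever allocated
    std::atomic<Table*> current_{nullptr};
    std::size_t count_ = 0;
};
```

A caller would typically try Find first and call Emplace only on a miss, so the mutex is taken once per new key rather than once per lookup.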
#preflight 6132a3a6bf137d00019a8c97
#ROBOMERGE-SOURCE: CL 17428762 via CL 17429451
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v865-17346139)
[CL 17429480 by jason hoerner in ue5-release-engine-test branch]
Implementation details:
- PS4 and PS5 use the DWARF symbol format and the Syms symbol resolver.
- To resolve symbols, the path to the folder where the .self file is built must currently be specified in the UE_INSIGHTS_SYMBOL_PATH environment variable for Insights.
- Multiple paths can be separated by ; in this variable.
- The PS5 build does not seem to have the PLATFORM_PS5 define, so I used defined(__PS5__).
- The PS5 runtime collects and sends callstacks, but the Syms resolver does not yet support the DWARF v5 format, which the PS5 toolchain uses.
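As an illustration of the ;-separated path list described above, here is a small standalone sketch that reads UE_INSIGHTS_SYMBOL_PATH and splits it into individual folders. The helper names SplitSymbolPaths and ReadSymbolPathsFromEnvironment are hypothetical, and skipping empty entries is an assumption; this is not how Insights itself parses the variable.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: splits a ;-separated list, skipping empty entries.
std::vector<std::string> SplitSymbolPaths(const std::string& value) {
    std::vector<std::string> paths;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t end = value.find(';', start);
        if (end == std::string::npos) end = value.size();
        if (end > start) paths.push_back(value.substr(start, end - start));
        start = end + 1;
    }
    return paths;
}

// Hypothetical helper: returns an empty list when the variable is not set.
std::vector<std::string> ReadSymbolPathsFromEnvironment() {
    const char* raw = std::getenv("UE_INSIGHTS_SYMBOL_PATH");
    return raw ? SplitSymbolPaths(raw) : std::vector<std::string>{};
}
```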
#rb none
#preflight 6112ead49c7bb10001bc63fc
#ROBOMERGE-SOURCE: CL 17128247 in //UE5/Main/...
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v855-17104924)
[CL 17128270 by martins mozeiko in ue5-release-engine-test branch]
r.LumenScene.SurfaceCache.MeshCardsMergeComponents - enable/disable
r.Lumen.Visualize.CardPlacementLOD - which LOD level to visualize (all, instances, merged instances, merged components)
r.Lumen.Visualize.CardPlacementPrimitives - visualize which primitives were used to merge
#ROBOMERGE-SOURCE: CL 16355257 in //UE5/Private-Frosty/... via CL 16355260
#ROBOMERGE-BOT: STARSHIP (Main -> Release-Engine-Test) (v804-16311228)
[CL 16355272 by krzysztof narkowicz in ue5-release-engine-test branch]
#ROBOMERGE-SOURCE: CL 12673399 via CL 12673400 via CL 12673405
#ROBOMERGE-BOT: RELEASE (Release-Engine-Staging -> Main) (v675-12543919)
[CL 12673408 by arne schober in Main branch]
#JIRA
#RB
#ROBOMERGE-SOURCE: CL 12117125 in //UE4/Release-4.25/... via CL 12117149
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v657-12064184)
[CL 12117174 by arne schober in Main branch]