UnrealEngineUWP/Engine/Plugins/Runtime/StateTree/Source/StateTreeModule
johan torp 146b21d3d4 GC reachability optimizations. ~1.5x speedup and ~2.5MB memory saved for an internal title.
List of optimizations and changes:
* Token stream structure
    * Split token stream into strong-only and a mixed (weak+strong) stream
    * Split token stream into a builder and a tighter view class which reduces sizeof(UClass)
    * Implemented ref-counted token stream view sharing
    * Removed Class and Outer from token stream
    * Allow empty token streams (enabled by removing Class/Outer) to avoid touching token stream data
    * Placed ARO (AddReferencedObjects) last to reduce per-object cache thrashing, improve control flow predictability and avoid reading the last EndOfPointer and EndOfStream tokens
    * Added FPrefetchingObjectIterator, which brings in Class/Outer, the class's token stream view and the first token data ahead of time
    * Decode token bitfield once and ahead of time
* Reference queues and batch processing
    * Introduced bounded queues: ref arrays -> unvalidated refs -> validated (non-null / non-permanent) refs
    * Split all these queues for killable vs immutable references
    * Stack-living references are still handled synchronously. With Class/Outer removed from the token stream (and prefetched ahead of time), few such references remain outside of ARO calls.
    * Outer queues hold 32 items and get flushed when full.
* AddReferencedObjects (ARO) optimizations
    * Misc optimizations in many ARO implementations
    * New FReferenceCollector API to queue up ARO references (AddStableReference), old sync API (HandleObjectReference) still available
    * New AddPropertyReferences traversal that replaces SerializeBin and PropertyValueIterator
        * 4.5x faster than PropertyValueIterator
        * Uses CLASSCAST dispatch instead of virtual SerializeItem dispatch.
        * Step towards new unified token stream replacement shared by class token processing, structs and ARO
    * Replaced StructUtil::AddReferencedObjects with AddPropertyReferences traversal, ~8x speedup and collects more references for CitySample
* Parallelism
    * Single long-running task per worker
    * Improved work-stealing / load-balancing, workers can steal full blocks, ARO calls and initial references
    * Queue up slow ARO calls to improve load balancing and avoid late stragglers. Motivated by certain ARO calls taking over 2ms for a few specific objects.
    * Kick tasks manually to avoid ParallelFor end synchronization
* FGCObject
    * Initial reference collector runs in parallel with mark phase
    * New FGCObject constructor API (AddStableNativeReferencesOnly) to opt in to initial reference collection, used by StreamableManager
    * Same constructor API allows FGCObjects to defer registration until they become active (RegisterLater), reduces number of active GCObjects
* Reduced memory usage
    * Allocate reached objects in scratch pages (FWorkBlock) and reuse processed blocks, instead of swapping two big TArray<UObject*> per worker
    * Reduced sizeof(UClass)
    * Shareable token streams
* Misc optimizations
    * New API to test if an object is in the permanent object pool. Old API read two global pointers for every visited reference.
    * Fixed signed integer usage in GUObjectArray lookup that led to bad codegen
    * FPropertyIterator optimizations
    * SerializeBin optimizations
* Other changes
    * Moved many helpers into UE::GC namespace
    * Replaced TFastReferenceCollector API with simplified CollectReferences call. This API needed to break anyway.
    * Introduced FGCInternals to avoid forward-declaring TFastReferenceCollector and depending on the options enum in common headers
    * Moved and outlined code from GarbageCollection.h / FastReferenceCollector.h to GarbageCollection.cpp
    * Moved GC History and Garbage Reference Tracking into a synchronous TDebugReachabilityProcessor
    * Removed PersistentGarbage flag since it wasn't used in practice
    * Improved const correctness
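
Illustration of the bounded-queue pipeline described under "Reference queues and batch processing". This is a minimal standalone sketch, not the engine code: FObject, TBoundedQueue and FReachabilitySketch are hypothetical names, and only the 32-item capacity and the unvalidated -> validated (non-null / non-permanent) staging come from the list above.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a garbage-collected object; not the engine's UObject.
struct FObject
{
    bool bPermanent = false; // lives in the permanent object pool, never collected
    bool bReached = false;   // set once the object has been marked reachable
};

// Fixed-capacity queue that is drained in one batch (hypothetical helper).
template <typename T, std::size_t Capacity>
class TBoundedQueue
{
public:
    bool IsFull() const { return Num == Capacity; }

    // The caller must flush before pushing into a full queue.
    void Push(T Item) { Items[Num++] = Item; }

    // Hands every queued item to Consume, then empties the queue.
    template <typename F>
    void Flush(F&& Consume)
    {
        for (std::size_t Index = 0; Index < Num; ++Index)
        {
            Consume(Items[Index]);
        }
        Num = 0;
    }

private:
    std::array<T, Capacity> Items{};
    std::size_t Num = 0;
};

// Two-stage pipeline: unvalidated refs -> validated (non-null / non-permanent) refs.
class FReachabilitySketch
{
public:
    static constexpr std::size_t QueueCapacity = 32; // queue size taken from the list above

    // Cheap enqueue; validation and marking happen later, in batches.
    void AddReference(FObject* Ref)
    {
        Unvalidated.Push(Ref);
        if (Unvalidated.IsFull())
        {
            FlushUnvalidated();
        }
    }

    // Drains both stages, e.g. once an object's references have all been visited.
    void FlushAll()
    {
        FlushUnvalidated();
        FlushValidated();
    }

    const std::vector<FObject*>& GetReached() const { return Reached; }

private:
    void FlushUnvalidated()
    {
        Unvalidated.Flush([this](FObject* Ref)
        {
            // Drop null and permanent references before they reach the marking stage.
            if (Ref != nullptr && !Ref->bPermanent)
            {
                Validated.Push(Ref);
                if (Validated.IsFull())
                {
                    FlushValidated();
                }
            }
        });
    }

    void FlushValidated()
    {
        Validated.Flush([this](FObject* Ref)
        {
            // Mark each object once; newly reached objects become future work.
            if (!Ref->bReached)
            {
                Ref->bReached = true;
                Reached.push_back(Ref);
            }
        });
    }

    TBoundedQueue<FObject*, QueueCapacity> Unvalidated;
    TBoundedQueue<FObject*, QueueCapacity> Validated;
    std::vector<FObject*> Reached;
};
```

In this sketch a caller pushes raw pointers with AddReference and calls FlushAll at the end. Batching keeps the null / permanent checks and the reachability marking in tight loops over small arrays, which is the stated motivation for the staged queues; the real change additionally splits each queue into killable and immutable variants, which is omitted here.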

#rb robert.millar,robert.manuszewski,pj.kack
#preflight 63945bf45624e6da5ec85f88
#jira UE-169791

[CL 23475562 by johan torp in ue5-main branch]
2022-12-11 23:21:18 -05:00