List of optimizations and changes:
* Token stream structure
* Split token stream into strong-only and a mixed (weak+strong) stream
* Split token stream into a builder and a tighter view class which reduces sizeof(UClass)
* Implemented ref-counted token stream view sharing
* Removed Class and Outer from token stream
* Allow empty token streams (enabled by removing Class/Outer) to avoid touching token stream data
* Placed ARO (AddReferencedObjects) last to reduce per-object cache thrashing, improve control flow predictability and avoid reading the last EndOfPointer and EndOfStream tokens
* FPrefetchingObjectIterator that brings in Class/Outer, the class's token stream view and the first token data ahead of time
* Decode token bitfield once and ahead of time
* Reference queues and batch processing
* Introduced bounded queues: ref arrays -> unvalidated refs -> validated (non-null / non-permanent) refs
* Split all these queues for killable vs immutable references
* Stack-living references are still handled synchronously. With Class/Outer removed from the token stream (and prefetched ahead of time), few such instances remain outside of ARO calls.
* Outer queues hold 32 items and get flushed when full.
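
To illustrate the queueing idea above, here is a minimal sketch of a fixed-size queue that flushes a whole batch when it fills up, feeding a validation stage that drops null and permanent references. It is not the engine's implementation: `Object`, `bPermanent`, `BoundedQueue`, `ValidateBatch` and `RunQueueDemo` are hypothetical names, and only the capacity of 32 is taken from the description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for UObject; only the field this sketch needs.
struct Object {
    bool bPermanent = false; // assumed flag: object lives in the permanent pool
};

// Fixed-capacity queue that hands a full batch to a callback, then resets.
template <typename T, std::size_t Capacity, typename FlushFn>
class BoundedQueue {
public:
    explicit BoundedQueue(FlushFn InFlush) : Flush(InFlush) {}

    void Push(T Item) {
        Items[Num++] = Item;
        if (Num == Capacity) {
            FlushNow();
        }
    }

    // Drain whatever is queued, e.g. at the end of a processing pass.
    void FlushNow() {
        if (Num > 0) {
            Flush(Items.data(), Num);
            Num = 0;
        }
    }

private:
    std::array<T, Capacity> Items{};
    std::size_t Num = 0;
    FlushFn Flush;
};

// Validation stage: keep only non-null, non-permanent references.
inline void ValidateBatch(Object* const* Refs, std::size_t Num,
                          std::vector<Object*>& OutValidated) {
    for (std::size_t i = 0; i < Num; ++i) {
        Object* Ref = Refs[i];
        if (Ref != nullptr && !Ref->bPermanent) {
            OutValidated.push_back(Ref);
        }
    }
}

// Demo: push 40 references (one null, one permanent) through a 32-entry queue.
inline std::size_t RunQueueDemo(std::size_t& OutFlushCount) {
    std::vector<Object> Pool(40);
    Pool[3].bPermanent = true;
    std::vector<Object*> Validated;
    OutFlushCount = 0;
    auto OnFlush = [&](Object* const* Refs, std::size_t Num) {
        ++OutFlushCount;
        ValidateBatch(Refs, Num, Validated);
    };
    BoundedQueue<Object*, 32, decltype(OnFlush)> Unvalidated(OnFlush);
    for (std::size_t i = 0; i < Pool.size(); ++i) {
        Unvalidated.Push(i == 5 ? nullptr : &Pool[i]);
    }
    Unvalidated.FlushNow();
    return Validated.size();
}
```

Batching this way keeps the null and permanent checks in one tight loop over a small array instead of interleaving them with the code that discovers references.
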
* AddReferencedObjects (ARO) optimizations
* Misc optimizations in many ARO implementations
* New FReferenceCollector API to queue up ARO references (AddStableReference), old sync API (HandleObjectReference) still available
* New AddPropertyReferences traversal that replaces SerializeBin and PropertyValueIterator
* 4.5x faster than PropertyValueIterator
* Uses CLASSCAST dispatch instead of virtual SerializeItem dispatch.
* Step towards new unified token stream replacement shared by class token processing, structs and ARO
* Replaced StructUtil::AddReferencedObjects with AddPropertyReferences traversal: ~8x speedup, and it collects more references for CitySample
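
The dispatch change above can be pictured with a small sketch: rather than calling a virtual function per property, the traversal switches on a type tag stored in the property and recurses into nested structs. This is only an illustration under assumptions. `Property`, `PropertyKind`, `AddPropertyRefs` and the demo types are hypothetical, and a plain enum tag stands in for the engine's cast flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Object {}; // hypothetical stand-in for UObject

// Hypothetical type tag, standing in for per-property cast flags.
enum class PropertyKind : std::uint8_t { Int, ObjectRef, Struct };

struct Property {
    PropertyKind Kind;
    std::size_t Offset;             // byte offset of the value in its container
    std::vector<Property> Children; // only used when Kind == Struct
};

// Walk a property layout and collect non-null object references.
// Dispatch is a switch on the tag, not a virtual call per property.
inline void AddPropertyRefs(const std::vector<Property>& Props,
                            const std::uint8_t* Data,
                            std::vector<Object*>& Out) {
    for (const Property& P : Props) {
        switch (P.Kind) {
        case PropertyKind::Int:
            break; // no references in plain data
        case PropertyKind::ObjectRef: {
            Object* Ref = nullptr;
            std::memcpy(&Ref, Data + P.Offset, sizeof(Ref));
            if (Ref != nullptr) {
                Out.push_back(Ref);
            }
            break;
        }
        case PropertyKind::Struct:
            AddPropertyRefs(P.Children, Data + P.Offset, Out);
            break;
        }
    }
}

// Demo types whose layout is described by hand below.
struct Inner { int Pad; Object* Ref; };
struct Outer { Object* First; int Count; Inner Nested; };

inline std::size_t RunTraversalDemo() {
    Object A, B;
    Outer Value{&A, 7, {0, &B}};
    std::vector<Property> Layout = {
        {PropertyKind::ObjectRef, offsetof(Outer, First), {}},
        {PropertyKind::Int, offsetof(Outer, Count), {}},
        {PropertyKind::Struct, offsetof(Outer, Nested), {
            {PropertyKind::Int, offsetof(Inner, Pad), {}},
            {PropertyKind::ObjectRef, offsetof(Inner, Ref), {}},
        }},
    };
    std::vector<Object*> Refs;
    AddPropertyRefs(Layout, reinterpret_cast<const std::uint8_t*>(&Value), Refs);
    return Refs.size();
}
```
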
* Parallelism
* Single long-running task per worker
* Improved work-stealing / load-balancing, workers can steal full blocks, ARO calls and initial references
* Queue up slow ARO calls to improve load balancing and avoid late stragglers. Motivated by certain ARO calls taking over 2ms for a few specific objects.
* Kick tasks manually to avoid ParallelFor end synchronization
* FGCObject
* Initial reference collector runs in parallel with mark phase
* New FGCObject constructor API (AddStableNativeReferencesOnly) to opt-in to initial reference collection, used by StreamableManager
* Same constructor API allows FGCObjects to defer registration until they become active (RegisterLater), reduces number of active GCObjects
* Reduced memory usage
* Allocate reached objects in scratch pages (FWorkBlock) and reuse processed blocks, instead of swapping two big TArray<UObject*> per worker
* Reduced sizeof(UClass)
* Shareable token streams
* Misc optimizations
* New API to test if an object is in the permanent object pool. Old API read two global pointers for every visited reference.
* Fixed signed integer usage in GUObjectArray lookup that led to bad codegen
* FPropertyIterator optimizations
* SerializeBin optimizations
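
Two of the items above (the permanent object pool test and the signed index fix) lend themselves to a short sketch. The code below is an assumption about how such checks can be written cheaply, not the engine's actual API: `PermanentPool`, `Contains`, `ElementsPerChunk`, `ChunkOf` and `SlotOf` are hypothetical. It shows a range test done with one unsigned subtract and compare, and chunk/slot arithmetic on an unsigned index, which avoids the sign fix-up that compilers typically emit for signed division by a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical pool descriptor: base address and size kept side by side.
struct PermanentPool {
    std::uintptr_t Base = 0;
    std::uintptr_t Size = 0;

    // A single unsigned subtract and compare covers both bounds:
    // addresses below Base wrap around to a value larger than Size.
    bool Contains(const void* Ptr) const {
        return reinterpret_cast<std::uintptr_t>(Ptr) - Base < Size;
    }
};

// Assumed chunk size for a chunked object array; a power of two.
constexpr std::uint32_t ElementsPerChunk = 64 * 1024;

// With an unsigned index, these compile down to a shift and a mask.
inline std::uint32_t ChunkOf(std::uint32_t Index) { return Index / ElementsPerChunk; }
inline std::uint32_t SlotOf(std::uint32_t Index) { return Index % ElementsPerChunk; }

// Demo: check pointers at, inside, just past and outside a static buffer.
inline bool RunPoolDemo() {
    static unsigned char Storage[256];
    PermanentPool Pool{reinterpret_cast<std::uintptr_t>(Storage), sizeof(Storage)};
    int Outside = 0;
    return Pool.Contains(Storage)
        && Pool.Contains(Storage + 255)
        && !Pool.Contains(Storage + 256)
        && !Pool.Contains(&Outside);
}
```
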
* Other changes
* Moved many helpers into UE::GC namespace
* Replaced TFastReferenceCollector API with simplified CollectReferences call. Needed to break this API anyway.
* Introduced FGCInternals to avoid forward-declaring TFastReferenceCollector and depending on the options enum in common headers
* Moved and outlined code from GarbageCollection.h / FastReferenceCollector.h to GarbageCollection.cpp
* Moved GC History and Garbage Reference Tracking into a synchronous TDebugReachabilityProcessor
* Removed PersistentGarbage flag since it wasn't used in practice
* Improved const correctness
#rb robert.millar,robert.manuszewski,pj.kack
#preflight 63945bf45624e6da5ec85f88
#jira UE-169791
[CL 23475562 by johan torp in ue5-main branch]