#jira UE-210757
#rb
#tests Integrated //UE5/Partner-Latte-5.4/... @32408657 (the last non-robomerge change)
[CL 32557111 by Wojciech Krywult in 5.4 branch]
[FYI] dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shaders] add compression for FShaderSource objects and enable by default (Mermaid SuperFast compression). This halves the high watermark for the ShaderCompiler LLM scope in my testing (QAGame cold cook), with minimal measurable impact to preprocessing time (<3% on average, but this may just be noise).
#rb Jason.Nadro, Laura.Hermanns
#jira UE-209037
[CL 32500132 by justin moe in 5.4 branch]
- Credits to Rex Hill for working out the mechanics of the leak.
#rb Rex.Hill
[REVIEW] [at]Rex.Hill, [at]Daniele.Vetorel
[CL 32499779 by arciel rekman in 5.4 branch]
if (!IsRunningWithPakFile() && ShouldLookForLooseCookedChunks())
to not affect performance in Test and Shipping builds that are using pak files.
#rb Yuriy.ODonnell
[FYI] Arciel.Rekman
#jira UE-209658
[CL 32498970 by daniele pieroni in 5.4 branch]
Shader preprocessing currently uses the same exception handler as shader compilation. The exception handler only makes sense when we call into platform shader compiler code, which we don't do for preprocessing. Otherwise it is preferable to use the engine's existing error handling (the current exception handler turns fatal errors into warnings, for example).
Furthermore, the exception handler assumes that the global error history always includes a callstack. That is not the case and can cause ugly failures during error handling. This CL inserts a check and falls back to the general mechanism for extracting the callstack if no callstack is found. It's worth noting that this general mechanism will return the callstack of the exception handler, and not the callstack that caused the exception: the act of handling the exception will unwind the callstack and destroy that information.
#rb dan.elksnitis
[CL 32497935 by sebastian schoner in 5.4 branch]
==
* Off by default pending extensive testing (r.Visibility.SkipAlwaysVisible=1 to enable)
* Now all the primitive arrays are kept sorted into a "tested" vs "always visible" partition, which are then internally sorted by proxy type for cache coherency, using the primitive component ID as a tie breaker and to improve determinism (new implementation of FPrimitiveArraySortKey)
* The partition split location for "tested" vs. "always visible" is very efficiently calculated from the TypeOffsetTable using only a few loop iterations (as opposed to scanning the array), stored in Scene.PrimitivesAlwaysVisibleOffset (~0u if no always visible partition, or if optimization is disabled)
* The partition split location is aligned up to the next full dword so that no single dword spans both "tested" and "always visible" primitives, which makes the lockless parallel calculations much more efficient. This will push a few (<32) primitives from always visible into the tested path, but this is not a big deal.
* FPrimitiveSceneProxy has a bIsAlwaysVisible flag (default false) that determines which partition it gets sorted into. Nanite determines this value with Nanite::FSceneProxyBase::SupportsAlwaysVisible()
* Countless places now only iterate from 0...StartOfAlwaysVisible range instead of 0...NumPrimitives, dropping the tested primitive count by orders of magnitude.
* r.Nanite.OptimizedRelevance=1 has been in place for a few years now, so this optimization now enforces optimized relevance in !WITH_EDITOR builds, and relies on assumptions from this (r.Nanite.OptimizedRelevance has been baked down and cvar removed)
* Nanite forces off IsUsingDistanceCullFade now so that any Nanite proxies going through FrustumCull will no longer test for fade distances (which are not even supported by Nanite)
* A new UpdateAlwaysVisible task is launched prior to FrustumCull (which waits on ray tracing if needed), and then FrustumCull waits on this new task. This task is responsible for filling the PrimitiveVisibilityMap for always visible primitives, and also for testing ray tracing culling per primitive for the TLAS
* stat initviews now contains an additional "Always Visible" timer that shows the thread time spent on processing the always visible primitives (primarily ray tracing cull tests for the TLAS) - this is not a measure of wall time though, as it's largely async
* Renamed (to match other methods like prepass) FDeferredShadingSceneRenderer::IsNaniteEnabled() -> ShouldRenderNanite() and moved it to the base FSceneRenderer under a virtual that can be checked in the FComputeAndMarkRelevance task
* Deleted legacy and broken r.Visibility.PrimitiveCull.SkipNanite cvar and logic
* FPrimitiveSceneInfo::CacheNaniteMaterialBins now calculates material relevance across all shading pipelines, and merges them into a combined primitive view relevance struct (without any view dependent bits, as Nanite does not have any per-view relevance tests that are respected)
* The combined Nanite relevance is now merged into FViewInfo at the end of FComputeAndMarkRelevance::Finalize(), after waiting for the CacheNaniteMaterialBins task to finish (which computes the combined relevance)
* Switched FrustumCull and UpdateAlwaysVisible tasks to use RESTRICT* instead of GetData() to allow the compiler to perform additional optimizations
* Nanite "tested" primitives no longer create mesh draw commands when going through FDrawCommandRelevancePacket::AddCommandsForMesh
* FRelevancePacket::Finalize() now completely skips creating visible cached mesh draw commands for the NaniteMeshPass
* Optimizing the relevance code was critical after pushing 60k+ primitives (in CitySample) down the always visible pass, because now there were ~124k static mesh relevances to process, which would previously rely on accurate visibility. It was an insane amount of overhead to compute something that is effectively not a per-primitive or per-mesh decision in Nanite.
* Nanite Proxies using features like custom depth, lighting channels, or static lighting will be forced down the "tested" path. Additionally, the editor will disable the "always visible" optimization due to various features like hit proxies or debug view modes relying on dynamic relevance (this will hopefully be addressed in the future).
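The partitioning and dword alignment described in the list above can be sketched roughly as follows. This is a minimal standalone illustration, not the engine code: PrimitiveEntry, SortKeyLess, AlignUpToDword and ComputeAlwaysVisibleOffset are hypothetical names, and the split is found here with a binary search over the sorted array rather than from the TypeOffsetTable as the change itself does. Returning ~0u when alignment consumes the whole always visible partition is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for one scene primitive; not the engine's type.
struct PrimitiveEntry {
    bool bIsAlwaysVisible;   // false = "tested" partition, true = "always visible"
    uint32_t ProxyTypeHash;  // groups primitives of the same proxy type together
    uint32_t ComponentId;    // tie breaker, keeps the order deterministic
};

// Sort order: tested primitives first, then proxy type, then component ID.
inline bool SortKeyLess(const PrimitiveEntry& A, const PrimitiveEntry& B) {
    return std::tie(A.bIsAlwaysVisible, A.ProxyTypeHash, A.ComponentId)
         < std::tie(B.bIsAlwaysVisible, B.ProxyTypeHash, B.ComponentId);
}

constexpr uint32_t kNoAlwaysVisible = ~0u;

// Round an index up to the next multiple of 32, so that one 32-bit word of a
// visibility bit array never mixes the two partitions.
inline uint32_t AlignUpToDword(uint32_t Index) {
    return (Index + 31u) & ~31u;
}

// Sorts the primitives and returns the aligned start of the always visible
// partition, or ~0u if no always visible primitives remain after alignment.
inline uint32_t ComputeAlwaysVisibleOffset(std::vector<PrimitiveEntry>& Prims) {
    assert(Prims.size() < kNoAlwaysVisible);
    std::sort(Prims.begin(), Prims.end(), SortKeyLess);

    // Assumption: binary search for the split; the real change derives it
    // from the TypeOffsetTable instead.
    const auto Split = std::partition_point(
        Prims.begin(), Prims.end(),
        [](const PrimitiveEntry& P) { return !P.bIsAlwaysVisible; });
    const uint32_t FirstAlwaysVisible = static_cast<uint32_t>(Split - Prims.begin());
    if (FirstAlwaysVisible == Prims.size()) {
        return kNoAlwaysVisible;
    }

    // Up to 31 always visible primitives fall back to the tested path here.
    const uint32_t Aligned = AlignUpToDword(FirstAlwaysVisible);
    return Aligned >= Prims.size() ? kNoAlwaysVisible : Aligned;
}
```

With 40 tested and 30 always visible primitives, the raw split is at index 40 and the aligned split at 64, so 24 always visible primitives are handled by the tested path and loops over tested primitives run over 0...64.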
#rb ola.olsson, zach.bethel
[FYI] brian.karis, rune.stubbe, jamie.hayes, mihnea.balta, luke.thatcher, john.huelin, ben.woodhouse, jian.ru
#jira UE-187480
[REVIEW] https://p4-swarm.epicgames.net/reviews/31997742
[CL 32496735 by graham wihlidal in 5.4 branch]
- If we need the shader source (i.e. material stats in the editor) we still keep the data around. The editor compiles shaders on demand and doesn't try to load these from the DDC so we don't have to worry about cache pollution.
- Shader Debug Info also needs this information so we keep it around if developers are running with this enabled.
- Added a helper function to the job input, NeedsOriginalShaderSource(), which determines if we need to store the original source code.
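A rough sketch of how such a helper could gate whether the original source is kept, assuming the change drops the source whenever it is not needed. Only the name NeedsOriginalShaderSource() comes from the description above; the struct names, the two flags and FinalizeJobSketch are hypothetical and do not reflect the actual job input layout.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the shader compile job input.
struct FShaderJobInputSketch {
    bool bCompilingForMaterialStats = false;  // assumed flag: editor material stats
    bool bShaderDebugInfoEnabled = false;     // assumed flag: Shader Debug Info

    // Keep the original source only for consumers that actually read it.
    bool NeedsOriginalShaderSource() const {
        return bCompilingForMaterialStats || bShaderDebugInfoEnabled;
    }
};

// Hypothetical stand-in for the job output.
struct FShaderJobOutputSketch {
    std::string OriginalSource;  // left empty when the source is not needed
};

// Stores the source in the output only when the input asks for it.
inline FShaderJobOutputSketch FinalizeJobSketch(const FShaderJobInputSketch& Input,
                                                std::string Source) {
    FShaderJobOutputSketch Output;
    if (Input.NeedsOriginalShaderSource()) {
        Output.OriginalSource = std::move(Source);
    }
    return Output;
}
```

Keeping the decision in one helper on the input means every call site asks the same question instead of re-deriving it from individual flags.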
#rb dan.elksnitis
#tests Lyra Cook, Lyra Editor, UEFN Cooks
[FYI] sebastian.schoner, Graeme.Thornton
[CL 32060916 by jason nadro in 5.4 branch]
- The GRHIThreadId field was zeroed by StopRenderingThread() but never set again when starting up.
- Fix only necessary for UE 5.4. This is already fixed in UE5.5/Main by the DevPR refactor.
#jira UE-197775
#rb zach.bethel
[CL 31901382 by Luke Thatcher in 5.4 branch]
When the setting is enabled, any mesh LODs using Unlimited Bone Influences that don't have a deformer assigned will use the DeformerGraph plugin's default deformer. This ensures that UBI meshes are always rendered with a deformer, and therefore the GPU skinning permutations for UBI aren't needed.
This change also adds a per-LOD setting that allows users to disable mesh deformers on a LOD, which could be useful for controlling performance, e.g. disabling an expensive deformer on lower LODs. Some changes to functions on USkinnedMeshComponent lay the foundations for having different deformers on different LODs as well.
#rb Jeremy.Moore, daniele.vettorel
[CL 31869023 by henry falconer in 5.4 branch]
[FYI] dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shaders]
- store shadertype associations for each shader in a shadermap in the editor only data
- output a debug artifact containing per-shadertype permutation/memory stats; note that this is not directly representative of final shader memory usage since it doesn't account for shader library deduplication or shader library chunk re-duplication; it is only intended to be used as a tool for tracking/identifying shader growth due to added/modified shadertypes
- remove the old "dumpshadercodestats" path from the shader library, as well as the stats tracking "unique" shaders and shader memory; the former is no longer used and the latter does not correctly account for library chunking
- bump shader version due to shadermap editoronly data change
#rb Arciel.Rekman
[CL 31775922 by bob tellez in 5.4 branch]
- Available on SM6
- Disabled with config
- Includes support for heap allocations on Metal, enabled by default, can be disabled with -nometalheap
#rb Luke.Thatcher, Chris.Waters, Laura.Hermanns
#jira UE-204112
[CL 31527528 by carl lloyd in 5.4 branch]
- Disabled by default
- Removed MetalDerivedData and moved shader compiler specific code to separate files
#rb Laura.Hermanns
#jira UE-204112
[CL 31524725 by carl lloyd in 5.4 branch]