2512 Commits

Author SHA1 Message Date
dan elksnitis
c203206fc2 [shaders] handle gracefully the case where we don't have remapping data for a particular line of shader code and just leave the error/warning unmodified. This _shouldn't_ happen in practice, generally is caused by regressions introduced in shaderformat code (where code is modified without preserving line numbers/whitespace). However due to another bug in 5.4 this is currently occurring in live (said bug has been fixed but is slightly involved to bring across to 5.4 so going to avoid this for now). Not remapping just means the filename/line number for an error or warning will be reported on the stripped code rather than the unstripped code, which is slightly less useful but not the end of the world.
#jira UE-214649
#rb Laura.Hermanns
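
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the engine's code: `SourceLocation`, `LineRemapTable`, and `RemapDiagnostic` are hypothetical names, and the assumption is that remapping data is a per-line lookup from stripped code to unstripped code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical types for illustration; these are not the engine's own.
struct SourceLocation {
    std::string File;
    int Line;
};

// Maps a line number in the stripped shader code to its location in the
// unstripped source. A line without an entry has no remapping data.
using LineRemapTable = std::map<int, SourceLocation>;

// Returns the unstripped location when remapping data exists for the line.
// Otherwise returns the stripped location unchanged instead of failing.
SourceLocation RemapDiagnostic(const LineRemapTable& Table,
                               const SourceLocation& Stripped) {
    assert(Stripped.Line > 0);  // line numbers are assumed to be 1-based
    const auto It = Table.find(Stripped.Line);
    if (It == Table.end()) {
        return Stripped;  // no remapping data: leave the diagnostic as-is
    }
    return It->second;
}
```

With this shape, a missing entry only costs accuracy in the reported filename/line number, which matches the trade-off the message describes.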

[CL 34324358 by dan elksnitis in 5.4 branch]
2024-06-12 17:40:08 -04:00
Wojciech Krywult
d271c828b9 SDK (1): Changes in common code needed to support SDK changes on some platforms
#jira UE-210757
#rb
#tests Integrated //UE5/Partner-Latte-5.4/... @32408657 (the last non-robomerge change)

[CL 32557111 by Wojciech Krywult in 5.4 branch]
2024-03-27 16:55:10 -04:00
dan elksnitis
f37872bb7f [shaders] resubmit: add compression for FShaderSource objects and enable by default (Mermaid SuperFast compression). This halves the high watermark for the ShaderCompiler LLM scope in my testing (QAGame cold cook), with minimal measurable impact to preprocessing time (<3% on average, but this may just be noise).
#rb Jason.Nadro, Laura.Hermanns
#jira UE-209037

[CL 32500256 by dan elksnitis in 5.4 branch]
2024-03-26 02:56:31 -04:00
justin moe
f7aec52eb8 [Backout] - CL32363267
[FYI] dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shaders] add compression for FShaderSource objects and enable by default (Mermaid SuperFast compression). This halves the high watermark for the ShaderCompiler LLM scope in my testing (QAGame cold cook), with minimal measurable impact to preprocessing time (<3% on average, but this may just be noise).

#rb Jason.Nadro, Laura.Hermanns
#jira UE-209037

[CL 32500132 by justin moe in 5.4 branch]
2024-03-26 02:52:38 -04:00
erica stella
564ec21859 Fix vulkan tonemap subpass only rendering to left eye.
#jira UE-209453
#rb Dmitriy.Dyomin
[FYI] jeannoe.morissette

[CL 32499988 by erica stella in 5.4 branch]
2024-03-26 02:47:54 -04:00
dan elksnitis
0c86f4d9fc [shaders] add compression for FShaderSource objects and enable by default (Mermaid SuperFast compression). This halves the high watermark for the ShaderCompiler LLM scope in my testing (QAGame cold cook), with minimal measurable impact to preprocessing time (<3% on average, but this may just be noise).
#rb Jason.Nadro, Laura.Hermanns
#jira UE-209037

[CL 32499932 by dan elksnitis in 5.4 branch]
2024-03-26 02:42:11 -04:00
arciel rekman
fa70401a8e Fix rare FShaderMapResource_SharedCode::ReleaseRHI crash
#rb Rex.Hill
[REVIEW] [at]Rex.Hill

[CL 32499902 by arciel rekman in 5.4 branch]
2024-03-26 02:41:10 -04:00
arciel rekman
4e21c7292e Fix leaking shader code preloading entries.
- Credits to Rex Hill for working out the mechanics of the leak.

#rb Rex.Hill
[REVIEW] [at]Rex.Hill, [at]Daniele.Vetorel

[CL 32499779 by arciel rekman in 5.4 branch]
2024-03-26 02:37:44 -04:00
dmytro vovk
606abaad10 Replaced thread_local in RenderGraphAllocator by a TLS index as it's faster on some platforms
#rb zach.bethel

[CL 32499758 by dmytro vovk in 5.4 branch]
2024-03-26 02:37:08 -04:00
daniele pieroni
fc509d0f48 Enabling chunk discovery on all build configurations in order to support Zen Streaming on Test configuration, while relying on the runtime check
if (!IsRunningWithPakFile() && ShouldLookForLooseCookedChunks())
to not affect performance in Test and Shipping builds that are using pak files.

#rb Yuriy.ODonnell
[FYI] Arciel.Rekman
#jira UE-209658

[CL 32498970 by daniele pieroni in 5.4 branch]
2024-03-26 02:10:10 -04:00
zach bethel
5ba664ddfc Enable UAV overlap for all UAVs in an RDG pass. If a pass happens to use multiple UAVs pointing at the same resource (but different slices), only one is currently chosen. However, using multiple UAVs that happen to have overlap will break validation (due to multiple begin / end calls). Until that is fixed, just enable it for all UAVs in the pass.
#jira UE-209165

[CL 32498661 by zach bethel in 5.4 branch]
2024-03-26 02:01:41 -04:00
sebastian schoner
c3e24ad4a4 Improve error handling for failed shader preprocessing
Shader preprocessing currently uses the same exception handler as shader compilation. The exception handler only makes sense when we call into platform shader compiler code, which we don't do for preprocessing. Otherwise it is preferable to use the existing error handling that already exists in the engine (the current exception handler turns fatal errors into warnings, for example).

Furthermore, the exception handler assumes that the global error history always includes a callstack. That is not the case and can cause ugly failures during error handling. This CL inserts a check and falls back to the general mechanism for extracting the callstack if no callstack is found. It's worth noting that this general mechanism will return the callstack of the exception handler, and not the callstack that caused the exception: the act of handling the exception will unwind the callstack and destroy that information.


#rb dan.elksnitis

[CL 32497935 by sebastian schoner in 5.4 branch]
2024-03-26 01:28:54 -04:00
graham wihlidal
c3f0e49549 Added CFLAG_ShaderBundle and decorated Nanite CS materials with it (some platforms can use this to build relevant data structures for internal bundle implementations).
#rb yuriy.odonnell
[REVIEW] https://p4-swarm.epicgames.net/reviews/32011926

[CL 32496753 by graham wihlidal in 5.4 branch]
2024-03-26 00:47:10 -04:00
graham wihlidal
5f85ecee17 Heavily optimized InitViews and ComputeRelevancy for Nanite scenes (up to 8x faster on the render thread in CitySample) - requires Nanite compute materials to be enabled.
==
* Off by default pending extensive testing (r.Visibility.SkipAlwaysVisible=1 to enable)
* Now all the primitive arrays are kept sorted into a "tested" vs "always visible" partition, which are then internally sorted by proxy type for cache coherency, using the primitive component ID as a tie breaker and to improve determinism (new implementation of FPrimitiveArraySortKey)
* The partition split location for "tested" vs. "always visible" is very efficiently calculated from the TypeOffsetTable using only a few loop iterations (as opposed to scanning the array), stored in Scene.PrimitivesAlwaysVisibleOffset (~0u if no always visible partition, or if optimization is disabled)
* The partition split location is aligned up to the next full dword - this is to avoid having a single dword spanning "tested" and "always visible" primitives making the lockless parallel calculations much more efficient. This will push a few (<32) primitives from always visible into the tested path, but this is not a big deal.
* FPrimitiveSceneProxy has a bIsAlwaysVisible flag (default false) that determines which partition it gets sorted into. Nanite determines this value with Nanite::FSceneProxyBase::SupportsAlwaysVisible()
* Countless places now only iterate from 0...StartOfAlwaysVisible range instead of 0...NumPrimitives, dropping the tested primitive count by orders of magnitude.
* r.Nanite.OptimizedRelevance=1 has been in place for a few years now, so this optimization now enforces optimized relevance in !WITH_EDITOR builds, and relies on assumptions from this (r.Nanite.OptimizedRelevance has been baked down and cvar removed)
* Nanite forces off IsUsingDistanceCullFade now so that any Nanite proxies going through FrustumCull will no longer test for fade distances (which are not even supported by Nanite)
* A new UpdateAlwaysVisible task is launched prior to FrustumCull (which waits on ray tracing if needed), and then FrustumCull waits on this new task. This task is responsible for filling the PrimitiveVisibilityMap for always visible primitives, and also for testing ray tracing culling per primitive for the TLAS
* stat initviews now contains an additional "Always Visible" timer that shows the thread time spent on processing the always visible primitives (primarily ray tracing cull tests for the TLAS) - this is not a measure of wall time though, as it's largely async
* Renamed (to match other methods like prepass) FDeferredShadingSceneRenderer::IsNaniteEnabled() -> ShouldRenderNanite() and moved it to the base FSceneRenderer under a virtual that can be checked in the FComputeAndMarkRelevance task
* Deleted legacy and broken r.Visibility.PrimitiveCull.SkipNanite cvar and logic
* FPrimitiveSceneInfo::CacheNaniteMaterialBins now calculates material relevance across all shading pipelines, and merges them into a combined primitive view relevance struct (without any view dependent bits, as Nanite does not have any per-view relevance tests that are respected)
* The combined Nanite relevance is now merged into FViewInfo at the end of FComputeAndMarkRelevance::Finalize(), after waiting for the CacheNaniteMaterialBins task to finish (which computes the combined relevance)
* Switched FrustumCull and UpdateAlwaysVisible tasks to use RESTRICT* instead of GetData() to allow the compiler to perform additional optimizations
* Nanite "tested" primitives no longer create mesh draw commands when going through FDrawCommandRelevancePacket::AddCommandsForMesh
* FRelevancePacket::Finalize() now completely skips creating visible cached mesh draw commands for the NaniteMeshPass
* Optimizing the relevance code was critical after pushing 60k+ primitives (in CitySample) down the always visible pass, because now there were ~124k static mesh relevances to process, which would previously rely on accurate visibility. It was an insane amount of overhead to compute something that is effectively not a per-primitive or per-mesh decision in Nanite.
* Nanite Proxies using features like custom depth, lighting channels, or static lighting will be forced down the "tested" path. Additionally, the editor will disable the "always visible" optimization due to various features like hit proxies or debug view modes relying on dynamic relevance (this will hopefully be addressed in the future).
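
The dword alignment of the partition split mentioned in the list above can be illustrated with a small sketch. It assumes one visibility bit per primitive packed into 32-bit words; `AlignSplitToDword` and `PrimitivesMovedToTested` are hypothetical helper names, not engine functions.

```cpp
#include <cassert>
#include <cstdint>

// Number of primitives covered by one dword of a visibility bit array
// (assumption: one bit per primitive).
constexpr uint32_t BitsPerDword = 32;

// Rounds a primitive index up to the next multiple of 32 so that no single
// dword covers both "tested" and "always visible" primitives.
uint32_t AlignSplitToDword(uint32_t SplitIndex) {
    // Guard against wrap-around when adding the rounding bias.
    assert(SplitIndex <= UINT32_MAX - (BitsPerDword - 1));
    return (SplitIndex + BitsPerDword - 1) & ~(BitsPerDword - 1);
}

// Number of always visible primitives that end up in the tested path
// because of the alignment. This is always less than 32.
uint32_t PrimitivesMovedToTested(uint32_t SplitIndex) {
    return AlignSplitToDword(SplitIndex) - SplitIndex;
}
```

For example, a split at index 33 would be moved to 64, pushing 31 primitives into the tested path, which is consistent with the "<32" figure in the message.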

#rb ola.olsson, zach.bethel
[FYI] brian.karis, rune.stubbe, jamie.hayes, mihnea.balta, luke.thatcher, john.huelin, ben.woodhouse, jian.ru
#jira UE-187480
[REVIEW] https://p4-swarm.epicgames.net/reviews/31997742

[CL 32496735 by graham wihlidal in 5.4 branch]
2024-03-26 00:46:51 -04:00
jason nadro
55fcd73bb3 [Shaders] - Discard non-stripped, original shader source to save on memory. Saves ~2x of the memory tracked by the LLM tag ShaderCompiler in small projects. For Lyra this saved ~600MB during a cook.
- If we need the shader source (i.e. material stats in the editor) we still keep the data around.  The editor compiles shaders on demand and doesn't try to load these from the DDC so we don't have to worry about cache pollution.
- Shader Debug Info also needs this information so we keep it around if developers are running with this enabled.
- Added a helper function to the job input which determines if we need to store the original source code. NeedsOriginalShaderSource()

#rb dan.elksnitis
#tests Lyra Cook, Lyra Editor, UEFN Cooks
[FYI] sebastian.schoner, Graeme.Thornton

[CL 32060916 by jason nadro in 5.4 branch]
2024-03-06 13:46:06 -05:00
Luke Thatcher
748a3e3a13 Fix assert when toggling the RHI thread on and off using "r.RHIThread.Enable"
- The GRHIThreadId field was zeroed by StopRenderingThread(), but never set again when starting up.
 - Fix only necessary for UE 5.4. This is already fixed in UE5.5/Main by the DevPR refactor.

#jira UE-197775
#rb zach.bethel

[CL 31901382 by Luke Thatcher in 5.4 branch]
2024-02-29 04:34:54 -05:00
jimmy andrews
32fdb471b8 fix unreachable implicit "++iterator" in cases where a for-loop was used to just grab the first element of a set or map
#rb lonnie.li
#lockdown michael.balzer
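
For readers unfamiliar with the pattern, here is a hedged sketch using standard containers rather than the engine's TSet/TMap. The message does not say how the affected call sites were rewritten, so `FirstViaBegin` is only one possible alternative, and all function names here are hypothetical.

```cpp
#include <cassert>
#include <set>

// The pattern the message describes: the loop body always returns on the
// first iteration, so the implicit ++iterator is never reached.
bool FirstViaLoop(const std::set<int>& Values, int& OutFirst) {
    for (int Value : Values) {
        OutFirst = Value;
        return true;
    }
    return false;
}

// One possible rewrite that states the intent directly and has no loop
// increment at all.
bool FirstViaBegin(const std::set<int>& Values, int& OutFirst) {
    if (Values.empty()) {
        return false;
    }
    OutFirst = *Values.begin();
    return true;
}

// Sanity check that both helpers agree for a given set.
void CheckHelpersAgree(const std::set<int>& Values) {
    int A = 0;
    int B = 0;
    const bool FoundA = FirstViaLoop(Values, A);
    const bool FoundB = FirstViaBegin(Values, B);
    assert(FoundA == FoundB);
    assert(!FoundA || A == B);
}
```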

[CL 31877332 by jimmy andrews in 5.4 branch]
2024-02-28 13:34:31 -05:00
henry falconer
38d5cddf89 Adds a project setting, r.GPUSkin.AlwaysUseDeformerForUnlimitedBoneInfluences, that allows you to enable Unlimited Bone Influences in a project without compiling extra shader permutations for GPU skinning. This saves runtime memory, disk space and shader compilation time.
When the setting is enabled, any mesh LODs using Unlimited Bone Influences that don't have a deformer assigned will use the DeformerGraph plugin's default deformer. This ensures that UBI meshes are always rendered with a deformer, and therefore the GPU skinning permutations for UBI aren't needed.

This change also adds a per-LOD setting that allows users to disable mesh deformers on a LOD, which could be useful for controlling performance, e.g. disabling an expensive deformer on lower LODs. Some changes to functions on USkinnedMeshComponent lay the foundations for having different deformers on different LODs as well.

#rb Jeremy.Moore, daniele.vettorel

[CL 31869023 by henry falconer in 5.4 branch]
2024-02-28 08:09:12 -05:00
bob tellez
aa7f058350 [Backout] - CL31423450
[FYI] dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shaders]
- store shadertype associations for each shader in a shadermap in the editor only data
- output a debug artifact containing per-shadertype permutation/memory stats; note that this is not directly representative of final shader memory usage since it doesn't account for shader library deduplication or shader library chunk re-duplication; it is only intended to be used as a tool for tracking/identifying shader growth due to added/modified shadertypes
- remove the old "dumpshadercodestats" path from the shader library, as well as the stats tracking "unique" shaders and shader memory; the former is no longer used and the latter does not correctly account for library chunking
- bump shader version due to shadermap editoronly data change

#rb Arciel.Rekman

[CL 31775922 by bob tellez in 5.4 branch]
2024-02-23 16:00:17 -05:00
zach bethel
6807cbeffc Fixed r.rdg.clobberresources crash and incorrect max discard transition for transient resources.
#jira UE-207384
#rb kenzo.terelst, mihnea.balta

[CL 31676689 by zach bethel in 5.4 branch]
2024-02-21 03:46:51 -05:00
ola olsson
0e37c1ebc0 Fixed tracking of dirty instance hierarchy cells for the purposes of explicit bounds update.
#jira UE-204867, FORT-707131
#rb rune.stubbe
#rnx

[CL 31641079 by ola olsson in 5.4 branch]
2024-02-20 06:42:41 -05:00
graham wihlidal
a3327e8829 Cleaned up shader bundle public API to remove unused buffer arguments, changed record index and platform data over to root constants, and fixed some flow control issues on some platforms.
#jira UE-207367
[FYI]
#rb yuriy.odonnell, luke.thatcher

[CL 31632949 by graham wihlidal in 5.4 branch]
2024-02-19 18:55:05 -05:00
carl lloyd
f1f8c9300a Added support for Metal Shader Converter with Bindless in MetalRHI
- Available on SM6
 - Disabled with config
 - Includes support for heap allocations on Metal, enabled by default, can be disabled with -nometalheap

#rb Luke.Thatcher, Chris.Waters, Laura.Hermanns
#jira UE-204112

[CL 31527528 by carl lloyd in 5.4 branch]
2024-02-15 13:52:54 -05:00
carl lloyd
c1edfa3ade Added support for Metal Shader Converter to Metal
- Disabled by default
    - Removed MetalDerivedData and moved shader compiler specific code to separate files

#rb Laura.Hermanns
#jira UE-204112

[CL 31524725 by carl lloyd in 5.4 branch]
2024-02-15 12:38:42 -05:00
Sebastien Hillaire
130dc2c49b Lumen - added render settings to remove all translucent surfaces from the Lumen scene when it is known that translucent refraction will not be used by a project.
#rb jamie.hayes, Krzysztof.Narkowicz
#jira none

[CL 31519340 by Sebastien Hillaire in 5.4 branch]
2024-02-15 10:06:59 -05:00