1936 Commits

Author SHA1 Message Date
sebastian schoner
c301fbf538 Improve error handling for failed shader preprocessing
Shader preprocessing currently uses the same exception handler as shader compilation. The exception handler only makes sense when we call into platform shader compiler code, which we don't do for preprocessing. Otherwise it is preferable to use the error handling that already exists in the engine (the current exception handler turns fatal errors into warnings, for example).

Furthermore, the exception handler assumes that the global error history always includes a callstack. That is not the case and can cause ugly failures during error handling. This CL inserts a check and falls back to the general mechanism for extracting the callstack if no callstack is found. It's worth noting that this general mechanism will return the callstack of the exception handler, and not the callstack that caused the exception: the act of handling the exception will unwind the callstack and destroy that information.


#rb dan.elksnitis

[CL 32152284 by sebastian schoner in ue5-main branch]
2024-03-11 08:19:47 -04:00
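
Note on c301fbf538: the "check, then fall back" behaviour described in this commit can be sketched roughly as below. This is a minimal illustration, not the engine's code: the marker string, the function name `ExtractCallstackSketch`, and the callback parameter are all hypothetical, since the commit message does not show how the error history is laid out.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical marker that introduces the callstack inside an error history
// string. The real engine format is not given in the commit message.
static const std::string GCallstackMarkerSketch = "[Callstack]";

// Returns the callstack stored in ErrorHistory when one is present.
// Otherwise falls back to CaptureCurrentCallstack, which (as the commit
// notes) can only describe the handler's own stack, not the faulting one.
std::string ExtractCallstackSketch(
    const std::string& ErrorHistory,
    const std::function<std::string()>& CaptureCurrentCallstack)
{
    const std::size_t MarkerPos = ErrorHistory.find(GCallstackMarkerSketch);
    if (MarkerPos == std::string::npos)
    {
        // No callstack recorded: use the general mechanism instead of
        // assuming the marker is always there.
        return CaptureCurrentCallstack();
    }
    return ErrorHistory.substr(MarkerPos + GCallstackMarkerSketch.size());
}
```

The point of the check is that a missing callstack becomes an ordinary code path rather than a failure inside the error handler itself.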
tiago costa
5479150be6 Added missing profiling scopes related to raytracing and renamed some existing ones.
[FYI] aleksander.netzel

[CL 32152062 by tiago costa in ue5-main branch]
2024-03-11 08:05:15 -04:00
dan elksnitis
721a361177 [shaders] optimize AppendKeyStringShaderDependencies by caching FShaderParametersMetadata pointers in each shader/vf type instead of uniform buffer names; this eliminates a bunch of string hashing/comparison/fname construction that was previously needed every time this function is called and improves performance of GetMaterialShaderMapKeyString by approximately 20%
#rb christopher.waters

[CL 32085954 by dan elksnitis in ue5-main branch]
2024-03-07 10:17:38 -05:00
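
Note on 721a361177: the idea of caching metadata pointers per shader type, so that building the key string no longer needs name lookups, might look something like the sketch below. All type and member names here (`FMetadataSketch`, `FMetadataRegistrySketch`, `FShaderTypeSketch`, `LayoutSignature`) are stand-ins invented for illustration; the real `FShaderParametersMetadata` and `AppendKeyStringShaderDependencies` are not reproduced.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for per-uniform-buffer metadata (hypothetical).
struct FMetadataSketch
{
    std::string LayoutSignature;
};

// Stand-in registry that resolves a uniform buffer name to its metadata.
class FMetadataRegistrySketch
{
public:
    void Register(const std::string& Name, const std::string& Signature)
    {
        ByName[Name] = FMetadataSketch{Signature};
    }

    const FMetadataSketch* Find(const std::string& Name) const
    {
        const auto It = ByName.find(Name);
        return It != ByName.end() ? &It->second : nullptr;
    }

private:
    // Pointers to unordered_map values stay valid until the element is erased.
    std::unordered_map<std::string, FMetadataSketch> ByName;
};

struct FShaderTypeSketch
{
    std::vector<std::string> ReferencedBufferNames;
    std::vector<const FMetadataSketch*> CachedMetadata;

    // Pays the string hashing and lookup cost once, up front.
    void CacheMetadata(const FMetadataRegistrySketch& Registry)
    {
        CachedMetadata.clear();
        for (const std::string& Name : ReferencedBufferNames)
        {
            if (const FMetadataSketch* Found = Registry.Find(Name))
            {
                CachedMetadata.push_back(Found);
            }
        }
    }

    // Hot path: only pointer dereferences, no name lookups.
    void AppendKeyString(std::string& Key) const
    {
        for (const FMetadataSketch* Metadata : CachedMetadata)
        {
            Key += Metadata->LayoutSignature;
        }
    }
};
```

In this sketch the cache must be rebuilt if the registry entries are removed; how the engine handles invalidation is not stated in the commit message.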
dan elksnitis
595eecd18e [shaders] optimize global shader config overrides mechanism to store a cache of EShaderPlatform to constructed FPlatformData; these were already cached per shader format but we weren't caching the platform -> format association (and enumerating all target formats for each target platform is actually surprisingly slow); this improves performance of GetGlobalShaderMapKeyString by approximately 4x
#rb christopher.waters, Laura.Hermanns

[CL 32084310 by dan elksnitis in ue5-main branch]
2024-03-07 09:19:10 -05:00
jason nadro
5751f3b142 [Shaders] - Discard non-stripped, original shader source to save on memory. Saves ~2x of the memory tracked by the LLM tag ShaderCompiler in small projects. For Lyra this saved ~600MB during a cook.
- If we need the shader source (i.e. material stats in the editor) we still keep the data around.  The editor compiles shaders on demand and doesn't try to load these from the DDC so we don't have to worry about cache pollution.
- Shader Debug Info also needs this information so we keep it around if developers are running with this enabled.
- Added a helper function to the job input which determines if we need to store the original source code. NeedsOriginalShaderSource()

#rb dan.elksnitis
#tests Lyra Cook, Lyra Editor, UEFN Cooks
[FYI] sebastian.schoner, Graeme.Thornton

[CL 32061613 by jason nadro in ue5-main branch]
2024-03-06 14:02:44 -05:00
luke thatcher
8b297e8d54 Deprecate GetRHIPipelines() and EnumerateRHIPipelines()
- Replaced with range iteration of set bits, using MakeFlagsRange().
- Results in better code gen.

#rb zach.bethel

[CL 32049770 by luke thatcher in ue5-main branch]
2024-03-06 06:47:07 -05:00
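
Note on 8b297e8d54: "range iteration of set bits" can be illustrated with a small range type that yields one set bit per step. This is a sketch of the general technique only; `FSetBitRangeSketch` and `CollectSetBitsSketch` are hypothetical names, and the actual signature and behaviour of `MakeFlagsRange()` are not shown in the commit message.

```cpp
#include <cstdint>
#include <vector>

// Minimal range over the set bits of a mask, lowest bit first (hypothetical).
class FSetBitRangeSketch
{
public:
    class FIterator
    {
    public:
        explicit FIterator(std::uint32_t InRemaining) : Remaining(InRemaining) {}

        // Isolates the lowest set bit of the remaining mask.
        std::uint32_t operator*() const { return Remaining & (~Remaining + 1u); }

        // Clears the lowest set bit.
        FIterator& operator++()
        {
            Remaining &= Remaining - 1u;
            return *this;
        }

        bool operator!=(const FIterator& Other) const { return Remaining != Other.Remaining; }

    private:
        std::uint32_t Remaining;
    };

    explicit FSetBitRangeSketch(std::uint32_t InMask) : Mask(InMask) {}

    FIterator begin() const { return FIterator(Mask); }
    FIterator end() const { return FIterator(0u); }

private:
    std::uint32_t Mask;
};

// Usage helper: gathers each set bit as its own single-bit mask.
std::vector<std::uint32_t> CollectSetBitsSketch(std::uint32_t Mask)
{
    std::vector<std::uint32_t> Bits;
    for (const std::uint32_t Bit : FSetBitRangeSketch(Mask))
    {
        Bits.push_back(Bit);
    }
    return Bits;
}
```

The loop body runs once per set bit and needs no callback or temporary array, which is one plausible reason a range of this shape could generate tighter code than an enumerate-with-callback API.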
zach bethel
d54fcc2269 Transient allocator memory improvements
- Async compute resources are able to alias with each other on the async compute pipe.
  - Added r.RDG.AsyncComputeTransientAliasing and GRHIGlobals.SupportsAsyncComputeTransientAliasing to control at runtime and per-platform, respectively.
- Offload D3D12 create placed resource calls to tasks.
  - Placed resource creation in D3D12 is very expensive relative to other platforms.
- Fixed tracking of transient memory on platforms that support virtual pages.
- Added more debug information to RDG insights.
- Refactored D3D12 barrier implementation to support acquire / discard operations on async compute.

#jira UE-198603
#rb Luke.Thatcher

[CL 32042201 by zach bethel in ue5-main branch]
2024-03-05 19:32:30 -05:00
zach bethel
f8671a1db5 Make RDG use 32 bit handles.
- Back compatible with legacy 16 bit insights traces.
- Fixed SkipTracking validation to not check states to allow for writable access.

[CL 32025670 by zach bethel in ue5-main branch]
2024-03-05 12:58:21 -05:00
Wei Liu
3f7b9994b3 Support Vulkan pre-rotation on mobile.
- Create an intermediate image as the BackBuffer and render normally, then copy the image to the actual BackBuffer in the low level.
- Remove the QCom pre-rotation implementation.

#jira none

#rb Dmitriy.Dyomin, florian.penzkofer, jeannoe.morissette

[CL 32014454 by Wei Liu in ue5-main branch]
2024-03-05 04:21:29 -05:00
graham wihlidal
e8a215eee1 Heavily optimized InitViews and ComputeRelevancy for Nanite scenes (up to 8x faster on the render thread in CitySample) - requires Nanite compute materials to be enabled.
==
* Off by default pending extensive testing (r.Visibility.SkipAlwaysVisible=1 to enable)
* Now all the primitive arrays are kept sorted into a "tested" vs "always visible" partition, which are then internally sorted by proxy type for cache coherency, using the primitive component ID as a tie breaker and to improve determinism (new implementation of FPrimitiveArraySortKey)
* The partition split location for "tested" vs. "always visible" is very efficiently calculated from the TypeOffsetTable using only a few loop iterations (as opposed to scanning the array), stored in Scene.PrimitivesAlwaysVisibleOffset (~0u if no always visible partition, or if optimization is disabled)
* The partition split location is aligned up to the next full dword - this is to avoid having a single dword spanning "tested" and "always visible" primitives making the lockless parallel calculations much more efficient. This will push a few (<32) primitives from always visible into the tested path, but this is not a big deal.
* FPrimitiveSceneProxy has a bIsAlwaysVisible flag (default false) that determines which partition it gets sorted into. Nanite determines this value with Nanite::FSceneProxyBase::SupportsAlwaysVisible()
* Countless places now only iterate from 0...StartOfAlwaysVisible range instead of 0...NumPrimitives, dropping the tested primitive count by orders of magnitude.
* r.Nanite.OptimizedRelevance=1 has been in place for a few years now, so this optimization now enforces optimized relevance in !WITH_EDITOR builds, and relies on assumptions from this (r.Nanite.OptimizedRelevance has been baked down and cvar removed)
* Nanite forces off IsUsingDistanceCullFade now so that any Nanite proxies going through FrustumCull will no longer test for fade distances (which are not even supported by Nanite)
* A new UpdateAlwaysVisible task is launched prior to FrustumCull (which waits on ray tracing if needed), and then FrustumCull waits on this new task. This task is responsible for filling the PrimitiveVisibilityMap for always visible primitives, and also for testing ray tracing culling per primitive for the TLAS
* stat initviews now contains an additional "Always Visible" timer that shows the thread time spent on processing the always visible primitives (primarily ray tracing cull tests for the TLAS) - this is not a measure of wall time though, as it's largely async
* Renamed (to match other methods like prepass) FDeferredShadingSceneRenderer::IsNaniteEnabled() -> ShouldRenderNanite() and moved it to the base FSceneRenderer under a virtual that can be checked in the FComputeAndMarkRelevance task
* Deleted legacy and broken r.Visibility.PrimitiveCull.SkipNanite cvar and logic
* FPrimitiveSceneInfo::CacheNaniteMaterialBins now calculates material relevance across all shading pipelines, and merges them into a combined primitive view relevance struct (without any view dependent bits, as Nanite does not have any per-view relevance tests that are respected)
* The combined Nanite relevance is now merged into FViewInfo at the end of FComputeAndMarkRelevance::Finalize(), after waiting for the CacheNaniteMaterialBins task to finish (which computes the combined relevance)
* Switched FrustumCull and UpdateAlwaysVisible tasks to use RESTRICT* instead of GetData() to allow the compiler to perform additional optimizations
* Nanite "tested" primitives no longer create mesh draw commands when going through FDrawCommandRelevancePacket::AddCommandsForMesh
* FRelevancePacket::Finalize() now completely skips creating visible cached mesh draw commands for the NaniteMeshPass
* Optimizing the relevance code was critical after pushing 60k+ primitives (in CitySample) down the always visible pass, because now there were ~124k static mesh relevances to process, which would previously rely on accurate visibility. It was an insane amount of overhead to compute something that is effectively not a per-primitive or per-mesh decision in Nanite.
* Nanite Proxies using features like custom depth, lighting channels, or static lighting will be forced down the "tested" path. Additionally, the editor will disable the "always visible" optimization due to various features like hit proxies or debug view modes relying on dynamic relevance (this will hopefully be addressed in the future).

#rb ola.olsson, zach.bethel
[FYI] brian.karis, rune.stubbe, jamie.hayes, mihnea.balta, luke.thatcher, john.huelin, ben.woodhouse, jian.ru
#jira UE-187480
[REVIEW] https://p4-swarm.epicgames.net/reviews/31997742

[CL 32010445 by graham wihlidal in ue5-main branch]
2024-03-04 21:25:27 -05:00
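
Note on e8a215eee1: the "tested" vs "always visible" partition, the sort order within it, and the dword-aligned split can be sketched as follows. This is an illustration under assumptions: `FPrimitiveSketch` and the helper names are hypothetical, the real `FPrimitiveArraySortKey` layout is not shown in the commit message, and the split is found here with a binary search rather than from the TypeOffsetTable the commit describes.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the per-primitive data that feeds the sort key.
struct FPrimitiveSketch
{
    bool bAlwaysVisible;
    std::uint32_t ProxyType;
    std::uint32_t ComponentId;
};

// Tested primitives first, then by proxy type for cache coherency,
// with the component ID as a deterministic tie breaker.
inline bool SortsBeforeSketch(const FPrimitiveSketch& A, const FPrimitiveSketch& B)
{
    return std::tie(A.bAlwaysVisible, A.ProxyType, A.ComponentId)
         < std::tie(B.bAlwaysVisible, B.ProxyType, B.ComponentId);
}

inline void SortPrimitivesSketch(std::vector<FPrimitiveSketch>& Primitives)
{
    std::sort(Primitives.begin(), Primitives.end(), SortsBeforeSketch);
}

// Index of the first always visible primitive in a sorted array.
inline std::uint32_t FindAlwaysVisibleOffsetSketch(const std::vector<FPrimitiveSketch>& Sorted)
{
    const auto It = std::partition_point(
        Sorted.begin(), Sorted.end(),
        [](const FPrimitiveSketch& P) { return !P.bAlwaysVisible; });
    return static_cast<std::uint32_t>(It - Sorted.begin());
}

// Rounds the split up to a multiple of 32 so that no 32-bit word of a
// visibility bit array spans both partitions.
inline std::uint32_t AlignUpToDwordSketch(std::uint32_t Offset)
{
    return (Offset + 31u) & ~31u;
}
```

Rounding up moves the split later in the array, which is why, as the commit notes, fewer than 32 always visible primitives end up on the tested path. How the engine clamps the aligned offset against the total primitive count is not stated and is left out of this sketch.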
mihnea balta
b4b63af128 Fix RDG clobber mode trying to fill block compressed textures with RHIClearUAVFloat.
These UAVs need to be handled as UINT, not float. Also fixed some undefined float to int type casting in the function which was computing the integer clobber value.

#jira UE-207822
#rnx
#rb zach.bethel

[CL 31912873 by mihnea balta in ue5-main branch]
2024-02-29 12:18:26 -05:00
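
Note on b4b63af128: converting a float to an integer is undefined in C++ when the truncated value does not fit the destination type (this includes NaN and negative inputs to an unsigned target). The commit does not show its fix, so the following is only a sketch of one common way to make such a conversion well defined; `FloatToUintSaturatedSketch` is a hypothetical name.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Converts a float to uint32 without undefined behaviour by handling
// NaN and out-of-range inputs before the cast.
std::uint32_t FloatToUintSaturatedSketch(float Value)
{
    if (std::isnan(Value) || Value <= 0.0f)
    {
        return 0u;
    }
    // 4294967296.0f is exactly 2^32, the first value that no longer fits.
    if (Value >= 4294967296.0f)
    {
        return std::numeric_limits<std::uint32_t>::max();
    }
    // Now in (0, 2^32), so the truncating cast is well defined.
    return static_cast<std::uint32_t>(Value);
}
```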
florin pascu
77010d118e Make RenderUtils read from TargetPlatformSettings.
Fixes the crashes/wrong rendering preview had when the sdk was not installed for the previewed platform
#jira UE-207071, UE-198717
#rb Jack.Porter

[CL 31901275 by florin pascu in ue5-main branch]
2024-02-29 04:15:58 -05:00
tiago costa
c36f9e2e6c Workaround deprecation warnings caused by default constructor/operators of FRayTracingGeometry.
[CL 31883292 by tiago costa in ue5-main branch]
2024-02-28 15:51:10 -05:00
tiago costa
e1e92e4284 Deprecate public access to RayTracingGeometryRHI in FRayTracingGeometry.
- use GetRHI() instead.

[FYI] aleksander.netzel

[CL 31843065 by tiago costa in ue5-main branch]
2024-02-27 11:57:07 -05:00
dan elksnitis
3bf0293c8e [shaders] hash all ueshadermetadata directives found when preprocessing shaders for inclusion in the per-shader DDC key, not just VERSION ones. others are typically used to modify compilation behaviour in some manner.
#rb Yuriy.ODonnell

[CL 31839145 by dan elksnitis in ue5-main branch]
2024-02-27 09:30:44 -05:00
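
Note on 3bf0293c8e: folding every metadata directive into the cache key, rather than only the VERSION ones, means that a change to any directive produces a different key. A minimal sketch follows, using FNV-1a as a stand-in hash (the engine's actual hash is not stated in the commit message). The function name and the directive strings used with it are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hashes every directive string into a single 64-bit value (FNV-1a).
// A separator byte is mixed in after each directive so that the
// boundaries between directives also affect the result.
std::uint64_t HashDirectivesSketch(const std::vector<std::string>& Directives)
{
    std::uint64_t Hash = 14695981039346656037ull;   // FNV-1a 64-bit offset basis
    const std::uint64_t Prime = 1099511628211ull;   // FNV-1a 64-bit prime
    for (const std::string& Directive : Directives)
    {
        for (const unsigned char Ch : Directive)
        {
            Hash ^= Ch;
            Hash *= Prime;
        }
        Hash ^= 0xFFu;
        Hash *= Prime;
    }
    return Hash;
}
```

With this shape, two shaders that share a VERSION directive but differ in some other directive no longer collide on the same key.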
bob tellez
b9a0ee14a3 [Backout] - CL31423450
[FYI] dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shaders]
- store shadertype associations for each shader in a shadermap in the editor only data
- output a debug artifact containing per-shadertype permutation/memory stats; note that this is not directly representative of final shader memory usage since it doesn't account for shader library deduplication or shader library chunk re-duplication; it is only intended to be used as a tool for tracking/identifying shader growth due to added/modified shadertypes
- remove the old "dumpshadercodestats" path from the shader library, as well as the stats tracking "unique" shaders and shader memory; the former is no longer used and the latter does not correctly account for library chunking
- bump shader version due to shadermap editoronly data change

#rb Arciel.Rekman

[CL 31780302 by bob tellez in ue5-main branch]
2024-02-23 17:25:34 -05:00
christopher waters
68308c5193 Shader statistics
- Adding utility function for easier additions to FShaderCompilerOutput::ShaderStatistics
- Simplified existing uses of FShaderCompilerOutput::ShaderStatistics
- Reworked D3D compilation to allow DXC vs FXC to specify their different shader limits.
- Adding Shader Statistics for D3D SamplerState and Resource usages.
- Adding Shader Statistics for D3D SM6.6 bindless resource usages.

#rb Jason.Nadro, Laura.Hermanns

[CL 31740056 by christopher waters in ue5-main branch]
2024-02-22 18:16:34 -05:00
luke thatcher
01203093c6 Deprecate:
- FRHITexture2D
- FRHITexture2DArray
- FRHITexture3D
- FRHITextureCube
- FTexture2DRHIRef
- FTexture2DArrayRHIRef
- FTexture3DRHIRef
- FTextureCubeRHIRef

Replaced with FRHITexture and FTextureRHIRef

These types were unified in UE 5.1 and have been defined via "using" statements to the same underlying texture type for several engine releases.

#rb christopher.waters

[CL 31724002 by luke thatcher in ue5-main branch]
2024-02-22 11:38:35 -05:00
luke thatcher
808b695e4f Replace use of FRHICommandListExecutor::GetImmediateCommandList() with FRHICommandListImmediate::Get()
- Only in places where it is trivially proven the call is only made on the render thread, due to an existing check(IsInRenderingThread()) assert somewhere in the function.
- FRHICommandListImmediate::Get() itself contains a check(IsInRenderingThread()), so this enforces correct threading, and removes the need for extra checks at the call sites.
- Remaining uses of FRHICommandListExecutor::GetImmediateCommandList() need investigation. Some may be bugs.
- Also some changes to make use of the passed-in RHICmdList where possible (e.g. render commands that are given the immediate command list, but call the global getter rather than using the argument they were given).

#rb zach.bethel

[CL 31699633 by luke thatcher in ue5-main branch]
2024-02-21 17:26:04 -05:00
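
Note on 808b695e4f: the pattern of a getter that asserts its own threading requirement, so call sites do not have to, can be sketched with the standard library as below. `FImmediateListSketch` and its members are hypothetical; the real `FRHICommandListImmediate::Get()` uses the engine's `check(IsInRenderingThread())`, which is not reproduced here.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for an object that may only be used from one thread.
class FImmediateListSketch
{
public:
    // Records the calling thread as the only one allowed to call Get().
    static void BindOwningThread()
    {
        OwningThread() = std::this_thread::get_id();
    }

    static bool IsInOwningThread()
    {
        return std::this_thread::get_id() == OwningThread();
    }

    // The getter enforces the threading rule itself.
    static FImmediateListSketch& Get()
    {
        assert(IsInOwningThread());
        static FImmediateListSketch Instance;
        return Instance;
    }

private:
    FImmediateListSketch() = default;

    static std::thread::id& OwningThread()
    {
        static std::thread::id Id;
        return Id;
    }
};
```

Centralising the assertion in the accessor means a call from the wrong thread fails at the point of access, rather than depending on each caller remembering its own check.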
zach bethel
5ad1f324c6 Fixed r.rdg.clobberresources crash and incorrect max discard transition for transient resources.
#jira UE-207384
#rb kenzo.terelst, mihnea.balta

[CL 31679358 by zach bethel in ue5-main branch]
2024-02-21 08:42:30 -05:00
wouter dek
677a7d6665 Parallelize UpdateReferencedUniformBufferNames, which saves a second on startup
#rb dan.elksnitis, jason.hoerner

[CL 31649258 by wouter dek in ue5-main branch]
2024-02-20 12:37:39 -05:00
christopher waters
485342b86f Removing a large chunk of deprecated code from RHI.
Removing more deprecated code that used SetParameters with command lists.

[CL 31645019 by christopher waters in ue5-main branch]
2024-02-20 10:13:25 -05:00
graham wihlidal
27cc9c66ea Cleaned up shader bundle public API to remove unused buffer arguments, changed record index and platform data over to root constants, and fixed some flow control issues on some platforms.
#jira UE-207367
[FYI]
#rb yuriy.odonnell, luke.thatcher

[CL 31633290 by graham wihlidal in ue5-main branch]
2024-02-19 19:00:27 -05:00
christopher waters
e294adac0f Removing per-parameter binding code that was deprecated in 5.3
#jira UE-194669

[CL 31629525 by christopher waters in ue5-main branch]
2024-02-19 17:52:57 -05:00
steve robb
f8d47335a4 Replaced RemoveAt(N, 1, EAllowShrinking::*) with RemoveAt(N, EAllowShrinking::*).
[CL 31626444 by steve robb in ue5-main branch]
2024-02-19 16:51:58 -05:00
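
Note on f8d47335a4: the replacement relies on a single-element overload that takes the shrinking policy directly, so the explicit count of 1 becomes redundant. The sketch below shows how such an overload pair can coexist without ambiguity. `TArraySketch` and `EAllowShrinkingSketch` are hypothetical stand-ins built on `std::vector`, not the engine's `TArray`.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

enum class EAllowShrinkingSketch { No, Yes };

template <typename T>
class TArraySketch
{
public:
    TArraySketch(std::initializer_list<T> Init) : Items(Init) {}

    // Removes Count elements starting at Index.
    void RemoveAt(std::size_t Index, std::size_t Count,
                  EAllowShrinkingSketch AllowShrinking = EAllowShrinkingSketch::Yes)
    {
        assert(Index + Count <= Items.size());
        const auto First = Items.begin() + static_cast<std::ptrdiff_t>(Index);
        Items.erase(First, First + static_cast<std::ptrdiff_t>(Count));
        if (AllowShrinking == EAllowShrinkingSketch::Yes)
        {
            Items.shrink_to_fit();
        }
    }

    // Single-element form: the scoped enum never converts to a count,
    // so RemoveAt(N, Policy) always resolves to this overload.
    void RemoveAt(std::size_t Index, EAllowShrinkingSketch AllowShrinking)
    {
        RemoveAt(Index, 1, AllowShrinking);
    }

    std::size_t Num() const { return Items.size(); }
    const T& operator[](std::size_t Index) const { return Items[Index]; }

private:
    std::vector<T> Items;
};
```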