Shader preprocessing currently uses the same exception handler as shader compilation. The exception handler only makes sense when we call into platform shader compiler code, which we don't do for preprocessing. In all other cases it is preferable to use the engine's existing error handling (the current exception handler turns fatal errors into warnings, for example).
Furthermore, the exception handler assumes that the global error history always includes a callstack. That is not the case and can cause ugly failures during error handling. This CL inserts a check and falls back to the general mechanism for extracting the callstack if no callstack is found. It's worth noting that this general mechanism will return the callstack of the exception handler, and not the callstack that caused the exception: the act of handling the exception will unwind the callstack and destroy that information.
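The check-and-fall-back behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the engine code: the "Callstack:" marker, ExtractCallstackSketch() and CaptureCurrentCallstackSketch() are all hypothetical names invented for the example.

```cpp
#include <string>

// Hypothetical stand-in for the general callstack mechanism. As noted above,
// such a capture describes the handler's own stack, not the faulting one.
inline std::string CaptureCurrentCallstackSketch()
{
    return "<callstack of the exception handler>";
}

// Returns the callstack stored in the error history when one is present,
// and falls back to the general mechanism otherwise.
inline std::string ExtractCallstackSketch(const std::string& ErrorHistory)
{
    const std::string Marker = "Callstack:"; // assumed marker, for illustration only
    const std::string::size_type Pos = ErrorHistory.find(Marker);
    if (Pos != std::string::npos)
    {
        return ErrorHistory.substr(Pos);
    }
    return CaptureCurrentCallstackSketch();
}
```

The point of the sketch is only that the handler no longer assumes the marker is always there.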
#rb dan.elksnitis
[CL 32152284 by sebastian schoner in ue5-main branch]
- If we need the shader source (i.e. material stats in the editor) we still keep the data around. The editor compiles shaders on demand and doesn't try to load these from the DDC so we don't have to worry about cache pollution.
- Shader Debug Info also needs this information so we keep it around if developers are running with this enabled.
- Added a helper function to the job input, NeedsOriginalShaderSource(), which determines whether we need to store the original source code.
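As a rough sketch of what such a helper might look like, assuming the two conditions listed above are the only inputs: the struct and its member names below are hypothetical, and only the NeedsOriginalShaderSource() name comes from this description.

```cpp
// Hypothetical stand-in for the shader compile job input. The real engine
// type and its members are not shown in this change description.
struct FJobInputSketch
{
    bool bSourceRequested = false;   // assumed: e.g. material stats in the editor
    bool bDebugInfoEnabled = false;  // assumed: Shader Debug Info is enabled

    // Returns true when the original source code has to be kept around.
    bool NeedsOriginalShaderSource() const
    {
        return bSourceRequested || bDebugInfoEnabled;
    }
};
```

Keeping the decision in one helper means call sites do not need to repeat the individual conditions.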
#rb dan.elksnitis
#tests Lyra Cook, Lyra Editor, UEFN Cooks
[FYI] sebastian.schoner, Graeme.Thornton
[CL 32061613 by jason nadro in ue5-main branch]
- Replaced with range iteration of set bits, using MakeFlagsRange().
- Results in better code gen.
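For readers unfamiliar with the pattern, range iteration over set bits can be sketched in standalone C++ as below. This is not the implementation of MakeFlagsRange(); the enum and class names are hypothetical and exist only to show the idea of visiting each set bit once.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flag enum, for illustration only.
enum class EPipeSketch : uint32_t
{
    None         = 0,
    Graphics     = 1u << 0,
    AsyncCompute = 1u << 1,
    Copy         = 1u << 2
};

// Minimal range over the set bits of a flags value. Each step yields the
// lowest set bit as a flag and then clears it, so the loop body runs once
// per set bit rather than once per possible bit.
class FSetBitRangeSketch
{
public:
    class FIterator
    {
    public:
        explicit FIterator(uint32_t InBits) : Bits(InBits) {}
        EPipeSketch operator*() const { return static_cast<EPipeSketch>(Bits & (~Bits + 1u)); }
        FIterator& operator++() { Bits &= Bits - 1u; return *this; }
        bool operator!=(const FIterator& Other) const { return Bits != Other.Bits; }
    private:
        uint32_t Bits;
    };

    explicit FSetBitRangeSketch(EPipeSketch Flags) : Bits(static_cast<uint32_t>(Flags)) {}
    FIterator begin() const { return FIterator(Bits); }
    FIterator end() const { return FIterator(0u); }
private:
    uint32_t Bits;
};

// Collects the individual flags contained in Flags, lowest bit first.
inline std::vector<EPipeSketch> CollectFlags(EPipeSketch Flags)
{
    std::vector<EPipeSketch> Result;
    for (EPipeSketch Flag : FSetBitRangeSketch(Flags))
    {
        Result.push_back(Flag);
    }
    return Result;
}
```

Because the loop skips clear bits entirely, there is no per-bit branch for the compiler to emit.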
#rb zach.bethel
[CL 32049770 by luke thatcher in ue5-main branch]
- Async compute resources are able to alias with each other on the async compute pipe.
- Added r.RDG.AsyncComputeTransientAliasing and GRHIGlobals.SupportsAsyncComputeTransientAliasing to control at runtime and per-platform, respectively.
- Offload D3D12 create placed resource calls to tasks.
- Placed resource creation in D3D12 is very expensive relative to other platforms.
- Fixed tracking of transient memory on platforms that support virtual pages.
- Added more debug information to RDG insights.
- Refactored D3D12 barrier implementation to support acquire / discard operations on async compute.
#jira UE-198603
#rb Luke.Thatcher
[CL 32042201 by zach bethel in ue5-main branch]
- Backward compatible with legacy 16-bit Insights traces.
- Fixed SkipTracking validation so that it no longer checks states, which allows for writable access.
[CL 32025670 by zach bethel in ue5-main branch]
- Create an intermediate image as the BackBuffer and render normally, then copy the image to the actual BackBuffer in the low level.
- Remove the QCom pre-rotation implementation.
#jira none
#rb Dmitriy.Dyomin, florian.penzkofer, jeannoe.morissette
[CL 32014454 by Wei Liu in ue5-main branch]
* Off by default pending extensive testing (r.Visibility.SkipAlwaysVisible=1 to enable)
* Now all the primitive arrays are kept sorted into a "tested" vs "always visible" partition, which are then internally sorted by proxy type for cache coherency, using the primitive component ID as a tie breaker and to improve determinism (new implementation of FPrimitiveArraySortKey)
* The partition split location for "tested" vs. "always visible" is very efficiently calculated from the TypeOffsetTable using only a few loop iterations (as opposed to scanning the array), stored in Scene.PrimitivesAlwaysVisibleOffset (~0u if no always visible partition, or if optimization is disabled)
* The partition split location is aligned up to the next full dword. This avoids having a single dword span both "tested" and "always visible" primitives, which makes the lockless parallel calculations much more efficient. It will push a few (<32) primitives from always visible into the tested path, but this is not a big deal.
* FPrimitiveSceneProxy has a bIsAlwaysVisible flag (default false) that determines which partition it gets sorted into. Nanite determines this value with Nanite::FSceneProxyBase::SupportsAlwaysVisible()
* Countless places now only iterate over the 0...StartOfAlwaysVisible range instead of 0...NumPrimitives, dropping the tested primitive count by orders of magnitude.
* r.Nanite.OptimizedRelevance=1 has been in place for a few years now, so this optimization now enforces optimized relevance in !WITH_EDITOR builds, and relies on assumptions from this (r.Nanite.OptimizedRelevance has been baked down and cvar removed)
* Nanite forces off IsUsingDistanceCullFade now so that any Nanite proxies going through FrustumCull will no longer test for fade distances (which are not even supported by Nanite)
* A new UpdateAlwaysVisible task is launched prior to FrustumCull (which waits on ray tracing if needed), and then FrustumCull waits on this new task. This task is responsible for filling the PrimitiveVisibilityMap for always visible primitives, and also for testing ray tracing culling per primitive for the TLAS
* stat initviews now contains an additional "Always Visible" timer that shows the thread time spent on processing the always visible primitives (primarily ray tracing cull tests for the TLAS) - this is not a measure of wall time though, as it's largely async
* Renamed (to match other methods like prepass) FDeferredShadingSceneRenderer::IsNaniteEnabled() -> ShouldRenderNanite() and moved it to the base FSceneRenderer under a virtual that can be checked in the FComputeAndMarkRelevance task
* Deleted legacy and broken r.Visibility.PrimitiveCull.SkipNanite cvar and logic
* FPrimitiveSceneInfo::CacheNaniteMaterialBins now calculates material relevance across all shading pipelines, and merges them into a combined primitive view relevance struct (without any view dependent bits, as Nanite does not have any per-view relevance tests that are respected)
* The combined Nanite relevance is now merged into FViewInfo at the end of FComputeAndMarkRelevance::Finalize(), after waiting for the CacheNaniteMaterialBins task to finish (which computes the combined relevance)
* Switched FrustumCull and UpdateAlwaysVisible tasks to use RESTRICT* instead of GetData() to allow the compiler to perform additional optimizations
* Nanite "tested" primitives no longer create mesh draw commands when going through FDrawCommandRelevancePacket::AddCommandsForMesh
* FRelevancePacket::Finalize() now completely skips creating visible cached mesh draw commands for the NaniteMeshPass
* Optimizing the relevance code was critical after pushing 60k+ primitives (in CitySample) down the always visible pass, because now there were ~124k static mesh relevances to process, which would previously rely on accurate visibility. It was an insane amount of overhead to compute something that is effectively not a per-primitive or per-mesh decision in Nanite.
* Nanite Proxies using features like custom depth, lighting channels, or static lighting will be forced down the "tested" path. Additionally, the editor will disable the "always visible" optimization due to various features like hit proxies or debug view modes relying on dynamic relevance (this will hopefully be addressed in the future).
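The dword alignment of the partition split mentioned above comes down to rounding an offset up to a multiple of 32. A minimal sketch of that arithmetic follows; the function names are hypothetical and this is not the engine code.

```cpp
#include <cstdint>

constexpr uint32_t BitsPerDwordSketch = 32u;

// Rounds the raw split offset up to the next multiple of 32, so that no
// dword of a one-bit-per-primitive map holds both "tested" and
// "always visible" primitives.
constexpr uint32_t AlignSplitToDwordSketch(uint32_t RawSplitOffset)
{
    return (RawSplitOffset + (BitsPerDwordSketch - 1u)) & ~(BitsPerDwordSketch - 1u);
}

// Number of always visible primitives that end up in the tested range as a
// result of the alignment. This is always less than 32.
constexpr uint32_t PrimitivesPushedToTestedSketch(uint32_t RawSplitOffset)
{
    return AlignSplitToDwordSketch(RawSplitOffset) - RawSplitOffset;
}

static_assert(AlignSplitToDwordSketch(100u) == 128u, "100 rounds up to 128");
static_assert(PrimitivesPushedToTestedSketch(100u) == 28u, "28 primitives move to the tested path");
```

For a raw split at primitive 100, the aligned split is 128, so 28 primitives that could have been always visible are tested instead.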
#rb ola.olsson, zach.bethel
[FYI] brian.karis, rune.stubbe, jamie.hayes, mihnea.balta, luke.thatcher, john.huelin, ben.woodhouse, jian.ru
#jira UE-187480
[REVIEW] https://p4-swarm.epicgames.net/reviews/31997742
[CL 32010445 by graham wihlidal in ue5-main branch]
These UAVs need to be handled as UINT, not float. Also fixed some undefined float to int type casting in the function which was computing the integer clobber value.
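For context on the casting issue: in C++, converting a float to an integer type is undefined behaviour when the truncated value does not fit in the destination type. One common way to avoid this is to clamp before casting, as sketched below. The function name is hypothetical and this is not necessarily how the change itself addressed the problem.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Converts a float to uint32_t without invoking undefined behaviour.
// NaN and non-positive inputs map to 0, and inputs at or above 2^32
// saturate to the maximum uint32_t value.
inline uint32_t FloatToUintSaturatedSketch(float Value)
{
    if (std::isnan(Value) || Value <= 0.0f)
    {
        return 0u;
    }
    // 4294967296.0f is 2^32, which is exactly representable as a float.
    if (Value >= 4294967296.0f)
    {
        return std::numeric_limits<uint32_t>::max();
    }
    return static_cast<uint32_t>(Value); // in range, so truncation is well defined
}
```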
#jira UE-207822
#rnx
#rb zach.bethel
[CL 31912873 by mihnea balta in ue5-main branch]
Fixes the crashes and incorrect rendering that preview had when the SDK was not installed for the previewed platform.
#jira UE-207071, UE-198717
#rb Jack.Porter
[CL 31901275 by florin pascu in ue5-main branch]
[FYI] dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shaders]
- store shadertype associations for each shader in a shadermap in the editor only data
- output a debug artifact containing per-shadertype permutation/memory stats; note that this is not directly representative of final shader memory usage since it doesn't account for shader library deduplication or shader library chunk re-duplication; it is only intended to be used as a tool for tracking/identifying shader growth due to added/modified shadertypes
- remove the old "dumpshadercodestats" path from the shader library, as well as the stats tracking "unique" shaders and shader memory; the former is no longer used and the latter does not correctly account for library chunking
- bump shader version due to shadermap editoronly data change
#rb Arciel.Rekman
[CL 31780302 by bob tellez in ue5-main branch]
- Adding utility function for easier additions to FShaderCompilerOutput::ShaderStatistics
- Simplified existing uses of FShaderCompilerOutput::ShaderStatistics
- Reworked D3D compilation to allow DXC and FXC to each specify their own shader limits.
- Adding Shader Statistics for D3D SamplerState and Resource usages.
- Adding Shader Statistics for D3D SM6.6 bindless resource usages.
#rb Jason.Nadro, Laura.Hermanns
[CL 31740056 by christopher waters in ue5-main branch]
- FRHITexture2D
- FRHITexture2DArray
- FRHITexture3D
- FRHITextureCube
- FTexture2DRHIRef
- FTexture2DArrayRHIRef
- FTexture3DRHIRef
- FTextureCubeRHIRef
Replaced with FRHITexture and FTextureRHIRef
These types were unified in UE 5.1 and have been defined via "using" statements to the same underlying texture type for several engine releases.
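To illustrate why the replacement is mechanical, the sketch below shows how names defined via "using" statements refer to one and the same type. The types here are hypothetical stand-ins with a Sketch suffix, not the real RHI declarations.

```cpp
#include <type_traits>

// Hypothetical stand-in for the unified texture type.
struct FRHITextureSketch
{
    int NumMips = 1;
};

// Legacy names kept as aliases of the unified type.
using FRHITexture2DSketch = FRHITextureSketch;
using FRHITextureCubeSketch = FRHITextureSketch;

// The aliases are the same type, not derived or distinct types.
static_assert(std::is_same<FRHITexture2DSketch, FRHITextureSketch>::value,
              "alias and unified type are identical");
static_assert(std::is_same<FRHITextureCubeSketch, FRHITexture2DSketch>::value,
              "all legacy aliases name the same type");

// Code written against the unified type accepts objects declared through
// any of the legacy aliases without conversion.
inline int GetNumMipsSketch(const FRHITextureSketch& Texture)
{
    return Texture.NumMips;
}
```

Since the alias and the unified name are interchangeable to the compiler, swapping one for the other does not change behaviour.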
#rb christopher.waters
[CL 31724002 by luke thatcher in ue5-main branch]
- Only in places where it is trivially proven the call is only made on the render thread, due to an existing check(IsInRenderingThread()) assert somewhere in the function.
- FRHICommandListImmediate::Get() itself contains a check(IsInRenderingThread()), so this enforces correct threading, and removes the need for extra checks at the call sites.
- Remaining uses of FRHICommandListExecutor::GetImmediateCommandList() need investigation. Some may be bugs.
- Also some changes to make use of the passed-in RHICmdList where possible (e.g. render commands that are given the immediate command list, but call the global getter rather than using the argument they were given).
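The two ideas above, a getter that asserts the calling thread and a preference for the passed-in command list, can be sketched in standalone C++ as follows. All names carry a Sketch suffix and are hypothetical stand-ins; assert() plays the role of check() here.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for the engine's notion of the render thread. For
// this sketch, the first thread to call this function is treated as it.
inline std::thread::id& RenderThreadIdSketch()
{
    static std::thread::id Id = std::this_thread::get_id();
    return Id;
}

inline bool IsInRenderingThreadSketch()
{
    return std::this_thread::get_id() == RenderThreadIdSketch();
}

struct FImmediateCommandListSketch
{
    int NumCommands = 0;

    // The getter itself enforces the threading rule, so call sites do not
    // need their own assert.
    static FImmediateCommandListSketch& Get()
    {
        assert(IsInRenderingThreadSketch());
        static FImmediateCommandListSketch Instance;
        return Instance;
    }
};

// Uses the command list it was given instead of calling the global getter.
inline void EnqueueWorkSketch(FImmediateCommandListSketch& RHICmdList)
{
    ++RHICmdList.NumCommands;
}
```

Passing the list through as an argument also keeps a function usable with a command list other than the immediate one.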
#rb zach.bethel
[CL 31699633 by luke thatcher in ue5-main branch]