resubmit with following fixes:
- fix for a static analysis error: a >=0 check on a uint64 (always true) should have been >0
- fix for an inverted guard on multiprocess cook sending bytecode to director (was only sending code across if empty instead of non-empty)
- fix for uninitialized padding in the FShaderCodeResource::FHeader struct causing nondeterministic puts
- fix for incorrect size passed to job cache hashing on receiving buffers from DDC
#rb Devin.Doucette, Laura.Hermanns, Zousar.Shaker
#lockdown Marc.Audy
[CL 36754792 by dan elksnitis in 5.5 branch]
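
The uninitialized-padding fix above concerns byte-wise use of a struct whose padding bytes are indeterminate. The real layout of FShaderCodeResource::FHeader and the actual fix are not shown in this description, so the following is only a sketch with a hypothetical header (FExampleHeader) and hypothetical helpers (SerializeHeader, HashBytes, HashHeader). It shows one way to keep padding out of a hash by copying fields one at a time into a packed byte array; this is not necessarily what the CL does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header: most ABIs insert 3 padding bytes after Frequency.
struct FExampleHeader
{
    uint8_t  Frequency;
    uint32_t UncompressedSize;
};

const std::size_t SerializedHeaderSize = sizeof(uint8_t) + sizeof(uint32_t);

// Copies each field into a packed byte array, so padding bytes are never read.
inline std::array<unsigned char, SerializedHeaderSize> SerializeHeader(const FExampleHeader& Header)
{
    std::array<unsigned char, SerializedHeaderSize> Bytes{};
    std::memcpy(Bytes.data(), &Header.Frequency, sizeof(Header.Frequency));
    std::memcpy(Bytes.data() + sizeof(Header.Frequency),
                &Header.UncompressedSize, sizeof(Header.UncompressedSize));
    return Bytes;
}

// 64-bit FNV-1a over a byte range (chosen only for brevity).
inline uint64_t HashBytes(const unsigned char* Data, std::size_t Size)
{
    uint64_t Hash = 14695981039346656037ull;
    for (std::size_t Index = 0; Index < Size; ++Index)
    {
        Hash ^= Data[Index];
        Hash *= 1099511628211ull;
    }
    return Hash;
}

// Hashes the packed field bytes rather than the raw struct memory.
inline uint64_t HashHeader(const FExampleHeader& Header)
{
    const std::array<unsigned char, SerializedHeaderSize> Bytes = SerializeHeader(Header);
    return HashBytes(Bytes.data(), Bytes.size());
}
```

Two headers with equal fields hash identically here regardless of what their padding bytes hold. Zero-filling the struct before assigning fields is the other common approach.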
- Update to Metal Shader Converter 2.0 Beta 4 which added support for quad derivatives in Compute
- Reduced max number of shaders per library due to increased RAM usage in cooks
#rb zack.neyland
#jira UE-223492
[CL 36754144 by carl lloyd in 5.5 branch]
[FYI] dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shaders] modify FShaderCode finalize to create a FSharedBuffer object, and modify all downstream uses of shader code to re-use this buffer (job cache, pushes to DDC, shader maps, and shader library). This reduces total amount of LLM tracked memory allocated at the end of a cold Lyra PS4 cook by about 350MB; impact likely much larger for cooks of larger projects.
#rb Devin.Doucette, Zousar.Shaker
#lockdown Marc.Audy
resubmit with SA+MP cook fix
[CL 36747522 by dan elksnitis in 5.5 branch]
[FYI] dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shaders] modify FShaderCode finalize to create a FSharedBuffer object, and modify all downstream uses of shader code to re-use this buffer (job cache, pushes to DDC, shader maps, and shader library). This reduces total amount of LLM tracked memory allocated at the end of a cold Lyra PS4 cook by about 350MB; impact likely much larger for cooks of larger projects.
#rb Zousar.Shaker
#lockdown marc.audy
[CL 36440265 by dan elksnitis in 5.5 branch]
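
The memory saving described above comes from several consumers referencing one finalized buffer instead of each holding a copy. The sketch below illustrates that idea in standard C++ only; std::shared_ptr stands in for FSharedBuffer, and FSharedBytes, FinalizeCode, FExampleJobCache and FExampleShaderLibrary are hypothetical names, not engine API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for an immutable, reference-counted byte buffer.
using FSharedBytes = std::shared_ptr<const std::vector<uint8_t>>;

// Moves the finished bytecode into a single shared, read-only allocation.
inline FSharedBytes FinalizeCode(std::vector<uint8_t>&& Code)
{
    return std::make_shared<const std::vector<uint8_t>>(std::move(Code));
}

// Hypothetical consumers: each stores a reference, never a copy of the bytes.
struct FExampleJobCache
{
    std::vector<FSharedBytes> Entries;
    void Add(const FSharedBytes& Code) { Entries.push_back(Code); }
};

struct FExampleShaderLibrary
{
    std::vector<FSharedBytes> Entries;
    void Add(const FSharedBytes& Code) { Entries.push_back(Code); }
};
```

After adding the same finalized buffer to both consumers, both entries point at the same bytes, so the bytecode is resident once however many systems reference it.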
- Updated MSC to latest version 2.0 beta 3
- Removed MTLBufferPtr to make deallocations more explicit
- Re-wrote MetalTempAllocator to be a simple buffer allocator as the heap allocator had a huge perf overhead when used with Bindless
- Fixed use after free in deferred delete
- Limited SM6 to macOS 15
Changes in collaboration with Apple:
- Reworked residency management
- Replace manual resource binding/pre-draw steps with IRRuntime helpers
- Added vertex layout hashing support for MSC vertex descriptors
- Replaced VertexBuffers cache struct with MSC IRRuntimeVertexBuffer
- Fixed texture reference update by adding a texture override in SRVs (this way the texture reference SRV doesn't revert to the default resource when the view is invalidated).
- Fixed some page faults by replacing the side table allocation with temporary allocations
#jira UE-223489
#rb Luke.Thatcher
[CL 36227379 by carl lloyd in 5.5 branch]
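
The MetalTempAllocator rewrite above is described only as "a simple buffer allocator". The sketch below assumes that means a linear (bump) allocator over one backing buffer; that reading, and the FExampleLinearAllocator class itself, are assumptions for illustration and do not reflect the actual MetalRHI code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical linear allocator: hands out aligned offsets from one buffer
// and releases everything at once via Reset().
class FExampleLinearAllocator
{
public:
    explicit FExampleLinearAllocator(std::size_t Capacity)
        : Storage(Capacity), Offset(0)
    {
    }

    // Alignment must be non-zero. Returns false when the request does not fit.
    bool Allocate(std::size_t Size, std::size_t Alignment, std::size_t& OutOffset)
    {
        const std::size_t Aligned = (Offset + Alignment - 1) / Alignment * Alignment;
        if (Aligned > Storage.size() || Size > Storage.size() - Aligned)
        {
            return false;
        }
        OutOffset = Aligned;
        Offset = Aligned + Size;
        return true;
    }

    // Makes the whole buffer available again, e.g. once a frame has completed.
    void Reset() { Offset = 0; }

    uint8_t* Data() { return Storage.data(); }

private:
    std::vector<uint8_t> Storage;
    std::size_t Offset;
};
```

Each allocation is an add and a compare with no per-allocation bookkeeping, which is the usual reason to prefer this shape over a general heap for short-lived data.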
- This also reverts the DDPI information from CL 35396864, since D3D does in fact support vertex shader layers (except for D3D11.2 or older).
- Don't set USING_VERTEX_SHADER_LAYER when geometry shaders are available; this resulted in ShaderMinifier not being able to find the entry point.
#jira UE-221358
#rnx
#rb Arciel.Rekman, Sebastien.Hillaire
[FYI] Christopher.Waters, Graham.Wihlidal, Erica.Stella
[CL 35971246 by laura hermanns in ue5-main branch]
[FYI] Laura.Hermanns
Original CL Desc
-----------------------------------------------------------------
[Shaders] Set USING_VERTEX_SHADER_LAYER depending on DDPI values not hard coded in shader backend.
This also reverts the DDPI information from CL 35396864, since D3D does in fact support vertex shader layers (except for D3D11.2 or older).
#jira UE-221358
#rnx
#rb Arciel.Rekman
[FYI] Christopher.Waters, Erica.Stella
[CL 35892689 by tiago costa in ue5-main branch]
This also reverts the DDPI information from CL 35396864, since D3D does in fact support vertex shader layers (except for D3D11.2 or older).
#jira UE-221358
#rnx
#rb Arciel.Rekman
[FYI] Christopher.Waters, Erica.Stella
[CL 35880876 by laura hermanns in ue5-main branch]
- Removed MetalContext and MetalRenderPass.
- Removed all code to restart renderpasses
- Added support in the RHI for a new UploadContext which allows uploads to execute before the submission of contexts using them.
- Most functionality is now within MetalRHIContext and dependencies were removed so that multiple MetalRHIContext instances can execute in parallel.
- MetalDeviceContext has been removed and replaced with MetalDevice.
- Removed the previous FrameAllocator and replaced it with a temporary heap-based allocator.
- Metal no longer uses SubmitCommandsHint and now builds/submits command buffers through RHIFinalizeContext/RHISubmitCommandLists
- Added initial support for MetalRHI parallel encoding, can be tested with -rhiparallel.
- Removed additional temporary allocations when uploading to buffers
#rb Luke.Thatcher
#jira UE-212349
[CL 35450226 by carl lloyd in ue5-main branch]
Storing the whole metal shader source text in the optional data field was introduced with CL 3341849 for debugging purposes.
This should only be required when CFLAG_ExtraShaderData is specified (enabled via CVar r.Shaders.ExtraData).
The only place this optional data is needed outside the shader compiler is TMetalBaseShader::Init(), to initialize the field "GlslCodeNSString" which is documented as debuggable text.
Also invalidate all Metal shaders to trigger a recompilation with a much smaller memory footprint of the output shader binaries.
#rnx
#rb Arciel.Rekman, Florin.Pascu, Jason.Nadro
#lockdown Michal.Valient
[CL 35284217 by laura hermanns in ue5-main branch]
[FYI] dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shaders]
- move population of output.target field into core code so each backend doesn't need to do it manually (do so before calling the compile function so any existing code expecting the output field to be set at any point during the compile process is unaffected)
- add a check in FShaderCompileJob::SerializeOutput and FShaderCompileJob::SerializeWorkerOutput that the FShaderTarget for a job output matches that of its input. This should catch cases of shaders with the wrong frequency being associated with jobs; further the callstack should indicate where this incorrect association is coming from (since SerializeOutput is called in different places for each cache path: duplicate in-flight jobs, jobs which hit in the in-memory job cache, and jobs which hit in the DDC cache, and SerializeWorkerOutput is only called when the job is read back from SCW output).
#rb Laura.Hermanns
[CL 35080902 by dan elksnitis in ue5-main branch]
- move population of output.target field into core code so each backend doesn't need to do it manually (do so before calling the compile function so any existing code expecting the output field to be set at any point during the compile process is unaffected)
- add a check in FShaderCompileJob::SerializeOutput and FShaderCompileJob::SerializeWorkerOutput that the FShaderTarget for a job output matches that of its input. This should catch cases of shaders with the wrong frequency being associated with jobs; further the callstack should indicate where this incorrect association is coming from (since SerializeOutput is called in different places for each cache path: duplicate in-flight jobs, jobs which hit in the in-memory job cache, and jobs which hit in the DDC cache, and SerializeWorkerOutput is only called when the job is read back from SCW output).
#rb Laura.Hermanns
[CL 35053511 by dan elksnitis in ue5-main branch]
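
The two bullets above pair up: core code fills in the output target before the backend runs, and serialization later checks that the output target still matches the input. A minimal sketch of that pairing follows. FExampleShaderTarget, FExampleCompileJob, PrepareJob and OutputMatchesInput are hypothetical stand-ins; the real FShaderTarget and FShaderCompileJob definitions are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduced target: frequency (shader stage) plus platform.
struct FExampleShaderTarget
{
    uint32_t Frequency;
    uint32_t Platform;
};

inline bool operator==(const FExampleShaderTarget& A, const FExampleShaderTarget& B)
{
    return A.Frequency == B.Frequency && A.Platform == B.Platform;
}

struct FExampleCompileJob
{
    FExampleShaderTarget InputTarget;
    FExampleShaderTarget OutputTarget;
};

// Core-side step run before the backend compile function is called,
// so backends no longer need to set the output target themselves.
inline void PrepareJob(FExampleCompileJob& Job)
{
    Job.OutputTarget = Job.InputTarget;
}

// The condition a serialization-time check would assert on.
inline bool OutputMatchesInput(const FExampleCompileJob& Job)
{
    return Job.OutputTarget == Job.InputTarget;
}
```

Running the check at every serialization call site means a mismatch is reported on the cache path that produced it.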
- Moving various UB booleans into a flags enum.
- UB booleans could not be reasonably deprecated without incurring memory overhead, so this will break custom code that uses them.
- Adding UB flag to force the shader compilers to generate reflection for the UB members which are normally excluded from reflection.
- Adding UB flag that tells MeshCommands that a UB will be bound during pass drawing and that it doesn't need to be set via MDCs.
- New flags are not used in this CL, they are prerequisites for subsequent, larger changes.
#rb jeannoe.morissette
[CL 34356503 by christopher waters in ue5-main branch]
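
Moving separate booleans into a flags enum, as described above, usually looks like the sketch below. The flag names (ForceReflection, BoundDuringPass) and the FExampleUBMetadata struct are hypothetical, chosen only to mirror the two new flags mentioned in the list; they are not the engine's identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flags enum: one bit per former boolean.
enum class EExampleUBFlags : uint8_t
{
    None            = 0,
    ForceReflection = 1 << 0, // request reflection for normally excluded members
    BoundDuringPass = 1 << 1  // bound during pass drawing, not via draw commands
};

constexpr EExampleUBFlags operator|(EExampleUBFlags A, EExampleUBFlags B)
{
    return static_cast<EExampleUBFlags>(static_cast<uint8_t>(A) | static_cast<uint8_t>(B));
}

// True when Value has at least one of the bits in Test set.
constexpr bool HasAnyFlags(EExampleUBFlags Value, EExampleUBFlags Test)
{
    return (static_cast<uint8_t>(Value) & static_cast<uint8_t>(Test)) != 0;
}

// All flags fit in a single byte, and new flags do not grow the struct.
struct FExampleUBMetadata
{
    EExampleUBFlags Flags;
};
```

Code that previously read a boolean member would instead call something like HasAnyFlags(Metadata.Flags, EExampleUBFlags::BoundDuringPass), which is why existing custom code using the old members needs updating.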
- SRVNonPixel is needed by mobile to insert a barrier between fragment -> vertex texture fetch, but since this is a heavyweight barrier, it is opt-in with SHADER_PARAMETER_RDG_NON_PIXEL_SRV.
- Small refactor to FRDGTextureAccess to allow for arbitrary subresources, as the current model only allows full resource transitions.
#rb mihnea.balta, luke.thatcher, serge.bernier
#jira UE-211883
[CL 33179861 by zach bethel in ue5-main branch]
- Adds StructPackingPass to SPIRV-Tools which re-assigns all struct member offsets of the global cbuffer ("type.$Globals" when translated in DXC/SPIR-V) according to the std140 memory layout rules.
- Remove DXC rewriter from shader backends as the shader minifier can already handle the majority of dead code removal.
- Rebuilt DXC for Win64, Mac, Linux.
#jira UE-207703
#rnx
#rb Yuriy.ODonnell
[FYI] Dan.Elksnitis, JeanNoe.Morissette, Serge.Bernier, Florin.Pascu
[CL 32646612 by laura hermanns in ue5-main branch]
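
The StructPackingPass above re-assigns member offsets using std140 rules. As a rough illustration of what that means for scalar and vector members only (arrays, matrices and nested structs have extra rules and are left out), the sketch below computes offsets for a list of float types. EExampleMemberType, GetStd140Rule and ComputeStd140Offsets are hypothetical names and this is not the SPIRV-Tools implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class EExampleMemberType { Float, Float2, Float3, Float4 };

struct FExampleLayoutRule
{
    uint32_t Alignment; // base alignment in bytes
    uint32_t Size;      // bytes occupied by the member
};

// std140 base alignments: float = 4, vec2 = 8, vec3 and vec4 = 16.
inline FExampleLayoutRule GetStd140Rule(EExampleMemberType Type)
{
    switch (Type)
    {
    case EExampleMemberType::Float:  return {4, 4};
    case EExampleMemberType::Float2: return {8, 8};
    case EExampleMemberType::Float3: return {16, 12};
    default:                         return {16, 16}; // Float4
    }
}

// Places each member at the next offset that satisfies its base alignment.
inline std::vector<uint32_t> ComputeStd140Offsets(const std::vector<EExampleMemberType>& Members)
{
    std::vector<uint32_t> Offsets;
    uint32_t Cursor = 0;
    for (EExampleMemberType Type : Members)
    {
        const FExampleLayoutRule Rule = GetStd140Rule(Type);
        Cursor = (Cursor + Rule.Alignment - 1) / Rule.Alignment * Rule.Alignment;
        Offsets.push_back(Cursor);
        Cursor += Rule.Size;
    }
    return Offsets;
}
```

For the member list float, float3, float, float2, float4 this yields offsets 0, 16, 28, 32, 48: the float3 is pushed to a 16-byte boundary, and the following float fits in the 4 bytes left after it.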