8399 Commits

Author SHA1 Message Date
dmitriy dyomin
7c784d65cf Remove a loop for CSM cascade selection in mobile shading to avoid a driver issue on Adreno 7xx
[CL 31040771 by dmitriy dyomin in ue5-main branch]
2024-01-31 04:57:27 -05:00
trystan binkley-jone
cf3ac8718b nDisplay: Changed the dynamic depth of field compensation from a hard coded curve to a LUT with a texture that can be provided through the ICVFX camera properties.
This is a fixed version of CL30977048, which had an issue compiling shaders at SM5 and was backed out

#jira UE-204417
#rb Alejandro.Arango, Guillaume.Abadie

#virtualized

[CL 31037789 by trystan binkley-jone in ue5-main branch]
2024-01-30 22:26:06 -05:00
christopher fiala
39806bb584 Correctly clamp additional VRS shading rates to 2x2 max for Nanite Software VRS.
Prevents VRS from being disabled in areas with 2x4, 4x2, or 4x4 rates.
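
The clamping described above can be sketched as follows. This is a minimal illustration, not the engine's code: `ShadingRate` and `ClampShadingRate` are hypothetical names, and a rate is assumed to be a coarse pixel size per axis (1, 2, or 4).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical shading-rate representation: coarse pixel size per axis (1, 2, or 4).
struct ShadingRate {
    int x;
    int y;
};

// Clamp each axis to maxAxis (2 gives a 2x2 cap) instead of rejecting
// rates such as 2x4, 4x2, or 4x4 outright.
constexpr ShadingRate ClampShadingRate(ShadingRate rate, int maxAxis = 2) {
    return ShadingRate{std::min(rate.x, maxAxis), std::min(rate.y, maxAxis)};
}
```

Under these assumptions, a 4x2 or 4x4 request degrades to 2x2 rather than falling back to no VRS at all, while 1x2 and 2x1 pass through unchanged.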

#jira UE-192098
#rb rune.stubbe

[CL 31023301 by christopher fiala in ue5-main branch]
2024-01-30 17:17:51 -05:00
sebastien hillaire
423226fd40 Substrate - fixed a crash when accessing the scene textures' normal in decals.
Only normals should be accessible using the deferred decals (reconstructed from depth buffer).

[FYI] charles.derousiers

[CL 30996047 by sebastien hillaire in ue5-main branch]
2024-01-30 03:17:31 -05:00
bob tellez
8b5a50ff3c [Backout] - CL30977048
[FYI] trystan.binkley-Jone
Original CL Desc
-----------------------------------------------------------------
nDisplay: Changed the dynamic depth of field compensation from a hard coded curve to a LUT with a texture that can be provided through the ICVFX camera properties

#jira UE-204417
#rb Alejandro.Arango, Guillaume.Abadie

#virtualized

[CL 30988344 by bob tellez in ue5-main branch]
2024-01-30 00:21:26 -05:00
trystan binkley-jone
f54ef68e5b nDisplay: Changed the dynamic depth of field compensation from a hard coded curve to a LUT with a texture that can be provided through the ICVFX camera properties
#jira UE-204417
#rb Alejandro.Arango, Guillaume.Abadie

#virtualized

[CL 30979441 by trystan binkley-jone in ue5-main branch]
2024-01-29 17:43:20 -05:00
sebastien hillaire
fa7908f418 Fixed Lumen reflection discrepancy on post motion blur translucents.
#jira UE-153094
#rb guillaume.abadie

[CL 30975285 by sebastien hillaire in ue5-main branch]
2024-01-29 14:27:20 -05:00
Luke Thatcher
10cdd4a111 Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637)
Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists.
See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows:

This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this:
 - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit().
 - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job.
 - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition).

One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations:
 - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism.
 - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) were not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes.

A summary of the new RHI breadcrumb system is as follows:
 - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging.
 - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes.
 - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress.
 - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs).
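
The breadcrumb tree summarized above could look roughly like the sketch below. It is an illustration only: `BreadcrumbNode`, `BreadcrumbTree`, and `BreadcrumbStack` are hypothetical names, each node is assumed to capture a single integer argument, and allocator lifetime and GPU writes are left out. It shows the parent/next links, the depth-first order that falls out of nested begin/end calls, and string formatting deferred until the name is actually needed.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

// Hypothetical breadcrumb node (not the engine's type): keeps the format string
// and its argument by value, so no string is built while the tree is constructed.
struct BreadcrumbNode {
    const char* format = nullptr;      // e.g. "Pass %d"
    int arg = 0;                       // captured by value
    BreadcrumbNode* parent = nullptr;  // enclosing scope
    BreadcrumbNode* next = nullptr;    // next node in depth-first order
    unsigned id = 0;                   // unique ID a GPU marker write could use

    // Formatting happens only here, on demand.
    std::string Name() const {
        char buffer[64];
        std::snprintf(buffer, sizeof(buffer), format, arg);
        return buffer;
    }
};

class BreadcrumbTree {
public:
    // Open a scope under the current one. Creation order is pre-order, so
    // chaining each new node onto the previous one yields a depth-first list.
    BreadcrumbNode* Begin(const char* format, int arg) {
        nodes_.push_back(std::make_unique<BreadcrumbNode>());
        BreadcrumbNode* node = nodes_.back().get();
        node->format = format;
        node->arg = arg;
        node->parent = current_;
        node->id = static_cast<unsigned>(nodes_.size());
        if (last_ != nullptr) {
            last_->next = node;
        }
        last_ = node;
        current_ = node;
        return node;
    }

    // Close the current scope.
    void End() {
        assert(current_ != nullptr);
        current_ = current_->parent;
    }

private:
    std::vector<std::unique_ptr<BreadcrumbNode>> nodes_;  // owns node memory
    BreadcrumbNode* current_ = nullptr;
    BreadcrumbNode* last_ = nullptr;
};

// Rebuild the scope stack for a node by walking parent links up to the root,
// as a crash handler might do for the last node the GPU reached.
std::string BreadcrumbStack(const BreadcrumbNode* node) {
    std::string result;
    for (; node != nullptr; node = node->parent) {
        result = result.empty() ? node->Name() : node->Name() + "/" + result;
    }
    return result;
}
```

Because every node knows its parent, the stack can be rebuilt from any single node ID, which is why balanced push/pop within one context is no longer required in this model.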

RenderGraph scopes have been simplified:
 - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types.
 - Each RDG pass holds a pointer to the scope it was created under.
 - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes.

Other changes include:
 - Fixes for bugs uncovered when parallel translate was enabled.
 - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer.
 - Refactored RHI draw call stats to better fit the new pipeline design.

#rb jeannoe.morissette, zach.bethel
#jira UE-139543

[CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
patrick kelly
e6cb15a9b9 Heterogeneous Volumes: Code cleanup. Removing unused types and includes.
#rb Michael.Galetzka

[CL 30969084 by patrick kelly in ue5-main branch]
2024-01-29 10:23:18 -05:00
charles derousiers
93f874d6d0 Add hair cards/meshes binding & deformation support for mobile renderer.
#rb charles.derousiers
[FYI] florin.pascu, henry.falconer, per.karefelt

#ushell-cherrypick of 30955220 by Charles.deRousiers

[CL 30955660 by charles derousiers in ue5-main branch]
2024-01-27 11:28:27 -05:00
charles derousiers
4b83d11313 Do not discard coverage dilation.
#rb charles.derousiers

#ushell-cherrypick of 30887053 by Charles.deRousiers

[CL 30955635 by charles derousiers in ue5-main branch]
2024-01-27 11:28:16 -05:00
guillaume abadie
ac1498f398 Work around a shader compiler issue with bindless in TSR
#jira UE-203870

[CL 30944520 by guillaume abadie in ue5-main branch]
2024-01-26 17:33:59 -05:00
eric renaudhoude
524e9efe7c Fix inaccurate luminance/illuminance meter values in Visualize HDR mode.
We now use accurate luminance coefficients provided by the working color space. Furthermore, the maximum nit value is read from the calculated luminance, not the RGB channels directly.
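
A rough sketch of the idea, assuming Rec.709 luminance coefficients for illustration (the actual working color space may use different ones). `LinearColor`, `LuminanceCoefficients`, `Luminance`, and `MaxLuminance` are hypothetical names, not engine API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical linear RGB value, in nits per channel.
struct LinearColor {
    float r, g, b;
};

// Luminance weights of a working color space. Rec.709 is assumed here purely
// as an example; a different working color space has different weights.
struct LuminanceCoefficients {
    float r, g, b;
};
constexpr LuminanceCoefficients kRec709{0.2126f, 0.7152f, 0.0722f};

// Weighted sum of the channels using the color space's coefficients.
inline float Luminance(const LinearColor& c, const LuminanceCoefficients& k) {
    return c.r * k.r + c.g * k.g + c.b * k.b;
}

// Maximum is taken over the computed luminance, not over raw RGB channels.
inline float MaxLuminance(const std::vector<LinearColor>& pixels,
                          const LuminanceCoefficients& k) {
    float maxNits = 0.0f;
    for (const LinearColor& pixel : pixels) {
        maxNits = std::max(maxNits, Luminance(pixel, k));
    }
    return maxNits;
}
```

With these weights, a pure green pixel at 1000 in the G channel reports roughly 715 nits, whereas reading the largest RGB channel directly would report 1000.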

#jira UE-204922
#rb chris.kulla

[CL 30919831 by eric renaudhoude in ue5-main branch]
2024-01-26 07:50:40 -05:00
rune stubbe
b747798357 Mitigate performance regression from 30531313 (UE-204555), until we can fix extremely long and thin triangles during Nanite build.
Revert back to manual interpolation of depth in HW raster to improve shadow rendering performance.
Get rid of unused interpolator for visibility buffer rendering.
#jira UE-204555

#rb Ola.Olsson
[FYI] brian.karis, graham.wihlidal, jamie.hayes

[CL 30918596 by rune stubbe in ue5-main branch]
2024-01-26 06:20:32 -05:00
chris kulla
68dda024c0 Path Tracing: Minor optimization for decals to avoid writes into decal payload if decals are not used
Also simplify which instance mask is used for mesh decals by deriving it inside EvaluateDecals instead of passing it from outside the call.

This saves ~0.2ms on one benchmark scene without decals, and does not measurably affect scenes with decals.

#jira UE-202786

[CL 30914979 by chris kulla in ue5-main branch]
2024-01-26 01:02:43 -05:00
patrick kelly
36380c6163 Heterogeneous Volumes: Initial shadow-casting support and integration with translucency pass.
#rb stu.mckenna krzysztof.narkowicz
#jira UE-190041

[CL 30914925 by patrick kelly in ue5-main branch]
2024-01-26 00:33:50 -05:00
rune stubbe
2c750e1101 Immediate mode patch rasterizer optimizations:
-Implemented vertex caching to evaluate ~1 vertex per triangle instead of 3
-Separate immediate mode tessellation table that is constrained to one vert + one tri per lane
-Disable material range merging in raster binning, so UVDensities can be made scalar in immediate mode rasterizer
-Use wide LDS loads instead of permutes for some properties to reduce DS pressure

Patch rasterizer optimizations:
-Reduced max tess factor from 16 to 14 to increase occupancy of patch rasterizer
-Move UVDensities to per-patch work in patch rasterizer

Packed FTessellatedPatch data to fit in fewer registers to reduce VGPR/DS pressure
Added debug code to output SVG of tessellation pattern

#jira UE-197833
#rb brian.karis
[FYI] graham.wihlidal, jamie.hayes

[CL 30896683 by rune stubbe in ue5-main branch]
2024-01-25 14:54:04 -05:00
ola olsson
18385670fb Fixed stats instance counting issue for hierarchical culling that led to underreporting the work
#rb Rune.Stubbe

[CL 30883277 by ola olsson in ue5-main branch]
2024-01-25 08:07:17 -05:00
ola olsson
ed8c2170fb Add Vertex Factory controlled VSM constant bias that applies to Non-Nanite geometry
#jira UE-141201

#rb andrew.lauritzen

[CL 30882520 by ola olsson in ue5-main branch]
2024-01-25 07:14:46 -05:00
carl lloyd
79eba84fae Fix for incorrect regular expression when renaming uniform buffers in OpenGL
#rb tim.doerries

[CL 30880719 by carl lloyd in ue5-main branch]
2024-01-25 05:14:40 -05:00
charles derousiers
a20978d4fa Update Compact Mesh layout.
#rb charles.derousiers


#ushell-cherrypick of 30879809 by Charles.deRousiers

[CL 30880047 by charles derousiers in ue5-main branch]
2024-01-25 04:10:33 -05:00
Yuriy ODonnell
6931a0eb46 Various changes for cross-platform implementation of FDispatchShaderBundleCS
* Use explicit FShaderParameter-s instead of shader parameter struct macros, as this allows the shader to be more easily dispatched from low-level RHI code
* Add total record count parameter to ProcessShaderBundleRecord
* Add CFLAG_ForceBindful to the shader when required by a specific platform

#jira none
#rb christopher.waters, zach.bethel

[CL 30878011 by Yuriy ODonnell in ue5-main branch]
2024-01-25 00:23:59 -05:00
serge bernier
dc80470188 Support resizing of the ParticleGPUSimulation resources. Cascade emitters currently allocate a lot of render resources to support GPU simulation.
Ex:
position      PF_A32B32G32R32F   1024x1024 (16 MB)
velocity      PF_FloatRGBA       1024x1024 (8 MB)
Attribute1    PF_B8G8R8A8        1024x1024 (4 MB)
Attribute2    PF_B8G8R8A8        1024x1024 (4 MB)

Those textures are all double-buffered; at full size, they represent 64 MB of render targets.
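
The 64 MB figure can be checked with a small calculation. The bytes-per-pixel values below are derived from the sizes listed above (16, 8, 4, and 4 bytes); the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;

// Size in bytes of one square render target.
constexpr std::uint64_t TextureBytes(std::uint64_t size, std::uint64_t bytesPerPixel) {
    return size * size * bytesPerPixel;
}

// Total for the four simulation textures at a given edge length, double-buffered.
constexpr std::uint64_t SimulationBytes(std::uint64_t size) {
    const std::uint64_t oneSet =
        TextureBytes(size, 16) +  // position:   PF_A32B32G32R32F
        TextureBytes(size, 8) +   // velocity:   PF_FloatRGBA
        TextureBytes(size, 4) +   // Attribute1: PF_B8G8R8A8
        TextureBytes(size, 4);    // Attribute2: PF_B8G8R8A8
    return 2 * oneSet;            // two copies of every texture
}

static_assert(SimulationBytes(1024) == 64 * kMiB, "full size is 64 MB");
```

Halving the edge length to 512 brings the same set down to 16 MB, which illustrates why starting small and growing on demand can save memory.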

When GFXCascadeGpuSpriteAllowDynAllocs is enabled, resizing happens when the TileAllocator of the FParticleSimulationResources runs out of tiles. We then allocate new tiles and resize the GPU simulation render targets, each time by a factor of 2 in X and Y. This introduces a TileZ index (TilePageIndex), which allows us to compute a tile page index at runtime and retrieve the resized tile coordinates. Resizing is limited by the maximum possible Morton index (65535) and by the original resource sizes provided by the device profile (fx.GPUSimulationTextureSizeX, fx.GPUSimulationTextureSizeY).
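
For context on the 65535 limit, here is a generic 16-bit Morton (Z-order) encoding. It assumes 8 bits per axis, which is one layout consistent with that maximum; the change itself does not state how the index is packed, and `SpreadBits8` and `MortonEncode16` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low 8 bits of v so that bit i moves to bit 2*i.
constexpr std::uint32_t SpreadBits8(std::uint32_t v) {
    v &= 0x00FFu;
    v = (v | (v << 4)) & 0x0F0Fu;
    v = (v | (v << 2)) & 0x3333u;
    v = (v | (v << 1)) & 0x5555u;
    return v;
}

// Interleave two 8-bit coordinates into one 16-bit Morton index:
// x occupies the even bits, y the odd bits.
constexpr std::uint16_t MortonEncode16(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint16_t>(SpreadBits8(x) | (SpreadBits8(y) << 1));
}

static_assert(MortonEncode16(255, 255) == 65535, "largest 16-bit Morton index");
```

Under this assumption, the index space covers a 256x256 grid of tiles, so no further resize is possible once every index is in use.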

The resizing feature can be enabled with the fx.Cascade.GpuSpriteDynamicAllocations cvar, and the initial size is controlled by fx.GPUSimulationDynTextureSizeXY.

Depending on the platform and map content, this can save a lot of memory, ranging from 7 MB to 56 MB.

*The resizing feature does not currently support GPU sorting mode. The radix sort shader will need to support keys/values in formats other than 32 bits to work with dynamic resizing of the GPU simulation resources. To work around this limitation, we currently ignore GPU sorting mode on Cascade emitters when resizing is enabled. A Jira was created to track this issue and potentially add support in the future.

#reviewer [at]stu.meckenna, [at]rob.krajcarski
#rb rob.krajcarski, Stu.McKenna

[CL 30857374 by serge bernier in ue5-main branch]
2024-01-24 14:48:03 -05:00
charles derousiers
2e4c4d5e94 Change Curve and PointToCurve buffers from typed buffer to byteaddress buffer for scalarizing loads.
#rb charles.derousiers

[CL 30841515 by charles derousiers in ue5-main branch]
2024-01-24 09:19:28 -05:00
wouter dek
3f6344d47a Fix missing translated world space position in light function shaders
[CL 30840859 by wouter dek in ue5-main branch]
2024-01-24 08:58:21 -05:00