
72 Commits

Author SHA1 Message Date
jason hoerner
0267d53323 DisplayCluster: The Virtual Shadow Map cache can now optionally be allocated per view, and that feature is enabled for Virtual Production. This is a significant performance win because it avoids constant cache thrashing when rendering multiple view families. Frame time goes from ~45 ms to ~26 ms for the Valley test scene (Node 2), GPU bound in both cases. A separate cache per view isn't implemented for split screen views (a single view family with multiple views), but Virtual Production doesn't use that, and support could be added in the future.
#jira UE-142732
#rb andrew.lauritzen ola.olsson
#preflight 628d06a45c3ef99a7b2fffa3

[CL 20351116 by jason hoerner in ue5-main branch]
2022-05-24 13:17:13 -04:00
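
Note on 0267d53323: the per-view cache allocation described in this commit can be pictured with a minimal sketch. Everything here is an assumption made for illustration: `ShadowCache`, `ShadowCacheRegistry`, and the integer view key are hypothetical stand-ins, not engine types. The sketch only shows why a shared cache is overwritten by each view in turn while per-view caches stay independent.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for cached shadow data.
struct ShadowCache {
    bool valid = false;
};

// Hypothetical registry: a single shared cache by default, or one cache
// per view key when per-view allocation is enabled.
class ShadowCacheRegistry {
public:
    explicit ShadowCacheRegistry(bool perView) : perView_(perView) {}

    // Returns the cache the given view should use. With per-view allocation
    // disabled, every view maps to the same slot (key 0).
    ShadowCache& cacheFor(std::uint32_t viewKey) {
        const std::uint32_t key = perView_ ? viewKey : 0u;
        return caches_[key];  // default-constructed on first use
    }

    std::size_t cacheCount() const { return caches_.size(); }

private:
    bool perView_;
    std::unordered_map<std::uint32_t, ShadowCache> caches_;
};
```

With `perView` set to false, two views share one entry, so whatever one view writes is what the next view sees; with it set to true, each view keeps its own entry.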
andrew lauritzen
14db6fec95 Update dynamic primitives to use payload encoding function
#rb ola.olsson
#preflight 627408f303269096abcd5353

[CL 20060881 by andrew lauritzen in ue5-main branch]
2022-05-05 14:02:57 -04:00
zach bethel
94dae5bea0 Ported GPU scene buffers to RDG.
#rb krzysztof.narkowicz
#preflight 627149d0fe09c0cfbc3c7bdd

[CL 20027095 by zach bethel in ue5-main branch]
2022-05-03 12:08:20 -04:00
jason hoerner
b19bb6be2f UE5_MAIN: Multi-view-family scene renderer refactor, part 1. Major structural change to allow scene renderer to accept multiple view families, with otherwise negligible changes in internal behavior.
* Added "BeginRenderingViewFamilies" render interface call that accepts multiple view families.  Original "BeginRenderingViewFamily" falls through to this.
* FSceneRenderer modified to include an array of view families, plus an active view family and the Views for that family.
* Swap ViewFamily to ActiveViewFamily.
* Swap Views array from TArray<FViewInfo> to TArrayView<FViewInfo>, including where the Views array is passed to functions.
* FSceneRenderer iterates over the view families, rendering each one at a time, as separate render graph executions.
* Some frame setup and cleanup logic outside the render graph runs once.
* Moved stateful FSceneRenderer members to FViewFamilyInfo, to preserve existing one-at-a-time view family rendering behavior.
* Display Cluster (Virtual Production) uses new API.

Next step will push everything into one render graph, which requires handling per-family external resources and cleaning up singletons (like FSceneTextures and FSceneTexturesConfig).  Once that's done, we'll be in a position to further interleave rendering, properly handle once per frame work, and solve artifacts in various systems.

#jira none
#rnx
#rb zach.bethel
#preflight 625df821b21bb49791d377c9

[CL 19813996 by jason hoerner in ue5-main branch]
2022-04-19 14:45:26 -04:00
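
Note on b19bb6be2f: the fall-through from the single-family call to the multi-family call, with frame setup running once and each family rendered in turn, might look roughly like the sketch below. The types and functions are hypothetical simplifications that only borrow the shape of the names in the commit message; they are not the engine's signatures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a view family.
struct ViewFamily {
    int viewCount = 0;
};

// Hypothetical counters so the control flow can be observed.
struct RenderStats {
    std::size_t setupRuns = 0;         // frame setup outside the per-family loop
    std::size_t familiesRendered = 0;  // one "render graph execution" per family
    int viewsRendered = 0;
};

// Multi-family entry point: setup runs once, then each family is rendered
// one at a time.
inline RenderStats beginRenderingViewFamilies(
    const std::vector<const ViewFamily*>& families) {
    RenderStats stats;
    stats.setupRuns = 1;
    for (const ViewFamily* family : families) {
        if (family == nullptr) {
            continue;  // skip empty slots
        }
        stats.viewsRendered += family->viewCount;
        ++stats.familiesRendered;
    }
    return stats;
}

// Single-family entry point: wraps the family in a one-element list and
// falls through to the multi-family call.
inline RenderStats beginRenderingViewFamily(const ViewFamily& family) {
    const std::vector<const ViewFamily*> families(1, &family);
    return beginRenderingViewFamilies(families);
}
```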
andrew lauritzen
d78351e7a5 Check page mask in Nanite before invalidating VSM HZB to avoid excessive invalidations
Add cvar to force full HZB update for debugging
Force invalidation of static pages for any non-movable objects, even if they currently "draw velocity" due to editor movement
- Fixes shadows staying in the original location when dragging static objects in editor
- This is also step 1 to having a more heuristic static/dynamic per-prim split where things can go from one bucket to the other

#preflight 62585cb86e2c50550f07a394
#rb jamie.hayes

[CL 19766198 by andrew lauritzen in ue5-main branch]
2022-04-14 19:28:00 -04:00
andrew lauritzen
70a2837739 Move static separate cache to second texture array slice rather than "below" in UV space:
- Avoid gotchas with max texture size when static separate enabled
- Simplify addressing logic in a number of places
- Avoid allocating extra HZB that we never use

Details:
- Support rendering/sampling to 2D depth texture array in Nanite and virtual shadow map pass
- Remove some unnecessary HZB-related cvars
- Remove unused permutations from VSM HW raster

#preflight 624f4e5611261bc7b2171208
#rb jamie.hayes

#ROBOMERGE-AUTHOR: andrew.lauritzen
#ROBOMERGE-SOURCE: CL 19679616 via CL 19679656 via CL 19679706
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v938-19570697)

[CL 19680680 by andrew lauritzen in ue5-main branch]
2022-04-07 18:36:13 -04:00
andrew lauritzen
9a05ddf238 Partial workaround for scene captures invalidating virtual shadow maps:
- VSM caching now "skips" render calls/frames that did not include any VSM rendering. This allows disabling dynamic shadows or VSMs in scene captures to avoid wiping the cache for the main frame.

#preflight 624b3218323cb7b991431070
#rb graham.wihlidal

#ROBOMERGE-AUTHOR: andrew.lauritzen
#ROBOMERGE-SOURCE: CL 19614849 via CL 19615203 via CL 19615409
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v938-19570697)

[CL 19616302 by andrew lauritzen in ue5-main branch]
2022-04-04 18:25:43 -04:00
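
Note on 9a05ddf238: one way to read "skips render calls/frames that did not include any VSM rendering" is that the cache only advances its notion of "previous frame" when a render call actually drew virtual shadow maps. The sketch below illustrates that idea under that assumption; `CacheFrameTracker` and its members are hypothetical names, not engine code.

```cpp
#include <cstdint>

// Hypothetical tracker: cached data is only replaced by render calls that
// rendered virtual shadow maps. Other calls (for example a scene capture
// with VSMs disabled) leave the cached state untouched.
class CacheFrameTracker {
public:
    // Call once at the end of every render call.
    void endRender(bool renderedAnyVsm) {
        if (!renderedAnyVsm) {
            ++skippedRenders_;  // cache is left as-is
            return;
        }
        ++cacheGeneration_;  // cache now reflects this render call
    }

    std::uint64_t cacheGeneration() const { return cacheGeneration_; }
    std::uint64_t skippedRenders() const { return skippedRenders_; }

private:
    std::uint64_t cacheGeneration_ = 0;
    std::uint64_t skippedRenders_ = 0;
};
```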
ola olsson
b86b9f1d70 Add tracking dirty VSM pages during Nanite rendering to reduce HZB build cost
- Make VSM HZB persistent, reallocated on pool size change.
- Enable two-pass HZB by default for Nanite VSM (r.Shadow.Virtual.UseHZB == 2)

#rb andrew.lauritzen
#preflight 623b1c6088538cd45e0ce824

#ROBOMERGE-AUTHOR: ola.olsson
#ROBOMERGE-SOURCE: CL 19478487 via CL 19481406 via CL 19481553
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v936-19480137)

[CL 19484029 by ola olsson in ue5-main branch]
2022-03-23 15:54:41 -04:00
Ola Olsson
245fe2702a Implement two-pass HZB occlusion culling for VSM (Nanite path) enabled through r.Shadow.Virtual.UseHZB = 2
- Refactor Nanite instance / cluster culling to accommodate VSM HZB tests without blowing out code size.
- Consolidate VSM HZB testing code to one flexible path.
- Always tests against static cached (when static separate is enabled) as this provides the best coverage.
- Currently performs a full HZB rebuild, which is expensive.

#rb rune.stubbe
#fyi andrew.lauritzen
#preflight 62309454e65a7e65d6855209
#robomerge fnnc

[CL 19384824 by Ola Olsson in ue5-main branch]
2022-03-15 10:05:21 -04:00
graham wihlidal
25f8e29153 Removed a massive number of Nanite rasterizer shader permutations across all platforms/shaderdbs, significantly improving iteration times for the editor and cooker, especially when these numbers get multiplied by the number of materials that utilize programmable features in addition to the default material "fixed function" path.
Reductions *per material*:

SM5
--
FHWRasterizeVS: 832 -> 21
FHWRasterizePS: 104 -> 39

SM6
--
FHWRasterizeVS: 320 -> 9
FHWRasterizeMS: 640 -> 9
FHWRasterizePS: 120 -> 30

Vulkan
--
FHWRasterizeVS: 320 -> 9
FHWRasterizePS: 40 -> 15

Other platforms redacted =)

-- Details

* CLUSTER_PER_PAGE has been fully removed (since we no longer ever run CLUSTER_PER_PAGE=0), which now makes it mutually inclusive with VIRTUAL_TEXTURE_TARGET
* HAS_RASTER_BIN has been replaced with a dynamic branch, since this is just a per cluster index offset based on a simple uniform buffer load
* ADD_CLUSTER_OFFSET has been replaced with a dynamic branch, since this is just a per cluster index offset based on a simple uniform buffer load
* HAS_PREV_DRAW_DATA has been replaced with a dynamic branch, since this is just a per cluster index offset based on a simple uniform buffer load
* NEAR_CLIP (only change to significantly affect codegen) has been turned into a dynamic branch based on FNaniteView - this lets us merge depth clip/clamp rasterizer calls in VSM together instead of relying on HAS_PREV_DRAW_DATA, and a future optimization can now be done to merge local and directional light full Nanite pipeline calls together.
* VISUALIZE permutation removed from VS/MS since it only loaded uniform values that passed down per-vertex into fragment stage as nointerpolation parameters. Pixel shader now constructs this uint2 directly under the VISUALIZE permutation
* NANITE_MESH_SHADER_INTERP removed by default but still left in the code, since it is a work in progress potential optimization for DX12 mesh shaders
* Removed explicit Lumen and VSM usage of NANITE_RENDER_FLAG_HAVE_PREV_DRAW_DATA (now the dynamic branch path is only taken if CullRasterizeMultiPass implicitly breaks the rasterization into multiple calls due to NANITE_MAX_VIEWS_PER_CULL_RASTERIZE_PASS overflow)

Performance was tested on a 2080Ti in AncientGame, and the delta is effectively noise (tested cached and uncached VSM). Further testing on other platforms will occur, but important to get this change in for all the benefits and easy to tweak things later if needed.

#rb rune.stubbe
#fyi brian.karis, ola.olsson, andrew.lauritzen, jamie.hayes, daniel.wright, krzysztof.narkowicz
#preflight 622e684c7e2e35638c96a16a
#robomerge FNNC

[CL 19370372 by graham wihlidal in ue5-main branch]
2022-03-13 23:18:25 -04:00
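
Note on 25f8e29153: several items above replace a compile-time shader permutation with a dynamic branch. The general trade-off can be illustrated in plain C++, where a template parameter plays the role of a permutation define and a runtime argument plays the role of a value loaded from a uniform buffer. The function names and the "cluster offset" arithmetic are hypothetical and only mirror the wording of the commit message.

```cpp
// Compile-time variant: each value of kAddClusterOffset produces a separate
// instantiation, the analogue of a separate shader permutation.
template <bool kAddClusterOffset>
constexpr unsigned clusterIndexStatic(unsigned base, unsigned offset) {
    return kAddClusterOffset ? base + offset : base;
}

// Runtime variant: one function, with the choice made by a branch on a
// value supplied at run time.
constexpr unsigned clusterIndexDynamic(unsigned base, unsigned offset,
                                       bool addClusterOffset) {
    return addClusterOffset ? base + offset : base;
}
```

Each boolean permutation dimension doubles the number of compiled variants, so folding one into a runtime branch halves the count for that shader at the cost of a branch per invocation.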
Ola Olsson
cd997d47b2 Make page rect clipping consistently used and always before page flag overlap test (improves accuracy).
- also fix construction of top part of VSM HZB.

#rb rune.stubbe
#robomerge FNNC
#fyi andrew.lauritzen
#preflight 6220a20ad059c6be6c8ca719

[CL 19241653 by Ola Olsson in ue5-main branch]
2022-03-03 06:41:20 -05:00
Charles deRousiers
3b1b1c0a69 Merge ShaderPrint ShaderDrawDebug to ease binding/setup.
#rb none
#jira none
#preflight 62153e289e113332ba232936

[CL 19079474 by Charles deRousiers in ue5-main branch]
2022-02-22 15:35:01 -05:00
andrew lauritzen
ae021ee661 Rename cvar for consistency.
#preflight 620c18b601253d2e19ec8bdd

#ROBOMERGE-AUTHOR: andrew.lauritzen
#ROBOMERGE-SOURCE: CL 19059104 in //UE5/Release-5.0/... via CL 19074695
#ROBOMERGE-BOT: UE5 (Release-Engine-Staging -> Main) (v921-19075845)

[CL 19076601 by andrew lauritzen in ue5-main branch]
2022-02-22 13:52:40 -05:00
charles derousiers
2a2828c8a4 Add VirtualShadowMap prefix to all shaders related to virtual shadow map.
#rb andrew.lauritzen
#jira none
#preflight 620a2a37015ab8f37a3e2d2c
#lockdown juan.canada

#ROBOMERGE-AUTHOR: charles.derousiers
#ROBOMERGE-SOURCE: CL 18977253 in //UE5/Release-5.0/... via CL 18977320 via CL 18977399
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v917-18934589)

[CL 18977416 by charles derousiers in ue5-main branch]
2022-02-14 05:44:50 -05:00
andrew lauritzen
c79193c915 Many LWC fixes for virtual shadow maps:
- Shadow PreViewTranslation and ClipmapOrigin become full LWC tile/offset values on the GPU
- In most cases, the camera's and shadow's PreViewTranslations can be subtracted on the GPU to produce a regular-range value to transform from PrimaryView.TranslatedWorld to ShadowView.TranslatedWorld
- Minor cleanup and improvements to SMRT trace loop
- Remove special case for ortho matrices disabling PreViewTranslation in FViewMatrices
- Remove broken static function local and associated cvar r.PreViewTranslation

#preflight 6205dd571404d0fef964d721
#jira UE-139824
#rb graham.wihlidal
#lockdown juan.canada

#ROBOMERGE-AUTHOR: andrew.lauritzen
#ROBOMERGE-SOURCE: CL 18956693 in //UE5/Release-5.0/... via CL 18956877 via CL 18957087
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v917-18934589)

[CL 18958948 by andrew lauritzen in ue5-main branch]
2022-02-11 14:57:27 -05:00
ola olsson
1e2e7de5aa Fix missing VSM invalidations (where the primitive transform is not updated, but e.g., a skinned mesh is animating)
- Add bHasDeformableMesh to FPrimitiveSceneProxy to declare if the meshes are deformed, e.g., skeletal mesh, to be able to catch this case.
- also fix incorrect tracking of revealed primitives (causes invalidation errors with missing shadows when meshes are culled on the CPU).
- Add cvar switch to turn off the new behavior (for emergency use) r.Shadow.Virtual.Cache.DeformableMeshesInvalidate (defaults to 1)

#jira UE-133211
#rb andrew.lauritzen
#preflight 620389584c05b86e6d60185a
#lockdown juan.canada

#ROBOMERGE-AUTHOR: ola.olsson
#ROBOMERGE-SOURCE: CL 18915487 in //UE5/Release-5.0/... via CL 18920329 via CL 18922688
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v916-18915374)

[CL 18923422 by ola olsson in ue5-main branch]
2022-02-09 15:11:16 -05:00
graham wihlidal
bac2a2b090 Moved all Nanite defines shared between C++ and shaders into a common header file, removing all the "keep this define in sync with this file" cases all over the code, and make the code a lot more maintainable. Common definitions now have a NANITE_ prefix to disambiguate global symbols
#rb rune.stubbe
#preflight 61f94f9ea6632a34f372dc39
[FYI] brian.karis, ola.olsson, jamie.hayes, andrew.lauritzen

#ROBOMERGE-AUTHOR: graham.wihlidal
#ROBOMERGE-SOURCE: CL 18808945 in //UE5/Release-5.0/... via CL 18809413 via CL 18822535
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v908-18788545)

[CL 18823295 by graham wihlidal in ue5-main branch]
2022-02-02 05:33:52 -05:00
andrew lauritzen
482b3e6acf Reimplement clipmap panning to reduce full-level cache invalidations during camera movement
#rb graham.wihlidal
[FYI] ola.olsson
#jira UE-140434
#preflight 61f885a8a6632a34f3603419

#ROBOMERGE-AUTHOR: andrew.lauritzen
#ROBOMERGE-SOURCE: CL 18805354 in //UE5/Release-5.0/... via CL 18807961 via CL 18821749
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v908-18788545)

[CL 18822110 by andrew lauritzen in ue5-main branch]
2022-02-02 02:18:54 -05:00
ola olsson
dc9e277e00 Disable (remaining) VSM shader compilation for platforms that don't support Nanite.
#rb graham.wihlidal
#preflight 61f2c33bf50f352300d098c6

#ROBOMERGE-AUTHOR: ola.olsson
#ROBOMERGE-SOURCE: CL 18754327 in //UE5/Release-5.0/... via CL 18754334 via CL 18757414
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v903-18687472)

[CL 18758408 by ola olsson in ue5-main branch]
2022-01-27 14:34:07 -05:00
ola olsson
95f303b0cd Fix bug in new rendered primitive tracking for VSM (incorrectly used upper bound for non-persistent primitive IDs).
#rb rune.stubbe
#preflight 61e7fb10614a721b0c3db801

#ROBOMERGE-AUTHOR: ola.olsson
#ROBOMERGE-SOURCE: CL 18656867 in //UE5/Release-5.0/... via CL 18656871 via CL 18656875
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v900-18638592)

[CL 18656880 by ola olsson in ue5-main branch]
2022-01-19 07:04:51 -05:00
christopher waters
d4aebdb7cb Fixing out of bounds array access with non-nanite assets.
#jira none
#rb zach.bethel
[FYI] ola.olsson
#preflight none

#ROBOMERGE-AUTHOR: christopher.waters
#ROBOMERGE-SOURCE: CL 18649580 in //UE5/Release-5.0/... via CL 18649866 via CL 18650114
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v900-18638592)

[CL 18650361 by christopher waters in ue5-main branch]
2022-01-18 17:56:53 -05:00
andrew lauritzen
7960520813 Encode hierarchical page flags inside bits of page flags mip structure.
- Eliminates the additional HPageFlags buffer and associated scalar array indexing in constant buffer
- Unifies addressing logic and helpers (effectively now the addressing is just MipLevel + HMipLevel)
- Small reduction in memory
Move PageFlags and PageRectBounds into the VSM uniform buffer - similar to the page table - to avoid needing to individually funnel them through various interfaces that need to check page overlap
Rename nanitestats VSM_Perspective to VSM_Local for consistency with other cvars

#rb ola.olsson
#preflight 61e5e57ea2616066f68f3453

#ROBOMERGE-AUTHOR: andrew.lauritzen
#ROBOMERGE-SOURCE: CL 18642391 in //UE5/Release-5.0/... via CL 18642432 via CL 18642483
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v900-18638592)

[CL 18642585 by andrew lauritzen in ue5-main branch]
2022-01-18 13:05:54 -05:00
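
Note on 7960520813: storing hierarchical page flags in spare bits of the per-page flags word, rather than in a separate buffer, amounts to ordinary bit packing. The sketch below assumes a made-up layout (4 bits of page flags in the low nibble, 4 bits of hierarchical flags directly above); the actual bit assignment is not given in the commit message, and all names are hypothetical.

```cpp
#include <cstdint>

// Assumed layout, for illustration only:
//   bits 0..3  per-page flags
//   bits 4..7  hierarchical flags
constexpr std::uint32_t kPageFlagBits = 4;
constexpr std::uint32_t kPageFlagMask = (1u << kPageFlagBits) - 1u;

// Packs both flag sets into a single word.
constexpr std::uint32_t packFlags(std::uint32_t pageFlags,
                                  std::uint32_t hierarchicalFlags) {
    return (pageFlags & kPageFlagMask) |
           ((hierarchicalFlags & kPageFlagMask) << kPageFlagBits);
}

// Extracts the per-page flags from a packed word.
constexpr std::uint32_t unpackPageFlags(std::uint32_t packed) {
    return packed & kPageFlagMask;
}

// Extracts the hierarchical flags from a packed word.
constexpr std::uint32_t unpackHierarchicalFlags(std::uint32_t packed) {
    return (packed >> kPageFlagBits) & kPageFlagMask;
}
```

Because both values live in the same word, one load serves both lookups and a single addressing scheme covers them.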
ola olsson
5d9aa916c8 Add tracking of culled prims for cached VSM, to correctly and efficiently invalidate primitives that move (even if they are culled)
- Disable view-dependent CPU culling of primitives for local-light VSMs (to fix incorrect caching).
- Track per-light cache data for clipmaps to enable storing bit flags for rendered primitives.
- Make invalidations skip primitives that were never rendered into a clipmap, and mark invalidated primitives as not rendered.

#rb andrew.lauritzen
#preflight 61e6c37b3778a195deabfb8a

#ROBOMERGE-AUTHOR: ola.olsson
#ROBOMERGE-SOURCE: CL 18639158 in //UE5/Release-5.0/... via CL 18639165 via CL 18639180
#ROBOMERGE-BOT: UE5 (Release-Engine-Test -> Main) (v899-18417669)

[CL 18639190 by ola olsson in ue5-main branch]
2022-01-18 08:51:35 -05:00
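
Note on 5d9aa916c8: the last bullet (skip invalidation for primitives never rendered, and mark invalidated primitives as not rendered) can be sketched with a plain bit set indexed by primitive ID. This is a hypothetical CPU-side illustration; `RenderedPrimitiveBits` and its methods are not engine names, and the real per-light cache data is not described in enough detail here to reproduce.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-light bit set: one bit per primitive, set when the
// primitive has been rendered into the cached shadow map.
class RenderedPrimitiveBits {
public:
    explicit RenderedPrimitiveBits(std::size_t primitiveCount)
        : words_((primitiveCount + 63) / 64, 0) {}

    void markRendered(std::size_t id) {
        words_[id / 64] |= std::uint64_t{1} << (id % 64);
    }

    // Returns true if the primitive had been rendered, meaning its cached
    // pages need invalidating. The bit is cleared either way, so a repeated
    // invalidation of the same primitive is skipped until it renders again.
    bool invalidate(std::size_t id) {
        const std::uint64_t bit = std::uint64_t{1} << (id % 64);
        std::uint64_t& word = words_[id / 64];
        const bool wasRendered = (word & bit) != 0;
        word &= ~bit;
        return wasRendered;
    }

private:
    std::vector<std::uint64_t> words_;
};
```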
andrew lauritzen
a94f888d56 Normalize opacity clamping logic between PCF and VSM path. The former was basically treating relevant materials as having a max of 0.999 (changed now to 0.99 to make a few other cases robust) opacity always and thus never occluded backfaces on two-sided materials.
Fix warning triggering incorrectly when enabling/disabling VSMs.
Make contact shadow stepoffset the same between VSM and engine contact shadows paths.

#preflight 61b8e845c8566c1582c8255b
#rb ola.olsson

#ROBOMERGE-AUTHOR: andrew.lauritzen
#ROBOMERGE-SOURCE: CL 18467898 in //UE5/Release-5.0/... via CL 18467900
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v899-18417669)

[CL 18467908 by andrew lauritzen in ue5-release-engine-test branch]
2021-12-15 12:03:31 -05:00
ola olsson
185679e580 Fix invalidations due to things like WPO (that are fed back on the GPU)
#rb andrew.lauritzen
#preflight 61b0b1a218370fb3a0ea9b20

#ROBOMERGE-AUTHOR: ola.olsson
#ROBOMERGE-SOURCE: CL 18417019 in //UE5/Release-5.0/... via CL 18417035
#ROBOMERGE-BOT: STARSHIP (Release-Engine-Staging -> Release-Engine-Test) (v897-18405271)

[CL 18417047 by ola olsson in ue5-release-engine-test branch]
2021-12-09 04:08:19 -05:00