UnrealEngineUWP/Engine/Shaders/ReflectionEnvironmentComputeShaders.usf
Gil Gribb 80f6fa5fa7 Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3231693)
#lockdown Nick.Penwarden
#rb none

==========================
MAJOR FEATURES + CHANGES
==========================

Change 3219796 on 2016/12/02 by Rolando.Caloca

	DR - vk - Increase timeout to 60ms

Change 3219884 on 2016/12/02 by Daniel.Wright

	Assert to help track down rare crash locking capsule indirect shadow vertex buffer

Change 3219885 on 2016/12/02 by Daniel.Wright

	Fixed saving a package that doesn't exist on disk but exists in p4 at a newer revision when the user chooses 'Mark Writable'

Change 3219886 on 2016/12/02 by Daniel.Wright

	Don't create projected shadows when r.ShadowQuality is 0
	* Fixes crash in the forward path trying to render shadows
	* In the deferred path, the shadowmap was still being rendered and only the projection was skipped; now all of the cost is skipped

Change 3219887 on 2016/12/02 by Daniel.Wright

	Changed ClearRenderTarget2D default alpha to 1, which is necessary for correct compositing

Change 3219893 on 2016/12/02 by Daniel.Wright

	AMD AGS library with approved TPS
	Disabled DFAO on AMD pre-GCN PC video cards to work around a driver bug which won't be fixed (Radeon 6xxx and below)

Change 3219913 on 2016/12/02 by Daniel.Wright

	Level unload of a lighting scenario propagates the lighting scenario change - fixes crash when precomputed lighting volume data gets unloaded

Change 3220029 on 2016/12/02 by Daniel.Wright

	Async shader compiling now recreates scene proxies which are affected by the material which was compiled. This fixes crashes which were occurring as proxies cache various material properties, but applying compiled materials would not update these cached properties (bRequiresAdjacencyInformation).
	* A new ensure has been added in FMeshElementCollector::AddMesh and FBatchingSPDI::DrawMesh to catch attempts to render with a material not reported in GetUsedMaterials
	* Fixed UParticleSystemComponent::GetUsedMaterials and UMaterialBillboardComponent::GetUsedMaterials
	* FMaterialUpdateContext should be changed to use the same pattern, but that hasn't been done yet

Change 3220108 on 2016/12/02 by Daniel.Wright

	Fixed shadowmap channel assignment for stationary lights which are not in a lighting scenario level, when a lighting scenario level is present

Change 3220504 on 2016/12/03 by Mark.Satterthwaite

	Metal Desktop Tessellation support from Unicorn.
	- Apple: Metal tessellation support added to MetalShaderFormat, MetalRHI and incl. changes to engine runtime/shaders for Desktop renderer and enabled in ElementalDemo by default (OS X 10.11 will run SM4).
	- Epic: Support for different Metal shader standards on Mac, iOS & tvOS which required moving some RHI functions around as this is a project setting and not a compile-time constant.
	- Epic: Fragment shader UAV support, which is also tied to newer Metal shader standard like Tessellation.
	- Epic: Significant refactor of MetalRHI's internals to clearly separate state-caching from render-pass management and command-encoding.
	- Epic: Internal MetalRHI validation code is now cleanly separated out into custom implementations of the Metal @protocols and is on by default.
	- Epic: Various fixes to Layered Rendering for Metal.
	- Omits Mobile Tessellation support which needs further revision.

Change 3220881 on 2016/12/04 by Mark.Satterthwaite

	Compile fixes for iOS & static analysis fixes from Windows.

Change 3221180 on 2016/12/05 by Guillaume.Abadie

	Avoid compiling both the Current Frame and Previous Frame inputs of PreviousFrameSwitch every time.

Change 3221217 on 2016/12/05 by Chris.Bunner

	More NVAPI warning fixups.

Change 3221219 on 2016/12/05 by Chris.Bunner

	When comparing overridden properties used to force instance recompilation, we need to check against the base material, not assume the immediate parent.
	#jira UE-37792

Change 3221220 on 2016/12/05 by Chris.Bunner

	Exported GetAllStaticSwitchParamNames and GetAllStaticComponentMaskParamNames.
	#jira UE-35132

Change 3221221 on 2016/12/05 by Chris.Bunner

	PR #2785: Fix comment typo in RendererInterface.h (Contributed by dustin-biser)
	#jira UE-35760

Change 3221223 on 2016/12/05 by Chris.Bunner

	Default to include dev-code when compiling material preview stats.
	#jira UE-20321

Change 3221534 on 2016/12/05 by Rolando.Caloca

	DR - Added FDynamicRHI::GetName()

Change 3221833 on 2016/12/05 by Chris.Bunner

	Set correct output extent on PostProcessUpscale (allows users to extend chain correctly).
	#jira UE-36989

Change 3221852 on 2016/12/05 by Chris.Bunner

	32-bit/ch EXR screenshot and frame dump output.
	Fixed row increment bug in 128-bit/px surface format readback.
	#jira UE-37962

Change 3222059 on 2016/12/05 by Rolando.Caloca

	DR - vk - Fix memory type not found

Change 3222104 on 2016/12/05 by Rolando.Caloca

	DR - Lambdaize
	- Added quicker method to check if system textures are initialized

Change 3222290 on 2016/12/05 by Mark.Satterthwaite

	Trivial fixes to reporting Metal shader pipeline errors - need to check if Hull & Domain exist.

Change 3222864 on 2016/12/06 by Rolando.Caloca

	DR - Fix mem leak when exiting

Change 3222873 on 2016/12/06 by Rolando.Caloca

	DR - vk - Minor info to help track down leaks

Change 3222875 on 2016/12/06 by Rolando.Caloca

	DR - Fix mem leak with VisualizeTexture
	#jira UE-39360

Change 3223226 on 2016/12/06 by Chris.Bunner

	Static analysis warning workaround.

Change 3223235 on 2016/12/06 by Ben.Woodhouse

	Integrate from NREAL: Set a custom projection matrix on a SceneCapture2D

Change 3223343 on 2016/12/06 by Chris.Bunner

	Moved HLOD persistent data to the view state to fix per-view compatibility.
	#jira UE-37539

Change 3223349 on 2016/12/06 by Chris.Bunner

	Fixed HLOD with FreezeRendering command.
	#jira UE-29839

Change 3223371 on 2016/12/06 by Michael.Trepka

	Removed obsolete check() in FMetalSurface constructor

Change 3223450 on 2016/12/06 by Chris.Bunner

	Added explicit ScRGB output device selection rather than Nvidia-only hardcoded checks. Allows easier support for Mac and other devices moving forward.

Change 3223638 on 2016/12/06 by Michael.Trepka

	Restored part of the check() in FMetalSurface constructor removed in CL 3223371

Change 3223642 on 2016/12/06 by Mark.Satterthwaite

	Experimental Metal EDR/HDR output support for Mac (iOS/tvOS need custom formats & shaders so they are not supported yet).
	- Only available on macOS Sierra (10.12) for Macs with HDR displays (e.g. Retina iMacs).
	- Enable with -metaledr command-line argument as it is off-by-default.
	- Sets up the CAMetalLayer & the back-buffer for RGBA_FP16 output on Mac using DCI-P3 as the color gamut and ACES 1000 nit ScRGB output encoding.

Change 3223830 on 2016/12/06 by Rolando.Caloca

	DR - vk - Better error when finding an invalid Vulkan driver
	#jira UE-37495

Change 3223869 on 2016/12/06 by Rolando.Caloca

	DR - vk - Reuse fences

Change 3223906 on 2016/12/06 by Guillaume.Abadie

	Fix alpha-through TempAA artifact causing slightly darker outlines at edges.

Change 3224199 on 2016/12/06 by Mark.Satterthwaite

	Fix a dumb copy-paste error from the HDR changes to Metal.

Change 3224220 on 2016/12/06 by Mark.Satterthwaite

	Fix various errors with Metal UAV & Render-Pass Restart support so that we can use the Pixel Shader culling for DistanceField effects.
	- Unfortunately Metal requires that a texture be bound to start a render-pass, so reuse the dummy depth-stencil surface from the problematic editor preview tile rendering.

Change 3224236 on 2016/12/06 by Mark.Satterthwaite

	IWYU CIS compile fix for iOS.

Change 3224366 on 2016/12/06 by Mark.Satterthwaite

	Simplify some of the changes from CL# 3224220 so that we don't perform unnecessary clears.
	- If the RenderPass is broken to issue compute or blit operations then treat the cached RenderTargetsInfo as invalid, unless the RenderPass is restarted.
	- This guarantees that we don't erroneously ignore calls to SetRenderTargets if the calling code issues a dispatch between two RenderPasses that use the same RenderTargetsInfo.

Change 3224416 on 2016/12/06 by Uriel.Doyon

	New default implementation for UPrimitiveComponent::GetStreamingTextureInfo using a conservative heuristic where the textures are stretched across the bounds.
	Optimized UPrimitiveComponent::GetStreamingTextureInfoWithNULLRemoval by not handling registered components with no proxy (essentially hidden game / collision primitives).

	Added blueprint support for texture streaming built data through FStaticMeshComponentInstanceData.

	Fix for material texture streaming data not being available on some cooked builds.

	Enabled split requests on all texture load requests (first loading everything visible and then loading everything not visible).
	This is controlled by "r.Streaming.MinMipForSplitRequest" which defines the minimum mip for which to allow splitting.
	Textures with forced residency are now loaded in two steps (visible, then forced), improving responsiveness.

	Updated "stat streaming" to include "UnkownRefMips", which represents textures with no known component referencing them,
	and also "LastRenderTimeMips", which relates to timed primitives.
	Changed "Forced Mips" so that it only shows mips that are loaded because of forced residency.

	"Texture Streaming Build" now updates the map check after execution.

	Removed Orphaned texture logic as this has become irrelevant with the latest retention priority logic.

	Updated "r.streaming.usenewmetrics" so that it shows behavior before and after 4.12 improvements.

Change 3224532 on 2016/12/07 by Uriel.Doyon

	Integrated CL 3223965 :

	Building texture streaming data for materials does not wait for pending shaders to finish compilation anymore.
	Added more options to allow the user to cancel this build also.

Change 3224714 on 2016/12/07 by Ben.Woodhouse

	Cherry pick CL 3223972 from //fortnite/main:

	Disable Geometry shader onchip on XB1. This saves 4ms for a single shadow casting point light @ 512x512 (4.8ms to 1.8ms)

Change 3224715 on 2016/12/07 by Ben.Woodhouse

	New version of d3dx12.h from Microsoft which incorporates my suggested static analysis fixes. This avoids us diverging from the official version

Change 3224975 on 2016/12/07 by Rolando.Caloca

	DR - vk - Dump improvements

Change 3225012 on 2016/12/07 by Rolando.Caloca

	DR - Show warning if trying to use num samples != (1,2,4,8,16)

Change 3225126 on 2016/12/07 by Chris.Bunner

	Added 'force 128-bit rendering pipeline' to high-res screenshot tool.
	#jira UE-39345

Change 3225449 on 2016/12/07 by Chris.Bunner

	Updated engine rendering defaults to better match current best practices.
	#jira UE-38081

Change 3225485 on 2016/12/07 by Chris.Bunner

	Moved QuantizeSceneBufferSize to RenderCore and added call for PostProcess settings. Fixes screenpercentage out-of-bounds reads in some cases.
	#jira UE-19394

Change 3225486 on 2016/12/07 by Chris.Bunner

	Only disable TAA during HighResScreenshots if we don't have a reasonable frame-delay enabled.

Change 3225505 on 2016/12/07 by Daniel.Wright

	Fixed exponential height fog disappearing with no skybox

Change 3225655 on 2016/12/07 by Benjamin.Hyder

	Updating TM-Shadermodels to include Translucent lighting, Two sided, updated cloth animation, and adjusted lighting.

Change 3225668 on 2016/12/07 by Chris.Bunner

	Dirty owning packages when user manually forces regeneration of all reflection captures.
	#jira UE-38759

Change 3226139 on 2016/12/07 by Rolando.Caloca

	DR - Fix recompute tangents disabling skin cache
	- Make some macros into lambdas
	#jira UE-39143

Change 3226212 on 2016/12/07 by Daniel.Wright

	Features which require a full prepass use DDM_AllOpaque instead of DDM_AllOccluders, which can be skipped if the component has bUseAsOccluder=false

Change 3226213 on 2016/12/07 by Daniel.Wright

	Scene Capture 2D can specify a global clip plane, which is useful for portals
	* Requires the global clip plane project setting to be enabled

Change 3226214 on 2016/12/07 by Daniel.Wright

	Improved deferred shadowing with MSAA by upsampling light attenuation intelligently in the base pass
	* If the current fragment's depth doesn't match what was used for deferred shadowing, the neighbor (cross pattern) with the nearest depth's shadowing is used
	* Edge artifacts can still occur where the upsample fails or the shadow factor was computed per-sample due to depth / stencil testing
	* Indirect Occlusion from capsule shadows also uses the nearest depth neighbor UV for no extra cost
	* Base pass on 970 GTX 1.69ms -> 1.85ms (.16ms) in RoboRecall

Change 3226258 on 2016/12/07 by Rolando.Caloca

	DR - Typo fix

Change 3226259 on 2016/12/07 by Rolando.Caloca

	DR - compile fix
	#jira UE-39143

Change 3226932 on 2016/12/08 by Chris.Bunner

	Re-saved Infiltrator maps to update reflection captures.
	#jira UE-38759

Change 3227063 on 2016/12/08 by Mark.Satterthwaite

	For Metal platforms ONLY temporarily disable USE_LIGHT_GRID_REFLECTION_CAPTURE_CULLING to avoid UE-37436 while the Nvidia driver team investigate why this doesn't work for them but does for the others. This won't affect non-Metal platforms and the intent is to revert this prior to 4.16 provided we can work through the problem with Nvidia.
	#jira UE-37436

Change 3227120 on 2016/12/08 by Gil.Gribb

	Merging //UE4/Dev-Main@3226895 to Dev-Rendering (//UE4/Dev-Rendering)

Change 3227211 on 2016/12/08 by Arne.Schober

	DR - UE-38585 - Fixing crash where HierInstStaticMesh duplication fails. Also reverting the fix from UE-28189 which is redundant.

Change 3227257 on 2016/12/08 by Marc.Olano

	Extension to PseudoVolumeTexture for more flexible layout
	Change by ryan.brucks

Change 3227286 on 2016/12/08 by Rolando.Caloca

	DR - Fix crash when using custom expressions and using reserved keywords
	#jira UE-39311

Change 3227376 on 2016/12/08 by Mark.Satterthwaite

	Must not include a private header inside the MenuStack public header as that causes compile errors in plugins.

Change 3227415 on 2016/12/08 by Mark.Satterthwaite

	Fix shader compilation due to my disabling of USE_LIGHT_GRID_REFLECTION_CAPTURE_CULLING on Metal - InstancedCompositeTileReflectionCaptureIndices needs to be defined even though Metal doesn't support instanced-stereo rendering.

Change 3227516 on 2016/12/08 by Daniel.Wright

	Implemented UWidgetComponent::GetUsedMaterials

Change 3227521 on 2016/12/08 by Guillaume.Abadie

	Fixes post process volume's indirect lighting color.

	#jira UE-38888

Change 3227567 on 2016/12/08 by Marc.Olano

	New upscale filters: Lanczos-2 (new default), Lanczos-3 and Gaussian Unsharp Mask

Change 3227628 on 2016/12/08 by Daniel.Wright

	Removed redundant ResolveSceneDepthTexture from the merge

Change 3227635 on 2016/12/08 by Daniel.Wright

	Forward renderer supports shadowing from movable lights and light functions
	* Only 4 shadow casting movable or stationary lights can overlap at any point in space, otherwise the movable lights will lose their shadows and an on-screen message will be displayed
	* Light functions only work on shadow casting lights since they need a shadowmap channel to be assigned

Change 3227660 on 2016/12/08 by Rolando.Caloca

	DR - vk - Fix r.MobileMSAA on Vulkan
	- r.MobileMSAA is now read-only (to be fixed on 4.16)
	- Show time for PSO creation hitches
	#jira UE-39184

Change 3227704 on 2016/12/08 by Mark.Satterthwaite

	Fix Mac HDR causing incorrect output color encoding being used, HDR support is now entirely off unless you pass -metaledr which will enable it regardless of whether the current display supports HDR (as we haven't written the detection code yet). Fixed the LUT/UI compositing along the way - Mac Metal wasn't using volume LUT as it should have been, RHISupportsVertexShaderLayer now correctly returns false for non-Mac Metal platforms.

Change 3227705 on 2016/12/08 by Daniel.Wright

	Replaced built-in samplers in the nearest depth translucency upsample because the built-in samplers are no longer bound on PC (cl 2852426)

Change 3227787 on 2016/12/08 by Chris.Bunner

	Added extent clear to motion blur pass to catch mis-sized buffers bringing in errors.
	Added early out to clear call when excluded region matches RT region.
	#jira UE-39437

Change 3228177 on 2016/12/08 by Marc.Olano

	Fix DCC sqrt(int) error

Change 3228285 on 2016/12/08 by Chris.Bunner

	Back out changelist 3225449.
	#jira UE-39528

Change 3228680 on 2016/12/09 by Gil.Gribb

	Merging //UE4/Dev-Main@3228528 to Dev-Rendering (//UE4/Dev-Rendering)

Change 3228940 on 2016/12/09 by Mark.Satterthwaite

	Editor fixes for 4.15:
	- PostProcessTonemap can't fail to bind a texture to the ColorLUT or the subsequent rendering will be garbage: the changes for optimising stereo rendering forgot to account for the Editor's use of Views without States for the asset preview thumbnails. Amended the CombineLUT post-processing to allocate a local output texture when there's no ViewState and read from this when this situation arises which makes everything function again.
	- Don't start render-passes without a valid render-target-array in MetalRHI.

Change 3228950 on 2016/12/09 by Mark.Satterthwaite

	Make GPUSkinCache run on Mac Metal - it wasn't working because it was forcibly disabled on all platforms except Windows D3D 11.
	- Fixed the Skeleton editor tree trying to access a widget before it has been constructed.
	- Enable GPUSkinCache for Metal SM5: doesn't render correctly, even on AMD, so Radars need to be filed and the issue investigated.
	#jira UE-39256

Change 3229013 on 2016/12/09 by Mark.Satterthwaite

	Further tidy up in SSkeletonTreeView as suggested by Nick.A.

Change 3229101 on 2016/12/09 by Chris.Bunner

	Log compile error fix and updated cvar comments.

Change 3229236 on 2016/12/09 by Ben.Woodhouse

	XB1 D3D11 and D3D12: Use the DXGI frame statistics to get accurate GPU time unaffected by bubbles

Change 3229430 on 2016/12/09 by Ben.Woodhouse

	PR #2680: Optimized histogram generation. (Contributed by PjotrSvetachov)

	Profiled on nvidia 980GTX (2x faster), and on XB1 (marginally faster)

Change 3229580 on 2016/12/09 by Marcus.Wassmer

	DepthBoundsTest for AMD.

Change 3229701 on 2016/12/09 by Michael.Trepka

	Changed "OS X" to "macOS" in a few places where we display it and updated the code that asks users to update to the latest version to check for 10.12.2

Change 3229706 on 2016/12/09 by Chris.Bunner

	Added GameUserSettings controls for HDR display output.
	Removed the Metal command-line option, as these settings should replace the need for it.

Change 3229774 on 2016/12/09 by Michael.Trepka

	Disabled OpenGL on Mac. -opengl is now ignored, we always use Metal. On old Macs that do not support Metal we show a message saying that the app requires Metal and exit.

Change 3229819 on 2016/12/09 by Chris.Bunner

	Updated engine rendering defaults to better match current best practices.
	#jira UE-38081

Change 3229948 on 2016/12/09 by Rolando.Caloca

	DR - Fix d3d debug error
	#jira UE-39589

Change 3230341 on 2016/12/11 by Mark.Satterthwaite

	Don't fatally assert that the game-thread stalled waiting for the rendering thread in the Editor executable, even when running -game as the rendering thread can take a while to respond if shaders need to be compiled.
	#jira UE-39613

Change 3230860 on 2016/12/12 by Marcus.Wassmer

	Experimental Nvidia AFR support.

Change 3230930 on 2016/12/12 by Mark.Satterthwaite

	Disable RHICmdList state-caching on Mac - Metal already does this internally and depends on receiving all state changes in order to function.

Change 3231252 on 2016/12/12 by Marcus.Wassmer

	Fix NumGPU detection. (SLI only crash)

Change 3231486 on 2016/12/12 by Mark.Satterthwaite

	Fix a stupid mistake in MetalStateCache::CommitResourceTable that would unnecessarily rebind samplers.

Change 3231661 on 2016/12/12 by Mark.Satterthwaite

	Retain the RHI samplers in MetalRHI to guarantee lifetime.

[CL 3231696 by Gil Gribb in Main branch]
2016-12-12 17:47:42 -05:00


// Copyright 1998-2017 Epic Games, Inc. All Rights Reserved.

/*=============================================================================
	ReflectionEnvironmentComputeShaders - functionality to apply local cubemaps.
=============================================================================*/

#include "Common.usf"
#include "DeferredShadingCommon.usf"
#include "BRDF.usf"
#include "ReflectionEnvironmentShared.usf"
#include "SkyLightingShared.usf"
#include "ShadingModels.usf"
#include "LightGridCommon.usf"

#define THREADGROUP_TOTALSIZE (THREADGROUP_SIZEX * THREADGROUP_SIZEY)

// Workaround performance issue with shared memory bank collisions in GLSL
#if GL4_PROFILE
#define ATOMIC_REDUCTION 0
#else
#define ATOMIC_REDUCTION 0
#endif

#define AABB_INTERSECT 1
#define VISUALIZE_OVERLAP 0

uint NumCaptures;

/** View rect min in xy, max in zw. */
uint4 ViewDimensions;

/** Min and Max depth for this tile. */
groupshared uint IntegerTileMinZ;
groupshared uint IntegerTileMaxZ;

/** Inner Min and Max depth for this tile. */
groupshared uint IntegerTileMinZ2;
groupshared uint IntegerTileMaxZ2;

/** Number of reflection captures affecting this tile, after culling. */
groupshared uint TileNumReflectionCaptures;

/** Indices into the capture data buffer of captures that affect this tile, computed by culling. */
groupshared uint TileReflectionCaptureIndices[MAX_CAPTURES];

/** Capture indices after sorting. */
groupshared uint SortedTileReflectionCaptureIndices[MAX_CAPTURES];

// Whether to use reflection captures culled to the light grid, or tiled culling
// .93ms -> .70ms when enabled on 970 GTX. PS4 cost is about the same.
#define USE_LIGHT_GRID_REFLECTION_CAPTURE_CULLING !COMPILER_METAL
#define SORT_CAPTURES 1

#if USE_LIGHT_GRID_REFLECTION_CAPTURE_CULLING
#define CompositeTileReflectionCaptureIndices CulledLightDataGrid
#define InstancedCompositeTileReflectionCaptureIndices InstancedCulledLightDataGrid
#else
#if SORT_CAPTURES
#define CompositeTileReflectionCaptureIndices SortedTileReflectionCaptureIndices
#define InstancedCompositeTileReflectionCaptureIndices SortedTileReflectionCaptureIndices
#else
#define CompositeTileReflectionCaptureIndices TileReflectionCaptureIndices
#define InstancedCompositeTileReflectionCaptureIndices TileReflectionCaptureIndices
#endif
#endif

#define REFLECTION_COMPOSITE_USE_BLENDED_REFLECTION_CAPTURES 1
#define REFLECTION_COMPOSITE_SUPPORT_SKYLIGHT_BLEND 1
#include "ReflectionEnvironmentComposite.usf"

#if !ATOMIC_REDUCTION
#if THREADGROUP_TOTALSIZE < 107
#define TILE_Z_SIZE 107
#else
#define TILE_Z_SIZE THREADGROUP_TOTALSIZE
#endif
groupshared float TileZ[TILE_Z_SIZE];
#endif
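
// Sizing check for TileZ (illustrative arithmetic derived from the reduction in
// ComputeTileMinMax below, not an engine-provided note):
//   the non-atomic path writes TileZ[ThreadIndex + 64] and TileZ[ThreadIndex + 96] for
//   ThreadIndex < 8, so the highest index touched is 7 + 96 = 103 and at least 104 entries
//   are needed. 107 satisfies that. The first store, TileZ[ThreadIndex] = SceneDepth, also
//   needs one entry per thread, which is why larger groups use THREADGROUP_TOTALSIZE instead.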
void ComputeTileMinMax(uint ThreadIndex, float SceneDepth, out float MinTileZ, out float MaxTileZ, out float MinTileZ2, out float MaxTileZ2)
{
#if ATOMIC_REDUCTION
	// Initialize per-tile variables
	if (ThreadIndex == 0)
	{
		IntegerTileMinZ = 0x7F7FFFFF;
		IntegerTileMaxZ = 0;
		IntegerTileMinZ2 = 0x7F7FFFFF;
		IntegerTileMaxZ2 = 0;
	}

	GroupMemoryBarrierWithGroupSync();

	// Use shared memory atomics to build the depth bounds for this tile
	// Each thread is assigned to a pixel at this point
	InterlockedMin(IntegerTileMinZ, asuint(SceneDepth));
	InterlockedMax(IntegerTileMaxZ, asuint(SceneDepth));

	GroupMemoryBarrierWithGroupSync();

	MinTileZ = asfloat(IntegerTileMinZ);
	MaxTileZ = asfloat(IntegerTileMaxZ);

	float HalfZ = .5f * (MinTileZ + MaxTileZ);

	// Compute a second min and max Z, clipped by HalfZ, so that we get two depth bounds per tile
	// This results in tighter tile depth bounds and fewer intersections
	if (SceneDepth >= HalfZ)
	{
		InterlockedMin(IntegerTileMinZ2, asuint(SceneDepth));
	}

	if (SceneDepth <= HalfZ)
	{
		InterlockedMax(IntegerTileMaxZ2, asuint(SceneDepth));
	}

	GroupMemoryBarrierWithGroupSync();

	MinTileZ2 = asfloat(IntegerTileMinZ2);
	MaxTileZ2 = asfloat(IntegerTileMaxZ2);
#else
	TileZ[ThreadIndex] = SceneDepth;

	GroupMemoryBarrierWithGroupSync();

	// Stage 1: threads 0-31 each fold every 32nd depth into a min/max pair (min in TileZ[0..31], max in TileZ[32..63])
	if (ThreadIndex < 32)
	{
		float Min = SceneDepth;
		float Max = SceneDepth;
		for (int i = ThreadIndex + 32; i < THREADGROUP_TOTALSIZE; i += 32)
		{
			Min = min(Min, TileZ[i]);
			Max = max(Max, TileZ[i]);
		}
		TileZ[ThreadIndex] = Min;
		TileZ[ThreadIndex + 32] = Max;
	}

	GroupMemoryBarrierWithGroupSync();

	// Stage 2: threads 0-7 fold the 32 pairs down to 8 (min in TileZ[64..71], max in TileZ[96..103])
	if (ThreadIndex < 8)
	{
		float Min = TileZ[ThreadIndex];
		float Max = TileZ[ThreadIndex + 32];
		Min = min(Min, TileZ[ThreadIndex + 8]);
		Max = max(Max, TileZ[ThreadIndex + 40]);
		Min = min(Min, TileZ[ThreadIndex + 16]);
		Max = max(Max, TileZ[ThreadIndex + 48]);
		Min = min(Min, TileZ[ThreadIndex + 24]);
		Max = max(Max, TileZ[ThreadIndex + 56]);
		TileZ[ThreadIndex + 64] = Min;
		TileZ[ThreadIndex + 96] = Max;
	}

	GroupMemoryBarrierWithGroupSync();

	// Stage 3: thread 0 folds the remaining 8 pairs and publishes the tile bounds
	if (ThreadIndex == 0)
	{
		float Min = TileZ[64];
		float Max = TileZ[96];
		for (int i = 1; i < 8; i++)
		{
			Min = min(Min, TileZ[i + 64]);
			Max = max(Max, TileZ[i + 96]);
		}
		IntegerTileMinZ = asuint(Min);
		IntegerTileMaxZ = asuint(Max);
	}

	GroupMemoryBarrierWithGroupSync();

	MinTileZ = asfloat(IntegerTileMinZ);
	MaxTileZ = asfloat(IntegerTileMaxZ);

	float HalfZ = .5f * (MinTileZ + MaxTileZ);
	MinTileZ2 = HalfZ;
	MaxTileZ2 = HalfZ;
#endif
}
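
// Worked example of the tile depth reduction (illustrative only; assumes a 16x16 thread group,
// i.e. THREADGROUP_TOTALSIZE = 256, whose dimensions are defined outside this file):
//   non-atomic path: 256 depths -> 32 min/max pairs (each of threads 0-31 folds 8 depths,
//   at indices t, t+32, ..., t+224) -> 8 pairs (threads 0-7 fold 4 pairs each) -> 1 pair (thread 0).
//   atomic path: InterlockedMin/InterlockedMax on asuint(SceneDepth) yield the correct float
//   ordering only because non-negative IEEE-754 floats sort the same as their bit patterns read
//   as unsigned integers; 0x7F7FFFFF is the bit pattern of FLT_MAX, the largest finite float.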
bool SphereVsBox(float3 SphereCenter, float SphereRadius, float3 BoxCenter, float3 BoxExtent)
{
	// Per-axis distance from the sphere center to the box (zero on axes where the center lies within the box's extent)
	float3 ClosestOnBox = max(0, abs(BoxCenter - SphereCenter) - BoxExtent);
	return dot(ClosestOnBox, ClosestOnBox) < SphereRadius * SphereRadius;
}
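
// Worked example of SphereVsBox (illustrative values, not taken from the engine):
//   BoxCenter = (0,0,0), BoxExtent = (1,1,1), SphereCenter = (3,0,0)
//   abs(BoxCenter - SphereCenter) - BoxExtent = (3,0,0) - (1,1,1) = (2,-1,-1)
//   max(0, ...) = (2,0,0), so the squared distance from the center to the box is 4.
//   SphereRadius = 1.5 -> 4 < 2.25 is false (culled); SphereRadius = 2.5 -> 4 < 6.25 is true (kept).
// Because the comparison is strict, a sphere that exactly touches the box (radius 2 here) is culled.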
// Culls reflection captures in the scene against the current tile
// Outputs are stored in shared memory
void DoTileCulling(uint3 GroupId, uint ThreadIndex, float MinTileZ, float MaxTileZ, float MinTileZ2, float MaxTileZ2)
{
#if AABB_INTERSECT
	float3 TileBoxCenter;
	float3 TileBoxExtent;

	// Can be optimized
	// left top front
	float2 ScreenUV0 = float2((GroupId.xy + int2(0, 0)) * float2(THREADGROUP_SIZEX, THREADGROUP_SIZEY) + .5f) / (ViewDimensions.zw - ViewDimensions.xy);
	float3 ScreenPos0 = float3(float2(2.0f, -2.0f) * ScreenUV0 + float2(-1.0f, 1.0f), ConvertToDeviceZ(MinTileZ));

	// right bottom back
	float2 ScreenUV1 = float2((GroupId.xy + int2(1, 1)) * float2(THREADGROUP_SIZEX, THREADGROUP_SIZEY) - .5f) / (ViewDimensions.zw - ViewDimensions.xy);
	float3 ScreenPos1 = float3(float2(2.0f, -2.0f) * ScreenUV1 + float2(-1.0f, 1.0f), ConvertToDeviceZ(MaxTileZ));

	// back rect
	float4 ViewPos0 = mul(float4(ScreenPos0.x, ScreenPos0.y, ScreenPos1.z, 1), View.ClipToView); ViewPos0.xyz /= ViewPos0.w;
	float4 ViewPos1 = mul(float4(ScreenPos0.x, ScreenPos1.y, ScreenPos1.z, 1), View.ClipToView); ViewPos1.xyz /= ViewPos1.w;
	float4 ViewPos2 = mul(float4(ScreenPos1.x, ScreenPos0.y, ScreenPos1.z, 1), View.ClipToView); ViewPos2.xyz /= ViewPos2.w;
	float4 ViewPos3 = mul(float4(ScreenPos1.x, ScreenPos1.y, ScreenPos1.z, 1), View.ClipToView); ViewPos3.xyz /= ViewPos3.w;

	// front point
	// Warning: this assumes a single point at the near depth, which is not a valid assumption and will cause culling artifacts
	float4 ViewPos4 = mul(float4(ScreenPos0.xy, ScreenPos0.z, 1), View.ClipToView); ViewPos4.xyz /= ViewPos4.w;

	float3 TileBoxMin = min(ViewPos4.xyz, min(ViewPos0.xyz, ViewPos3.xyz));
	float3 TileBoxMax = max(ViewPos4.xyz, max(ViewPos0.xyz, ViewPos3.xyz));
	TileBoxCenter = (TileBoxMax + TileBoxMin) * 0.5f;
	TileBoxExtent = (TileBoxMax - TileBoxMin) * 0.5f;
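
	// Worked example of the tile-corner mapping above (illustrative only; assumes a 1920x1080
	// view rect and 16x16 thread groups, neither of which is fixed by this file):
	//   GroupId = (0,0): ScreenUV0 = (0.5/1920, 0.5/1080), ScreenUV1 = (15.5/1920, 15.5/1080)
	//   ScreenPos.xy = (2*u - 1, 1 - 2*v), so x runs -1..1 left to right and y runs 1..-1 top to bottom.
	// The +/- .5f offsets place the two corners at the centers of the tile's outermost pixels (0 and 15).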
#else
// Setup tile frustum planes
float2 TileScale = float2(ViewDimensions.zw - ViewDimensions.xy) * rcp(2 * float2(THREADGROUP_SIZEX, THREADGROUP_SIZEY));
float2 TileBias = TileScale - GroupId.xy;
float4 C1 = float4(View.ViewToClip._11 * TileScale.x, 0.0f, View.ViewToClip._31 * TileScale.x + TileBias.x, 0.0f);
float4 C2 = float4(0.0f, -View.ViewToClip._22 * TileScale.y, View.ViewToClip._32 * TileScale.y + TileBias.y, 0.0f);
float4 C4 = float4(0.0f, 0.0f, 1.0f, 0.0f);
// TODO transform to world space
#if ATOMIC_REDUCTION
float4 frustumPlanes[8];
frustumPlanes[0] = C4 - C1;
frustumPlanes[1] = C4 + C1;
frustumPlanes[2] = C4 - C2;
frustumPlanes[3] = C4 + C2;
frustumPlanes[4] = float4(0.0f, 0.0f, 1.0f, -MinTileZ);
frustumPlanes[5] = float4(0.0f, 0.0f, -1.0f, MaxTileZ2);
frustumPlanes[6] = float4(0.0f, 0.0f, 1.0f, -MinTileZ2);
frustumPlanes[7] = float4(0.0f, 0.0f, -1.0f, MaxTileZ);
#else
float4 frustumPlanes[6];
frustumPlanes[0] = C4 - C1;
frustumPlanes[1] = C4 + C1;
frustumPlanes[2] = C4 - C2;
frustumPlanes[3] = C4 + C2;
frustumPlanes[4] = float4(0.0f, 0.0f, 1.0f, -MinTileZ);
frustumPlanes[5] = float4(0.0f, 0.0f, -1.0f, MaxTileZ);
#endif
// Normalize tile frustum planes
UNROLL
for (uint i = 0; i < 4; ++i)
{
frustumPlanes[i] *= rcp(length(frustumPlanes[i].xyz));
}
#endif
if (ThreadIndex == 0)
{
TileNumReflectionCaptures = 0;
}
GroupMemoryBarrierWithGroupSync();
// Compute per-tile lists of affecting captures through bounds culling
// Each thread now operates on a sample instead of a pixel
LOOP
for (uint CaptureIndex = ThreadIndex; CaptureIndex < NumCaptures && CaptureIndex < MAX_CAPTURES; CaptureIndex += THREADGROUP_TOTALSIZE)
{
float4 CapturePositionAndRadius = ReflectionCapture.PositionAndRadius[CaptureIndex];
float3 BoundsViewPosition = mul(float4(CapturePositionAndRadius.xyz + View.PreViewTranslation.xyz, 1), View.TranslatedWorldToView).xyz;
#if AABB_INTERSECT
// Add this capture to the list of indices if it intersects
BRANCH
if( SphereVsBox( BoundsViewPosition, CapturePositionAndRadius.w, TileBoxCenter, TileBoxExtent ) )
{
uint ListIndex;
InterlockedAdd(TileNumReflectionCaptures, 1U, ListIndex);
TileReflectionCaptureIndices[ListIndex] = CaptureIndex;
}
#else
// Cull the light against the tile's frustum planes
// Note: this has some false positives, a light that is intersecting three different axis frustum planes yet not intersecting the volume of the tile will be treated as intersecting
bool bInTile = true;
// Test against the screen x and y oriented planes first
UNROLL
for (uint i = 0; i < 4; ++i)
{
float PlaneDistance = dot(frustumPlanes[i], float4(BoundsViewPosition, 1.0f));
bInTile = bInTile && (PlaneDistance >= -CapturePositionAndRadius.w);
}
BRANCH
if (bInTile)
{
#if ATOMIC_REDUCTION
bool bInNearDepthRange = true;
// Test against the near depth range
UNROLL
for (uint i = 4; i < 6; ++i)
{
float PlaneDistance = dot(frustumPlanes[i], float4(BoundsViewPosition, 1.0f));
bInNearDepthRange = bInNearDepthRange && (PlaneDistance >= -CapturePositionAndRadius.w);
}
bool bInFarDepthRange = true;
// Test against the far depth range
UNROLL
for (uint j = 6; j < 8; ++j)
{
float PlaneDistance = dot(frustumPlanes[j], float4(BoundsViewPosition, 1.0f));
bInFarDepthRange = bInFarDepthRange && (PlaneDistance >= -CapturePositionAndRadius.w);
}
bool bInDepthRange = bInNearDepthRange || bInFarDepthRange;
#else
bool bInDepthRange = true;
// Test against the depth range
UNROLL
for (uint i = 4; i < 6; ++i)
{
float PlaneDistance = dot(frustumPlanes[i], float4(BoundsViewPosition, 1.0f));
bInDepthRange = bInDepthRange && (PlaneDistance >= -CapturePositionAndRadius.w);
}
#endif
// Add this capture to the list of indices if it intersects
BRANCH
if (bInDepthRange)
{
uint ListIndex;
InterlockedAdd(TileNumReflectionCaptures, 1U, ListIndex);
TileReflectionCaptureIndices[ListIndex] = CaptureIndex;
}
}
#endif
}
GroupMemoryBarrierWithGroupSync();
uint NumCapturesAffectingTile = TileNumReflectionCaptures;
// Sort captures by their original capture index
// This is necessary because the culling used InterlockedAdd to generate compacted array indices,
// Which rearranged the original capture order, in which the captures were sorted smallest to largest on the CPU.
//@todo - parallel stream compaction could be faster than this
#if SORT_CAPTURES
// O(N^2) simple parallel sort
LOOP
for (uint CaptureIndex2 = ThreadIndex; CaptureIndex2 < NumCapturesAffectingTile; CaptureIndex2 += THREADGROUP_TOTALSIZE)
{
// Sort by original capture index
int SortKey = TileReflectionCaptureIndices[CaptureIndex2];
uint NumSmaller = 0;
// Count how many items have a smaller key, so we can insert ourselves into the correct position, without requiring interaction between threads
for (uint OtherSampleIndex = 0; OtherSampleIndex < NumCapturesAffectingTile; OtherSampleIndex++)
{
int OtherSortKey = TileReflectionCaptureIndices[OtherSampleIndex];
if (OtherSortKey < SortKey)
{
NumSmaller++;
}
}
// Move this entry into its sorted position
SortedTileReflectionCaptureIndices[NumSmaller] = TileReflectionCaptureIndices[CaptureIndex2];
}
#endif
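// Worked example of the rank sort above: if culling produced the compacted list {7, 2, 5}, the
// thread handling 7 counts two smaller keys and writes slot 2, the thread handling 2 counts none
// and writes slot 0, and the thread handling 5 writes slot 1, giving {2, 5, 7}. This relies on
// every key being a distinct capture index; equal keys would write to the same slot.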
GroupMemoryBarrierWithGroupSync();
}
float CountOverlap( float3 WorldPosition )
{
float Overlap = 0;
float Opacity = 1;
uint NumCapturesAffectingTile = TileNumReflectionCaptures;
// Count the captures affecting this tile whose radius contains WorldPosition, stopping early once the remaining opacity drops below 0.001
LOOP
for (uint TileCaptureIndex = 0; TileCaptureIndex < NumCapturesAffectingTile; TileCaptureIndex++)
{
BRANCH
if( Opacity < 0.001 )
{
break;
}
uint CaptureIndex = CompositeTileReflectionCaptureIndices[TileCaptureIndex];
float4 CapturePositionAndRadius = ReflectionCapture.PositionAndRadius[CaptureIndex];
float3 CaptureVector = WorldPosition - CapturePositionAndRadius.xyz;
float CaptureVectorLength = length(CaptureVector);
BRANCH
if (CaptureVectorLength < CapturePositionAndRadius.w)
{
float NormalizedDistanceToCapture = saturate(CaptureVectorLength / CapturePositionAndRadius.w);
// Fade out based on distance to capture
float x = saturate( 2.5 * NormalizedDistanceToCapture - 1.5 );
float DistanceAlpha = 1 - x*x*(3 - 2*x);
Overlap += 1;
Opacity *= 1 - DistanceAlpha;
}
}
return Overlap;
}
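// Worked values for the distance fade used in CountOverlap, with d = NormalizedDistanceToCapture:
// x = saturate(2.5 * d - 1.5) is 0 for d <= 0.6 and reaches 1 at d = 1, so the fade covers only the
// outer 40% of the capture radius, and 1 - x*x*(3 - 2*x) is one minus a smoothstep of x:
//   d = 0.6 -> x = 0.0 -> DistanceAlpha = 1.0
//   d = 0.8 -> x = 0.5 -> DistanceAlpha = 0.5
//   d = 1.0 -> x = 1.0 -> DistanceAlpha = 0.0
// A capture with d <= 0.6 drives Opacity to 0, so the loop stops after it. Overlap is incremented
// for every capture within its radius, regardless of DistanceAlpha.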
float3 GatherRadiance(float CompositeAlpha, float3 WorldPosition, float3 RayDirection, float Roughness, float2 ScreenPosition, float IndirectIrradiance, uint ShadingModelID, uint NumCulledReflectionCaptures, uint CaptureDataStartIndex)
{
// Indirect occlusion from DFAO, which should be applied to reflection captures and skylight specular, but not SSR
float IndirectSpecularOcclusion = 1.0f;
float3 ExtraIndirectSpecular = 0;
#if SUPPORT_DFAO_INDIRECT_OCCLUSION
float2 ScreenUV = ScreenPosition * View.ScreenPositionScaleBias.xy + View.ScreenPositionScaleBias.wz;
float IndirectDiffuseOcclusion;
GetDistanceFieldAOSpecularOcclusion(ScreenUV, RayDirection, Roughness, ShadingModelID == SHADINGMODELID_TWOSIDED_FOLIAGE, IndirectSpecularOcclusion, IndirectDiffuseOcclusion, ExtraIndirectSpecular);
// Apply DFAO to IndirectIrradiance before mixing with indirect specular
IndirectIrradiance *= IndirectDiffuseOcclusion;
#endif
return CompositeReflectionCapturesAndSkylight(CompositeAlpha, WorldPosition, RayDirection, Roughness, IndirectIrradiance, IndirectSpecularOcclusion, ExtraIndirectSpecular, NumCulledReflectionCaptures, CaptureDataStartIndex);
}
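// Illustrative ScreenPosition -> ScreenUV mapping used in GatherRadiance, assuming a view that
// covers the whole render target so that View.ScreenPositionScaleBias is (0.5, -0.5, 0.5, 0.5):
//   ScreenPosition (-1,  1) -> ScreenUV (0, 0)  (top-left)
//   ScreenPosition ( 1, -1) -> ScreenUV (1, 1)  (bottom-right)
// The scale comes from .xy and the bias from .wz, matching the expression above.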
Texture2D ScreenSpaceReflections;
Texture2D InSceneColor;
/** Output HDR target. */
RWTexture2D<float4> RWOutSceneColor;
[numthreads(THREADGROUP_SIZEX, THREADGROUP_SIZEY, 1)]
void ReflectionEnvironmentTiledDeferredMain(
uint3 GroupId : SV_GroupID,
uint3 DispatchThreadId : SV_DispatchThreadID,
uint3 GroupThreadId : SV_GroupThreadID)
{
uint ThreadIndex = GroupThreadId.y * THREADGROUP_SIZEX + GroupThreadId.x;
uint2 PixelPos = DispatchThreadId.xy + ViewDimensions.xy;
float2 ViewportUV = (float2(DispatchThreadId.xy) + .5f) / (ViewDimensions.zw - ViewDimensions.xy);
float2 ScreenPosition = float2(2.0f, -2.0f) * ViewportUV + float2(-1.0f, 1.0f);
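// Worked example of the mapping above for a view 4 pixels wide: DispatchThreadId.x = 0..3 gives
// ViewportUV.x = 0.125, 0.375, 0.625, 0.875 (pixel centers) and ScreenPosition.x = -0.75, -0.25,
// 0.25, 0.75. The y axis is flipped by the -2.0 scale, so the top row of the view maps to
// positive ScreenPosition.y.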
float SceneDepth = CalcSceneDepth(PixelPos);
#if !USE_LIGHT_GRID_REFLECTION_CAPTURE_CULLING
float MinTileZ;
float MaxTileZ;
float MinTileZ2;
float MaxTileZ2;
ComputeTileMinMax(ThreadIndex, SceneDepth, MinTileZ, MaxTileZ, MinTileZ2, MaxTileZ2);
DoTileCulling(GroupId, ThreadIndex, MinTileZ, MaxTileZ, MinTileZ2, MaxTileZ2);
#endif
// Lookup GBuffer properties once per pixel
FScreenSpaceData ScreenSpaceData = GetScreenSpaceDataUint(PixelPos);
FGBufferData GBuffer = ScreenSpaceData.GBuffer;
float4 Color = float4(0, 0, 0, 1);
float4 HomogeneousWorldPosition = mul(float4(ScreenPosition * SceneDepth, SceneDepth, 1), View.ScreenToWorld);
float3 WorldPosition = HomogeneousWorldPosition.xyz / HomogeneousWorldPosition.w;
float3 CameraToPixel = normalize(WorldPosition - View.WorldCameraOrigin);
float3 ReflectionVector = reflect(CameraToPixel, GBuffer.WorldNormal);
float IndirectIrradiance = GBuffer.IndirectIrradiance;
#if ENABLE_SKY_LIGHT && ALLOW_STATIC_LIGHTING
// Add in diffuse contribution from dynamic skylights so reflection captures will have something to mix with
BRANCH
if (SkyLightParameters.y > 0 && SkyLightParameters.z > 0)
{
float2 ScreenUV = ScreenPosition * View.ScreenPositionScaleBias.xy + View.ScreenPositionScaleBias.wz;
IndirectIrradiance += GetDynamicSkyIndirectIrradiance(ScreenUV, GBuffer.WorldNormal);
}
#endif
#if VISUALIZE_OVERLAP
float Overlap = CountOverlap( WorldPosition );
#endif
BRANCH
if( GBuffer.ShadingModelID != SHADINGMODELID_UNLIT && GBuffer.ShadingModelID != SHADINGMODELID_HAIR )
{
float3 N = GBuffer.WorldNormal;
float3 V = -CameraToPixel;
float3 R = 2 * dot( V, N ) * N - V;
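// R is the mirror reflection of V about N, equivalent to reflect(CameraToPixel, N) since
// V = -CameraToPixel. Worked example: N = (0, 0, 1) and V = (0.6, 0, 0.8) give dot(V, N) = 0.8
// and R = 1.6 * N - V = (-0.6, 0, 0.8), before the off-specular peak adjustment below.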
float NoV = saturate( dot( N, V ) );
// Point lobe in off-specular peak direction
R = GetOffSpecularPeakReflectionDir(N, R, GBuffer.Roughness);
#if 1
// Note: this texture may also contain planar reflections
float4 SSR = ScreenSpaceReflections.Load( int3(PixelPos, 0) );
Color.rgb = SSR.rgb;
Color.a = 1 - SSR.a;
#endif
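// SSR.a acts as screen-space reflection coverage, and Color.a holds the weight left for reflection
// captures and the sky light. For example, SSR.a = 0.75 leaves Color.a = 0.25 (before specular
// occlusion is applied), and that weight is later passed to GatherRadiance as CompositeAlpha.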
if( GBuffer.ShadingModelID == SHADINGMODELID_CLEAR_COAT )
{
const float ClearCoat = GBuffer.CustomData.x;
Color = lerp( Color, float4(0,0,0,1), ClearCoat );
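// The lerp toward (0, 0, 0, 1) removes the SSR contribution from the bottom layer in proportion
// to ClearCoat; SSR is re-applied to the top layer, weighted by F, further down. Worked example
// with ClearCoat = 0.5 and Color = (0.4, 0.4, 0.4, 0.2): the result is (0.2, 0.2, 0.2, 0.6).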
#if CLEAR_COAT_BOTTOM_NORMAL
const float2 oct1 = ((float2(GBuffer.CustomData.a, GBuffer.CustomData.z) * 2) - (256.0/255.0)) + UnitVectorToOctahedron(GBuffer.WorldNormal);
const float3 ClearCoatUnderNormal = OctahedronToUnitVector(oct1);
const float3 BottomEffectiveNormal = ClearCoatUnderNormal;
R = 2 * dot( V, ClearCoatUnderNormal ) * ClearCoatUnderNormal - V;
#endif
}
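// When CLEAR_COAT_BOTTOM_NORMAL is enabled, CustomData.a and CustomData.z appear to hold an
// octahedral-space offset from the top-layer normal, stored as (offset + 256/255) / 2. A stored
// value of 128/255 therefore decodes to 128/255 * 2 - 256/255 = 0, meaning the bottom normal
// equals GBuffer.WorldNormal.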
float AO = ScreenSpaceData.AmbientOcclusion;
float RoughnessSq = GBuffer.Roughness * GBuffer.Roughness;
float SpecularOcclusion = GetSpecularOcclusion(NoV, RoughnessSq, AO);
Color.a *= SpecularOcclusion;
uint NumCulledReflectionCaptures = TileNumReflectionCaptures;
uint DataStartIndex = 0;
#if USE_LIGHT_GRID_REFLECTION_CAPTURE_CULLING
{
uint GridIndex = ComputeLightGridCellIndex(DispatchThreadId.xy, SceneDepth);
uint NumCulledEntryIndex = (ForwardGlobalLightData.NumGridCells + GridIndex) * NUM_CULLED_LIGHTS_GRID_STRIDE;
NumCulledReflectionCaptures = NumCulledLightsGrid[NumCulledEntryIndex + 0];
DataStartIndex = NumCulledLightsGrid[NumCulledEntryIndex + 1];
}
#endif
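// Illustrative index arithmetic for the light-grid lookup above, assuming hypothetical values
// NUM_CULLED_LIGHTS_GRID_STRIDE = 2 and NumGridCells = 1000: for GridIndex = 5,
// NumCulledEntryIndex = (1000 + 5) * 2 = 2010, so the capture count is read from element 2010 and
// the data start index from element 2011. The NumGridCells offset presumably places reflection
// capture entries after the per-cell light entries.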
// Bottom layer for clear coat, or the only reflection layer for all other shading models.
Color.rgb += GatherRadiance(Color.a, WorldPosition, R, GBuffer.Roughness, ScreenPosition, IndirectIrradiance, GBuffer.ShadingModelID, NumCulledReflectionCaptures, DataStartIndex);
BRANCH
if( GBuffer.ShadingModelID == SHADINGMODELID_CLEAR_COAT )
{
const float ClearCoat = GBuffer.CustomData.x;
const float ClearCoatRoughness = GBuffer.CustomData.y;
// TODO EnvBRDF should have a mask param
float2 AB = PreIntegratedGF.SampleLevel( PreIntegratedGFSampler, float2( NoV, GBuffer.Roughness ), 0 ).rg;
Color.rgb *= GBuffer.SpecularColor * AB.x + AB.y * saturate( 50 * GBuffer.SpecularColor.g ) * (1 - ClearCoat);
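// The saturate(50 * SpecularColor.g) term fades out the AB.y bias for very low specular values:
// SpecularColor.g = 0.04 gives saturate(2.0) = 1, 0.01 gives 0.5, and 0 gives 0. By operator
// precedence, (1 - ClearCoat) scales only the AB.y term, not SpecularColor * AB.x.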
// F_Schlick
float F0 = 0.04;
float Fc = Pow5( 1 - NoV );
float F = Fc + (1 - Fc) * F0;
F *= ClearCoat;
float LayerAttenuation = (1 - F);
Color.rgb *= LayerAttenuation;
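// Worked values for the clear coat Fresnel above with F0 = 0.04 and ClearCoat = 1:
//   NoV = 1.0 -> Fc = 0       -> F = 0.04, LayerAttenuation = 0.96
//   NoV = 0.5 -> Fc = 0.03125 -> F = 0.07, LayerAttenuation = 0.93
//   NoV = 0.0 -> Fc = 1       -> F = 1.0,  LayerAttenuation = 0.0
// Fc + (1 - Fc) * F0 is algebraically the usual Schlick form F0 + (1 - F0) * Fc.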
Color.a = F;
Color.rgb += SSR.rgb * F;
Color.a *= 1 - SSR.a;
Color.a *= SpecularOcclusion;
float3 TopLayerR = 2 * dot( V, N ) * N - V;
Color.rgb += GatherRadiance(Color.a, WorldPosition, TopLayerR, ClearCoatRoughness, ScreenPosition, IndirectIrradiance, GBuffer.ShadingModelID, NumCulledReflectionCaptures, DataStartIndex);
}
else
{
Color.rgb *= EnvBRDF( GBuffer.SpecularColor, GBuffer.Roughness, NoV );
}
}
// Only write to the buffer for threads inside the view
BRANCH
if (all(DispatchThreadId.xy < ViewDimensions.zw))
{
float4 OutColor = 0;
#if VISUALIZE_OVERLAP
//OutColor.rgb = 0.1 * TileNumReflectionCaptures;
OutColor.rgb = 0.1 * Overlap;
#else
OutColor.rgb = Color.rgb;
#endif
// Transform NaNs to black, transform negative colors to black.
OutColor.rgb = -min(-OutColor.rgb, 0.0);
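// For finite values -min(-x, 0) equals max(x, 0): x = 2 gives -min(-2, 0) = 2, and x = -3 gives
// -min(3, 0) = 0. NaNs are removed only if min() returns the non-NaN operand when one input is
// NaN, as Direct3D specifies for its min instruction; other platforms may behave differently.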
// Add the existing scene color, including alpha, so the alpha channel used by screen-space subsurface scattering is preserved
OutColor += InSceneColor.Load( int3(PixelPos, 0) );
RWOutSceneColor[PixelPos.xy] = OutColor;
}
}