UnrealEngineUWP/Engine/Shaders/CapsuleShadowShaders.usf
Gil Gribb e581ead572 Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3045398)
#lockdown Nick.Penwarden
#rb none

==========================
MAJOR FEATURES + CHANGES
==========================

Change 3028958 on 2016/06/27 by Ben.Woodhouse

	Fix for perf issue with GetSingleFinalDataConst

	This was caused by the LPV integration/switch to blendables. Now we cache the directional occlusion flag in the LPV class. This reduces calls to GetSingleFinalDataConst on the blendable data (potentially slow), and makes things a bit cleaner and more consistent.

	Tested in QAGame editor (with LPV enabled in ConsoleSettings.ini)

	#jira UE-26179

Change 3029401 on 2016/06/27 by Rolando.Caloca

	DR - More vk logging

Change 3029549 on 2016/06/27 by Uriel.Doyon

	Refactored "r.OnlyStreamInTextures" into "r.Streaming.FullyLoadUsedTextures", making it fully load every used texture, as an alternative to disabling texture streaming.
	New option "r.Streaming.UsePerTextureBias" that assigns a bias between 0 and MipBias to each texture in order to fit in budget.
	Fixed crash when disabling texture streaming.
	Fixed issue when disabling texture streaming that would leave currently loaded textures low-res.
	New logic to prevent retrying to cancel a streaming request more than once.
	Pending load request of one extra mip will not be cancelled anymore.
	Changed UTexture2D from float to double. Also using FApp::GetCurrentTime() instead of FPlatformTime::Seconds().
	#jira UE-32197
	#jira UE-31102

Change 3029837 on 2016/06/27 by David.Hill

	Fixed Shutter SM4 not working when using compute shader eye-adaptation
	#jira UE-32443

	The default eye adaptation value was missing.

Change 3030039 on 2016/06/27 by Uriel.Doyon

	Fix for crash when landscape materials are used in the Texture Streaming Build.
	#jira UE-32196

Change 3030081 on 2016/06/27 by Uriel.Doyon

	Updated MaterialTexCoordScalesPixelShader to use PackedEyeIndex, preventing crash when building the map with stereo rendering enabled.

Change 3030401 on 2016/06/28 by Ben.Woodhouse

	Perf Monitor: Fix for perf warning due to cvar FindConsoleVariable being called too frequently. Tested in QAGame editor (DX11)
	#jira UE-31238

Change 3030607 on 2016/06/28 by Marc.Olano

	Random Number generators: fixed bug in TEA, added integer and float Blum-Blum-Shub. BBS is way cheaper for similar quality, suggest it for future use.

Change 3030627 on 2016/06/28 by Ben.Woodhouse

	Fix for warning. CVar naming scope clash (doesn't appear to happen in vs2015).

Change 3030809 on 2016/06/28 by Marc.Olano

	Noise shader function rename & perf improvement.

	Due to incorrect terminology in internet sources, previous "Perlin" noise was not, in fact, Perlin noise. Now more accurately called "Value" noise. 6x perf improvement for value noise by changing random number function to BBS. Also updated instruction counts in UI tooltips.

Change 3030850 on 2016/06/28 by Marc.Olano

	Rename & redirect noise material enums. At some point these got switched around and no longer accurately described the noise options they selected. Redirect, so all existing content will continue to work as-is. Updated UDN docs to match.

Change 3030981 on 2016/06/28 by Rolando.Caloca

	DR - vk - More logging

Change 3031056 on 2016/06/28 by Marc.Olano

	Introduce new pure-ALU gradient shader noise. Add noise samples to RenderTest map

Change 3031398 on 2016/06/28 by Benjamin.Hyder

	updating TM-Shadermodels (correcting Mt Rushmore)

Change 3031441 on 2016/06/28 by Marc.Olano

	Use only float version of BBS shader rand function for ES2

Change 3031463 on 2016/06/28 by John.Billon

	Fixed F4 changing the viewmode in Fortnite editor. The detailed lighting viewmode name in DefaultInput.ini (detaillighting) differed from the one in BaseInput.ini (lit_detaillighting).
	#Jira UE-32020

Change 3031512 on 2016/06/28 by Zabir.Hoque

	Relax clear flags for DX12 RHIs.
	Properly flush pending commands before residency is updated.

Change 3031517 on 2016/06/28 by Rolando.Caloca

	DR - vk logging using r.Vulkan.DumpLayer

Change 3032359 on 2016/06/29 by Allan.Bentham

	Fix mobile shadows crash.

Change 3032431 on 2016/06/29 by Gil.Gribb

	Merging //UE4/Dev-Main@3032394 to Dev-Rendering (//UE4/Dev-Rendering)

Change 3032757 on 2016/06/29 by Uriel.Doyon

	Fixed global mip bias being applied twice following integration with main.

Change 3033121 on 2016/06/29 by Rolando.Caloca

	DR - vk - Logging

Change 3033529 on 2016/06/29 by Daniel.Wright

	Null world guard on UReflectionCaptureComponent::ReadbackFromGPU

Change 3033668 on 2016/06/29 by Uriel.Doyon

	Grouped texture streaming settings to simplify logic.
	New option "r.Streaming.UseAllMips" to ignore the different LOD and cinematic biases
	#jira UE-32118

Change 3034403 on 2016/06/30 by Rolando.Caloca

	DR - Shorten dumped shader debug strings

Change 3034475 on 2016/06/30 by Rolando.Caloca

	DR - Missing logging

Change 3034722 on 2016/06/30 by Uriel.Doyon

	Improved StreamingAccuracy viewmodes with alpha test and translucent materials
	#jira UE-32656

Change 3034797 on 2016/06/30 by Rolando.Caloca

	DR - vk - 'fix' RHIClear but causes a CPU hang on AMD, so disabled again

Change 3034799 on 2016/06/30 by Rolando.Caloca

	DR - vk - missed file

Change 3034905 on 2016/06/30 by Rolando.Caloca

	DR - vk - Fix for render passes being reused with wrong dimensions

Change 3035503 on 2016/07/01 by Simon.Tovey

	Async compute version of translucency lighting volume clear.

Change 3035577 on 2016/07/01 by Marc.Olano

	Tiling noise. Adds tiling option for gradient, gradient texture, and value noise in the noise material node. Tiling is more expensive, but allows noise functions to be baked into a seamless repeating texture.

Change 3035587 on 2016/07/01 by Ben.Woodhouse

	Fix for async SSAO bug (SSAO Async Compute results are used before the async job wait)

	#jira UE-32709

Change 3035618 on 2016/07/01 by Olaf.Piesche

	Asset fixes

Change 3035692 on 2016/07/01 by Rolando.Caloca

	DR - vk - Deferred deletion queue

Change 3035808 on 2016/07/01 by Rolando.Caloca

	DR - vk - Stat for deletion time, fixed some logging

Change 3036012 on 2016/07/01 by John.Billon

	Alpha Coverage Preservation
	-Textures have an Alpha Preservation Vec4 property which dictates how much of that channel to preserve down the mip chain during mip generation.
	#Jira UE-31986

Change 3036041 on 2016/07/01 by Rolando.Caloca

	DR - vk - Fix for 32bit

Change 3036433 on 2016/07/01 by Rolando.Caloca

	DR - More vk logging

Change 3036935 on 2016/07/04 by Simon.Tovey

	Removing Data Objects

Change 3036942 on 2016/07/04 by Ben.Woodhouse

	Fix for decal rendering resource leak

	The cause was that FD3D11BoundRenderTargets doesn't support setting RTs sparsely. So if one element is NULL, it won't release the ones after it.

	The sparse RT layout happened as a result of a change back in October, which meant that GBuffers for decals could be set sparsely, dependent on whether the decal wrote to the normalbuffer

	This change adds support for sparsely bound rendertargets in FD3D11BoundRenderTargets.

	#jira UE-32602

Change 3037563 on 2016/07/05 by Chris.Bunner

	HLOD self-shadowing in baked lighting fix.

Change 3037640 on 2016/07/05 by Marcus.Wassmer

	Fix bug in USE_GPU_OVERWRITE_CHECKING

Change 3037927 on 2016/07/05 by Rolando.Caloca

	DR - Fix touch pads not showing on Vulkan
	#jira UE-32062

Change 3038085 on 2016/07/05 by Chris.Bunner

	HLOD dynamic shadowing support.
	#jira UE-22627

Change 3038209 on 2016/07/05 by Rolando.Caloca

	DR - vk - Android compile fix

Change 3038644 on 2016/07/05 by Uriel.Doyon

	Added LerpRange, which allows lerping between two rotators without taking the shortest path.

Change 3038820 on 2016/07/05 by Uriel.Doyon

	Selecting streaming accuracy view modes will not automatically generate missing visualization data.

Change 3039332 on 2016/07/06 by John.Billon

	-Made MaxGPUSkinBonesCvar a FAutoConsoleVariableRef and moved it to mesh utilities from console manager to fix a thread initialization problem.
	#Jira UE-31710

Change 3039454 on 2016/07/06 by Simon.Tovey

	Moved all Niagara files from Engine and UnrealEd to remove dependencies and reduce compile times.
	Niagara is now 99.999% decoupled from engine and editor so development should be much more streamlined.

	Plus a few other edits to remove Curves/DataObjects that I missed in last CL.

Change 3039517 on 2016/07/06 by Gil.Gribb

	Merging //UE4/Dev-Main@3039013 to Dev-Rendering (//UE4/Dev-Rendering)

Change 3039587 on 2016/07/06 by Rolando.Caloca

	DR - vk logging, submit counter

Change 3039603 on 2016/07/06 by Rolando.Caloca

	DR - Allow more samplers on GL4
	#jira UE-32628
	#jira UE-32744

Change 3039661 on 2016/07/06 by Daniel.Wright

	Fixed non-directional DFAO occlusion on specular 'r.AOSpecularOcclusionMode 0'
	Skylight occlusion tint now applies to specular
	Skylight occlusion tint on diffuse is now correctly affected by DiffuseColor

Change 3039960 on 2016/07/06 by Daniel.Wright

	Forward renderer initial implementation
	* Point and spot lights are culled to a frustum space grid, base pass loops over culled lights.
	* Light culling uses a reverse linked list to avoid a per-cell limit, and the linked list is compacted to an array before the base pass.
	* New cvars to control light culling: r.Forward.MaxCulledLightsPerCell, r.Forward.LightGridSizeZ, r.Forward.LightGridPixelSize
	* A full Z Prepass is forced with forward shading.  This allows deferred rendering before the base pass of shadow projection methods that only rely on depth.
	* Dynamic shadows are packed based on the assigned stationary light ShadowMapChannel, since stationary lights are already restricted to 4 overlapping.
	* GBuffer render targets are still allocated
	* Fixed several issues in parallax corrected base pass reflections - not blending out box shape, discontinuity in reflection vector, not blending with stationary skylight properly
	* Forward shading is now used for TLM_SurfacePerPixelLighting translucency in the deferred path
	* Notable missing features: shadowing of translucency, support for various translucency lighting modes, multiple blended reflection captures

Change 3040050 on 2016/07/06 by Daniel.Wright

	Added r.Shadow.WholeSceneShadowCacheMb, which defaults to 150, to limit how much memory can be spent caching whole scene shadowmaps

Change 3040160 on 2016/07/06 by Daniel.Wright

	Fixed tile artifacts in indirect capsule shadows from doing the scaled sphere vs tile bounding sphere intersection in the wrong space

Change 3040163 on 2016/07/06 by Rolando.Caloca

	DR - vk - More logging

Change 3040257 on 2016/07/06 by Daniel.Wright

	Skylights aren't captured until their level is made visible- fixes the case where skylights capture too early

Change 3040316 on 2016/07/06 by Daniel.Wright

	PerObject shadows from point / spot lights do the light source pull back based on subject box size, not subject radius, since the box is used to find a valid < 90 degree projection.  Fix from licensee

Change 3040361 on 2016/07/06 by Daniel.Wright

	Fixed TexCreate_UAV being used on translucency volume textures in SM4

Change 3040402 on 2016/07/06 by Rolando.Caloca

	DR - vk - Make host mem accesses coherent

Change 3040486 on 2016/07/06 by Daniel.Wright

	CIS fixes

Change 3041028 on 2016/07/07 by Gil.Gribb

	Merging //UE4/Dev-Main@3040917 to Dev-Rendering (//UE4/Dev-Rendering)

Change 3041235 on 2016/07/07 by Simon.Tovey

	Compile fix for FName conflict on UProperty (hopefully).

Change 3041666 on 2016/07/07 by Daniel.Wright

	Fixed TLM_SurfacePerPixelLighting in SM4, falls back to lighting volume

Change 3041731 on 2016/07/07 by Olaf.Piesche

	Adding Niagara to dynamically loaded module list; should fix UE-32915

Change 3042181 on 2016/07/07 by Daniel.Wright

	CIS fix

[CL 3045471 by Gil Gribb in Main branch]
2016-07-11 18:51:20 -04:00

// Copyright 1998-2016 Epic Games, Inc. All Rights Reserved.
/*=============================================================================
CapsuleShadowShaders.usf: Tiled deferred culling and shadowing from capsule shapes
=============================================================================*/
#include "Common.usf"
#include "DeferredShadingCommon.usf"
#include "FastMath.usf"
#ifndef THREADGROUP_SIZEX
#define THREADGROUP_SIZEX 1
#endif
#ifndef THREADGROUP_SIZEY
#define THREADGROUP_SIZEY 1
#endif
#ifndef LIGHT_SOURCE_MODE
#define LIGHT_SOURCE_MODE 0
#endif
#define LIGHT_SOURCE_PUNCTUAL 0
#define LIGHT_SOURCE_FROM_CAPSULE 1
#define LIGHT_SOURCE_FROM_RECEIVER 2
#define MAX_INTERSECTING_SHAPES 512
groupshared uint IntersectingShapeIndices[MAX_INTERSECTING_SHAPES * 2];
uint NumShadowCapsules;
Buffer<float4> ShadowCapsuleShapes;
#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_FROM_CAPSULE
Buffer<float4> LightDirectionData;
#endif
bool SphereIntersectCone(float4 SphereCenterAndRadius, float3 ConeVertex, float3 ConeAxis, float ConeAngleCos, float ConeAngleSin)
{
float3 U = ConeVertex - (SphereCenterAndRadius.w / ConeAngleSin) * ConeAxis;
float3 D = SphereCenterAndRadius.xyz - U;
float DSizeSq = dot(D, D);
float E = dot(ConeAxis, D);
if (E > 0 && E * E >= DSizeSq * ConeAngleCos * ConeAngleCos)
{
D = SphereCenterAndRadius.xyz - ConeVertex;
DSizeSq = dot(D, D);
E = -dot(ConeAxis, D);
if (E > 0 && E * E >= DSizeSq * ConeAngleSin * ConeAngleSin)
{
return DSizeSq <= SphereCenterAndRadius.w * SphereCenterAndRadius.w;
}
else
{
return true;
}
}
return false;
}
bool SphereIntersectConeWithMaxDistance(float4 SphereCenterAndRadius, float3 ConeVertex, float3 ConeAxis, float ConeAngleCos, float ConeAngleSin, float MaxDistanceAlongAxis)
{
if (SphereIntersectCone(SphereCenterAndRadius, ConeVertex, ConeAxis, ConeAngleCos, ConeAngleSin))
{
float ConeAxisDistance = dot(SphereCenterAndRadius.xyz - ConeVertex, ConeAxis);
float ConeAxisDistanceMax = ConeAxisDistance - SphereCenterAndRadius.w;
return ConeAxisDistanceMax < MaxDistanceAlongAxis;
}
return false;
}
bool SphereIntersectSphere(float4 SphereCenterAndRadius, float4 OtherSphereCenterAndRadius)
{
float CombinedRadii = SphereCenterAndRadius.w + OtherSphereCenterAndRadius.w;
float3 VectorBetweenCenters = SphereCenterAndRadius.xyz - OtherSphereCenterAndRadius.xyz;
return dot(VectorBetweenCenters, VectorBetweenCenters) < CombinedRadii * CombinedRadii;
}
#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_PUNCTUAL
/** From point being shaded toward light, for directional lights. */
float3 LightDirection;
float4 LightPositionAndInvRadius;
float LightSourceRadius;
float RayStartOffsetDepthScale;
float3 LightAngleAndNormalThreshold;
#endif
uint4 ScissorRectMinAndSize;
float2 NumGroups;
/** Min and Max depth for this tile. */
groupshared uint IntegerTileMinZ;
groupshared uint IntegerTileMaxZ;
/** Inner Min and Max depth for this tile. */
groupshared uint IntegerTileMinZ2;
groupshared uint IntegerTileMaxZ2;
/** Number of capsules affecting the tile, after culling. */
groupshared uint TileNumCapsules0;
groupshared uint TileNumCapsules1;
struct FTileCullingData
{
float4 BoundingSphere;
#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_PUNCTUAL
float3 ConeAxis;
float ConeAngleCos;
float ConeAngleSin;
#endif
};
void SetupTileCullingData(
float SceneDepth,
float MaxDepth,
uint ThreadIndex,
uint2 GroupId,
out FTileCullingData TileCullingData0,
out FTileCullingData TileCullingData1,
out bool bTileShouldComputeShadowing,
out uint GroupIndex)
{
// Initialize per-tile variables
if (ThreadIndex == 0)
{
IntegerTileMinZ = 0x7F7FFFFF;
IntegerTileMaxZ = 0;
IntegerTileMinZ2 = 0x7F7FFFFF;
IntegerTileMaxZ2 = 0;
TileNumCapsules0 = 0;
TileNumCapsules1 = 0;
}
GroupMemoryBarrierWithGroupSync();
// Use shared memory atomics to build the depth bounds for this tile
// Each thread is assigned to a pixel at this point
//@todo - move depth range computation to a central point where it can be reused by all the frame's tiled deferred passes!
if (SceneDepth < MaxDepth)
{
InterlockedMin(IntegerTileMinZ, asuint(SceneDepth));
InterlockedMax(IntegerTileMaxZ, asuint(SceneDepth));
}
GroupMemoryBarrierWithGroupSync();
float MinTileZ = asfloat(IntegerTileMinZ);
float MaxTileZ = asfloat(IntegerTileMaxZ);
float HalfZ = .5f * (MinTileZ + MaxTileZ);
if (SceneDepth < MaxDepth)
{
// Compute a second min and max Z, clipped by HalfZ, so that we get two depth bounds per tile
// This results in tighter tile depth bounds and fewer intersections
if (SceneDepth >= HalfZ)
{
InterlockedMin(IntegerTileMinZ2, asuint(SceneDepth));
}
if (SceneDepth <= HalfZ)
{
InterlockedMax(IntegerTileMaxZ2, asuint(SceneDepth));
}
}
GroupMemoryBarrierWithGroupSync();
float MinTileZ2 = asfloat(IntegerTileMinZ2);
float MaxTileZ2 = asfloat(IntegerTileMaxZ2);
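// Illustrative worked example (added note, with made-up depth values):
// suppose the valid pixels of a tile have SceneDepth values 100, 120, 900 and 1000.
// MinTileZ = 100, MaxTileZ = 1000, so HalfZ = .5f * (100 + 1000) = 550
// MinTileZ2 = smallest depth >= HalfZ = 900
// MaxTileZ2 = largest depth <= HalfZ = 120
// The near depth range is therefore [MinTileZ, MaxTileZ2] = [100, 120]
// and the far depth range is [MinTileZ2, MaxTileZ] = [900, 1000],
// instead of a single range [100, 1000] spanning the empty space in between.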
bTileShouldComputeShadowing = true;
if (IntegerTileMinZ == 0x7F7FFFFF && IntegerTileMaxZ == 0)
{
bTileShouldComputeShadowing = false;
}
float3 ViewTileMin;
float3 ViewTileMax;
float3 ViewTileMin2;
float3 ViewTileMax2;
bool bCenteredProjection = abs(View.ViewToClip[1][0]) < .00001f && abs(View.ViewToClip[2][0]) < .00001f;
BRANCH
// Off center projection path uses 37 more asm instructions
if (bCenteredProjection)
{
float2 TanViewFOV = GetTanHalfFieldOfView();
// tan(HalfFOV) = HalfUnitPlaneWidth / 1, so TanViewFOV * 2 is the size of the whole unit view plane
// We are operating on a subset of that defined by ScissorRectMinAndSize
float2 TileSize = TanViewFOV * 2 * ScissorRectMinAndSize.zw / ((float2)View.ViewSizeAndInvSize.xy * NumGroups);
float2 UnitPlaneMin = -TanViewFOV + TanViewFOV * 2 * (ScissorRectMinAndSize.xy - View.ViewRectMin.xy) * View.ViewSizeAndInvSize.zw;
float2 UnitPlaneTileMin = (GroupId.xy * TileSize + UnitPlaneMin) * float2(1, -1);
float2 UnitPlaneTileMax = ((GroupId.xy + 1) * TileSize + UnitPlaneMin) * float2(1, -1);
ViewTileMin.xy = min(MinTileZ * UnitPlaneTileMin, MaxTileZ2 * UnitPlaneTileMin);
ViewTileMax.xy = max(MinTileZ * UnitPlaneTileMax, MaxTileZ2 * UnitPlaneTileMax);
ViewTileMin.z = MinTileZ;
ViewTileMax.z = MaxTileZ2;
ViewTileMin2.xy = min(MinTileZ2 * UnitPlaneTileMin, MaxTileZ * UnitPlaneTileMin);
ViewTileMax2.xy = max(MinTileZ2 * UnitPlaneTileMax, MaxTileZ * UnitPlaneTileMax);
ViewTileMin2.z = MinTileZ2;
ViewTileMax2.z = MaxTileZ;
}
else
{
float2 TileSize = 2 * ScissorRectMinAndSize.zw / ((float2)View.ViewSizeAndInvSize.xy * NumGroups);
float2 UnitPlaneMin = -1 + 2 * (ScissorRectMinAndSize.xy - View.ViewRectMin.xy) * View.ViewSizeAndInvSize.zw;
float2 UnitPlaneTileMin = (GroupId.xy * TileSize + UnitPlaneMin) * float2(1, -1);
float2 UnitPlaneTileMax = ((GroupId.xy + 1) * TileSize + UnitPlaneMin) * float2(1, -1);
{
float MinTileDeviceZ = ConvertToDeviceZ(MinTileZ);
float4 MinDepthMinCorner = mul(float4(UnitPlaneTileMin.x, UnitPlaneTileMin.y, MinTileDeviceZ, 1), View.ClipToView);
float4 MinDepthMaxCorner = mul(float4(UnitPlaneTileMax.x, UnitPlaneTileMax.y, MinTileDeviceZ, 1), View.ClipToView);
float MaxTileDeviceZ = ConvertToDeviceZ(MaxTileZ2);
float4 MaxDepthMinCorner = mul(float4(UnitPlaneTileMin.x, UnitPlaneTileMin.y, MaxTileDeviceZ, 1), View.ClipToView);
float4 MaxDepthMaxCorner = mul(float4(UnitPlaneTileMax.x, UnitPlaneTileMax.y, MaxTileDeviceZ, 1), View.ClipToView);
ViewTileMin.xy = min(MinDepthMinCorner.xy / MinDepthMinCorner.w, MaxDepthMinCorner.xy / MaxDepthMinCorner.w);
ViewTileMax.xy = max(MinDepthMaxCorner.xy / MinDepthMaxCorner.w, MaxDepthMaxCorner.xy / MaxDepthMaxCorner.w);
ViewTileMin.z = MinTileZ;
ViewTileMax.z = MaxTileZ2;
}
{
float MinTileDeviceZ = ConvertToDeviceZ(MinTileZ2);
float4 MinDepthMinCorner = mul(float4(UnitPlaneTileMin.x, UnitPlaneTileMin.y, MinTileDeviceZ, 1), View.ClipToView);
float4 MinDepthMaxCorner = mul(float4(UnitPlaneTileMax.x, UnitPlaneTileMax.y, MinTileDeviceZ, 1), View.ClipToView);
float MaxTileDeviceZ = ConvertToDeviceZ(MaxTileZ);
float4 MaxDepthMinCorner = mul(float4(UnitPlaneTileMin.x, UnitPlaneTileMin.y, MaxTileDeviceZ, 1), View.ClipToView);
float4 MaxDepthMaxCorner = mul(float4(UnitPlaneTileMax.x, UnitPlaneTileMax.y, MaxTileDeviceZ, 1), View.ClipToView);
ViewTileMin2.xy = min(MinDepthMinCorner.xy / MinDepthMinCorner.w, MaxDepthMinCorner.xy / MaxDepthMinCorner.w);
ViewTileMax2.xy = max(MinDepthMaxCorner.xy / MinDepthMaxCorner.w, MaxDepthMaxCorner.xy / MaxDepthMaxCorner.w);
ViewTileMin2.z = MinTileZ2;
ViewTileMax2.z = MaxTileZ;
}
}
float3 ViewGroup0Center = (ViewTileMax + ViewTileMin) / 2;
TileCullingData0.BoundingSphere.xyz = mul(float4(ViewGroup0Center, 1), View.ViewToTranslatedWorld).xyz - View.PreViewTranslation;
TileCullingData0.BoundingSphere.w = length(ViewGroup0Center - ViewTileMax);
float3 ViewGroup1Center = (ViewTileMax2 + ViewTileMin2) / 2;
TileCullingData1.BoundingSphere.xyz = mul(float4(ViewGroup1Center, 1), View.ViewToTranslatedWorld).xyz - View.PreViewTranslation;
TileCullingData1.BoundingSphere.w = length(ViewGroup1Center - ViewTileMax2);
#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_PUNCTUAL
#if POINT_LIGHT
float3 LightVector0 = LightPositionAndInvRadius.xyz - TileCullingData0.BoundingSphere.xyz;
float LightVector0Length = length(LightVector0);
float3 LightVector1 = LightPositionAndInvRadius.xyz - TileCullingData1.BoundingSphere.xyz;
float LightVector1Length = length(LightVector1);
TileCullingData0.ConeAxis = LightVector0 / LightVector0Length;
TileCullingData1.ConeAxis = LightVector1 / LightVector1Length;
float TanLightAngle0 = LightSourceRadius / LightVector0Length;
float TanLightAngle1 = LightSourceRadius / LightVector1Length;
TileCullingData0.ConeAngleCos = 1.0f / sqrt(1 + TanLightAngle0 * TanLightAngle0);
TileCullingData0.ConeAngleSin = TileCullingData0.ConeAngleCos * TanLightAngle0;
TileCullingData1.ConeAngleCos = 1.0f / sqrt(1 + TanLightAngle1 * TanLightAngle1);
TileCullingData1.ConeAngleSin = TileCullingData1.ConeAngleCos * TanLightAngle1;
// Don't operate on tiles completely outside of the light's influence
bool bTileInLightInfluenceBounds = LightVector0Length < 1.0f / LightPositionAndInvRadius.w + TileCullingData0.BoundingSphere.w
|| LightVector1Length < 1.0f / LightPositionAndInvRadius.w + TileCullingData1.BoundingSphere.w;
bTileShouldComputeShadowing = bTileShouldComputeShadowing && bTileInLightInfluenceBounds;
#else
TileCullingData0.ConeAxis = TileCullingData1.ConeAxis = LightDirection;
TileCullingData0.ConeAngleCos = TileCullingData1.ConeAngleCos = cos(LightAngleAndNormalThreshold.x);
TileCullingData0.ConeAngleSin = TileCullingData1.ConeAngleSin = sin(LightAngleAndNormalThreshold.x);
#endif
#endif
GroupIndex = SceneDepth > MaxTileZ2 ? 1 : 0;
}
// Scaled sphere intersection allows capsule shadows to blend together better when penumbras are large, so use for indirect.
// Otherwise an occluder sphere will be extracted from the capsule and used for shadowing.
// This maintains shadow silhouette shapes better but has a discontinuity when the capsule direction is nearly parallel to the light direction.
#define USE_SCALED_SPHERE_INTERSECTION (LIGHT_SOURCE_MODE != LIGHT_SOURCE_PUNCTUAL)
uint CullCapsuleShapesToTile(
uint ThreadIndex,
uint GroupIndex,
float MaxOcclusionDistance,
FTileCullingData TileCullingData0,
FTileCullingData TileCullingData1)
{
#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_PUNCTUAL
float3 ConeAxis0 = TileCullingData0.ConeAxis;
float ConeAngleCos0 = TileCullingData0.ConeAngleCos;
float ConeAngleSin0 = TileCullingData0.ConeAngleSin;
float3 ConeAxis1 = TileCullingData1.ConeAxis;
float ConeAngleCos1 = TileCullingData1.ConeAngleCos;
float ConeAngleSin1 = TileCullingData1.ConeAngleSin;
#endif
LOOP
for (uint ShapeIndex = ThreadIndex; ShapeIndex < NumShadowCapsules; ShapeIndex += THREADGROUP_SIZEX * THREADGROUP_SIZEY)
{
#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_FROM_CAPSULE
float4 LightData = LightDirectionData[ShapeIndex];
float3 ConeAxis0 = LightData.xyz;
float LightAngle = LightData.w;
float ConeAngleCos0 = cos(LightAngle);
float ConeAngleSin0 = sin(LightAngle);
float3 ConeAxis1 = ConeAxis0;
float ConeAngleCos1 = ConeAngleCos0;
float ConeAngleSin1 = ConeAngleSin0;
#endif
float4 SphereCenterAndRadius = ShadowCapsuleShapes[ShapeIndex * 2];
float3 TransformedSphereCenter = SphereCenterAndRadius.xyz;
float TransformedSphereRadius = SphereCenterAndRadius.w;
float3 TransformedTileBoundingSphereCenter0 = TileCullingData0.BoundingSphere.xyz;
float3 TransformedTileBoundingSphereCenter1 = TileCullingData1.BoundingSphere.xyz;
#if LIGHT_SOURCE_MODE != LIGHT_SOURCE_FROM_RECEIVER
float3 TransformedConeAxis0 = ConeAxis0;
float3 TransformedConeAxis1 = ConeAxis1;
#endif
#if USE_SCALED_SPHERE_INTERSECTION
float4 CapsuleCenterAndRadius = SphereCenterAndRadius;
float4 CapsuleOrientationAndLength = ShadowCapsuleShapes[ShapeIndex * 2 + 1];
float3 CapsuleSpaceX;
float3 CapsuleSpaceY;
float3 CapsuleSpaceZ = CapsuleOrientationAndLength.xyz;
GenerateCoordinateSystem(CapsuleSpaceZ, CapsuleSpaceX, CapsuleSpaceY);
// Scale required along the capsule's axis to turn it into a sphere (assuming it was originally a scaled sphere instead of a capsule)
float CapsuleZScale = CapsuleCenterAndRadius.w / (.5f * CapsuleOrientationAndLength.w + CapsuleCenterAndRadius.w);
CapsuleSpaceZ *= CapsuleZScale;
// The capsule is centered at 0 in the scaled sphere space
TransformedSphereCenter = 0;
// After scaling along the capsule axis it will become a sphere with the original radius
TransformedSphereRadius = SphereCenterAndRadius.w;
// Transform the sphere center and cone axis into the scaled sphere space
float3 CapsuleCenterToTileCenter0 = TileCullingData0.BoundingSphere.xyz - CapsuleCenterAndRadius.xyz;
TransformedTileBoundingSphereCenter0 = float3(dot(CapsuleCenterToTileCenter0, CapsuleSpaceX), dot(CapsuleCenterToTileCenter0, CapsuleSpaceY), dot(CapsuleCenterToTileCenter0, CapsuleSpaceZ));
float3 CapsuleCenterToTileCenter1 = TileCullingData1.BoundingSphere.xyz - CapsuleCenterAndRadius.xyz;
TransformedTileBoundingSphereCenter1 = float3(dot(CapsuleCenterToTileCenter1, CapsuleSpaceX), dot(CapsuleCenterToTileCenter1, CapsuleSpaceY), dot(CapsuleCenterToTileCenter1, CapsuleSpaceZ));
#if LIGHT_SOURCE_MODE != LIGHT_SOURCE_FROM_RECEIVER
// Renormalize the cone axis as it went through a non-uniformly scaled transform
TransformedConeAxis0 = normalize(float3(dot(ConeAxis0, CapsuleSpaceX), dot(ConeAxis0, CapsuleSpaceY), dot(ConeAxis0, CapsuleSpaceZ)));
TransformedConeAxis1 = normalize(float3(dot(ConeAxis1, CapsuleSpaceX), dot(ConeAxis1, CapsuleSpaceY), dot(ConeAxis1, CapsuleSpaceZ)));
#endif
#else
float CapsuleLength = ShadowCapsuleShapes[ShapeIndex * 2 + 1].w;
// Add half capsule length to bounding sphere
TransformedSphereRadius = SphereCenterAndRadius.w + .5f * CapsuleLength;
#endif
BRANCH
if (SphereIntersectSphere(float4(TransformedSphereCenter, TransformedSphereRadius + MaxOcclusionDistance), float4(TransformedTileBoundingSphereCenter0, TileCullingData0.BoundingSphere.w))
#if LIGHT_SOURCE_MODE != LIGHT_SOURCE_FROM_RECEIVER
&& SphereIntersectConeWithMaxDistance(float4(TransformedSphereCenter, TransformedSphereRadius + TileCullingData0.BoundingSphere.w), TransformedTileBoundingSphereCenter0, TransformedConeAxis0, ConeAngleCos0, ConeAngleSin0, MaxOcclusionDistance)
#endif
)
{
uint ListIndex;
InterlockedAdd(TileNumCapsules0, 1U, ListIndex);
// Don't write out of bounds on overflow
ListIndex = min(ListIndex, (uint)(MAX_INTERSECTING_SHAPES - 1));
IntersectingShapeIndices[MAX_INTERSECTING_SHAPES * 0 + ListIndex] = ShapeIndex;
}
BRANCH
if (SphereIntersectSphere(float4(TransformedSphereCenter, TransformedSphereRadius + MaxOcclusionDistance), float4(TransformedTileBoundingSphereCenter1, TileCullingData1.BoundingSphere.w))
#if LIGHT_SOURCE_MODE != LIGHT_SOURCE_FROM_RECEIVER
&& SphereIntersectConeWithMaxDistance(float4(TransformedSphereCenter, TransformedSphereRadius + TileCullingData1.BoundingSphere.w), TransformedTileBoundingSphereCenter1, TransformedConeAxis1, ConeAngleCos1, ConeAngleSin1, MaxOcclusionDistance)
#endif
)
{
uint ListIndex;
InterlockedAdd(TileNumCapsules1, 1U, ListIndex);
// Don't write out of bounds on overflow
ListIndex = min(ListIndex, (uint)(MAX_INTERSECTING_SHAPES - 1));
IntersectingShapeIndices[MAX_INTERSECTING_SHAPES * 1 + ListIndex] = ShapeIndex;
}
}
GroupMemoryBarrierWithGroupSync();
return min(GroupIndex == 0 ? TileNumCapsules0 : TileNumCapsules1, (uint)MAX_INTERSECTING_SHAPES);
}
// Approximate the area of intersection of two spherical caps, from 'Ambient Aperture Lighting'
// fRadius0 : First cap's radius (arc length in radians)
// fRadius1 : Second cap's radius (arc length in radians)
// fDist : Distance between caps (radians between centers of caps)
float SphericalCapIntersectionAreaFast(float fRadius0, float fRadius1, float fDist)
{
float fArea;
if ( fDist <= max(fRadius0, fRadius1) - min(fRadius0, fRadius1) )
{
// One cap is completely inside the other
fArea = 6.283185308f - 6.283185308f * cos( min(fRadius0,fRadius1) );
}
else if ( fDist >= fRadius0 + fRadius1 )
{
// No intersection exists
fArea = 0;
}
else
{
float fDiff = abs(fRadius0 - fRadius1);
fArea = smoothstep(0.0f,
1.0f,
1.0f - saturate((fDist-fDiff)/(fRadius0+fRadius1-fDiff)));
fArea *= 6.283185308f - 6.283185308f * cos( min(fRadius0,fRadius1) );
}
return fArea;
}
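// Illustrative note (not part of the original file): the repeated
// 6.283185308f - 6.283185308f * cos(x) expressions above evaluate the solid angle of a
// spherical cap on the unit sphere with angular radius x, i.e. 2 * PI * (1 - cos(x)).
// A minimal sketch of that formula as a standalone helper follows. The name
// SphericalCapSolidAngle is hypothetical, and the block is disabled so it does not
// change the compiled shader.
#if 0
float SphericalCapSolidAngle(float AngularRadius)
{
	// 2 * PI * (1 - cos(AngularRadius))
	// AngularRadius = 0 gives 0; AngularRadius = PI / 2 (a hemisphere) gives 2 * PI
	return 6.283185308f * (1 - cos(AngularRadius));
}
#endif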
float ShadowConeTraceAgainstCulledCapsuleShapes(
float3 WorldRayStart,
float3 UnitRayDirection,
float LightAngle,
float InvMaxOcclusionDistance,
uint CulledDataParameter,
uint NumIntersectingCapsules,
uniform bool bUseCulling)
{
float ConeVisibility = 1;
float AreaOfLight = 6.283185308f - 6.283185308f * cos(LightAngle);
LOOP
for (uint ListObjectIndex = 0; ListObjectIndex < NumIntersectingCapsules; ListObjectIndex++)
{
uint ObjectIndex;
if (bUseCulling)
{
uint GroupIndex = CulledDataParameter;
ObjectIndex = IntersectingShapeIndices[MAX_INTERSECTING_SHAPES * GroupIndex + ListObjectIndex];
}
else
{
ObjectIndex = ListObjectIndex;
}
#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_FROM_CAPSULE
float4 LightData = LightDirectionData[ObjectIndex];
UnitRayDirection = LightData.xyz;
LightAngle = LightData.w;
AreaOfLight = 6.283185308f - 6.283185308f * cos(LightAngle);
#endif
#define OVERRIDE_LIGHT_DEBUG 0
#if OVERRIDE_LIGHT_DEBUG
//UnitRayDirection = normalize(float3(.2f, .2f, .8f));
UnitRayDirection = float3(0, 0, 1);
LightAngle = .3f;
#endif
float4 CapsuleCenterAndRadius = ShadowCapsuleShapes[ObjectIndex * 2];
float4 CapsuleOrientationAndLength = ShadowCapsuleShapes[ObjectIndex * 2 + 1];
float DistanceToShadowSphere;
float3 UnitVectorToShadowSphere;
float3 UnitRayDirectionInCorrectSpace = UnitRayDirection;
BRANCH
if (CapsuleOrientationAndLength.w > 0)
{
#if USE_SCALED_SPHERE_INTERSECTION
float3 CapsuleSpaceX;
float3 CapsuleSpaceY;
float3 CapsuleSpaceZ = CapsuleOrientationAndLength.xyz;
GenerateCoordinateSystem(CapsuleSpaceZ, CapsuleSpaceX, CapsuleSpaceY);
float CapsuleZScale = CapsuleCenterAndRadius.w / (.5f * CapsuleOrientationAndLength.w + CapsuleCenterAndRadius.w);
CapsuleSpaceZ *= CapsuleZScale;
float3 CapsuleCenterToRayStart = WorldRayStart - CapsuleCenterAndRadius.xyz;
float3 CapsuleSpaceRayStart = float3(dot(CapsuleCenterToRayStart, CapsuleSpaceX), dot(CapsuleCenterToRayStart, CapsuleSpaceY), dot(CapsuleCenterToRayStart, CapsuleSpaceZ));
float3 CapsuleSpaceRayDirection = float3(dot(UnitRayDirection, CapsuleSpaceX), dot(UnitRayDirection, CapsuleSpaceY), dot(UnitRayDirection, CapsuleSpaceZ));
DistanceToShadowSphere = length(CapsuleSpaceRayStart);
UnitVectorToShadowSphere = -CapsuleSpaceRayStart / DistanceToShadowSphere;
UnitRayDirectionInCorrectSpace = normalize(CapsuleSpaceRayDirection);
#else
float3 VectorToCapsuleCenter = CapsuleCenterAndRadius.xyz - WorldRayStart;
// Closest point on line segment to ray
float3 L01 = CapsuleOrientationAndLength.xyz * CapsuleOrientationAndLength.w;
float3 L0 = VectorToCapsuleCenter - 0.5 * L01;
float3 L1 = VectorToCapsuleCenter + 0.5 * L01;
// The below is computing the shortest distance between capsule line segment and ray
float CapsuleOrientationProjectedOntoRay = dot(UnitRayDirection, L01);
// Vector that spans L01 perpendicular to the ray
float3 PerpendicularSpanningVector = CapsuleOrientationProjectedOntoRay * UnitRayDirection - L01;
// Squared length of PerpendicularSpanningVector using the right triangle formed by L01 and UnitRayDirection * CapsuleOrientationProjectedOntoRay
float PerpendicularDistance = Square(CapsuleOrientationAndLength.w) - CapsuleOrientationProjectedOntoRay * CapsuleOrientationProjectedOntoRay;
// Project the vector to a capsule endpoint onto the perpendicular spanning vector, normalized
float t = saturate(dot(L0, PerpendicularSpanningVector) / PerpendicularDistance);
// Compute the vector to the shadow sphere which best approximates the capsule's shadowing
float3 VectorToShadowSphere = L0 + t * L01;
DistanceToShadowSphere = length(VectorToShadowSphere);
UnitVectorToShadowSphere = VectorToShadowSphere / DistanceToShadowSphere;
// The above 'best shadow sphere' calculation doesn't take into account the projected solid angle of the potential shadow spheres
// As a result, there's a discontinuity when the capsule and the ray point in nearly the same direction, where the far end of the capsule gets chosen
// Here we mitigate the effect by overriding the distance to shadow sphere if one of the capsule end points was closer
DistanceToShadowSphere = min(DistanceToShadowSphere, length(L0));
DistanceToShadowSphere = min(DistanceToShadowSphere, length(L1));
#endif
}
else
{
DistanceToShadowSphere = length(CapsuleCenterAndRadius.xyz - WorldRayStart);
UnitVectorToShadowSphere = (CapsuleCenterAndRadius.xyz - WorldRayStart) / DistanceToShadowSphere;
}
float AngleBetween = acosFast(dot(UnitVectorToShadowSphere, UnitRayDirectionInCorrectSpace));
float ConeConeIntersection = 1 - saturate(SphericalCapIntersectionAreaFast(LightAngle, atanFastPos(CapsuleCenterAndRadius.w / DistanceToShadowSphere), AngleBetween) / AreaOfLight);
float DistanceFadeAlpha = saturate(DistanceToShadowSphere * InvMaxOcclusionDistance * 3 - 2);
ConeConeIntersection = lerp(ConeConeIntersection, 1, DistanceFadeAlpha);
ConeVisibility *= ConeConeIntersection;
}
return ConeVisibility;
}
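
// Worked example of the distance fade used in the loop above. This is only a sketch:
// the value MaxOcclusionDistance = 300 is hypothetical and chosen for easy arithmetic.
// DistanceFadeAlpha = saturate(DistanceToShadowSphere * InvMaxOcclusionDistance * 3 - 2), so:
//   DistanceToShadowSphere = 200 -> saturate(2.0 - 2) = 0    (capsule shadowing fully applied)
//   DistanceToShadowSphere = 250 -> saturate(2.5 - 2) = 0.5  (halfway to unshadowed)
//   DistanceToShadowSphere = 300 -> saturate(3.0 - 2) = 1    (capsule shadowing fully faded out)
// In other words, the fade covers the last third of the occlusion range.
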
#if APPLY_TO_BENT_NORMAL
Texture2D ReceiverBentNormalTexture;
RWTexture2D<float4> RWBentNormalTexture;
#endif
float ReduceSelfShadowingIntensity;
uint DownsampleFactor;
RWTexture2D<float2> RWShadowFactors;
float MaxOcclusionDistance;
float MinVisibility;
uint2 TileDimensions;
RWBuffer<uint> RWTileIntersectionCounts;
[numthreads(THREADGROUP_SIZEX, THREADGROUP_SIZEY, 1)]
void CapsuleShadowingCS(
    uint3 GroupId : SV_GroupID,
    uint3 DispatchThreadId : SV_DispatchThreadID,
    uint3 GroupThreadId : SV_GroupThreadID)
{
    uint ThreadIndex = GroupThreadId.y * THREADGROUP_SIZEX + GroupThreadId.x;

    // Each thread shades one low-resolution pixel; map it to the center of the matching full-resolution pixel
    float2 ScreenUV = float2((DispatchThreadId.xy * DownsampleFactor + ScissorRectMinAndSize.xy + .5f) * View.BufferSizeAndInvSize.zw);
    float2 ScreenPosition = (ScreenUV.xy - View.ScreenPositionScaleBias.wz) / View.ScreenPositionScaleBias.xy;

#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_FROM_RECEIVER
    // The third Load component selects the mip level; DispatchThreadId.z is 0 when the dispatch is one group deep in Z
    float4 ReceiverTextureValue = ReceiverBentNormalTexture.Load(DispatchThreadId.xyz);
    float3 ReceiverBentNormal = ReceiverTextureValue.xyz;
    float SceneDepth = ReceiverTextureValue.w;
#else
    float SceneDepth = CalcSceneDepth(ScreenUV);
#endif

    float4 HomogeneousWorldPosition = mul(float4(ScreenPosition * SceneDepth, SceneDepth, 1), View.ScreenToWorld);
    float3 OpaqueWorldPosition = HomogeneousWorldPosition.xyz / HomogeneousWorldPosition.w;

    uint CulledDataParameter = 0;
    bool bTileShouldComputeShadowing = true;
    FTileCullingData TileCullingData0;
    FTileCullingData TileCullingData1;
    uint NumPixelIntersectingShapes = 0;
    uint NumTileIntersectingShapes = 0;

    // So we can skip skybox pixels / tiles without having to check the GBuffer for shading model
    float MaxDepth = 20000;

#define USE_CULLING 1

#if USE_CULLING
    SetupTileCullingData(SceneDepth, MaxDepth, ThreadIndex, GroupId.xy, TileCullingData0, TileCullingData1, bTileShouldComputeShadowing, CulledDataParameter);
#endif // USE_CULLING

    float Visibility = 1;

    BRANCH
    if (bTileShouldComputeShadowing)
    {
        // World space offset of the ray start along the ray direction, to avoid incorrect self-shadowing
        float RayStartOffset = 0;

#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_PUNCTUAL
#if POINT_LIGHT
        float3 LightVector = LightPositionAndInvRadius.xyz - OpaqueWorldPosition;
        float LightVectorLength = length(LightVector);
        float3 UnitRayDirection = LightVector / LightVectorLength;
        float3 WorldRayStart = OpaqueWorldPosition + UnitRayDirection * RayStartOffset;
        // Angular radius of the light source as seen from the receiver
        float LightAngle = atanFastPos(LightSourceRadius / LightVectorLength);
#else
        float3 WorldRayStart = OpaqueWorldPosition + LightDirection * RayStartOffset;
        float3 UnitRayDirection = LightDirection;
        float LightAngle = LightAngleAndNormalThreshold.x;
#endif
#elif LIGHT_SOURCE_MODE == LIGHT_SOURCE_FROM_RECEIVER
        // Trace along the receiver's bent normal; its length sets the cone angle, clamped to at least PI / 8
        float3 WorldRayStart = OpaqueWorldPosition;
        float BentNormalLength = length(ReceiverBentNormal);
        float3 UnitRayDirection = ReceiverBentNormal / max(BentNormalLength, .00001f);
        float LightAngle = max(BentNormalLength * .5f * PI, PI / 8);
#else
        // Remaining light source modes: no ray direction or cone angle is set up at this point
        float3 WorldRayStart = OpaqueWorldPosition;
        float3 UnitRayDirection = 0;
        float LightAngle = 0;
#endif

        uint NumIntersectingCapsules = NumShadowCapsules;

#if USE_CULLING
        NumIntersectingCapsules = CullCapsuleShapesToTile(
            ThreadIndex,
            CulledDataParameter,
            MaxOcclusionDistance,
            TileCullingData0,
            TileCullingData1);

        NumTileIntersectingShapes = TileNumCapsules0 + TileNumCapsules1;
#else
        NumTileIntersectingShapes = NumShadowCapsules;
#endif

        NumPixelIntersectingShapes += NumIntersectingCapsules;

        Visibility *= ShadowConeTraceAgainstCulledCapsuleShapes(
            WorldRayStart,
            UnitRayDirection,
            LightAngle,
            1.0f / MaxOcclusionDistance,
            CulledDataParameter,
            NumIntersectingCapsules,
            USE_CULLING ? true : false);

#if !APPLY_TO_BENT_NORMAL
        // The first thread of each tile records how many shapes affect the tile, for use by the upsample pass
        if (all(GroupThreadId.xy == 0) && all(GroupId.xy < TileDimensions))
        {
            RWTileIntersectionCounts[GroupId.y * TileDimensions.x + GroupId.x] = NumTileIntersectingShapes;
        }
#endif
    }

#if LIGHT_SOURCE_MODE == LIGHT_SOURCE_FROM_CAPSULE && !FORWARD_SHADING
    BRANCH
    if (ReduceSelfShadowingIntensity > 0)
    {
        FGBufferData GBufferData = GetGBufferData(ScreenUV);
        // Reduce self shadowing intensity
        Visibility = lerp(Visibility, 1, HasDistanceFieldRepresentation(GBufferData) ? .8f : 0);
    }
#endif

#if LIGHT_SOURCE_MODE != LIGHT_SOURCE_PUNCTUAL
    // Apply to indirect shadows only
    Visibility = lerp(MinVisibility, 1, Visibility);
#endif

    // Debug visualizations
    //Visibility = NumPixelIntersectingShapes / 20.0f;
    //Visibility = bTileShouldComputeShadowing ? 1 : 0;

#if APPLY_TO_BENT_NORMAL
#if LIGHT_SOURCE_MODE != LIGHT_SOURCE_FROM_RECEIVER
    float3 ReceiverBentNormal = ReceiverBentNormalTexture.Load(DispatchThreadId.xyz).xyz;
#endif
    RWBentNormalTexture[DispatchThreadId.xy] = float4(ReceiverBentNormal * Visibility, SceneDepth);
#else
    RWShadowFactors[DispatchThreadId.xy] = float2(Visibility, SceneDepth);
#endif
}
Buffer<uint> TileIntersectionCounts;
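// Size of a tile, in the same pixel units as ScissorRectMinAndSize.xy (the vertex shader adds the two before scaling by the inverse buffer size)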
float2 TileSize;
#ifndef TILES_PER_INSTANCE
#define TILES_PER_INSTANCE 1
#endif
void CapsuleShadowingUpsampleVS(
    float2 TexCoord : ATTRIBUTE0,
    uint VertexId : SV_VertexID,
    uint InstanceId : SV_InstanceID,
    out float4 OutPosition : SV_POSITION
    )
{
    // Compute the actual instance id for when multiple tiles are packed into the vertex buffer (4 vertices per tile)
    uint EffectiveInstanceId = InstanceId * TILES_PER_INSTANCE + VertexId / 4;
    uint NumCapsulesAffectingTile = TileIntersectionCounts[EffectiveInstanceId];

    // Derive the row from the effective id so that the result also holds when TILES_PER_INSTANCE > 1
    uint TileY = EffectiveInstanceId / TileDimensions.x;
    uint2 TileCoordinate = uint2(EffectiveInstanceId - TileY * TileDimensions.x, TileY);

    float2 ScreenUV = ((TileCoordinate + TexCoord) * TileSize + ScissorRectMinAndSize.xy) * View.BufferSizeAndInvSize.zw;
    float2 ScreenPosition = (ScreenUV.xy - View.ScreenPositionScaleBias.wz) / View.ScreenPositionScaleBias.xy;
    OutPosition = float4(ScreenPosition, 0, 1);

    // Cull the tile if no capsules affect it, as no shadow would be visible there.
    // Collapsing every vertex to the same point makes the quad degenerate, so nothing is rasterized.
    if (NumCapsulesAffectingTile == 0)
    {
        OutPosition.xy = 0;
    }
}
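
// Worked example of the tile addressing in CapsuleShadowingUpsampleVS. This is only a sketch:
// all values are hypothetical, and it assumes TILES_PER_INSTANCE == 1 (the default defined above),
// TileDimensions = (4, 3), and a TexCoord that spans 0..1 across a tile.
// For InstanceId = 10, every VertexId in 0..3 gives EffectiveInstanceId = 10 * 1 + VertexId / 4 = 10.
//   TileY = 10 / 4 = 2         (integer division)
//   TileX = 10 - 2 * 4 = 2
// The quad therefore covers tile (2, 2), and its corners land at (2 + TexCoord.x, 2 + TexCoord.y) * TileSize
// before the scissor rect offset is added.
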
Texture2D ShadowFactorsTexture;
SamplerState ShadowFactorsSampler;
float OutputtingToLightAttenuation;
void CapsuleShadowingUpsamplePS(
    in float4 SVPos : SV_POSITION,
    out float4 OutColor : SV_Target0
#if APPLY_TO_SSAO
    ,out float4 OutAmbientOcclusion : SV_Target1
#endif
    )
{
    // The low-resolution shadow factors were written starting at 0,0 regardless of the view rect min, so remove that offset
    float2 DistanceFieldUVs = SvPositionToBufferUV(SVPos) - ScissorRectMinAndSize.xy * View.BufferSizeAndInvSize.zw;
    float SceneDepth = CalcSceneDepth(SvPositionToBufferUV(SVPos));

#define BILATERAL_UPSAMPLE 1

#if BILATERAL_UPSAMPLE && UPSAMPLE_REQUIRED
    float2 LowResBufferSize = floor(View.RenderTargetSize / DOWNSAMPLE_FACTOR);
    float2 LowResTexelSize = 1.0f / LowResBufferSize;

    // UV of the center of the upper left low-resolution texel of the 2x2 footprint
    float2 Corner00UV = floor(DistanceFieldUVs * LowResBufferSize - .5f) / LowResBufferSize + .5f * LowResTexelSize;
    float2 BilinearWeights = (DistanceFieldUVs - Corner00UV) * LowResBufferSize;

    // x holds the shadow factor, y holds the depth that it was computed at
    float2 TextureValues00 = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, Corner00UV, 0).xy;
    float2 TextureValues10 = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, Corner00UV + float2(LowResTexelSize.x, 0), 0).xy;
    float2 TextureValues01 = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, Corner00UV + float2(0, LowResTexelSize.y), 0).xy;
    float2 TextureValues11 = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, Corner00UV + LowResTexelSize, 0).xy;

    float4 CornerWeights = float4(
        (1 - BilinearWeights.y) * (1 - BilinearWeights.x),
        (1 - BilinearWeights.y) * BilinearWeights.x,
        BilinearWeights.y * (1 - BilinearWeights.x),
        BilinearWeights.y * BilinearWeights.x);

    // Favor corners whose depth is close to the full-resolution depth; Epsilon avoids a divide by zero
    float Epsilon = .0001f;

    float4 CornerDepths = abs(float4(TextureValues00.y, TextureValues10.y, TextureValues01.y, TextureValues11.y));
    float4 DepthWeights = 1.0f / (abs(CornerDepths - SceneDepth.xxxx) + Epsilon);
    float4 FinalWeights = CornerWeights * DepthWeights;

    float InterpolatedResult =
        (FinalWeights.x * TextureValues00.x
        + FinalWeights.y * TextureValues10.x
        + FinalWeights.z * TextureValues01.x
        + FinalWeights.w * TextureValues11.x)
        / dot(FinalWeights, 1);

    float Output = InterpolatedResult;
#else
    float Output = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, DistanceFieldUVs, 0).x;
#endif

    if (OutputtingToLightAttenuation > 0)
    {
        OutColor = EncodeLightAttenuation(Output).xxxx;
    }
    else
    {
        OutColor = Output;
    }

#if APPLY_TO_SSAO
    OutAmbientOcclusion = Output;
#endif
}
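
// Worked example of the bilateral upsample weights in CapsuleShadowingUpsamplePS. This is only a sketch:
// all sample values are hypothetical and chosen to show how a depth discontinuity is handled.
// Assume BilinearWeights = (.5, .5), so every CornerWeight is .25, and SceneDepth = 100.
//   Corners 00, 10, 01: shadow factor .2, depth 100 -> DepthWeight = 1 / (0 + .0001) = 10000, FinalWeight = 2500
//   Corner 11:          shadow factor 1,  depth 500 -> DepthWeight = 1 / (400 + .0001), roughly .0025, FinalWeight roughly .000625
//   InterpolatedResult = (3 * 2500 * .2 + .000625 * 1) / (3 * 2500 + .000625), approximately .2
// Plain bilinear filtering of the same samples would give .25 * (.2 + .2 + .2 + 1) = .4,
// so the depth weighting keeps the unshadowed background sample from leaking onto the foreground pixel.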