mirror of https://github.com/izzy2lost/UnrealEngineUWP.git, synced 2026-03-26 18:15:20 -07:00
#lockdown Nick.Penwarden
#rb none

==========================
MAJOR FEATURES + CHANGES
==========================

Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now

Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation

Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event

Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid render targets to prevent an issue during PSO creation when NumRenderTargets is > 0 but none of the formats are valid (all formats are DXGI_FORMAT_UNKNOWN).
#jira UE-40332

Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458: D3D12 memory optimization
- The D3D12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
- Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX

Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243496: memory optimisation. Use NULL-terminated ANSI strings instead of Unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ANSI, converted (slowly) to FStrings, and finally converted back to ANSI strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
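The waste figures in Change 3251141 follow from how a buddy allocator sizes blocks: every request is rounded up to a power of two. The C++ below is a minimal illustration that assumes a textbook buddy scheme with a hypothetical 256-byte minimum block. `kMinBlockSize`, `BuddyBlockSize`, and `BuddyWaste` are illustrative names and do not model the D3D12 RHI's actual allocator. That allocator's block sizes and alignment rules are not described here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimum block size for this sketch; not taken from the D3D12 RHI.
constexpr uint64_t kMinBlockSize = 256;

// Block size a textbook buddy allocator hands out for a request: the smallest
// power of two that is >= Size and >= kMinBlockSize.
uint64_t BuddyBlockSize(uint64_t Size) {
    // Guard against zero-sized requests and against overflowing the doubling loop.
    assert(Size > 0 && Size <= (uint64_t(1) << 63));
    uint64_t Block = kMinBlockSize;
    while (Block < Size) {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to internal fragmentation for a single request.
uint64_t BuddyWaste(uint64_t Size) {
    return BuddyBlockSize(Size) - Size;
}
```

With these definitions, `BuddyWaste(100)` is 156, `BuddyWaste(4096)` is 0, and `BuddyWaste(4097)` is 4095. The absolute waste for an unlucky request grows with the block size. Lowering the largest size the suballocator accepts therefore bounds the worst case per allocation.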
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep

Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message

Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from CL 3241990, which causes distance fields to be corrupt in Fortnite

Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog

Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in incoherent ways. The bound defrag is now done outside of the async work logic.

Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here, amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357

Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats

Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers, so it is noticeably slower, and it will fail if inserted in a pass that can't be restarted, but it is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed waits (previously we'd block until completion, unlike the other APIs, which report failure after a fixed interval: 2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554

Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.

Change 3256371 on 2017/01/12 by Uriel.Doyon
Re-enabled texture streaming bound defrag, as the fix is in CL 3252843

Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf

Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game

Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group, so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)

Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats

Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.

Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles

Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable

Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81MB of backing memory -> 32MB in GPUPerfTest, atlas upload time 29ms -> 893ms

Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes

Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag

Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.

Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels

Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.

Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitives becoming dynamic.
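The first DFAO item in Change 3257112 inverts the culling loop. Instead of each screen tile gathering the objects that touch it, each object emits the tiles it covers.

The C++ below is a minimal sketch of that object-major ordering. It assumes each object has already been reduced to a pixel-space bounding rectangle. That is a simplification: the commit does not say which intersection test the GPU pass actually uses. `ScreenRect`, `TileObjectPair`, and `BuildObjectTilePairs` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical types for this sketch; the real DFAO pass works on GPU buffers.
struct ScreenRect { float MinX, MinY, MaxX, MaxY; };  // pixel-space bounds of one object
struct TileObjectPair { uint32_t ObjectIndex; uint32_t TileIndex; };

// Object-major culling: for each object, emit one (object, tile) pair for every
// screen tile its bounds overlap. Each pair can then get its own cone-tracing
// thread group instead of each tile looping over a per-tile object list.
std::vector<TileObjectPair> BuildObjectTilePairs(const std::vector<ScreenRect>& Objects,
                                                 uint32_t TileSize,
                                                 uint32_t TilesX, uint32_t TilesY) {
    assert(TileSize > 0);
    const float Size = static_cast<float>(TileSize);
    std::vector<TileObjectPair> Pairs;
    for (uint32_t ObjectIndex = 0; ObjectIndex < Objects.size(); ++ObjectIndex) {
        const ScreenRect& R = Objects[ObjectIndex];
        // Clamp the overlapped tile range to the screen; an object fully off
        // screen produces an empty range and emits nothing.
        const int MinTileX = std::max(0, static_cast<int>(std::floor(R.MinX / Size)));
        const int MinTileY = std::max(0, static_cast<int>(std::floor(R.MinY / Size)));
        const int MaxTileX = std::min(static_cast<int>(TilesX) - 1, static_cast<int>(std::floor(R.MaxX / Size)));
        const int MaxTileY = std::min(static_cast<int>(TilesY) - 1, static_cast<int>(std::floor(R.MaxY / Size)));
        for (int Y = MinTileY; Y <= MaxTileY; ++Y) {
            for (int X = MinTileX; X <= MaxTileX; ++X) {
                Pairs.push_back({ObjectIndex, static_cast<uint32_t>(Y) * TilesX + static_cast<uint32_t>(X)});
            }
        }
    }
    return Pairs;
}
```

For example, take 16-pixel tiles on a 4x4 grid. An object spanning pixels (0, 0)-(20, 10) yields tiles 0 and 1, and an object entirely off screen yields nothing. Every pair is an independent unit of work, so a dispatch over the pair list keeps each thread group on a single object/tile combination. This is the property the commit credits for smaller, better-scheduled wavefronts.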
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717

Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12MB for the new volume textures

Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads

Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17MB of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms

Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat CSVs. With the plugin enabled, enter 'PerformanceMonitor help' into the console to get usage details.

Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728

Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs

Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835

Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single-threaded mode in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method, because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841

Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)

CL 3262343: Fix depth testing on translucency not working correctly after CL 3231395.
This change reapplies the D3D12RHI changes from CL 3231395, because those changes were lost when integrating from //Dev-Rendering/, but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was that HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.

CL 3231395: Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code; there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list, so we should always have readback heaps resident.

Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically, using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass, but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803

Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms (previously 4096).

Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted

Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration. #3055

Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.

Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types

Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default, as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.

Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.

Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059

Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe

Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API.
The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed.
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca

Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version

Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused Perforce when the change was subsequently integrated from main)

[CL 3274498 by Rolando Caloca in Main branch]
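Change 3271594 describes an allocator with two properties. It backs a contiguous virtual range with whatever physical pages happen to be free, and it returns those pages when a resource is destroyed.

The C++ below is a minimal sketch of that idea. It assumes pages are identified by index and that a simple free list is acceptable bookkeeping. `NonContiguousPageAllocator` and its methods are hypothetical and do not model the actual ESRAM or XgMemoryLayout APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the engine's ESRAM allocator. Physical pages live in a
// free list; an allocation takes any N free pages. The output vector is the page
// table for a contiguous virtual range: virtual page i is backed by OutPages[i].
class NonContiguousPageAllocator {
public:
    explicit NonContiguousPageAllocator(uint32_t NumPhysicalPages)
        : Capacity(NumPhysicalPages) {
        // Push in reverse so that the first allocation starts at page 0.
        for (uint32_t Page = NumPhysicalPages; Page > 0; --Page) {
            FreePages.push_back(Page - 1);
        }
    }

    // Fills OutPages with NumPages physical pages (not necessarily adjacent).
    // Returns false and leaves OutPages empty if too few pages are free.
    bool Allocate(uint32_t NumPages, std::vector<uint32_t>& OutPages) {
        OutPages.clear();
        if (NumPages > FreePages.size()) {
            return false;
        }
        OutPages.reserve(NumPages);
        for (uint32_t i = 0; i < NumPages; ++i) {
            OutPages.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return true;
    }

    // Returns a destroyed resource's pages to the free list for reuse.
    void Free(const std::vector<uint32_t>& Pages) {
        for (uint32_t Page : Pages) {
            assert(Page < Capacity);
            FreePages.push_back(Page);
        }
    }

    std::size_t NumFreePages() const { return FreePages.size(); }

private:
    uint32_t Capacity;
    std::vector<uint32_t> FreePages;
};
```

Consider this sequence, starting from 8 pages:
- Allocate 3 pages twice, getting 0, 1, 2 and then 3, 4, 5.
- Free the first allocation.
- Request 4 pages. The allocator returns pages 2, 1, 0 and 6.

At that point the free pages are 0, 1, 2, 6 and 7, and no four of them are adjacent. A contiguous allocator would have had to fail. The page table is what lets the non-contiguous allocator succeed.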
667 lines
25 KiB
Plaintext
// Copyright 1998-2017 Epic Games, Inc. All Rights Reserved.

/*=============================================================================
	DistanceFieldShadowing.usf
=============================================================================*/

#include "Common.usf"
#include "DeferredShadingCommon.usf"
#include "DistanceFieldLightingShared.usf"
#include "DistanceFieldShadowingShared.usf"

uint ObjectBoundingGeometryIndexCount;

float4 FetchObjectDataFloat4(uint SourceIndex)
{
	return float4(ObjectData[4 * SourceIndex + 0], ObjectData[4 * SourceIndex + 1], ObjectData[4 * SourceIndex + 2], ObjectData[4 * SourceIndex + 3]);
}

void CopyCulledObjectData(uint DestIndex, uint SourceIndex)
{
	RWCulledObjectBounds[DestIndex] = float4(ObjectBounds[4 * SourceIndex + 0], ObjectBounds[4 * SourceIndex + 1], ObjectBounds[4 * SourceIndex + 2], ObjectBounds[4 * SourceIndex + 3]);

	UNROLL
	for (uint VectorIndex = 0; VectorIndex < CULLED_OBJECT_DATA_STRIDE; VectorIndex++)
	{
		// Note: only copying the first CULLED_OBJECT_DATA_STRIDE of the original object data
		RWCulledObjectData[DestIndex * CULLED_OBJECT_DATA_STRIDE + VectorIndex] = FetchObjectDataFloat4(SourceIndex * OBJECT_DATA_STRIDE + VectorIndex);
	}

	float3 LocalVolumeBoundsMin;
	float3 LocalVolumeBoundsMax;
	LoadGlobalObjectLocalVolumeBoundsMinMax(SourceIndex, LocalVolumeBoundsMin, LocalVolumeBoundsMax);

	float3 LocalBoundsVertices[8];
	LocalBoundsVertices[0] = float3(LocalVolumeBoundsMin.x, LocalVolumeBoundsMin.y, LocalVolumeBoundsMin.z);
	LocalBoundsVertices[1] = float3(LocalVolumeBoundsMax.x, LocalVolumeBoundsMin.y, LocalVolumeBoundsMin.z);
	LocalBoundsVertices[2] = float3(LocalVolumeBoundsMin.x, LocalVolumeBoundsMax.y, LocalVolumeBoundsMin.z);
	LocalBoundsVertices[3] = float3(LocalVolumeBoundsMax.x, LocalVolumeBoundsMax.y, LocalVolumeBoundsMin.z);
	LocalBoundsVertices[4] = float3(LocalVolumeBoundsMin.x, LocalVolumeBoundsMin.y, LocalVolumeBoundsMax.z);
	LocalBoundsVertices[5] = float3(LocalVolumeBoundsMax.x, LocalVolumeBoundsMin.y, LocalVolumeBoundsMax.z);
	LocalBoundsVertices[6] = float3(LocalVolumeBoundsMin.x, LocalVolumeBoundsMax.y, LocalVolumeBoundsMax.z);
	LocalBoundsVertices[7] = float3(LocalVolumeBoundsMax.x, LocalVolumeBoundsMax.y, LocalVolumeBoundsMax.z);

	float3 MinViewSpacePosition = float3(2000000, 2000000, 2000000);
	float3 MaxViewSpacePosition = float3(-2000000, -2000000, -2000000);

	float4 M0 = FetchObjectDataFloat4(SourceIndex * OBJECT_DATA_STRIDE + 11);
	float4 M1 = FetchObjectDataFloat4(SourceIndex * OBJECT_DATA_STRIDE + 12);
	float4 M2 = FetchObjectDataFloat4(SourceIndex * OBJECT_DATA_STRIDE + 13);
	float4 M3 = FetchObjectDataFloat4(SourceIndex * OBJECT_DATA_STRIDE + 14);

	float4x4 LocalToWorld = float4x4(M0, M1, M2, M3);

	float3 ViewSpaceBoundsVertices[8];

	for (uint i = 0; i < 8; i++)
	{
		float3 WorldBoundsPosition = mul(float4(LocalBoundsVertices[i], 1), LocalToWorld).xyz;
		float3 ViewSpacePosition = mul(float4(WorldBoundsPosition, 1), WorldToShadow).xyz;
		MinViewSpacePosition = min(MinViewSpacePosition, ViewSpacePosition);
		MaxViewSpacePosition = max(MaxViewSpacePosition, ViewSpacePosition);
		ViewSpaceBoundsVertices[i] = ViewSpacePosition;
	}

	float3 ObjectXAxis = (ViewSpaceBoundsVertices[1] - ViewSpaceBoundsVertices[0]) / 2.0f;
	float3 ObjectYAxis = (ViewSpaceBoundsVertices[2] - ViewSpaceBoundsVertices[0]) / 2.0f;
	float3 ObjectZAxis = (ViewSpaceBoundsVertices[4] - ViewSpaceBoundsVertices[0]) / 2.0f;

	RWCulledObjectBoxBounds[DestIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 0] = float4(MinViewSpacePosition, 0);
	RWCulledObjectBoxBounds[DestIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 1] = float4(MaxViewSpacePosition, 0);
	RWCulledObjectBoxBounds[DestIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 2] = float4(ObjectXAxis / max(dot(ObjectXAxis, ObjectXAxis), .0001f), 0);
	RWCulledObjectBoxBounds[DestIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 3] = float4(ObjectYAxis / max(dot(ObjectYAxis, ObjectYAxis), .0001f), 0);
	RWCulledObjectBoxBounds[DestIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 4] = float4(ObjectZAxis / max(dot(ObjectZAxis, ObjectZAxis), .0001f), 0);
}

float4 ShadowConvexHull[12];
float4 ShadowBoundingSphere;
uint NumShadowHullPlanes;

bool ShadowConvexHullIntersectSphere(float3 SphereOrigin, float SphereRadius)
{
	float3 TranslatedSphereOrigin = SphereOrigin + ShadowBoundingSphere.xyz;

	for (uint PlaneIndex = 0; PlaneIndex < NumShadowHullPlanes; PlaneIndex++)
	{
		float4 PlaneData = ShadowConvexHull[PlaneIndex];
		float PlaneDistance = dot(PlaneData.xyz, TranslatedSphereOrigin) - PlaneData.w;

		if (PlaneDistance > SphereRadius)
		{
			return false;
		}
	}

	return true;
}

[numthreads(UPDATEOBJECTS_THREADGROUP_SIZE, 1, 1)]
void CullObjectsForShadowCS(
	uint3 GroupId : SV_GroupID,
	uint3 DispatchThreadId : SV_DispatchThreadID,
	uint3 GroupThreadId : SV_GroupThreadID)
{
	uint ObjectIndex = DispatchThreadId.x;

#define USE_FRUSTUM_CULLING 1
#if USE_FRUSTUM_CULLING

	if (DispatchThreadId.x == 0)
	{
		// RWObjectIndirectArguments is zeroed by a clear before this shader, only need to set things that are non-zero (and are not read by this shader as that would be a race condition)
		// IndexCount, NumInstances, StartIndex, BaseVertexIndex, FirstInstance
		RWObjectIndirectArguments[0] = ObjectBoundingGeometryIndexCount;
	}

	GroupMemoryBarrierWithGroupSync();

	uint SourceIndex = ObjectIndex;

	if (ObjectIndex < NumSceneObjects)
	{
		float4 ObjectBoundingSphere = float4(ObjectBounds[4 * SourceIndex + 0], ObjectBounds[4 * SourceIndex + 1], ObjectBounds[4 * SourceIndex + 2], ObjectBounds[4 * SourceIndex + 3]);

		if (ShadowBoundingSphere.w == 0 && ShadowConvexHullIntersectSphere(ObjectBoundingSphere.xyz, ObjectBoundingSphere.w)
			|| ShadowBoundingSphere.w > 0 && dot(ShadowBoundingSphere.xyz - ObjectBoundingSphere.xyz, ShadowBoundingSphere.xyz - ObjectBoundingSphere.xyz) < Square(ShadowBoundingSphere.w + ObjectBoundingSphere.w))
		{
			uint DestIndex;
			InterlockedAdd(RWObjectIndirectArguments[1], 1U, DestIndex);
			CopyCulledObjectData(DestIndex, SourceIndex);
		}
	}

#else

	if (DispatchThreadId.x == 0)
	{
		// IndexCount, NumInstances, StartIndex, BaseVertexIndex, FirstInstance
		RWObjectIndirectArguments[0] = ObjectBoundingGeometryIndexCount;
		RWObjectIndirectArguments[1] = NumSceneObjects;
	}

	uint SourceIndex = ObjectIndex;
	uint DestIndex = ObjectIndex;

	if (ObjectIndex < NumSceneObjects)
	{
		CopyCulledObjectData(DestIndex, SourceIndex);
	}
#endif
}

RWBuffer<uint> RWShadowTileHeadDataUnpacked;
RWBuffer<uint> RWShadowTileArrayData;

[numthreads(THREADGROUP_SIZEX, THREADGROUP_SIZEY, 1)]
void ClearTilesCS(
	uint3 GroupId : SV_GroupID,
	uint3 DispatchThreadId : SV_DispatchThreadID,
	uint3 GroupThreadId : SV_GroupThreadID)
{
	uint TileIndex = DispatchThreadId.y * ShadowTileListGroupSize.x + DispatchThreadId.x;

	RWShadowTileHeadDataUnpacked[TileIndex * 2 + 0] = TileIndex;
	RWShadowTileHeadDataUnpacked[TileIndex * 2 + 1] = 0;
}

struct FShadowObjectCullVertexOutput
{
	nointerpolation float4 PositionAndRadius : TEXCOORD0;
	nointerpolation uint ObjectIndex : TEXCOORD1;
};

float ConservativeRadiusScale;
float MinRadius;

/** Used when culling objects into screenspace tile lists */
void ShadowObjectCullVS(
	float4 InPosition : ATTRIBUTE0,
	uint ObjectIndex : SV_InstanceID,
	out FShadowObjectCullVertexOutput Output,
	out float4 OutPosition : SV_POSITION
	)
{
	float4 ObjectPositionAndRadius = LoadObjectPositionAndRadius(ObjectIndex);
	// ConservativeRadiusScale pushes the sphere vertices out so the triangles between them lie completely outside the sphere
	// MinRadius is for conservative rasterization
	float EffectiveRadius = (ObjectPositionAndRadius.w + MinRadius) * ConservativeRadiusScale;
	float3 WorldPosition = InPosition.xyz * EffectiveRadius + ObjectPositionAndRadius.xyz;
	OutPosition = mul(float4(WorldPosition, 1), WorldToShadow);

	// Clamp the vertex to the near plane if it is in front of the near plane
	if (OutPosition.z < 0)
	{
		OutPosition.z = 0.000001f;
		OutPosition.w = 1.0f;
	}

	Output.PositionAndRadius = ObjectPositionAndRadius;
	Output.ObjectIndex = ObjectIndex;
}

/** Intersects a single object with the tile and adds to the intersection list if needed. */
void ShadowObjectCullPS(
	FShadowObjectCullVertexOutput Input,
	in float4 SVPos : SV_POSITION,
	out float4 OutColor : SV_Target0)
{
	OutColor = 0;

	uint2 TilePosition = (uint2)SVPos.xy;
	uint TileIndex = TilePosition.y * ShadowTileListGroupSize.x + TilePosition.x;

#define OBJECT_OBB_INTERSECTION 1
#if OBJECT_OBB_INTERSECTION

	float3 ShadowTileMin;
	float3 ShadowTileMax;

	float2 TilePositionForCulling = float2(TilePosition.x, ShadowTileListGroupSize.y - TilePosition.y);
	//@todo - why is this expand needed
	float TileExpand = 1;
	ShadowTileMin.xy = (TilePositionForCulling - TileExpand) / (float2)ShadowTileListGroupSize * 2 - 1;
	ShadowTileMax.xy = (TilePositionForCulling + 1) / (float2)ShadowTileListGroupSize * 2 - 1;
	// Extrude toward light to avoid culling objects between the light and the shadow frustum
	ShadowTileMin.z = -1000;
	ShadowTileMax.z = 1;

	float3 ObjectViewSpaceMin;
	float3 ObjectViewSpaceMax;
	LoadObjectViewSpaceBox(Input.ObjectIndex, ObjectViewSpaceMin, ObjectViewSpaceMax);

	BRANCH
	// Separating axis test on the AABB
	// Note: don't clip by near plane, objects closer to the light can still cast into the frustum
	if (all(ObjectViewSpaceMax.xy > ShadowTileMin.xy) && all(ObjectViewSpaceMin < ShadowTileMax))
	{
		float3 ObjectCenter = .5f * (ObjectViewSpaceMin + ObjectViewSpaceMax);
		float3 MinProjections = 500000;
		float3 MaxProjections = -500000;

		{
			float3 Corners[8];
			Corners[0] = float3(ShadowTileMin.x, ShadowTileMin.y, ShadowTileMin.z);
			Corners[1] = float3(ShadowTileMax.x, ShadowTileMin.y, ShadowTileMin.z);
			Corners[2] = float3(ShadowTileMin.x, ShadowTileMax.y, ShadowTileMin.z);
			Corners[3] = float3(ShadowTileMax.x, ShadowTileMax.y, ShadowTileMin.z);
			Corners[4] = float3(ShadowTileMin.x, ShadowTileMin.y, ShadowTileMax.z);
			Corners[5] = float3(ShadowTileMax.x, ShadowTileMin.y, ShadowTileMax.z);
			Corners[6] = float3(ShadowTileMin.x, ShadowTileMax.y, ShadowTileMax.z);
			Corners[7] = float3(ShadowTileMax.x, ShadowTileMax.y, ShadowTileMax.z);

			float3 ObjectAxisX;
			float3 ObjectAxisY;
			float3 ObjectAxisZ;
			LoadObjectAxes(Input.ObjectIndex, ObjectAxisX, ObjectAxisY, ObjectAxisZ);

			UNROLL
			for (int i = 0; i < 8; i++)
			{
				float3 CenterToVertex = Corners[i] - ObjectCenter;
				float3 Projections = float3(dot(CenterToVertex, ObjectAxisX), dot(CenterToVertex, ObjectAxisY), dot(CenterToVertex, ObjectAxisZ));
				MinProjections = min(MinProjections, Projections);
				MaxProjections = max(MaxProjections, Projections);
			}
		}

		BRANCH
		// Separating axis test on the OBB
		if (all(MinProjections < 1) && all(MaxProjections > -1))
		{
			uint ArrayIndex;
			InterlockedAdd(RWShadowTileHeadDataUnpacked[TileIndex * 2 + 1], 1U, ArrayIndex);

			if (ArrayIndex < ShadowMaxObjectsPerTile)
			{
				uint DataIndex = (ArrayIndex * (uint)(ShadowTileListGroupSize.x * ShadowTileListGroupSize.y + .5f) + TileIndex);
				RWShadowTileArrayData[DataIndex * SHADOW_TILE_ARRAY_DATA_STRIDE] = Input.ObjectIndex;
			}
		}
	}

#else
	{
		uint ArrayIndex;
		InterlockedAdd(RWShadowTileHeadDataUnpacked[TileIndex * 2 + 1], 1U, ArrayIndex);

		if (ArrayIndex < ShadowMaxObjectsPerTile)
		{
			uint DataIndex = (ArrayIndex * (uint)(ShadowTileListGroupSize.x * ShadowTileListGroupSize.y + .5f) + TileIndex);
			RWShadowTileArrayData[DataIndex * SHADOW_TILE_ARRAY_DATA_STRIDE] = Input.ObjectIndex;
		}
	}
#endif
}

RWTexture2D<float2> RWShadowFactors;
float2 NumGroups;

/** From point being shaded toward light, for directional lights. */
float3 LightDirection;
float4 LightPositionAndInvRadius;
float LightSourceRadius;
float RayStartOffsetDepthScale;
float3 TanLightAngleAndNormalThreshold;
int4 ScissorRectMinAndSize;

/** Min and Max depth for this tile. */
groupshared uint IntegerTileMinZ;
groupshared uint IntegerTileMaxZ;

/** Inner Min and Max depth for this tile. */
groupshared uint IntegerTileMinZ2;
groupshared uint IntegerTileMaxZ2;

/** Number of objects affecting the tile, after culling. */
groupshared uint TileNumObjects0;
groupshared uint TileNumObjects1;

void CullObjectsToTileWithGather(
|
|
float SceneDepth,
|
|
uint ThreadIndex,
|
|
uint2 GroupId,
|
|
float TraceDistance,
|
|
float MinDepth,
|
|
out uint NumIntersectingObjects,
|
|
out bool bTileShouldComputeShadowing,
|
|
out uint GroupIndex)
|
|
{
|
|
// Initialize per-tile variables
|
|
if (ThreadIndex == 0)
|
|
{
|
|
IntegerTileMinZ = 0x7F7FFFFF;
|
|
IntegerTileMaxZ = 0;
|
|
IntegerTileMinZ2 = 0x7F7FFFFF;
|
|
IntegerTileMaxZ2 = 0;
|
|
TileNumObjects0 = 0;
|
|
TileNumObjects1 = 0;
|
|
}
|
|
|
|
GroupMemoryBarrierWithGroupSync();
|
|
|
|
if (SceneDepth > MinDepth)
|
|
{
|
|
// Use shared memory atomics to build the depth bounds for this tile
|
|
// Each thread is assigned to a pixel at this point
|
|
//@todo - move depth range computation to a central point where it can be reused by all the frame's tiled deferred passes!
|
|
InterlockedMin(IntegerTileMinZ, asuint(SceneDepth));
|
|
InterlockedMax(IntegerTileMaxZ, asuint(SceneDepth));
|
|
}
|
|
|
|
GroupMemoryBarrierWithGroupSync();
|
|
|
|
float MinTileZ = asfloat(IntegerTileMinZ);
|
|
float MaxTileZ = asfloat(IntegerTileMaxZ);
|
|
|
|
float HalfZ = .5f * (MinTileZ + MaxTileZ);
|
|
|
|
// Compute a second min and max Z, clipped by HalfZ, so that we get two depth bounds per tile
|
|
// This results in more conservative tile depth bounds and fewer intersections
|
|
if (SceneDepth >= HalfZ && SceneDepth > MinDepth)
|
|
{
|
|
InterlockedMin(IntegerTileMinZ2, asuint(SceneDepth));
|
|
}
|
|
|
|
if (SceneDepth <= HalfZ && SceneDepth > MinDepth)
|
|
{
|
|
InterlockedMax(IntegerTileMaxZ2, asuint(SceneDepth));
|
|
}
|
|
|
|
GroupMemoryBarrierWithGroupSync();
|
|
|
|
float MinTileZ2 = asfloat(IntegerTileMinZ2);
|
|
float MaxTileZ2 = asfloat(IntegerTileMaxZ2);
|
|
|
|
float3 ViewTileMin;
|
|
float3 ViewTileMax;
|
|
|
|
float3 ViewTileMin2;
|
|
float3 ViewTileMax2;
|
|
|
|
float ExpandRadius = 0;
|
|
|
|
// Note: this code is assuming a centered projection, aka no translation present in ViewToClip
|
|
// Stereo rendering uses an off center projection
|
|
float2 TanViewFOV = GetTanHalfFieldOfView();
|
|
// tan(FOV) = HalfUnitPlaneWidth / 1, so TanViewFOV * 2 is the size of the whole unit view plane
|
|
// We are operating on a subset of that defined by ScissorRectMinAndSize
|
|
float2 TileSize = TanViewFOV * 2 * ScissorRectMinAndSize.zw / ((float2)View.ViewSizeAndInvSize.xy * NumGroups);
|
|
float2 UnitPlaneMin = -TanViewFOV + TanViewFOV * 2 * (ScissorRectMinAndSize.xy - View.ViewRectMin.xy) * View.ViewSizeAndInvSize.zw;
|
|
float2 UnitPlaneTileMin = (GroupId.xy * TileSize + UnitPlaneMin) * float2(1, -1);
|
|
float2 UnitPlaneTileMax = ((GroupId.xy + 1) * TileSize + UnitPlaneMin) * float2(1, -1);
|
|
|
|
ViewTileMin.xy = min(MinTileZ * UnitPlaneTileMin, MaxTileZ2 * UnitPlaneTileMin) - ExpandRadius;
|
|
ViewTileMax.xy = max(MinTileZ * UnitPlaneTileMax, MaxTileZ2 * UnitPlaneTileMax) + ExpandRadius;
|
|
ViewTileMin.z = MinTileZ - ExpandRadius;
|
|
ViewTileMax.z = MaxTileZ2 + ExpandRadius;
|
|
ViewTileMin2.xy = min(MinTileZ2 * UnitPlaneTileMin, MaxTileZ * UnitPlaneTileMin) - ExpandRadius;
|
|
	ViewTileMax2.xy = max(MinTileZ2 * UnitPlaneTileMax, MaxTileZ * UnitPlaneTileMax) + ExpandRadius;
	ViewTileMin2.z = MinTileZ2 - ExpandRadius;
	ViewTileMax2.z = MaxTileZ + ExpandRadius;

	float3 ViewGroup0Center = (ViewTileMax + ViewTileMin) / 2;
	float3 WorldGroup0Center = mul(float4(ViewGroup0Center, 1), View.ViewToTranslatedWorld).xyz - View.PreViewTranslation;
	float Group0BoundingRadius = length(ViewGroup0Center - ViewTileMax);

	float3 ViewGroup1Center = (ViewTileMax2 + ViewTileMin2) / 2;
	float3 WorldGroup1Center = mul(float4(ViewGroup1Center, 1), View.ViewToTranslatedWorld).xyz - View.PreViewTranslation;
	float Group1BoundingRadius = length(ViewGroup1Center - ViewTileMax2);

#if POINT_LIGHT
	float3 LightVector0 = LightPositionAndInvRadius.xyz - WorldGroup0Center;
	float LightVector0Length = length(LightVector0);
	float3 LightVector1 = LightPositionAndInvRadius.xyz - WorldGroup1Center;
	float LightVector1Length = length(LightVector1);
	float3 LightDirection0 = LightVector0 / LightVector0Length;
	float3 LightDirection1 = LightVector1 / LightVector1Length;
	float RayLength0 = LightVector0Length;
	float RayLength1 = LightVector1Length;

	// Don't operate on tiles completely outside of the light's influence
	bTileShouldComputeShadowing = LightVector0Length < 1.0f / LightPositionAndInvRadius.w + Group0BoundingRadius
		|| LightVector1Length < 1.0f / LightPositionAndInvRadius.w + Group1BoundingRadius;

#else
	float3 LightDirection0 = LightDirection;
	float3 LightDirection1 = LightDirection;
	float RayLength0 = TraceDistance;
	float RayLength1 = TraceDistance;

	bTileShouldComputeShadowing = SceneDepth > MinDepth;

#endif

	BRANCH
	if (bTileShouldComputeShadowing)
	{
		uint NumCulledObjects = GetCulledNumObjects();

		// Compute per-tile lists of affecting objects through bounds culling
		// Each thread now operates on an object instead of a pixel
		LOOP
		for (uint ObjectIndex = ThreadIndex; ObjectIndex < NumCulledObjects; ObjectIndex += THREADGROUP_TOTALSIZE)
		{
			float4 SphereCenterAndRadius = LoadObjectPositionAndRadius(ObjectIndex);

			BRANCH
			if (RaySegmentHitSphere(WorldGroup0Center, LightDirection0, RayLength0, SphereCenterAndRadius.xyz, SphereCenterAndRadius.w + Group0BoundingRadius))
			{
				uint ListIndex;
				InterlockedAdd(TileNumObjects0, 1U, ListIndex);
				// Don't write out of bounds on overflow
				ListIndex = min(ListIndex, (uint)(MAX_INTERSECTING_OBJECTS - 1));
				IntersectingObjectIndices[MAX_INTERSECTING_OBJECTS * 0 + ListIndex] = ObjectIndex;
			}

			BRANCH
			if (RaySegmentHitSphere(WorldGroup1Center, LightDirection1, RayLength1, SphereCenterAndRadius.xyz, SphereCenterAndRadius.w + Group1BoundingRadius))
			{
				uint ListIndex;
				InterlockedAdd(TileNumObjects1, 1U, ListIndex);
				// Don't write out of bounds on overflow
				ListIndex = min(ListIndex, (uint)(MAX_INTERSECTING_OBJECTS - 1));
				IntersectingObjectIndices[MAX_INTERSECTING_OBJECTS * 1 + ListIndex] = ObjectIndex;
			}
		}
	}

	GroupMemoryBarrierWithGroupSync();

	GroupIndex = SceneDepth > MaxTileZ2 ? 1 : 0;
	NumIntersectingObjects = min(GroupIndex == 0 ? TileNumObjects0 : TileNumObjects1, (uint)MAX_INTERSECTING_OBJECTS);
}
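
// Illustrative sketch only: a hypothetical helper that is not part of the engine and is not
// called anywhere. The culling loop above treats each depth group as a bounding sphere and
// keeps an object if the segment from the group center toward the light passes within
// (object radius + group radius) of the object's center. When the per-pixel rays are parallel
// to the center ray and no longer than it (the directional-light case), shifting a segment by
// at most the group radius moves it at most that far, so one enlarged test stands in for every
// pixel ray in the group. The engine's RaySegmentHitSphere may be implemented differently;
// this version uses a closest-point-on-segment test and assumes RayDirection is normalized.
bool SegmentWithinSphereSketch(float3 RayOrigin, float3 RayDirection, float RayLength, float3 SphereCenter, float SphereRadius)
{
	// Project the sphere center onto the ray and clamp the parameter to the segment [0, RayLength]
	float ClosestT = clamp(dot(SphereCenter - RayOrigin, RayDirection), 0.0f, RayLength);
	float3 ClosestPoint = RayOrigin + RayDirection * ClosestT;
	float3 Delta = SphereCenter - ClosestPoint;
	// Compare squared distances to avoid a sqrt
	return dot(Delta, Delta) <= SphereRadius * SphereRadius;
}
// Example call mirroring the group 0 test above (illustrative):
// SegmentWithinSphereSketch(WorldGroup0Center, LightDirection0, RayLength0,
//     SphereCenterAndRadius.xyz, SphereCenterAndRadius.w + Group0BoundingRadius)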
float MinDepth;
float MaxDepth;
uint DownsampleFactor;

[numthreads(THREADGROUP_SIZEX, THREADGROUP_SIZEY, 1)]
void DistanceFieldShadowingCS(
	uint3 GroupId : SV_GroupID,
	uint3 DispatchThreadId : SV_DispatchThreadID,
	uint3 GroupThreadId : SV_GroupThreadID)
{
	uint ThreadIndex = GroupThreadId.y * THREADGROUP_SIZEX + GroupThreadId.x;

	float2 ScreenUV = float2((DispatchThreadId.xy * DownsampleFactor + ScissorRectMinAndSize.xy + .5f) * View.BufferSizeAndInvSize.zw);
	float2 ScreenPosition = (ScreenUV.xy - View.ScreenPositionScaleBias.wz) / View.ScreenPositionScaleBias.xy;

	float SceneDepth = CalcSceneDepth(ScreenUV);
	float4 HomogeneousWorldPosition = mul(float4(ScreenPosition * SceneDepth, SceneDepth, 1), View.ScreenToWorld);
	float3 OpaqueWorldPosition = HomogeneousWorldPosition.xyz / HomogeneousWorldPosition.w;

	// Distance for directional lights to trace
	float TraceDistance = TanLightAngleAndNormalThreshold.z;

	uint NumIntersectingObjects = GetCulledNumObjects();
	uint CulledDataParameter = 0;
	bool bTileShouldComputeShadowing = SceneDepth > MinDepth && SceneDepth < MaxDepth;

#define USE_CULLING 1
#if USE_CULLING

#if SCATTER_TILE_CULLING

	if (bTileShouldComputeShadowing)
	{
		GetShadowTileCulledData(OpaqueWorldPosition, CulledDataParameter, NumIntersectingObjects);
	}

#else

	//@todo - support MinDepth for tile culling
	float MinDepthForTileCulling = 0;
	CullObjectsToTileWithGather(SceneDepth, ThreadIndex, GroupId.xy, TraceDistance, MinDepthForTileCulling, NumIntersectingObjects, bTileShouldComputeShadowing, CulledDataParameter);

#endif
#endif // USE_CULLING

	float Result = 0;

#define COMPUTE_SHADOWING 1
#if COMPUTE_SHADOWING
	BRANCH
	if (bTileShouldComputeShadowing)
	{
		{
			// World space offset along the start of the ray to avoid incorrect self-shadowing
			float RayStartOffset = 2 + RayStartOffsetDepthScale * SceneDepth;
			// Keeps the result from becoming completely sharp
			float MinSphereRadius = .4f;
			// Maintain reasonable culling bounds
			float MaxSphereRadius = 100;

#if POINT_LIGHT

			float3 LightVector = LightPositionAndInvRadius.xyz - OpaqueWorldPosition;
			float LightVectorLength = length(LightVector);
			float3 WorldRayStart = OpaqueWorldPosition + LightVector / LightVectorLength * RayStartOffset;
			float3 WorldRayEnd = LightPositionAndInvRadius.xyz;
			float MaxRayTime = LightVectorLength;
			float MaxAngle = tan(10 * PI / 180.0f);
			// Comparing tangents instead of angles, but tangent is always increasing in this range
			float TanLightAngle = min(LightSourceRadius / LightVectorLength, MaxAngle);

#else

			float3 WorldRayStart = OpaqueWorldPosition + LightDirection * RayStartOffset;
			float3 WorldRayEnd = OpaqueWorldPosition + LightDirection * TraceDistance;
			float MaxRayTime = TraceDistance;
			float TanLightAngle = TanLightAngleAndNormalThreshold.x;

#endif

#if SCATTER_TILE_CULLING
			bool bUseScatterTileCulling = true;
#else
			bool bUseScatterTileCulling = false;
#endif

#if USE_CULLING
			bool bUseCulling = true;
#else
			bool bUseCulling = false;
#endif

			float SubsurfaceDensity = 0;
			bool bUseSubsurfaceTransmission = false;

#if !FORWARD_SHADING
			FGBufferData GBufferData = GetGBufferData(ScreenUV);

			BRANCH
			if (IsSubsurfaceModel(GBufferData.ShadingModelID))
			{
				// Note: this has to match shadowmap logic
				// Derive density from a heuristic using opacity, tweaked for useful falloff ranges and to give a linear depth falloff with opacity
				SubsurfaceDensity = -.05f * log(1 - min(GBufferData.CustomData.a, .999f));
				bUseSubsurfaceTransmission = true;
			}
#endif

			Result = ShadowRayTraceThroughCulledObjects(
				WorldRayStart,
				WorldRayEnd,
				MaxRayTime,
				TanLightAngle,
				MinSphereRadius,
				MaxSphereRadius,
				SubsurfaceDensity,
				CulledDataParameter,
				NumIntersectingObjects,
				bUseCulling,
				bUseScatterTileCulling,
				bUseSubsurfaceTransmission);
		}
	}

#else
	//Result = bTileShouldComputeShadowing;
	Result = NumIntersectingObjects / 5000.0f;
#endif

	RWShadowFactors[DispatchThreadId.xy] = float2(Result, SceneDepth);
}
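
// Illustrative sketch only: a hypothetical helper that is not part of the engine and is not
// called anywhere. It restates how DistanceFieldShadowingCS maps a low-resolution dispatch
// thread to full-resolution buffer UVs and then to screen position. The parameter names are
// assumptions: BufferInvSize stands in for View.BufferSizeAndInvSize.zw, ScissorMin for
// ScissorRectMinAndSize.xy and ScaleBias for View.ScreenPositionScaleBias.
void DownsampledThreadToScreenSketch(
	uint2 ThreadId,
	uint Factor,
	float2 ScissorMin,
	float2 BufferInvSize,
	float4 ScaleBias,
	out float2 OutScreenUV,
	out float2 OutScreenPosition)
{
	// Each thread stands for a Factor x Factor block of full-resolution pixels and samples
	// the center (+0.5) of the block's top-left pixel, offset by the scissor rect origin
	OutScreenUV = (float2(ThreadId * Factor) + ScissorMin + 0.5f) * BufferInvSize;
	// Invert ScreenUV = ScreenPosition * ScaleBias.xy + ScaleBias.wz
	OutScreenPosition = (OutScreenUV - ScaleBias.wz) / ScaleBias.xy;
}
// Example: with Factor = 2 and ScissorMin = (0, 0), thread (3, 1) samples the center of
// full-resolution pixel (6, 2).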
Texture2D ShadowFactorsTexture;
SamplerState ShadowFactorsSampler;

float FadePlaneOffset;
float InvFadePlaneLength;
float NearFadePlaneOffset;
float InvNearFadePlaneLength;

void DistanceFieldShadowingUpsamplePS(
	in float4 UVAndScreenPos : TEXCOORD0,
	in float4 SVPos : SV_POSITION,
	out float4 OutColor : SV_Target0)
{
	// Distance field shadowing was computed at 0,0 regardless of viewrect min
	float2 DistanceFieldUVs = UVAndScreenPos.xy - ScissorRectMinAndSize.xy * View.BufferSizeAndInvSize.zw;
	float SceneDepth = CalcSceneDepth(UVAndScreenPos.xy);

#define BILATERAL_UPSAMPLE 1
#if BILATERAL_UPSAMPLE && UPSAMPLE_REQUIRED
	float2 LowResBufferSize = floor(View.BufferSizeAndInvSize.xy / DOWNSAMPLE_FACTOR);
	float2 LowResTexelSize = 1.0f / LowResBufferSize;
	float2 Corner00UV = floor(DistanceFieldUVs * LowResBufferSize - .5f) / LowResBufferSize + .5f * LowResTexelSize;
	float2 BilinearWeights = (DistanceFieldUVs - Corner00UV) * LowResBufferSize;

	float2 TextureValues00 = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, Corner00UV, 0).xy;
	float2 TextureValues10 = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, Corner00UV + float2(LowResTexelSize.x, 0), 0).xy;
	float2 TextureValues01 = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, Corner00UV + float2(0, LowResTexelSize.y), 0).xy;
	float2 TextureValues11 = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, Corner00UV + LowResTexelSize, 0).xy;

	float4 CornerWeights = float4(
		(1 - BilinearWeights.y) * (1 - BilinearWeights.x),
		(1 - BilinearWeights.y) * BilinearWeights.x,
		BilinearWeights.y * (1 - BilinearWeights.x),
		BilinearWeights.y * BilinearWeights.x);

	float Epsilon = .0001f;

	float4 CornerDepths = abs(float4(TextureValues00.y, TextureValues10.y, TextureValues01.y, TextureValues11.y));
	float4 DepthWeights = 1.0f / (abs(CornerDepths - SceneDepth.xxxx) + Epsilon);
	float4 FinalWeights = CornerWeights * DepthWeights;

	float InterpolatedResult =
		(FinalWeights.x * TextureValues00.x
		+ FinalWeights.y * TextureValues10.x
		+ FinalWeights.z * TextureValues01.x
		+ FinalWeights.w * TextureValues11.x)
		/ dot(FinalWeights, 1);

	float Output = InterpolatedResult;

#else
	float Output = Texture2DSampleLevel(ShadowFactorsTexture, ShadowFactorsSampler, DistanceFieldUVs, 0).x;
#endif

	float FarBlendFactor = 1.0f - saturate((SceneDepth - FadePlaneOffset) * InvFadePlaneLength);
	Output = lerp(1, Output, FarBlendFactor);

	float NearBlendFactor = saturate((SceneDepth - NearFadePlaneOffset) * InvNearFadePlaneLength);
	Output = lerp(1, Output, NearBlendFactor);

	OutColor = EncodeLightAttenuation(half4(Output, Output, Output, Output));
}
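
// Illustrative sketch only: hypothetical helpers that are not part of the engine and are not
// called anywhere. BilateralUpsampleSketch restates the BILATERAL_UPSAMPLE weighting in
// DistanceFieldShadowingUpsamplePS: each low-resolution corner's bilinear weight is multiplied
// by 1 / (|CornerDepth - SceneDepth| + Epsilon), so corners whose stored depth matches the
// full-resolution pixel dominate and shadows do not bleed across depth discontinuities; the
// weighted sum is then renormalized. Corner order is 00, 10, 01, 11 (x varies fastest), and
// CornerDepths are assumed to be non-negative already (the shader applies abs() to them).
float BilateralUpsampleSketch(float4 CornerShadows, float4 CornerDepths, float2 BilinearWeights, float SceneDepth, float Epsilon)
{
	float4 CornerWeights = float4(
		(1 - BilinearWeights.y) * (1 - BilinearWeights.x),
		(1 - BilinearWeights.y) * BilinearWeights.x,
		BilinearWeights.y * (1 - BilinearWeights.x),
		BilinearWeights.y * BilinearWeights.x);
	float4 DepthWeights = 1.0f / (abs(CornerDepths - SceneDepth) + Epsilon);
	float4 FinalWeights = CornerWeights * DepthWeights;
	// All weights are non-negative, so normalizing keeps the result within the corner values' range
	return dot(FinalWeights, CornerShadows) / dot(FinalWeights, float4(1, 1, 1, 1));
}

// ApplyDistanceFadesSketch restates the two fades at the end of the upsample pass: the shadow
// factor ramps to 1 (unshadowed) over 1 / InvFarLength units past FarOffset, and ramps in from 1
// over 1 / InvNearLength units past NearOffset. Parameter names are illustrative.
float ApplyDistanceFadesSketch(float Shadow, float SceneDepth, float FarOffset, float InvFarLength, float NearOffset, float InvNearLength)
{
	float FarBlend = 1.0f - saturate((SceneDepth - FarOffset) * InvFarLength);
	float NearBlend = saturate((SceneDepth - NearOffset) * InvNearLength);
	return lerp(1, lerp(1, Shadow, FarBlend), NearBlend);
}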