Mirror of https://github.com/izzy2lost/UnrealEngineUWP.git (synced 2026-03-26 18:15:20 -07:00)
#lockdown Nick.Penwarden
#rb none

==========================
MAJOR FEATURES + CHANGES
==========================

Change 3250856 on 2017/01/09 by Daniel.Wright

    Only showing instruction count for 'Base pass shader' now

Change 3250943 on 2017/01/09 by Rolando.Caloca

    DR - Async Compute PSO creation

Change 3251036 on 2017/01/09 by Rolando.Caloca

    DR - Add r.AsyncPipelineCompile
    - Dispatch on any thread
    - Wait for completion event

Change 3251058 on 2017/01/09 by Ben.Woodhouse

    Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)

    #jira UE-40332

Change 3251141 on 2017/01/09 by Ben.Woodhouse

    Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.

    Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX

Change 3251142 on 2017/01/09 by Ben.Woodhouse

    Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca

    DR - Gfx async PSO creation prep

Change 3252474 on 2017/01/10 by Daniel.Wright

    Added 'Compile Unreal Lightmass' to error message

Change 3252589 on 2017/01/10 by Daniel.Wright

    Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite

Change 3252790 on 2017/01/10 by Daniel.Wright

    Added InscatteringColorCubemapAngle to exponential height fog

Change 3252843 on 2017/01/10 by Uriel.Doyon

    Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic.

Change 3252866 on 2017/01/10 by Mark.Satterthwaite

    Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.

    #jira UE-40357

Change 3254511 on 2017/01/11 by Rolando.Caloca

    DR - PSO stats

Change 3255958 on 2017/01/12 by Mark.Satterthwaite

    Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability.

    To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.

    #jira UE-40554

Change 3256329 on 2017/01/12 by Olaf.Piesche

    #jira UE-38615

    Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.

Change 3256371 on 2017/01/12 by Uriel.Doyon

    Reenabled texture streaming bound defrag as the fix is in CL 3252843

Change 3257032 on 2017/01/13 by Daniel.Wright

    Added fastClamp to fastmath.usf

Change 3257111 on 2017/01/13 by Daniel.Wright

    Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game

Change 3257112 on 2017/01/13 by Daniel.Wright

    DFAO optimizations
    * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
    * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
    * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
    * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)

Change 3257113 on 2017/01/13 by Daniel.Wright

    Better distance field memory stats

Change 3257326 on 2017/01/13 by Uriel.Doyon

    Workaround to support cases where several textures have the same lighting GUID.

Change 3257448 on 2017/01/13 by Daniel.Wright

    Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles

Change 3257616 on 2017/01/13 by Daniel.Wright

    Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable

Change 3257657 on 2017/01/13 by Daniel.Wright

    Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
    * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms

Change 3258063 on 2017/01/14 by Rolando.Caloca

    DR - vk - Refactor descriptor set reuse in prep for more changes

Change 3258715 on 2017/01/16 by Daniel.Wright

    Added VisualizeGlobalDistanceField show flag

Change 3258827 on 2017/01/16 by Daniel.Wright

    Global distance field update regions are clipped against others to reduce redundant updates.

Change 3258959 on 2017/01/16 by Benjamin.Hyder

    Updating Planar Reflection example material in TM-Shadermodels

Change 3259270 on 2017/01/16 by Daniel.Wright

    [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.

Change 3259652 on 2017/01/16 by Uriel.Doyon

    Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse

    Fix FMonitoredProcess to prevent infinite loop in -nothreading mode

    #jira UE-40717

Change 3260594 on 2017/01/17 by Daniel.Wright

    Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
    * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
    * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
    * Adds 12Mb for the new volume textures

Change 3260956 on 2017/01/17 by Daniel.Wright

    Structured buffers for DF object data
    * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads

Change 3261296 on 2017/01/17 by Daniel.Wright

    Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
    Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms

Change 3262036 on 2017/01/18 by Ben.Salem

    V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.

Change 3262056 on 2017/01/18 by Chris.Bunner

    Remove inverse tonemapping when rendering HDR output.

    #jira UE-40728

Change 3262661 on 2017/01/18 by Rolando.Caloca

    DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
    - Fix hash for PSOs

Change 3263674 on 2017/01/19 by Chris.Bunner

    PR #3144: Improved error messages (Contributed by DarkSlot)

    #jira UE-40835

Change 3264150 on 2017/01/19 by Ben.Woodhouse

    Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode

    #jira UE-40841

Change 3264153 on 2017/01/19 by Ben.Woodhouse

    Integrate latest changes from MS-DX12 CLs 3231395-3262526
    - Added WinPixEventRuntime.tps
    - Includes PIX support, various optimizations (saved 1.3ms in testbed scene)

    CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
    - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.

    CL 3231395: Update D3D12 RHI:
    - Fix deferred MSAA path in RHI
    - Add Pix3.h support
    - Cleanup SetName usage and remove it from shipping builds.
    - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
    - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
    - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
    - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.

Change 3264251 on 2017/01/19 by Mark.Satterthwaite

    Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.

    #jira UE-40803

Change 3264642 on 2017/01/19 by Daniel.Wright

    Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.

Change 3265330 on 2017/01/20 by Ben.Salem

    Stop performance plugin from building in Win32.

    #tests recompiled and preflighted

Change 3265678 on 2017/01/20 by Marcus.Wassmer

    Fix bad declaration. #3055

Change 3266656 on 2017/01/20 by Mark.Satterthwaite

    Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
    Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
    Amend & integrate RCO's CL #3197085.

Change 3267741 on 2017/01/23 by Rolando.Caloca

    DR - Detect duplicated shader and pipeline types

Change 3268600 on 2017/01/23 by Uriel.Doyon

    Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
    Integrated CL 3227368 from Orion stream
    Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
    Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
    Added the MaxEffectiveScreenSize settings in the investigate texture command.

Change 3269512 on 2017/01/24 by Richard.Wallis

    Fix for shader binary cache uncompress data size during internal shader log.

Change 3271237 on 2017/01/25 by Ben.Woodhouse

    D3D12 updateTexture2D crash fix

    #jira UE-41059

Change 3271564 on 2017/01/25 by Olaf.Piesche

    #jira UE-40980
    #udn 325525

    Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe

Change 3271594 on 2017/01/25 by Ben.Woodhouse

    ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed.

    Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage

    Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR

    #fyi rolando.caloca

Change 3272616 on 2017/01/25 by Rolando.Caloca

    DR - Update shader version

Change 3273138 on 2017/01/26 by Ben.Woodhouse

    Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)

[CL 3274498 by Rolando Caloca in Main branch]
332 lines
11 KiB
Plaintext
// Copyright 1998-2017 Epic Games, Inc. All Rights Reserved.

/*=============================================================================
    DistanceFieldLightingShared.usf
=============================================================================*/

#include "SurfelMaterialShared.usf"

#ifndef THREADGROUP_SIZEX
#define THREADGROUP_SIZEX 1
#endif

#ifndef THREADGROUP_SIZEY
#define THREADGROUP_SIZEY 1
#endif

#define THREADGROUP_TOTALSIZE (THREADGROUP_SIZEX * THREADGROUP_SIZEY)

#ifndef DOWNSAMPLE_FACTOR
#define DOWNSAMPLE_FACTOR 1
#endif

#ifndef UPDATEOBJECTS_THREADGROUP_SIZE
#define UPDATEOBJECTS_THREADGROUP_SIZE 1
#endif

float3 DistanceFieldVolumePositionToUV(float3 VolumePosition, float3 UVScale, float3 UVAdd)
{
    float3 VolumeUV = VolumePosition * UVScale + UVAdd;
    return VolumeUV;
}

Texture3D DistanceFieldTexture;
SamplerState DistanceFieldSampler;
float3 DistanceFieldAtlasTexelSize;

RWBuffer<uint> RWObjectIndirectArguments;
Buffer<uint> ObjectIndirectArguments;

uint GetCulledNumObjects()
{
    // IndexCount, NumInstances, StartIndex, BaseVertexIndex, FirstInstance
    return ObjectIndirectArguments[1];
}

// In float4's. Must match equivalent C++ variables.
#define OBJECT_DATA_STRIDE 17

uint NumSceneObjects;

// Have to make these R32F with 4x the reads and writes because of the horrible D3D11 limitation
// "error X3676: typed UAV loads are only allowed for single-component 32-bit element types"
Buffer<float> ObjectBounds;
Buffer<float> ObjectData;

RWBuffer<float> RWObjectBounds;
RWBuffer<float> RWObjectData;

float4 LoadGlobalObjectPositionAndRadius(uint ObjectIndex)
{
    uint VectorIndex = ObjectIndex * 4;
    return float4(ObjectBounds[VectorIndex + 0], ObjectBounds[VectorIndex + 1], ObjectBounds[VectorIndex + 2], ObjectBounds[VectorIndex + 3]);
}

float4x4 LoadGlobalObjectWorldToVolume(uint ObjectIndex)
{
    uint VectorIndex = (ObjectIndex * OBJECT_DATA_STRIDE + 0) * 4;
    float4 M0 = float4(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2], ObjectData[VectorIndex + 3]);
    VectorIndex = (ObjectIndex * OBJECT_DATA_STRIDE + 1) * 4;
    float4 M1 = float4(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2], ObjectData[VectorIndex + 3]);
    VectorIndex = (ObjectIndex * OBJECT_DATA_STRIDE + 2) * 4;
    float4 M2 = float4(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2], ObjectData[VectorIndex + 3]);
    VectorIndex = (ObjectIndex * OBJECT_DATA_STRIDE + 3) * 4;
    float4 M3 = float4(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2], ObjectData[VectorIndex + 3]);

    return float4x4(M0, M1, M2, M3);
}

float3 LoadGlobalObjectLocalPositionExtent(uint ObjectIndex)
{
    uint VectorIndex = (ObjectIndex * OBJECT_DATA_STRIDE + 4) * 4;
    return float3(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2]);
}

float4 LoadGlobalObjectUVScale(uint ObjectIndex)
{
    uint VectorIndex = (ObjectIndex * OBJECT_DATA_STRIDE + 5) * 4;
    return float4(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2], ObjectData[VectorIndex + 3]);
}

float3 LoadGlobalObjectUVAdd(uint ObjectIndex)
{
    uint VectorIndex = (ObjectIndex * OBJECT_DATA_STRIDE + 6) * 4;
    return float3(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2]);
}
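
// Illustrative sketch, not part of the original file: one way the loaders above might be
// combined to sample an object's distance field at a world-space position.
// Assumptions (all hypothetical, none confirmed by this file):
//  - SampleGlobalObjectDistanceFieldSketch is a made-up name.
//  - Positions are row vectors, so the transform is mul(position, matrix).
//  - Clamping to the local extent keeps the lookup inside the object's region of the atlas.
//  - The returned value is the raw texel value; converting it to world units is not shown.
float SampleGlobalObjectDistanceFieldSketch(uint ObjectIndex, float3 WorldPosition)
{
    // Transform the world position into the object's volume space
    float3 VolumePosition = mul(float4(WorldPosition, 1), LoadGlobalObjectWorldToVolume(ObjectIndex)).xyz;

    // Keep the sample position inside the object's local bounds
    float3 LocalPositionExtent = LoadGlobalObjectLocalPositionExtent(ObjectIndex);
    float3 ClampedPosition = clamp(VolumePosition, -LocalPositionExtent, LocalPositionExtent);

    // Scale and bias into atlas UV space, then read the distance field texel
    float3 VolumeUV = DistanceFieldVolumePositionToUV(ClampedPosition, LoadGlobalObjectUVScale(ObjectIndex).xyz, LoadGlobalObjectUVAdd(ObjectIndex));
    return DistanceFieldTexture.SampleLevel(DistanceFieldSampler, VolumeUV, 0).x;
}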

void LoadGlobalObjectLocalVolumeBoundsMinMax(uint ObjectIndex, out float3 LocalVolumeBoundsMin, out float3 LocalVolumeBoundsMax)
{
    uint VectorIndex = 4 * (ObjectIndex * OBJECT_DATA_STRIDE + 16);
    LocalVolumeBoundsMin = float3(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2]);

    VectorIndex = 4 * (ObjectIndex * OBJECT_DATA_STRIDE + 7);
    LocalVolumeBoundsMax = float3(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2]);
}

uint LoadGlobalObjectOftenMoving(uint ObjectIndex)
{
    uint VectorIndex = (ObjectIndex * OBJECT_DATA_STRIDE + 7) * 4;
    return ObjectData[VectorIndex + 3] > 0.0f ? 1 : 0;
}
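
// Illustrative sketch, not part of the original file: the LoadGlobalObject* functions above
// all address ObjectData the same way. Each object owns OBJECT_DATA_STRIDE float4 slots, and
// because ObjectData is a Buffer<float>, a slot index is multiplied by 4 to get the index of
// its first float. LoadGlobalObjectFloat4Sketch is a hypothetical helper name, and
// Float4Offset is assumed to be in [0, OBJECT_DATA_STRIDE).
float4 LoadGlobalObjectFloat4Sketch(uint ObjectIndex, uint Float4Offset)
{
    uint VectorIndex = (ObjectIndex * OBJECT_DATA_STRIDE + Float4Offset) * 4;
    return float4(ObjectData[VectorIndex + 0], ObjectData[VectorIndex + 1], ObjectData[VectorIndex + 2], ObjectData[VectorIndex + 3]);
}
// For example, LoadGlobalObjectFloat4Sketch(ObjectIndex, 5) reads the same four floats as
// LoadGlobalObjectUVScale(ObjectIndex).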

// In float4's. Must match equivalent C++ variables.
#define CULLED_OBJECT_DATA_STRIDE 16
#define CULLED_OBJECT_BOX_BOUNDS_STRIDE 5

float4 LoadObjectPositionAndRadiusFromBuffer(uint ObjectIndex, StructuredBuffer<float4> InBuffer)
{
    return InBuffer[ObjectIndex];
}

float4x4 LoadObjectWorldToVolumeFromBuffer(uint ObjectIndex, StructuredBuffer<float4> InBuffer)
{
    float4 M0 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 0];
    float4 M1 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 1];
    float4 M2 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 2];
    float4 M3 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 3];

    return float4x4(M0, M1, M2, M3);
}

float3 LoadObjectLocalPositionExtentFromBuffer(uint ObjectIndex, StructuredBuffer<float4> InBuffer)
{
    return InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 4].xyz;
}

float4 LoadObjectUVScaleFromBuffer(uint ObjectIndex, StructuredBuffer<float4> InBuffer, out bool bGeneratedAsTwoSided)
{
    float4 Value = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 5].xyzw;
    bGeneratedAsTwoSided = Value.w < 0;
    return float4(Value.xyz, abs(Value.w));
}

float4 LoadObjectUVAddAndSelfShadowBiasFromBuffer(uint ObjectIndex, StructuredBuffer<float4> InBuffer)
{
    return InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 6];
}

float3x3 LoadObjectVolumeToWorldFromBuffer(uint ObjectIndex, StructuredBuffer<float4> InBuffer)
{
    float3 M0 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 8].xyz;
    float3 M1 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 9].xyz;
    float3 M2 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 10].xyz;

    return float3x3(M0, M1, M2);
}

float4x4 LoadObjectLocalToWorldFromBuffer(uint ObjectIndex, StructuredBuffer<float4> InBuffer)
{
    float4 M0 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 11];
    float4 M1 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 12];
    float4 M2 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 13];
    float4 M3 = InBuffer[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 14];

    return float4x4(M0, M1, M2, M3);
}

// These are structured buffers so they can be scalar memory loads on GCN
StructuredBuffer<float4> CulledObjectBounds;
StructuredBuffer<float4> CulledObjectData;
StructuredBuffer<float4> CulledObjectBoxBounds;

RWStructuredBuffer<float4> RWCulledObjectBounds;
RWStructuredBuffer<float4> RWCulledObjectData;
RWStructuredBuffer<float4> RWCulledObjectBoxBounds;

float4 LoadObjectPositionAndRadius(uint ObjectIndex)
{
    return LoadObjectPositionAndRadiusFromBuffer(ObjectIndex, CulledObjectBounds);
}

float4x4 LoadObjectWorldToVolume(uint ObjectIndex)
{
    return LoadObjectWorldToVolumeFromBuffer(ObjectIndex, CulledObjectData);
}

float3 LoadObjectLocalPositionExtent(uint ObjectIndex)
{
    return LoadObjectLocalPositionExtentFromBuffer(ObjectIndex, CulledObjectData);
}

float4 LoadObjectUVScale(uint ObjectIndex)
{
    bool bUnused;
    return LoadObjectUVScaleFromBuffer(ObjectIndex, CulledObjectData, bUnused);
}

float4 LoadObjectUVScale(uint ObjectIndex, out bool bGeneratedAsTwoSided)
{
    return LoadObjectUVScaleFromBuffer(ObjectIndex, CulledObjectData, bGeneratedAsTwoSided);
}

float4 LoadObjectUVAddAndSelfShadowBias(uint ObjectIndex)
{
    return LoadObjectUVAddAndSelfShadowBiasFromBuffer(ObjectIndex, CulledObjectData);
}

float3x3 LoadObjectVolumeToWorld(uint ObjectIndex)
{
    return LoadObjectVolumeToWorldFromBuffer(ObjectIndex, CulledObjectData);
}

float4x4 LoadObjectLocalToWorld(uint ObjectIndex)
{
    return LoadObjectLocalToWorldFromBuffer(ObjectIndex, CulledObjectData);
}

// x = Offset in global buffer, y = NumLOD0, z = NumSurfels (all LODs), W = instance index
uint4 LoadObjectSurfelCoordinate(uint ObjectIndex)
{
    return (uint4)CulledObjectData[ObjectIndex * CULLED_OBJECT_DATA_STRIDE + 15];
}

void LoadObjectViewSpaceBox(uint ObjectIndex, out float3 ObjectViewSpaceMin, out float3 ObjectViewSpaceMax)
{
    ObjectViewSpaceMin = CulledObjectBoxBounds[ObjectIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 0].xyz;
    ObjectViewSpaceMax = CulledObjectBoxBounds[ObjectIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 1].xyz;
}

void LoadObjectAxes(uint ObjectIndex, out float3 ObjectAxisX, out float3 ObjectAxisY, out float3 ObjectAxisZ)
{
    ObjectAxisX = CulledObjectBoxBounds[ObjectIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 2].xyz;
    ObjectAxisY = CulledObjectBoxBounds[ObjectIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 3].xyz;
    ObjectAxisZ = CulledObjectBoxBounds[ObjectIndex * CULLED_OBJECT_BOX_BOUNDS_STRIDE + 4].xyz;
}

void FindBestAxisVectors2(float3 InZAxis, out float3 OutXAxis, out float3 OutYAxis )
{
    float3 UpVector = abs(InZAxis.z) < 0.999 ? float3(0,0,1) : float3(1,0,0);
    OutXAxis = normalize( cross( UpVector, InZAxis ) );
    OutYAxis = cross( InZAxis, OutXAxis );
}
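
// Illustrative sketch, not part of the original file: FindBestAxisVectors2 completes an
// orthonormal basis around a given Z axis, which could be used to rotate a direction
// expressed relative to a normal (for example a cone direction) into the normal's space.
// TangentToWorldSketch is a hypothetical name, and UnitNormal is assumed to be normalized;
// with an unnormalized input, OutYAxis above would not be unit length.
float3 TangentToWorldSketch(float3 TangentDirection, float3 UnitNormal)
{
    float3 TangentX;
    float3 TangentY;
    FindBestAxisVectors2(UnitNormal, TangentX, TangentY);

    // Weighted sum of the basis vectors; TangentDirection.z runs along the normal
    return TangentX * TangentDirection.x + TangentY * TangentDirection.y + UnitNormal * TangentDirection.z;
}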

// VPLs generated by raytracing from the light
#define VPL_DATA_STRIDE 3

#define FINAL_GATHER_THREADGROUP_SIZE 64

// Must match C++
#define NUM_VISIBILITY_STEPS 10

// Must match C++
#define RECORD_CONE_DATA_STRIDE NUM_VISIBILITY_STEPS

// Must match C++
#define BENT_NORMAL_STRIDE 1

Buffer<float4> IrradianceCachePositionRadius;
Buffer<float> IrradianceCacheOccluderRadius;
Buffer<uint2> IrradianceCacheTileCoordinate;
Buffer<float4> IrradianceCacheNormal;
Buffer<float4> IrradianceCacheBentNormal;
Buffer<float4> IrradianceCacheIrradiance;

Buffer<uint> ScatterDrawParameters;
Buffer<uint> SavedStartIndex;

uint NumConvexHullPlanes;
float4 ViewFrustumConvexHull[6];

bool ViewFrustumIntersectSphere(float3 SphereOrigin, float SphereRadius)
{
    for (uint PlaneIndex = 0; PlaneIndex < NumConvexHullPlanes; PlaneIndex++)
    {
        float4 PlaneData = ViewFrustumConvexHull[PlaneIndex];
        float PlaneDistance = dot(PlaneData.xyz, SphereOrigin) - PlaneData.w;

        if (PlaneDistance > SphereRadius)
        {
            return false;
        }
    }

    return true;
}
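
// Illustrative sketch, not part of the original file: testing one scene object's bounding
// sphere against the view frustum with the function above.
// Assumptions (hypothetical, not confirmed by this file): IsGlobalObjectInViewFrustumSketch
// is a made-up name; the loaded bounds hold the sphere center in xyz and its radius in w, as
// the loader's name suggests; and the center is in the same space as the planes stored in
// ViewFrustumConvexHull.
bool IsGlobalObjectInViewFrustumSketch(uint ObjectIndex)
{
    float4 PositionAndRadius = LoadGlobalObjectPositionAndRadius(ObjectIndex);

    // Returns false as soon as the sphere lies fully outside any one plane
    return ViewFrustumIntersectSphere(PositionAndRadius.xyz, PositionAndRadius.w);
}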

Buffer<float4> SurfelData;

float4 LoadSurfelPositionAndRadius(uint SurfelIndex)
{
    return SurfelData[SurfelIndex * SURFEL_DATA_STRIDE + 0];
}

float3 LoadSurfelNormal(uint SurfelIndex)
{
    return SurfelData[SurfelIndex * SURFEL_DATA_STRIDE + 1].xyz;
}

float3 LoadSurfelDiffuseColor(uint SurfelIndex)
{
    return SurfelData[SurfelIndex * SURFEL_DATA_STRIDE + 2].xyz;
}

float3 LoadSurfelEmissiveColor(uint SurfelIndex)
{
    return SurfelData[SurfelIndex * SURFEL_DATA_STRIDE + 3].xyz;
}

float ApproximateConeConeIntersection(float ArcLength0, float ArcLength1, float AngleBetweenCones)
{
    float AngleDifference = abs(ArcLength0 - ArcLength1);

    float Intersection = smoothstep(
        0,
        1.0,
        1.0 - saturate((AngleBetweenCones - AngleDifference) / (ArcLength0 + ArcLength1 - AngleDifference)));

    return Intersection;
}

float GetVPLOcclusion(float3 BentNormalAO, float3 NormalizedVectorToVPL, float VPLConeAngle, float VPLOcclusionStrength)
{
    float BentNormalLength = length(BentNormalAO);
    float UnoccludedAngle = BentNormalLength * PI / VPLOcclusionStrength;
    float AngleBetween = acos(dot(BentNormalAO, NormalizedVectorToVPL) / max(BentNormalLength, .0001f));
    float Visibility = ApproximateConeConeIntersection(VPLConeAngle, UnoccludedAngle, AngleBetween);

    // Can't rely on the direction of the bent normal when close to fully occluded, lerp to shadowed
    Visibility = lerp(0, Visibility, saturate((UnoccludedAngle - .1f) / .2f));

    return Visibility;
}