Files
UnrealEngineUWP/Engine/Shaders/PostProcessCircleDOF.usf
Jack Porter 2082f7ec9b Copying //UE4/Dev-Mobile to //UE4/Dev-Main (Source: //UE4/Dev-Mobile @ 3056055)
#lockdown Nick.Penwarden
#rb None

==========================
MAJOR FEATURES + CHANGES
==========================

Change 3011102 on 2016/06/13 by Steve.Cano

	After taking a screenshot using glReadPixels, transfer the data to the target buffer from bottom row up to fix the "upside-down" render that OpenGL does. Confirmed with QA (owen.stupka_volt) that this does not appear to be happening on iOS (non-metal devices, inclusion of iOS in write-up was a mistake), verified on an ipod touch 5. Also confirmed that this does not happen on html5, and that Mobile HDR flag does not make a difference in function.

	#jira UE-26421
	#ue4
	#android

Change 3015801 on 2016/06/16 by Dmitriy.Dyomin

	Probbably fix for UE-30878, was not able to repro an actual crash(FFoliageInstanceBaseCache::AddInstanceBaseId). Added even more logging in case fix does not work.
	#jira  UE-30878

Change 3015903 on 2016/06/16 by Dmitriy.Dyomin

	Fixed: Levels window has Refresh/UI issues when World Composition is active
	#jira UE-26160

Change 3018352 on 2016/06/17 by Chris.Babcock

	Handle Android media prepare failure (URL without internet for example)
	#jira UE-32029
	#ue4
	#android

Change 3026387 on 2016/06/24 by Jack.Porter

	Remove FFuncTestManager warning about PIE when running on a standalone game binary

Change 3026398 on 2016/06/24 by Jack.Porter

	Prevent FSocketBSD::Recv returning false on SE_EWOULDBLOCK

Change 3027553 on 2016/06/25 by Niklas.Smedberg

	OpenGL: Made some block size calculation work for arbitrary block sizes (e.g. not pow-of-two).

Change 3027554 on 2016/06/25 by Niklas.Smedberg

	Metal: copyFromTexture now gets block-aligned size parameter (e.g. used for texture streaming)

Change 3028061 on 2016/06/26 by Jack.Porter

	Fixed a problem where newly discovered instances were not added to an existing session in the Session Browser.
	Fixed a problem where selecting an instance in a session with multiple instances didn't deselect the previously selected instance correctly.

Change 3029220 on 2016/06/27 by Steve.Cano

	Change Android Tilt values to use GetRotationMatrix/GetOrientation logic, same as java-side android would use, and adjust slightly to match as closely as possible to iOS values for tilt. There is drift and some differences in the "Y" value but the same sort of inconsistencies are also seen on iOS.

	#jira UE-6135
	#ue4
	#android

Change 3030420 on 2016/06/28 by Jack.Porter

	Fix crash with RenderOutputValidation when running with cooked content

Change 3030426 on 2016/06/28 by Jack.Porter

	Fix to CL 3026398 - make FSocketBSD(IPv6)::Recv(From) return false when recv returns 0.
	A return value of 0 indicates the connection was shutdown in an orderly manner.

Change 3030973 on 2016/06/28 by Steve.Cano

	Added a landscape downloader background along with the options to change it from within Android settings

	#ue4
	#android
	#jira UE-32318

Change 3031757 on 2016/06/28 by Chris.Babcock

	Remove unused methods from AndroidJNI header
	#ue4
	#android

Change 3032387 on 2016/06/29 by Allan.Bentham

	Rename android es31+aep -> glesdeferred.

Change 3032711 on 2016/06/29 by Allan.Bentham

	Rename GLSL_310_ES_EXT shader define:
	ES31_AEP_PROFILE -> ESDEFERRED_PROFILE
	bumped UE_SHADER_GLSL_310_ES_EXT_VER version number.

Change 3033698 on 2016/06/29 by Jack.Porter

	Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile)

Change 3034210 on 2016/06/30 by Steve.Cano

	Added a new AndroidRuntimeSettings variable that allows creation of installers for both Windows and Mac/Linux if set to true.

	#jira UE-32302

	#ue4
	#android

Change 3034530 on 2016/06/30 by Chris.Babcock

	Rename FManifestReader to FAndroidFileManifestReader in AndroidFile
	#jira UE-32679
	#ue4
	#android

Change 3034612 on 2016/06/30 by Steve.Cano

	Change Alpha from being set to a range of 0-255 to being in a range of 0-1 (which is the correct range of values)

	#jira UE-25325
	#ue4
	#android

Change 3034679 on 2016/06/30 by Chris.Babcock

	Fix tooltip (.command for mac, not .sh)
	#jira UE-32302
	#ue4
	#android

Change 3038881 on 2016/07/05 by Jack.Porter

	Package and launch on multiple Android devices simultaneously using the -Device=xxxxxxx+yyyyyyyy+zzzzzzzz format generated by a Project Launcher profile when you select multiple devices

	#jira UEMOB-115

Change 3039240 on 2016/07/06 by Jack.Porter

	TcpMessageTransport - connection-based message bus transport.

	#jira UEMOB-112
	#jira UEMOB-113

Change 3039252 on 2016/07/06 by Jack.Porter

	Enable messaging and session services and functional testing on Android when launched with -messaging
	Android device detection module support for adding port forwarding and connection announcement for TcpMessageTransport

	#jira UEMOB-112
	#jira UEMOB-113

Change 3039264 on 2016/07/06 by Jack.Porter

	Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile)

Change 3040041 on 2016/07/06 by Chris.Babcock

	Pass proper value to script generator functions
	#jira UE-32861
	#ue4
	#android

Change 3040890 on 2016/07/07 by Allan.Bentham

	Fix shadow crash
	#jira UE-32884

Change 3041458 on 2016/07/07 by Peter.Sauerbrei

	fix for IOS launch on failures

Change 3041542 on 2016/07/07 by Peter.Sauerbrei

	better fix for the multi-device deployment issue

Change 3041774 on 2016/07/07 by Steve.Cano

	Fixing crash that occurs when a games app id for Google Play is set before configuring the apk packaging. Also validating the value that is inserted and using it to override any values that have been hand-inserted into the GooglePlayAppID.xml

	#jira UE-16992
	#android
	#ue4

Change 3042222 on 2016/07/08 by Dmitriy.Dyomin

	Mobile packaging scenarious
	Added a wizard for creating launcher profiles (Android & IOS) for scenario: Minimal App + Downloadable content
	Added Archive step to launcher profiles to be able to store build product into specified directory
	Changes to a cooker to be able to pack DLC based with a different flavor to a release App
	Changes to DLC packaging to be able to build streaming data without chunking pak files
	#jira UEMOB-119

Change 3042244 on 2016/07/08 by Dmitriy.Dyomin

	Fixed crash in FTcpMessageTransportConnection::Stop

Change 3042270 on 2016/07/08 by Dmitriy.Dyomin

	GitHub #2320 : [ULevelStreamingKismet] Load Level Instance, Enables UE4 Users to create multiple transformed instances of a .umap without having to include in persistent level's list ? Rama
	contributed by: EverNewJoy
	#jira UE-29867

Change 3042449 on 2016/07/08 by Dmitriy.Dyomin

	Fixing Mac Editor build erros from CL# 3042222

Change 3042480 on 2016/07/08 by Allan.Bentham

	Add ES3.1 profile & compiler_glsl_es3_1 to shaders.

Change 3042481 on 2016/07/08 by Allan.Bentham

	hlslcc - ES3.1 changes.
	set ES3.1 version number to 310
	Do not use ES2 keywords for ES3.1.
	Generate Layout Locations for ES3.1
	bump version.

Change 3042483 on 2016/07/08 by Allan.Bentham

	Add mobile ES3.1 support.
	Recreates EGL and ES3.1 context during PlatformInitOpenGL if ES3.1 is required.

Change 3042485 on 2016/07/08 by Allan.Bentham

	Undo android XGE change.

Change 3042506 on 2016/07/08 by Dmitriy.Dyomin

	One more compile fix from CL# 3042222

Change 3044173 on 2016/07/10 by Dmitriy.Dyomin

	UAT: Added support for building target platforms with multiple cook flavors
	ex: -targetplatform=Android -cookflavor=ETC1+ETC2

Change 3044213 on 2016/07/11 by Dmitriy.Dyomin

	Fixed: Can't stream in a level whose name is a substring of another streaming level
	#jira UE-32999

Change 3044221 on 2016/07/11 by Jack.Porter

	Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile)

Change 3044815 on 2016/07/11 by Allan.Bentham

	Corrected NAME_GLSL_ES3_1_ANDROID format string.

Change 3046911 on 2016/07/12 by Chris.Babcock

	Add handling of OnTextChanged for virtual keyboard input on Android
	#jira UE-32348
	#ue4
	#android

Change 3046958 on 2016/07/12 by Chris.Babcock

	Rename some functions with Error in the name to prevent false coloring in the logs
	#jira UE-30541
	#ue4
	#android

Change 3047169 on 2016/07/12 by Chris.Babcock

	Return player ID and handle auth token for Google Play Games on Android (contributed by gameDNAstudio)
	#jira UE-30610
	#pr #2372
	#ue4
	#android

Change 3047406 on 2016/07/12 by Jack.Porter

	Add missing import to GameActivity.java

Change 3047442 on 2016/07/13 by Dmitriy.Dyomin

	Added: Mobile custom post-process
	Limitations: can fetch only from PostProcessInput0 (SceneColor) other scene textures are not supported. Does not support "Replacing the Tonemapper" blendable location.
	#jira UEMOB-147

Change 3047466 on 2016/07/13 by Dmitriy.Dyomin

	Disabled engine crash handler on Android, system crash handler works more reliably across different os versions/devices

Change 3047746 on 2016/07/13 by Jack.Porter

	Rename FBasePassFowardDynamicPointLightInfo

Change 3047778 on 2016/07/13 by Jack.Porter

	Missing file for rename FBasePassFowardDynamicPointLightInfo

Change 3047788 on 2016/07/13 by Allan.Bentham

	Fix incorrect TargetPlatformDescriptor string generation.

Change 3047790 on 2016/07/13 by Allan.Bentham

	Fixed half3x3 matrix use with ES3.1 glsl
	Fixed couple of interpolator precision mismatch.
	Fixed ES3.1 support detection issues

Change 3047816 on 2016/07/13 by Allan.Bentham

	Remove AndroidGL4 remnants.

Change 3048926 on 2016/07/13 by Chris.Babcock

	Added detection of Amazon Fire TV to disable requiring virtual joysticks
	#ue4
	#android

Change 3049335 on 2016/07/14 by Dmitriy.Dyomin

	Fixing UAT crash when packaging project for iOS

Change 3049390 on 2016/07/14 by Jack.Porter

	Disabled error for warning 4819 "The file contains a character that cannot be represented in the current code page (xxx). Save the file in Unicode format to prevent data loss"
	This is triggered by European characters and copyright symbols in source saved as latin-1 when compiling on non-US windows. Seen often in 3rd party headers, eg nvapi.

	#code_review: Ben.Marsh

Change 3049391 on 2016/07/14 by Jack.Porter

	Fixed incorrect comment order in CL 3049390

Change 3049545 on 2016/07/14 by Dmitriy.Dyomin

	Reworking some code from CL#3047442 to make static analizer happy

Change 3049626 on 2016/07/14 by Allan.Bentham

	Automatic CSM shader toggling
	#jira UE-27429

Change 3051574 on 2016/07/15 by Jack.Porter

	Support for lighting channels on Mobile
	- Multiple directional lights are supported in different channels but primitives are only affected by the directional light in the first channel they have set
	- CSM shadows from stationary or movable directional lights correctly follow their lighting channels
	- No channel limitations for dynamic point lights

	Notes:
	Removed mobile-specific directional light shadowing fields from View uniform buffer and mobile no longers uses SimpleDirectionalLight.
	Separate uniform buffers for mobile directional light are generated for each lighting channel.
	CSM culling information is now stored in FViewInfo and not per FVisibleLightViewInfo as the visibility bits are per view.

	#code_review Daniel.Wright
	#jira UEMOB-110

Change 3051699 on 2016/07/15 by Steve.Cano

	Preserve the original, pre-transformed input vertices for Slate shaders, which is required to properly do anti-aliasing (the ViewProjection-transformed values were causing the lines to not be drawn).

	#jira UE-20320
	#ue4
	#android

Change 3051744 on 2016/07/15 by Chris.Babcock

	Fix Android Vulkan include path checks (contributed by kodomastro)
	#jira UE-33311
	#PR #2602
	#ue4
	#android

Change 3052023 on 2016/07/15 by Chris.Babcock

	Fix shadowed variables

Change 3052110 on 2016/07/15 by Chris.Babcock

	Compile fixes for light channel support on mobile
	- missing template
	- accessor function for MobileDirectionalLights from scene

Change 3052242 on 2016/07/15 by Chris.Babcock

	Compile fixes for light channel support on mobile
	- removed dependency on C++14 feature

Change 3052730 on 2016/07/16 by Dmitriy.Dyomin

	Win32 build fix

Change 3053041 on 2016/07/17 by Jack.Porter

	Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile)

Change 3053054 on 2016/07/17 by Jack.Porter

	Changed use of old function ShouldUseDeferredRenderer() to new GetShadingPath()

Change 3053055 on 2016/07/17 by Jack.Porter

	Fixed local variable aliasing in unity build

Change 3053206 on 2016/07/18 by Jack.Porter

	Support ExecuteJavascript on iOS and Android
	Expose ExecuteJavascript to widget blueprint
	Fix ExecuteJavascript unicode string support on desktop platforms

	#jira UEMOB-152

Change 3053323 on 2016/07/18 by Dmitriy.Dyomin

	Added: Ability to set thread affinity for a device in Device Profiles (ex: +CVars=android.SetThreadAffinity=RT 0x02 GT 0x01)
	#jira UEMOB-107

Change 3053723 on 2016/07/18 by Jack.Porter

	Fix for UnrealTournamentProto.Automation.cs build errors

Change 3055090 on 2016/07/19 by Dmitriy.Dyomin

	Junk OnlineBlueprintSupport module binaries

[CL 3056789 by Jack Porter in Main branch]
2016-07-19 19:13:01 -04:00

763 lines
28 KiB
Plaintext

// Copyright 1998-2016 Epic Games, Inc. All Rights Reserved.
/*=============================================================================
PostProcessCircleDOF.usf: PostProcessing Circle Depth of Field
=============================================================================*/
#include "Common.usf"
#include "PostProcessCommon.usf"
#include "DeferredShadingCommon.usf" // FGBufferData
#include "DepthOfFieldCommon.usf"
#include "CircleDOFCommon.usf"
// 0:off / 1:on (use with "vis CircleDOF0")
#define DEBUG_SAMPLE_PATTERN 0
// 0:old, 1:new(fixed high res screenshots), does not make a difference
#define DILATION_RESOLUTION_INDEPENDENT 0
// Note: View.CircleDOFParams.w = View.ViewSizeAndInvSize.x / 1920
// is wrapping the DepthToCoc() function in this file for faster iteration
// @return half res pixel radius
float DepthToCoc2(float SceneDepth)
{
// return ((SceneDepth > 8000) ? 5.0f : 2.6f) * 0.5f * View.CircleDOFParams.w;
return DepthToCoc(SceneDepth);
}
// pixel shader entry point
void CircleSetupPS(noperspective float4 UVAndScreenPos : TEXCOORD0, out float4 OutColor0 : SV_Target0)
{
float2 UV = UVAndScreenPos.xy;
float4 DepthQuad = GatherSceneDepth(UV, PostprocessInput1Size.zw);
UV = UVAndScreenPos.xy - 0.5*PostprocessInput0Size.zw;
float3 CW = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0).rgb;
float3 CZ = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(1,0)).rgb;
float3 CX = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(0,1)).rgb;
float3 CY = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(1,1)).rgb;
// clamp to avoid artifacts from exceeding fp16 through framebuffer blending of multiple very bright lights
CW = min(256 * 256, CW);
CZ = min(256 * 256, CZ);
CX = min(256 * 256, CX);
CY = min(256 * 256, CY);
float4 CocQuad = float4(DepthToCoc2(DepthQuad.x), DepthToCoc2(DepthQuad.y), DepthToCoc2(DepthQuad.z), DepthToCoc2(DepthQuad.w));
#if ENABLE_FAR_BLUR == 0
CocQuad = min(CocQuad, 0);
#endif
// Doing a max depth reduction (erode the foreground). Less correct, but less artifacts.
// Perhaps need to re-open this in the future.
float mi = min(min(CocQuad.x,CocQuad.y),min(CocQuad.z,CocQuad.w));
float ma = max(max(CocQuad.x,CocQuad.y),max(CocQuad.z,CocQuad.w));
float ami = min(min(abs(CocQuad.x),abs(CocQuad.y)),min(abs(CocQuad.z),abs(CocQuad.w)));
float ama = max(max(abs(CocQuad.x),abs(CocQuad.y)),max(abs(CocQuad.z),abs(CocQuad.w)));
// 0:was an option before, causes erosion / 1:to reduce TemporalAA issues / 2:was used in KiteDemo
#define COC_METHOD 2
#if COC_METHOD == 0
// Stuff max radius in alpha.
// bad erosion on TemporalDitherAA
OutColor0.a = ma;
#elif COC_METHOD == 1
// acceptable TemporalDitherAA
// requires DefaultWeight > 1
OutColor0.a = (mi + ma) / 2;
#elif COC_METHOD == 2
// This in theory is better but causes bleeding artifacts with temporal AA..
// This is important otherwise near thin objects disappear (leaves clamping artifacts in recombined pass).
// bad on TemporalDitherAA, flat opacity where it should transition
OutColor0.a = CocQuad.x;
if(abs(OutColor0.a) > CocQuad.y) OutColor0.a = CocQuad.y;
if(abs(OutColor0.a) > CocQuad.z) OutColor0.a = CocQuad.z;
if(abs(OutColor0.a) > CocQuad.w) OutColor0.a = CocQuad.w;
#elif COC_METHOD == 3
// this should be better than the method before
// bad on TemporalDitherAA
OutColor0.a = CocQuad.x;
if(abs(OutColor0.a) > abs(CocQuad.y)) OutColor0.a = CocQuad.y;
if(abs(OutColor0.a) > abs(CocQuad.z)) OutColor0.a = CocQuad.z;
if(abs(OutColor0.a) > abs(CocQuad.w)) OutColor0.a = CocQuad.w;
#elif COC_METHOD == 4
// Stuff max radius in alpha.
OutColor0.a = mi;
#elif COC_METHOD == 5
// artifacts that look like negative colors (tb070) (with and without the 2nd line)
// bad erosion on TemporalDitherAA
OutColor0.a = (ami + ama) / 2;
// if((mi + ma) / 2 < 0) OutColor0.a = 0;
#elif COC_METHOD == 6
// like #3 but with inverted comparison, ok?
// bad erosion on TemporalDitherAA
OutColor0.a = CocQuad.x;
if(abs(OutColor0.a) < abs(CocQuad.y)) OutColor0.a = CocQuad.y;
if(abs(OutColor0.a) < abs(CocQuad.z)) OutColor0.a = CocQuad.z;
if(abs(OutColor0.a) < abs(CocQuad.w)) OutColor0.a = CocQuad.w;
#elif COC_METHOD == 7
// requires DefaultWeight > 1
float A = CocQuad.x;
if(abs(A) < abs(CocQuad.y)) A = CocQuad.y;
if(abs(A) < abs(CocQuad.z)) A = CocQuad.z;
if(abs(A) < abs(CocQuad.w)) A = CocQuad.w;
float B = CocQuad.x;
if(abs(B) > abs(CocQuad.y)) B = CocQuad.y;
if(abs(B) > abs(CocQuad.z)) B = CocQuad.z;
if(abs(B) > abs(CocQuad.w)) B = CocQuad.w;
OutColor0.a= (A + B) / 2;
#elif COC_METHOD == 8
// broken near dof
OutColor0.a = dot(0.25f, max(0, CocQuad));
#elif COC_METHOD == 9
// mix between 2 and 8, seems to be best in most cases
// requires DefaultWeight > 1
OutColor0.a = CocQuad.x;
if(abs(OutColor0.a) > CocQuad.y) OutColor0.a = CocQuad.y;
if(abs(OutColor0.a) > CocQuad.z) OutColor0.a = CocQuad.z;
if(abs(OutColor0.a) > CocQuad.w) OutColor0.a = CocQuad.w;
if(OutColor0.a > 0) OutColor0.a = dot(0.25f, max(0, CocQuad));
#else
error
#endif
// >1 to avoid /0 (resulting in dark outlines in level tb070)
// a bit laregr to avoid a specific leaking artifact in level tb080
const float DefaultWeight = 1.4f;
// Remove samples which are outside the size.
// TODO: Tune the ScaleFactor.
float ScaleFactor = 64.0;
float4 W = float4(
DefaultWeight - saturate(abs(OutColor0.a - CocQuad.x) * ScaleFactor),
DefaultWeight - saturate(abs(OutColor0.a - CocQuad.y) * ScaleFactor),
DefaultWeight - saturate(abs(OutColor0.a - CocQuad.z) * ScaleFactor),
DefaultWeight - saturate(abs(OutColor0.a - CocQuad.w) * ScaleFactor));
OutColor0.rgb = (1.0 / (W.x + W.y + W.z + W.w)) * (CX * W.x + CY * W.y + CZ * W.z + CW * W.w);
}
// {0 to 1} output.
float NoizNorm(float2 N, float X)
{
N+=X;
return frac(sin(dot(N.xy,float2(12.9898, 78.233)))*43758.5453);
}
// {-1 to 1} output.
float NoizSnorm(float2 N, float X)
{
return NoizNorm(N,X)*2.0-1.0;
}
float2 RotVec(float Radius, float Radians)
{
return Radius * float2(cos(Radians), sin(Radians));
}
float2 RandomOffset;
float Min4(float4 A)
{
return min(min(A.x,A.y),min(A.z,A.w));
}
float Min16(float4 A, float4 B, float4 C, float4 D)
{
return min(min(Min4(A),Min4(B)),min(Min4(C),Min4(D)));
}
// This does a 2x2:1 reduction with a 4x4:1 dilation.
// OutColor is float as we output to a single channel format (NVIDIA Windows driver has undefined result in the other channels if the RT has more than one)
void CircleDilatePS(float4 UVAndScreenPos : TEXCOORD0, out float OutColor : SV_Target0)
{
// Sampling pattern (each gather4)
// d g
// j M (M={0,0} point)
#if COMPILER_GLSL || COMPILER_GLSL_ES2 || COMPILER_GLSL_ES3_1 || FEATURE_LEVEL < FEATURE_LEVEL_SM5
float2 UV = UVAndScreenPos.xy + 0.5*PostprocessInput0Size.zw;
// This leverages nearest sampling (bilinear won't work).
// Probably not the best way to do this.
float4 Sd, Sg, Sj, Sm;
Sd.x = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(-2,-2)).a;
Sd.y = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(-1,-2)).a;
Sd.z = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(-2,-1)).a;
Sd.w = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(-1,-1)).a;
Sg.x = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(0,-2)).a;
Sg.y = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(1,-2)).a;
Sg.z = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(0,-1)).a;
Sg.w = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(1,-1)).a;
Sj.x = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(-2,0)).a;
Sj.y = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(-1,0)).a;
Sj.z = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(-2,1)).a;
Sj.w = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(-1,1)).a;
Sm.x = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(0,0)).a;
Sm.y = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(1,0)).a;
Sm.z = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(0,1)).a;
Sm.w = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV.xy, 0, int2(1,1)).a;
#else
#if DILATION_RESOLUTION_INDEPENDENT
float factor = View.ViewSizeAndInvSize.x / 1980.0f;
float2 A = 1 + float2(-2,-2);
float2 B = 1 + float2(0,-2);
float2 C = 1 + float2(-2,0);
float2 D = 1 + float2(0,0);
float4 Sd = PostprocessInput0.GatherAlpha(PostprocessInput0Sampler, UVAndScreenPos.xy + A * factor * PostprocessInput0Size.zw);
float4 Sg = PostprocessInput0.GatherAlpha(PostprocessInput0Sampler, UVAndScreenPos.xy + B * factor * PostprocessInput0Size.zw);
float4 Sj = PostprocessInput0.GatherAlpha(PostprocessInput0Sampler, UVAndScreenPos.xy + C * factor * PostprocessInput0Size.zw);
float4 Sm = PostprocessInput0.GatherAlpha(PostprocessInput0Sampler, UVAndScreenPos.xy + D * factor * PostprocessInput0Size.zw);
#else
float2 UV = UVAndScreenPos.xy + 1.0*PostprocessInput0Size.zw;
float4 Sd = PostprocessInput0.GatherAlpha(PostprocessInput0Sampler, UV, int2(-2,-2));
float4 Sg = PostprocessInput0.GatherAlpha(PostprocessInput0Sampler, UV, int2(0,-2));
float4 Sj = PostprocessInput0.GatherAlpha(PostprocessInput0Sampler, UV, int2(-2,0));
float4 Sm = PostprocessInput0.GatherAlpha(PostprocessInput0Sampler, UV, int2(0,0));
#endif
#endif
// Make sure near is only near blur.
OutColor = min(0.0, Min16(Sd, Sg, Sj, Sm));
}
float4 TestFunc(float2 p)
{
return saturate(15 - length((p- 0.5f) * PostprocessInput0Size.xy));
}
void Circle4Samples(float4 UVAndScreenPos : TEXCOORD0, out float4 OutColor0 : SV_Target0, float InRand)
{
float2 UV = UVAndScreenPos.xy;
//
// Pass 0
// Dilate near minimum CoC (near CoC is negative values).
//
// Fixed maximum search size (in terms of Circle of Confusion radius).
// Higher than 8 is too noizy for 4 samples. (actual radius is scaled by width/1920)
float Coc = 8.0 * View.CircleDOFParams.w;
float LocalRand = RandomOffset.x + InRand;
// Get base semi-random direction and dither along radius.
// Reused throughout the rest of the algorithm.
float TwoPi = 2.0 * 3.14159;
float RadianBase = NoizSnorm(UVAndScreenPos.xy, 0.010 * LocalRand) * TwoPi;
float RadiusBase = NoizNorm(UVAndScreenPos.xy, 0.013 * LocalRand);
#if DEBUG_SAMPLE_PATTERN
Coc = 100 * View.CircleDOFParams.w;
#endif
// Radius
float RadiusBase2 = RadiusBase * (1.0/4.0);
float R1 = sqrt(RadiusBase2 + 3.0/4.0) * Coc;
float R2 = sqrt(RadiusBase2 + 2.0/4.0) * Coc;
float R3 = sqrt(RadiusBase2 + 1.0/4.0) * Coc;
float R4 = sqrt(RadiusBase2 + 0.6/4.0) * Coc; // 0 gives a disk shape, 0.6 avoids some artifacts but results in a dark center
float2 UV1 = RotVec(R1, RadianBase + TwoPi * 0.0/4.0);
float2 UV2 = RotVec(R2, RadianBase + TwoPi * 2.0/4.0);
float2 UV3 = RotVec(R3, RadianBase + TwoPi * 1.0/4.0);
float2 UV4 = RotVec(R4, RadianBase + TwoPi * 3.0/4.0);
UV1 = UVAndScreenPos.xy + UV1 * PostprocessInput0Size.zw;
UV2 = UVAndScreenPos.xy + UV2 * PostprocessInput0Size.zw;
UV3 = UVAndScreenPos.xy + UV3 * PostprocessInput0Size.zw;
UV4 = UVAndScreenPos.xy + UV4 * PostprocessInput0Size.zw;
#if DEBUG_SAMPLE_PATTERN
OutColor0 = TestFunc(UV1)*float4(1,0,0,0) +
TestFunc(UV2)*float4(0,1,0,0) +
TestFunc(UV3)*float4(0,0,1,0) +
TestFunc(UV4)*float4(1,1,1,0)/3.0f;
OutColor0.rgb=dot(1/3.0f, OutColor0.rgb);
OutColor0.a=1;
return;
#endif
float D1 = PostprocessInput1.SampleLevel(PostprocessInput1Sampler, UV1, 0).x;
float D2 = PostprocessInput1.SampleLevel(PostprocessInput1Sampler, UV2, 0).x;
float D3 = PostprocessInput1.SampleLevel(PostprocessInput1Sampler, UV3, 0).x;
float D4 = PostprocessInput1.SampleLevel(PostprocessInput1Sampler, UV4, 0).x;
float NearCoc = 65536.0;
// >0 causes DepthBlur to disappear in low resolutions, unclear why it was 2
float Feather = 0.0f;
if(abs(D1)+Feather > R1) NearCoc = min(NearCoc, D1);
if(abs(D2)+Feather > R2) NearCoc = min(NearCoc, D2);
if(abs(D3)+Feather > R3) NearCoc = min(NearCoc, D3);
if(abs(D4)+Feather > R4) NearCoc = min(NearCoc, D4);
//
// Pass 1
//
// Going to grab sets of 4 samples per pass.
// Each set of 4 samples can be a smaller circle of confusion
// (aka can be in-front of the larger background).
// Setup for 12 samples (3 passes of 4 samples).
RadiusBase *= (1.0/11.5);
// Grab circle of confusion for the pixel and pixel color.
OutColor0 = Texture2DSampleLevel(PostprocessInput0, PostprocessInput0Sampler, UV, 0);
// Uncomment to see intermediate debug output
// return;
float FarCoc = OutColor0.a;
// Fix in case no near exists.
NearCoc = min(NearCoc, FarCoc);
// Used for sample pattern.
Coc = max(abs(FarCoc),abs(NearCoc));
// Bring out to the smaller radius of sample sets.
// This has the highest chance of seeing a smaller overlapping CoC.
R1 = (RadiusBase+9.0/11.5) * Coc;
R2 = (RadiusBase+3.0/11.5) * Coc;
R3 = (RadiusBase+6.0/11.5) * Coc;
R4 = (RadiusBase+0.0/11.5) * Coc;
// Ensure at least getting different sample than center pixel.
float R1a = max(1.0,R1);
float R2a = max(1.0,R2);
float R3a = max(1.0,R3);
float R4a = max(1.0,R4);
UV1 = RotVec(R1a, RadianBase + TwoPi * 0.0/12.0);
UV2 = RotVec(R2a, RadianBase + TwoPi * 3.0/12.0);
UV3 = RotVec(R3a, RadianBase + TwoPi * 6.0/12.0);
UV4 = RotVec(R4a, RadianBase + TwoPi * 9.0/12.0);
UV1 = UVAndScreenPos.xy + UV1 * PostprocessInput0Size.zw;
UV2 = UVAndScreenPos.xy + UV2 * PostprocessInput0Size.zw;
UV3 = UVAndScreenPos.xy + UV3 * PostprocessInput0Size.zw;
UV4 = UVAndScreenPos.xy + UV4 * PostprocessInput0Size.zw;
float4 C1 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV1, 0);
float4 C2 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV2, 0);
float4 C3 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV3, 0);
float4 C4 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV4, 0);
// Base weight works around the max(1.0,radius) constraint.
// Base weight also shapes to weight higher on the outside radius.
float W0 = 1.0 - saturate(Coc);
float W1 = R1;
float W2 = R2;
float W3 = R3;
float W4 = R4;
// Intersection weight: 0=sample does not intersect pixel, to 1=sample intersects.
// TODO: Tune feather factors.
float IFeather0 = 1.0/4.0;
float I1 = saturate((abs(C1.a) - R1) * IFeather0);
float I2 = saturate((abs(C2.a) - R2) * IFeather0);
float I3 = saturate((abs(C3.a) - R3) * IFeather0);
float I4 = saturate((abs(C4.a) - R4) * IFeather0);
// Check if have a more near intersecting Coc for next pass.
float FarCoc2 = FarCoc;
if(I1*W1 > 0.0) FarCoc2 = min(FarCoc2, C1.a);
if(I2*W2 > 0.0) FarCoc2 = min(FarCoc2, C2.a);
if(I3*W3 > 0.0) FarCoc2 = min(FarCoc2, C3.a);
if(I4*W4 > 0.0) FarCoc2 = min(FarCoc2, C4.a);
// Fully ignore intersection weight when in nearfield blur
// and sample average CoC is 50% between near and far CoC neighborhood.
float AvgCoc = (FarCoc + C1.a + C2.a + C3.a + C4.a) * (1.0/5.0);
// Get dilated far.
FarCoc = max(FarCoc, max(max(C1.a, C2.a),max(C3.a, C4.a)));
// Controls the transition between states.
float IFeather1 = 1.0;
float IFeather2 = 2.0;
float Ignore = saturate(-NearCoc * IFeather1) * saturate(((AvgCoc - FarCoc) / (NearCoc - FarCoc)) * IFeather2);
W1 *= lerp(I1, 1.0, Ignore);
W2 *= lerp(I2, 1.0, Ignore);
W3 *= lerp(I3, 1.0, Ignore);
W4 *= lerp(I4, 1.0, Ignore);
// Make sure at least something is not zero.
W0 += 1.0/65536.0;
// Start weighted accumulation.
OutColor0.rgb = OutColor0.rgb * W0 + C1.rgb * W1 + C2.rgb * W2 + C3.rgb * W3 + C4.rgb * W4;
float Weight = W0+W1+W2+W3+W4;
// Set current result as possible background.
float3 Background = OutColor0.rgb * (1.0/Weight);
// former method
// #define FadeOutOutsideCoC(INDEX, COC) C##INDEX.rgb = lerp(C##INDEX.rgb, Background.rgb, saturate(abs(C##INDEX.a) - COC));
// new method avoids having the center leaking with large CoC
#define FadeOutOutsideCoC(INDEX, COC) W##INDEX = lerp(W##INDEX, 0, saturate(abs(C##INDEX.a) - COC));
// Uncomment to see intermediate debug output
// OutColor0.rgb *= (1.0/Weight);return;
//
// Pass 2
//
// Drop weight of existing pass if Coc changes too much.
float Coc2 = max(abs(FarCoc2),abs(NearCoc));
float Drop = (1.0/65536.0) + 1.0 - saturate(abs(Coc - Coc2));
OutColor0.rgb *= Drop;
Weight *= Drop;
R1 = (RadiusBase+10.0/11.5) * Coc2;
R2 = (RadiusBase+ 4.0/11.5) * Coc2;
R3 = (RadiusBase+ 7.0/11.5) * Coc2;
R4 = (RadiusBase+ 1.0/11.5) * Coc2;
R1a = max(1.0,R1);
R2a = max(1.0,R2);
R3a = max(1.0,R3);
R4a = max(1.0,R4);
UV1 = RotVec(R1a, RadianBase + TwoPi * 8.0/12.0);
UV2 = RotVec(R2a, RadianBase + TwoPi * 11.0/12.0);
UV3 = RotVec(R3a, RadianBase + TwoPi * 2.0/12.0);
UV4 = RotVec(R4a, RadianBase + TwoPi * 5.0/12.0);
UV1 = UVAndScreenPos.xy + UV1 * PostprocessInput0Size.zw;
UV2 = UVAndScreenPos.xy + UV2 * PostprocessInput0Size.zw;
UV3 = UVAndScreenPos.xy + UV3 * PostprocessInput0Size.zw;
UV4 = UVAndScreenPos.xy + UV4 * PostprocessInput0Size.zw;
C1 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV1, 0);
C2 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV2, 0);
C3 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV3, 0);
C4 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV4, 0);
// Lerp to background if outside possibly smaller CoC.
FadeOutOutsideCoC(1, Coc2)
FadeOutOutsideCoC(2, Coc2)
FadeOutOutsideCoC(3, Coc2)
FadeOutOutsideCoC(4, Coc2)
W1 = R1;
W2 = R2;
W3 = R3;
W4 = R4;
// Intersection weight: 0=sample does not intersect pixel, to 1=sample intersects.
I1 = saturate((abs(C1.a) - R1) * IFeather0);
I2 = saturate((abs(C2.a) - R2) * IFeather0);
I3 = saturate((abs(C3.a) - R3) * IFeather0);
I4 = saturate((abs(C4.a) - R4) * IFeather0);
// Check if have a more near intersecting Coc for next pass.
float FarCoc3 = FarCoc2;
if(I1*W1 > 0.0) FarCoc3 = min(FarCoc3, C1.a);
if(I2*W2 > 0.0) FarCoc3 = min(FarCoc3, C2.a);
if(I3*W3 > 0.0) FarCoc3 = min(FarCoc3, C3.a);
if(I4*W4 > 0.0) FarCoc3 = min(FarCoc3, C4.a);
W1 *= lerp(I1, 1.0, Ignore);
W2 *= lerp(I2, 1.0, Ignore);
W3 *= lerp(I3, 1.0, Ignore);
W4 *= lerp(I4, 1.0, Ignore);
OutColor0.rgb += C1.rgb * W1 + C2.rgb * W2 + C3.rgb * W3 + C4.rgb * W4;
Weight += W1+W2+W3+W4;
// Uncomment to see intermediate debug output
// OutColor0.rgb *= (1.0/Weight);return;
//
// Pass 3
//
// Drop weight of existing pass if Coc changes too much.
float Coc3 = max(abs(FarCoc3),abs(NearCoc));
Drop = (1.0/65536.0) + 1.0 - saturate(abs(Coc2 - Coc3));
OutColor0.rgb *= Drop;
Weight *= Drop;
// Send near most CoC back to recombine pass.
OutColor0.a = min(FarCoc3, NearCoc);
R1 = (RadiusBase+11.0/11.5) * Coc3;
R2 = (RadiusBase+ 5.0/11.5) * Coc3;
R3 = (RadiusBase+ 8.0/11.5) * Coc3;
R4 = (RadiusBase+ 2.0/11.5) * Coc3;
R1a = max(1.0,R1);
R2a = max(1.0,R2);
R3a = max(1.0,R3);
R4a = max(1.0,R4);
UV1 = RotVec(R1a, RadianBase + TwoPi * 4.0/12.0);
UV2 = RotVec(R2a, RadianBase + TwoPi * 7.0/12.0);
UV3 = RotVec(R3a, RadianBase + TwoPi * 10.0/12.0);
UV4 = RotVec(R4a, RadianBase + TwoPi * 1.0/12.0);
UV1 = UVAndScreenPos.xy + UV1 * PostprocessInput0Size.zw;
UV2 = UVAndScreenPos.xy + UV2 * PostprocessInput0Size.zw;
UV3 = UVAndScreenPos.xy + UV3 * PostprocessInput0Size.zw;
UV4 = UVAndScreenPos.xy + UV4 * PostprocessInput0Size.zw;
C1 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV1, 0);
C2 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV2, 0);
C3 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV3, 0);
C4 = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV4, 0);
FadeOutOutsideCoC(1, Coc3)
FadeOutOutsideCoC(2, Coc3)
FadeOutOutsideCoC(3, Coc3)
FadeOutOutsideCoC(4, Coc3)
W1 = R1;
W2 = R2;
W3 = R3;
W4 = R4;
I1 = saturate((abs(C1.a) - R1) * IFeather0);
I2 = saturate((abs(C2.a) - R2) * IFeather0);
I3 = saturate((abs(C3.a) - R3) * IFeather0);
I4 = saturate((abs(C4.a) - R4) * IFeather0);
W1 *= lerp(I1, 1.0, Ignore);
W2 *= lerp(I2, 1.0, Ignore);
W3 *= lerp(I3, 1.0, Ignore);
W4 *= lerp(I4, 1.0, Ignore);
OutColor0.rgb += C1.rgb * W1 + C2.rgb * W2 + C3.rgb * W3 + C4.rgb * W4;
Weight += W1+W2+W3+W4;
OutColor0.rgb *= (1.0/Weight);
}
// pixel shader entry point
void CirclePS(float4 UVAndScreenPos : TEXCOORD0, out float4 OutColor0 : SV_Target0)
{
// Count 2 or higher, gets slower but less noisy
#if ( QUALITY == 2 )
const uint Count = 32;
#elif ( QUALITY == 1 )
const uint Count = 12;
#else
const uint Count = 1;
#endif
OutColor0 = 0;
LOOP for(uint i = 0; i < Count; ++i)
{
float4 Color;
Circle4Samples(UVAndScreenPos, Color, i);
OutColor0 += Color;
}
OutColor0 /= Count;
}
// actual color is RGB/A
float4 Recombine2Samples(float4 UVAndScreenPos, float PixCoc, float InRand)
{
float LocalRand = RandomOffset.x + InRand;
#if 1
// Fetch 2 samples mirrored around the pixel
// which is stochastically distributed to fill out the circle of confusion.
// TODO: Fix the "random values".
float2 UV = UVAndScreenPos.xy * PostprocessInput0Size.xy;
float RadianBase = NoizNorm(UVAndScreenPos.xy, 0.010 * LocalRand) * 3.14159; // 0..PI as we use sample pairs and don't need 0..2*PI
float RadiusJitter = NoizNorm(UVAndScreenPos.xy, 0.013 * LocalRand);
float ICoc = PixCoc*sqrt(RadiusJitter);
float2 VP = RotVec(ICoc, RadianBase) * PostprocessInput0Size.zw;
// These two samples will still have jitter induced artifacts (very limited utility).
// These two samples will also have bleeding artifacts.
float4 CA = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UVAndScreenPos.xy + VP, 0);
float4 CB = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UVAndScreenPos.xy - VP, 0);
// clamp to avoid artifacts from exceeding fp16 through framebuffer blending of multiple very bright lights
CA.rgb = min(float3(256 * 256, 256 * 256, 256 * 256), CA.rgb);
CB.rgb = min(float3(256 * 256, 256 * 256, 256 * 256), CB.rgb);
float I1 = 1.0/65536.0;
float I2 = 1.0/65536.0;
#if QUALITY
// we don't have depth in the alpha channel
CA.a = DepthToCoc2(CalcSceneDepth(UVAndScreenPos.xy + VP));
CB.a = DepthToCoc2(CalcSceneDepth(UVAndScreenPos.xy - VP));
// Weight the two samples to avoid forground into background bleed.
float IFeather0 = 1.0/4.0;
ICoc *= 0.5; // Coc is half res units.
// todo: verify the value, was 0.0f at some point
float Tweak = 1.0f;
I1 += saturate((abs(CA.a) - ICoc) * IFeather0 + Tweak);
I2 += saturate((abs(CB.a) - ICoc) * IFeather0 + Tweak);
#endif
return float4(CA.rgb * I1 + CB.rgb * I2, I1 + I2);
#else
// Possibly higher quality option in the future.
// Fetch 4 samples in filled disc pattern
// which is stochastically distributed to fill out the circle of confusion.
float2 UV = UVAndScreenPos.xy * PostprocessInput0Size.xy;
float RadianBase = NoizNorm(UVAndScreenPos.xy, 0.010 * LocalRand) * 3.14159 * 2.0;
float RadiusBase = NoizNorm(UVAndScreenPos.xy, 0.013 * LocalRand);
float RadiusBase2 = RadiusBase * (1.0/4.0);
float R1 = sqrt(RadiusBase2 + 3.0/4.0) * PixCoc;
float R2 = sqrt(RadiusBase2 + 2.0/4.0) * PixCoc;
float R3 = sqrt(RadiusBase2 + 1.0/4.0) * PixCoc;
float R4 = sqrt(RadiusBase2 + 0.0/4.0) * PixCoc;
float TwoPi = 3.14159 * 2.0;
float2 UV1 = RotVec(R1, RadianBase + TwoPi * 0.0/4.0);
float2 UV2 = RotVec(R2, RadianBase + TwoPi * 2.0/4.0);
float2 UV3 = RotVec(R3, RadianBase + TwoPi * 1.0/4.0);
float2 UV4 = RotVec(R4, RadianBase + TwoPi * 3.0/4.0);
UV1 = UVAndScreenPos.xy + UV1 * PostprocessInput0Size.zw;
UV2 = UVAndScreenPos.xy + UV2 * PostprocessInput0Size.zw;
UV3 = UVAndScreenPos.xy + UV3 * PostprocessInput0Size.zw;
UV4 = UVAndScreenPos.xy + UV4 * PostprocessInput0Size.zw;
float4 CA = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV1, 0);
float4 CB = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV2, 0);
float4 CC = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV3, 0);
float4 CD = PostprocessInput0.SampleLevel(PostprocessInput0Sampler, UV4, 0);
// Weight the two samples to avoid forground into background bleed.
float IFeather0 = 1.0/4.0;
float ICoc = PixCoc * 0.5; // Coc is half res units.
float I1 = saturate((abs(CA.a) - ICoc) * IFeather0);
float I2 = saturate((abs(CB.a) - ICoc) * IFeather0);
float I3 = saturate((abs(CC.a) - ICoc) * IFeather0);
float I4 = saturate((abs(CD.a) - ICoc) * IFeather0);
// Make sure something is non-zero.
I1 += 1.0/65536.0;
I2 += 1.0/65536.0;
I3 += 1.0/65536.0;
I4 += 1.0/65536.0;
return float4(CA.rgb * I1 + CB.rgb * I2 + CC.rgb * I3 + CD.rgb * I4, I1+I2+I3+I4);
#endif
}
float3 RecombineNSamples(float4 UVAndScreenPos, float PixCoc)
{
// Count 2 or higher, gets slower but less noisy
#if ( QUALITY == 2 )
const uint Count = 32;
#elif ( QUALITY == 1 )
const uint Count = 12;
#else
const uint Count = 1;
#endif
float4 SumWithWeight = 0;
LOOP for(uint i = 0; i < Count; ++i)
{
float4 ColorWithWeight = Recombine2Samples(UVAndScreenPos, PixCoc, i);
SumWithWeight += ColorWithWeight;
}
return SumWithWeight.rgb / SumWithWeight.a;
}
// pixel shader to combine the full res scene and the blurred images behind and in front of the the focal plane
void MainCircleRecombinePS(in float4 UVAndScreenPos : TEXCOORD0, out float4 OutColor : SV_Target0)
{
// Circle of confusion size for the pixel.
float PixDepth = CalcSceneDepth(UVAndScreenPos.xy);
float PixCoc = DepthToCoc2(PixDepth);
float4 HalfRes = PostprocessInput1.SampleLevel(PostprocessInput1Sampler, UVAndScreenPos.xy, 0);
// debug CircleDOF pass
// OutColor = HalfRes; return;
// Grab nearest Coc.
PixCoc = min(PixCoc, HalfRes.a);
// debug CoC
// OutColor = PixCoc*0.05f; return;
// Transform into sample pattern.
PixCoc = abs(PixCoc) * 2.0; // 2x because full instead of half resolution.
OutColor.rgb = RecombineNSamples(UVAndScreenPos, PixCoc);
OutColor.a=0;
// Grab the half resolution neighborhood to remove the artifacts from the full resolution output.
// Nearest location.
#if 1
// This has higher in-focus contrast, but possibly lower noise reduction later.
float2 HUVBase = UVAndScreenPos.xy * PostprocessInput1Size.xy - 0.5;
float2 HUVFrac = frac(HUVBase);
float2 HUV = (trunc(HUVBase) + 0.5) * PostprocessInput1Size.zw;
#else
// This makes the mostly in-focus transition bad (too blurry).
float2 HUV = UVAndScreenPos.xy - 0.5 * PostprocessInput1Size.zw;
#endif
// Load four nearest samples.
float4 H0 = PostprocessInput1.SampleLevel(PostprocessInput1Sampler, HUV, 0);
float4 H1 = PostprocessInput1.SampleLevel(PostprocessInput1Sampler, HUV, 0, int2(1,0));
float4 H2 = PostprocessInput1.SampleLevel(PostprocessInput1Sampler, HUV, 0, int2(0,1));
float4 H3 = PostprocessInput1.SampleLevel(PostprocessInput1Sampler, HUV, 0, int2(1,1));
// to YCoCg (doesn't make much of a difference)
// H0.rgb = RGBToYCoCg(H0.rgb);
// H1.rgb = RGBToYCoCg(H1.rgb);
// H2.rgb = RGBToYCoCg(H2.rgb);
// H3.rgb = RGBToYCoCg(H3.rgb);
// OutColor.rgb = RGBToYCoCg(OutColor.rgb);
// TODO: This would work a lot better in YUV style colorspace?
// Limit the full resolution to remove jitter artifacts.
float4 HMax = max(max(H0,H1),max(H2,H3));
float4 HMin = min(min(H0,H1),min(H2,H3));
#if 1
// Increase constrast of limit a little to workaround to strong denoise at near-in-focus.
float4 HD = HMin / 8.0;
float Small = 1.0 - saturate(PixCoc*PixCoc*(1.0/64.0));
HMax += HD * Small;
HMin -= HD * Small;
#endif
// debug unclamped color
// return;
// Blend in the limited version quickly to remove HDR jitter artifacts and noise.
float4 OutLimited = min(max(OutColor,HMin),HMax);
OutColor = lerp(OutColor, OutLimited, saturate(PixCoc*PixCoc*4.0));
// back to RGB
// OutColor.rgb = YCoCgToRGB(OutColor.rgb);
}