Files
UnrealEngineUWP/Engine/Source/Runtime/Renderer/Private/GlobalDistanceField.cpp

1374 lines
61 KiB
C++
Raw Normal View History

// Copyright 1998-2017 Epic Games, Inc. All Rights Reserved.
/*=============================================================================
GlobalDistanceField.cpp
=============================================================================*/
#include "GlobalDistanceField.h"
Copying //UE4/Dev-Build to //UE4/Dev-Main (Source: //UE4/Dev-Build @ 3209340) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3209340 on 2016/11/23 by Ben.Marsh Convert UE4 codebase to an "include what you use" model - where every header just includes the dependencies it needs, rather than every source file including large monolithic headers like Engine.h and UnrealEd.h. Measured full rebuild times around 2x faster using XGE on Windows, and improvements of 25% or more for incremental builds and full rebuilds on most other platforms. * Every header now includes everything it needs to compile. * There's a CoreMinimal.h header that gets you a set of ubiquitous types from Core (eg. FString, FName, TArray, FVector, etc...). Most headers now include this first. * There's a CoreTypes.h header that sets up primitive UE4 types and build macros (int32, PLATFORM_WIN64, etc...). All headers in Core include this first, as does CoreMinimal.h. * Every .cpp file includes its matching .h file first. * This helps validate that each header is including everything it needs to compile. * No engine code includes a monolithic header such as Engine.h or UnrealEd.h any more. * You will get a warning if you try to include one of these from the engine. They still exist for compatibility with game projects and do not produce warnings when included there. * There have only been minor changes to our internal games down to accommodate these changes. The intent is for this to be as seamless as possible. * No engine code explicitly includes a precompiled header any more. * We still use PCHs, but they're force-included on the compiler command line by UnrealBuildTool instead. This lets us tune what they contain without breaking any existing include dependencies. * PCHs are generated by a tool to get a statistical amount of coverage for the source files using it, and I've seeded the new shared PCHs to contain any header included by > 15% of source files. Tool used to generate this transform is at Engine\Source\Programs\IncludeTool. [CL 3209342 by Ben Marsh in Main branch]
2016-11-23 15:48:37 -05:00
#include "DistanceFieldLightingShared.h"
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3357411) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3244756 on 2017/01/03 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering) Change 3248667 on 2017/01/05 by Olaf.Piesche Resaving default asset because of engine verison issue; maybe unnecessary, but resaving niagara engine content to be sure #jira UE-40160 Change 3249324 on 2017/01/06 by Marcus.Wassmer Resave with an actual version to stop cook warning Change 3249611 on 2017/01/06 by Marcus.Wassmer Just remove warning-causing niagara data for now. Change 3308052 on 2017/02/16 by Rolando.Caloca DR - Check for Vulkan SDK, and only use it if it's newer or the same as the headers we distribute Change 3308109 on 2017/02/16 by Rolando.Caloca DR - Upgrade glslang to 1.0.39.1 Change 3308111 on 2017/02/16 by Rolando.Caloca DR - Update Vulkan distribution to 1.0.39.1 Change 3308153 on 2017/02/16 by Rolando.Caloca DR - Updated glslang libs Change 3308842 on 2017/02/17 by Rolando.Caloca DR - Fixed copy/paste Change 3310007 on 2017/02/17 by Chris.Bunner Back out CL 3221219 - causing MIC generation issues and superseded by CL 3273971. #jira UE-37792 Change 3310154 on 2017/02/17 by Chris.Bunner Assert when attempting to add a custom material attribute already in the base attributes list. Change 3310155 on 2017/02/17 by Chris.Bunner PR #3231: Validate material index before accessing (Contributed by projectgheist) #jira UE-41774, UE-41788 Change 3310162 on 2017/02/17 by Chris.Bunner PR #3252: Added MobileMaterialInterface to UsedMaterials (Contributed by projectgheist) #jira UE-41823, UE-41950 Change 3310176 on 2017/02/17 by Chris.Bunner Merging CL 3233886: AMD HDR support (requires r.AMDSupportsHDRDisplayOutput=1 in ini). Update to AGS 5.0.5. Partial code tidy up. Change 3310187 on 2017/02/17 by Chris.Bunner Preserve constant expressions rather than always casting after translating a material attribute. Losing the notion of constant means we can't correctly detect used properties and falsely enable e.g. PDO. Happened because of the incorrect component masks in BreakMaterialNodes which then had to be downcast to the correct type which is done as an inline fragment rather than swizzle expression. #jira UE-41594 Change 3310215 on 2017/02/17 by Chris.Bunner Prevent SpeedTree node compiling for skeletal meshes (not supported as uses more UV sets than available). More descriptive error for missing Cubemap UV input on TextureSample material node . #jira UE-33098 Change 3310838 on 2017/02/18 by Joe.Graf Moved some private functions to public for a licensee #CodeReview: matt.kuhlenschmidt #rb: n/a Change 3311876 on 2017/02/20 by Rolando.Caloca DR - Expose skin cache cvar r.SkinCache.AccumulationBufferSizeInKB #jira UE-42014 Change 3314139 on 2017/02/21 by Rolando.Caloca DR - Minor cleanup pass - Remove FVulkanPendingState - Renamed some classes for clarity - Hoist pending UAVs for flush out to pending compute state Change 3314642 on 2017/02/21 by Rolando.Caloca DR - Some more renaming Change 3315431 on 2017/02/21 by Ben.Salem Properly set default values for test time out and tick. We now will default to ticking once per second, and tracking the macro stats of GPU/Render/Game thread time. #tests Ran showdown demo several times Change 3316710 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Fix refract intrinsic Change 3316718 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Built libs to pick up change from 3316710 - refract fix Change 3316820 on 2017/02/22 by Benjamin.Hyder updating Tm-TrigNodes map Change 3317192 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317528 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317639 on 2017/02/22 by Benjamin.Hyder Updating Decal on Complex Mesh example in QA-Decals Change 3317764 on 2017/02/22 by Benjamin.Hyder Final updates to QA-Decals Change 3318319 on 2017/02/22 by Rolando.Caloca DR - minor reorg/rename Change 3318379 on 2017/02/22 by Rolando.Caloca DR - more cleanup Change 3321181 on 2017/02/24 by Rolando.Caloca DR - Fix GL bug Change 3321247 on 2017/02/24 by Rolando.Caloca DR - Fix misc bugs Change 3321898 on 2017/02/24 by Chris.Bunner Only issue clear TLV dispatch if required. #jira UERNDR-193 Change 3321904 on 2017/02/24 by Chris.Bunner Added comment for potential future optimization. Change 3322013 on 2017/02/24 by Uriel.Doyon Fixed separate translucency being affected by Gaussian DOF #jira UE-40489 Change 3322517 on 2017/02/24 by Uriel.Doyon Fixed issue with InvestigateTexture command removing budget limit. Fixed StreamingBounds show flag not working. It nows shows the streaming bound for the currently selected textures. #jira UE-40485 Change 3323470 on 2017/02/27 by Chad.Garyet Removing DDC job from dev-rendering Change 3323479 on 2017/02/27 by Chad.Garyet Removing RDU agent type Change 3323519 on 2017/02/27 by Chad.Garyet removing NCL/LHR/SEA agent types to clean up space Change 3323639 on 2017/02/27 by Benjamin.Hyder More updates to QA-Decals Change 3324207 on 2017/02/27 by Uriel.Doyon Fixed typo ScaleTexturesByGlobalMyBias -> ScaleTexturesByGlobalMipBias Removed bad merge in FStreamingTextureLevelContext::GetBuildDataIndexRef Change 3324396 on 2017/02/27 by Uriel.Doyon Fixed an issue with the Streaming Bounds show flag interferring with the static level data initialization #jira UE-40485 Change 3325227 on 2017/02/28 by Chris.Bunner Fix-up AMD AGS libs. Change 3325566 on 2017/02/28 by Uriel.Doyon Fixed possible out-of-bound access in GetUsedTexture() when passing ERHIFeatureLevel::Num Change 3326009 on 2017/02/28 by Uriel.Doyon Better fix for 3325566, as the previous fix would ignore the material instance overrides. Change 3327058 on 2017/03/01 by Benjamin.Hyder Preparing TM_Shadermodels map for automation Change 3328222 on 2017/03/01 by Chris.Bunner Prevent decals from drawing in separate translucency pass. Whilst user control and material relevance were already removed, if the flag was checked before being disabled (by swapping to decal domain) this was still being read in the render loop, now explicitly ignores decals. #jira UE-42449, UE-42446 Change 3329848 on 2017/03/02 by Uriel.Doyon Added some extra logs to help track UE-42168 Change 3329977 on 2017/03/02 by Rolando.Caloca DR - Fix bad clear value Change 3330008 on 2017/03/02 by Benjamin.Hyder More preparations for QA-Decals automation Change 3330754 on 2017/03/02 by Daniel.Wright Prominent comment explaining reflection env async compute usage and why it's not overlapped with anything Change 3331451 on 2017/03/03 by Marc.Olano Manually unroll simplex noise loop to avoid PSO bug on AMD/Metal Change 3331839 on 2017/03/03 by Rolando.Caloca DR - hlslcc - add missing file to project Change 3332247 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel PR #3305 #jira UE-42393 Change 3332259 on 2017/03/03 by Rolando.Caloca DR - Fix bad index into pixel formats PR #3237 #jira UE-41855 Change 3332305 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers PR #3271 #jira UE-32618 Change 3332313 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel (properly) PR #3305 #jira UE-42393 Change 3332317 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers (properly) PR #3271 #jira UE-32618 Change 3332368 on 2017/03/03 by Rolando.Caloca DR - Minor fixes so -sm4 and -sm5 can be used on windows with OpenGL/Vulkan Change 3333690 on 2017/03/06 by Daniel.Wright [Copy] Changing movable skylight properties no longer affects static draw lists Change 3333693 on 2017/03/06 by Daniel.Wright [Copy] Added 'r.AOListMeshDistanceFields' which dumps out mesh distance fields sorted by memory size, useful for directing content optimizations Change 3333705 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance fields are now 8 bit fixed point by default, but can be changed back to 16 bit floating piont with a project setting. * 8 bit uses half memory but introduces error for thin surfaces or large meshes. Change 3333721 on 2017/03/06 by David.Hill DecalProxy: Copy float FadeScreenSize to FDeferredDecalProxy for use in the render thread. This avoids pointer chasing to the UDecalComponent (game thread component). Change 3333772 on 2017/03/06 by Daniel.Wright [Copy] Scene motion blur data is only updated for the main renderer frames. Fixes scene captures and planar reflections breaking object motion blur. Change 3333790 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance field generation uses Embree, for a 2.5x speedup * Can switch back to old kDOP generation with 'r.DistanceFieldBuild.UseEmbree 0' for debugging Change 3333822 on 2017/03/06 by Daniel.Wright [Copy] Moved mesh distance field code into MeshDistanceFieldUtilities.cpp * Moved FMeshUtilities to its own header so the 8k line MeshUtilites.cpp file can be further split up Change 3333827 on 2017/03/06 by Daniel.Wright [Copy] Range compress 8bit distance fields - gets one extra bit of precision on average Change 3333828 on 2017/03/06 by Daniel.Wright [Copy] Raised High ShadowQuality to 2048 as 1024 for CSM is way too low Change 3333831 on 2017/03/06 by Daniel.Wright Non-editor compile fix Change 3333836 on 2017/03/06 by Daniel.Wright [Copy] Workaround for gobal distance field volume textures being bloated by 4x on PS4 due to the recommended tiling modes. They now use a 2d tiling mode which avoids the bloat, saving 96Mb. Change 3333843 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionExponent to skylight component * Useful for brightening up indoors without losing contact shadows as MinOcclusion does Change 3333845 on 2017/03/06 by Daniel.Wright [Copy] Capsule shadow BP functions Change 3333850 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionCombineMode to skylight component Change 3333854 on 2017/03/06 by Daniel.Wright [Copy] Gnm properly registers clears as GPU work so those events show up in profilegpu Change 3333857 on 2017/03/06 by Daniel.Wright [Copy] Clear light attenuation for local lights with a quad covering their screen extents * Clearing the entire light attenuation buffer costs .1ms on PS4. This optimization lowers the minimum cost of a shadow casting light from .15ms -> .03ms. * Shadowed lights in Fortnite with 25 lights 3.7ms -> 1.42ms on PS4 Change 3333860 on 2017/03/06 by Daniel.Wright [Copy] Flush deferred deletes when reallocating distance field atlas to reduce peak memory Change 3333861 on 2017/03/06 by Daniel.Wright [Copy] Disable all distance field features on Intel cards as HD 4000 hangs in the RHICreateTexture3D call to allocate the large atlas Change 3333869 on 2017/03/06 by Daniel.Wright [Copy] Volumetric Fog using a volume texture mapped to the camera frustum * Volumetric fog can be enabled on an Exponential Height Fog component with additional controls * Lights have a VolumetricScatteringIntensity * New cvars r.VolumetricFog, r.VolumetricFog.GridPixelSize, r.VolumetricFog.GridSizeZ, r.VolumetricFog.DepthDistributionScale * Lighting features supported: * Directional light with CSM and a light function * Point / spot lights without shadows / light functions / IES profiles * Skylight with occlusion from distance fields * Analytical height fog covers the view range past where the volumetric fog ends * Temporal reprojection is used on the volumetric fog scattering and extinction to achieve stability * Translucency integrates properly into volumetric fog * Height fog StartDistance is not supported by volumetric fog and should be set to 0. Change 3333894 on 2017/03/06 by Daniel.Wright [Copy] Initialize GDummyVolumetricFogGlobalDataUniformBuffer outside of parallel rendering Change 3333902 on 2017/03/06 by Daniel.Wright [Copy] Better handling of volumetric fog enabled with distance of 0 Change 3333903 on 2017/03/06 by Daniel.Wright [Copy] Fixed volumetric fog trying to render light functions for a point light Change 3333908 on 2017/03/06 by Daniel.Wright [Copy] Volumetric materials * Added new material domain Volume, which can output Scattering, Absorption and Emissive. All properties are in world space densities. * Particle systems using the Volume domain are voxelized based on their ParticlePosition and ParticleRadius * Volumetric fog integration is now energy conservative - scattering is integrated against transmission over the depth of each slice. * Added bOverrideLightColorsWithFogInscatteringColors to exponential height fog, which can be enabled to make Volumetric Fog match Height fog more closely Change 3334134 on 2017/03/06 by Daniel.Wright [Copy from Michael Trepka] Added Embree 2.14.0 and changed MeshUtilities to use it as this solves issues with Embree leaking TLS keys. UnrealLightmass is still using older Embree 2.7.0 until we can find time to properly test it with the new version. Also, invalidated distance field DDC to force it to rebuild with updated Embree. Change 3334420 on 2017/03/06 by Daniel.Wright Fixed RTDF shadows Change 3335467 on 2017/03/07 by Benjamin.Hyder Initial submission of QA-Decals map to EngineTest Change 3335556 on 2017/03/07 by Daniel.Wright Changed mesh distance field default format back to R16f Change 3338020 on 2017/03/08 by Daniel.Wright Disable volumetric fog in vertex shaders for feature levels which don't support it Change 3339394 on 2017/03/09 by Chris.Bunner Correctly handle material texture translation error edge case. #jira UE-42579, UE-42670 Change 3339992 on 2017/03/09 by Daniel.Wright Only compile volumetric fog shaders on supporting platforms Change 3341858 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) #RB Rolando.Caloca, Marcus.Wassmer, Daniel.Wright, Nick.Penwarden, Mark.Satterthwaite Change 3342004 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) Fix unity build #RB Marcus.Wassmer Change 3343307 on 2017/03/13 by Marcus.Wassmer Update showflags when we are guaranteed it will happen in all possible ways to spawn the scenecapture. (drag into editor, PIE, -game, etc) Change 3343732 on 2017/03/13 by Rolando.Caloca DR - Vulkan compute pipeline & refactor Change 3344846 on 2017/03/14 by Rolando.Caloca DR - Android compile fixes Change 3344883 on 2017/03/14 by Rolando.Caloca DR - Add missing stencil load/store to PSO initializer Change 3344985 on 2017/03/14 by Rolando.Caloca DR - Made load/store actions uint8 Change 3345141 on 2017/03/14 by Rolando.Caloca DR - vk - Rework render pass hash Change 3345304 on 2017/03/14 by Benjamin.Hyder Updating TM-Distancefields map to include TemplateFloor mesh Change 3345387 on 2017/03/14 by Rolando.Caloca DR - Add _RenderThread calls for Create*Shader so RHIs can choose not to stall when creating Change 3345388 on 2017/03/14 by Rolando.Caloca DR - Do not stall when creating shaders on Vulkan Change 3345722 on 2017/03/14 by Chris.Bunner PR #3357: MinimalAPI add to many material expressions (Contributed by DeanoC) #jira UE-42752 Change 3345723 on 2017/03/14 by Chris.Bunner Reduce log verbosity causing spamming during landscape editing. #jira UE-42714 Change 3345725 on 2017/03/14 by Chris.Bunner [Duplicate 3341860] Fixed material translation error with multiple connections from custom interpolator nodes. Change 3345726 on 2017/03/14 by Chris.Bunner Typo fixes. Change 3345732 on 2017/03/14 by Rolando.Caloca DR - Decouple vertex declaration off BSS Change 3345746 on 2017/03/14 by Chris.Bunner Added sign() intrinsic material graph node and delisted material function workaround. Change 3346042 on 2017/03/14 by Chris.Bunner Implement missing size query interface for FRenderTargetResources. #jira UE-41672 Change 3346387 on 2017/03/14 by Daniel.Wright [Copy] Added VolumetricScatteringIntensity to particle lights Change 3346389 on 2017/03/14 by Daniel.Wright [Copy] Clamp Volumetric material attributes to fp16 range to avoid INFs Disable volumetric fog when the fog show flag is disabled Change 3346392 on 2017/03/14 by Daniel.Wright [Copy] Fixed skylight being much too bright on volumetric fog Change 3346406 on 2017/03/14 by Daniel.Wright [Copy] CSM resolution is now controlled by r.Shadow.MaxCSMResolution. * Changed HighPC to use 1024 MaxShadowResolution (max for all non-CSM shadows), saves 60Mb in Fortnite Change 3346412 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for translucency lighting 3d textures, saves 13Mb Change 3346414 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for volumetric fog 3d textures, saves 13Mb Change 3346415 on 2017/03/14 by Daniel.Wright [Copy] Missing file from cl 3338451 Change 3346421 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaNs in volumetric fog due to rendering when height fog is disabled * Volumetric fog converts NaNs to black now so they don't spread Change 3346422 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaN in volumetric fog with low density values Change 3346423 on 2017/03/14 by Daniel.Wright [Copy] Changed default VolumetricFogScatteringDistribution to .2 Change 3346430 on 2017/03/14 by Daniel.Wright [Copy] New translucent material option to compute fog per pixel instead of the default per vertex Change 3346432 on 2017/03/14 by Daniel.Wright [Copy] Moved Volumetric Fog parameters to view uniform buffer for translucency pass Fixed lifetimes of temporary Volumetric Fog render targets Change 3346526 on 2017/03/14 by Daniel.Wright [Copy] Volumetric Fog supports point and spot light shadows * These lights are injected separately so that per-light resources can be bound (shadow depth map, static shadow depth map) * Forward lighting of local lights can be forced with 'r.VolumetricFog.InjectShadowedLightsSeparately 0' * Shadowed lights come at a cost: 2.9ms for volumetric fog on 970 -> 4.2ms with shadowing Change 3347053 on 2017/03/15 by Rolando.Caloca DR - android compile fix Change 3347384 on 2017/03/15 by Rolando.Caloca DR - Fix merge issue Change 3347643 on 2017/03/15 by Marcus.Wassmer Fix some bugs with the 'disable stationary skylight ffor the project' feature. Fixes lighting in Persona on Paragon. Change 3347979 on 2017/03/15 by Rolando.Caloca DR - Allow to automatically apply cached rendertargets to PSO initializer Change 3348024 on 2017/03/15 by Rolando.Caloca DR - Remove NullPS on Vulkan to avoid deadlock Change 3348303 on 2017/03/15 by Rolando.Caloca DR - Fix for debugging SCW with material SRT Change 3348357 on 2017/03/15 by Marcus.Wassmer Fix stencildither and a stencilref bug that was probably breaking decals sometimes. Change 3348549 on 2017/03/15 by Marcus.Wassmer Hopefully fix static analysis for potential nullptr access. Change 3348614 on 2017/03/15 by Marcus.Wassmer Duplicate some switch changes to fix crash on launch. Change 3349369 on 2017/03/16 by Gil.Gribb Fixed botched merge Change 3349947 on 2017/03/16 by Rolando.Caloca DR - Fix for mismatched primitive type Change 3349956 on 2017/03/16 by Benjamin.Hyder initial updates to TM-DistanceFields map Change 3350151 on 2017/03/16 by Rolando.Caloca DR - Fix UT compile issue Change 3350155 on 2017/03/16 by Rolando.Caloca DR - Catch mismatched primitive type on PSOs on D3D11 Change 3350192 on 2017/03/16 by Daniel.Wright Fix for point light shadow depths rendering with wrong cull mode due to PSO refactor Change 3350736 on 2017/03/16 by Daniel.Wright Fixed formatting from merge Change 3350881 on 2017/03/16 by Rolando.Caloca DR - Fix texture arrays as UAVs on Metal Change 3350927 on 2017/03/16 by Rolando.Caloca DR - Fix warning Change 3350935 on 2017/03/16 by Daniel.Wright Fix for materials with non-Surface domains being skipped in mesh passes Change 3351583 on 2017/03/17 by Marcus.Wassmer Fix clang platforms Change 3351917 on 2017/03/17 by Marcus.Wassmer Fix linux compile Change 3351973 on 2017/03/17 by Marcus.Wassmer Fix mismatched rendertargetformat Change 3352038 on 2017/03/17 by Daniel.Wright Enabled GetAndOrCreateGraphicsPipelineState ensures in Development for testing Change 3352110 on 2017/03/17 by Marcus.Wassmer Fix missing RT PSO apply Change 3352695 on 2017/03/17 by Arne.Schober DR - Remove PSO Rendertarget check in DX12 Resolve with Shader. #RB Rolando.Caloca Change 3352960 on 2017/03/17 by Arne.Schober DR - Fix some things that slipped trough the PSO merge #RB none Change 3353150 on 2017/03/18 by Rolando.Caloca DR - compile fix Change 3353205 on 2017/03/18 by Arne.Schober DR - Fix Incremental Compile and PS4 runtime error where CMASK is not allowed for ThickTile Mode #RB none Change 3353207 on 2017/03/18 by Arne.Schober DR - Fix Confusion #RB none Change 3355183 on 2017/03/20 by Nick.Bullard Fixed up Content orginzation for Decals automation tests in EngineTest Change 3355627 on 2017/03/20 by Arne.Schober DR - [UE-43094] - removed ensure in comporiton graph as control of the clear color cannot be gurantueed. Change 3356342 on 2017/03/21 by Marcus.Wassmer Fix clang errors Change 3356591 on 2017/03/21 by Arne.Schober DR - Fix ensure message #RB none Change 3356873 on 2017/03/21 by Arne.Schober DR - Fix comparission of undefined values in RendertargetApply Check Change 3357261 on 2017/03/21 by Marcus.Wassmer Fix LinuxEditor compile Change 3357294 on 2017/03/21 by Marcus.Wassmer Add missing SSE functions Change 3357351 on 2017/03/21 by Frank.Fella Fix win32 and linux compiler errors Change 3357370 on 2017/03/21 by Arne.Schober DR - disable ensure in test builds #RB Marcus.Wassmer [CL 3357449 by Marcus Wassmer in Main branch]
2017-03-21 17:46:52 -04:00
#include "RendererModule.h"
#include "ClearQuad.h"
int32 GAOGlobalDistanceField = 1;
FAutoConsoleVariableRef CVarAOGlobalDistanceField(
TEXT("r.AOGlobalDistanceField"),
GAOGlobalDistanceField,
TEXT("Whether to use a global distance field to optimize occlusion cone traces.\n")
TEXT("The global distance field is created by compositing object distance fields into clipmaps as the viewer moves through the level."),
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
ECVF_RenderThreadSafe
);
int32 GAOUpdateGlobalDistanceField = 1;
FAutoConsoleVariableRef CVarAOUpdateGlobalDistanceField(
TEXT("r.AOUpdateGlobalDistanceField"),
GAOUpdateGlobalDistanceField,
TEXT("Whether to update the global distance field, useful for debugging."),
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
ECVF_RenderThreadSafe
);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
int32 GAOGlobalDistanceFieldCacheMostlyStaticSeparately = 1;
FAutoConsoleVariableRef CVarAOGlobalDistanceFieldCacheMostlyStaticSeparately(
TEXT("r.AOGlobalDistanceFieldCacheMostlyStaticSeparately"),
GAOGlobalDistanceFieldCacheMostlyStaticSeparately,
TEXT("Whether to cache mostly static primitives separately from movable primitives, which reduces global DF update cost when a movable primitive is modified. Adds another 12Mb of volume textures."),
ECVF_RenderThreadSafe
);
int32 GAOGlobalDistanceFieldPartialUpdates = 1;
FAutoConsoleVariableRef CVarAOGlobalDistanceFieldPartialUpdates(
TEXT("r.AOGlobalDistanceFieldPartialUpdates"),
GAOGlobalDistanceFieldPartialUpdates,
TEXT("Whether to allow partial updates of the global distance field. When profiling it's useful to disable this and get the worst case composition time that happens on camera cuts."),
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
ECVF_RenderThreadSafe
);
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
int32 GAOGlobalDistanceFieldStaggeredUpdates = 1;
FAutoConsoleVariableRef CVarAOGlobalDistanceFieldStaggeredUpdatess(
TEXT("r.AOGlobalDistanceFieldStaggeredUpdates"),
GAOGlobalDistanceFieldStaggeredUpdates,
TEXT("Whether to allow the larger clipmaps to be updated less frequently."),
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
ECVF_RenderThreadSafe
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
);
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2771498 on 2015/11/18 by Rolando.Caloca DevRendering - HlslParser - Do not crash if an unknown preprocessor directive is found; add proper support for #pragma #codereview Marcus.Wassmer Change 2771600 on 2015/11/18 by Rolando.Caloca DevRendering - SCW - Added support for running the platform shader compiler for one usf file off the dumped usf Usage: ShaderCompileWorker -directcompile FILENAME.USF -entry=EntryPoint -format=PCD3D_SM5/SF_PS4/etc -vs/-ps/-gs/-hs/-ds/-cs -Also removed old communication enum from SCW #rb Daniel.Wright Change 2771647 on 2015/11/18 by Rolando.Caloca DevRendering - HlslParser - Refactored removed unused outputs code in prep for reusing a lot of this code - Entry point string now gets modified to the optimized one - Fixed parser allocator when requesting pages bigger than PageSize #rb Chris.Bunner Change 2772133 on 2015/11/18 by Chris.Bunner Removed physics shape type zeroing on Speedtree import. UE-23285 #rb Ori.Cohen Change 2772225 on 2015/11/18 by Rolando.Caloca DevRendering - Hlsl - Support for removing unused inputs on pixel shaders - Fix some shadow variable warnings #rb Chris.Bunner, Nick.Penwarden Change 2772469 on 2015/11/18 by Daniel.Wright Fixed SCW always exiting after compiling a long shader, now checks idle time starting from the end of the last compile task Automated smoke tests aren't run in standalone programs which are frequently launched as they increase the startup time (doubles startup time of SCW as shown in sampling profile) #rb Rolando.Caloca Change 2772471 on 2015/11/18 by Daniel.Wright Particle SubUV cutouts * A new asset type 'SubUV Animation' precomputes bounding geometry for every frame of a SubUV texture animation. * Particle emitters with a SubUV module can then use this SubUV Animation to render with much tigher bounding geometry to reduce overdraw. * GPU performance savings depend on how much empty space (zero alpha) existed in the texture. Measured a reduction of 2-3x GPU time on a smoke effect. * This only works if the material does not modify opacity to reveal areas with zero texture alpha Change 2772483 on 2015/11/18 by Marcus.Wassmer Filtering options on UnrealPak -list #rb Josh.Adams Change 2772644 on 2015/11/18 by Daniel.Wright Integrate - Temporal AA dithering is only enabled if outputting to a low precision format #rb Nick.Penwarden Change 2773336 on 2015/11/19 by Rolando.Caloca DevRendering - PS4 shaders - Added input/output attribute information when r.PS4StripExtraShaderBinaryData=0 #rb Marcus.Wassmer Change 2773476 on 2015/11/19 by Rolando.Caloca DevRendering - PS4 Shader attribute export stats Run using r.PS4DumpExportStats 1 in the console - Also fixed non-vertex shaders not getting optional data #codereview Marcus.Wassmer Change 2773865 on 2015/11/19 by Gil.Gribb UE4 - Added an FName churn tracker. Change 2773900 on 2015/11/19 by Rolando.Caloca DevRendering - Fix sharing shaders for material & mesh shaders #rb Marcus.Wassmer Change 2774277 on 2015/11/19 by Gil.Gribb UE4 - Did minor optimizations to the PS4 RHI and drawlists. Change 2774421 on 2015/11/19 by Olaf.Piesche Fix #2 for UE-23325 - separate translucency materials don't show in static mesh editor #codereview Martin.Mittring Change 2774447 on 2015/11/19 by Rolando.Caloca DevRendering - Velocity and Depth shader pipelines #rb Marcus.Wassmer Change 2774603 on 2015/11/19 by Marcus.Wassmer Windowed vsync for ps4 #rb Rolando.Caloca Change 2775650 on 2015/11/20 by Rolando.Caloca DevRendering - Added two utility overloads per UDN suggestion #codereview Gil.Gribb Change 2775798 on 2015/11/20 by David.Hill Adding a new AutoExposure method #rb Martin.Mittring Change 2776345 on 2015/11/20 by Daniel.Wright Capsule shadows for movable skylight * Gathers capsule occlusion along the unoccluded sky cone computed by Distance Field Ambient Occlusion * Requires DFAO to be enabled at the moment * Some serious artifacts remaining in indoor scenarios, as the unoccluded sky direction is not continuous Change 2777033 on 2015/11/22 by Uriel.Doyon Enabled SceneTextures node validation when material domain is DeferredDecal #review Martin.Mittring #jira UE-23141 Change 2778618 on 2015/11/23 by Daniel.Wright
2015-12-10 21:55:37 -05:00
int32 GAOLogGlobalDistanceFieldModifiedPrimitives = 0;
FAutoConsoleVariableRef CVarAOLogGlobalDistanceFieldModifiedPrimitives(
TEXT("r.AOGlobalDistanceFieldLogModifiedPrimitives"),
GAOLogGlobalDistanceFieldModifiedPrimitives,
TEXT("Whether to log primitive modifications (add, remove, updatetransform) that caused an update of the global distance field.\n")
TEXT("This can be useful for tracking down why updating the global distance field is always costing a lot, since it should be mostly cached."),
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
ECVF_RenderThreadSafe
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2771498 on 2015/11/18 by Rolando.Caloca DevRendering - HlslParser - Do not crash if an unknown preprocessor directive is found; add proper support for #pragma #codereview Marcus.Wassmer Change 2771600 on 2015/11/18 by Rolando.Caloca DevRendering - SCW - Added support for running the platform shader compiler for one usf file off the dumped usf Usage: ShaderCompileWorker -directcompile FILENAME.USF -entry=EntryPoint -format=PCD3D_SM5/SF_PS4/etc -vs/-ps/-gs/-hs/-ds/-cs -Also removed old communication enum from SCW #rb Daniel.Wright Change 2771647 on 2015/11/18 by Rolando.Caloca DevRendering - HlslParser - Refactored removed unused outputs code in prep for reusing a lot of this code - Entry point string now gets modified to the optimized one - Fixed parser allocator when requesting pages bigger than PageSize #rb Chris.Bunner Change 2772133 on 2015/11/18 by Chris.Bunner Removed physics shape type zeroing on Speedtree import. UE-23285 #rb Ori.Cohen Change 2772225 on 2015/11/18 by Rolando.Caloca DevRendering - Hlsl - Support for removing unused inputs on pixel shaders - Fix some shadow variable warnings #rb Chris.Bunner, Nick.Penwarden Change 2772469 on 2015/11/18 by Daniel.Wright Fixed SCW always exiting after compiling a long shader, now checks idle time starting from the end of the last compile task Automated smoke tests aren't run in standalone programs which are frequently launched as they increase the startup time (doubles startup time of SCW as shown in sampling profile) #rb Rolando.Caloca Change 2772471 on 2015/11/18 by Daniel.Wright Particle SubUV cutouts * A new asset type 'SubUV Animation' precomputes bounding geometry for every frame of a SubUV texture animation. * Particle emitters with a SubUV module can then use this SubUV Animation to render with much tigher bounding geometry to reduce overdraw. * GPU performance savings depend on how much empty space (zero alpha) existed in the texture. Measured a reduction of 2-3x GPU time on a smoke effect. * This only works if the material does not modify opacity to reveal areas with zero texture alpha Change 2772483 on 2015/11/18 by Marcus.Wassmer Filtering options on UnrealPak -list #rb Josh.Adams Change 2772644 on 2015/11/18 by Daniel.Wright Integrate - Temporal AA dithering is only enabled if outputting to a low precision format #rb Nick.Penwarden Change 2773336 on 2015/11/19 by Rolando.Caloca DevRendering - PS4 shaders - Added input/output attribute information when r.PS4StripExtraShaderBinaryData=0 #rb Marcus.Wassmer Change 2773476 on 2015/11/19 by Rolando.Caloca DevRendering - PS4 Shader attribute export stats Run using r.PS4DumpExportStats 1 in the console - Also fixed non-vertex shaders not getting optional data #codereview Marcus.Wassmer Change 2773865 on 2015/11/19 by Gil.Gribb UE4 - Added an FName churn tracker. Change 2773900 on 2015/11/19 by Rolando.Caloca DevRendering - Fix sharing shaders for material & mesh shaders #rb Marcus.Wassmer Change 2774277 on 2015/11/19 by Gil.Gribb UE4 - Did minor optimizations to the PS4 RHI and drawlists. Change 2774421 on 2015/11/19 by Olaf.Piesche Fix #2 for UE-23325 - separate translucency materials don't show in static mesh editor #codereview Martin.Mittring Change 2774447 on 2015/11/19 by Rolando.Caloca DevRendering - Velocity and Depth shader pipelines #rb Marcus.Wassmer Change 2774603 on 2015/11/19 by Marcus.Wassmer Windowed vsync for ps4 #rb Rolando.Caloca Change 2775650 on 2015/11/20 by Rolando.Caloca DevRendering - Added two utility overloads per UDN suggestion #codereview Gil.Gribb Change 2775798 on 2015/11/20 by David.Hill Adding a new AutoExposure method #rb Martin.Mittring Change 2776345 on 2015/11/20 by Daniel.Wright Capsule shadows for movable skylight * Gathers capsule occlusion along the unoccluded sky cone computed by Distance Field Ambient Occlusion * Requires DFAO to be enabled at the moment * Some serious artifacts remaining in indoor scenarios, as the unoccluded sky direction is not continuous Change 2777033 on 2015/11/22 by Uriel.Doyon Enabled SceneTextures node validation when material domain is DeferredDecal #review Martin.Mittring #jira UE-23141 Change 2778618 on 2015/11/23 by Daniel.Wright
2015-12-10 21:55:37 -05:00
);
float GAOGlobalDFClipmapDistanceExponent = 2;
FAutoConsoleVariableRef CVarAOGlobalDFClipmapDistanceExponent(
TEXT("r.AOGlobalDFClipmapDistanceExponent"),
GAOGlobalDFClipmapDistanceExponent,
TEXT("Exponent used to derive each clipmap's size, together with r.AOInnerGlobalDFClipmapDistance."),
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
ECVF_RenderThreadSafe
);
int32 GAOGlobalDFResolution = 128;
FAutoConsoleVariableRef CVarAOGlobalDFResolution(
TEXT("r.AOGlobalDFResolution"),
GAOGlobalDFResolution,
TEXT("Resolution of the global distance field. Higher values increase fidelity but also increase memory and composition cost."),
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
ECVF_RenderThreadSafe
);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3249742) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3232283 on 2016/12/13 by Ben.Woodhouse D3D12 - downgrade root signature size warning to a log following a discussion with Microsoft. There's not much we can actually do about it, and it's not relevant to all hardware #jira UE-36999 Change 3232641 on 2016/12/13 by Mark.Satterthwaite - Eliminate redundant state changes in MetalRHI in the state cache. - Add a new debug level for setting buffers to nil prior to calls to set*Bytes so that the tool doesn't display incorrect data. - Make testing for validation & statistics features use the same EMetalFeatures API as everything else for consistency. - Cache the fallback depth-stencil texture in the state cache and ignore it for determining whether a pass can restart - if we are using this texture its contents are worthless anyway. Change 3232661 on 2016/12/13 by Mark.Satterthwaite Re-enable Metal SM5 & DFAO/DistanceFieldShadowing on Intel for 10.12.2 or later. Change 3232759 on 2016/12/13 by Ben.Woodhouse Fix memory leak on XB1 when calling GPURealloc with count of 0, suggested on UDN https://udn.unrealengine.com/questions/326660/gpurealloc-leak.html Change 3232803 on 2016/12/13 by Ben.Marsh Add UT to the populate DDC job, and cook UT and Fortnite for Mac as well. Change 3232836 on 2016/12/13 by Ben.Marsh Split cooks to populate DDC into separate nodes for each platform. May help to reduce number of timeouts on remote VMs. Change 3232974 on 2016/12/13 by Rolando.Caloca DR - Refactor common code to UWorld::RecreateScene #jira UE-36719 PR #2824 Change 3232976 on 2016/12/13 by Ben.Marsh Add missing dependency on tools node for Mac cooks. Need to compile SCW first. Change 3233289 on 2016/12/13 by Olaf.Piesche Fixing potentially broken spot/point light fade with old content; initialize new properties properly Change 3233811 on 2016/12/13 by Mark.Satterthwaite Fix compiling QA-Material tessellation shaders that don't need to emit from Hull or sample in Domain the HSOut buffer which was confusing MetalBackend. Change 3233854 on 2016/12/13 by Mark.Satterthwaite More information about texture type validation errors in Metal. Change 3234650 on 2016/12/14 by Rolando.Caloca DR - vk - Fix bad aspect on depth cubemaps Change 3234651 on 2016/12/14 by Rolando.Caloca DR - vk - Fix for 32 bit crash on dump layer Change 3234813 on 2016/12/14 by Guillaume.Abadie Fixes texture mask static lighting when using GBuffer selective outputs. #jira UE-39527 Change 3235047 on 2016/12/14 by Uriel.Doyon Refactored HLOD texture streaming strategy to separate forced load from visibility. Added an incremental update in the last stage of the texture streaming update load to clear any pending work. Added an option "All" to the "BuildMateriaTexturelStreamingData" command to force rebuild everything. Change 3235317 on 2016/12/14 by Uriel.Doyon Removed timed primitives in the texture streaming since it was not used and there is now a fallback implementation in UPrimitiveComponent::GetStreamingTextureInfo. Change 3235431 on 2016/12/14 by Rolando.Caloca DR - Fix for Vulkan drawing black Change 3236788 on 2016/12/15 by Mark.Satterthwaite Fix 10.11.6 support (aka -nometalv2): the stencil view workaround necessitates a mid-render blit and the way things were setup resulted in the HasValidRenderTargets assert firing. Refactored the code to separate the concept or valid render-states in the cache from active render-states in the render-pass. Now it works as intended and will be needed for 4.15. Change 3236850 on 2016/12/15 by Mark.Satterthwaite Make changing the Metal Shader Version project setting prompt the user to restart for the changes to take effect. #jira UE-39801 Change 3237002 on 2016/12/15 by Benjamin.Hyder submitting updated TM-Shadermodels map Change 3237312 on 2016/12/15 by Rolando.Caloca DR - Change more macros to lambdas Change 3237394 on 2016/12/15 by Mark.Satterthwaite Add Metal-specific permutations of TBasePassHS - they affect the C++ definition on all platforms but are only cached or used on Metal - because the way we compile the combined VS+HS tessellation stage requires that the combined VS + HS HLSL code references the same resources, otherwise we get incorrect resouce bindings and subsequently fail to render properly. Long-term the Metal tessellation code will need to be refactored so that the vertex shader stage is emitted as a separate shader from the hull shader stage as this but will keep cropping back up and continue to complicate the engine. #jira UE-39799 Change 3237490 on 2016/12/15 by Daniel.Wright Fixed ULandscapeComponent::GetUsedMaterials Change 3237597 on 2016/12/15 by Ben.Woodhouse Disable timestamp queries on pre-Maxwell nvidia hardware. Local testing suggests that this is the major cause of instability in the UE4.14 release. It's possible that we could be more targeted by only excluding Fermi and older hardware, but identifying fermi hardware by device ID is difficult in practice, since the range overlaps with Kepler. Change 3237654 on 2016/12/15 by Daniel.Wright Non-editor compile fix Change 3238229 on 2016/12/16 by Rolando.Caloca DR - Remove ExcludeRect from inner RHI Clear methods; ensure will happen if trying to use it Change 3238236 on 2016/12/16 by Rolando.Caloca DR - Compile fixes Change 3238280 on 2016/12/16 by Marc.Olano Small optimization to Lanczos-3 upsample shader code. Change 3238321 on 2016/12/16 by Rolando.Caloca DR - Compile fix Change 3238331 on 2016/12/16 by Rolando.Caloca DR - compile fix Change 3238495 on 2016/12/16 by Marc.Olano Replace TEA random number generator with PCG. Was only used in #if-disabled reference rendering, but ldoes make better quality reference rendering when enabled. Change 3238496 on 2016/12/16 by Marc.Olano Tone mapping fix for OR-31752, cherry picked from Orion 3208273 Assumption that green is approximates luminance fails on red/blue HDR content, resulting in ugly black artifacts. Go back to luminance. Change 3238520 on 2016/12/16 by Rolando.Caloca DR - CIS Fix Change 3238571 on 2016/12/16 by Rolando.Caloca DR - CIS fix Change 3238605 on 2016/12/16 by Daniel.Wright Sharing IndirectLightingCacheTextureSampler samplers Change 3238626 on 2016/12/16 by Daniel.Wright Ray Traced Distance Field Shadow optimizations * Tighter light space tile culling * Skip ray marching pixels before the RTDF cascade near distance, or further than the cascade far distance * Depth bounds test on upsample * Created FLightTileIntersectionParameters for encapsulation of light tile culling functionality * RTDF shadow time went from 1.8ms -> .8ms and 3.1ms -> 1.2ms in FortGPUTestbed on 7870 with these changes Change 3238652 on 2016/12/16 by Rolando.Caloca DR - RHI clear methods no longer have an ExcludeRect, use DrawClearQuad functions instead Change 3238855 on 2016/12/16 by Rolando.Caloca DR - Added FRHITexture2D GetSizeXY Change 3238881 on 2016/12/16 by Rolando.Caloca DR - CIS fix Change 3239008 on 2016/12/16 by Arne.Schober DR - Fixing accidently returning a stackpointer in EnqueueRenderCommands Change 3239012 on 2016/12/16 by Arne.Schober DR - missing file Change 3239255 on 2016/12/17 by Rolando.Caloca DR - Remove shader clears from D3D11 Change 3239690 on 2016/12/19 by Rolando.Caloca DR - vk - Misc fixes from 1.0.37.00 SDK warnings Change 3239964 on 2016/12/19 by Rolando.Caloca DR - Fix click on editor not showing selected Change 3239995 on 2016/12/19 by Rolando.Caloca DR - Enable dist field on GL4 & Vulkan SM5 Change 3240162 on 2016/12/19 by Daniel.Wright Added EnableDepthBoundsTest / DisableDepthBoundsTest to RHIUtilites to share some common code Change 3240163 on 2016/12/19 by Daniel.Wright Distance field self shadowing controls for hiding world position offset self-shadow artifacts * Removed static mesh build settings DistanceFieldBias, which shrunk the distance field, breaking AO and shadows * Added DistanceFieldSelfShadowBias, which prevents occlusion close to the surface only, maintaining shadows on the ground and AO on the ground Change 3240271 on 2016/12/19 by Daniel.Wright Use 16 bit indices for distance field objects culled to tiles, when 16 bit will be enough. Saves 10mb of tile culling buffers. Change 3240282 on 2016/12/19 by Rolando.Caloca DR - Proper fix for hit proxies clear - Added missing stencil ref to DrawClearQuad Change 3240316 on 2016/12/19 by Rolando.Caloca DR - vk - Fixed some new 1.0.37.0 warnings Change 3240354 on 2016/12/19 by Rolando.Caloca DR - Dev shaders on sm4/5 Change 3240759 on 2016/12/20 by Rolando.Caloca DR - Fix bad crc on GL element declarations Change 3240895 on 2016/12/20 by Rolando.Caloca DR - vk - Swapchain fixes Change 3241057 on 2016/12/20 by Rolando.Caloca DR - vk - Fix resize on desktop Change 3241112 on 2016/12/20 by Rolando.Caloca DR - vk - Fix 1.0.37.0 warnings - Ignore some warnings we know we can't fix Change 3241310 on 2016/12/20 by Rolando.Caloca DR - vk - Fix crash Change 3241417 on 2016/12/20 by Daniel.Wright [Copy] Fixed race condition with FPrecomputedLightVolume::Data which was exposed when switching lighting scenarios Change 3241990 on 2016/12/21 by Daniel.Wright Converted DistanceFieldVolume data to BulkData * FDistanceFieldVolumeData Serialize time from .7s on PS4 to 0s Change 3242005 on 2016/12/21 by Daniel.Wright Removed unused !USE_DEPTH_RANGE_LISTS path to reduce complexity Change 3242295 on 2016/12/21 by Bob.Tellez Duplicating CL#3242294 from //Fortnite/Main #UE4 Re-applying the fix for rendering editor primitives when r.EarlyZPassOnlyMaterialMasking is enabled Change 3242487 on 2016/12/21 by Marcus.Wassmer Fix typo Change 3243091 on 2016/12/22 by Daniel.Wright Fixed too many groups dispatched for TConeTraceScreenGridGlobalOcclusionCS Change 3243161 on 2016/12/22 by Uriel.Doyon New async tasks for the streaming update. Optimizing the biggest frame cost. Change 3243179 on 2016/12/22 by Uriel.Doyon Fixed possible invalid access from the async FNormalizeLightmapTexelFactorTask Change 3243236 on 2016/12/22 by Daniel.Wright Fixed DFAO bilateral upsample * Depth buffer was being unbound due to lack of DepthRead_StencilNop Change 3243452 on 2016/12/23 by Ben.Woodhouse Bring back 1024 render query limit workaround on D3D12 which was lost during the merge from partners #jira UE-35247 Change 3243512 on 2016/12/23 by Uriel.Doyon Improved task system for texture streaming. Change 3243742 on 2016/12/26 by Rolando.Caloca DR - vk - Fix UAV clears - Removed old validation layer - Print found device layers Change 3243745 on 2016/12/27 by Rolando.Caloca DR - vk - Fix for texture cube arrays - Warning for ClearUAVs Change 3243762 on 2016/12/27 by Rolando.Caloca DR - vk - Always use pipeline cache Change 3244450 on 2016/12/31 by Rolando.Caloca DR - vk - Pre reqs for separate transfer queue Change 3244453 on 2016/12/31 by Rolando.Caloca DR - vk - Win32 compile fix Change 3244756 on 2017/01/03 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering) Change 3244757 on 2017/01/03 by Marcus.Wassmer Niagara is still experimental in non-task branches. Change 3245059 on 2017/01/03 by Benjamin.Hyder Submitting TM-TrigNodes map Change 3245500 on 2017/01/03 by Olaf.Piesche Compile fix #1 for post-merge problems Change 3245572 on 2017/01/03 by Olaf.Piesche (Speculative) fix #2 for post-merge build problem. Hopefully fixes public distribution level error for cross compiler tool. Change 3245683 on 2017/01/03 by Marcus.Wassmer Fix some niagara warnings Change 3245732 on 2017/01/03 by Marcus.Wassmer Fix Niagara compile on clang platforms. Fix a few warnings / static analysis things as well. Change 3246403 on 2017/01/04 by Rolando.Caloca DR - vk - Fix bogus warning Change 3246432 on 2017/01/04 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3246424 to Dev-Rendering (//UE4/Dev-Rendering) Change 3246538 on 2017/01/04 by Rolando.Caloca DR - vk - Show hitch time for compute psos Change 3246580 on 2017/01/04 by Rolando.Caloca DR - vk - compile fix Change 3246610 on 2017/01/04 by Rolando.Caloca DR - Compute PSO pre reqs Change 3246707 on 2017/01/04 by Marcus.Wassmer Add missing integer operations to UnrealMathDirectX.h Change 3246786 on 2017/01/04 by Marcus.Wassmer Avoid public dependency build errors. Should probably just remove the DDCUtils module instead Change 3246828 on 2017/01/04 by Olaf.Piesche UE-39249; need to check the view as well as the view family in CheckAndUpdateLastFrame; scene captures use a different family, but each eye for VR uses a different scene view. Change 3247026 on 2017/01/04 by Rolando.Caloca DR - Remove CrossCompilerTool as it's not required anymore Change 3247086 on 2017/01/04 by Marcus.Wassmer Remove includes for Core.h monolithic header Change 3247227 on 2017/01/04 by Marcus.Wassmer Fix typo and compile errors. Change 3247228 on 2017/01/04 by Marcus.Wassmer Use crossplatform intrinsics Change 3247229 on 2017/01/04 by Marcus.Wassmer Implement missing integer NEON operations. Change NEON vectorint to match name and sign from other platforms Change 3247245 on 2017/01/04 by Marcus.Wassmer Fixing various warnings/errors from clang platforms (Mac/Linux) Change 3247331 on 2017/01/04 by Marcus.Wassmer More Mac/clang fixes Change 3247958 on 2017/01/05 by Marcus.Wassmer VectorInt < - > Float ops should be conversions not reinterpret cast Change 3247959 on 2017/01/05 by Marcus.Wassmer Add missing ops to non-vector header Change 3247964 on 2017/01/05 by Rolando.Caloca DR - Temp fix for crash #jira UE-40211 Change 3248067 on 2017/01/05 by Rolando.Caloca DR - Static analysis fixes #jira UE-40167 Change 3248284 on 2017/01/05 by Rolando.Caloca DR - Linuix Compile fix #jira UE-40260 Change 3248288 on 2017/01/05 by Rolando.Caloca DR - Linux compile fix #jira UE-40264 Change 3248399 on 2017/01/05 by Brian.Karis Filtered importance sampling for envmap prefiltering. Fixed SSR on clearcoat with skylight only. Change 3248503 on 2017/01/05 by Rolando.Caloca DR - Linux fixes #jira UE-40264 Change 3248666 on 2017/01/05 by Brian.Karis Fix GL compile error Change 3248740 on 2017/01/05 by Marcus.Wassmer Fix linux and clang errors/warnings Change 3248851 on 2017/01/05 by Marcus.Wassmer Simplest fix for ES2 compile errors Change 3249217 on 2017/01/06 by Simon.Tovey Speculative fix for static analysis warning Change 3249296 on 2017/01/06 by Ben.Woodhouse XB1/Fast semantics: Add missing L1/L2 cache flush on transition to readable (or RW). The missing cache flush was causing indeterminism when reading from a texture shortly after writing to it as a render target. This fixes bloom and diffuse irradiance issues The bug has been there for a while, but CL 3227787 (drawclear early out) caused it to manifest #jira UE-39727 #jira UE-40238 Change 3249300 on 2017/01/06 by Ben.Woodhouse Remove workaround for diffuse irradiance (redundant clear). No longer necessary with CL 3249296 Change 3249387 on 2017/01/06 by Rolando.Caloca DR - Fix GL clear issues #jira UE-40254 Change 3249435 on 2017/01/06 by Ben.Woodhouse Duplicated from UT CL 3238664 Fix dbuffer decal rendering issues in fullscreen on PC. Also fixes crash in editor when viewing dbuffer materials. Pass clearcolor in RT params for system textures to workaround a bug with ClearColorTexture not working in fullscreen mode on DX11. Make sure dbuffer targets are bound if we're rendering mesh decals #jira UT-6891 #jira UE-39842 Change 3249721 on 2017/01/06 by Marcus.Wassmer Remove final references to non-existent Niagara data Change 3249742 on 2017/01/06 by Marcus.Wassmer Fix missing GPU particles on Mac. Pointers getting reused is causing the blendstate equality operator to fail. Simple workaround until we have time for a proper fix. [CL 3249983 by Marcus Wassmer in Main branch]
2017-01-06 17:51:46 -05:00
float GAOGlobalDFStartDistance = 100;
FAutoConsoleVariableRef CVarAOGlobalDFStartDistance(
TEXT("r.AOGlobalDFStartDistance"),
GAOGlobalDFStartDistance,
TEXT("World space distance along a cone trace to switch to using the global distance field instead of the object distance fields.\n")
TEXT("This has to be large enough to hide the low res nature of the global distance field, but smaller values result in faster cone tracing."),
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
ECVF_RenderThreadSafe
);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
int32 GAOGlobalDistanceFieldRepresentHeightfields = 1;
FAutoConsoleVariableRef CVarAOGlobalDistanceFieldRepresentHeightfields(
TEXT("r.AOGlobalDistanceFieldRepresentHeightfields"),
GAOGlobalDistanceFieldRepresentHeightfields,
TEXT("Whether to put landscape in the global distance field. Changing this won't propagate until the global distance field gets recached (fly away and back)."),
ECVF_RenderThreadSafe
);
void FGlobalDistanceFieldInfo::UpdateParameterData(float MaxOcclusionDistance)
{
if (Clipmaps.Num() > 0)
{
for (int32 ClipmapIndex = 0; ClipmapIndex < GMaxGlobalDistanceFieldClipmaps; ClipmapIndex++)
{
FTextureRHIParamRef TextureValue = ClipmapIndex < Clipmaps.Num()
? Clipmaps[ClipmapIndex].RenderTarget->GetRenderTargetItem().ShaderResourceTexture
: NULL;
ParameterData.Textures[ClipmapIndex] = TextureValue;
if (ClipmapIndex < Clipmaps.Num())
{
const FGlobalDistanceFieldClipmap& Clipmap = Clipmaps[ClipmapIndex];
ParameterData.CenterAndExtent[ClipmapIndex] = FVector4(Clipmap.Bounds.GetCenter(), Clipmap.Bounds.GetExtent().X);
// GlobalUV = (WorldPosition - GlobalVolumeCenterAndExtent[ClipmapIndex].xyz + GlobalVolumeScollOffset[ClipmapIndex].xyz) / (GlobalVolumeCenterAndExtent[ClipmapIndex].w * 2) + .5f;
// WorldToUVMul = 1.0f / (GlobalVolumeCenterAndExtent[ClipmapIndex].w * 2)
// WorldToUVAdd = (GlobalVolumeScollOffset[ClipmapIndex].xyz - GlobalVolumeCenterAndExtent[ClipmapIndex].xyz) / (GlobalVolumeCenterAndExtent[ClipmapIndex].w * 2) + .5f
const FVector WorldToUVAdd = (Clipmap.ScrollOffset - Clipmap.Bounds.GetCenter()) / (Clipmap.Bounds.GetExtent().X * 2) + FVector(.5f);
ParameterData.WorldToUVAddAndMul[ClipmapIndex] = FVector4(WorldToUVAdd, 1.0f / (Clipmap.Bounds.GetExtent().X * 2));
}
else
{
ParameterData.CenterAndExtent[ClipmapIndex] = FVector4(0, 0, 0, 0);
ParameterData.WorldToUVAddAndMul[ClipmapIndex] = FVector4(0, 0, 0, 0);
}
}
ParameterData.GlobalDFResolution = GAOGlobalDFResolution;
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
extern float GAOConeHalfAngle;
const float GlobalMaxSphereQueryRadius = MaxOcclusionDistance / (1 + FMath::Tan(GAOConeHalfAngle));
ParameterData.MaxDistance = GlobalMaxSphereQueryRadius;
}
else
{
FPlatformMemory::Memzero(&ParameterData, sizeof(ParameterData));
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3091903) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3072947 on 2016/08/01 by Uriel.Doyon Texture GUIDs are now included in cooked builds, as they are required by the texture streamer to link build data to in game texture. #review-3072934 @marcus.wassmer #jira UE-34045 Change 3073301 on 2016/08/02 by Ben.Woodhouse Fix for large spotlight culling precision issues, reported on UDN by Aaron Jacobs at Double Fine. For a full description, see the UDN post https://udn.unrealengine.com/questions/305440/shadowed-light-flicker-caused-by-floating-point-pr.html #jira UE-34052 Change 3073689 on 2016/08/02 by Ben.Woodhouse Improved skin postprocess - support for full resolution, with diffuse/spec lighting combined into single RGBA (sharing chroma) Full res lighting gives less temporal AA flickering, sharper diffuse and specular lighting in the surface (since this is now at full resolution), faster postprocessing if using a 64-bit rendertarget (on NV 980Ti). Checkerboard rendering is controlled via the r.sss.checkerboard cvar. - 0 is off/full res, 1 is checkerboard, 2 is automatic based on scenecolor (non-checkerboard requires 64bit or more rendertarget w/separate alpha) Tested/profiled on PC, PS4 Change 3074666 on 2016/08/02 by Daniel.Wright Fixed stationary skylight brightness Change 3074667 on 2016/08/02 by Daniel.Wright Fixed r.ReflectionEnvironmentLightmapMixing Change 3074687 on 2016/08/02 by Daniel.Wright Disallowed DrawMaterialToRenderTarget and Begin/EndDrawCanvasToRenderTarget in construction scripts, since they don't work in game. Blutilities can be used to do blueprint rendering in the editor. Change 3075241 on 2016/08/03 by Rolando.Caloca DR - Fix linux compile issue & static analysis warning Change 3075746 on 2016/08/03 by Daniel.Wright Removed bOverride_AntiAliasingMethod and outdated ini references to PP AntiAliasingMethod Change 3075783 on 2016/08/03 by Ryan.Brucks #code.review Marcus.Wassmer Added two material nodes that return Atmospheric Light Vector and Light Direction using: View.AtmosphericFogSunColor View.AtmosphericFogSunDirection Nodes are called: AtmosphericLightVector AtmosphericLightColor Also changed SceneRendering.cpp so that values will be grabbed from directional lights without needing an Atmospheric Fog actor in the scene. Change 3075969 on 2016/08/03 by Uriel.Doyon Material GUIDs are not updated anymore when parents or textures change. Lighting now uses a hash built from the list of parents, textures and shader functions. #review-3072980 @marcus.wassmer @daniel.wright Change 3076116 on 2016/08/03 by Ryan.Brucks #code.review marcus.wassmer Fixed typo in the Caption of new Nodes "Atmospheric Light Vector" and "Atmospheric Light Color" Change 3076456 on 2016/08/03 by Rolando.Caloca DR - Fix geometry shader gl_Layer for SPIR-V Change 3076730 on 2016/08/03 by Uriel.Doyon Added user warning logic for the texture streaming build. Ran in MapCheck, BeginPlay and PreSave. #review-3072984 @marcus.wassmer Change 3077616 on 2016/08/04 by Daniel.Wright Planar reflection show flags can now be edited Change 3077621 on 2016/08/04 by Daniel.Wright Changed default Planar Reflection DistanceFromPlaneFadeoutEnd from 600 to 100, which reduces artifacts and is a more intuitive initial setting Change 3077792 on 2016/08/04 by Daniel.Wright Fixed an unnecessary sky capture caused by the sky light component owned by the default ASkyLight Change 3077799 on 2016/08/04 by Daniel.Wright Skip RF_ArchetypeObject for reflection captures Change 3077876 on 2016/08/04 by Marc.Olano Noise material perf improvements Change random number generator for Gradient-ALU (1.7x perf boost), improve speed of Voronoi noise quality level 3. Removes integer BBS random number generators. Fewer instructions, but too slow to use (see 1.7x perf boost above) Change 3077884 on 2016/08/04 by Daniel.Wright Lighting channels can now be edited on components with static mobility, since dynamic lights can still affect them Change 3078994 on 2016/08/05 by Simon.Tovey Fix for UE-34241 Scene proxy ptr was being cached during a downcast. Inside a call to CreateDynamicData, CheckMaterialUsage_Concurrent() was causing the scene proxy to be recreated an so the cached ptr was stale. I've fixed the immediate issue but recreating the scene proxy here doesn't seem great. Maybe CheckMaterailUsage() should be rethought a bit. Change 3079162 on 2016/08/05 by Ben.Woodhouse Fix for jittering in Paper2D. Was caused by override being ignored due to a change in intiialization order for AA settings. #jira UE-34091 Change 3079613 on 2016/08/05 by Daniel.Wright New blueprint function ClearRenderTarget2D, which is the only way to set a render target alpha directly New blueprint function CreateRenderTarget2D Change 3079708 on 2016/08/05 by Uriel.Doyon Fixed crash when building texture streaming on some levels. Change 3079795 on 2016/08/05 by Uriel.Doyon Fixed issue with instanced static meshes when building texture streaming. Fixed typo with func "GetNumTextureStreamingPrimitives" Change 3079806 on 2016/08/05 by Uriel.Doyon Enabled PerTexture MipBias. The per texture mip bias now resets to 0 when the texture gets required at low resolution. New scalability setting named "r.Streaming.LimitPoolSizeToVRAM" enabling the PoolSize to be limited the available VRAM (according to GPoolSizeVRAMPercentage) #review-3074662 @marcus.wassmer Change 3082698 on 2016/08/09 by Daniel.Wright Copy - CreateRenderTarget2D uses a world context object as owner, allows use in a construction script Change 3082699 on 2016/08/09 by Daniel.Wright Changed display name for 'Two Sided' shading model to 'Two Sided Foliage' to make it clear what it's intended to be used for Change 3083909 on 2016/08/10 by Olaf.Piesche #jira UE-34106 #jira UE-32784 #jira UE-31198 Reset vertex factories on mesh emitters if mesh has been reimported (if mesh package is dirty) Change 3084645 on 2016/08/10 by Olaf.Piesche #jira UE-30398 Fix offset added to particle collision locations. Change 3084709 on 2016/08/10 by Daniel.Wright Copy - Scene capture alpha is now inverted to match DrawMaterialToRenderTarget, and to allow compositing with existing render target contents Added CompositeMode to SceneCapture2D, which can be used to addively accumulate or composite instead of the default overwrite behavior Added bCaptureOnMovement to SceneCapture, which can be disabled so the only source of scene capturing is a manual capture by calling CaptureScene() Change 3084783 on 2016/08/10 by Rolando.Caloca DR - Use the first targeted rhi shader platform as the initial RHI to load on Windows #jira UE-34510 Change 3084958 on 2016/08/10 by Daniel.Wright Copy - Reverted cl 2938543 "Lightmass now respects owner bHidden, and bCastHiddenShadow" because it did not have backwards compatibility so breaks content using hidden light cards Change 3086023 on 2016/08/11 by Marcus.Wassmer Merging //UE4/Dev-Main@3085468 to Dev-Rendering (//UE4/Dev-Rendering) #test none Change 3086778 on 2016/08/11 by Ben.Woodhouse Workaround for fortnite character rendering issue. Enable checkerboard rendering by default until we can fix properly #jira UE-34561 Change 3087404 on 2016/08/12 by Rolando.Caloca DR - Upgrade glslang to 1.0.21.1 - Added some more debug output Change 3087524 on 2016/08/12 by Rolando.Caloca DR - vk - Fixed StencilRef, fixed size of RHIReadSurfaceFloatData (but still returns dummy data) Change 3087663 on 2016/08/12 by Rolando.Caloca DR - vk - Fix for SRGB; support for mip texture views Change 3087735 on 2016/08/12 by Daniel.Wright TextureRenderTarget2D's can now be up to 8192^2. Anything over 2048 pops up an 'are you sure' dialog. Change 3087750 on 2016/08/12 by Rolando.Caloca DR - vk - Minor renaming in prep for merge Change 3087813 on 2016/08/12 by Rolando.Caloca DR - vk - More minor cleanup Change 3087819 on 2016/08/12 by Chris.Bunner Check material function input types directly, no need to traverse connected graph. #jira UE-32134 Change 3087901 on 2016/08/12 by Rolando.Caloca DR - vk - Fix RT view to use 1 mip Fix depth buffer component swizzle Change 3088193 on 2016/08/12 by Daniel.Wright DFAO and RTDF shadows are enabled in High and Epic scalability settings by default Change 3088988 on 2016/08/15 by Rolando.Caloca DR - Add Accessors Change 3089104 on 2016/08/15 by Olaf.Piesche #jira UE-34241 Sceneproxy can be nullptr in FDynamicMeshEmitterData::Init if the proxy is being recreated Change 3089208 on 2016/08/15 by Daniel.Wright Downsampled separate translucency uses a separate view uniform buffer with correct buffer sizes * Fixes WorldPosition in downsampled translucency * View uniform buffer parameters are now cached on the view, to allow recreating the uniform buffer without having to rebuild the entire struct. Currently used by global distance field, downsampled separate translucency. * Fixed the downsampled translucency depth buffer being full res used together with a smaller color target, now they are both the downsampled res Change 3089209 on 2016/08/15 by Daniel.Wright Fixed atmospheric fog on translucency Change 3089457 on 2016/08/15 by Daniel.Wright Fixed lighting build failure from UMaterialInstanceDynamic assigned to a mesh that's being exported to Lightmass. The Swarm cache entry is created using the parent's guid, causing multiple MID's with the same parent to acquire a file handle multiple times which fails after the first. Change 3089549 on 2016/08/15 by Daniel.Wright UMaterialInterface initializes LightingGuid to something valid - causes UMaterialInstanceDynamic to have a valid LightingGuid so they can be used in lighting builds Change 3089703 on 2016/08/15 by Daniel.Wright Custom expression fixup for View.RenderTargetSize Change 3090546 on 2016/08/16 by Daniel.Wright Hopeful fix for recycled snapshot view crash Change 3091202 on 2016/08/16 by Daniel.Wright Manually clear FViewInfo::CachedViewUniformShaderParameters on creating a snapshot, since memcpy is used to create the snapshot view [CL 3091931 by Gil Gribb in Main branch]
2016-08-17 11:38:13 -04:00
bInitialized = true;
}
TGlobalResource<FDistanceFieldObjectBufferResource> GGlobalDistanceFieldCulledObjectBuffers;
uint32 CullObjectsGroupSize = 64;
class FCullObjectsForVolumeCS : public FGlobalShader
{
DECLARE_SHADER_TYPE(FCullObjectsForVolumeCS,Global)
public:
static bool ShouldCache(EShaderPlatform Platform)
{
return IsFeatureLevelSupported(Platform, ERHIFeatureLevel::SM5) && DoesPlatformSupportDistanceFieldAO(Platform);
}
static void ModifyCompilationEnvironment(EShaderPlatform Platform, FShaderCompilerEnvironment& OutEnvironment)
{
FGlobalShader::ModifyCompilationEnvironment(Platform,OutEnvironment);
OutEnvironment.SetDefine(TEXT("CULLOBJECTS_THREADGROUP_SIZE"), CullObjectsGroupSize);
}
FCullObjectsForVolumeCS(const ShaderMetaType::CompiledShaderInitializerType& Initializer)
: FGlobalShader(Initializer)
{
ObjectBufferParameters.Bind(Initializer.ParameterMap);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
CulledObjectParameters.Bind(Initializer.ParameterMap);
AOGlobalMaxSphereQueryRadius.Bind(Initializer.ParameterMap, TEXT("AOGlobalMaxSphereQueryRadius"));
VolumeBounds.Bind(Initializer.ParameterMap, TEXT("VolumeBounds"));
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
AcceptOftenMovingObjectsOnly.Bind(Initializer.ParameterMap, TEXT("AcceptOftenMovingObjectsOnly"));
}
FCullObjectsForVolumeCS()
{
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
void SetParameters(FRHICommandList& RHICmdList, const FScene* Scene, const FSceneView& View, float MaxOcclusionDistance, const FVector4& VolumeBoundsValue, FGlobalDFCacheType CacheType)
{
FComputeShaderRHIParamRef ShaderRHI = GetComputeShader();
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3357411) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3244756 on 2017/01/03 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering) Change 3248667 on 2017/01/05 by Olaf.Piesche Resaving default asset because of engine verison issue; maybe unnecessary, but resaving niagara engine content to be sure #jira UE-40160 Change 3249324 on 2017/01/06 by Marcus.Wassmer Resave with an actual version to stop cook warning Change 3249611 on 2017/01/06 by Marcus.Wassmer Just remove warning-causing niagara data for now. Change 3308052 on 2017/02/16 by Rolando.Caloca DR - Check for Vulkan SDK, and only use it if it's newer or the same as the headers we distribute Change 3308109 on 2017/02/16 by Rolando.Caloca DR - Upgrade glslang to 1.0.39.1 Change 3308111 on 2017/02/16 by Rolando.Caloca DR - Update Vulkan distribution to 1.0.39.1 Change 3308153 on 2017/02/16 by Rolando.Caloca DR - Updated glslang libs Change 3308842 on 2017/02/17 by Rolando.Caloca DR - Fixed copy/paste Change 3310007 on 2017/02/17 by Chris.Bunner Back out CL 3221219 - causing MIC generation issues and superseded by CL 3273971. #jira UE-37792 Change 3310154 on 2017/02/17 by Chris.Bunner Assert when attempting to add a custom material attribute already in the base attributes list. Change 3310155 on 2017/02/17 by Chris.Bunner PR #3231: Validate material index before accessing (Contributed by projectgheist) #jira UE-41774, UE-41788 Change 3310162 on 2017/02/17 by Chris.Bunner PR #3252: Added MobileMaterialInterface to UsedMaterials (Contributed by projectgheist) #jira UE-41823, UE-41950 Change 3310176 on 2017/02/17 by Chris.Bunner Merging CL 3233886: AMD HDR support (requires r.AMDSupportsHDRDisplayOutput=1 in ini). Update to AGS 5.0.5. Partial code tidy up. Change 3310187 on 2017/02/17 by Chris.Bunner Preserve constant expressions rather than always casting after translating a material attribute. Losing the notion of constant means we can't correctly detect used properties and falsely enable e.g. PDO. Happened because of the incorrect component masks in BreakMaterialNodes which then had to be downcast to the correct type which is done as an inline fragment rather than swizzle expression. #jira UE-41594 Change 3310215 on 2017/02/17 by Chris.Bunner Prevent SpeedTree node compiling for skeletal meshes (not supported as uses more UV sets than available). More descriptive error for missing Cubemap UV input on TextureSample material node . #jira UE-33098 Change 3310838 on 2017/02/18 by Joe.Graf Moved some private functions to public for a licensee #CodeReview: matt.kuhlenschmidt #rb: n/a Change 3311876 on 2017/02/20 by Rolando.Caloca DR - Expose skin cache cvar r.SkinCache.AccumulationBufferSizeInKB #jira UE-42014 Change 3314139 on 2017/02/21 by Rolando.Caloca DR - Minor cleanup pass - Remove FVulkanPendingState - Renamed some classes for clarity - Hoist pending UAVs for flush out to pending compute state Change 3314642 on 2017/02/21 by Rolando.Caloca DR - Some more renaming Change 3315431 on 2017/02/21 by Ben.Salem Properly set default values for test time out and tick. We now will default to ticking once per second, and tracking the macro stats of GPU/Render/Game thread time. #tests Ran showdown demo several times Change 3316710 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Fix refract intrinsic Change 3316718 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Built libs to pick up change from 3316710 - refract fix Change 3316820 on 2017/02/22 by Benjamin.Hyder updating Tm-TrigNodes map Change 3317192 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317528 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317639 on 2017/02/22 by Benjamin.Hyder Updating Decal on Complex Mesh example in QA-Decals Change 3317764 on 2017/02/22 by Benjamin.Hyder Final updates to QA-Decals Change 3318319 on 2017/02/22 by Rolando.Caloca DR - minor reorg/rename Change 3318379 on 2017/02/22 by Rolando.Caloca DR - more cleanup Change 3321181 on 2017/02/24 by Rolando.Caloca DR - Fix GL bug Change 3321247 on 2017/02/24 by Rolando.Caloca DR - Fix misc bugs Change 3321898 on 2017/02/24 by Chris.Bunner Only issue clear TLV dispatch if required. #jira UERNDR-193 Change 3321904 on 2017/02/24 by Chris.Bunner Added comment for potential future optimization. Change 3322013 on 2017/02/24 by Uriel.Doyon Fixed separate translucency being affected by Gaussian DOF #jira UE-40489 Change 3322517 on 2017/02/24 by Uriel.Doyon Fixed issue with InvestigateTexture command removing budget limit. Fixed StreamingBounds show flag not working. It nows shows the streaming bound for the currently selected textures. #jira UE-40485 Change 3323470 on 2017/02/27 by Chad.Garyet Removing DDC job from dev-rendering Change 3323479 on 2017/02/27 by Chad.Garyet Removing RDU agent type Change 3323519 on 2017/02/27 by Chad.Garyet removing NCL/LHR/SEA agent types to clean up space Change 3323639 on 2017/02/27 by Benjamin.Hyder More updates to QA-Decals Change 3324207 on 2017/02/27 by Uriel.Doyon Fixed typo ScaleTexturesByGlobalMyBias -> ScaleTexturesByGlobalMipBias Removed bad merge in FStreamingTextureLevelContext::GetBuildDataIndexRef Change 3324396 on 2017/02/27 by Uriel.Doyon Fixed an issue with the Streaming Bounds show flag interferring with the static level data initialization #jira UE-40485 Change 3325227 on 2017/02/28 by Chris.Bunner Fix-up AMD AGS libs. Change 3325566 on 2017/02/28 by Uriel.Doyon Fixed possible out-of-bound access in GetUsedTexture() when passing ERHIFeatureLevel::Num Change 3326009 on 2017/02/28 by Uriel.Doyon Better fix for 3325566, as the previous fix would ignore the material instance overrides. Change 3327058 on 2017/03/01 by Benjamin.Hyder Preparing TM_Shadermodels map for automation Change 3328222 on 2017/03/01 by Chris.Bunner Prevent decals from drawing in separate translucency pass. Whilst user control and material relevance were already removed, if the flag was checked before being disabled (by swapping to decal domain) this was still being read in the render loop, now explicitly ignores decals. #jira UE-42449, UE-42446 Change 3329848 on 2017/03/02 by Uriel.Doyon Added some extra logs to help track UE-42168 Change 3329977 on 2017/03/02 by Rolando.Caloca DR - Fix bad clear value Change 3330008 on 2017/03/02 by Benjamin.Hyder More preparations for QA-Decals automation Change 3330754 on 2017/03/02 by Daniel.Wright Prominent comment explaining reflection env async compute usage and why it's not overlapped with anything Change 3331451 on 2017/03/03 by Marc.Olano Manually unroll simplex noise loop to avoid PSO bug on AMD/Metal Change 3331839 on 2017/03/03 by Rolando.Caloca DR - hlslcc - add missing file to project Change 3332247 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel PR #3305 #jira UE-42393 Change 3332259 on 2017/03/03 by Rolando.Caloca DR - Fix bad index into pixel formats PR #3237 #jira UE-41855 Change 3332305 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers PR #3271 #jira UE-32618 Change 3332313 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel (properly) PR #3305 #jira UE-42393 Change 3332317 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers (properly) PR #3271 #jira UE-32618 Change 3332368 on 2017/03/03 by Rolando.Caloca DR - Minor fixes so -sm4 and -sm5 can be used on windows with OpenGL/Vulkan Change 3333690 on 2017/03/06 by Daniel.Wright [Copy] Changing movable skylight properties no longer affects static draw lists Change 3333693 on 2017/03/06 by Daniel.Wright [Copy] Added 'r.AOListMeshDistanceFields' which dumps out mesh distance fields sorted by memory size, useful for directing content optimizations Change 3333705 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance fields are now 8 bit fixed point by default, but can be changed back to 16 bit floating piont with a project setting. * 8 bit uses half memory but introduces error for thin surfaces or large meshes. Change 3333721 on 2017/03/06 by David.Hill DecalProxy: Copy float FadeScreenSize to FDeferredDecalProxy for use in the render thread. This avoids pointer chasing to the UDecalComponent (game thread component). Change 3333772 on 2017/03/06 by Daniel.Wright [Copy] Scene motion blur data is only updated for the main renderer frames. Fixes scene captures and planar reflections breaking object motion blur. Change 3333790 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance field generation uses Embree, for a 2.5x speedup * Can switch back to old kDOP generation with 'r.DistanceFieldBuild.UseEmbree 0' for debugging Change 3333822 on 2017/03/06 by Daniel.Wright [Copy] Moved mesh distance field code into MeshDistanceFieldUtilities.cpp * Moved FMeshUtilities to its own header so the 8k line MeshUtilites.cpp file can be further split up Change 3333827 on 2017/03/06 by Daniel.Wright [Copy] Range compress 8bit distance fields - gets one extra bit of precision on average Change 3333828 on 2017/03/06 by Daniel.Wright [Copy] Raised High ShadowQuality to 2048 as 1024 for CSM is way too low Change 3333831 on 2017/03/06 by Daniel.Wright Non-editor compile fix Change 3333836 on 2017/03/06 by Daniel.Wright [Copy] Workaround for gobal distance field volume textures being bloated by 4x on PS4 due to the recommended tiling modes. They now use a 2d tiling mode which avoids the bloat, saving 96Mb. Change 3333843 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionExponent to skylight component * Useful for brightening up indoors without losing contact shadows as MinOcclusion does Change 3333845 on 2017/03/06 by Daniel.Wright [Copy] Capsule shadow BP functions Change 3333850 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionCombineMode to skylight component Change 3333854 on 2017/03/06 by Daniel.Wright [Copy] Gnm properly registers clears as GPU work so those events show up in profilegpu Change 3333857 on 2017/03/06 by Daniel.Wright [Copy] Clear light attenuation for local lights with a quad covering their screen extents * Clearing the entire light attenuation buffer costs .1ms on PS4. This optimization lowers the minimum cost of a shadow casting light from .15ms -> .03ms. * Shadowed lights in Fortnite with 25 lights 3.7ms -> 1.42ms on PS4 Change 3333860 on 2017/03/06 by Daniel.Wright [Copy] Flush deferred deletes when reallocating distance field atlas to reduce peak memory Change 3333861 on 2017/03/06 by Daniel.Wright [Copy] Disable all distance field features on Intel cards as HD 4000 hangs in the RHICreateTexture3D call to allocate the large atlas Change 3333869 on 2017/03/06 by Daniel.Wright [Copy] Volumetric Fog using a volume texture mapped to the camera frustum * Volumetric fog can be enabled on an Exponential Height Fog component with additional controls * Lights have a VolumetricScatteringIntensity * New cvars r.VolumetricFog, r.VolumetricFog.GridPixelSize, r.VolumetricFog.GridSizeZ, r.VolumetricFog.DepthDistributionScale * Lighting features supported: * Directional light with CSM and a light function * Point / spot lights without shadows / light functions / IES profiles * Skylight with occlusion from distance fields * Analytical height fog covers the view range past where the volumetric fog ends * Temporal reprojection is used on the volumetric fog scattering and extinction to achieve stability * Translucency integrates properly into volumetric fog * Height fog StartDistance is not supported by volumetric fog and should be set to 0. Change 3333894 on 2017/03/06 by Daniel.Wright [Copy] Initialize GDummyVolumetricFogGlobalDataUniformBuffer outside of parallel rendering Change 3333902 on 2017/03/06 by Daniel.Wright [Copy] Better handling of volumetric fog enabled with distance of 0 Change 3333903 on 2017/03/06 by Daniel.Wright [Copy] Fixed volumetric fog trying to render light functions for a point light Change 3333908 on 2017/03/06 by Daniel.Wright [Copy] Volumetric materials * Added new material domain Volume, which can output Scattering, Absorption and Emissive. All properties are in world space densities. * Particle systems using the Volume domain are voxelized based on their ParticlePosition and ParticleRadius * Volumetric fog integration is now energy conservative - scattering is integrated against transmission over the depth of each slice. * Added bOverrideLightColorsWithFogInscatteringColors to exponential height fog, which can be enabled to make Volumetric Fog match Height fog more closely Change 3334134 on 2017/03/06 by Daniel.Wright [Copy from Michael Trepka] Added Embree 2.14.0 and changed MeshUtilities to use it as this solves issues with Embree leaking TLS keys. UnrealLightmass is still using older Embree 2.7.0 until we can find time to properly test it with the new version. Also, invalidated distance field DDC to force it to rebuild with updated Embree. Change 3334420 on 2017/03/06 by Daniel.Wright Fixed RTDF shadows Change 3335467 on 2017/03/07 by Benjamin.Hyder Initial submission of QA-Decals map to EngineTest Change 3335556 on 2017/03/07 by Daniel.Wright Changed mesh distance field default format back to R16f Change 3338020 on 2017/03/08 by Daniel.Wright Disable volumetric fog in vertex shaders for feature levels which don't support it Change 3339394 on 2017/03/09 by Chris.Bunner Correctly handle material texture translation error edge case. #jira UE-42579, UE-42670 Change 3339992 on 2017/03/09 by Daniel.Wright Only compile volumetric fog shaders on supporting platforms Change 3341858 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) #RB Rolando.Caloca, Marcus.Wassmer, Daniel.Wright, Nick.Penwarden, Mark.Satterthwaite Change 3342004 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) Fix unity build #RB Marcus.Wassmer Change 3343307 on 2017/03/13 by Marcus.Wassmer Update showflags when we are guaranteed it will happen in all possible ways to spawn the scenecapture. (drag into editor, PIE, -game, etc) Change 3343732 on 2017/03/13 by Rolando.Caloca DR - Vulkan compute pipeline & refactor Change 3344846 on 2017/03/14 by Rolando.Caloca DR - Android compile fixes Change 3344883 on 2017/03/14 by Rolando.Caloca DR - Add missing stencil load/store to PSO initializer Change 3344985 on 2017/03/14 by Rolando.Caloca DR - Made load/store actions uint8 Change 3345141 on 2017/03/14 by Rolando.Caloca DR - vk - Rework render pass hash Change 3345304 on 2017/03/14 by Benjamin.Hyder Updating TM-Distancefields map to include TemplateFloor mesh Change 3345387 on 2017/03/14 by Rolando.Caloca DR - Add _RenderThread calls for Create*Shader so RHIs can choose not to stall when creating Change 3345388 on 2017/03/14 by Rolando.Caloca DR - Do not stall when creating shaders on Vulkan Change 3345722 on 2017/03/14 by Chris.Bunner PR #3357: MinimalAPI add to many material expressions (Contributed by DeanoC) #jira UE-42752 Change 3345723 on 2017/03/14 by Chris.Bunner Reduce log verbosity causing spamming during landscape editing. #jira UE-42714 Change 3345725 on 2017/03/14 by Chris.Bunner [Duplicate 3341860] Fixed material translation error with multiple connections from custom interpolator nodes. Change 3345726 on 2017/03/14 by Chris.Bunner Typo fixes. Change 3345732 on 2017/03/14 by Rolando.Caloca DR - Decouple vertex declaration off BSS Change 3345746 on 2017/03/14 by Chris.Bunner Added sign() intrinsic material graph node and delisted material function workaround. Change 3346042 on 2017/03/14 by Chris.Bunner Implement missing size query interface for FRenderTargetResources. #jira UE-41672 Change 3346387 on 2017/03/14 by Daniel.Wright [Copy] Added VolumetricScatteringIntensity to particle lights Change 3346389 on 2017/03/14 by Daniel.Wright [Copy] Clamp Volumetric material attributes to fp16 range to avoid INFs Disable volumetric fog when the fog show flag is disabled Change 3346392 on 2017/03/14 by Daniel.Wright [Copy] Fixed skylight being much too bright on volumetric fog Change 3346406 on 2017/03/14 by Daniel.Wright [Copy] CSM resolution is now controlled by r.Shadow.MaxCSMResolution. * Changed HighPC to use 1024 MaxShadowResolution (max for all non-CSM shadows), saves 60Mb in Fortnite Change 3346412 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for translucency lighting 3d textures, saves 13Mb Change 3346414 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for volumetric fog 3d textures, saves 13Mb Change 3346415 on 2017/03/14 by Daniel.Wright [Copy] Missing file from cl 3338451 Change 3346421 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaNs in volumetric fog due to rendering when height fog is disabled * Volumetric fog converts NaNs to black now so they don't spread Change 3346422 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaN in volumetric fog with low density values Change 3346423 on 2017/03/14 by Daniel.Wright [Copy] Changed default VolumetricFogScatteringDistribution to .2 Change 3346430 on 2017/03/14 by Daniel.Wright [Copy] New translucent material option to compute fog per pixel instead of the default per vertex Change 3346432 on 2017/03/14 by Daniel.Wright [Copy] Moved Volumetric Fog parameters to view uniform buffer for translucency pass Fixed lifetimes of temporary Volumetric Fog render targets Change 3346526 on 2017/03/14 by Daniel.Wright [Copy] Volumetric Fog supports point and spot light shadows * These lights are injected separately so that per-light resources can be bound (shadow depth map, static shadow depth map) * Forward lighting of local lights can be forced with 'r.VolumetricFog.InjectShadowedLightsSeparately 0' * Shadowed lights come at a cost: 2.9ms for volumetric fog on 970 -> 4.2ms with shadowing Change 3347053 on 2017/03/15 by Rolando.Caloca DR - android compile fix Change 3347384 on 2017/03/15 by Rolando.Caloca DR - Fix merge issue Change 3347643 on 2017/03/15 by Marcus.Wassmer Fix some bugs with the 'disable stationary skylight ffor the project' feature. Fixes lighting in Persona on Paragon. Change 3347979 on 2017/03/15 by Rolando.Caloca DR - Allow to automatically apply cached rendertargets to PSO initializer Change 3348024 on 2017/03/15 by Rolando.Caloca DR - Remove NullPS on Vulkan to avoid deadlock Change 3348303 on 2017/03/15 by Rolando.Caloca DR - Fix for debugging SCW with material SRT Change 3348357 on 2017/03/15 by Marcus.Wassmer Fix stencildither and a stencilref bug that was probably breaking decals sometimes. Change 3348549 on 2017/03/15 by Marcus.Wassmer Hopefully fix static analysis for potential nullptr access. Change 3348614 on 2017/03/15 by Marcus.Wassmer Duplicate some switch changes to fix crash on launch. Change 3349369 on 2017/03/16 by Gil.Gribb Fixed botched merge Change 3349947 on 2017/03/16 by Rolando.Caloca DR - Fix for mismatched primitive type Change 3349956 on 2017/03/16 by Benjamin.Hyder initial updates to TM-DistanceFields map Change 3350151 on 2017/03/16 by Rolando.Caloca DR - Fix UT compile issue Change 3350155 on 2017/03/16 by Rolando.Caloca DR - Catch mismatched primitive type on PSOs on D3D11 Change 3350192 on 2017/03/16 by Daniel.Wright Fix for point light shadow depths rendering with wrong cull mode due to PSO refactor Change 3350736 on 2017/03/16 by Daniel.Wright Fixed formatting from merge Change 3350881 on 2017/03/16 by Rolando.Caloca DR - Fix texture arrays as UAVs on Metal Change 3350927 on 2017/03/16 by Rolando.Caloca DR - Fix warning Change 3350935 on 2017/03/16 by Daniel.Wright Fix for materials with non-Surface domains being skipped in mesh passes Change 3351583 on 2017/03/17 by Marcus.Wassmer Fix clang platforms Change 3351917 on 2017/03/17 by Marcus.Wassmer Fix linux compile Change 3351973 on 2017/03/17 by Marcus.Wassmer Fix mismatched rendertargetformat Change 3352038 on 2017/03/17 by Daniel.Wright Enabled GetAndOrCreateGraphicsPipelineState ensures in Development for testing Change 3352110 on 2017/03/17 by Marcus.Wassmer Fix missing RT PSO apply Change 3352695 on 2017/03/17 by Arne.Schober DR - Remove PSO Rendertarget check in DX12 Resolve with Shader. #RB Rolando.Caloca Change 3352960 on 2017/03/17 by Arne.Schober DR - Fix some things that slipped trough the PSO merge #RB none Change 3353150 on 2017/03/18 by Rolando.Caloca DR - compile fix Change 3353205 on 2017/03/18 by Arne.Schober DR - Fix Incremental Compile and PS4 runtime error where CMASK is not allowed for ThickTile Mode #RB none Change 3353207 on 2017/03/18 by Arne.Schober DR - Fix Confusion #RB none Change 3355183 on 2017/03/20 by Nick.Bullard Fixed up Content orginzation for Decals automation tests in EngineTest Change 3355627 on 2017/03/20 by Arne.Schober DR - [UE-43094] - removed ensure in comporiton graph as control of the clear color cannot be gurantueed. Change 3356342 on 2017/03/21 by Marcus.Wassmer Fix clang errors Change 3356591 on 2017/03/21 by Arne.Schober DR - Fix ensure message #RB none Change 3356873 on 2017/03/21 by Arne.Schober DR - Fix comparission of undefined values in RendertargetApply Check Change 3357261 on 2017/03/21 by Marcus.Wassmer Fix LinuxEditor compile Change 3357294 on 2017/03/21 by Marcus.Wassmer Add missing SSE functions Change 3357351 on 2017/03/21 by Frank.Fella Fix win32 and linux compiler errors Change 3357370 on 2017/03/21 by Arne.Schober DR - disable ensure in test builds #RB Marcus.Wassmer [CL 3357449 by Marcus Wassmer in Main branch]
2017-03-21 17:46:52 -04:00
FGlobalShader::SetParameters<FViewUniformShaderParameters>(RHICmdList, ShaderRHI, View.ViewUniformBuffer);
ObjectBufferParameters.Set(RHICmdList, ShaderRHI, *(Scene->DistanceFieldSceneData.ObjectBuffers), Scene->DistanceFieldSceneData.NumObjectsInBuffer);
FUnorderedAccessViewRHIParamRef OutUAVs[4];
OutUAVs[0] = GGlobalDistanceFieldCulledObjectBuffers.Buffers.ObjectIndirectArguments.UAV;
OutUAVs[1] = GGlobalDistanceFieldCulledObjectBuffers.Buffers.Bounds.UAV;
OutUAVs[2] = GGlobalDistanceFieldCulledObjectBuffers.Buffers.Data.UAV;
OutUAVs[3] = GGlobalDistanceFieldCulledObjectBuffers.Buffers.BoxBounds.UAV;
RHICmdList.TransitionResources(EResourceTransitionAccess::ERWBarrier, EResourceTransitionPipeline::EComputeToCompute, OutUAVs, ARRAY_COUNT(OutUAVs));
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
CulledObjectParameters.Set(RHICmdList, ShaderRHI, GGlobalDistanceFieldCulledObjectBuffers.Buffers);
extern float GAOConeHalfAngle;
const float GlobalMaxSphereQueryRadius = MaxOcclusionDistance / (1 + FMath::Tan(GAOConeHalfAngle));
SetShaderValue(RHICmdList, ShaderRHI, AOGlobalMaxSphereQueryRadius, GlobalMaxSphereQueryRadius);
SetShaderValue(RHICmdList, ShaderRHI, VolumeBounds, VolumeBoundsValue);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
uint32 AcceptOftenMovingObjectsOnlyValue = 0;
if (!GAOGlobalDistanceFieldCacheMostlyStaticSeparately)
{
AcceptOftenMovingObjectsOnlyValue = 2;
}
else if (CacheType == GDF_Full)
{
// First cache is for mostly static, second contains both, inheriting static objects distance fields with a lookup
// So only composite often moving objects into the full global distance field
AcceptOftenMovingObjectsOnlyValue = 1;
}
SetShaderValue(RHICmdList, ShaderRHI, AcceptOftenMovingObjectsOnly, AcceptOftenMovingObjectsOnlyValue);
}
void UnsetParameters(FRHICommandList& RHICmdList, const FScene* Scene)
{
ObjectBufferParameters.UnsetParameters(RHICmdList, GetComputeShader(), *(Scene->DistanceFieldSceneData.ObjectBuffers));
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
CulledObjectParameters.UnsetParameters(RHICmdList, GetComputeShader());
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
TArray<FUnorderedAccessViewRHIParamRef> UAVs;
CulledObjectParameters.GetUAVs(GGlobalDistanceFieldCulledObjectBuffers.Buffers, UAVs);
RHICmdList.TransitionResources(EResourceTransitionAccess::EReadable, EResourceTransitionPipeline::EComputeToCompute, UAVs.GetData(), UAVs.Num());
}
virtual bool Serialize(FArchive& Ar)
{
bool bShaderHasOutdatedParameters = FGlobalShader::Serialize(Ar);
Ar << ObjectBufferParameters;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
Ar << CulledObjectParameters;
Ar << AOGlobalMaxSphereQueryRadius;
Ar << VolumeBounds;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
Ar << AcceptOftenMovingObjectsOnly;
return bShaderHasOutdatedParameters;
}
private:
FDistanceFieldObjectBufferParameters ObjectBufferParameters;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
FDistanceFieldCulledObjectBufferParameters CulledObjectParameters;
FShaderParameter AOGlobalMaxSphereQueryRadius;
FShaderParameter VolumeBounds;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
FShaderParameter AcceptOftenMovingObjectsOnly;
};
IMPLEMENT_SHADER_TYPE(,FCullObjectsForVolumeCS,TEXT("GlobalDistanceField"),TEXT("CullObjectsForVolumeCS"),SF_Compute);
const int32 GMaxGridCulledObjects = 2047;
class FObjectGridBuffers : public FRenderResource
{
public:
int32 GridDimension;
FRWBuffer CulledObjectGrid;
FObjectGridBuffers()
{
GridDimension = 0;
}
virtual void InitDynamicRHI() override
{
if (GridDimension > 0)
{
CulledObjectGrid.Initialize(sizeof(uint32), GridDimension * GridDimension * GridDimension * (GMaxGridCulledObjects + 1), PF_R32_UINT);
}
}
virtual void ReleaseDynamicRHI() override
{
CulledObjectGrid.Release();
}
size_t GetSizeBytes() const
{
return CulledObjectGrid.NumBytes;
}
};
TGlobalResource<FObjectGridBuffers> GObjectGridBuffers;
const int32 GCullGridTileSize = 16;
class FCullObjectsToGridCS : public FGlobalShader
{
DECLARE_SHADER_TYPE(FCullObjectsToGridCS,Global)
public:
static bool ShouldCache(EShaderPlatform Platform)
{
return IsFeatureLevelSupported(Platform, ERHIFeatureLevel::SM5) && DoesPlatformSupportDistanceFieldAO(Platform);
}
static void ModifyCompilationEnvironment(EShaderPlatform Platform, FShaderCompilerEnvironment& OutEnvironment)
{
FGlobalShader::ModifyCompilationEnvironment(Platform,OutEnvironment);
OutEnvironment.SetDefine(TEXT("CULL_GRID_TILE_SIZE"), GCullGridTileSize);
OutEnvironment.SetDefine(TEXT("MAX_GRID_CULLED_DF_OBJECTS"), GMaxGridCulledObjects);
}
FCullObjectsToGridCS(const ShaderMetaType::CompiledShaderInitializerType& Initializer)
: FGlobalShader(Initializer)
{
CulledObjectBufferParameters.Bind(Initializer.ParameterMap);
GlobalDistanceFieldParameters.Bind(Initializer.ParameterMap);
CulledObjectGrid.Bind(Initializer.ParameterMap, TEXT("CulledObjectGrid"));
CullGridDimension.Bind(Initializer.ParameterMap, TEXT("CullGridDimension"));
VolumeTexelSize.Bind(Initializer.ParameterMap, TEXT("VolumeTexelSize"));
UpdateRegionVolumeMin.Bind(Initializer.ParameterMap, TEXT("UpdateRegionVolumeMin"));
ClipmapIndex.Bind(Initializer.ParameterMap, TEXT("ClipmapIndex"));
AOGlobalMaxSphereQueryRadius.Bind(Initializer.ParameterMap, TEXT("AOGlobalMaxSphereQueryRadius"));
}
FCullObjectsToGridCS()
{
}
void SetParameters(
FRHICommandList& RHICmdList,
const FScene* Scene,
const FSceneView& View,
float MaxOcclusionDistance,
const FGlobalDistanceFieldInfo& GlobalDistanceFieldInfo,
int32 ClipmapIndexValue,
const FVolumeUpdateRegion& UpdateRegion)
{
FComputeShaderRHIParamRef ShaderRHI = GetComputeShader();
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3357411) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3244756 on 2017/01/03 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering) Change 3248667 on 2017/01/05 by Olaf.Piesche Resaving default asset because of engine verison issue; maybe unnecessary, but resaving niagara engine content to be sure #jira UE-40160 Change 3249324 on 2017/01/06 by Marcus.Wassmer Resave with an actual version to stop cook warning Change 3249611 on 2017/01/06 by Marcus.Wassmer Just remove warning-causing niagara data for now. Change 3308052 on 2017/02/16 by Rolando.Caloca DR - Check for Vulkan SDK, and only use it if it's newer or the same as the headers we distribute Change 3308109 on 2017/02/16 by Rolando.Caloca DR - Upgrade glslang to 1.0.39.1 Change 3308111 on 2017/02/16 by Rolando.Caloca DR - Update Vulkan distribution to 1.0.39.1 Change 3308153 on 2017/02/16 by Rolando.Caloca DR - Updated glslang libs Change 3308842 on 2017/02/17 by Rolando.Caloca DR - Fixed copy/paste Change 3310007 on 2017/02/17 by Chris.Bunner Back out CL 3221219 - causing MIC generation issues and superseded by CL 3273971. #jira UE-37792 Change 3310154 on 2017/02/17 by Chris.Bunner Assert when attempting to add a custom material attribute already in the base attributes list. Change 3310155 on 2017/02/17 by Chris.Bunner PR #3231: Validate material index before accessing (Contributed by projectgheist) #jira UE-41774, UE-41788 Change 3310162 on 2017/02/17 by Chris.Bunner PR #3252: Added MobileMaterialInterface to UsedMaterials (Contributed by projectgheist) #jira UE-41823, UE-41950 Change 3310176 on 2017/02/17 by Chris.Bunner Merging CL 3233886: AMD HDR support (requires r.AMDSupportsHDRDisplayOutput=1 in ini). Update to AGS 5.0.5. Partial code tidy up. Change 3310187 on 2017/02/17 by Chris.Bunner Preserve constant expressions rather than always casting after translating a material attribute. Losing the notion of constant means we can't correctly detect used properties and falsely enable e.g. PDO. Happened because of the incorrect component masks in BreakMaterialNodes which then had to be downcast to the correct type which is done as an inline fragment rather than swizzle expression. #jira UE-41594 Change 3310215 on 2017/02/17 by Chris.Bunner Prevent SpeedTree node compiling for skeletal meshes (not supported as uses more UV sets than available). More descriptive error for missing Cubemap UV input on TextureSample material node . #jira UE-33098 Change 3310838 on 2017/02/18 by Joe.Graf Moved some private functions to public for a licensee #CodeReview: matt.kuhlenschmidt #rb: n/a Change 3311876 on 2017/02/20 by Rolando.Caloca DR - Expose skin cache cvar r.SkinCache.AccumulationBufferSizeInKB #jira UE-42014 Change 3314139 on 2017/02/21 by Rolando.Caloca DR - Minor cleanup pass - Remove FVulkanPendingState - Renamed some classes for clarity - Hoist pending UAVs for flush out to pending compute state Change 3314642 on 2017/02/21 by Rolando.Caloca DR - Some more renaming Change 3315431 on 2017/02/21 by Ben.Salem Properly set default values for test time out and tick. We now will default to ticking once per second, and tracking the macro stats of GPU/Render/Game thread time. #tests Ran showdown demo several times Change 3316710 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Fix refract intrinsic Change 3316718 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Built libs to pick up change from 3316710 - refract fix Change 3316820 on 2017/02/22 by Benjamin.Hyder updating Tm-TrigNodes map Change 3317192 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317528 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317639 on 2017/02/22 by Benjamin.Hyder Updating Decal on Complex Mesh example in QA-Decals Change 3317764 on 2017/02/22 by Benjamin.Hyder Final updates to QA-Decals Change 3318319 on 2017/02/22 by Rolando.Caloca DR - minor reorg/rename Change 3318379 on 2017/02/22 by Rolando.Caloca DR - more cleanup Change 3321181 on 2017/02/24 by Rolando.Caloca DR - Fix GL bug Change 3321247 on 2017/02/24 by Rolando.Caloca DR - Fix misc bugs Change 3321898 on 2017/02/24 by Chris.Bunner Only issue clear TLV dispatch if required. #jira UERNDR-193 Change 3321904 on 2017/02/24 by Chris.Bunner Added comment for potential future optimization. Change 3322013 on 2017/02/24 by Uriel.Doyon Fixed separate translucency being affected by Gaussian DOF #jira UE-40489 Change 3322517 on 2017/02/24 by Uriel.Doyon Fixed issue with InvestigateTexture command removing budget limit. Fixed StreamingBounds show flag not working. It nows shows the streaming bound for the currently selected textures. #jira UE-40485 Change 3323470 on 2017/02/27 by Chad.Garyet Removing DDC job from dev-rendering Change 3323479 on 2017/02/27 by Chad.Garyet Removing RDU agent type Change 3323519 on 2017/02/27 by Chad.Garyet removing NCL/LHR/SEA agent types to clean up space Change 3323639 on 2017/02/27 by Benjamin.Hyder More updates to QA-Decals Change 3324207 on 2017/02/27 by Uriel.Doyon Fixed typo ScaleTexturesByGlobalMyBias -> ScaleTexturesByGlobalMipBias Removed bad merge in FStreamingTextureLevelContext::GetBuildDataIndexRef Change 3324396 on 2017/02/27 by Uriel.Doyon Fixed an issue with the Streaming Bounds show flag interferring with the static level data initialization #jira UE-40485 Change 3325227 on 2017/02/28 by Chris.Bunner Fix-up AMD AGS libs. Change 3325566 on 2017/02/28 by Uriel.Doyon Fixed possible out-of-bound access in GetUsedTexture() when passing ERHIFeatureLevel::Num Change 3326009 on 2017/02/28 by Uriel.Doyon Better fix for 3325566, as the previous fix would ignore the material instance overrides. Change 3327058 on 2017/03/01 by Benjamin.Hyder Preparing TM_Shadermodels map for automation Change 3328222 on 2017/03/01 by Chris.Bunner Prevent decals from drawing in separate translucency pass. Whilst user control and material relevance were already removed, if the flag was checked before being disabled (by swapping to decal domain) this was still being read in the render loop, now explicitly ignores decals. #jira UE-42449, UE-42446 Change 3329848 on 2017/03/02 by Uriel.Doyon Added some extra logs to help track UE-42168 Change 3329977 on 2017/03/02 by Rolando.Caloca DR - Fix bad clear value Change 3330008 on 2017/03/02 by Benjamin.Hyder More preparations for QA-Decals automation Change 3330754 on 2017/03/02 by Daniel.Wright Prominent comment explaining reflection env async compute usage and why it's not overlapped with anything Change 3331451 on 2017/03/03 by Marc.Olano Manually unroll simplex noise loop to avoid PSO bug on AMD/Metal Change 3331839 on 2017/03/03 by Rolando.Caloca DR - hlslcc - add missing file to project Change 3332247 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel PR #3305 #jira UE-42393 Change 3332259 on 2017/03/03 by Rolando.Caloca DR - Fix bad index into pixel formats PR #3237 #jira UE-41855 Change 3332305 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers PR #3271 #jira UE-32618 Change 3332313 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel (properly) PR #3305 #jira UE-42393 Change 3332317 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers (properly) PR #3271 #jira UE-32618 Change 3332368 on 2017/03/03 by Rolando.Caloca DR - Minor fixes so -sm4 and -sm5 can be used on windows with OpenGL/Vulkan Change 3333690 on 2017/03/06 by Daniel.Wright [Copy] Changing movable skylight properties no longer affects static draw lists Change 3333693 on 2017/03/06 by Daniel.Wright [Copy] Added 'r.AOListMeshDistanceFields' which dumps out mesh distance fields sorted by memory size, useful for directing content optimizations Change 3333705 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance fields are now 8 bit fixed point by default, but can be changed back to 16 bit floating piont with a project setting. * 8 bit uses half memory but introduces error for thin surfaces or large meshes. Change 3333721 on 2017/03/06 by David.Hill DecalProxy: Copy float FadeScreenSize to FDeferredDecalProxy for use in the render thread. This avoids pointer chasing to the UDecalComponent (game thread component). Change 3333772 on 2017/03/06 by Daniel.Wright [Copy] Scene motion blur data is only updated for the main renderer frames. Fixes scene captures and planar reflections breaking object motion blur. Change 3333790 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance field generation uses Embree, for a 2.5x speedup * Can switch back to old kDOP generation with 'r.DistanceFieldBuild.UseEmbree 0' for debugging Change 3333822 on 2017/03/06 by Daniel.Wright [Copy] Moved mesh distance field code into MeshDistanceFieldUtilities.cpp * Moved FMeshUtilities to its own header so the 8k line MeshUtilites.cpp file can be further split up Change 3333827 on 2017/03/06 by Daniel.Wright [Copy] Range compress 8bit distance fields - gets one extra bit of precision on average Change 3333828 on 2017/03/06 by Daniel.Wright [Copy] Raised High ShadowQuality to 2048 as 1024 for CSM is way too low Change 3333831 on 2017/03/06 by Daniel.Wright Non-editor compile fix Change 3333836 on 2017/03/06 by Daniel.Wright [Copy] Workaround for gobal distance field volume textures being bloated by 4x on PS4 due to the recommended tiling modes. They now use a 2d tiling mode which avoids the bloat, saving 96Mb. Change 3333843 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionExponent to skylight component * Useful for brightening up indoors without losing contact shadows as MinOcclusion does Change 3333845 on 2017/03/06 by Daniel.Wright [Copy] Capsule shadow BP functions Change 3333850 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionCombineMode to skylight component Change 3333854 on 2017/03/06 by Daniel.Wright [Copy] Gnm properly registers clears as GPU work so those events show up in profilegpu Change 3333857 on 2017/03/06 by Daniel.Wright [Copy] Clear light attenuation for local lights with a quad covering their screen extents * Clearing the entire light attenuation buffer costs .1ms on PS4. This optimization lowers the minimum cost of a shadow casting light from .15ms -> .03ms. * Shadowed lights in Fortnite with 25 lights 3.7ms -> 1.42ms on PS4 Change 3333860 on 2017/03/06 by Daniel.Wright [Copy] Flush deferred deletes when reallocating distance field atlas to reduce peak memory Change 3333861 on 2017/03/06 by Daniel.Wright [Copy] Disable all distance field features on Intel cards as HD 4000 hangs in the RHICreateTexture3D call to allocate the large atlas Change 3333869 on 2017/03/06 by Daniel.Wright [Copy] Volumetric Fog using a volume texture mapped to the camera frustum * Volumetric fog can be enabled on an Exponential Height Fog component with additional controls * Lights have a VolumetricScatteringIntensity * New cvars r.VolumetricFog, r.VolumetricFog.GridPixelSize, r.VolumetricFog.GridSizeZ, r.VolumetricFog.DepthDistributionScale * Lighting features supported: * Directional light with CSM and a light function * Point / spot lights without shadows / light functions / IES profiles * Skylight with occlusion from distance fields * Analytical height fog covers the view range past where the volumetric fog ends * Temporal reprojection is used on the volumetric fog scattering and extinction to achieve stability * Translucency integrates properly into volumetric fog * Height fog StartDistance is not supported by volumetric fog and should be set to 0. Change 3333894 on 2017/03/06 by Daniel.Wright [Copy] Initialize GDummyVolumetricFogGlobalDataUniformBuffer outside of parallel rendering Change 3333902 on 2017/03/06 by Daniel.Wright [Copy] Better handling of volumetric fog enabled with distance of 0 Change 3333903 on 2017/03/06 by Daniel.Wright [Copy] Fixed volumetric fog trying to render light functions for a point light Change 3333908 on 2017/03/06 by Daniel.Wright [Copy] Volumetric materials * Added new material domain Volume, which can output Scattering, Absorption and Emissive. All properties are in world space densities. * Particle systems using the Volume domain are voxelized based on their ParticlePosition and ParticleRadius * Volumetric fog integration is now energy conservative - scattering is integrated against transmission over the depth of each slice. * Added bOverrideLightColorsWithFogInscatteringColors to exponential height fog, which can be enabled to make Volumetric Fog match Height fog more closely Change 3334134 on 2017/03/06 by Daniel.Wright [Copy from Michael Trepka] Added Embree 2.14.0 and changed MeshUtilities to use it as this solves issues with Embree leaking TLS keys. UnrealLightmass is still using older Embree 2.7.0 until we can find time to properly test it with the new version. Also, invalidated distance field DDC to force it to rebuild with updated Embree. Change 3334420 on 2017/03/06 by Daniel.Wright Fixed RTDF shadows Change 3335467 on 2017/03/07 by Benjamin.Hyder Initial submission of QA-Decals map to EngineTest Change 3335556 on 2017/03/07 by Daniel.Wright Changed mesh distance field default format back to R16f Change 3338020 on 2017/03/08 by Daniel.Wright Disable volumetric fog in vertex shaders for feature levels which don't support it Change 3339394 on 2017/03/09 by Chris.Bunner Correctly handle material texture translation error edge case. #jira UE-42579, UE-42670 Change 3339992 on 2017/03/09 by Daniel.Wright Only compile volumetric fog shaders on supporting platforms Change 3341858 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) #RB Rolando.Caloca, Marcus.Wassmer, Daniel.Wright, Nick.Penwarden, Mark.Satterthwaite Change 3342004 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) Fix unity build #RB Marcus.Wassmer Change 3343307 on 2017/03/13 by Marcus.Wassmer Update showflags when we are guaranteed it will happen in all possible ways to spawn the scenecapture. (drag into editor, PIE, -game, etc) Change 3343732 on 2017/03/13 by Rolando.Caloca DR - Vulkan compute pipeline & refactor Change 3344846 on 2017/03/14 by Rolando.Caloca DR - Android compile fixes Change 3344883 on 2017/03/14 by Rolando.Caloca DR - Add missing stencil load/store to PSO initializer Change 3344985 on 2017/03/14 by Rolando.Caloca DR - Made load/store actions uint8 Change 3345141 on 2017/03/14 by Rolando.Caloca DR - vk - Rework render pass hash Change 3345304 on 2017/03/14 by Benjamin.Hyder Updating TM-Distancefields map to include TemplateFloor mesh Change 3345387 on 2017/03/14 by Rolando.Caloca DR - Add _RenderThread calls for Create*Shader so RHIs can choose not to stall when creating Change 3345388 on 2017/03/14 by Rolando.Caloca DR - Do not stall when creating shaders on Vulkan Change 3345722 on 2017/03/14 by Chris.Bunner PR #3357: MinimalAPI add to many material expressions (Contributed by DeanoC) #jira UE-42752 Change 3345723 on 2017/03/14 by Chris.Bunner Reduce log verbosity causing spamming during landscape editing. #jira UE-42714 Change 3345725 on 2017/03/14 by Chris.Bunner [Duplicate 3341860] Fixed material translation error with multiple connections from custom interpolator nodes. Change 3345726 on 2017/03/14 by Chris.Bunner Typo fixes. Change 3345732 on 2017/03/14 by Rolando.Caloca DR - Decouple vertex declaration off BSS Change 3345746 on 2017/03/14 by Chris.Bunner Added sign() intrinsic material graph node and delisted material function workaround. Change 3346042 on 2017/03/14 by Chris.Bunner Implement missing size query interface for FRenderTargetResources. #jira UE-41672 Change 3346387 on 2017/03/14 by Daniel.Wright [Copy] Added VolumetricScatteringIntensity to particle lights Change 3346389 on 2017/03/14 by Daniel.Wright [Copy] Clamp Volumetric material attributes to fp16 range to avoid INFs Disable volumetric fog when the fog show flag is disabled Change 3346392 on 2017/03/14 by Daniel.Wright [Copy] Fixed skylight being much too bright on volumetric fog Change 3346406 on 2017/03/14 by Daniel.Wright [Copy] CSM resolution is now controlled by r.Shadow.MaxCSMResolution. * Changed HighPC to use 1024 MaxShadowResolution (max for all non-CSM shadows), saves 60Mb in Fortnite Change 3346412 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for translucency lighting 3d textures, saves 13Mb Change 3346414 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for volumetric fog 3d textures, saves 13Mb Change 3346415 on 2017/03/14 by Daniel.Wright [Copy] Missing file from cl 3338451 Change 3346421 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaNs in volumetric fog due to rendering when height fog is disabled * Volumetric fog converts NaNs to black now so they don't spread Change 3346422 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaN in volumetric fog with low density values Change 3346423 on 2017/03/14 by Daniel.Wright [Copy] Changed default VolumetricFogScatteringDistribution to .2 Change 3346430 on 2017/03/14 by Daniel.Wright [Copy] New translucent material option to compute fog per pixel instead of the default per vertex Change 3346432 on 2017/03/14 by Daniel.Wright [Copy] Moved Volumetric Fog parameters to view uniform buffer for translucency pass Fixed lifetimes of temporary Volumetric Fog render targets Change 3346526 on 2017/03/14 by Daniel.Wright [Copy] Volumetric Fog supports point and spot light shadows * These lights are injected separately so that per-light resources can be bound (shadow depth map, static shadow depth map) * Forward lighting of local lights can be forced with 'r.VolumetricFog.InjectShadowedLightsSeparately 0' * Shadowed lights come at a cost: 2.9ms for volumetric fog on 970 -> 4.2ms with shadowing Change 3347053 on 2017/03/15 by Rolando.Caloca DR - android compile fix Change 3347384 on 2017/03/15 by Rolando.Caloca DR - Fix merge issue Change 3347643 on 2017/03/15 by Marcus.Wassmer Fix some bugs with the 'disable stationary skylight ffor the project' feature. Fixes lighting in Persona on Paragon. Change 3347979 on 2017/03/15 by Rolando.Caloca DR - Allow to automatically apply cached rendertargets to PSO initializer Change 3348024 on 2017/03/15 by Rolando.Caloca DR - Remove NullPS on Vulkan to avoid deadlock Change 3348303 on 2017/03/15 by Rolando.Caloca DR - Fix for debugging SCW with material SRT Change 3348357 on 2017/03/15 by Marcus.Wassmer Fix stencildither and a stencilref bug that was probably breaking decals sometimes. Change 3348549 on 2017/03/15 by Marcus.Wassmer Hopefully fix static analysis for potential nullptr access. Change 3348614 on 2017/03/15 by Marcus.Wassmer Duplicate some switch changes to fix crash on launch. Change 3349369 on 2017/03/16 by Gil.Gribb Fixed botched merge Change 3349947 on 2017/03/16 by Rolando.Caloca DR - Fix for mismatched primitive type Change 3349956 on 2017/03/16 by Benjamin.Hyder initial updates to TM-DistanceFields map Change 3350151 on 2017/03/16 by Rolando.Caloca DR - Fix UT compile issue Change 3350155 on 2017/03/16 by Rolando.Caloca DR - Catch mismatched primitive type on PSOs on D3D11 Change 3350192 on 2017/03/16 by Daniel.Wright Fix for point light shadow depths rendering with wrong cull mode due to PSO refactor Change 3350736 on 2017/03/16 by Daniel.Wright Fixed formatting from merge Change 3350881 on 2017/03/16 by Rolando.Caloca DR - Fix texture arrays as UAVs on Metal Change 3350927 on 2017/03/16 by Rolando.Caloca DR - Fix warning Change 3350935 on 2017/03/16 by Daniel.Wright Fix for materials with non-Surface domains being skipped in mesh passes Change 3351583 on 2017/03/17 by Marcus.Wassmer Fix clang platforms Change 3351917 on 2017/03/17 by Marcus.Wassmer Fix linux compile Change 3351973 on 2017/03/17 by Marcus.Wassmer Fix mismatched rendertargetformat Change 3352038 on 2017/03/17 by Daniel.Wright Enabled GetAndOrCreateGraphicsPipelineState ensures in Development for testing Change 3352110 on 2017/03/17 by Marcus.Wassmer Fix missing RT PSO apply Change 3352695 on 2017/03/17 by Arne.Schober DR - Remove PSO Rendertarget check in DX12 Resolve with Shader. #RB Rolando.Caloca Change 3352960 on 2017/03/17 by Arne.Schober DR - Fix some things that slipped trough the PSO merge #RB none Change 3353150 on 2017/03/18 by Rolando.Caloca DR - compile fix Change 3353205 on 2017/03/18 by Arne.Schober DR - Fix Incremental Compile and PS4 runtime error where CMASK is not allowed for ThickTile Mode #RB none Change 3353207 on 2017/03/18 by Arne.Schober DR - Fix Confusion #RB none Change 3355183 on 2017/03/20 by Nick.Bullard Fixed up Content orginzation for Decals automation tests in EngineTest Change 3355627 on 2017/03/20 by Arne.Schober DR - [UE-43094] - removed ensure in comporiton graph as control of the clear color cannot be gurantueed. Change 3356342 on 2017/03/21 by Marcus.Wassmer Fix clang errors Change 3356591 on 2017/03/21 by Arne.Schober DR - Fix ensure message #RB none Change 3356873 on 2017/03/21 by Arne.Schober DR - Fix comparission of undefined values in RendertargetApply Check Change 3357261 on 2017/03/21 by Marcus.Wassmer Fix LinuxEditor compile Change 3357294 on 2017/03/21 by Marcus.Wassmer Add missing SSE functions Change 3357351 on 2017/03/21 by Frank.Fella Fix win32 and linux compiler errors Change 3357370 on 2017/03/21 by Arne.Schober DR - disable ensure in test builds #RB Marcus.Wassmer [CL 3357449 by Marcus Wassmer in Main branch]
2017-03-21 17:46:52 -04:00
FGlobalShader::SetParameters<FViewUniformShaderParameters>(RHICmdList, ShaderRHI, View.ViewUniformBuffer);
CulledObjectBufferParameters.Set(RHICmdList, ShaderRHI, GGlobalDistanceFieldCulledObjectBuffers.Buffers);
GlobalDistanceFieldParameters.Set(RHICmdList, ShaderRHI, GlobalDistanceFieldInfo.ParameterData);
RHICmdList.TransitionResource(EResourceTransitionAccess::ERWBarrier, EResourceTransitionPipeline::EComputeToCompute, GObjectGridBuffers.CulledObjectGrid.UAV);
CulledObjectGrid.SetBuffer(RHICmdList, ShaderRHI, GObjectGridBuffers.CulledObjectGrid);
const FIntVector GridDimensionValue(
FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.X, GCullGridTileSize),
FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.Y, GCullGridTileSize),
FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.Z, GCullGridTileSize));
SetShaderValue(RHICmdList, ShaderRHI, CullGridDimension, GridDimensionValue);
SetShaderValue(RHICmdList, ShaderRHI, VolumeTexelSize, FVector(1.0f / GAOGlobalDFResolution));
SetShaderValue(RHICmdList, ShaderRHI, UpdateRegionVolumeMin, UpdateRegion.Bounds.Min);
SetShaderValue(RHICmdList, ShaderRHI, ClipmapIndex, ClipmapIndexValue);
extern float GAOConeHalfAngle;
const float GlobalMaxSphereQueryRadius = MaxOcclusionDistance / (1 + FMath::Tan(GAOConeHalfAngle));
SetShaderValue(RHICmdList, ShaderRHI, AOGlobalMaxSphereQueryRadius, GlobalMaxSphereQueryRadius);
}
void UnsetParameters(FRHICommandList& RHICmdList)
{
CulledObjectGrid.UnsetUAV(RHICmdList, GetComputeShader());
RHICmdList.TransitionResource(EResourceTransitionAccess::EReadable, EResourceTransitionPipeline::EComputeToCompute, GObjectGridBuffers.CulledObjectGrid.UAV);
}
virtual bool Serialize(FArchive& Ar)
{
bool bShaderHasOutdatedParameters = FGlobalShader::Serialize(Ar);
Ar << CulledObjectBufferParameters;
Ar << GlobalDistanceFieldParameters;
Ar << CulledObjectGrid;
Ar << CullGridDimension;
Ar << VolumeTexelSize;
Ar << UpdateRegionVolumeMin;
Ar << ClipmapIndex;
Ar << AOGlobalMaxSphereQueryRadius;
return bShaderHasOutdatedParameters;
}
private:
FDistanceFieldCulledObjectBufferParameters CulledObjectBufferParameters;
FGlobalDistanceFieldParameters GlobalDistanceFieldParameters;
FRWShaderParameter CulledObjectGrid;
FShaderParameter CullGridDimension;
FShaderParameter VolumeTexelSize;
FShaderParameter UpdateRegionVolumeMin;
FShaderParameter ClipmapIndex;
FShaderParameter AOGlobalMaxSphereQueryRadius;
};
IMPLEMENT_SHADER_TYPE(,FCullObjectsToGridCS,TEXT("GlobalDistanceField"),TEXT("CullObjectsToGridCS"),SF_Compute);
const int32 CompositeTileSize = 4;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
template<bool bUseParentDistanceField>
class TCompositeObjectDistanceFieldsCS : public FGlobalShader
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
DECLARE_SHADER_TYPE(TCompositeObjectDistanceFieldsCS,Global)
public:
static bool ShouldCache(EShaderPlatform Platform)
{
return IsFeatureLevelSupported(Platform, ERHIFeatureLevel::SM5) && DoesPlatformSupportDistanceFieldAO(Platform);
}
static void ModifyCompilationEnvironment(EShaderPlatform Platform, FShaderCompilerEnvironment& OutEnvironment)
{
FGlobalShader::ModifyCompilationEnvironment(Platform,OutEnvironment);
OutEnvironment.SetDefine(TEXT("COMPOSITE_THREADGROUP_SIZE"), CompositeTileSize);
OutEnvironment.SetDefine(TEXT("CULL_GRID_TILE_SIZE"), GCullGridTileSize);
OutEnvironment.SetDefine(TEXT("MAX_GRID_CULLED_DF_OBJECTS"), GMaxGridCulledObjects);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
OutEnvironment.SetDefine(TEXT("USE_PARENT_DISTANCE_FIELD"), bUseParentDistanceField ? 1 : 0);
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
TCompositeObjectDistanceFieldsCS(const ShaderMetaType::CompiledShaderInitializerType& Initializer)
: FGlobalShader(Initializer)
{
CulledObjectBufferParameters.Bind(Initializer.ParameterMap);
GlobalDistanceFieldParameters.Bind(Initializer.ParameterMap);
GlobalDistanceFieldTexture.Bind(Initializer.ParameterMap, TEXT("GlobalDistanceFieldTexture"));
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
ParentGlobalDistanceFieldTexture.Bind(Initializer.ParameterMap, TEXT("ParentGlobalDistanceFieldTexture"));
CulledObjectGrid.Bind(Initializer.ParameterMap, TEXT("CulledObjectGrid"));
UpdateRegionSize.Bind(Initializer.ParameterMap, TEXT("UpdateRegionSize"));
CullGridDimension.Bind(Initializer.ParameterMap, TEXT("CullGridDimension"));
VolumeTexelSize.Bind(Initializer.ParameterMap, TEXT("VolumeTexelSize"));
UpdateRegionVolumeMin.Bind(Initializer.ParameterMap, TEXT("UpdateRegionVolumeMin"));
ClipmapIndex.Bind(Initializer.ParameterMap, TEXT("ClipmapIndex"));
AOGlobalMaxSphereQueryRadius.Bind(Initializer.ParameterMap, TEXT("AOGlobalMaxSphereQueryRadius"));
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
TCompositeObjectDistanceFieldsCS()
{
}
void SetParameters(
FRHICommandList& RHICmdList,
const FScene* Scene,
const FSceneView& View,
float MaxOcclusionDistance,
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const FGlobalDistanceFieldParameterData& ParameterData,
const FGlobalDistanceFieldClipmap& Clipmap,
IPooledRenderTarget* ParentDistanceField,
int32 ClipmapIndexValue,
const FVolumeUpdateRegion& UpdateRegion)
{
FComputeShaderRHIParamRef ShaderRHI = GetComputeShader();
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3357411) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3244756 on 2017/01/03 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering) Change 3248667 on 2017/01/05 by Olaf.Piesche Resaving default asset because of engine verison issue; maybe unnecessary, but resaving niagara engine content to be sure #jira UE-40160 Change 3249324 on 2017/01/06 by Marcus.Wassmer Resave with an actual version to stop cook warning Change 3249611 on 2017/01/06 by Marcus.Wassmer Just remove warning-causing niagara data for now. Change 3308052 on 2017/02/16 by Rolando.Caloca DR - Check for Vulkan SDK, and only use it if it's newer or the same as the headers we distribute Change 3308109 on 2017/02/16 by Rolando.Caloca DR - Upgrade glslang to 1.0.39.1 Change 3308111 on 2017/02/16 by Rolando.Caloca DR - Update Vulkan distribution to 1.0.39.1 Change 3308153 on 2017/02/16 by Rolando.Caloca DR - Updated glslang libs Change 3308842 on 2017/02/17 by Rolando.Caloca DR - Fixed copy/paste Change 3310007 on 2017/02/17 by Chris.Bunner Back out CL 3221219 - causing MIC generation issues and superseded by CL 3273971. #jira UE-37792 Change 3310154 on 2017/02/17 by Chris.Bunner Assert when attempting to add a custom material attribute already in the base attributes list. Change 3310155 on 2017/02/17 by Chris.Bunner PR #3231: Validate material index before accessing (Contributed by projectgheist) #jira UE-41774, UE-41788 Change 3310162 on 2017/02/17 by Chris.Bunner PR #3252: Added MobileMaterialInterface to UsedMaterials (Contributed by projectgheist) #jira UE-41823, UE-41950 Change 3310176 on 2017/02/17 by Chris.Bunner Merging CL 3233886: AMD HDR support (requires r.AMDSupportsHDRDisplayOutput=1 in ini). Update to AGS 5.0.5. Partial code tidy up. Change 3310187 on 2017/02/17 by Chris.Bunner Preserve constant expressions rather than always casting after translating a material attribute. Losing the notion of constant means we can't correctly detect used properties and falsely enable e.g. PDO. Happened because of the incorrect component masks in BreakMaterialNodes which then had to be downcast to the correct type which is done as an inline fragment rather than swizzle expression. #jira UE-41594 Change 3310215 on 2017/02/17 by Chris.Bunner Prevent SpeedTree node compiling for skeletal meshes (not supported as uses more UV sets than available). More descriptive error for missing Cubemap UV input on TextureSample material node . #jira UE-33098 Change 3310838 on 2017/02/18 by Joe.Graf Moved some private functions to public for a licensee #CodeReview: matt.kuhlenschmidt #rb: n/a Change 3311876 on 2017/02/20 by Rolando.Caloca DR - Expose skin cache cvar r.SkinCache.AccumulationBufferSizeInKB #jira UE-42014 Change 3314139 on 2017/02/21 by Rolando.Caloca DR - Minor cleanup pass - Remove FVulkanPendingState - Renamed some classes for clarity - Hoist pending UAVs for flush out to pending compute state Change 3314642 on 2017/02/21 by Rolando.Caloca DR - Some more renaming Change 3315431 on 2017/02/21 by Ben.Salem Properly set default values for test time out and tick. We now will default to ticking once per second, and tracking the macro stats of GPU/Render/Game thread time. #tests Ran showdown demo several times Change 3316710 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Fix refract intrinsic Change 3316718 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Built libs to pick up change from 3316710 - refract fix Change 3316820 on 2017/02/22 by Benjamin.Hyder updating Tm-TrigNodes map Change 3317192 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317528 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317639 on 2017/02/22 by Benjamin.Hyder Updating Decal on Complex Mesh example in QA-Decals Change 3317764 on 2017/02/22 by Benjamin.Hyder Final updates to QA-Decals Change 3318319 on 2017/02/22 by Rolando.Caloca DR - minor reorg/rename Change 3318379 on 2017/02/22 by Rolando.Caloca DR - more cleanup Change 3321181 on 2017/02/24 by Rolando.Caloca DR - Fix GL bug Change 3321247 on 2017/02/24 by Rolando.Caloca DR - Fix misc bugs Change 3321898 on 2017/02/24 by Chris.Bunner Only issue clear TLV dispatch if required. #jira UERNDR-193 Change 3321904 on 2017/02/24 by Chris.Bunner Added comment for potential future optimization. Change 3322013 on 2017/02/24 by Uriel.Doyon Fixed separate translucency being affected by Gaussian DOF #jira UE-40489 Change 3322517 on 2017/02/24 by Uriel.Doyon Fixed issue with InvestigateTexture command removing budget limit. Fixed StreamingBounds show flag not working. It nows shows the streaming bound for the currently selected textures. #jira UE-40485 Change 3323470 on 2017/02/27 by Chad.Garyet Removing DDC job from dev-rendering Change 3323479 on 2017/02/27 by Chad.Garyet Removing RDU agent type Change 3323519 on 2017/02/27 by Chad.Garyet removing NCL/LHR/SEA agent types to clean up space Change 3323639 on 2017/02/27 by Benjamin.Hyder More updates to QA-Decals Change 3324207 on 2017/02/27 by Uriel.Doyon Fixed typo ScaleTexturesByGlobalMyBias -> ScaleTexturesByGlobalMipBias Removed bad merge in FStreamingTextureLevelContext::GetBuildDataIndexRef Change 3324396 on 2017/02/27 by Uriel.Doyon Fixed an issue with the Streaming Bounds show flag interferring with the static level data initialization #jira UE-40485 Change 3325227 on 2017/02/28 by Chris.Bunner Fix-up AMD AGS libs. Change 3325566 on 2017/02/28 by Uriel.Doyon Fixed possible out-of-bound access in GetUsedTexture() when passing ERHIFeatureLevel::Num Change 3326009 on 2017/02/28 by Uriel.Doyon Better fix for 3325566, as the previous fix would ignore the material instance overrides. Change 3327058 on 2017/03/01 by Benjamin.Hyder Preparing TM_Shadermodels map for automation Change 3328222 on 2017/03/01 by Chris.Bunner Prevent decals from drawing in separate translucency pass. Whilst user control and material relevance were already removed, if the flag was checked before being disabled (by swapping to decal domain) this was still being read in the render loop, now explicitly ignores decals. #jira UE-42449, UE-42446 Change 3329848 on 2017/03/02 by Uriel.Doyon Added some extra logs to help track UE-42168 Change 3329977 on 2017/03/02 by Rolando.Caloca DR - Fix bad clear value Change 3330008 on 2017/03/02 by Benjamin.Hyder More preparations for QA-Decals automation Change 3330754 on 2017/03/02 by Daniel.Wright Prominent comment explaining reflection env async compute usage and why it's not overlapped with anything Change 3331451 on 2017/03/03 by Marc.Olano Manually unroll simplex noise loop to avoid PSO bug on AMD/Metal Change 3331839 on 2017/03/03 by Rolando.Caloca DR - hlslcc - add missing file to project Change 3332247 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel PR #3305 #jira UE-42393 Change 3332259 on 2017/03/03 by Rolando.Caloca DR - Fix bad index into pixel formats PR #3237 #jira UE-41855 Change 3332305 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers PR #3271 #jira UE-32618 Change 3332313 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel (properly) PR #3305 #jira UE-42393 Change 3332317 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers (properly) PR #3271 #jira UE-32618 Change 3332368 on 2017/03/03 by Rolando.Caloca DR - Minor fixes so -sm4 and -sm5 can be used on windows with OpenGL/Vulkan Change 3333690 on 2017/03/06 by Daniel.Wright [Copy] Changing movable skylight properties no longer affects static draw lists Change 3333693 on 2017/03/06 by Daniel.Wright [Copy] Added 'r.AOListMeshDistanceFields' which dumps out mesh distance fields sorted by memory size, useful for directing content optimizations Change 3333705 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance fields are now 8 bit fixed point by default, but can be changed back to 16 bit floating piont with a project setting. * 8 bit uses half memory but introduces error for thin surfaces or large meshes. Change 3333721 on 2017/03/06 by David.Hill DecalProxy: Copy float FadeScreenSize to FDeferredDecalProxy for use in the render thread. This avoids pointer chasing to the UDecalComponent (game thread component). Change 3333772 on 2017/03/06 by Daniel.Wright [Copy] Scene motion blur data is only updated for the main renderer frames. Fixes scene captures and planar reflections breaking object motion blur. Change 3333790 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance field generation uses Embree, for a 2.5x speedup * Can switch back to old kDOP generation with 'r.DistanceFieldBuild.UseEmbree 0' for debugging Change 3333822 on 2017/03/06 by Daniel.Wright [Copy] Moved mesh distance field code into MeshDistanceFieldUtilities.cpp * Moved FMeshUtilities to its own header so the 8k line MeshUtilites.cpp file can be further split up Change 3333827 on 2017/03/06 by Daniel.Wright [Copy] Range compress 8bit distance fields - gets one extra bit of precision on average Change 3333828 on 2017/03/06 by Daniel.Wright [Copy] Raised High ShadowQuality to 2048 as 1024 for CSM is way too low Change 3333831 on 2017/03/06 by Daniel.Wright Non-editor compile fix Change 3333836 on 2017/03/06 by Daniel.Wright [Copy] Workaround for gobal distance field volume textures being bloated by 4x on PS4 due to the recommended tiling modes. They now use a 2d tiling mode which avoids the bloat, saving 96Mb. Change 3333843 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionExponent to skylight component * Useful for brightening up indoors without losing contact shadows as MinOcclusion does Change 3333845 on 2017/03/06 by Daniel.Wright [Copy] Capsule shadow BP functions Change 3333850 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionCombineMode to skylight component Change 3333854 on 2017/03/06 by Daniel.Wright [Copy] Gnm properly registers clears as GPU work so those events show up in profilegpu Change 3333857 on 2017/03/06 by Daniel.Wright [Copy] Clear light attenuation for local lights with a quad covering their screen extents * Clearing the entire light attenuation buffer costs .1ms on PS4. This optimization lowers the minimum cost of a shadow casting light from .15ms -> .03ms. * Shadowed lights in Fortnite with 25 lights 3.7ms -> 1.42ms on PS4 Change 3333860 on 2017/03/06 by Daniel.Wright [Copy] Flush deferred deletes when reallocating distance field atlas to reduce peak memory Change 3333861 on 2017/03/06 by Daniel.Wright [Copy] Disable all distance field features on Intel cards as HD 4000 hangs in the RHICreateTexture3D call to allocate the large atlas Change 3333869 on 2017/03/06 by Daniel.Wright [Copy] Volumetric Fog using a volume texture mapped to the camera frustum * Volumetric fog can be enabled on an Exponential Height Fog component with additional controls * Lights have a VolumetricScatteringIntensity * New cvars r.VolumetricFog, r.VolumetricFog.GridPixelSize, r.VolumetricFog.GridSizeZ, r.VolumetricFog.DepthDistributionScale * Lighting features supported: * Directional light with CSM and a light function * Point / spot lights without shadows / light functions / IES profiles * Skylight with occlusion from distance fields * Analytical height fog covers the view range past where the volumetric fog ends * Temporal reprojection is used on the volumetric fog scattering and extinction to achieve stability * Translucency integrates properly into volumetric fog * Height fog StartDistance is not supported by volumetric fog and should be set to 0. Change 3333894 on 2017/03/06 by Daniel.Wright [Copy] Initialize GDummyVolumetricFogGlobalDataUniformBuffer outside of parallel rendering Change 3333902 on 2017/03/06 by Daniel.Wright [Copy] Better handling of volumetric fog enabled with distance of 0 Change 3333903 on 2017/03/06 by Daniel.Wright [Copy] Fixed volumetric fog trying to render light functions for a point light Change 3333908 on 2017/03/06 by Daniel.Wright [Copy] Volumetric materials * Added new material domain Volume, which can output Scattering, Absorption and Emissive. All properties are in world space densities. * Particle systems using the Volume domain are voxelized based on their ParticlePosition and ParticleRadius * Volumetric fog integration is now energy conservative - scattering is integrated against transmission over the depth of each slice. * Added bOverrideLightColorsWithFogInscatteringColors to exponential height fog, which can be enabled to make Volumetric Fog match Height fog more closely Change 3334134 on 2017/03/06 by Daniel.Wright [Copy from Michael Trepka] Added Embree 2.14.0 and changed MeshUtilities to use it as this solves issues with Embree leaking TLS keys. UnrealLightmass is still using older Embree 2.7.0 until we can find time to properly test it with the new version. Also, invalidated distance field DDC to force it to rebuild with updated Embree. Change 3334420 on 2017/03/06 by Daniel.Wright Fixed RTDF shadows Change 3335467 on 2017/03/07 by Benjamin.Hyder Initial submission of QA-Decals map to EngineTest Change 3335556 on 2017/03/07 by Daniel.Wright Changed mesh distance field default format back to R16f Change 3338020 on 2017/03/08 by Daniel.Wright Disable volumetric fog in vertex shaders for feature levels which don't support it Change 3339394 on 2017/03/09 by Chris.Bunner Correctly handle material texture translation error edge case. #jira UE-42579, UE-42670 Change 3339992 on 2017/03/09 by Daniel.Wright Only compile volumetric fog shaders on supporting platforms Change 3341858 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) #RB Rolando.Caloca, Marcus.Wassmer, Daniel.Wright, Nick.Penwarden, Mark.Satterthwaite Change 3342004 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) Fix unity build #RB Marcus.Wassmer Change 3343307 on 2017/03/13 by Marcus.Wassmer Update showflags when we are guaranteed it will happen in all possible ways to spawn the scenecapture. (drag into editor, PIE, -game, etc) Change 3343732 on 2017/03/13 by Rolando.Caloca DR - Vulkan compute pipeline & refactor Change 3344846 on 2017/03/14 by Rolando.Caloca DR - Android compile fixes Change 3344883 on 2017/03/14 by Rolando.Caloca DR - Add missing stencil load/store to PSO initializer Change 3344985 on 2017/03/14 by Rolando.Caloca DR - Made load/store actions uint8 Change 3345141 on 2017/03/14 by Rolando.Caloca DR - vk - Rework render pass hash Change 3345304 on 2017/03/14 by Benjamin.Hyder Updating TM-Distancefields map to include TemplateFloor mesh Change 3345387 on 2017/03/14 by Rolando.Caloca DR - Add _RenderThread calls for Create*Shader so RHIs can choose not to stall when creating Change 3345388 on 2017/03/14 by Rolando.Caloca DR - Do not stall when creating shaders on Vulkan Change 3345722 on 2017/03/14 by Chris.Bunner PR #3357: MinimalAPI add to many material expressions (Contributed by DeanoC) #jira UE-42752 Change 3345723 on 2017/03/14 by Chris.Bunner Reduce log verbosity causing spamming during landscape editing. #jira UE-42714 Change 3345725 on 2017/03/14 by Chris.Bunner [Duplicate 3341860] Fixed material translation error with multiple connections from custom interpolator nodes. Change 3345726 on 2017/03/14 by Chris.Bunner Typo fixes. Change 3345732 on 2017/03/14 by Rolando.Caloca DR - Decouple vertex declaration off BSS Change 3345746 on 2017/03/14 by Chris.Bunner Added sign() intrinsic material graph node and delisted material function workaround. Change 3346042 on 2017/03/14 by Chris.Bunner Implement missing size query interface for FRenderTargetResources. #jira UE-41672 Change 3346387 on 2017/03/14 by Daniel.Wright [Copy] Added VolumetricScatteringIntensity to particle lights Change 3346389 on 2017/03/14 by Daniel.Wright [Copy] Clamp Volumetric material attributes to fp16 range to avoid INFs Disable volumetric fog when the fog show flag is disabled Change 3346392 on 2017/03/14 by Daniel.Wright [Copy] Fixed skylight being much too bright on volumetric fog Change 3346406 on 2017/03/14 by Daniel.Wright [Copy] CSM resolution is now controlled by r.Shadow.MaxCSMResolution. * Changed HighPC to use 1024 MaxShadowResolution (max for all non-CSM shadows), saves 60Mb in Fortnite Change 3346412 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for translucency lighting 3d textures, saves 13Mb Change 3346414 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for volumetric fog 3d textures, saves 13Mb Change 3346415 on 2017/03/14 by Daniel.Wright [Copy] Missing file from cl 3338451 Change 3346421 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaNs in volumetric fog due to rendering when height fog is disabled * Volumetric fog converts NaNs to black now so they don't spread Change 3346422 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaN in volumetric fog with low density values Change 3346423 on 2017/03/14 by Daniel.Wright [Copy] Changed default VolumetricFogScatteringDistribution to .2 Change 3346430 on 2017/03/14 by Daniel.Wright [Copy] New translucent material option to compute fog per pixel instead of the default per vertex Change 3346432 on 2017/03/14 by Daniel.Wright [Copy] Moved Volumetric Fog parameters to view uniform buffer for translucency pass Fixed lifetimes of temporary Volumetric Fog render targets Change 3346526 on 2017/03/14 by Daniel.Wright [Copy] Volumetric Fog supports point and spot light shadows * These lights are injected separately so that per-light resources can be bound (shadow depth map, static shadow depth map) * Forward lighting of local lights can be forced with 'r.VolumetricFog.InjectShadowedLightsSeparately 0' * Shadowed lights come at a cost: 2.9ms for volumetric fog on 970 -> 4.2ms with shadowing Change 3347053 on 2017/03/15 by Rolando.Caloca DR - android compile fix Change 3347384 on 2017/03/15 by Rolando.Caloca DR - Fix merge issue Change 3347643 on 2017/03/15 by Marcus.Wassmer Fix some bugs with the 'disable stationary skylight ffor the project' feature. Fixes lighting in Persona on Paragon. Change 3347979 on 2017/03/15 by Rolando.Caloca DR - Allow to automatically apply cached rendertargets to PSO initializer Change 3348024 on 2017/03/15 by Rolando.Caloca DR - Remove NullPS on Vulkan to avoid deadlock Change 3348303 on 2017/03/15 by Rolando.Caloca DR - Fix for debugging SCW with material SRT Change 3348357 on 2017/03/15 by Marcus.Wassmer Fix stencildither and a stencilref bug that was probably breaking decals sometimes. Change 3348549 on 2017/03/15 by Marcus.Wassmer Hopefully fix static analysis for potential nullptr access. Change 3348614 on 2017/03/15 by Marcus.Wassmer Duplicate some switch changes to fix crash on launch. Change 3349369 on 2017/03/16 by Gil.Gribb Fixed botched merge Change 3349947 on 2017/03/16 by Rolando.Caloca DR - Fix for mismatched primitive type Change 3349956 on 2017/03/16 by Benjamin.Hyder initial updates to TM-DistanceFields map Change 3350151 on 2017/03/16 by Rolando.Caloca DR - Fix UT compile issue Change 3350155 on 2017/03/16 by Rolando.Caloca DR - Catch mismatched primitive type on PSOs on D3D11 Change 3350192 on 2017/03/16 by Daniel.Wright Fix for point light shadow depths rendering with wrong cull mode due to PSO refactor Change 3350736 on 2017/03/16 by Daniel.Wright Fixed formatting from merge Change 3350881 on 2017/03/16 by Rolando.Caloca DR - Fix texture arrays as UAVs on Metal Change 3350927 on 2017/03/16 by Rolando.Caloca DR - Fix warning Change 3350935 on 2017/03/16 by Daniel.Wright Fix for materials with non-Surface domains being skipped in mesh passes Change 3351583 on 2017/03/17 by Marcus.Wassmer Fix clang platforms Change 3351917 on 2017/03/17 by Marcus.Wassmer Fix linux compile Change 3351973 on 2017/03/17 by Marcus.Wassmer Fix mismatched rendertargetformat Change 3352038 on 2017/03/17 by Daniel.Wright Enabled GetAndOrCreateGraphicsPipelineState ensures in Development for testing Change 3352110 on 2017/03/17 by Marcus.Wassmer Fix missing RT PSO apply Change 3352695 on 2017/03/17 by Arne.Schober DR - Remove PSO Rendertarget check in DX12 Resolve with Shader. #RB Rolando.Caloca Change 3352960 on 2017/03/17 by Arne.Schober DR - Fix some things that slipped trough the PSO merge #RB none Change 3353150 on 2017/03/18 by Rolando.Caloca DR - compile fix Change 3353205 on 2017/03/18 by Arne.Schober DR - Fix Incremental Compile and PS4 runtime error where CMASK is not allowed for ThickTile Mode #RB none Change 3353207 on 2017/03/18 by Arne.Schober DR - Fix Confusion #RB none Change 3355183 on 2017/03/20 by Nick.Bullard Fixed up Content orginzation for Decals automation tests in EngineTest Change 3355627 on 2017/03/20 by Arne.Schober DR - [UE-43094] - removed ensure in comporiton graph as control of the clear color cannot be gurantueed. Change 3356342 on 2017/03/21 by Marcus.Wassmer Fix clang errors Change 3356591 on 2017/03/21 by Arne.Schober DR - Fix ensure message #RB none Change 3356873 on 2017/03/21 by Arne.Schober DR - Fix comparission of undefined values in RendertargetApply Check Change 3357261 on 2017/03/21 by Marcus.Wassmer Fix LinuxEditor compile Change 3357294 on 2017/03/21 by Marcus.Wassmer Add missing SSE functions Change 3357351 on 2017/03/21 by Frank.Fella Fix win32 and linux compiler errors Change 3357370 on 2017/03/21 by Arne.Schober DR - disable ensure in test builds #RB Marcus.Wassmer [CL 3357449 by Marcus Wassmer in Main branch]
2017-03-21 17:46:52 -04:00
FGlobalShader::SetParameters<FViewUniformShaderParameters>(RHICmdList, ShaderRHI, View.ViewUniformBuffer);
CulledObjectBufferParameters.Set(RHICmdList, ShaderRHI, GGlobalDistanceFieldCulledObjectBuffers.Buffers);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
GlobalDistanceFieldParameters.Set(RHICmdList, ShaderRHI, ParameterData);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const FSceneRenderTargetItem& ClipMapRTI = Clipmap.RenderTarget->GetRenderTargetItem();
RHICmdList.TransitionResource(EResourceTransitionAccess::ERWBarrier, EResourceTransitionPipeline::EComputeToCompute, ClipMapRTI.UAV);
GlobalDistanceFieldTexture.SetTexture(RHICmdList, ShaderRHI, ClipMapRTI.ShaderResourceTexture, ClipMapRTI.UAV);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
if (bUseParentDistanceField)
{
SetTextureParameter(RHICmdList, ShaderRHI, ParentGlobalDistanceFieldTexture, ParentDistanceField->GetRenderTargetItem().ShaderResourceTexture);
}
else
{
check(!ParentGlobalDistanceFieldTexture.IsBound());
}
SetSRVParameter(RHICmdList, ShaderRHI, CulledObjectGrid, GObjectGridBuffers.CulledObjectGrid.SRV);
const FIntVector GridDimensionValue(
FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.X, GCullGridTileSize),
FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.Y, GCullGridTileSize),
FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.Z, GCullGridTileSize));
SetShaderValue(RHICmdList, ShaderRHI, CullGridDimension, GridDimensionValue);
SetShaderValue(RHICmdList, ShaderRHI, UpdateRegionSize, UpdateRegion.CellsSize);
SetShaderValue(RHICmdList, ShaderRHI, VolumeTexelSize, FVector(1.0f / GAOGlobalDFResolution));
SetShaderValue(RHICmdList, ShaderRHI, UpdateRegionVolumeMin, UpdateRegion.Bounds.Min);
SetShaderValue(RHICmdList, ShaderRHI, ClipmapIndex, ClipmapIndexValue);
extern float GAOConeHalfAngle;
const float GlobalMaxSphereQueryRadius = MaxOcclusionDistance / (1 + FMath::Tan(GAOConeHalfAngle));
SetShaderValue(RHICmdList, ShaderRHI, AOGlobalMaxSphereQueryRadius, GlobalMaxSphereQueryRadius);
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
void UnsetParameters(FRHICommandList& RHICmdList,const FGlobalDistanceFieldClipmap& Clipmap)
{
GlobalDistanceFieldTexture.UnsetUAV(RHICmdList, GetComputeShader());
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const FSceneRenderTargetItem& ClipMapRTI = Clipmap.RenderTarget->GetRenderTargetItem();
RHICmdList.TransitionResource(EResourceTransitionAccess::EReadable, EResourceTransitionPipeline::EComputeToCompute, ClipMapRTI.UAV);
}
virtual bool Serialize(FArchive& Ar)
{
bool bShaderHasOutdatedParameters = FGlobalShader::Serialize(Ar);
Ar << CulledObjectBufferParameters;
Ar << GlobalDistanceFieldParameters;
Ar << GlobalDistanceFieldTexture;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
Ar << ParentGlobalDistanceFieldTexture;
Ar << CulledObjectGrid;
Ar << UpdateRegionSize;
Ar << CullGridDimension;
Ar << VolumeTexelSize;
Ar << UpdateRegionVolumeMin;
Ar << ClipmapIndex;
Ar << AOGlobalMaxSphereQueryRadius;
return bShaderHasOutdatedParameters;
}
private:
FDistanceFieldCulledObjectBufferParameters CulledObjectBufferParameters;
FGlobalDistanceFieldParameters GlobalDistanceFieldParameters;
FRWShaderParameter GlobalDistanceFieldTexture;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
FShaderResourceParameter ParentGlobalDistanceFieldTexture;
FShaderResourceParameter CulledObjectGrid;
FShaderParameter UpdateRegionSize;
FShaderParameter CullGridDimension;
FShaderParameter VolumeTexelSize;
FShaderParameter UpdateRegionVolumeMin;
FShaderParameter ClipmapIndex;
FShaderParameter AOGlobalMaxSphereQueryRadius;
};
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
IMPLEMENT_SHADER_TYPE(template<>,TCompositeObjectDistanceFieldsCS<true>,TEXT("GlobalDistanceField"),TEXT("CompositeObjectDistanceFieldsCS"),SF_Compute);
IMPLEMENT_SHADER_TYPE(template<>,TCompositeObjectDistanceFieldsCS<false>,TEXT("GlobalDistanceField"),TEXT("CompositeObjectDistanceFieldsCS"),SF_Compute);
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
const int32 HeightfieldCompositeTileSize = 8;
class FCompositeHeightfieldsIntoGlobalDistanceFieldCS : public FGlobalShader
{
DECLARE_SHADER_TYPE(FCompositeHeightfieldsIntoGlobalDistanceFieldCS, Global)
public:
static bool ShouldCache(EShaderPlatform Platform)
{
Copying //UE4/Dev-Platform to //UE4/Main (Source: //UE4/Dev-Platform @ 2945165) ========================== MAJOR FEATURES + CHANGES ========================== Change 2731596 on 2015/10/16 by Keith.Judge Fast Semantics - Add deferred resource deletion queue to make deleted resources be actually deleted a number of frames later so that the GPU is definitely finished with them. Hooked up the temporary SRVs for dynamic VBs as a first step. Change 2764244 on 2015/11/12 by Keith.Judge Stop creating/deleting SRVs for dynamic buffers all the time. Instead add a new view when a new buffer is cycled and keep them around until the buffer is dead. Change 2936695 on 2016/04/07 by Mark.Satterthwaite Disable Metal on Mac OS X versions prior to 10.11.4 because their drivers are too buggy and lots of issues users will encounter were fixed in 10.11.4. Change 2936701 on 2016/04/07 by Mark.Satterthwaite Re-enable DistanceFieldAO on Mac Metal, but disable the support for Heightfield composition into the DistanceFieldAO as that currently requires Read+Write texture support which isn't present in Metal yet. Change 2936702 on 2016/04/07 by Mark.Satterthwaite Enable capsule shadows on Metal SM5 for Mac Paragon. Change 2936704 on 2016/04/07 by Mark.Satterthwaite Re-add Metal_MacES3_1 entries in RHIDefinition functions - not quite sure how they disappeared... Change 2936706 on 2016/04/07 by Mark.Satterthwaite Fix implementation of BeginRenderingPrePass to only clear the depth+stencil buffers when it is asked to - this fixes LOD fading in Paragon on Mac when parallel execution is enabled. The parallel contexts were being instructed to clear the pre-pass data which was then fouling up all the subsequent rendering. Change 2936708 on 2016/04/07 by Mark.Satterthwaite Extend workaround from 2905206 for UE-27818 to Metal as well as OpenGL - it isn't permissable to have an unbound resource on these APIs as you aren't allowed to make the assumption than an if-branch will protect against null dereference on the GPU. Change 2936737 on 2016/04/07 by Mark.Satterthwaite Changes to reduce Metal's peak memory usage when streaming texture data in when loading levels: - When uploading data to Shared memory Metal textures dispose of the source buffer immediately as it won't be required after the call to replaceRegion. - Add an optional CVar ("rhi.Metal.MaxOutstandingAsyncTexUploads") and keep a count of outstanding async. texture uploads - really this might be better as a memory count - if teh current outstanding count is greater than the specified CVar wait for all operations to complete before purging the used buffers. By default the CVar is set to 0 which disables the tracking and the game will use all available memory. Change 2936741 on 2016/04/07 by Mark.Satterthwaite For Metal don't use a texture-view unless the mip-range or texture format is different - we only create a texture-view if we absolutely have to in order to avoid performance reductions on some vendors. When this happens we'll have to update the SRV to point to the new source texture but that can be handled on bind. Change 2936753 on 2016/04/07 by Mark.Satterthwaite Mutex the pipeline cache inside each Metal bound-shader-state - with parallel execution this can be modified from multiple threads. Change 2936758 on 2016/04/07 by Mark.Satterthwaite Don't eliminate redundant Metal command-encoders on Intel either - this only seems to work correctly on AMD... Change 2936762 on 2016/04/07 by Mark.Satterthwaite Validate that parallel contexts aren't asked to clear render-targets (incl depth/stencil) as this won't work as desired - each context will end up clearing the contents meaning none of the rendering will propagate which is probably not what you intend. Change 2936765 on 2016/04/07 by Mark.Satterthwaite Fix alignment of blit-commands between textures & buffers in Metal. Mac Metal has no minimum alignment and allows tightly packed rows, but mobile H/W has a fixed row alignment of 64-bytes for linear texture data. Change 2936767 on 2016/04/07 by Mark.Satterthwaite Remove the workaround for old Mac OS X AMD cards not supporting frame-buffer sRGB - I have a feeling it causes more problems than it solves. Change 2936773 on 2016/04/07 by Mark.Satterthwaite Warn the user when trying to run on OpenGL 3.2 GPUs that they are running unsupported and there may be rendering errors and performance problems. Change 2936777 on 2016/04/07 by Mark.Satterthwaite In MetalRHI move the calls that reset the state-cache and command-encoder to EndFrame and fix the fallout. Hopefully this will eliminate the problem where a draw call fails because a call to SetRenderTargets hasn't been made since the last time the command-buffer was submitted. Change 2936788 on 2016/04/07 by Daniel.Lamb PR #2193: Commandlet tweaks: Fixup Redirects and Cook-on-the-fly Server (Contributed by FineRedMist) Change 2936799 on 2016/04/07 by Mark.Satterthwaite Automatically enable RHI thread support in Metal if parallel execution is compiled in unless running the Editor which doesn't support it. The RHI thread is only enabled in Metal when parallel execution is available because there's little to no advantages to the RHI thread with Metal unless parallel execution is enabled. Also only enable async compute support when parallel execution is available and running on an AMD GPU as only AMD support it and again its only really useful when running things in parallel. Disable the async. compute fencing code when async. compute is disabled. Change 2936923 on 2016/04/07 by Peter.Sauerbrei add exception when trying to access an ini section which we don't cache Change 2937839 on 2016/04/08 by Lee.Clark PS4 - BC6H and BC7 support #jira OR-14804 Change 2937841 on 2016/04/08 by Lee.Clark Fix media player not opening the same file that has just been closed. Change 2937843 on 2016/04/08 by Lee.Clark PS4 - Fix Media player seek and duration Change 2938272 on 2016/04/08 by Mark.Satterthwaite Change the Metal texture lock/unlock code to use the immediate getBytes API to read from the texture unless it is stored in Private memory and requires we use the blit-encoder. This means iOS texture locking won't generally need to use the blit encoder with its row alignment restrictions as all textures will be Shared. On Mac most textures will still be accessed via blit operations since Private memory is much more common than Managed. Change 2938471 on 2016/04/08 by Nick.Shin load/save with "negative char size value" fixed two 'U''s were missing-- causing base64encoder and decoder to not return original data (or expected data for that matter)... C++ function shows these signatures: \Engine\Source\Runtime\Engine\Public\SaveGameSystem.h FGenericSaveGameSystem::SaveGame(..., ..., ..., ...<uint8>); FGenericSaveGameSystem::LoadGame(..., ..., ..., ...<uint8>); javascript version was using signed operators... answerhub 386719 - broken load/save Change 2938546 on 2016/04/08 by Nick.Shin upgrading zlib, openssl, libcurl & libwebsockets (part 1, for libwebsockets -- engine changes will be in part 2) and writting down all of the build instructions into a single script file future tasks may have the script broken to smaller sized files and placed in the respective library folders for simplicity -- for now, it's bundled together to ensure build is all-in-one... #jira UEPLAT-1246 - Update libWebsockets #jira UEPLAT-1221 - update websocket library #jira UEPLAT-1223 - Arrange testing for 'update websocket library' #jira UEPLAT-1203 - Add Linux library for libwebsockets #jira UEPLAT-1204 - Rebuild libwebsockets with SSL Change 2939402 on 2016/04/11 by Mark.Satterthwaite No need to call synchroniseTexture on iOS when locking a texture for read - that function call is only available and required on OS X. Change 2939526 on 2016/04/11 by Daniel.Lamb Don't mark content as modified when garbage collecting. #PR2249 #2249 Change 2940094 on 2016/04/11 by Michael.Trepka Temporarily rollback //UE4/Dev-Platform/Engine/Source/Runtime/Apple/MetalRHI/Private/MetalContext.cpp to revision 47 to unblock UE-29312 #codereview Mark.Satterthwaite Change 2940540 on 2016/04/12 by Jeff.Campeau Hardware decompress support for Xbox One. Change 2940793 on 2016/04/12 by Lee.Clark PS4 - Fix file handle leaks #codereview Marcus.Wassmer Change 2940903 on 2016/04/12 by Keith.Judge Xbox One - Fix validated driver warning when using async compute with dynamic buffers (remove dynamic buffers). Also resurrected the old deferred deletion queue (though simplified the code a lot as TQueue was horrible), as the platform agnostic one can't deal with placement created resources with a separate buffer/resource. Moved associated deferred deletion logic out of FD3D11DynamicBuffer. Attempted fix for UE-28758, but doesn't actually fix it. :( Change 2940959 on 2016/04/12 by Michael.Trepka Moved RadioEffectUnit CoreAudio plugin to Engine/Build/Mac #codereview Ben.Marsh Change 2940961 on 2016/04/12 by Michael.Trepka Updated GitHub readme for Mac with instructions for building ShaderCompileWorker #codereview Ben.Marsh Change 2941010 on 2016/04/12 by Daniel.Lamb Added xboxone.projectsettings to the ini processer in UAT. #codereview Peter.Sauerbrei #jira UE-29344 Change 2941671 on 2016/04/12 by Jeff.Campeau March 2016 XDK update Change 2941998 on 2016/04/13 by Mark.Satterthwaite In MetalCommandList retain/release when waiting on a command-buffer commit to ensure command-buffer lifetime. Change 2942008 on 2016/04/13 by Keith.Judge Fix black reflections when async compute is enabled. Shader resource tables are now properly bound on the context, and fixed a bug where the buffer offset and number of constants wasn't being properly set when using the ring buffer. #jira UE-28758 Change 2942046 on 2016/04/13 by Lee.Clark Fix SN-DBS launch failure with VS2015 - (path env seems to be null in 2015) Change 2942095 on 2016/04/13 by Mark.Satterthwaite MIssed a change to use AlignedStride rather than Stride in FMetalDynamicRHI::RHIReadSurfaceFloatData to fix iOS blit-encoder read-back. Change 2942383 on 2016/04/13 by Mark.Satterthwaite When locking a Metal texture on Mac you have to submit the command buffer & wait in a different location for Private vs. Managed access. Change 2942419 on 2016/04/13 by Michael.Trepka Enabled IPv6 on iOS and Mac #jira UEPLAT-1168 Change 2942472 on 2016/04/13 by Daniel.Lamb Fixed missing OnlineSubsystemGooglePlay section from UAT ini processing. #codereview Peter.Sauerbrei Change 2942829 on 2016/04/13 by Nick.Shin removing unused emscripten - ThirdParty folder the ThirdParty folder contained a lot of GPL (and sometimes missing) licensing projects - removing to keep legal happy tested with full rebuild, full cook, packaging and running QAGame Change 2943110 on 2016/04/13 by Chris.Babcock Fix Oculus mobile binaries filtered (include the jars) #jira ue-29080 #ue4 #android #gearvr #codereview Nick.Whiting Change 2943111 on 2016/04/13 by Chris.Babcock Fix GearVR plugin jar copy logic #jira UE-29108 #ue4 #android #gearvr #codereview Nick.Whiting Change 2943579 on 2016/04/14 by Peter.Sauerbrei enable tvOS in the binary build Change 2943587 on 2016/04/14 by Peter.Sauerbrei creation of xcode archives when using the archive command with iOS Change 2943619 on 2016/04/14 by Mark.Satterthwaite Make Metal SM4 the default on Mac again for the 4.12 release. As expected there are many bugs to resolve. Change 2943869 on 2016/04/14 by Daniel.Lamb Turned duplicate stage of files into warning. #codereview Peter.Sauerbrei #lockdown Josh.Adams #jira UE-29487 Change 2943901 on 2016/04/14 by Keith.Judge Xbox One - Fix start up crash. #lockdown Josh.Adams Change 2943981 on 2016/04/14 by Mark.Satterthwaite Allow RWBuffers in Metal, which should work, but not RWTextures which aren't yet supported. #jira UE-29492 #lockdown josh.adams Change 2944080 on 2016/04/14 by Peter.Sauerbrei temporarily remove code which removes passwords in ini files which reside in the pak file multiple defaultengine.ini files were being copied over one another #lockdown josh.adams Change 2944249 on 2016/04/14 by Peter.Sauerbrei fix for writing out DefaulEngine.ini when deprecated items are not changed #codereview zabir.hoque #lockdown josh.adams Change 2944454 on 2016/04/14 by Peter.Sauerbrei fix for incorrect archive directory and missing IPA #jira UE-29509 #lockdown josh.adams Change 2944834 on 2016/04/15 by Peter.Sauerbrei fix for crash on some iOS device due to resources being freed while still in use by the GPU #jira UE-29517 #lockdown josh.adams Change 2944915 on 2016/04/15 by Peter.Sauerbrei fix for including Facebook SDK in tvOS since we don't have the Facebook libraries for tvOS yet #lockdown josh.adams Change 2945164 on 2016/04/15 by Josh.Adams - Removing XboxOnePDBFile from junkmanifest, causes too much trouble when staging using AutoSDKs #lockdown nick.penwarden #codereview jeff.campeau #lockdown nick.penwarden [CL 2945174 by Josh Adams in Main branch]
2016-04-15 10:04:09 -04:00
return IsFeatureLevelSupported(Platform, ERHIFeatureLevel::SM5) && DoesPlatformSupportDistanceFieldAO(Platform) && !IsMetalPlatform(Platform);
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
}
static void ModifyCompilationEnvironment(EShaderPlatform Platform, FShaderCompilerEnvironment& OutEnvironment)
{
FGlobalShader::ModifyCompilationEnvironment(Platform, OutEnvironment);
OutEnvironment.SetDefine(TEXT("COMPOSITE_HEIGHTFIELDS_THREADGROUP_SIZE"), HeightfieldCompositeTileSize);
}
FCompositeHeightfieldsIntoGlobalDistanceFieldCS(const ShaderMetaType::CompiledShaderInitializerType& Initializer)
: FGlobalShader(Initializer)
{
GlobalDistanceFieldParameters.Bind(Initializer.ParameterMap);
GlobalDistanceFieldTexture.Bind(Initializer.ParameterMap, TEXT("GlobalDistanceFieldTexture"));
UpdateRegionSize.Bind(Initializer.ParameterMap, TEXT("UpdateRegionSize"));
VolumeTexelSize.Bind(Initializer.ParameterMap, TEXT("VolumeTexelSize"));
UpdateRegionVolumeMin.Bind(Initializer.ParameterMap, TEXT("UpdateRegionVolumeMin"));
ClipmapIndex.Bind(Initializer.ParameterMap, TEXT("ClipmapIndex"));
AOGlobalMaxSphereQueryRadius.Bind(Initializer.ParameterMap, TEXT("AOGlobalMaxSphereQueryRadius"));
HeightfieldDescriptionParameters.Bind(Initializer.ParameterMap);
HeightfieldTextureParameters.Bind(Initializer.ParameterMap);
}
FCompositeHeightfieldsIntoGlobalDistanceFieldCS()
{
}
void SetParameters(
FRHICommandList& RHICmdList,
const FScene* Scene,
const FSceneView& View,
float MaxOcclusionDistance,
const FGlobalDistanceFieldInfo& GlobalDistanceFieldInfo,
int32 ClipmapIndexValue,
const FVolumeUpdateRegion& UpdateRegion,
UTexture2D* HeightfieldTextureValue,
int32 NumHeightfieldsValue)
{
FComputeShaderRHIParamRef ShaderRHI = GetComputeShader();
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3357411) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3244756 on 2017/01/03 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering) Change 3248667 on 2017/01/05 by Olaf.Piesche Resaving default asset because of engine verison issue; maybe unnecessary, but resaving niagara engine content to be sure #jira UE-40160 Change 3249324 on 2017/01/06 by Marcus.Wassmer Resave with an actual version to stop cook warning Change 3249611 on 2017/01/06 by Marcus.Wassmer Just remove warning-causing niagara data for now. Change 3308052 on 2017/02/16 by Rolando.Caloca DR - Check for Vulkan SDK, and only use it if it's newer or the same as the headers we distribute Change 3308109 on 2017/02/16 by Rolando.Caloca DR - Upgrade glslang to 1.0.39.1 Change 3308111 on 2017/02/16 by Rolando.Caloca DR - Update Vulkan distribution to 1.0.39.1 Change 3308153 on 2017/02/16 by Rolando.Caloca DR - Updated glslang libs Change 3308842 on 2017/02/17 by Rolando.Caloca DR - Fixed copy/paste Change 3310007 on 2017/02/17 by Chris.Bunner Back out CL 3221219 - causing MIC generation issues and superseded by CL 3273971. #jira UE-37792 Change 3310154 on 2017/02/17 by Chris.Bunner Assert when attempting to add a custom material attribute already in the base attributes list. Change 3310155 on 2017/02/17 by Chris.Bunner PR #3231: Validate material index before accessing (Contributed by projectgheist) #jira UE-41774, UE-41788 Change 3310162 on 2017/02/17 by Chris.Bunner PR #3252: Added MobileMaterialInterface to UsedMaterials (Contributed by projectgheist) #jira UE-41823, UE-41950 Change 3310176 on 2017/02/17 by Chris.Bunner Merging CL 3233886: AMD HDR support (requires r.AMDSupportsHDRDisplayOutput=1 in ini). Update to AGS 5.0.5. Partial code tidy up. Change 3310187 on 2017/02/17 by Chris.Bunner Preserve constant expressions rather than always casting after translating a material attribute. Losing the notion of constant means we can't correctly detect used properties and falsely enable e.g. PDO. Happened because of the incorrect component masks in BreakMaterialNodes which then had to be downcast to the correct type which is done as an inline fragment rather than swizzle expression. #jira UE-41594 Change 3310215 on 2017/02/17 by Chris.Bunner Prevent SpeedTree node compiling for skeletal meshes (not supported as uses more UV sets than available). More descriptive error for missing Cubemap UV input on TextureSample material node . #jira UE-33098 Change 3310838 on 2017/02/18 by Joe.Graf Moved some private functions to public for a licensee #CodeReview: matt.kuhlenschmidt #rb: n/a Change 3311876 on 2017/02/20 by Rolando.Caloca DR - Expose skin cache cvar r.SkinCache.AccumulationBufferSizeInKB #jira UE-42014 Change 3314139 on 2017/02/21 by Rolando.Caloca DR - Minor cleanup pass - Remove FVulkanPendingState - Renamed some classes for clarity - Hoist pending UAVs for flush out to pending compute state Change 3314642 on 2017/02/21 by Rolando.Caloca DR - Some more renaming Change 3315431 on 2017/02/21 by Ben.Salem Properly set default values for test time out and tick. We now will default to ticking once per second, and tracking the macro stats of GPU/Render/Game thread time. #tests Ran showdown demo several times Change 3316710 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Fix refract intrinsic Change 3316718 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Built libs to pick up change from 3316710 - refract fix Change 3316820 on 2017/02/22 by Benjamin.Hyder updating Tm-TrigNodes map Change 3317192 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317528 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317639 on 2017/02/22 by Benjamin.Hyder Updating Decal on Complex Mesh example in QA-Decals Change 3317764 on 2017/02/22 by Benjamin.Hyder Final updates to QA-Decals Change 3318319 on 2017/02/22 by Rolando.Caloca DR - minor reorg/rename Change 3318379 on 2017/02/22 by Rolando.Caloca DR - more cleanup Change 3321181 on 2017/02/24 by Rolando.Caloca DR - Fix GL bug Change 3321247 on 2017/02/24 by Rolando.Caloca DR - Fix misc bugs Change 3321898 on 2017/02/24 by Chris.Bunner Only issue clear TLV dispatch if required. #jira UERNDR-193 Change 3321904 on 2017/02/24 by Chris.Bunner Added comment for potential future optimization. Change 3322013 on 2017/02/24 by Uriel.Doyon Fixed separate translucency being affected by Gaussian DOF #jira UE-40489 Change 3322517 on 2017/02/24 by Uriel.Doyon Fixed issue with InvestigateTexture command removing budget limit. Fixed StreamingBounds show flag not working. It nows shows the streaming bound for the currently selected textures. #jira UE-40485 Change 3323470 on 2017/02/27 by Chad.Garyet Removing DDC job from dev-rendering Change 3323479 on 2017/02/27 by Chad.Garyet Removing RDU agent type Change 3323519 on 2017/02/27 by Chad.Garyet removing NCL/LHR/SEA agent types to clean up space Change 3323639 on 2017/02/27 by Benjamin.Hyder More updates to QA-Decals Change 3324207 on 2017/02/27 by Uriel.Doyon Fixed typo ScaleTexturesByGlobalMyBias -> ScaleTexturesByGlobalMipBias Removed bad merge in FStreamingTextureLevelContext::GetBuildDataIndexRef Change 3324396 on 2017/02/27 by Uriel.Doyon Fixed an issue with the Streaming Bounds show flag interferring with the static level data initialization #jira UE-40485 Change 3325227 on 2017/02/28 by Chris.Bunner Fix-up AMD AGS libs. Change 3325566 on 2017/02/28 by Uriel.Doyon Fixed possible out-of-bound access in GetUsedTexture() when passing ERHIFeatureLevel::Num Change 3326009 on 2017/02/28 by Uriel.Doyon Better fix for 3325566, as the previous fix would ignore the material instance overrides. Change 3327058 on 2017/03/01 by Benjamin.Hyder Preparing TM_Shadermodels map for automation Change 3328222 on 2017/03/01 by Chris.Bunner Prevent decals from drawing in separate translucency pass. Whilst user control and material relevance were already removed, if the flag was checked before being disabled (by swapping to decal domain) this was still being read in the render loop, now explicitly ignores decals. #jira UE-42449, UE-42446 Change 3329848 on 2017/03/02 by Uriel.Doyon Added some extra logs to help track UE-42168 Change 3329977 on 2017/03/02 by Rolando.Caloca DR - Fix bad clear value Change 3330008 on 2017/03/02 by Benjamin.Hyder More preparations for QA-Decals automation Change 3330754 on 2017/03/02 by Daniel.Wright Prominent comment explaining reflection env async compute usage and why it's not overlapped with anything Change 3331451 on 2017/03/03 by Marc.Olano Manually unroll simplex noise loop to avoid PSO bug on AMD/Metal Change 3331839 on 2017/03/03 by Rolando.Caloca DR - hlslcc - add missing file to project Change 3332247 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel PR #3305 #jira UE-42393 Change 3332259 on 2017/03/03 by Rolando.Caloca DR - Fix bad index into pixel formats PR #3237 #jira UE-41855 Change 3332305 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers PR #3271 #jira UE-32618 Change 3332313 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel (properly) PR #3305 #jira UE-42393 Change 3332317 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers (properly) PR #3271 #jira UE-32618 Change 3332368 on 2017/03/03 by Rolando.Caloca DR - Minor fixes so -sm4 and -sm5 can be used on windows with OpenGL/Vulkan Change 3333690 on 2017/03/06 by Daniel.Wright [Copy] Changing movable skylight properties no longer affects static draw lists Change 3333693 on 2017/03/06 by Daniel.Wright [Copy] Added 'r.AOListMeshDistanceFields' which dumps out mesh distance fields sorted by memory size, useful for directing content optimizations Change 3333705 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance fields are now 8 bit fixed point by default, but can be changed back to 16 bit floating piont with a project setting. * 8 bit uses half memory but introduces error for thin surfaces or large meshes. Change 3333721 on 2017/03/06 by David.Hill DecalProxy: Copy float FadeScreenSize to FDeferredDecalProxy for use in the render thread. This avoids pointer chasing to the UDecalComponent (game thread component). Change 3333772 on 2017/03/06 by Daniel.Wright [Copy] Scene motion blur data is only updated for the main renderer frames. Fixes scene captures and planar reflections breaking object motion blur. Change 3333790 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance field generation uses Embree, for a 2.5x speedup * Can switch back to old kDOP generation with 'r.DistanceFieldBuild.UseEmbree 0' for debugging Change 3333822 on 2017/03/06 by Daniel.Wright [Copy] Moved mesh distance field code into MeshDistanceFieldUtilities.cpp * Moved FMeshUtilities to its own header so the 8k line MeshUtilites.cpp file can be further split up Change 3333827 on 2017/03/06 by Daniel.Wright [Copy] Range compress 8bit distance fields - gets one extra bit of precision on average Change 3333828 on 2017/03/06 by Daniel.Wright [Copy] Raised High ShadowQuality to 2048 as 1024 for CSM is way too low Change 3333831 on 2017/03/06 by Daniel.Wright Non-editor compile fix Change 3333836 on 2017/03/06 by Daniel.Wright [Copy] Workaround for gobal distance field volume textures being bloated by 4x on PS4 due to the recommended tiling modes. They now use a 2d tiling mode which avoids the bloat, saving 96Mb. Change 3333843 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionExponent to skylight component * Useful for brightening up indoors without losing contact shadows as MinOcclusion does Change 3333845 on 2017/03/06 by Daniel.Wright [Copy] Capsule shadow BP functions Change 3333850 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionCombineMode to skylight component Change 3333854 on 2017/03/06 by Daniel.Wright [Copy] Gnm properly registers clears as GPU work so those events show up in profilegpu Change 3333857 on 2017/03/06 by Daniel.Wright [Copy] Clear light attenuation for local lights with a quad covering their screen extents * Clearing the entire light attenuation buffer costs .1ms on PS4. This optimization lowers the minimum cost of a shadow casting light from .15ms -> .03ms. * Shadowed lights in Fortnite with 25 lights 3.7ms -> 1.42ms on PS4 Change 3333860 on 2017/03/06 by Daniel.Wright [Copy] Flush deferred deletes when reallocating distance field atlas to reduce peak memory Change 3333861 on 2017/03/06 by Daniel.Wright [Copy] Disable all distance field features on Intel cards as HD 4000 hangs in the RHICreateTexture3D call to allocate the large atlas Change 3333869 on 2017/03/06 by Daniel.Wright [Copy] Volumetric Fog using a volume texture mapped to the camera frustum * Volumetric fog can be enabled on an Exponential Height Fog component with additional controls * Lights have a VolumetricScatteringIntensity * New cvars r.VolumetricFog, r.VolumetricFog.GridPixelSize, r.VolumetricFog.GridSizeZ, r.VolumetricFog.DepthDistributionScale * Lighting features supported: * Directional light with CSM and a light function * Point / spot lights without shadows / light functions / IES profiles * Skylight with occlusion from distance fields * Analytical height fog covers the view range past where the volumetric fog ends * Temporal reprojection is used on the volumetric fog scattering and extinction to achieve stability * Translucency integrates properly into volumetric fog * Height fog StartDistance is not supported by volumetric fog and should be set to 0. Change 3333894 on 2017/03/06 by Daniel.Wright [Copy] Initialize GDummyVolumetricFogGlobalDataUniformBuffer outside of parallel rendering Change 3333902 on 2017/03/06 by Daniel.Wright [Copy] Better handling of volumetric fog enabled with distance of 0 Change 3333903 on 2017/03/06 by Daniel.Wright [Copy] Fixed volumetric fog trying to render light functions for a point light Change 3333908 on 2017/03/06 by Daniel.Wright [Copy] Volumetric materials * Added new material domain Volume, which can output Scattering, Absorption and Emissive. All properties are in world space densities. * Particle systems using the Volume domain are voxelized based on their ParticlePosition and ParticleRadius * Volumetric fog integration is now energy conservative - scattering is integrated against transmission over the depth of each slice. * Added bOverrideLightColorsWithFogInscatteringColors to exponential height fog, which can be enabled to make Volumetric Fog match Height fog more closely Change 3334134 on 2017/03/06 by Daniel.Wright [Copy from Michael Trepka] Added Embree 2.14.0 and changed MeshUtilities to use it as this solves issues with Embree leaking TLS keys. UnrealLightmass is still using older Embree 2.7.0 until we can find time to properly test it with the new version. Also, invalidated distance field DDC to force it to rebuild with updated Embree. Change 3334420 on 2017/03/06 by Daniel.Wright Fixed RTDF shadows Change 3335467 on 2017/03/07 by Benjamin.Hyder Initial submission of QA-Decals map to EngineTest Change 3335556 on 2017/03/07 by Daniel.Wright Changed mesh distance field default format back to R16f Change 3338020 on 2017/03/08 by Daniel.Wright Disable volumetric fog in vertex shaders for feature levels which don't support it Change 3339394 on 2017/03/09 by Chris.Bunner Correctly handle material texture translation error edge case. #jira UE-42579, UE-42670 Change 3339992 on 2017/03/09 by Daniel.Wright Only compile volumetric fog shaders on supporting platforms Change 3341858 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) #RB Rolando.Caloca, Marcus.Wassmer, Daniel.Wright, Nick.Penwarden, Mark.Satterthwaite Change 3342004 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) Fix unity build #RB Marcus.Wassmer Change 3343307 on 2017/03/13 by Marcus.Wassmer Update showflags when we are guaranteed it will happen in all possible ways to spawn the scenecapture. (drag into editor, PIE, -game, etc) Change 3343732 on 2017/03/13 by Rolando.Caloca DR - Vulkan compute pipeline & refactor Change 3344846 on 2017/03/14 by Rolando.Caloca DR - Android compile fixes Change 3344883 on 2017/03/14 by Rolando.Caloca DR - Add missing stencil load/store to PSO initializer Change 3344985 on 2017/03/14 by Rolando.Caloca DR - Made load/store actions uint8 Change 3345141 on 2017/03/14 by Rolando.Caloca DR - vk - Rework render pass hash Change 3345304 on 2017/03/14 by Benjamin.Hyder Updating TM-Distancefields map to include TemplateFloor mesh Change 3345387 on 2017/03/14 by Rolando.Caloca DR - Add _RenderThread calls for Create*Shader so RHIs can choose not to stall when creating Change 3345388 on 2017/03/14 by Rolando.Caloca DR - Do not stall when creating shaders on Vulkan Change 3345722 on 2017/03/14 by Chris.Bunner PR #3357: MinimalAPI add to many material expressions (Contributed by DeanoC) #jira UE-42752 Change 3345723 on 2017/03/14 by Chris.Bunner Reduce log verbosity causing spamming during landscape editing. #jira UE-42714 Change 3345725 on 2017/03/14 by Chris.Bunner [Duplicate 3341860] Fixed material translation error with multiple connections from custom interpolator nodes. Change 3345726 on 2017/03/14 by Chris.Bunner Typo fixes. Change 3345732 on 2017/03/14 by Rolando.Caloca DR - Decouple vertex declaration off BSS Change 3345746 on 2017/03/14 by Chris.Bunner Added sign() intrinsic material graph node and delisted material function workaround. Change 3346042 on 2017/03/14 by Chris.Bunner Implement missing size query interface for FRenderTargetResources. #jira UE-41672 Change 3346387 on 2017/03/14 by Daniel.Wright [Copy] Added VolumetricScatteringIntensity to particle lights Change 3346389 on 2017/03/14 by Daniel.Wright [Copy] Clamp Volumetric material attributes to fp16 range to avoid INFs Disable volumetric fog when the fog show flag is disabled Change 3346392 on 2017/03/14 by Daniel.Wright [Copy] Fixed skylight being much too bright on volumetric fog Change 3346406 on 2017/03/14 by Daniel.Wright [Copy] CSM resolution is now controlled by r.Shadow.MaxCSMResolution. * Changed HighPC to use 1024 MaxShadowResolution (max for all non-CSM shadows), saves 60Mb in Fortnite Change 3346412 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for translucency lighting 3d textures, saves 13Mb Change 3346414 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for volumetric fog 3d textures, saves 13Mb Change 3346415 on 2017/03/14 by Daniel.Wright [Copy] Missing file from cl 3338451 Change 3346421 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaNs in volumetric fog due to rendering when height fog is disabled * Volumetric fog converts NaNs to black now so they don't spread Change 3346422 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaN in volumetric fog with low density values Change 3346423 on 2017/03/14 by Daniel.Wright [Copy] Changed default VolumetricFogScatteringDistribution to .2 Change 3346430 on 2017/03/14 by Daniel.Wright [Copy] New translucent material option to compute fog per pixel instead of the default per vertex Change 3346432 on 2017/03/14 by Daniel.Wright [Copy] Moved Volumetric Fog parameters to view uniform buffer for translucency pass Fixed lifetimes of temporary Volumetric Fog render targets Change 3346526 on 2017/03/14 by Daniel.Wright [Copy] Volumetric Fog supports point and spot light shadows * These lights are injected separately so that per-light resources can be bound (shadow depth map, static shadow depth map) * Forward lighting of local lights can be forced with 'r.VolumetricFog.InjectShadowedLightsSeparately 0' * Shadowed lights come at a cost: 2.9ms for volumetric fog on 970 -> 4.2ms with shadowing Change 3347053 on 2017/03/15 by Rolando.Caloca DR - android compile fix Change 3347384 on 2017/03/15 by Rolando.Caloca DR - Fix merge issue Change 3347643 on 2017/03/15 by Marcus.Wassmer Fix some bugs with the 'disable stationary skylight ffor the project' feature. Fixes lighting in Persona on Paragon. Change 3347979 on 2017/03/15 by Rolando.Caloca DR - Allow to automatically apply cached rendertargets to PSO initializer Change 3348024 on 2017/03/15 by Rolando.Caloca DR - Remove NullPS on Vulkan to avoid deadlock Change 3348303 on 2017/03/15 by Rolando.Caloca DR - Fix for debugging SCW with material SRT Change 3348357 on 2017/03/15 by Marcus.Wassmer Fix stencildither and a stencilref bug that was probably breaking decals sometimes. Change 3348549 on 2017/03/15 by Marcus.Wassmer Hopefully fix static analysis for potential nullptr access. Change 3348614 on 2017/03/15 by Marcus.Wassmer Duplicate some switch changes to fix crash on launch. Change 3349369 on 2017/03/16 by Gil.Gribb Fixed botched merge Change 3349947 on 2017/03/16 by Rolando.Caloca DR - Fix for mismatched primitive type Change 3349956 on 2017/03/16 by Benjamin.Hyder initial updates to TM-DistanceFields map Change 3350151 on 2017/03/16 by Rolando.Caloca DR - Fix UT compile issue Change 3350155 on 2017/03/16 by Rolando.Caloca DR - Catch mismatched primitive type on PSOs on D3D11 Change 3350192 on 2017/03/16 by Daniel.Wright Fix for point light shadow depths rendering with wrong cull mode due to PSO refactor Change 3350736 on 2017/03/16 by Daniel.Wright Fixed formatting from merge Change 3350881 on 2017/03/16 by Rolando.Caloca DR - Fix texture arrays as UAVs on Metal Change 3350927 on 2017/03/16 by Rolando.Caloca DR - Fix warning Change 3350935 on 2017/03/16 by Daniel.Wright Fix for materials with non-Surface domains being skipped in mesh passes Change 3351583 on 2017/03/17 by Marcus.Wassmer Fix clang platforms Change 3351917 on 2017/03/17 by Marcus.Wassmer Fix linux compile Change 3351973 on 2017/03/17 by Marcus.Wassmer Fix mismatched rendertargetformat Change 3352038 on 2017/03/17 by Daniel.Wright Enabled GetAndOrCreateGraphicsPipelineState ensures in Development for testing Change 3352110 on 2017/03/17 by Marcus.Wassmer Fix missing RT PSO apply Change 3352695 on 2017/03/17 by Arne.Schober DR - Remove PSO Rendertarget check in DX12 Resolve with Shader. #RB Rolando.Caloca Change 3352960 on 2017/03/17 by Arne.Schober DR - Fix some things that slipped trough the PSO merge #RB none Change 3353150 on 2017/03/18 by Rolando.Caloca DR - compile fix Change 3353205 on 2017/03/18 by Arne.Schober DR - Fix Incremental Compile and PS4 runtime error where CMASK is not allowed for ThickTile Mode #RB none Change 3353207 on 2017/03/18 by Arne.Schober DR - Fix Confusion #RB none Change 3355183 on 2017/03/20 by Nick.Bullard Fixed up Content orginzation for Decals automation tests in EngineTest Change 3355627 on 2017/03/20 by Arne.Schober DR - [UE-43094] - removed ensure in comporiton graph as control of the clear color cannot be gurantueed. Change 3356342 on 2017/03/21 by Marcus.Wassmer Fix clang errors Change 3356591 on 2017/03/21 by Arne.Schober DR - Fix ensure message #RB none Change 3356873 on 2017/03/21 by Arne.Schober DR - Fix comparission of undefined values in RendertargetApply Check Change 3357261 on 2017/03/21 by Marcus.Wassmer Fix LinuxEditor compile Change 3357294 on 2017/03/21 by Marcus.Wassmer Add missing SSE functions Change 3357351 on 2017/03/21 by Frank.Fella Fix win32 and linux compiler errors Change 3357370 on 2017/03/21 by Arne.Schober DR - disable ensure in test builds #RB Marcus.Wassmer [CL 3357449 by Marcus Wassmer in Main branch]
2017-03-21 17:46:52 -04:00
FGlobalShader::SetParameters<FViewUniformShaderParameters>(RHICmdList, ShaderRHI, View.ViewUniformBuffer);
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
GlobalDistanceFieldParameters.Set(RHICmdList, ShaderRHI, GlobalDistanceFieldInfo.ParameterData);
const FSceneRenderTargetItem& ClipMapRTI = GlobalDistanceFieldInfo.Clipmaps[ClipmapIndexValue].RenderTarget->GetRenderTargetItem();
RHICmdList.TransitionResource(EResourceTransitionAccess::ERWBarrier, EResourceTransitionPipeline::EComputeToCompute, ClipMapRTI.UAV);
GlobalDistanceFieldTexture.SetTexture(RHICmdList, ShaderRHI, ClipMapRTI.ShaderResourceTexture, ClipMapRTI.UAV);
SetShaderValue(RHICmdList, ShaderRHI, UpdateRegionSize, UpdateRegion.CellsSize);
SetShaderValue(RHICmdList, ShaderRHI, VolumeTexelSize, FVector(1.0f / GAOGlobalDFResolution));
SetShaderValue(RHICmdList, ShaderRHI, UpdateRegionVolumeMin, UpdateRegion.Bounds.Min);
SetShaderValue(RHICmdList, ShaderRHI, ClipmapIndex, ClipmapIndexValue);
extern float GAOConeHalfAngle;
const float GlobalMaxSphereQueryRadius = MaxOcclusionDistance / (1 + FMath::Tan(GAOConeHalfAngle));
SetShaderValue(RHICmdList, ShaderRHI, AOGlobalMaxSphereQueryRadius, GlobalMaxSphereQueryRadius);
HeightfieldDescriptionParameters.Set(RHICmdList, ShaderRHI, GetHeightfieldDescriptionsSRV(), NumHeightfieldsValue);
HeightfieldTextureParameters.Set(RHICmdList, ShaderRHI, HeightfieldTextureValue, NULL);
}
void UnsetParameters(FRHICommandList& RHICmdList, const FGlobalDistanceFieldInfo& GlobalDistanceFieldInfo, int32 ClipmapIndexValue)
{
GlobalDistanceFieldTexture.UnsetUAV(RHICmdList, GetComputeShader());
const FSceneRenderTargetItem& ClipMapRTI = GlobalDistanceFieldInfo.Clipmaps[ClipmapIndexValue].RenderTarget->GetRenderTargetItem();
RHICmdList.TransitionResource(EResourceTransitionAccess::EReadable, EResourceTransitionPipeline::EComputeToCompute, ClipMapRTI.UAV);
}
virtual bool Serialize(FArchive& Ar)
{
bool bShaderHasOutdatedParameters = FGlobalShader::Serialize(Ar);
Ar << GlobalDistanceFieldParameters;
Ar << GlobalDistanceFieldTexture;
Ar << UpdateRegionSize;
Ar << VolumeTexelSize;
Ar << UpdateRegionVolumeMin;
Ar << ClipmapIndex;
Ar << AOGlobalMaxSphereQueryRadius;
Ar << HeightfieldDescriptionParameters;
Ar << HeightfieldTextureParameters;
return bShaderHasOutdatedParameters;
}
private:
FGlobalDistanceFieldParameters GlobalDistanceFieldParameters;
FRWShaderParameter GlobalDistanceFieldTexture;
FShaderParameter UpdateRegionSize;
FShaderParameter VolumeTexelSize;
FShaderParameter UpdateRegionVolumeMin;
FShaderParameter ClipmapIndex;
FShaderParameter AOGlobalMaxSphereQueryRadius;
FHeightfieldDescriptionParameters HeightfieldDescriptionParameters;
FHeightfieldTextureParameters HeightfieldTextureParameters;
};
IMPLEMENT_SHADER_TYPE(, FCompositeHeightfieldsIntoGlobalDistanceFieldCS, TEXT("GlobalDistanceField"), TEXT("CompositeHeightfieldsIntoGlobalDistanceFieldCS"), SF_Compute);
extern void UploadHeightfieldDescriptions(const TArray<FHeightfieldComponentDescription>& HeightfieldDescriptions, FVector2D InvLightingAtlasSize, float InvDownsampleFactor);
void FHeightfieldLightingViewInfo::CompositeHeightfieldsIntoGlobalDistanceField(
FRHICommandList& RHICmdList,
const FScene* Scene,
const FViewInfo& View,
float MaxOcclusionDistance,
const FGlobalDistanceFieldInfo& GlobalDistanceFieldInfo,
int32 ClipmapIndexValue,
const FVolumeUpdateRegion& UpdateRegion) const
{
const int32 NumPrimitives = Scene->DistanceFieldSceneData.HeightfieldPrimitives.Num();
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
if (GAOGlobalDistanceFieldRepresentHeightfields
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
&& NumPrimitives > 0
Copying //UE4/Dev-Platform to //UE4/Main (Source: //UE4/Dev-Platform @ 2945165) ========================== MAJOR FEATURES + CHANGES ========================== Change 2731596 on 2015/10/16 by Keith.Judge Fast Semantics - Add deferred resource deletion queue to make deleted resources be actually deleted a number of frames later so that the GPU is definitely finished with them. Hooked up the temporary SRVs for dynamic VBs as a first step. Change 2764244 on 2015/11/12 by Keith.Judge Stop creating/deleting SRVs for dynamic buffers all the time. Instead add a new view when a new buffer is cycled and keep them around until the buffer is dead. Change 2936695 on 2016/04/07 by Mark.Satterthwaite Disable Metal on Mac OS X versions prior to 10.11.4 because their drivers are too buggy and lots of issues users will encounter were fixed in 10.11.4. Change 2936701 on 2016/04/07 by Mark.Satterthwaite Re-enable DistanceFieldAO on Mac Metal, but disable the support for Heightfield composition into the DistanceFieldAO as that currently requires Read+Write texture support which isn't present in Metal yet. Change 2936702 on 2016/04/07 by Mark.Satterthwaite Enable capsule shadows on Metal SM5 for Mac Paragon. Change 2936704 on 2016/04/07 by Mark.Satterthwaite Re-add Metal_MacES3_1 entries in RHIDefinition functions - not quite sure how they disappeared... Change 2936706 on 2016/04/07 by Mark.Satterthwaite Fix implementation of BeginRenderingPrePass to only clear the depth+stencil buffers when it is asked to - this fixes LOD fading in Paragon on Mac when parallel execution is enabled. The parallel contexts were being instructed to clear the pre-pass data which was then fouling up all the subsequent rendering. Change 2936708 on 2016/04/07 by Mark.Satterthwaite Extend workaround from 2905206 for UE-27818 to Metal as well as OpenGL - it isn't permissable to have an unbound resource on these APIs as you aren't allowed to make the assumption than an if-branch will protect against null dereference on the GPU. Change 2936737 on 2016/04/07 by Mark.Satterthwaite Changes to reduce Metal's peak memory usage when streaming texture data in when loading levels: - When uploading data to Shared memory Metal textures dispose of the source buffer immediately as it won't be required after the call to replaceRegion. - Add an optional CVar ("rhi.Metal.MaxOutstandingAsyncTexUploads") and keep a count of outstanding async. texture uploads - really this might be better as a memory count - if teh current outstanding count is greater than the specified CVar wait for all operations to complete before purging the used buffers. By default the CVar is set to 0 which disables the tracking and the game will use all available memory. Change 2936741 on 2016/04/07 by Mark.Satterthwaite For Metal don't use a texture-view unless the mip-range or texture format is different - we only create a texture-view if we absolutely have to in order to avoid performance reductions on some vendors. When this happens we'll have to update the SRV to point to the new source texture but that can be handled on bind. Change 2936753 on 2016/04/07 by Mark.Satterthwaite Mutex the pipeline cache inside each Metal bound-shader-state - with parallel execution this can be modified from multiple threads. Change 2936758 on 2016/04/07 by Mark.Satterthwaite Don't eliminate redundant Metal command-encoders on Intel either - this only seems to work correctly on AMD... Change 2936762 on 2016/04/07 by Mark.Satterthwaite Validate that parallel contexts aren't asked to clear render-targets (incl depth/stencil) as this won't work as desired - each context will end up clearing the contents meaning none of the rendering will propagate which is probably not what you intend. Change 2936765 on 2016/04/07 by Mark.Satterthwaite Fix alignment of blit-commands between textures & buffers in Metal. Mac Metal has no minimum alignment and allows tightly packed rows, but mobile H/W has a fixed row alignment of 64-bytes for linear texture data. Change 2936767 on 2016/04/07 by Mark.Satterthwaite Remove the workaround for old Mac OS X AMD cards not supporting frame-buffer sRGB - I have a feeling it causes more problems than it solves. Change 2936773 on 2016/04/07 by Mark.Satterthwaite Warn the user when trying to run on OpenGL 3.2 GPUs that they are running unsupported and there may be rendering errors and performance problems. Change 2936777 on 2016/04/07 by Mark.Satterthwaite In MetalRHI move the calls that reset the state-cache and command-encoder to EndFrame and fix the fallout. Hopefully this will eliminate the problem where a draw call fails because a call to SetRenderTargets hasn't been made since the last time the command-buffer was submitted. Change 2936788 on 2016/04/07 by Daniel.Lamb PR #2193: Commandlet tweaks: Fixup Redirects and Cook-on-the-fly Server (Contributed by FineRedMist) Change 2936799 on 2016/04/07 by Mark.Satterthwaite Automatically enable RHI thread support in Metal if parallel execution is compiled in unless running the Editor which doesn't support it. The RHI thread is only enabled in Metal when parallel execution is available because there's little to no advantages to the RHI thread with Metal unless parallel execution is enabled. Also only enable async compute support when parallel execution is available and running on an AMD GPU as only AMD support it and again its only really useful when running things in parallel. Disable the async. compute fencing code when async. compute is disabled. Change 2936923 on 2016/04/07 by Peter.Sauerbrei add exception when trying to access an ini section which we don't cache Change 2937839 on 2016/04/08 by Lee.Clark PS4 - BC6H and BC7 support #jira OR-14804 Change 2937841 on 2016/04/08 by Lee.Clark Fix media player not opening the same file that has just been closed. Change 2937843 on 2016/04/08 by Lee.Clark PS4 - Fix Media player seek and duration Change 2938272 on 2016/04/08 by Mark.Satterthwaite Change the Metal texture lock/unlock code to use the immediate getBytes API to read from the texture unless it is stored in Private memory and requires we use the blit-encoder. This means iOS texture locking won't generally need to use the blit encoder with its row alignment restrictions as all textures will be Shared. On Mac most textures will still be accessed via blit operations since Private memory is much more common than Managed. Change 2938471 on 2016/04/08 by Nick.Shin load/save with "negative char size value" fixed two 'U''s were missing-- causing base64encoder and decoder to not return original data (or expected data for that matter)... C++ function shows these signatures: \Engine\Source\Runtime\Engine\Public\SaveGameSystem.h FGenericSaveGameSystem::SaveGame(..., ..., ..., ...<uint8>); FGenericSaveGameSystem::LoadGame(..., ..., ..., ...<uint8>); javascript version was using signed operators... answerhub 386719 - broken load/save Change 2938546 on 2016/04/08 by Nick.Shin upgrading zlib, openssl, libcurl & libwebsockets (part 1, for libwebsockets -- engine changes will be in part 2) and writting down all of the build instructions into a single script file future tasks may have the script broken to smaller sized files and placed in the respective library folders for simplicity -- for now, it's bundled together to ensure build is all-in-one... #jira UEPLAT-1246 - Update libWebsockets #jira UEPLAT-1221 - update websocket library #jira UEPLAT-1223 - Arrange testing for 'update websocket library' #jira UEPLAT-1203 - Add Linux library for libwebsockets #jira UEPLAT-1204 - Rebuild libwebsockets with SSL Change 2939402 on 2016/04/11 by Mark.Satterthwaite No need to call synchroniseTexture on iOS when locking a texture for read - that function call is only available and required on OS X. Change 2939526 on 2016/04/11 by Daniel.Lamb Don't mark content as modified when garbage collecting. #PR2249 #2249 Change 2940094 on 2016/04/11 by Michael.Trepka Temporarily rollback //UE4/Dev-Platform/Engine/Source/Runtime/Apple/MetalRHI/Private/MetalContext.cpp to revision 47 to unblock UE-29312 #codereview Mark.Satterthwaite Change 2940540 on 2016/04/12 by Jeff.Campeau Hardware decompress support for Xbox One. Change 2940793 on 2016/04/12 by Lee.Clark PS4 - Fix file handle leaks #codereview Marcus.Wassmer Change 2940903 on 2016/04/12 by Keith.Judge Xbox One - Fix validated driver warning when using async compute with dynamic buffers (remove dynamic buffers). Also resurrected the old deferred deletion queue (though simplified the code a lot as TQueue was horrible), as the platform agnostic one can't deal with placement created resources with a separate buffer/resource. Moved associated deferred deletion logic out of FD3D11DynamicBuffer. Attempted fix for UE-28758, but doesn't actually fix it. :( Change 2940959 on 2016/04/12 by Michael.Trepka Moved RadioEffectUnit CoreAudio plugin to Engine/Build/Mac #codereview Ben.Marsh Change 2940961 on 2016/04/12 by Michael.Trepka Updated GitHub readme for Mac with instructions for building ShaderCompileWorker #codereview Ben.Marsh Change 2941010 on 2016/04/12 by Daniel.Lamb Added xboxone.projectsettings to the ini processer in UAT. #codereview Peter.Sauerbrei #jira UE-29344 Change 2941671 on 2016/04/12 by Jeff.Campeau March 2016 XDK update Change 2941998 on 2016/04/13 by Mark.Satterthwaite In MetalCommandList retain/release when waiting on a command-buffer commit to ensure command-buffer lifetime. Change 2942008 on 2016/04/13 by Keith.Judge Fix black reflections when async compute is enabled. Shader resource tables are now properly bound on the context, and fixed a bug where the buffer offset and number of constants wasn't being properly set when using the ring buffer. #jira UE-28758 Change 2942046 on 2016/04/13 by Lee.Clark Fix SN-DBS launch failure with VS2015 - (path env seems to be null in 2015) Change 2942095 on 2016/04/13 by Mark.Satterthwaite MIssed a change to use AlignedStride rather than Stride in FMetalDynamicRHI::RHIReadSurfaceFloatData to fix iOS blit-encoder read-back. Change 2942383 on 2016/04/13 by Mark.Satterthwaite When locking a Metal texture on Mac you have to submit the command buffer & wait in a different location for Private vs. Managed access. Change 2942419 on 2016/04/13 by Michael.Trepka Enabled IPv6 on iOS and Mac #jira UEPLAT-1168 Change 2942472 on 2016/04/13 by Daniel.Lamb Fixed missing OnlineSubsystemGooglePlay section from UAT ini processing. #codereview Peter.Sauerbrei Change 2942829 on 2016/04/13 by Nick.Shin removing unused emscripten - ThirdParty folder the ThirdParty folder contained a lot of GPL (and sometimes missing) licensing projects - removing to keep legal happy tested with full rebuild, full cook, packaging and running QAGame Change 2943110 on 2016/04/13 by Chris.Babcock Fix Oculus mobile binaries filtered (include the jars) #jira ue-29080 #ue4 #android #gearvr #codereview Nick.Whiting Change 2943111 on 2016/04/13 by Chris.Babcock Fix GearVR plugin jar copy logic #jira UE-29108 #ue4 #android #gearvr #codereview Nick.Whiting Change 2943579 on 2016/04/14 by Peter.Sauerbrei enable tvOS in the binary build Change 2943587 on 2016/04/14 by Peter.Sauerbrei creation of xcode archives when using the archive command with iOS Change 2943619 on 2016/04/14 by Mark.Satterthwaite Make Metal SM4 the default on Mac again for the 4.12 release. As expected there are many bugs to resolve. Change 2943869 on 2016/04/14 by Daniel.Lamb Turned duplicate stage of files into warning. #codereview Peter.Sauerbrei #lockdown Josh.Adams #jira UE-29487 Change 2943901 on 2016/04/14 by Keith.Judge Xbox One - Fix start up crash. #lockdown Josh.Adams Change 2943981 on 2016/04/14 by Mark.Satterthwaite Allow RWBuffers in Metal, which should work, but not RWTextures which aren't yet supported. #jira UE-29492 #lockdown josh.adams Change 2944080 on 2016/04/14 by Peter.Sauerbrei temporarily remove code which removes passwords in ini files which reside in the pak file multiple defaultengine.ini files were being copied over one another #lockdown josh.adams Change 2944249 on 2016/04/14 by Peter.Sauerbrei fix for writing out DefaulEngine.ini when deprecated items are not changed #codereview zabir.hoque #lockdown josh.adams Change 2944454 on 2016/04/14 by Peter.Sauerbrei fix for incorrect archive directory and missing IPA #jira UE-29509 #lockdown josh.adams Change 2944834 on 2016/04/15 by Peter.Sauerbrei fix for crash on some iOS device due to resources being freed while still in use by the GPU #jira UE-29517 #lockdown josh.adams Change 2944915 on 2016/04/15 by Peter.Sauerbrei fix for including Facebook SDK in tvOS since we don't have the Facebook libraries for tvOS yet #lockdown josh.adams Change 2945164 on 2016/04/15 by Josh.Adams - Removing XboxOnePDBFile from junkmanifest, causes too much trouble when staging using AutoSDKs #lockdown nick.penwarden #codereview jeff.campeau #lockdown nick.penwarden [CL 2945174 by Josh Adams in Main branch]
2016-04-15 10:04:09 -04:00
&& SupportsDistanceFieldAO(Scene->GetFeatureLevel(), Scene->GetShaderPlatform())
&& !IsMetalPlatform(Scene->GetShaderPlatform()))
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
{
FHeightfieldDescription UpdateRegionHeightfield;
float LocalToWorldScale = 1;
for (int32 HeightfieldPrimitiveIndex = 0; HeightfieldPrimitiveIndex < NumPrimitives; HeightfieldPrimitiveIndex++)
{
const FPrimitiveSceneInfo* HeightfieldPrimitive = Scene->DistanceFieldSceneData.HeightfieldPrimitives[HeightfieldPrimitiveIndex];
const FBoxSphereBounds& PrimitiveBounds = HeightfieldPrimitive->Proxy->GetBounds();
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3154632) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3122543 on 2016/09/13 by Ben.Woodhouse Override HasOcclusion for Widget3DComponentProxy to detect if the material is has depth testing enabled. #jira UE-35878 Change 3122544 on 2016/09/13 by Ben.Woodhouse Shadow stencil optimisation with cvar (enabled by default) Avoids redundant clearing the stencil buffer for per-object and pre shadows by zeroing the stencil during testing, following discussion on UDN. This means we don't benefit from Hi Stencil on GCN for the shadow projection draw calls, but it's still faster in all the cases I could find, including for the player character where the bounding box is quite large. (Note: early stencil still works fine, according to PIX) Shadow projection GPU time profiling : Test map with 35 characters, stationary directional light - 4ms-2ms on XB1 - 2.5ms to 0.9ms on PC (r9-390X) - 3ms-2ms on PS4 Paragon PS4 (roughly 20% reduced - from ~0.39ms) Change 3122687 on 2016/09/13 by Rolando.Caloca DR - vk - Fix ES31 crash Change 3122691 on 2016/09/13 by Rolando.Caloca DR - vk - Fixes for SDK 1.0.26.0 Change 3122778 on 2016/09/13 by Rolando.Caloca DR - vk - Fix number of layers on barrier Change 3122921 on 2016/09/13 by Rolando.Caloca DR - vk - Fix ES3.1 Change 3122925 on 2016/09/13 by Ben.Woodhouse Fix sky lighting issue with skin and re-enable non-checkerboard lighting by default (fallout from lightaccumulator refactor) #jira UE-35904 Change 3123016 on 2016/09/13 by Chris.Bunner Fixed adaptive tessellation, broken by CL 3089208 refactor. #jira UE-35341 Change 3123079 on 2016/09/13 by Rolando.Caloca DR - vk - Force StoreOp store instead of DontCare everywhere (temporarily) Change 3123503 on 2016/09/13 by David.Hill #jira UE-25623 converted a check() to checkf() to include better diagnostic information. Change 3123617 on 2016/09/13 by Guillaume.Abadie Fixes artifact when the camera direction is almost parallel to a wide plane with SSR. #jira UE-35128 Change 3123743 on 2016/09/13 by Brian.Karis Separate mesh reduction interfaces for static and skeletal. Zero bad tangents from input mesh. Change 3125378 on 2016/09/14 by Arne.Schober DR - [UE-34481] - Extract all the State which is necessary to execute the DebugTextDrawingDelegate from the SceneProxy into its own Helpers to be drawn to the canvas later on. The issue was that the SceneProxys are only owned by the RT after their creation and the GT should avoid reading from or writing state to them. Change 3125527 on 2016/09/14 by Arne.Schober DR - [UE-34481] - Fix UT build and mac Change 3125741 on 2016/09/14 by Rolando.Caloca DR - Extra debug mode for tracking down SCW crashes (r.DumpSCWQueuedJobs=1) Change 3125763 on 2016/09/14 by Rolando.Caloca DR - vk - Added new Renderpass cache - Fix buffer barrier warning Change 3125769 on 2016/09/14 by Rolando.Caloca DR - Renamed cvar to r.DumpSCWQueuedJobs Change 3125771 on 2016/09/14 by Rolando.Caloca DR - Added support for SV_ClipDistance on GL3 & 4 Change 3125792 on 2016/09/14 by Arne.Schober DR - [UE-34481] - Fix Odin and PS4 Change 3125880 on 2016/09/14 by Arne.Schober DR - [UE-34481] - Fix Fortnite Change 3125968 on 2016/09/14 by Brian.Karis Removed comment Change 3126315 on 2016/09/15 by Ben.Woodhouse GPU profiler robustness - Change stat gathering update to handle multiple views and non-scenerenderer stats (such Slate) properly - Simplify gathering logic - Fix race condition where we could read back queries before they're submitted on the RHI thread. - Fix for movie player stat gathering - disable gathering outside of the main engine tick #jira UE-35975 Change 3126792 on 2016/09/15 by Rolando.Caloca DR - vk - Release render pass cache Change 3126804 on 2016/09/15 by Rolando.Caloca DR - vk - Fix UpdateTexture2D() #jira UE-34151 Change 3126884 on 2016/09/15 by Rolando.Caloca DR - vk - Compile fix Change 3126953 on 2016/09/15 by Rolando.Caloca DR - Enable GPU capture when running OpenGL under RenderDoc - Will also set the memory mode to non coherent so not to kill performance on RenderDoc Change 3126966 on 2016/09/15 by Rolando.Caloca DR - Allow cooking for Vulkan SM4 to help with packaging Change 3127082 on 2016/09/15 by Guillaume.Abadie Wraps up contact shadows for release fixing different artifacts and handling correctly their screen space length. #jira UE-35367, UE-33602, UE-33603, UE-33604 #review-3125887 @brian.karis Change 3127130 on 2016/09/15 by Mark.Satterthwaite Add overloads to as* functions in hlslcc - HLSL allows you to call these on variables of the same type, in which case it simply returns the existing value but we had only defined the float<->u/int conversions, so hlslcc added implicit casts that broke such cases (i.e. asuint(uint) -> floatBitsToUint(float(uint))). This change defines the missing overloads as returns with regular casts. #jira FORT-25869 #jira UE-34263 Change 3127475 on 2016/09/15 by Rolando.Caloca DR - vk - Debug dump Change 3128131 on 2016/09/16 by Ben.Woodhouse (Integrated from //UE4/Private-Partner-NREAL/...) Alpha output support for postprocess materials (optional via a parameter) Needed for end of frame compositing. Could be used to pass intermediate data from one blendable postprocess to another (e.g edge detection) Change 3128135 on 2016/09/16 by Ben.Woodhouse GPU profiler (PS4) - remove bubbles between commandlist submissions from query times Use r.ps4.AdjustRenderQueryTimestamps cvar to enable/disable (defaults to on) Also fixes some potential precision issues with unit GPU timing Change 3128247 on 2016/09/16 by Rolando.Caloca DR - vk - Cache framebuffers Change 3128593 on 2016/09/16 by Rolando.Caloca DR - vk - Fix for crash loading map #jira UE-36072 Change 3128759 on 2016/09/16 by Mark.Satterthwaite Back out changelist 3127130 - its causing a build failure in FPostProcessVelocityScatterVS because hlslcc is picking the wrong as_* overload. Change 3130236 on 2016/09/19 by Chris.Bunner Exposed full SceneCaptureComponent classes instead of select methods. #jira UE-35996 Change 3130388 on 2016/09/19 by Rolando.Caloca DR - Avoid crash when adding dynamic primitives #jira UE-35327 Change 3130393 on 2016/09/19 by Marc.Olano Improve vector noise tooltips & documentation Change 3130547 on 2016/09/19 by Ben.Woodhouse Fix for ensure fail when initializing point light shadowmaps. This came about because cubemap rendertargets always have Extents of (Resolution, 0). The Y component was implicitly used to determine if it was a cubemap, which is odd... The fix was to make the definition explicit via a flag and initialize both the X and Y parameters. I suspect the ensure started happening recently due to a more recent change, but fixing the underlying logic seems like the correct fix. #jira UE-35837 Change 3130578 on 2016/09/19 by Daniel.Wright Workaround OpenGL/NVidia bug with non-power-of-2 textures by disabling CSM atlassing if we're using OpenGL Change 3130682 on 2016/09/19 by Rolando.Caloca DR - Better fix for UE-35327 #jira UE-35327 Change 3130767 on 2016/09/19 by Uriel.Doyon Better handling of color array in VisualizeComplexity code to prevent assert. #jira UE-29332 Change 3130965 on 2016/09/19 by Arne.Schober DR - [UE-35679] - the crash was caused by the Resource of the UTexture being Null. And one of the Kismet Nodes calling a function on that resource. The solution was to disable that call from Kismet when only cooking. Change 3130967 on 2016/09/19 by Chris.Bunner Hid redundant texture sampler properties from texture object parameter. Hid redundant texture property input on texture parameter nodes. Fixed copy-paste error in expression texture parameter docs. #jira UE-32724 Change 3131118 on 2016/09/19 by Mark.Satterthwaite Second attempt - this time with the correct input types. Add overloads to as* functions in hlslcc - HLSL allows you to call these on variables of the same type, in which case it simply returns the existing value but we had only defined the float<->u/int conversions, so hlslcc added implicit casts that broke such cases (i.e. asuint(uint) -> floatBitsToUint(float(uint))). This change defines the missing overloads as returns with regular casts. #jira FORT-25869 #jira UE-34263 Change 3131153 on 2016/09/19 by Rolando.Caloca DR - Fix recompute normals when triangles have a LHS tangent basis Integrate from 3028634 - Also make meshes that don't have morphs be able to run through the recompute normals path #jira UE-35472 Change 3131228 on 2016/09/19 by Mark.Satterthwaite Duplicate CL #3114668: Always disable asynchronous shader compilation for the global shader map on Metal as some of them are needed very early. #jira UE-35240 Change 3131246 on 2016/09/19 by Rolando.Caloca DR - Shrink gpu skinning permutations Change 3131261 on 2016/09/19 by Mark.Satterthwaite Fix Metal validation failures due to particle rendering not binding buffers to all buffer inputs declared in the shader. ContentExamples Effects no longer aborts complaining that the particle system didn't bind a required buffer. Change 3131265 on 2016/09/19 by Mark.Satterthwaite Fix FMetalDynamicRHI::RHIReadSurfaceData for shared textures on iOS. Change 3131271 on 2016/09/19 by Mark.Satterthwaite Use private memory for the Metal stencil SRV workaround needed on El Capitan. Change 3131273 on 2016/09/19 by Mark.Satterthwaite Disable the lazy-encoder construction in Metal for AMD - there is a situation that causes the lazy construction to perform a clear that isn't wanted and so far this hasn't been tracked down and fixed. Until then, this will render correctly. Change 3131280 on 2016/09/19 by Mark.Satterthwaite For GLSL interpolation mode flags must come before storage mode flags and you can't redeclare the system variable gl_Layer to use a differing interpolation mode. Change 3131283 on 2016/09/19 by Mark.Satterthwaite Change the ShaderCache to not cache resource bindings in the draw states for shader platforms that don't care - reduces the number of draw states considered significantly without reducing effectiveness. We can support ShaderCache with Metal SM5 but not the RHI thread enabled so change when we enable it and make sure we load the binary shader cache. Change 3131402 on 2016/09/19 by Rolando.Caloca DR - Disambiguate callstack #jira UE-34415 Change 3131469 on 2016/09/19 by Rolando.Caloca DR - vk - Check if we can allocate descriptors off a pool Change 3131482 on 2016/09/19 by Rolando.Caloca DR - vk - Remove unused var Change 3131506 on 2016/09/19 by Mark.Satterthwaite With permission from Josh.A & Michael.T, deprecate Mac OpenGL support. For now this just means visibly warning users with message boxes - but in a future release OpenGL support will be removed from macOS. Change 3131536 on 2016/09/19 by Rolando.Caloca DR - vk - Compile fix Change 3131564 on 2016/09/19 by Rolando.Caloca DR - vk - Submit Hint - Disable framebuffer recycling as its causing a hang Change 3131625 on 2016/09/19 by Mark.Satterthwaite Inside MetalRHI add an optional cache for disposed texture objects so we may reuse them - controlled by CVAR rhi.Metal.TextureCacheMode which must be set prior to running as it can't be changed at runtime. Settings: 0 = off, 1 (default) = will attempt to reuse private memory texture objects within the frame they are released otherwise they are disposed of as before. Setting 2 extends the caching to all textures - though Managed/Shared textures cannot be reused until after the frame in which they were released has been processed on the GPU. In this mode id<MTLTexture> objects are never returned to the OS so in order to conserve VRAM calls to setPurgeableState are made to allow the driver to reclaim unusued memory if required. Change 3131630 on 2016/09/19 by Mark.Satterthwaite More statistics in Metal added to track down where performance was going in a particular project but which may be more generally useful. Change 3131955 on 2016/09/20 by Gil.Gribb Merging //UE4/Dev-Main@3129758 to Dev-Rendering (//UE4/Dev-Rendering) Change 3131978 on 2016/09/20 by Gil.Gribb CIS fix Change 3132584 on 2016/09/20 by Ben.Woodhouse Add some additional checks to help track down a rare crash with terrain rendering and shader recompiling #jira UE-35937 Change 3132696 on 2016/09/20 by Mark.Satterthwaite Use set*Bytes to handle uploading buffers < 4Kb when available - this is faster than lots of small Metal buffers and reduces the amount of GPU heap fragmentation. Where the API feature isn't available or hasn't been tested yet we'll use another ring-buffer inside the MetalCommandEncoder to emulate it. Change 3132772 on 2016/09/20 by Mark.Satterthwaite Rework Metal's handling of RHISetStreamSource calls that override the stride of vertex declarations to be much more efficient. Change 3132870 on 2016/09/20 by Ben.Woodhouse Fix mac compile error Change 3133049 on 2016/09/20 by Brian.Karis Changed light source shapes in reflection captures to use alpha Change 3133057 on 2016/09/20 by Brian.Karis Alphaed out on spot light cone as well. Change 3133263 on 2016/09/20 by Rolando.Caloca DR - vk - Debug names for objects Change 3133292 on 2016/09/20 by Rolando.Caloca DR - vk - Fix SRGB upload/formats Change 3133395 on 2016/09/20 by Rolando.Caloca DR - vk - SM5 fixes Change 3134026 on 2016/09/21 by Gil.Gribb Merging //UE4/Dev-Main@3133983 to Dev-Rendering (//UE4/Dev-Rendering) Change 3134663 on 2016/09/21 by Chris.Bunner Merging Dev-MaterialLayers to Dev-Rendering, CL 3134208. Initial material attribute extensibility changes. #jira UE-34347 Change 3134730 on 2016/09/21 by Arne.Schober DR - [UE-34481] - Fix minor brokenness found by Gil Change 3134792 on 2016/09/21 by Chris.Bunner Fixed compile errors for non-editor builds. Change 3135214 on 2016/09/21 by Rolando.Caloca DR - vk - Fix visualize texture - Dump memory when OOM (to track leaks) Change 3135225 on 2016/09/21 by Rolando.Caloca DR - vk - Ensure on exit if mem leak - Update fences if running wait for idle Change 3135672 on 2016/09/22 by Gil.Gribb Merging //UE4/Dev-Main@3135568 to Dev-Rendering (//UE4/Dev-Rendering) Change 3135793 on 2016/09/22 by Rolando.Caloca DR - vk - Set dynamic state after binding pipeline or on a fresh cmd buffer Change 3135816 on 2016/09/22 by Rolando.Caloca DR - Add names for d3d on renderdoc Change 3135894 on 2016/09/22 by Chris.Bunner Fixed initialization order warning. Change 3136024 on 2016/09/22 by Rolando.Caloca DR - vk - Fix stencil faces Change 3136042 on 2016/09/22 by Marcus.Wassmer Fix compile error Change 3136046 on 2016/09/22 by Chris.Bunner Renamed material for PostTonemapHDRColor visualization to reflect actual usage. Change 3136308 on 2016/09/22 by Uriel.Doyon Changed how the component relative rotation is computed, in order to have more consistency after blueprint rescript. #jira UE-36094 Change 3136798 on 2016/09/22 by Chris.Bunner Gather object references from stereo view state in USceneCaptureComponent. This matches behavior of other classes such as ULocalPlayer. Change 3137092 on 2016/09/22 by Rolando.Caloca DR - vk - Rename pipeline to gfx pipeline Change 3137263 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3135157: Fix one cause of Metal crashes loading into a zone - the PlanarReflection shader code needs to always set the IsStereoParameter so that the shader can perform the if-test without causing an invalid GPU access. #jira FORT-30061 Change 3137265 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3135169: Correct Metal texture creation for AVF media framework - we can't provide a render-targetable version of the texture without blitting. The native texture we get is a GPU copy that can be made CPU accessible (i.e. it is not tiled). Change 3137266 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3135237: Metal validation layer fix: under Metal if there are no reads from the vertex stage-in buffers we should use the Empty vertex declaration, not the filter declaration, otherwise we have to bind a redundant vertex stream buffer to silence the validation layer. Change 3137268 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3136033: To fix the Fortnite login screen force Nvidia Macs to use the set*Bytes API for small buffer updates even on El Capitan. We can't do this globally as Intel didn't implement these functions until macOS Sierra. Fix GPU selection code in MetalRHI to confirm everything is working. #jira FORT-30385 Change 3137269 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3137164: Add stats to track exactly how many command buffers are allocated and committed each frame to work out why Fortnite on AMD is hanging, which turns out to be because each texture update/reallocation ends up in its own command-buffer. This needs to be rethought to pack these into fewer command buffers with the same synchronisation requirements to minimise command-buffer splits but for now we'll just make the default sufficiently large that we shouldn't see the hang until the work is done. Also ensure that command-buffer failure is always fatal - there is no way to recover or continue if a command-buffer fails. #jira FORT-30377 Change 3137606 on 2016/09/23 by Gil.Gribb Merging //UE4/Dev-Main@3137560 to Dev-Rendering (//UE4/Dev-Rendering) Change 3137936 on 2016/09/23 by Rolando.Caloca DR - Split RHICmdList clear into color & ds in prep for changes Change 3138346 on 2016/09/23 by Rolando.Caloca DR - vk - Some renaming and splitting classes in prep for compute Change 3138628 on 2016/09/23 by Rolando.Caloca DR - vk - Fix mem leak on framebuffers Change 3138721 on 2016/09/23 by Daniel.Wright Better comment for r.DefaultFeature.AntiAliasing Change 3138722 on 2016/09/23 by Daniel.Wright Fixed assert from decals with MSAA due to binding the Scene Depth Texture instead of surface Change 3138723 on 2016/09/23 by Daniel.Wright Corrected GC doc Change 3138892 on 2016/09/23 by Daniel.Wright Fixed instanced static meshes being unbuilt after a lighting build if you ever cancelled a previous lighting build Change 3138905 on 2016/09/23 by Daniel.Wright "Optimizations" -> "Optimization Viewmodes" Change 3138939 on 2016/09/23 by Daniel.Wright Disabled the stationary light overlap viewmode with forward shading Change 3139710 on 2016/09/26 by Rolando.Caloca DR - Rename and added texture RHIClearDepthStencil -> RHIClearDepthStencilTexture Change 3139820 on 2016/09/26 by Rolando.Caloca DR - Remove prefix from shader frequency strings Change 3139828 on 2016/09/26 by Marcus.Wassmer Add SetShaderValue() specialization for bools on AsyncCompute commandlists to match the Gfx specialization. Change 3139840 on 2016/09/26 by Benjamin.Hyder Adding VectorNoise Examples to TM-Noise map Change 3139862 on 2016/09/26 by Rolando.Caloca DR - Better log to track down crash #jira UE-36271 Change 3140142 on 2016/09/26 by Rolando.Caloca DR - Fix clang warning Change 3140145 on 2016/09/26 by Rolando.Caloca DR - Rename RHIClearColor(MRT) to RHIClearColorTextures and pass textures as parameters Change 3140360 on 2016/09/26 by Daniel.Wright Lighting Scenarios and lightmaps moved to separate package * Levels can be marked as lighting scenarios (eg Day, Night). Lighting is built separately for each lighting scenario with actors / lights in all other scenario levels hidden. Only one lighting scenario level should be visible at a time in game, and its lightmaps will be applied to the world. * Most outputs of the lighting build now go into a separate _BuiltData package. This improves level Save and AutoSave times as the separate package will only be dirtied after lighting rebuilds. * If a lighting scenario is present, all lightmaps are placed inside it's _BuiltData package. This means that only the currently loaded lighting scenario's lightmaps will be loaded (Day or Night, but not both). This also means that lightmaps for a streaming level will not be streamed with it. * For backwards compatibility, existing lightmaps are moved to a new _BuiltData package on load. * Reflection captures and precomputed visibility were not moved to the separate package. Reflection captures are force updated on load of a lighting scenario level, which can increase load times. Change 3140361 on 2016/09/26 by Daniel.Wright Lighting Scenarios UI Change 3140582 on 2016/09/26 by Mark.Satterthwaite Duplicate CL #3140166 Fix the video playback in Fortnite - bind our shader resource texture as the render-target texture as for some reason the playback code expects it there, even though we could never provide one. #jira FORT-30551 Change 3140584 on 2016/09/26 by Mark.Satterthwaite Duplicate CL #3140131: Fix crash under the validation layer & Nvidia's El Capitan (10.11) drivers when distance field particle collisions are used without any scene distance fields available - bind the black volume texture when that is the case to avoid bad access on the GPU. #jira FORT-30622 Change 3140586 on 2016/09/26 by Mark.Satterthwaite Duplicate CL #3140450: Fix launching the game on Intel GPUs by disabling Tiled Reflections on Intel for macOS Sierra like we did for El Capitan as there's currently a driver bug to means it doesn't work properly. #jira FORT-30649 Change 3140594 on 2016/09/26 by Zabir.Hoque Fix benchmark shaders register bindings. TEXCOORD0 was bound to register 1 in VS and then register 0 in PS. DX12 treats this a PSO creation failuer unlike DX11 this was an error. Change 3140601 on 2016/09/26 by Marcus.Wassmer New 'Cinematic' Scalability level. Remove unused 'new' motionblur CVAR Change 3140602 on 2016/09/26 by Zabir.Hoque CreateTexture3D on XB1 DX11 was leaking ESRAM by reserving it but not allocating to it. #Tests: Fix was tested by licensee (GearBox). Change 3140622 on 2016/09/26 by Rolando.Caloca DR - vk - More prep for sm5 Change 3140765 on 2016/09/26 by Rolando.Caloca DR - Fix ensure from bad clear depth surface Change 3141251 on 2016/09/27 by Rolando.Caloca DR - vk - Rename & cleanup Change 3141394 on 2016/09/27 by Rolando.Caloca DR - vk - Compute pipeline state Change 3141463 on 2016/09/27 by Mark.Satterthwaite Fix the include order to avoid compile errors on Mac. Change 3141529 on 2016/09/27 by Gil.Gribb Merging //UE4/Dev-Main@3139632 to Dev-Rendering (//UE4/Dev-Rendering) Change 3141830 on 2016/09/27 by zachary.wilson Adding testing content for lighting scenarios to collaborate with Ben Change 3141941 on 2016/09/27 by Olaf.Piesche Speculative fix for UE-34815; have yet to repro this but there's really only so many things it could be. I currently don't see how the sim resources could go away after queueing, so I'm replacing the check with an ensure and null checking the resource pointer. Change 3142035 on 2016/09/27 by Olaf.Piesche Fix compiler error from silly leftover bit of code. Change 3142065 on 2016/09/27 by Benjamin.Hyder Updating Lighting Scenario map Change 3142262 on 2016/09/27 by Mark.Satterthwaite Change Apple RHI initialisation to select the first compatible shader platform to decide which RHI to initialise. Internally in MetalRHI we must gracefully fallback to a lower feature-level when this initial selection is not available on the current device/OS, in which case we need to validate that the selected shader platform was actually packaged. The order of initialisation is different per-platform: On Mac: Order of initialisation is the order listed in TargetedRHIs .ini specifications. On iOS/tvOS: Order is explicit: Metal MRT > Metal ES 3.1 > OpenGL ES 2 #jira UE-35749 Change 3142292 on 2016/09/27 by Rolando.Caloca DR - hlslcc - Fix for warning X3206: implicit truncation of vector type causing error #jira UE-31438 Change 3142397 on 2016/09/27 by Mark.Satterthwaite Update hlslcc for Mac including RCO's changes in CL #3142292. #jira UE-31438 Change 3142438 on 2016/09/27 by Daniel.Wright UMapBuildDataRegistry's created for legacy lightmap data are placed in the map package, which avoids problems with cooking Change 3142452 on 2016/09/27 by Rolando.Caloca DR - Proper support for int defines Change 3142519 on 2016/09/27 by Arne.Schober DR - [UE-33438] - Added a Project Setting to enable Skincache Shader Permuations. The Default value for the Skincache mode was changed to enabled. The reasoning behind this was that it will be auto disabled when Skincache Shaders are disabled, and runtime toggle is a debuging feature that mainly programmers are dealing with. The Recompute Tangents option in the Skinned Mesh is now automatically grayed out when no Skincache Shader Permuations are available. Change 3142537 on 2016/09/27 by Daniel.Wright Fixed r.ScreenPercentage with MSAA - a scissor rect was being setup during the resolve and not reset Change 3142691 on 2016/09/27 by Daniel.Wright Disabled renaming of legacy ULightmap2D's to the separate package since UMapBuildDataRegistry is no longer put in a separate package for legacy content Change 3142711 on 2016/09/27 by Daniel.Wright GComponentsWithLegacyLightmaps entries get handled by USceneComponent::AddReferencedObjects, fixes a crash when you open a map directly from the content browser Change 3142712 on 2016/09/27 by Daniel.Wright Separate category for ParticleCutout properties Change 3142762 on 2016/09/27 by Uriel.Doyon Added per static mesh and per skeletal mesh UV density data. The data is now saved and available in cooked builds. The density are computed by the engine but can be overridden by the user in the material tabs. Texture streaming intermediate component data is now per material instead of per lod-section. New ViewModeParam in FSceneViewFamily allowing context specific param per viewmode. This is currently used to show which UV channel and which texture index is being shown in the texture streaming accuracy viewmodes. This replaces r.Streaming.AnalysisIndex Renamed texture streaming viewmodes: MeshTexCoordSizeAccuracy -> MeshUVDensityAccuracy MaterialTexCoordScalesAccuracy -> MaterialTextureScaleAccuracy MaterialTexCoordScalesAnalysis -> OutputMaterialTextureScales Improved UV density computation and viewmode. LightmapUVDensity is now computed separately from UVChannel Density. Fixed texture streaming for instanced static mesh component and derived types. Change 3143464 on 2016/09/28 by Daniel.Wright Removed 'experimental' from forward shading setting Change 3143508 on 2016/09/28 by Chris.Bunner Added component type handling to FoldedMath and Length material expressions. #jira UE-36304 Change 3143557 on 2016/09/28 by Rolando.Caloca DR - Back out changelist 3142292 Change 3143563 on 2016/09/28 by Rolando.Caloca DR - vk - Force hlslcc re-link Change 3143648 on 2016/09/28 by Daniel.Wright Moved GetMeshMapBuildData to UStaticMeshComponent since FStaticMeshComponentLODInfo::OwningComponent can't be initialized reliably in the case of SpawnActor off of a blueprint default that has LODData entries already. Change 3143661 on 2016/09/28 by Chris.Bunner Warning fix. Change 3143723 on 2016/09/28 by Daniel.Wright DumpUnbuiltLightIteractions after lighting build for debugging Change 3143822 on 2016/09/28 by Arne.Schober DR - Refactoring of the ViewMatrices. Moved the Derived Matrices into the FViewMatrix struct. Made all members private do emphasize the static constness of that struct after creation. Renamed the heavy weight members on this struct to Compute*. Methods that modify The ViewMatrices have been renamed to Hack* to discurage their use in the future until a better solution for those problems is found. The ViewMatrix modification is especially misleading because it only changes the State of the ViewMatrices to read their Position from the Material Editior as if coming from the Lightsource (mainly for manual bilboards) as well as doing someting similar to generate CPU bilboards for shadows. Change 3143860 on 2016/09/28 by Benjamin.Hyder Updating TM-Noise map to include 3d noise examples Change 3143939 on 2016/09/28 by Rolando.Caloca DR - vk - Better debugging of submissions - Added r.Vulkan.IgnoreCPUReads to help track down hangs on some ihvs Change 3144006 on 2016/09/28 by Brian.Karis Fixed PixelError not being set correctly with LOD groups. Removed unneeded Simplygon references. Mesh reduction module can now be chosen by name with r.MeshReductionModule Change 3144026 on 2016/09/28 by Benjamin.Hyder Updating QA-Effects map to correct numbering issue Change 3144098 on 2016/09/28 by Arne.Schober DR - ViewMatrices Refactoring - Fix UT Change 3144158 on 2016/09/28 by Rolando.Caloca DR - Undo splitting RHI command context Change 3144952 on 2016/09/29 by Rolando.Caloca DR - vk - Missing swapchain flag Change 3145064 on 2016/09/29 by Olaf.Piesche #jira UE-36091 Pulling range update for vector distributions even when UDist is not dirty; some content has a lookup table and a clean dist, but the range values have not been baked; always pulling them should be safe and not significantly costly. Change 3145354 on 2016/09/29 by Benjamin.Hyder Updating Tm-ContactShadows Change 3145485 on 2016/09/29 by Daniel.Wright Made SeamlessTravelLoadCallback handle legacy lightmaps Change 3145527 on 2016/09/29 by Daniel.Wright Don't clear legacy lightmap annotations on each map - fixes lighting unbuilt when doing seamless travel Change 3145530 on 2016/09/29 by Simon.Tovey UE-36188 - Editor crash when updating hierarchical instance static mesh component Dirtied render state rather than unsafe update of bounds. Change 3145608 on 2016/09/29 by Gil.Gribb Attempt to fix a random compiler error under win32 Change 3145749 on 2016/09/29 by Uriel.Doyon Fix for static analysis warning Change 3146091 on 2016/09/29 by Zabir.Hoque RHI Interface changes to support PSO based APIs Change 3146092 on 2016/09/29 by Zabir.Hoque D3D12 RHI support for PSO based APIs. Change 3146590 on 2016/09/30 by Gil.Gribb Merging //UE4/Dev-Main@3146520 to Dev-Rendering (//UE4/Dev-Rendering) Change 3146731 on 2016/09/30 by Rolando.Caloca DR - Fix merge conflicts Change 3146778 on 2016/09/30 by Rolando.Caloca DR - More integration compile fixes Change 3146790 on 2016/09/30 by Rolando.Caloca DR - Integration fix Change 3146849 on 2016/09/30 by Rolando.Caloca DR - Final integration fix Change 3146899 on 2016/09/30 by Daniel.Wright Static analysis fix for dereferencing World Change 3147020 on 2016/09/30 by Rolando.Caloca DR - vk - Fix depth issue on AMD cards - Added VULKAN_KEEP_CREATE_INFO to help debugging creation - Added num color attachments to pipeline key Change 3147034 on 2016/09/30 by Rolando.Caloca DR - Fix Kite crash where shader pipelines were optimizing non-tessellation pipelines #jira UE-36277 #jira UE-36500 Change 3147080 on 2016/09/30 by Rolando.Caloca DR - vk - Disable debug info by default Change 3147082 on 2016/09/30 by Chris.Bunner Allow tessellation to be used with DrawTile calls by swapping fixed mesh to triangle list. #jira UE-36491 Change 3147388 on 2016/09/30 by Chris.Bunner Blacklisted Nvidia driver 372.70 as it has known stability issues skewing our top crashes list. Also updated recommended version numbers. #jira UE-35288 Change 3147394 on 2016/09/30 by Chris.Bunner Additional logging for rare error. #jira UE-35812 Change 3147459 on 2016/09/30 by Rolando.Caloca DR - vk - Some more srgb formats Change 3147537 on 2016/09/30 by Rolando.Caloca DR - vk - Standarize srgb flag like D3D11 - Minor FVulkanShader cleanup Change 3147620 on 2016/09/30 by Olaf.Piesche #jira UE=34486 particle component tick function task can be invalid during pause; add check Change 3148028 on 2016/10/01 by Daniel.Wright Renamed RenderingSettings.cpp to match header Change 3148059 on 2016/10/01 by Daniel.Wright Disabled reparenting in the profiler which is disorienting Change 3148067 on 2016/10/01 by Daniel.Wright Support for ReflectionEnvironment and light type show flags with ForwardShading Change 3148069 on 2016/10/01 by Daniel.Wright Added CapsuleIndirectShadowMinVisibility to SkinnedMeshComponent, so artists have control over indirect capsule shadow darkness without changing cvars Change 3148072 on 2016/10/01 by Daniel.Wright Added a rendering setting to disable the new lightmap mixing behavior, where smooth surfaces don't have any mixing. r.ReflectionEnvironmentLightmapMixBasedOnRoughness Change 3148073 on 2016/10/01 by Daniel.Wright r.VertexFoggingForOpaque only affects forward shading - manual copy of Ben's fix from Orion stream Change 3148074 on 2016/10/01 by Daniel.Wright Enabled planar reflection receiving on the material used for the preview of a APlanarReflection Change 3148084 on 2016/10/01 by Daniel.Wright Fixed reflections on Surface TranslucencyVolume in deferred Change 3148085 on 2016/10/01 by Daniel.Wright Fixed planar reflection composite being done too many times in stereo deferred Change 3148086 on 2016/10/01 by Daniel.Wright Clamp IndirectLightingQuality to 1 in preview builds - keeps preview useful even with IndirectLightingQuality jacked up to 10. Change 3148107 on 2016/10/01 by Daniel.Wright CIS fix Change 3148113 on 2016/10/01 by Daniel.Wright Translucency lighting modes for forward shading * Per-vertex modes use GetSimpleDynamicLighting since they can't support specular anyway Change 3148306 on 2016/10/02 by Rolando.Caloca DR - vk - Fix for some NV drivers on Win10 Change 3148307 on 2016/10/02 by Rolando.Caloca DR - vk - Compute pipeline Change 3148358 on 2016/10/02 by Rolando.Caloca DR - vk - Consolidate and renumber enum for binding types Change 3148396 on 2016/10/03 by Rolando.Caloca DR - vk - Warning fix Change 3148697 on 2016/10/03 by Benjamin.Hyder Submitting M_Chromebal after enabling planar reflectionsl Change 3148799 on 2016/10/03 by Rolando.Caloca DR - vk - static analysis fix Change 3148934 on 2016/10/03 by Chris.Bunner Added pre-skinned local position material graph node, vertex shader only. Change 3148994 on 2016/10/03 by Chris.Bunner Added missing header file. Change 3149085 on 2016/10/03 by Daniel.Wright Support for ReflectionEnvironment show flag in base pass reflections without any shader overhead Change 3149095 on 2016/10/03 by Rolando.Caloca DR - vk - Disable new render passes Change 3149125 on 2016/10/03 by Rolando.Caloca DR - vk - Fix for multiple memory types Change 3149181 on 2016/10/03 by Rolando.Caloca DR - Better message when missing pipelines Change 3149215 on 2016/10/03 by Rolando.Caloca DR - RHIClearColor -> RHIClearColorTexture #tests Orion Editor run match on Agora_P Change 3149288 on 2016/10/03 by Chris.Bunner Added PreTonemapHDRColor for buffer visualization and target output. #jira UE-36333 Change 3149402 on 2016/10/03 by Daniel.Wright Light attenuation buffer is now multisampled, fixes preshadows with MSAA (depth testing failed during stencil pass) but adds a resolve (.12ms at VR res) Change 3149403 on 2016/10/03 by Daniel.Wright Forward lighting supports lighting channels Change 3149574 on 2016/10/03 by Marcus.Wassmer PR #2817: Ansel/Photography system (Contributed by adamnv) Modified to become a plugin Change 3149615 on 2016/10/03 by Rolando.Caloca DR - vk - Fix PF_G16R16 which fixes reflections Change 3149639 on 2016/10/03 by Olaf.Piesche Adding more ensures to catch NaNs occasionally appearing in particle locations early Change 3149745 on 2016/10/03 by Uriel.Doyon Moved UVDensity computation in the staticmesh DDC. Change 3149749 on 2016/10/03 by Daniel.Wright Fixed lightmaps on BSP, which was fallout from Lighting Scenarios backwards compatibility Change 3149755 on 2016/10/03 by Benjamin.Hyder Checking in built lighting for QA-postprocessing Change 3149758 on 2016/10/03 by Benjamin.Hyder re-submitting built lighting for QA-PostProcessing Change 3149940 on 2016/10/04 by Gil.Gribb Merging //UE4/Dev-Main@3149754 to Dev-Rendering (//UE4/Dev-Rendering) Change 3150098 on 2016/10/04 by Marcus.Wassmer Fix some clang and win32 errors Change 3150323 on 2016/10/04 by Rolando.Caloca DR - vk - Static analysis fix Change 3150456 on 2016/10/04 by Daniel.Wright Revert temp logs Change 3150731 on 2016/10/04 by Daniel.Wright Static lights now add a dummy map build data entry for their ULightComponent::IsPrecomputedLightingValid Change 3150795 on 2016/10/04 by Marcus.Wassmer Fix RHIClearUAV and Drawindirect bugs on PS4. Also fix PS4 compile error from bad merge. Change 3151065 on 2016/10/04 by Ben.Marsh Merging //UE4/Dev-Main to Dev-Rendering (//UE4/Dev-Rendering) Change 3151134 on 2016/10/04 by Brian.Karis Fixed corrupt mesh generation from quadric simplifier due to uninitialized color array. Change 3151201 on 2016/10/04 by Marcus.Wassmer Nvidia approved icon for ansel plugin. Change 3151240 on 2016/10/04 by Marcus.Wassmer Fix string concat build error. Change 3151258 on 2016/10/04 by Ben.Marsh Fix compile error. Change 3151290 on 2016/10/04 by Marcus.Wassmer Bumping static mesh DDC key to hopefully fix distancefield crashes after brian's quadric simplifier fix. Change 3152104 on 2016/10/05 by Chris.Bunner Workaround for legacy BreakMA material node invalid component masks. #jira UE-36832 Change 3152130 on 2016/10/05 by Ben.Woodhouse Fix issue with skylight SH and fast semantics on DX11. We need to clear the cube scratch textures before writing to them to avoid issues when reading them back for mip downsampling #jira UE-35890 Change 3152240 on 2016/10/05 by Rolando.Caloca DR - Fix for missing gizmo colors #jira UE-36515 Change 3152338 on 2016/10/05 by Daniel.Wright Hopeful fix for FDistanceFieldVolumeTexture assert in the cooker Change 3152833 on 2016/10/05 by Brian.Karis Improved precision of quadrics. Fixes bad triangles on large meshes Change 3153376 on 2016/10/06 by Rolando.Caloca DR - Fix for SM4 missing pipelines fallout Change 3153650 on 2016/10/06 by Gil.Gribb Merging //UE4/Dev-Main@3153068 to Dev-Rendering (//UE4/Dev-Rendering) Change 3153656 on 2016/10/06 by Uriel.Doyon Fixed main integration compilation issues. Some of the Mesh UVDensity UI is temporary disabled. Change 3153725 on 2016/10/06 by Uriel.Doyon Fixed crash when source data is missing for lightmaps #jira UE-36157 Change 3153998 on 2016/10/06 by Gil.Gribb Merging //UE4/Dev-Main to Dev-Rendering@3153705 (//UE4/Dev-Rendering) Change 3154056 on 2016/10/06 by Marcus.Wassmer Fix compile errors from merge. Also restore some light scencario code Change 3154176 on 2016/10/06 by Marcus.Wassmer Fix deprecation warning Change 3154252 on 2016/10/06 by Marcus.Wassmer Fix more deprecation warnings Change 3154632 on 2016/10/07 by Chris.Bunner Fix for incorrect re-entrant detection with a function called twice in a row. The function input Preview expression is overridden when the function is called to link it into the caller graph, but it was restored too late for chained calls to the same function. #jira UE-37002 [CL 3154728 by Gil Gribb in Main branch]
2016-10-07 10:20:36 -04:00
const float DistanceToPrimitiveSq = (PrimitiveBounds.Origin - View.ViewMatrices.GetViewOrigin()).SizeSquared();
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
if (UpdateRegion.Bounds.Intersect(PrimitiveBounds.GetBox()))
{
UTexture2D* HeightfieldTexture = NULL;
UTexture2D* DiffuseColorTexture = NULL;
FHeightfieldComponentDescription NewComponentDescription(HeightfieldPrimitive->Proxy->GetLocalToWorld());
HeightfieldPrimitive->Proxy->GetHeightfieldRepresentation(HeightfieldTexture, DiffuseColorTexture, NewComponentDescription);
if (HeightfieldTexture && HeightfieldTexture->Resource->TextureRHI)
{
const FIntPoint HeightfieldSize = NewComponentDescription.HeightfieldRect.Size();
if (UpdateRegionHeightfield.Rect.Area() == 0)
{
UpdateRegionHeightfield.Rect = NewComponentDescription.HeightfieldRect;
LocalToWorldScale = NewComponentDescription.LocalToWorld.GetScaleVector().X;
}
else
{
UpdateRegionHeightfield.Rect.Union(NewComponentDescription.HeightfieldRect);
}
TArray<FHeightfieldComponentDescription>& ComponentDescriptions = UpdateRegionHeightfield.ComponentDescriptions.FindOrAdd(FHeightfieldComponentTextures(HeightfieldTexture, DiffuseColorTexture));
ComponentDescriptions.Add(NewComponentDescription);
}
}
}
if (UpdateRegionHeightfield.ComponentDescriptions.Num() > 0)
{
SCOPED_DRAW_EVENT(RHICmdList, CompositeHeightfields);
for (TMap<FHeightfieldComponentTextures, TArray<FHeightfieldComponentDescription>>::TConstIterator It(UpdateRegionHeightfield.ComponentDescriptions); It; ++It)
{
const TArray<FHeightfieldComponentDescription>& HeightfieldDescriptions = It.Value();
if (HeightfieldDescriptions.Num() > 0)
{
UploadHeightfieldDescriptions(HeightfieldDescriptions, FVector2D(1, 1), 1.0f / UpdateRegionHeightfield.DownsampleFactor);
UTexture2D* HeightfieldTexture = It.Key().HeightAndNormal;
TShaderMapRef<FCompositeHeightfieldsIntoGlobalDistanceFieldCS> ComputeShader(View.ShaderMap);
RHICmdList.SetComputeShader(ComputeShader->GetComputeShader());
ComputeShader->SetParameters(RHICmdList, Scene, View, MaxOcclusionDistance, GlobalDistanceFieldInfo, ClipmapIndexValue, UpdateRegion, HeightfieldTexture, HeightfieldDescriptions.Num());
//@todo - match typical update sizes. Camera movement creates narrow slabs.
const uint32 NumGroupsX = FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.X, HeightfieldCompositeTileSize);
const uint32 NumGroupsY = FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.Y, HeightfieldCompositeTileSize);
DispatchComputeShader(RHICmdList, *ComputeShader, NumGroupsX, NumGroupsY, 1);
ComputeShader->UnsetParameters(RHICmdList, GlobalDistanceFieldInfo, ClipmapIndexValue);
}
}
}
}
}
/** Constructs and adds an update region based on camera movement for the given axis. */
void AddUpdateRegionForAxis(FIntVector Movement, const FBox& ClipmapBounds, float CellSize, int32 ComponentIndex, TArray<FVolumeUpdateRegion, TInlineAllocator<3> >& UpdateRegions)
{
FVolumeUpdateRegion UpdateRegion;
UpdateRegion.Bounds = ClipmapBounds;
UpdateRegion.CellsSize = FIntVector(GAOGlobalDFResolution);
UpdateRegion.CellsSize[ComponentIndex] = FMath::Min(FMath::Abs(Movement[ComponentIndex]), GAOGlobalDFResolution);
if (Movement[ComponentIndex] > 0)
{
// Positive axis movement, set the min of that axis to contain the newly exposed area
UpdateRegion.Bounds.Min[ComponentIndex] = FMath::Max(ClipmapBounds.Max[ComponentIndex] - Movement[ComponentIndex] * CellSize, ClipmapBounds.Min[ComponentIndex]);
}
else if (Movement[ComponentIndex] < 0)
{
// Negative axis movement, set the max of that axis to contain the newly exposed area
UpdateRegion.Bounds.Max[ComponentIndex] = FMath::Min(ClipmapBounds.Min[ComponentIndex] - Movement[ComponentIndex] * CellSize, ClipmapBounds.Max[ComponentIndex]);
}
if (UpdateRegion.CellsSize[ComponentIndex] > 0)
{
UpdateRegions.Add(UpdateRegion);
}
}
/** Constructs and adds an update region based on the given primitive bounds. */
void AddUpdateRegionForPrimitive(const FVector4& Bounds, float MaxSphereQueryRadius, const FBox& ClipmapBounds, float CellSize, TArray<FVolumeUpdateRegion, TInlineAllocator<3> >& UpdateRegions)
{
// Object influence bounds
FBox BoundingBox((FVector)Bounds - Bounds.W - MaxSphereQueryRadius, (FVector)Bounds + Bounds.W + MaxSphereQueryRadius);
FVolumeUpdateRegion UpdateRegion;
UpdateRegion.Bounds.Init();
// Snap the min and clamp to clipmap bounds
UpdateRegion.Bounds.Min.X = FMath::Max(CellSize * FMath::FloorToFloat(BoundingBox.Min.X / CellSize), ClipmapBounds.Min.X);
UpdateRegion.Bounds.Min.Y = FMath::Max(CellSize * FMath::FloorToFloat(BoundingBox.Min.Y / CellSize), ClipmapBounds.Min.Y);
UpdateRegion.Bounds.Min.Z = FMath::Max(CellSize * FMath::FloorToFloat(BoundingBox.Min.Z / CellSize), ClipmapBounds.Min.Z);
// Derive the max from the snapped min and size, clamp to clipmap bounds
UpdateRegion.Bounds.Max = UpdateRegion.Bounds.Min + FVector(FMath::CeilToFloat((Bounds.W + MaxSphereQueryRadius) * 2 / CellSize)) * CellSize;
UpdateRegion.Bounds.Max.X = FMath::Min(UpdateRegion.Bounds.Max.X, ClipmapBounds.Max.X);
UpdateRegion.Bounds.Max.Y = FMath::Min(UpdateRegion.Bounds.Max.Y, ClipmapBounds.Max.Y);
UpdateRegion.Bounds.Max.Z = FMath::Min(UpdateRegion.Bounds.Max.Z, ClipmapBounds.Max.Z);
const FVector UpdateRegionSize = UpdateRegion.Bounds.GetSize();
UpdateRegion.CellsSize.X = FMath::TruncToInt(UpdateRegionSize.X / CellSize + .5f);
UpdateRegion.CellsSize.Y = FMath::TruncToInt(UpdateRegionSize.Y / CellSize + .5f);
UpdateRegion.CellsSize.Z = FMath::TruncToInt(UpdateRegionSize.Z / CellSize + .5f);
// Only add update regions with positive area
if (UpdateRegion.CellsSize.X > 0 && UpdateRegion.CellsSize.Y > 0 && UpdateRegion.CellsSize.Z > 0)
{
checkSlow(UpdateRegion.CellsSize.X <= GAOGlobalDFResolution && UpdateRegion.CellsSize.Y <= GAOGlobalDFResolution && UpdateRegion.CellsSize.Z <= GAOGlobalDFResolution);
UpdateRegions.Add(UpdateRegion);
}
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
void TrimOverlappingAxis(int32 TrimAxis, float CellSize, const FVolumeUpdateRegion& OtherUpdateRegion, FVolumeUpdateRegion& UpdateRegion)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
int32 OtherAxis0 = (TrimAxis + 1) % 3;
int32 OtherAxis1 = (TrimAxis + 2) % 3;
// Check if the UpdateRegion is entirely contained in 2d
if (UpdateRegion.Bounds.Max[OtherAxis0] <= OtherUpdateRegion.Bounds.Max[OtherAxis0]
&& UpdateRegion.Bounds.Min[OtherAxis0] >= OtherUpdateRegion.Bounds.Min[OtherAxis0]
&& UpdateRegion.Bounds.Max[OtherAxis1] <= OtherUpdateRegion.Bounds.Max[OtherAxis1]
&& UpdateRegion.Bounds.Min[OtherAxis1] >= OtherUpdateRegion.Bounds.Min[OtherAxis1])
{
if (UpdateRegion.Bounds.Min[TrimAxis] >= OtherUpdateRegion.Bounds.Min[TrimAxis] && UpdateRegion.Bounds.Min[TrimAxis] <= OtherUpdateRegion.Bounds.Max[TrimAxis])
{
// Min on this axis is completely contained within the other region, clip it so there's no overlapping update region
UpdateRegion.Bounds.Min[TrimAxis] = OtherUpdateRegion.Bounds.Max[TrimAxis];
}
else
{
// otherwise Max on this axis must be inside the other region, because we know the two volumes intersect
UpdateRegion.Bounds.Max[TrimAxis] = OtherUpdateRegion.Bounds.Min[TrimAxis];
}
UpdateRegion.CellsSize[TrimAxis] = FMath::TruncToInt((FMath::Max(UpdateRegion.Bounds.Max[TrimAxis] - UpdateRegion.Bounds.Min[TrimAxis], 0.0f)) / CellSize + .5f);
}
}
void AllocateClipmapTexture(FRHICommandListImmediate& RHICmdList, int32 ClipmapIndex, FGlobalDFCacheType CacheType, TRefCountPtr<IPooledRenderTarget>& Texture)
{
const TCHAR* TextureName = CacheType == GDF_MostlyStatic ? TEXT("MostlyStaticGlobalDistanceField0") : TEXT("GlobalDistanceField0");
if (ClipmapIndex == 1)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
TextureName = CacheType == GDF_MostlyStatic ? TEXT("MostlyStaticGlobalDistanceField1") : TEXT("GlobalDistanceField1");
}
else if (ClipmapIndex == 2)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
TextureName = CacheType == GDF_MostlyStatic ? TEXT("MostlyStaticGlobalDistanceField2") : TEXT("GlobalDistanceField2");
}
else if (ClipmapIndex == 3)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
TextureName = CacheType == GDF_MostlyStatic ? TEXT("MostlyStaticGlobalDistanceField3") : TEXT("GlobalDistanceField3");
}
FPooledRenderTargetDesc VolumeDesc = FPooledRenderTargetDesc(FPooledRenderTargetDesc::CreateVolumeDesc(
GAOGlobalDFResolution,
GAOGlobalDFResolution,
GAOGlobalDFResolution,
PF_R16F,
FClearValueBinding::None,
0,
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3357411) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3244756 on 2017/01/03 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering) Change 3248667 on 2017/01/05 by Olaf.Piesche Resaving default asset because of engine verison issue; maybe unnecessary, but resaving niagara engine content to be sure #jira UE-40160 Change 3249324 on 2017/01/06 by Marcus.Wassmer Resave with an actual version to stop cook warning Change 3249611 on 2017/01/06 by Marcus.Wassmer Just remove warning-causing niagara data for now. Change 3308052 on 2017/02/16 by Rolando.Caloca DR - Check for Vulkan SDK, and only use it if it's newer or the same as the headers we distribute Change 3308109 on 2017/02/16 by Rolando.Caloca DR - Upgrade glslang to 1.0.39.1 Change 3308111 on 2017/02/16 by Rolando.Caloca DR - Update Vulkan distribution to 1.0.39.1 Change 3308153 on 2017/02/16 by Rolando.Caloca DR - Updated glslang libs Change 3308842 on 2017/02/17 by Rolando.Caloca DR - Fixed copy/paste Change 3310007 on 2017/02/17 by Chris.Bunner Back out CL 3221219 - causing MIC generation issues and superseded by CL 3273971. #jira UE-37792 Change 3310154 on 2017/02/17 by Chris.Bunner Assert when attempting to add a custom material attribute already in the base attributes list. Change 3310155 on 2017/02/17 by Chris.Bunner PR #3231: Validate material index before accessing (Contributed by projectgheist) #jira UE-41774, UE-41788 Change 3310162 on 2017/02/17 by Chris.Bunner PR #3252: Added MobileMaterialInterface to UsedMaterials (Contributed by projectgheist) #jira UE-41823, UE-41950 Change 3310176 on 2017/02/17 by Chris.Bunner Merging CL 3233886: AMD HDR support (requires r.AMDSupportsHDRDisplayOutput=1 in ini). Update to AGS 5.0.5. Partial code tidy up. Change 3310187 on 2017/02/17 by Chris.Bunner Preserve constant expressions rather than always casting after translating a material attribute. Losing the notion of constant means we can't correctly detect used properties and falsely enable e.g. PDO. Happened because of the incorrect component masks in BreakMaterialNodes which then had to be downcast to the correct type which is done as an inline fragment rather than swizzle expression. #jira UE-41594 Change 3310215 on 2017/02/17 by Chris.Bunner Prevent SpeedTree node compiling for skeletal meshes (not supported as uses more UV sets than available). More descriptive error for missing Cubemap UV input on TextureSample material node . #jira UE-33098 Change 3310838 on 2017/02/18 by Joe.Graf Moved some private functions to public for a licensee #CodeReview: matt.kuhlenschmidt #rb: n/a Change 3311876 on 2017/02/20 by Rolando.Caloca DR - Expose skin cache cvar r.SkinCache.AccumulationBufferSizeInKB #jira UE-42014 Change 3314139 on 2017/02/21 by Rolando.Caloca DR - Minor cleanup pass - Remove FVulkanPendingState - Renamed some classes for clarity - Hoist pending UAVs for flush out to pending compute state Change 3314642 on 2017/02/21 by Rolando.Caloca DR - Some more renaming Change 3315431 on 2017/02/21 by Ben.Salem Properly set default values for test time out and tick. We now will default to ticking once per second, and tracking the macro stats of GPU/Render/Game thread time. #tests Ran showdown demo several times Change 3316710 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Fix refract intrinsic Change 3316718 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Built libs to pick up change from 3316710 - refract fix Change 3316820 on 2017/02/22 by Benjamin.Hyder updating Tm-TrigNodes map Change 3317192 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317528 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317639 on 2017/02/22 by Benjamin.Hyder Updating Decal on Complex Mesh example in QA-Decals Change 3317764 on 2017/02/22 by Benjamin.Hyder Final updates to QA-Decals Change 3318319 on 2017/02/22 by Rolando.Caloca DR - minor reorg/rename Change 3318379 on 2017/02/22 by Rolando.Caloca DR - more cleanup Change 3321181 on 2017/02/24 by Rolando.Caloca DR - Fix GL bug Change 3321247 on 2017/02/24 by Rolando.Caloca DR - Fix misc bugs Change 3321898 on 2017/02/24 by Chris.Bunner Only issue clear TLV dispatch if required. #jira UERNDR-193 Change 3321904 on 2017/02/24 by Chris.Bunner Added comment for potential future optimization. Change 3322013 on 2017/02/24 by Uriel.Doyon Fixed separate translucency being affected by Gaussian DOF #jira UE-40489 Change 3322517 on 2017/02/24 by Uriel.Doyon Fixed issue with InvestigateTexture command removing budget limit. Fixed StreamingBounds show flag not working. It nows shows the streaming bound for the currently selected textures. #jira UE-40485 Change 3323470 on 2017/02/27 by Chad.Garyet Removing DDC job from dev-rendering Change 3323479 on 2017/02/27 by Chad.Garyet Removing RDU agent type Change 3323519 on 2017/02/27 by Chad.Garyet removing NCL/LHR/SEA agent types to clean up space Change 3323639 on 2017/02/27 by Benjamin.Hyder More updates to QA-Decals Change 3324207 on 2017/02/27 by Uriel.Doyon Fixed typo ScaleTexturesByGlobalMyBias -> ScaleTexturesByGlobalMipBias Removed bad merge in FStreamingTextureLevelContext::GetBuildDataIndexRef Change 3324396 on 2017/02/27 by Uriel.Doyon Fixed an issue with the Streaming Bounds show flag interferring with the static level data initialization #jira UE-40485 Change 3325227 on 2017/02/28 by Chris.Bunner Fix-up AMD AGS libs. Change 3325566 on 2017/02/28 by Uriel.Doyon Fixed possible out-of-bound access in GetUsedTexture() when passing ERHIFeatureLevel::Num Change 3326009 on 2017/02/28 by Uriel.Doyon Better fix for 3325566, as the previous fix would ignore the material instance overrides. Change 3327058 on 2017/03/01 by Benjamin.Hyder Preparing TM_Shadermodels map for automation Change 3328222 on 2017/03/01 by Chris.Bunner Prevent decals from drawing in separate translucency pass. Whilst user control and material relevance were already removed, if the flag was checked before being disabled (by swapping to decal domain) this was still being read in the render loop, now explicitly ignores decals. #jira UE-42449, UE-42446 Change 3329848 on 2017/03/02 by Uriel.Doyon Added some extra logs to help track UE-42168 Change 3329977 on 2017/03/02 by Rolando.Caloca DR - Fix bad clear value Change 3330008 on 2017/03/02 by Benjamin.Hyder More preparations for QA-Decals automation Change 3330754 on 2017/03/02 by Daniel.Wright Prominent comment explaining reflection env async compute usage and why it's not overlapped with anything Change 3331451 on 2017/03/03 by Marc.Olano Manually unroll simplex noise loop to avoid PSO bug on AMD/Metal Change 3331839 on 2017/03/03 by Rolando.Caloca DR - hlslcc - add missing file to project Change 3332247 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel PR #3305 #jira UE-42393 Change 3332259 on 2017/03/03 by Rolando.Caloca DR - Fix bad index into pixel formats PR #3237 #jira UE-41855 Change 3332305 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers PR #3271 #jira UE-32618 Change 3332313 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel (properly) PR #3305 #jira UE-42393 Change 3332317 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers (properly) PR #3271 #jira UE-32618 Change 3332368 on 2017/03/03 by Rolando.Caloca DR - Minor fixes so -sm4 and -sm5 can be used on windows with OpenGL/Vulkan Change 3333690 on 2017/03/06 by Daniel.Wright [Copy] Changing movable skylight properties no longer affects static draw lists Change 3333693 on 2017/03/06 by Daniel.Wright [Copy] Added 'r.AOListMeshDistanceFields' which dumps out mesh distance fields sorted by memory size, useful for directing content optimizations Change 3333705 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance fields are now 8 bit fixed point by default, but can be changed back to 16 bit floating piont with a project setting. * 8 bit uses half memory but introduces error for thin surfaces or large meshes. Change 3333721 on 2017/03/06 by David.Hill DecalProxy: Copy float FadeScreenSize to FDeferredDecalProxy for use in the render thread. This avoids pointer chasing to the UDecalComponent (game thread component). Change 3333772 on 2017/03/06 by Daniel.Wright [Copy] Scene motion blur data is only updated for the main renderer frames. Fixes scene captures and planar reflections breaking object motion blur. Change 3333790 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance field generation uses Embree, for a 2.5x speedup * Can switch back to old kDOP generation with 'r.DistanceFieldBuild.UseEmbree 0' for debugging Change 3333822 on 2017/03/06 by Daniel.Wright [Copy] Moved mesh distance field code into MeshDistanceFieldUtilities.cpp * Moved FMeshUtilities to its own header so the 8k line MeshUtilites.cpp file can be further split up Change 3333827 on 2017/03/06 by Daniel.Wright [Copy] Range compress 8bit distance fields - gets one extra bit of precision on average Change 3333828 on 2017/03/06 by Daniel.Wright [Copy] Raised High ShadowQuality to 2048 as 1024 for CSM is way too low Change 3333831 on 2017/03/06 by Daniel.Wright Non-editor compile fix Change 3333836 on 2017/03/06 by Daniel.Wright [Copy] Workaround for gobal distance field volume textures being bloated by 4x on PS4 due to the recommended tiling modes. They now use a 2d tiling mode which avoids the bloat, saving 96Mb. Change 3333843 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionExponent to skylight component * Useful for brightening up indoors without losing contact shadows as MinOcclusion does Change 3333845 on 2017/03/06 by Daniel.Wright [Copy] Capsule shadow BP functions Change 3333850 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionCombineMode to skylight component Change 3333854 on 2017/03/06 by Daniel.Wright [Copy] Gnm properly registers clears as GPU work so those events show up in profilegpu Change 3333857 on 2017/03/06 by Daniel.Wright [Copy] Clear light attenuation for local lights with a quad covering their screen extents * Clearing the entire light attenuation buffer costs .1ms on PS4. This optimization lowers the minimum cost of a shadow casting light from .15ms -> .03ms. * Shadowed lights in Fortnite with 25 lights 3.7ms -> 1.42ms on PS4 Change 3333860 on 2017/03/06 by Daniel.Wright [Copy] Flush deferred deletes when reallocating distance field atlas to reduce peak memory Change 3333861 on 2017/03/06 by Daniel.Wright [Copy] Disable all distance field features on Intel cards as HD 4000 hangs in the RHICreateTexture3D call to allocate the large atlas Change 3333869 on 2017/03/06 by Daniel.Wright [Copy] Volumetric Fog using a volume texture mapped to the camera frustum * Volumetric fog can be enabled on an Exponential Height Fog component with additional controls * Lights have a VolumetricScatteringIntensity * New cvars r.VolumetricFog, r.VolumetricFog.GridPixelSize, r.VolumetricFog.GridSizeZ, r.VolumetricFog.DepthDistributionScale * Lighting features supported: * Directional light with CSM and a light function * Point / spot lights without shadows / light functions / IES profiles * Skylight with occlusion from distance fields * Analytical height fog covers the view range past where the volumetric fog ends * Temporal reprojection is used on the volumetric fog scattering and extinction to achieve stability * Translucency integrates properly into volumetric fog * Height fog StartDistance is not supported by volumetric fog and should be set to 0. Change 3333894 on 2017/03/06 by Daniel.Wright [Copy] Initialize GDummyVolumetricFogGlobalDataUniformBuffer outside of parallel rendering Change 3333902 on 2017/03/06 by Daniel.Wright [Copy] Better handling of volumetric fog enabled with distance of 0 Change 3333903 on 2017/03/06 by Daniel.Wright [Copy] Fixed volumetric fog trying to render light functions for a point light Change 3333908 on 2017/03/06 by Daniel.Wright [Copy] Volumetric materials * Added new material domain Volume, which can output Scattering, Absorption and Emissive. All properties are in world space densities. * Particle systems using the Volume domain are voxelized based on their ParticlePosition and ParticleRadius * Volumetric fog integration is now energy conservative - scattering is integrated against transmission over the depth of each slice. * Added bOverrideLightColorsWithFogInscatteringColors to exponential height fog, which can be enabled to make Volumetric Fog match Height fog more closely Change 3334134 on 2017/03/06 by Daniel.Wright [Copy from Michael Trepka] Added Embree 2.14.0 and changed MeshUtilities to use it as this solves issues with Embree leaking TLS keys. UnrealLightmass is still using older Embree 2.7.0 until we can find time to properly test it with the new version. Also, invalidated distance field DDC to force it to rebuild with updated Embree. Change 3334420 on 2017/03/06 by Daniel.Wright Fixed RTDF shadows Change 3335467 on 2017/03/07 by Benjamin.Hyder Initial submission of QA-Decals map to EngineTest Change 3335556 on 2017/03/07 by Daniel.Wright Changed mesh distance field default format back to R16f Change 3338020 on 2017/03/08 by Daniel.Wright Disable volumetric fog in vertex shaders for feature levels which don't support it Change 3339394 on 2017/03/09 by Chris.Bunner Correctly handle material texture translation error edge case. #jira UE-42579, UE-42670 Change 3339992 on 2017/03/09 by Daniel.Wright Only compile volumetric fog shaders on supporting platforms Change 3341858 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) #RB Rolando.Caloca, Marcus.Wassmer, Daniel.Wright, Nick.Penwarden, Mark.Satterthwaite Change 3342004 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) Fix unity build #RB Marcus.Wassmer Change 3343307 on 2017/03/13 by Marcus.Wassmer Update showflags when we are guaranteed it will happen in all possible ways to spawn the scenecapture. (drag into editor, PIE, -game, etc) Change 3343732 on 2017/03/13 by Rolando.Caloca DR - Vulkan compute pipeline & refactor Change 3344846 on 2017/03/14 by Rolando.Caloca DR - Android compile fixes Change 3344883 on 2017/03/14 by Rolando.Caloca DR - Add missing stencil load/store to PSO initializer Change 3344985 on 2017/03/14 by Rolando.Caloca DR - Made load/store actions uint8 Change 3345141 on 2017/03/14 by Rolando.Caloca DR - vk - Rework render pass hash Change 3345304 on 2017/03/14 by Benjamin.Hyder Updating TM-Distancefields map to include TemplateFloor mesh Change 3345387 on 2017/03/14 by Rolando.Caloca DR - Add _RenderThread calls for Create*Shader so RHIs can choose not to stall when creating Change 3345388 on 2017/03/14 by Rolando.Caloca DR - Do not stall when creating shaders on Vulkan Change 3345722 on 2017/03/14 by Chris.Bunner PR #3357: MinimalAPI add to many material expressions (Contributed by DeanoC) #jira UE-42752 Change 3345723 on 2017/03/14 by Chris.Bunner Reduce log verbosity causing spamming during landscape editing. #jira UE-42714 Change 3345725 on 2017/03/14 by Chris.Bunner [Duplicate 3341860] Fixed material translation error with multiple connections from custom interpolator nodes. Change 3345726 on 2017/03/14 by Chris.Bunner Typo fixes. Change 3345732 on 2017/03/14 by Rolando.Caloca DR - Decouple vertex declaration off BSS Change 3345746 on 2017/03/14 by Chris.Bunner Added sign() intrinsic material graph node and delisted material function workaround. Change 3346042 on 2017/03/14 by Chris.Bunner Implement missing size query interface for FRenderTargetResources. #jira UE-41672 Change 3346387 on 2017/03/14 by Daniel.Wright [Copy] Added VolumetricScatteringIntensity to particle lights Change 3346389 on 2017/03/14 by Daniel.Wright [Copy] Clamp Volumetric material attributes to fp16 range to avoid INFs Disable volumetric fog when the fog show flag is disabled Change 3346392 on 2017/03/14 by Daniel.Wright [Copy] Fixed skylight being much too bright on volumetric fog Change 3346406 on 2017/03/14 by Daniel.Wright [Copy] CSM resolution is now controlled by r.Shadow.MaxCSMResolution. * Changed HighPC to use 1024 MaxShadowResolution (max for all non-CSM shadows), saves 60Mb in Fortnite Change 3346412 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for translucency lighting 3d textures, saves 13Mb Change 3346414 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for volumetric fog 3d textures, saves 13Mb Change 3346415 on 2017/03/14 by Daniel.Wright [Copy] Missing file from cl 3338451 Change 3346421 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaNs in volumetric fog due to rendering when height fog is disabled * Volumetric fog converts NaNs to black now so they don't spread Change 3346422 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaN in volumetric fog with low density values Change 3346423 on 2017/03/14 by Daniel.Wright [Copy] Changed default VolumetricFogScatteringDistribution to .2 Change 3346430 on 2017/03/14 by Daniel.Wright [Copy] New translucent material option to compute fog per pixel instead of the default per vertex Change 3346432 on 2017/03/14 by Daniel.Wright [Copy] Moved Volumetric Fog parameters to view uniform buffer for translucency pass Fixed lifetimes of temporary Volumetric Fog render targets Change 3346526 on 2017/03/14 by Daniel.Wright [Copy] Volumetric Fog supports point and spot light shadows * These lights are injected separately so that per-light resources can be bound (shadow depth map, static shadow depth map) * Forward lighting of local lights can be forced with 'r.VolumetricFog.InjectShadowedLightsSeparately 0' * Shadowed lights come at a cost: 2.9ms for volumetric fog on 970 -> 4.2ms with shadowing Change 3347053 on 2017/03/15 by Rolando.Caloca DR - android compile fix Change 3347384 on 2017/03/15 by Rolando.Caloca DR - Fix merge issue Change 3347643 on 2017/03/15 by Marcus.Wassmer Fix some bugs with the 'disable stationary skylight ffor the project' feature. Fixes lighting in Persona on Paragon. Change 3347979 on 2017/03/15 by Rolando.Caloca DR - Allow to automatically apply cached rendertargets to PSO initializer Change 3348024 on 2017/03/15 by Rolando.Caloca DR - Remove NullPS on Vulkan to avoid deadlock Change 3348303 on 2017/03/15 by Rolando.Caloca DR - Fix for debugging SCW with material SRT Change 3348357 on 2017/03/15 by Marcus.Wassmer Fix stencildither and a stencilref bug that was probably breaking decals sometimes. Change 3348549 on 2017/03/15 by Marcus.Wassmer Hopefully fix static analysis for potential nullptr access. Change 3348614 on 2017/03/15 by Marcus.Wassmer Duplicate some switch changes to fix crash on launch. Change 3349369 on 2017/03/16 by Gil.Gribb Fixed botched merge Change 3349947 on 2017/03/16 by Rolando.Caloca DR - Fix for mismatched primitive type Change 3349956 on 2017/03/16 by Benjamin.Hyder initial updates to TM-DistanceFields map Change 3350151 on 2017/03/16 by Rolando.Caloca DR - Fix UT compile issue Change 3350155 on 2017/03/16 by Rolando.Caloca DR - Catch mismatched primitive type on PSOs on D3D11 Change 3350192 on 2017/03/16 by Daniel.Wright Fix for point light shadow depths rendering with wrong cull mode due to PSO refactor Change 3350736 on 2017/03/16 by Daniel.Wright Fixed formatting from merge Change 3350881 on 2017/03/16 by Rolando.Caloca DR - Fix texture arrays as UAVs on Metal Change 3350927 on 2017/03/16 by Rolando.Caloca DR - Fix warning Change 3350935 on 2017/03/16 by Daniel.Wright Fix for materials with non-Surface domains being skipped in mesh passes Change 3351583 on 2017/03/17 by Marcus.Wassmer Fix clang platforms Change 3351917 on 2017/03/17 by Marcus.Wassmer Fix linux compile Change 3351973 on 2017/03/17 by Marcus.Wassmer Fix mismatched rendertargetformat Change 3352038 on 2017/03/17 by Daniel.Wright Enabled GetAndOrCreateGraphicsPipelineState ensures in Development for testing Change 3352110 on 2017/03/17 by Marcus.Wassmer Fix missing RT PSO apply Change 3352695 on 2017/03/17 by Arne.Schober DR - Remove PSO Rendertarget check in DX12 Resolve with Shader. #RB Rolando.Caloca Change 3352960 on 2017/03/17 by Arne.Schober DR - Fix some things that slipped trough the PSO merge #RB none Change 3353150 on 2017/03/18 by Rolando.Caloca DR - compile fix Change 3353205 on 2017/03/18 by Arne.Schober DR - Fix Incremental Compile and PS4 runtime error where CMASK is not allowed for ThickTile Mode #RB none Change 3353207 on 2017/03/18 by Arne.Schober DR - Fix Confusion #RB none Change 3355183 on 2017/03/20 by Nick.Bullard Fixed up Content orginzation for Decals automation tests in EngineTest Change 3355627 on 2017/03/20 by Arne.Schober DR - [UE-43094] - removed ensure in comporiton graph as control of the clear color cannot be gurantueed. Change 3356342 on 2017/03/21 by Marcus.Wassmer Fix clang errors Change 3356591 on 2017/03/21 by Arne.Schober DR - Fix ensure message #RB none Change 3356873 on 2017/03/21 by Arne.Schober DR - Fix comparission of undefined values in RendertargetApply Check Change 3357261 on 2017/03/21 by Marcus.Wassmer Fix LinuxEditor compile Change 3357294 on 2017/03/21 by Marcus.Wassmer Add missing SSE functions Change 3357351 on 2017/03/21 by Frank.Fella Fix win32 and linux compiler errors Change 3357370 on 2017/03/21 by Arne.Schober DR - disable ensure in test builds #RB Marcus.Wassmer [CL 3357449 by Marcus Wassmer in Main branch]
2017-03-21 17:46:52 -04:00
// TexCreate_ReduceMemoryWithTilingMode used because 128^3 texture comes out 4x bigger on PS4 with recommended volume texture tiling modes
TexCreate_ShaderResource | TexCreate_RenderTargetable | TexCreate_UAV | TexCreate_ReduceMemoryWithTilingMode,
false));
VolumeDesc.AutoWritable = false;
GRenderTargetPool.FindFreeElement(
RHICmdList,
VolumeDesc,
Texture,
TextureName
);
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
void GetUpdateFrequencyForClipmap(int32 ClipmapIndex, int32& OutFrequency, int32& OutPhase)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
OutFrequency = 1;
OutPhase = 0;
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
if (ClipmapIndex == 0 || !GAOGlobalDistanceFieldStaggeredUpdates)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
OutFrequency = 1;
OutPhase = 0;
}
else if (ClipmapIndex == 1)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
OutFrequency = 2;
OutPhase = 0;
}
else if (ClipmapIndex == 2)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
OutFrequency = 4;
OutPhase = 1;
}
else
{
check(ClipmapIndex == 3);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
OutFrequency = 4;
OutPhase = 3;
}
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
/** Staggers clipmap updates so there are only 2 per frame */
bool ShouldUpdateClipmapThisFrame(int32 ClipmapIndex, int32 GlobalDistanceFieldUpdateIndex)
{
int32 Frequency;
int32 Phase;
GetUpdateFrequencyForClipmap(ClipmapIndex, Frequency, Phase);
return GlobalDistanceFieldUpdateIndex % Frequency == Phase;
}
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
float ComputeClipmapExtent(int32 ClipmapIndex, const FScene* Scene)
{
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
const float InnerClipmapDistance = Scene->GlobalDistanceFieldViewDistance / FMath::Pow(GAOGlobalDFClipmapDistanceExponent, 3);
return InnerClipmapDistance * FMath::Pow(GAOGlobalDFClipmapDistanceExponent, ClipmapIndex);
}
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
void ComputeUpdateRegionsAndUpdateViewState(
FRHICommandListImmediate& RHICmdList,
const FViewInfo& View,
const FScene* Scene,
FGlobalDistanceFieldInfo& GlobalDistanceFieldInfo,
int32 NumClipmaps,
float MaxOcclusionDistance)
{
GlobalDistanceFieldInfo.Clipmaps.AddZeroed(NumClipmaps);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
GlobalDistanceFieldInfo.MostlyStaticClipmaps.AddZeroed(NumClipmaps);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
// Cache the heightfields update region boxes for fast reuse for each clip region.
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3219450) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3148067 on 2016/10/01 by Daniel.Wright Support for ReflectionEnvironment and light type show flags with ForwardShading Change 3149085 on 2016/10/03 by Daniel.Wright Support for ReflectionEnvironment show flag in base pass reflections without any shader overhead Change 3162206 on 2016/10/13 by Chris.Bunner Merging Dev-MaterialLayers to Dev-Rendering, CL 3161593: Material expressions; Trig, fast-trig, saturate, round, truncate, pre-skinned normal. Added CustomEyeTangent to material attributes. Resolved some hard-coded attribute typing and other minor fixes. Change 3186067 on 2016/11/03 by Daniel.Wright Updated Stationary primitive tooltip to indicate that it allows the primitive to be changed, but not moved Change 3186069 on 2016/11/03 by Daniel.Wright Using a weighted geometric mean to combine multiple Distance Field Indirect Shadows, greatly reduces over-occlusion when overlap is high Change 3186084 on 2016/11/03 by Mark.Satterthwaite Duplicate 3172511: Don't set Metal resource option fields on texture descriptors when running on an OS that doesn't support them. #jira UE-37481 Change 3186089 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3169764: Fixed automatic conversion of G8_sRGB into RGBA8_sRGB required for Mac Metal, which fixes FORT-27627. #jira FORT-27627 Change 3186113 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3183807: Change the way we access the Metal viewport's backbuffer, to reduce possible causes of FORT-31649: - Added console variable "rhi.Metal.SupportsIntermediateBackBuffer" to control whether to use an extra render-target so we can support screenshots & movie capture, or render directly to the back-buffer to save memory & GPU performance. Still defaults to ON for Mac & OFF for iOS/tvOS. - Change the way we handle updates to the back-buffer size to ensure that the different threads access their intended version. #jira FORT-31649 Change 3186116 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3183823: Record Metal resource & state objects used in a command-buffer when rhi.Metal.RuntimeDebugLevel is set to 3 or higher. The object labels, types & descriptions will be printed on failure - if the object is deleted prior to this then we have a lifetime error and it will crash at this point and can be debugged further using our -metalretainrefs command-line option or Xcode's zombie-objects. Used to verify that FORT-31649 is not a simple resource lifetime error and thereby speed up Apple/vendor investigations. #jira FORT-31649 Change 3186818 on 2016/11/04 by Chris.Bunner PR #2907 Export UMaterialExpressionNoise (contributed by kayosiii). Change 3186979 on 2016/11/04 by Rolando.Caloca DR - Misc minor cleanup Change 3187169 on 2016/11/04 by Uriel.Doyon Incremental insertion of level data between PostLoad and AddToWorld Change 3187205 on 2016/11/04 by Mark.Satterthwaite Compile fixes for iOS. Change 3187389 on 2016/11/04 by Uriel.Doyon Fix for possible stall when loading hidden level Change 3187598 on 2016/11/04 by Michael.Trepka MetalViewport compile fix Change 3187678 on 2016/11/04 by Uriel.Doyon Fix for landscape grass textures not being streamed in correctly. Change 3187731 on 2016/11/04 by Rolando.Caloca DR - Start making type safe some cross compiler enums Change 3187824 on 2016/11/04 by Rolando.Caloca DR - clang compile fix Change 3187953 on 2016/11/04 by Rolando.Caloca DR - vk - Mac compile fix Change 3188696 on 2016/11/07 by Mark.Satterthwaite Another iOS compile fix for new MetalViewport validation code. Change 3188906 on 2016/11/07 by Rolando.Caloca DR - Show permutation of LUTBlender Change 3189094 on 2016/11/07 by Chris.Bunner Fix RemoveAAJitter from projection matrix. #jira UE-37701, UE-38003 Change 3189134 on 2016/11/07 by Daniel.Wright Fix for CreateRenderTarget2D called in construction script during cooking Change 3189145 on 2016/11/07 by Chris.Bunner Follow-up to CL 3186818, export UMaterialExpressionVectorNoise. Change 3189239 on 2016/11/07 by Daniel.Wright Added show flag for Contact Shadows, disabled in planar reflections Change 3189252 on 2016/11/07 by Daniel.Wright Support for Reflection Capture intensity with simple reflections, which are the default with Forward Shading Change 3189406 on 2016/11/07 by Mark.Satterthwaite Really fix the last of the iOS compile errors from changes to the MetalViewport code. Change 3190854 on 2016/11/08 by Ben.Woodhouse XB1: Fix memory corruption with RHICreateVertexBuffer and RHICreateIndexBuffer when using initial data (Procedural Mesh Component crash) #jira UE-34264 #fyi james.golding #fyi keith.judge Change 3190962 on 2016/11/08 by Olaf.Piesche Unshelved from pending changelist '3176615' - Gil's fix for race condiiton with particle vertex factory reuse across different passes; potential to fix a number of issues Change 3191959 on 2016/11/09 by Uriel.Doyon Removed some static primitives from the dynamic primitive handler for texture streaming. Change 3193122 on 2016/11/10 by Chris.Bunner Always update non-preview material resources for use in code preview. #jira UE-38223 Change 3193190 on 2016/11/10 by Gil.Gribb UE4 - Fixed rare bug with shadow groups rendering things that have not been setup to render this frame. #jira UE-36379 Change 3193523 on 2016/11/10 by Uriel.Doyon Fixed incorrect section bounds used for texture streaming. Change 3193962 on 2016/11/10 by Uriel.Doyon Added defrag of dynamic bounds used for the texture streaming. Allows to remove unused bounds over time. Change 3193974 on 2016/11/10 by Uriel.Doyon New "Required Texture Resolution" view mode. Showing the ratio between the currently streamed texture resolution and the resolution wanted by the GPU. Change 3194109 on 2016/11/10 by Uriel.Doyon Another patch on material bounds used for texture streaming. Change 3194665 on 2016/11/11 by Chris.Bunner Duplicated behavior for inherited velocity scaling scaling to vert/surface spawned particles. Change 3194734 on 2016/11/11 by Rolando.Caloca DR - vk - Simplified some texture casting Change 3194867 on 2016/11/11 by Rolando.Caloca DR - vk - SM5 fixes Change 3195176 on 2016/11/11 by Chris.Bunner Fixed incorrectly updated NVAPI error. Change 3195425 on 2016/11/11 by Uriel.Doyon Fixed possible invalid level reference in the texture streamer Change 3196512 on 2016/11/14 by Gil.Gribb Merging //UE4/Dev-Main@3196156 to Dev-Rendering (//UE4/Dev-Rendering) Change 3196750 on 2016/11/14 by Marcus.Wassmer Fix ordering problem with GPU cache transitions Change 3196815 on 2016/11/14 by Daniel.Wright Suppressed 'Instanced stereo rendering is not supported' warning showing up in CIS Change 3196818 on 2016/11/14 by Daniel.Wright Fixed FIndirectLightingCache::UpdateCachePrimitivesInternal churning through a bunch of temporary memory Change 3196819 on 2016/11/14 by Daniel.Wright Volume lighting samples are allowed outside of the importance volume if their influence affects the volume. Fixes black indirect lighting on movable components in maps with small importance volumes. Volume lighting samples placed on surfaces use a radius that covers the layer height spacing, which prevents an uncovered region between layers Change 3197243 on 2016/11/14 by Uriel.Doyon Async Task For Updating static component LastRender time #jira UE-24268 Change 3197359 on 2016/11/14 by Daniel.Wright Added Inscattering Texture controls to ExponentialHeightFog * When InscatteringColorCubemap is specified, directional light inscattering is disabled * Lerps betwen 1x1 mip at NonDirectionalInscatteringColorDistance to mip 0 at FullyDirectionalInscatteringColorDistance * Added FogCutoffDistance, so artists can prevent fog on skyboxes (requires fog to be setup matching the fog that was rendered into the sky texture so that distant mountains match) * Fog shader permutations based on what feature is enabled Change 3198419 on 2016/11/15 by Chris.Bunner PS4 HDR: Runtime toggle (backbuffer recreation on resize matching), UI composition. Matches PC behavior and controls. HDR: Generalized buffer formats, cvar consistency pass, LUT for UI composition, refactoring common functions. Exposed RHICreateTargetableShaderResource3D. Moved some (translucent) volume rendering helpers to allow access in Slate. Change 3198822 on 2016/11/15 by Daniel.Wright Mac compile fix Change 3199509 on 2016/11/15 by Uriel.Doyon Added support for viewmode param asset name (and note just param value). Used to investigate texture streamer behavior. Change 3199578 on 2016/11/15 by Rolando.Caloca DR - Add some shader resource tables to SCW when running with -directcompile Change 3199698 on 2016/11/15 by Rolando.Caloca DR - vk - Refactor shader & descriptor bindings Change 3199712 on 2016/11/15 by Rolando.Caloca DR - vk - r.Vulkan.StripGlsl to always strip glsl at runtime to save memory per shader Change 3199717 on 2016/11/15 by Rolando.Caloca DR - vk - Show hitching PSO info again Change 3199750 on 2016/11/15 by Rolando.Caloca DR - SCW clang compile fixes Change 3200353 on 2016/11/16 by Rolando.Caloca DR - vk - Mac fix Change 3200358 on 2016/11/16 by Chris.Bunner Only allow UI composition on platforms we currently use it. Change 3200823 on 2016/11/16 by Chris.Bunner Remove expression key attribute ID when not translating an attribute output to allow intended expression sharing. #jira UE-38699 Change 3200947 on 2016/11/16 by Mark.Satterthwaite Fix UE-38695 by not trying to resize the viewport on the wrong thread. #jira UE-38695 Change 3201069 on 2016/11/16 by Daniel.Wright Fog inscattering texture limited to SM4 and above, fixes ES2 compile errors Change 3201346 on 2016/11/16 by Brian.Karis Temporal AA fix for correct edge gradients. Filtering now combined with importance sampling. Enabled Catmull-Rom resolve filter. Results are now slightly sharper. Fixed antighosting. Will yet require a dilation to be perfect. Optimized bicubic filtering to 5 taps instead of 9. Cleaned out unused code. Change 3201369 on 2016/11/16 by Brian.Karis Bicubic texture sample Change 3201522 on 2016/11/16 by Rolando.Caloca DR - vk - Fix static analysis issues Change 3201878 on 2016/11/17 by Chris.Bunner Temporarily disable Nvapi HDR error logging. #jira UE-38529 Change 3202108 on 2016/11/17 by Simon.Tovey Assets with easy repro for flickering particles bug Change 3202181 on 2016/11/17 by Rolando.Caloca DR - vk - CIS android fix Change 3202325 on 2016/11/17 by Ben.Woodhouse Integrate 4.14.1 fix from 14 //UE4/Release-4.14 (@3201850) Fix CreateVertexbuffer and CreateIndexBuffer memory corruption (Procedural Mesh Component crash) #jira UE-34264 Change 3204394 on 2016/11/18 by Guillaume.Abadie PR #2808: AlphaComposite Fog Opacity fix (Contributed by moritz-wundke) #br Ben.Woodhouse Change 3204428 on 2016/11/18 by Guillaume.Abadie Fixes a couple of issues in decals: * Crash in FDecalDrawingPolicyFactory::DrawMesh() * ActorPostion material expression * PixelNormalWS material expression * Missing renaming from DEFERRED_DECAL to DECAL_PRIMITIVE #jira UE-38327, UE-38158, UE-37818, UE-37350 Change 3204429 on 2016/11/18 by Uriel.Doyon Darker default undefined accuracy. Reenabled the texture streaming build in the build all. Change 3204458 on 2016/11/18 by Chris.Bunner Shader truncation warnings fix. Change 3204459 on 2016/11/18 by Chris.Bunner Engine 'Passthrough' material fuction fix. V4 is now actually a V4. Change 3204460 on 2016/11/18 by Chris.Bunner Correctly handle some known Nvapi warnings. #jira UE-38529 Change 3204653 on 2016/11/18 by Marc.Olano Helper functions for tiled textures Checking in for Ryan Brucks Change 3204863 on 2016/11/18 by Arne.Schober DR - Replaced ENQUEUE_UNIQUE_RENDER_COMMAND with a Debuggable template Implementation Change 3204939 on 2016/11/18 by Arne.Schober DR - Make clang happy Change 3204968 on 2016/11/18 by Arne.Schober DR - UE-38494 - Fixed SpeedTree Wind crash, when force deleting the Asset. Change 3206293 on 2016/11/21 by Uriel.Doyon New member bHasStreamingUpdatePending in UTexture2D to delay update of global distance fields. Set to true when the streamer can possibly load a mip in the near future. #jira UE-37787 Change 3206551 on 2016/11/21 by Chris.Bunner Added material update context when forcing all shaders to recompile. #jira UE-38481 Change 3206644 on 2016/11/21 by Benjamin.Hyder Updating Planar Reflection example in TM-Shadermodels. Change 3206899 on 2016/11/21 by Rolando.Caloca DR - vk - SM5 fixes Change 3206900 on 2016/11/21 by Rolando.Caloca DR - Added missing strings for shader formats Change 3206983 on 2016/11/21 by Rolando.Caloca DR - vk - Support for SV_Coverage Change 3207237 on 2016/11/22 by Simon.Tovey Exporting particle module base and a couple of child classes as it's commonly requested. #test compiles Change 3207241 on 2016/11/22 by Gil.Gribb Merging //UE4/Dev-Main@3206998 to Dev-Rendering (//UE4/Dev-Rendering) Change 3207520 on 2016/11/22 by Ben.Woodhouse Cherry picked from //Fortnite/Main@3206301 Fixed GPU hang in Zone Map view. Was an issue with RenderThread using the device context without appropriate RHIThread flushes. #jira FORT-31616 #code_review keith.judge Change 3207541 on 2016/11/22 by Ben.Woodhouse Cherry picked from //fortnite/Main@3207422 * Fix UpdateTexture3D to create a staging texture of the region to update rather than the whole texture. This prevents distance fields crashing during update (allocating 18GB per frame in some cases) * Put UpdateTexture2D DMA support onto a cvar, disabled by default (corruption issues reported by licensees, plus not sure it's actually faster - could be slower due to reduced bandwidth; issues reported by licensees) * Fix UpdateTexture2D to only create a staging texture of the region to update, saving memory #jira UE-38609 Change 3207654 on 2016/11/22 by Chris.Bunner Don't flag 16-bit PNG/JPG textures as sRGB on import. #jira UE-30279 Change 3208434 on 2016/11/22 by Rolando.Caloca DR - vk - UAV transitions Change 3208490 on 2016/11/22 by Chris.Bunner Break material code sharing when we detect an unresolvable loop. By default change IsResultMA loop detection to stop on functions as we can determine type definitively. Unified IsResultMA detection across switch nodes. Change 3208860 on 2016/11/23 by Rolando.Caloca DR - vk - Fix some format issues Change 3209265 on 2016/11/23 by Arne.Schober DR - originally unshelved from 3153924 - Made Depth and Velocity Rendering Passes to use PSO only RHI interface, We are now passing down two structs that collect all the necessary information for the drawing policies to construct a PSO object. One during construction of the Policy, which contains information abouyt the CullMode, FillMode and PrimType. And another during rendering that passes infomation like BlendState and DepthStencilState down to the low levelrenderer into SetSharedState. Performance of the static drawlist ist slightly slower (less than 0.1ms on Consoles) due to some addtional branches and copies. The branches in the FDrawingPolicyRenderState will go away as soon as everything is converted to use the PSO interface. Performace of the GPU is slightly better due to less context rolls (mainly CullMode sorts in differently now) Change 3209305 on 2016/11/23 by Guillaume.Abadie Fix contact shadow's assemption on objects thickness Change 3209334 on 2016/11/23 by Brian.Karis Fixed TAA handling of alpha. Switched the meaning of AA_ALPHA to make sense. Change 3209903 on 2016/11/24 by Guillaume.Abadie Cherry picks alpha through post processing changelists 3201959, 3204143 and 3209883 from //UE4/Private-Partner-NREAL Change 3209973 on 2016/11/24 by Ben.Woodhouse Fix D3D11 and 12 static analysis warnings reported by Rob Troughton of Coconut Lizard (http://coconutlizard.co.uk/blog/ue4/pvs-studio-part5/) Change 3210023 on 2016/11/24 by Uriel.Doyon Fixed an issue with DropDetail when FixedFrameRate was set to a value smaller than MinDesiredFrameRate. #jira UE-37210 Change 3210026 on 2016/11/24 by Ben.Woodhouse Disable renderthread hang detection if a debugger is present, so we can debug the renderthread without crashing Change 3210049 on 2016/11/24 by Ben.Woodhouse Fix mac build Change 3210071 on 2016/11/24 by Uriel.Doyon Fixed an issue with masked materials and shader complexity viewmode when DBuffer Decals are enabled. #jira UE-37542 Change 3210374 on 2016/11/25 by Ben.Woodhouse * Fix issues with fast cleared dbuffer targets not being resolved when no decals are in the scene. This caused graphical corruption on XB1 and ensure failures on PS4 (with RHIThread disabled) * Move Decal rendertarget manager function implementations out of the header. #jira UE-38879 Change 3210390 on 2016/11/25 by Uriel.Doyon Fixed cubemap resourcesize not taking into account mipgen settings #jira UE-37045 Change 3210407 on 2016/11/25 by Uriel.Doyon "resavepackages" commandlet now supports -buildtexturestreaming that rebuilds the map texture streaming data. That can be used in combination with -buildlighting. Change 3210563 on 2016/11/27 by Rolando.Caloca DR - vk - Integrate cached memory fixes and PF_D24 format fix #jira UE-39025 PR #2974 Change 3210564 on 2016/11/27 by Rolando.Caloca DR - Fix for GL linker PR #2975 #jira UE-39029 Change 3210592 on 2016/11/27 by Rolando.Caloca DR - vk - SM5 fixes Change 3210597 on 2016/11/27 by Rolando.Caloca DR - vk - Prep for staging UB copies to GPU memory Change 3210600 on 2016/11/27 by Rolando.Caloca DR - vk - Extract generic range code Change 3210613 on 2016/11/27 by Rolando.Caloca DR - vk - Added r.Vulkan.SubmitOnDispatch Change 3211054 on 2016/11/28 by Rolando.Caloca DR - vk - Missing reference Change 3211330 on 2016/11/28 by Chris.Bunner Shader compile error for max texture coordinate count on skinned meshes. Change 3211384 on 2016/11/28 by Arne.Schober DR - Enforce move on EnqueueRenderCommand Lambda Change 3211431 on 2016/11/28 by Gil.Gribb Merging //UE4/Dev-Main@3211016 to Dev-Rendering (//UE4/Dev-Rendering) Change 3211738 on 2016/11/28 by Gil.Gribb IWYU fixes after merge Change 3212231 on 2016/11/28 by Richard.Wallis Fix build errors Change 3212253 on 2016/11/28 by Richard.Wallis Remove MacGraphicsSwitching plugin. #jira UE-37640 Change 3212310 on 2016/11/28 by Rolando.Caloca DR - vk - Update glslang to 1.0.33.0 Change 3212446 on 2016/11/28 by Guillaume.Abadie Implements PreviousFrameSwitch material expression Change 3212594 on 2016/11/28 by Arne.Schober DR - Fix missing include Change 3212681 on 2016/11/29 by Rolando.Caloca DR - vk - Auto flush for compute shader Change 3213000 on 2016/11/29 by Gil.Gribb temp fix for PF_MAX Change 3213161 on 2016/11/29 by Ben.Woodhouse Integrate latest D3D12 changes from //depot/Partners/Microsoft/UE4-DX12/...@3211714 Using: - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/Runtime/D3D12RHI/...@3211714 //UE4/Dev-Rendering/Engine/Source/Runtime/D3D12RHI/... - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/ThirdParty/Windows/DirectX/...@3211714 //UE4/Dev-Rendering/Engine/Source/ThirdParty/Windows/DirectX/... - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/Programs/UnrealBuildTool/...@3211714 //UE4/Dev-Rendering/Engine/Source/Programs/UnrealBuildTool/... Changes from UE4-DX12: *** CL 3183818 *** Update D3D12 RHI to 4.14: - Merged changes from Epic up until 10/20/16 - Fixed an issue where command allocators where resetting too early. I changed to aggressive command list batching by default now that more SubmitCommandListHint calls exist in the upper engine, we don't need to worry about starving the GPU. Fewer ExecuteCommandLists calls means better performance and fewer Signals() so this change provides a GPU perf win. I had to fix an issue with aggressive batching where we would sometimes hold on to a command list long enough (in the pending list) but hadn't executed it yet. The command allocator was being put back in the queue of allocators during ReleaseCommandAllocator() without a syncpoint set and was thus being reset too early. I added a simple counter to the command allocator so it could track how many command lists were using it. It doesn't need to be thread safe since only one thread uses a command allocator at a time. I also added some stats around the # command lists and # command allocators since it would be possible to leak command allocators now if it's pending command list count isn't decremented correctly. In that case we'd keep creating new command allocators and eventually run out of memory. -Remove clear during allocate in the FD3D12FastConstantAllocator and FD3D12FastAllocator. The supplied resource locations are assumed to be new and thus don't need to be cleared. -Cleanup D3D12RHI stats. There were some unused stats as well as some missing ones. -Mark shader resource table uniform buffers as dirty only when the shader changes. Cleanup SetComputeShader calls and Dispatch calls to not set/unset the CS for each Dispatch. -Remove unused Check SRV resolved code that epic added to the D3D11 RHI and was brought over. We dont need it and we won't use this. -Remove "always on" cycle counters for high frequency RHI methods like RHISetShaderTexture. These should use the engine's stat macros as they are removed on TEST + SHIPPING builds. On Xbox a significant amount of CPU time is spent in things like QueryPerformanceCounter even when STATS aren't enabled. Currently 1% of an entire capture on XBOX is spent inside this call. I improved and cleaned up high freqency call stacks like: - RHISetShaderTexture - RHISetShaderResourceViewParameter - RHISetShaderParameter - RHISetUAVParameter In general I moved to use templated functions, removed unused parameters, unnecessary copies, etc. -Change D3D12 PIX profiling enable/disable to match Xbox and handle logic in the UEBuildWindows.cs for UBT. Also add a static assert to inform the developer when PIX profiling is requested but the engine is compiling out draw events. -Resources should be associated with the rendering thread's frame that it's currently recording command lists for and they shouldnt be cleaned up until those command lists have been translated to D3D12 command lists on the RHI thread AND completed executing on the GPU. This was confirmed to resolve an issue where CBV resources were being released too early. This work involved a couple changes: 1) Move the "frame" fence to be incremented on the rendering thread (during RHIAdvanceFrameForGetViewportBackBuffer()) so that resources that are deleted from the rendering thread are assosicated with the correct frame count 2) Queue up a command from the rendering thread to signal the "frame" fence. It needs to be queued to ensure that it's signaled at the correct time on the RHI thread (after that frame's command lists have been executed). -Disable GRHIRequiresEarlyBackBufferRenderTarget. Metal/Vulkan/Xbox11.x already do this. This is used by the Slate renderer during BeginRenderFrame and avoids a SetRenderTargets call. -Enable GRHISupportsMSAADepthSampleAccess (used in the Editor). This was enabled for D3D11 on SM5, but not for D3D12. -Delay load D3D12.dll and add root signature 1.1 support. -Add explicit flush calls to improve resource barrier batching instead of implict flushes inside FConditionalScopeResourceBarrier and FScopeResourceBarrier. Also update those classes with const members. *** CL 3183824 *** Fix the D3D12 RHI after integrating UE 4.14 updates: - Fixed a bug where we would try to get the PSO of a nullptr in SetPipelineState if we needed to reset the current PSO on the cmd list. - Fixed a spelling error - Removed the need for bForceState, we use dirty bits now *** CL 3183830 *** - GetDebugFlags RHI extension, needed by XB1 movie player. - Only query memory info if stats are enabled - Add support for the engine's new RHISubmitCommandsAndFlushGPU function - Update CommitPendingPipelineState to be Graphics/Compute specific and avoid the need for a IsCompute parameter. *** CL 3183837 *** Made PipelineState caches contain pointers to FD3D12PipelineState objects to avoid issues with using pointers to after Find/Add to the maps. TMap indicates that the pointer to the value associated with a key "is only valid until the next change to any key in the map." The lifetime of the PSO pointers is managed by the low level caches (graphics and compute). Added stat for the number of Pipeline State Objects. *** CL 3183931 *** Update Windows D3D12 headers and libs to RS1 release bits (10.0.14393.0) *** CL 3183978 *** Update UBT Windows build settings: - Change D3D12 PIX profiling enable/disable to match Xbox and handle logic in the UEBuildWindows.cs for UBT. Also add a static assert to inform the developer when PIX profiling is requested but the engine is compiling out draw events. -Delay load D3D12.dll and add root signature 1.1 support. *** CL 3184132 *** Fix Xbox PSO cache code where it could leak PSOs. Related to change 3183837. *** Changelist 3211714 *** Update D3D12 RHI with fixes: - Check if we can reserve slots in GatherUniqueSamplerTables - DirtyState more often in StateCache - Remove InternalSetSamplerState. The alternate function isn't used. - Allow MRTClear for arrays with holes in them - Fix uninitialized descriptors. This was causing a GPU hang on Xbox. We need to set dirty bits for resources bound to slots outside of the current descriptor table's range - Cleanup SetDescriptorHeap code. Move setting descriptor heap logic to the descriptor cache since it also owns things like the sampler maps. Added members to the descriptor cache to track the last heaps that were set on the command list to avoid dirtying bit unnecessarily. - Resource transitions: go through Common between queues (3D <--> Compute) - Fix initial state for placed resources. - Merging epic Change 3213250 on 2016/11/29 by Chris.Bunner GBufferHints tooltip fix. #jira UE-39103 Change 3213345 on 2016/11/29 by Gil.Gribb more IWYU fallout Change 3213676 on 2016/11/29 by Rolando.Caloca DR - Fix incorrect texture getting cleared Change 3213728 on 2016/11/29 by Rolando.Caloca DR - Lambda-ize Change 3214461 on 2016/11/29 by Ben.Woodhouse Rollout August QFE4 XDK (required for latest DX12 changes on XB1) Change 3215317 on 2016/11/30 by Daniel.Wright PS4 compile fix Change 3216343 on 2016/11/30 by Arne.Schober DR - UE-39155 - after talking to Brian it occurred to us that flipping the world space normal is non sensical. And indeed the Grass was using world space normals. Change 3216844 on 2016/12/01 by Ben.Woodhouse Fix for static analysis warnings after discussion with Microsoft Change 3216916 on 2016/12/01 by Gil.Gribb Merging //UE4/Dev-Main@3216539 to Dev-Rendering (//UE4/Dev-Rendering) Change 3217385 on 2016/12/01 by Arne.Schober DR - UE-39218, UE-39221, UE-39224 and potentially UE-39214 - The Stencil bits for Light channels and decal application were not set in the dynamic basepass Change 3217464 on 2016/12/01 by Ben.Woodhouse Fix for reflection capture resize assert. The assert is only valid in cooked builds, so disable it in editor #jira UE-39225 Change 3217534 on 2016/12/01 by Arne.Schober DR - Fix Merge conflict Change 3217581 on 2016/12/01 by Rolando.Caloca DR - Fix assert on debug Change 3217741 on 2016/12/01 by Benjamin.Hyder Duplicate audio fix. Change 3217890 on 2016/12/01 by Rolando.Caloca DR - Fix widget not rendering properly when hidden #jira UE-39221 Change 3218129 on 2016/12/01 by Arne.Schober DR - UE-39214 - Lod dither value as accidently cached accross the static draw list. Change 3218759 on 2016/12/02 by Guillaume.Abadie Fixes editor compositing bug caused by alpha through post processing change 3209903 #jira UE-39221 [CL 3219854 by Marcus Wassmer in Main branch]
2016-12-02 16:43:04 -05:00
TArray<FBox> PendingStreamingHeightfieldBoxes;
for (const FPrimitiveSceneInfo* HeightfieldPrimitive : Scene->DistanceFieldSceneData.HeightfieldPrimitives)
{
if (HeightfieldPrimitive->Proxy->HeightfieldHasPendingStreaming())
{
PendingStreamingHeightfieldBoxes.Add(HeightfieldPrimitive->Proxy->GetBounds().GetBox());
}
}
if (View.ViewState)
{
View.ViewState->GlobalDistanceFieldUpdateIndex++;
if (View.ViewState->GlobalDistanceFieldUpdateIndex > 4)
{
View.ViewState->GlobalDistanceFieldUpdateIndex = 0;
}
for (int32 ClipmapIndex = 0; ClipmapIndex < NumClipmaps; ClipmapIndex++)
{
FGlobalDistanceFieldClipmapState& ClipmapViewState = View.ViewState->GlobalDistanceFieldClipmapState[ClipmapIndex];
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
const float Extent = ComputeClipmapExtent(ClipmapIndex, Scene);
const float CellSize = (Extent * 2) / GAOGlobalDFResolution;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
bool bReallocated = false;
// Accumulate primitive modifications in the viewstate in case we don't update the clipmap this frame
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
for (uint32 CacheType = 0; CacheType < GDF_Num; CacheType++)
{
const uint32 SourceCacheType = GAOGlobalDistanceFieldCacheMostlyStaticSeparately ? CacheType : GDF_Full;
ClipmapViewState.Cache[CacheType].PrimitiveModifiedBounds.Append(Scene->DistanceFieldSceneData.PrimitiveModifiedBounds[SourceCacheType]);
if (CacheType == GDF_Full || GAOGlobalDistanceFieldCacheMostlyStaticSeparately)
{
TRefCountPtr<IPooledRenderTarget>& RenderTarget = ClipmapViewState.Cache[CacheType].VolumeTexture;
if (!RenderTarget || RenderTarget->GetDesc().Extent.X != GAOGlobalDFResolution)
{
AllocateClipmapTexture(RHICmdList, ClipmapIndex, (FGlobalDFCacheType)CacheType, RenderTarget);
bReallocated = true;
}
}
}
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
const bool bForceFullUpdate = bReallocated
Copying //UE4/Dev-Framework to Dev-Main (//UE4/Dev-Main) @ 2944217 #lockdown Nick.Penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2899855 on 2016/03/08 by Marc.Audy Merging //UE4/Dev-Main to Dev-Framework (//UE4/Dev-Framework) @ 2899785 Change 2926689 on 2016/03/29 by Jeff.Farris AAIController::SetFocus() will now implicitly clear any location focus at the same priority. UE-27975 #rb john.abercrombie Change 2926690 on 2016/03/29 by Jeff.Farris Using wildcard operator with the "KismetEvent" or "ke" console commands will now only trigger the event on objects in the world in which it was triggered. Prevents badness with running events on things like CDOs and editor actors. (UE-23106) Change 2926691 on 2016/03/29 by mason.seay Content for testing collision on scaled components Change 2926692 on 2016/03/29 by Jeff.Farris - FixupDeltaSeconds now considers time dilation when clamping. - Acceptable range for time dilation values is now a config parameter on WorldSettings - Acceptable range for undilated frame times is now a config parameter on WorldSettings (UE-27815) #rb marc.audy Change 2926711 on 2016/03/29 by Ori.Cohen Fix constraint rendering when scaling a cosntraint actor #JIRA UE-28691, UE-28700 #rb Lina.Halper Change 2926745 on 2016/03/29 by Lukasz.Furman navigation filters can now be instantiated per querier - usually AI agent required for FORT-21372 Change 2926789 on 2016/03/29 by Ori.Cohen Downgrade check to ensure for 2d physics during a hard shutdown #rb Michael.Noland Change 2926859 on 2016/03/29 by Ori.Cohen Fix red herring warnings of not locking physx scenes during hard shutdown. #JIRA UE-28747 #rb Michael.Noland Change 2927444 on 2016/03/30 by Thomas.Sarkanen Fixed Blueprint compiler errors when resetting timer handles Added basic support for 64-bit int/uint terms to Blueprint. This allows the use of opaque 64-bit integer types inside of BlueprintType structs, it in no way means that 64-bit ints are fully supported in Blueprint. Corrected a left-over formatting oversight when converting a FTimerHandle to a string. Added new by-ref "Clear and Invalidate Timer by Handle" function to Blueprint system library & deprecated old version. #rb Maciej.Mroz (and a few others!) #jira UE-28833 - Unresolved compiler error for B_Pickups blueprint in Fortnite Change 2927520 on 2016/03/30 by Jurre.deBaare Should not allow skeletal mesh components mobility to be set to static, but detach instead #fix Added CanHaveStaticMobility to SceneComponent class, and check this when trying to propogate Static mobility to parent component #jira UE-26364 Change 2927533 on 2016/03/30 by Jurre.deBaare Static Mesh Merge tool: when merging from multiple blueprints, fails to combine same materials #fix Material index remapping was part of if-clause where it shouldn't be #jira UE-23827 Static Mesh Merge tool, failed to combine physics data if using complex #fix Required copying the SectionInfoMap from source static meshes HLOD/MergeActor - Vertex Colours are not correctly propagated to negatively scaled meshes #fix had to re-order function calls #jira UE-28316 #rb James.Golding Change 2927535 on 2016/03/30 by Ori.Cohen Make sub-stepping run on game thread #JIRA UE-24011 #rb Gil.Gribb Change 2927537 on 2016/03/30 by Jurre.deBaare Warning message when HLOD mesh > 65536 vertices #jira UE-22365 #fix added messages when building proxy mesh Change 2927691 on 2016/03/30 by Jeff.Farris Fixed potential PlayerState leak (UE-22700) Change 2927692 on 2016/03/30 by Lina.Halper Allow it to select any name they want other than just restrict to what we have. - I think it may not be the best solution but with current widget built, you can't even clear name, which is problem. - Other solution is to add "Clear" as a name, and when that gets entered, we just clear it, but then the X button is odd and no purpose being there. - I think we should just allow them to choose if they don't like it but with suggestions. #rb: Ori.Cohen #jira UE-27786 #code review: Benn.Gallagher Change 2927853 on 2016/03/30 by Lina.Halper [CL 2944273 by Marc Audy in Main branch]
2016-04-14 16:25:11 -04:00
|| !View.ViewState->bInitializedGlobalDistanceFieldOrigins
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
// Detect when max occlusion distance has changed
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
|| ClipmapViewState.CachedMaxOcclusionDistance != MaxOcclusionDistance
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
|| ClipmapViewState.CachedGlobalDistanceFieldViewDistance != Scene->GlobalDistanceFieldViewDistance
|| ClipmapViewState.CacheMostlyStaticSeparately != GAOGlobalDistanceFieldCacheMostlyStaticSeparately;
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
if (ShouldUpdateClipmapThisFrame(ClipmapIndex, View.ViewState->GlobalDistanceFieldUpdateIndex)
|| bForceFullUpdate)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3154632) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3122543 on 2016/09/13 by Ben.Woodhouse Override HasOcclusion for Widget3DComponentProxy to detect if the material is has depth testing enabled. #jira UE-35878 Change 3122544 on 2016/09/13 by Ben.Woodhouse Shadow stencil optimisation with cvar (enabled by default) Avoids redundant clearing the stencil buffer for per-object and pre shadows by zeroing the stencil during testing, following discussion on UDN. This means we don't benefit from Hi Stencil on GCN for the shadow projection draw calls, but it's still faster in all the cases I could find, including for the player character where the bounding box is quite large. (Note: early stencil still works fine, according to PIX) Shadow projection GPU time profiling : Test map with 35 characters, stationary directional light - 4ms-2ms on XB1 - 2.5ms to 0.9ms on PC (r9-390X) - 3ms-2ms on PS4 Paragon PS4 (roughly 20% reduced - from ~0.39ms) Change 3122687 on 2016/09/13 by Rolando.Caloca DR - vk - Fix ES31 crash Change 3122691 on 2016/09/13 by Rolando.Caloca DR - vk - Fixes for SDK 1.0.26.0 Change 3122778 on 2016/09/13 by Rolando.Caloca DR - vk - Fix number of layers on barrier Change 3122921 on 2016/09/13 by Rolando.Caloca DR - vk - Fix ES3.1 Change 3122925 on 2016/09/13 by Ben.Woodhouse Fix sky lighting issue with skin and re-enable non-checkerboard lighting by default (fallout from lightaccumulator refactor) #jira UE-35904 Change 3123016 on 2016/09/13 by Chris.Bunner Fixed adaptive tessellation, broken by CL 3089208 refactor. #jira UE-35341 Change 3123079 on 2016/09/13 by Rolando.Caloca DR - vk - Force StoreOp store instead of DontCare everywhere (temporarily) Change 3123503 on 2016/09/13 by David.Hill #jira UE-25623 converted a check() to checkf() to include better diagnostic information. Change 3123617 on 2016/09/13 by Guillaume.Abadie Fixes artifact when the camera direction is almost parallel to a wide plane with SSR. #jira UE-35128 Change 3123743 on 2016/09/13 by Brian.Karis Separate mesh reduction interfaces for static and skeletal. Zero bad tangents from input mesh. Change 3125378 on 2016/09/14 by Arne.Schober DR - [UE-34481] - Extract all the State which is necessary to execute the DebugTextDrawingDelegate from the SceneProxy into its own Helpers to be drawn to the canvas later on. The issue was that the SceneProxys are only owned by the RT after their creation and the GT should avoid reading from or writing state to them. Change 3125527 on 2016/09/14 by Arne.Schober DR - [UE-34481] - Fix UT build and mac Change 3125741 on 2016/09/14 by Rolando.Caloca DR - Extra debug mode for tracking down SCW crashes (r.DumpSCWQueuedJobs=1) Change 3125763 on 2016/09/14 by Rolando.Caloca DR - vk - Added new Renderpass cache - Fix buffer barrier warning Change 3125769 on 2016/09/14 by Rolando.Caloca DR - Renamed cvar to r.DumpSCWQueuedJobs Change 3125771 on 2016/09/14 by Rolando.Caloca DR - Added support for SV_ClipDistance on GL3 & 4 Change 3125792 on 2016/09/14 by Arne.Schober DR - [UE-34481] - Fix Odin and PS4 Change 3125880 on 2016/09/14 by Arne.Schober DR - [UE-34481] - Fix Fortnite Change 3125968 on 2016/09/14 by Brian.Karis Removed comment Change 3126315 on 2016/09/15 by Ben.Woodhouse GPU profiler robustness - Change stat gathering update to handle multiple views and non-scenerenderer stats (such Slate) properly - Simplify gathering logic - Fix race condition where we could read back queries before they're submitted on the RHI thread. - Fix for movie player stat gathering - disable gathering outside of the main engine tick #jira UE-35975 Change 3126792 on 2016/09/15 by Rolando.Caloca DR - vk - Release render pass cache Change 3126804 on 2016/09/15 by Rolando.Caloca DR - vk - Fix UpdateTexture2D() #jira UE-34151 Change 3126884 on 2016/09/15 by Rolando.Caloca DR - vk - Compile fix Change 3126953 on 2016/09/15 by Rolando.Caloca DR - Enable GPU capture when running OpenGL under RenderDoc - Will also set the memory mode to non coherent so not to kill performance on RenderDoc Change 3126966 on 2016/09/15 by Rolando.Caloca DR - Allow cooking for Vulkan SM4 to help with packaging Change 3127082 on 2016/09/15 by Guillaume.Abadie Wraps up contact shadows for release fixing different artifacts and handling correctly their screen space length. #jira UE-35367, UE-33602, UE-33603, UE-33604 #review-3125887 @brian.karis Change 3127130 on 2016/09/15 by Mark.Satterthwaite Add overloads to as* functions in hlslcc - HLSL allows you to call these on variables of the same type, in which case it simply returns the existing value but we had only defined the float<->u/int conversions, so hlslcc added implicit casts that broke such cases (i.e. asuint(uint) -> floatBitsToUint(float(uint))). This change defines the missing overloads as returns with regular casts. #jira FORT-25869 #jira UE-34263 Change 3127475 on 2016/09/15 by Rolando.Caloca DR - vk - Debug dump Change 3128131 on 2016/09/16 by Ben.Woodhouse (Integrated from //UE4/Private-Partner-NREAL/...) Alpha output support for postprocess materials (optional via a parameter) Needed for end of frame compositing. Could be used to pass intermediate data from one blendable postprocess to another (e.g edge detection) Change 3128135 on 2016/09/16 by Ben.Woodhouse GPU profiler (PS4) - remove bubbles between commandlist submissions from query times Use r.ps4.AdjustRenderQueryTimestamps cvar to enable/disable (defaults to on) Also fixes some potential precision issues with unit GPU timing Change 3128247 on 2016/09/16 by Rolando.Caloca DR - vk - Cache framebuffers Change 3128593 on 2016/09/16 by Rolando.Caloca DR - vk - Fix for crash loading map #jira UE-36072 Change 3128759 on 2016/09/16 by Mark.Satterthwaite Back out changelist 3127130 - its causing a build failure in FPostProcessVelocityScatterVS because hlslcc is picking the wrong as_* overload. Change 3130236 on 2016/09/19 by Chris.Bunner Exposed full SceneCaptureComponent classes instead of select methods. #jira UE-35996 Change 3130388 on 2016/09/19 by Rolando.Caloca DR - Avoid crash when adding dynamic primitives #jira UE-35327 Change 3130393 on 2016/09/19 by Marc.Olano Improve vector noise tooltips & documentation Change 3130547 on 2016/09/19 by Ben.Woodhouse Fix for ensure fail when initializing point light shadowmaps. This came about because cubemap rendertargets always have Extents of (Resolution, 0). The Y component was implicitly used to determine if it was a cubemap, which is odd... The fix was to make the definition explicit via a flag and initialize both the X and Y parameters. I suspect the ensure started happening recently due to a more recent change, but fixing the underlying logic seems like the correct fix. #jira UE-35837 Change 3130578 on 2016/09/19 by Daniel.Wright Workaround OpenGL/NVidia bug with non-power-of-2 textures by disabling CSM atlassing if we're using OpenGL Change 3130682 on 2016/09/19 by Rolando.Caloca DR - Better fix for UE-35327 #jira UE-35327 Change 3130767 on 2016/09/19 by Uriel.Doyon Better handling of color array in VisualizeComplexity code to prevent assert. #jira UE-29332 Change 3130965 on 2016/09/19 by Arne.Schober DR - [UE-35679] - the crash was caused by the Resource of the UTexture being Null. And one of the Kismet Nodes calling a function on that resource. The solution was to disable that call from Kismet when only cooking. Change 3130967 on 2016/09/19 by Chris.Bunner Hid redundant texture sampler properties from texture object parameter. Hid redundant texture property input on texture parameter nodes. Fixed copy-paste error in expression texture parameter docs. #jira UE-32724 Change 3131118 on 2016/09/19 by Mark.Satterthwaite Second attempt - this time with the correct input types. Add overloads to as* functions in hlslcc - HLSL allows you to call these on variables of the same type, in which case it simply returns the existing value but we had only defined the float<->u/int conversions, so hlslcc added implicit casts that broke such cases (i.e. asuint(uint) -> floatBitsToUint(float(uint))). This change defines the missing overloads as returns with regular casts. #jira FORT-25869 #jira UE-34263 Change 3131153 on 2016/09/19 by Rolando.Caloca DR - Fix recompute normals when triangles have a LHS tangent basis Integrate from 3028634 - Also make meshes that don't have morphs be able to run through the recompute normals path #jira UE-35472 Change 3131228 on 2016/09/19 by Mark.Satterthwaite Duplicate CL #3114668: Always disable asynchronous shader compilation for the global shader map on Metal as some of them are needed very early. #jira UE-35240 Change 3131246 on 2016/09/19 by Rolando.Caloca DR - Shrink gpu skinning permutations Change 3131261 on 2016/09/19 by Mark.Satterthwaite Fix Metal validation failures due to particle rendering not binding buffers to all buffer inputs declared in the shader. ContentExamples Effects no longer aborts complaining that the particle system didn't bind a required buffer. Change 3131265 on 2016/09/19 by Mark.Satterthwaite Fix FMetalDynamicRHI::RHIReadSurfaceData for shared textures on iOS. Change 3131271 on 2016/09/19 by Mark.Satterthwaite Use private memory for the Metal stencil SRV workaround needed on El Capitan. Change 3131273 on 2016/09/19 by Mark.Satterthwaite Disable the lazy-encoder construction in Metal for AMD - there is a situation that causes the lazy construction to perform a clear that isn't wanted and so far this hasn't been tracked down and fixed. Until then, this will render correctly. Change 3131280 on 2016/09/19 by Mark.Satterthwaite For GLSL interpolation mode flags must come before storage mode flags and you can't redeclare the system variable gl_Layer to use a differing interpolation mode. Change 3131283 on 2016/09/19 by Mark.Satterthwaite Change the ShaderCache to not cache resource bindings in the draw states for shader platforms that don't care - reduces the number of draw states considered significantly without reducing effectiveness. We can support ShaderCache with Metal SM5 but not the RHI thread enabled so change when we enable it and make sure we load the binary shader cache. Change 3131402 on 2016/09/19 by Rolando.Caloca DR - Disambiguate callstack #jira UE-34415 Change 3131469 on 2016/09/19 by Rolando.Caloca DR - vk - Check if we can allocate descriptors off a pool Change 3131482 on 2016/09/19 by Rolando.Caloca DR - vk - Remove unused var Change 3131506 on 2016/09/19 by Mark.Satterthwaite With permission from Josh.A & Michael.T, deprecate Mac OpenGL support. For now this just means visibly warning users with message boxes - but in a future release OpenGL support will be removed from macOS. Change 3131536 on 2016/09/19 by Rolando.Caloca DR - vk - Compile fix Change 3131564 on 2016/09/19 by Rolando.Caloca DR - vk - Submit Hint - Disable framebuffer recycling as its causing a hang Change 3131625 on 2016/09/19 by Mark.Satterthwaite Inside MetalRHI add an optional cache for disposed texture objects so we may reuse them - controlled by CVAR rhi.Metal.TextureCacheMode which must be set prior to running as it can't be changed at runtime. Settings: 0 = off, 1 (default) = will attempt to reuse private memory texture objects within the frame they are released otherwise they are disposed of as before. Setting 2 extends the caching to all textures - though Managed/Shared textures cannot be reused until after the frame in which they were released has been processed on the GPU. In this mode id<MTLTexture> objects are never returned to the OS so in order to conserve VRAM calls to setPurgeableState are made to allow the driver to reclaim unusued memory if required. Change 3131630 on 2016/09/19 by Mark.Satterthwaite More statistics in Metal added to track down where performance was going in a particular project but which may be more generally useful. Change 3131955 on 2016/09/20 by Gil.Gribb Merging //UE4/Dev-Main@3129758 to Dev-Rendering (//UE4/Dev-Rendering) Change 3131978 on 2016/09/20 by Gil.Gribb CIS fix Change 3132584 on 2016/09/20 by Ben.Woodhouse Add some additional checks to help track down a rare crash with terrain rendering and shader recompiling #jira UE-35937 Change 3132696 on 2016/09/20 by Mark.Satterthwaite Use set*Bytes to handle uploading buffers < 4Kb when available - this is faster than lots of small Metal buffers and reduces the amount of GPU heap fragmentation. Where the API feature isn't available or hasn't been tested yet we'll use another ring-buffer inside the MetalCommandEncoder to emulate it. Change 3132772 on 2016/09/20 by Mark.Satterthwaite Rework Metal's handling of RHISetStreamSource calls that override the stride of vertex declarations to be much more efficient. Change 3132870 on 2016/09/20 by Ben.Woodhouse Fix mac compile error Change 3133049 on 2016/09/20 by Brian.Karis Changed light source shapes in reflection captures to use alpha Change 3133057 on 2016/09/20 by Brian.Karis Alphaed out on spot light cone as well. Change 3133263 on 2016/09/20 by Rolando.Caloca DR - vk - Debug names for objects Change 3133292 on 2016/09/20 by Rolando.Caloca DR - vk - Fix SRGB upload/formats Change 3133395 on 2016/09/20 by Rolando.Caloca DR - vk - SM5 fixes Change 3134026 on 2016/09/21 by Gil.Gribb Merging //UE4/Dev-Main@3133983 to Dev-Rendering (//UE4/Dev-Rendering) Change 3134663 on 2016/09/21 by Chris.Bunner Merging Dev-MaterialLayers to Dev-Rendering, CL 3134208. Initial material attribute extensibility changes. #jira UE-34347 Change 3134730 on 2016/09/21 by Arne.Schober DR - [UE-34481] - Fix minor brokenness found by Gil Change 3134792 on 2016/09/21 by Chris.Bunner Fixed compile errors for non-editor builds. Change 3135214 on 2016/09/21 by Rolando.Caloca DR - vk - Fix visualize texture - Dump memory when OOM (to track leaks) Change 3135225 on 2016/09/21 by Rolando.Caloca DR - vk - Ensure on exit if mem leak - Update fences if running wait for idle Change 3135672 on 2016/09/22 by Gil.Gribb Merging //UE4/Dev-Main@3135568 to Dev-Rendering (//UE4/Dev-Rendering) Change 3135793 on 2016/09/22 by Rolando.Caloca DR - vk - Set dynamic state after binding pipeline or on a fresh cmd buffer Change 3135816 on 2016/09/22 by Rolando.Caloca DR - Add names for d3d on renderdoc Change 3135894 on 2016/09/22 by Chris.Bunner Fixed initialization order warning. Change 3136024 on 2016/09/22 by Rolando.Caloca DR - vk - Fix stencil faces Change 3136042 on 2016/09/22 by Marcus.Wassmer Fix compile error Change 3136046 on 2016/09/22 by Chris.Bunner Renamed material for PostTonemapHDRColor visualization to reflect actual usage. Change 3136308 on 2016/09/22 by Uriel.Doyon Changed how the component relative rotation is computed, in order to have more consistency after blueprint rescript. #jira UE-36094 Change 3136798 on 2016/09/22 by Chris.Bunner Gather object references from stereo view state in USceneCaptureComponent. This matches behavior of other classes such as ULocalPlayer. Change 3137092 on 2016/09/22 by Rolando.Caloca DR - vk - Rename pipeline to gfx pipeline Change 3137263 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3135157: Fix one cause of Metal crashes loading into a zone - the PlanarReflection shader code needs to always set the IsStereoParameter so that the shader can perform the if-test without causing an invalid GPU access. #jira FORT-30061 Change 3137265 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3135169: Correct Metal texture creation for AVF media framework - we can't provide a render-targetable version of the texture without blitting. The native texture we get is a GPU copy that can be made CPU accessible (i.e. it is not tiled). Change 3137266 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3135237: Metal validation layer fix: under Metal if there are no reads from the vertex stage-in buffers we should use the Empty vertex declaration, not the filter declaration, otherwise we have to bind a redundant vertex stream buffer to silence the validation layer. Change 3137268 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3136033: To fix the Fortnite login screen force Nvidia Macs to use the set*Bytes API for small buffer updates even on El Capitan. We can't do this globally as Intel didn't implement these functions until macOS Sierra. Fix GPU selection code in MetalRHI to confirm everything is working. #jira FORT-30385 Change 3137269 on 2016/09/22 by Mark.Satterthwaite Duplicate CL #3137164: Add stats to track exactly how many command buffers are allocated and committed each frame to work out why Fortnite on AMD is hanging, which turns out to be because each texture update/reallocation ends up in its own command-buffer. This needs to be rethought to pack these into fewer command buffers with the same synchronisation requirements to minimise command-buffer splits but for now we'll just make the default sufficiently large that we shouldn't see the hang until the work is done. Also ensure that command-buffer failure is always fatal - there is no way to recover or continue if a command-buffer fails. #jira FORT-30377 Change 3137606 on 2016/09/23 by Gil.Gribb Merging //UE4/Dev-Main@3137560 to Dev-Rendering (//UE4/Dev-Rendering) Change 3137936 on 2016/09/23 by Rolando.Caloca DR - Split RHICmdList clear into color & ds in prep for changes Change 3138346 on 2016/09/23 by Rolando.Caloca DR - vk - Some renaming and splitting classes in prep for compute Change 3138628 on 2016/09/23 by Rolando.Caloca DR - vk - Fix mem leak on framebuffers Change 3138721 on 2016/09/23 by Daniel.Wright Better comment for r.DefaultFeature.AntiAliasing Change 3138722 on 2016/09/23 by Daniel.Wright Fixed assert from decals with MSAA due to binding the Scene Depth Texture instead of surface Change 3138723 on 2016/09/23 by Daniel.Wright Corrected GC doc Change 3138892 on 2016/09/23 by Daniel.Wright Fixed instanced static meshes being unbuilt after a lighting build if you ever cancelled a previous lighting build Change 3138905 on 2016/09/23 by Daniel.Wright "Optimizations" -> "Optimization Viewmodes" Change 3138939 on 2016/09/23 by Daniel.Wright Disabled the stationary light overlap viewmode with forward shading Change 3139710 on 2016/09/26 by Rolando.Caloca DR - Rename and added texture RHIClearDepthStencil -> RHIClearDepthStencilTexture Change 3139820 on 2016/09/26 by Rolando.Caloca DR - Remove prefix from shader frequency strings Change 3139828 on 2016/09/26 by Marcus.Wassmer Add SetShaderValue() specialization for bools on AsyncCompute commandlists to match the Gfx specialization. Change 3139840 on 2016/09/26 by Benjamin.Hyder Adding VectorNoise Examples to TM-Noise map Change 3139862 on 2016/09/26 by Rolando.Caloca DR - Better log to track down crash #jira UE-36271 Change 3140142 on 2016/09/26 by Rolando.Caloca DR - Fix clang warning Change 3140145 on 2016/09/26 by Rolando.Caloca DR - Rename RHIClearColor(MRT) to RHIClearColorTextures and pass textures as parameters Change 3140360 on 2016/09/26 by Daniel.Wright Lighting Scenarios and lightmaps moved to separate package * Levels can be marked as lighting scenarios (eg Day, Night). Lighting is built separately for each lighting scenario with actors / lights in all other scenario levels hidden. Only one lighting scenario level should be visible at a time in game, and its lightmaps will be applied to the world. * Most outputs of the lighting build now go into a separate _BuiltData package. This improves level Save and AutoSave times as the separate package will only be dirtied after lighting rebuilds. * If a lighting scenario is present, all lightmaps are placed inside it's _BuiltData package. This means that only the currently loaded lighting scenario's lightmaps will be loaded (Day or Night, but not both). This also means that lightmaps for a streaming level will not be streamed with it. * For backwards compatibility, existing lightmaps are moved to a new _BuiltData package on load. * Reflection captures and precomputed visibility were not moved to the separate package. Reflection captures are force updated on load of a lighting scenario level, which can increase load times. Change 3140361 on 2016/09/26 by Daniel.Wright Lighting Scenarios UI Change 3140582 on 2016/09/26 by Mark.Satterthwaite Duplicate CL #3140166 Fix the video playback in Fortnite - bind our shader resource texture as the render-target texture as for some reason the playback code expects it there, even though we could never provide one. #jira FORT-30551 Change 3140584 on 2016/09/26 by Mark.Satterthwaite Duplicate CL #3140131: Fix crash under the validation layer & Nvidia's El Capitan (10.11) drivers when distance field particle collisions are used without any scene distance fields available - bind the black volume texture when that is the case to avoid bad access on the GPU. #jira FORT-30622 Change 3140586 on 2016/09/26 by Mark.Satterthwaite Duplicate CL #3140450: Fix launching the game on Intel GPUs by disabling Tiled Reflections on Intel for macOS Sierra like we did for El Capitan as there's currently a driver bug to means it doesn't work properly. #jira FORT-30649 Change 3140594 on 2016/09/26 by Zabir.Hoque Fix benchmark shaders register bindings. TEXCOORD0 was bound to register 1 in VS and then register 0 in PS. DX12 treats this a PSO creation failuer unlike DX11 this was an error. Change 3140601 on 2016/09/26 by Marcus.Wassmer New 'Cinematic' Scalability level. Remove unused 'new' motionblur CVAR Change 3140602 on 2016/09/26 by Zabir.Hoque CreateTexture3D on XB1 DX11 was leaking ESRAM by reserving it but not allocating to it. #Tests: Fix was tested by licensee (GearBox). Change 3140622 on 2016/09/26 by Rolando.Caloca DR - vk - More prep for sm5 Change 3140765 on 2016/09/26 by Rolando.Caloca DR - Fix ensure from bad clear depth surface Change 3141251 on 2016/09/27 by Rolando.Caloca DR - vk - Rename & cleanup Change 3141394 on 2016/09/27 by Rolando.Caloca DR - vk - Compute pipeline state Change 3141463 on 2016/09/27 by Mark.Satterthwaite Fix the include order to avoid compile errors on Mac. Change 3141529 on 2016/09/27 by Gil.Gribb Merging //UE4/Dev-Main@3139632 to Dev-Rendering (//UE4/Dev-Rendering) Change 3141830 on 2016/09/27 by zachary.wilson Adding testing content for lighting scenarios to collaborate with Ben Change 3141941 on 2016/09/27 by Olaf.Piesche Speculative fix for UE-34815; have yet to repro this but there's really only so many things it could be. I currently don't see how the sim resources could go away after queueing, so I'm replacing the check with an ensure and null checking the resource pointer. Change 3142035 on 2016/09/27 by Olaf.Piesche Fix compiler error from silly leftover bit of code. Change 3142065 on 2016/09/27 by Benjamin.Hyder Updating Lighting Scenario map Change 3142262 on 2016/09/27 by Mark.Satterthwaite Change Apple RHI initialisation to select the first compatible shader platform to decide which RHI to initialise. Internally in MetalRHI we must gracefully fallback to a lower feature-level when this initial selection is not available on the current device/OS, in which case we need to validate that the selected shader platform was actually packaged. The order of initialisation is different per-platform: On Mac: Order of initialisation is the order listed in TargetedRHIs .ini specifications. On iOS/tvOS: Order is explicit: Metal MRT > Metal ES 3.1 > OpenGL ES 2 #jira UE-35749 Change 3142292 on 2016/09/27 by Rolando.Caloca DR - hlslcc - Fix for warning X3206: implicit truncation of vector type causing error #jira UE-31438 Change 3142397 on 2016/09/27 by Mark.Satterthwaite Update hlslcc for Mac including RCO's changes in CL #3142292. #jira UE-31438 Change 3142438 on 2016/09/27 by Daniel.Wright UMapBuildDataRegistry's created for legacy lightmap data are placed in the map package, which avoids problems with cooking Change 3142452 on 2016/09/27 by Rolando.Caloca DR - Proper support for int defines Change 3142519 on 2016/09/27 by Arne.Schober DR - [UE-33438] - Added a Project Setting to enable Skincache Shader Permuations. The Default value for the Skincache mode was changed to enabled. The reasoning behind this was that it will be auto disabled when Skincache Shaders are disabled, and runtime toggle is a debuging feature that mainly programmers are dealing with. The Recompute Tangents option in the Skinned Mesh is now automatically grayed out when no Skincache Shader Permuations are available. Change 3142537 on 2016/09/27 by Daniel.Wright Fixed r.ScreenPercentage with MSAA - a scissor rect was being setup during the resolve and not reset Change 3142691 on 2016/09/27 by Daniel.Wright Disabled renaming of legacy ULightmap2D's to the separate package since UMapBuildDataRegistry is no longer put in a separate package for legacy content Change 3142711 on 2016/09/27 by Daniel.Wright GComponentsWithLegacyLightmaps entries get handled by USceneComponent::AddReferencedObjects, fixes a crash when you open a map directly from the content browser Change 3142712 on 2016/09/27 by Daniel.Wright Separate category for ParticleCutout properties Change 3142762 on 2016/09/27 by Uriel.Doyon Added per static mesh and per skeletal mesh UV density data. The data is now saved and available in cooked builds. The density are computed by the engine but can be overridden by the user in the material tabs. Texture streaming intermediate component data is now per material instead of per lod-section. New ViewModeParam in FSceneViewFamily allowing context specific param per viewmode. This is currently used to show which UV channel and which texture index is being shown in the texture streaming accuracy viewmodes. This replaces r.Streaming.AnalysisIndex Renamed texture streaming viewmodes: MeshTexCoordSizeAccuracy -> MeshUVDensityAccuracy MaterialTexCoordScalesAccuracy -> MaterialTextureScaleAccuracy MaterialTexCoordScalesAnalysis -> OutputMaterialTextureScales Improved UV density computation and viewmode. LightmapUVDensity is now computed separately from UVChannel Density. Fixed texture streaming for instanced static mesh component and derived types. Change 3143464 on 2016/09/28 by Daniel.Wright Removed 'experimental' from forward shading setting Change 3143508 on 2016/09/28 by Chris.Bunner Added component type handling to FoldedMath and Length material expressions. #jira UE-36304 Change 3143557 on 2016/09/28 by Rolando.Caloca DR - Back out changelist 3142292 Change 3143563 on 2016/09/28 by Rolando.Caloca DR - vk - Force hlslcc re-link Change 3143648 on 2016/09/28 by Daniel.Wright Moved GetMeshMapBuildData to UStaticMeshComponent since FStaticMeshComponentLODInfo::OwningComponent can't be initialized reliably in the case of SpawnActor off of a blueprint default that has LODData entries already. Change 3143661 on 2016/09/28 by Chris.Bunner Warning fix. Change 3143723 on 2016/09/28 by Daniel.Wright DumpUnbuiltLightIteractions after lighting build for debugging Change 3143822 on 2016/09/28 by Arne.Schober DR - Refactoring of the ViewMatrices. Moved the Derived Matrices into the FViewMatrix struct. Made all members private do emphasize the static constness of that struct after creation. Renamed the heavy weight members on this struct to Compute*. Methods that modify The ViewMatrices have been renamed to Hack* to discurage their use in the future until a better solution for those problems is found. The ViewMatrix modification is especially misleading because it only changes the State of the ViewMatrices to read their Position from the Material Editior as if coming from the Lightsource (mainly for manual bilboards) as well as doing someting similar to generate CPU bilboards for shadows. Change 3143860 on 2016/09/28 by Benjamin.Hyder Updating TM-Noise map to include 3d noise examples Change 3143939 on 2016/09/28 by Rolando.Caloca DR - vk - Better debugging of submissions - Added r.Vulkan.IgnoreCPUReads to help track down hangs on some ihvs Change 3144006 on 2016/09/28 by Brian.Karis Fixed PixelError not being set correctly with LOD groups. Removed unneeded Simplygon references. Mesh reduction module can now be chosen by name with r.MeshReductionModule Change 3144026 on 2016/09/28 by Benjamin.Hyder Updating QA-Effects map to correct numbering issue Change 3144098 on 2016/09/28 by Arne.Schober DR - ViewMatrices Refactoring - Fix UT Change 3144158 on 2016/09/28 by Rolando.Caloca DR - Undo splitting RHI command context Change 3144952 on 2016/09/29 by Rolando.Caloca DR - vk - Missing swapchain flag Change 3145064 on 2016/09/29 by Olaf.Piesche #jira UE-36091 Pulling range update for vector distributions even when UDist is not dirty; some content has a lookup table and a clean dist, but the range values have not been baked; always pulling them should be safe and not significantly costly. Change 3145354 on 2016/09/29 by Benjamin.Hyder Updating Tm-ContactShadows Change 3145485 on 2016/09/29 by Daniel.Wright Made SeamlessTravelLoadCallback handle legacy lightmaps Change 3145527 on 2016/09/29 by Daniel.Wright Don't clear legacy lightmap annotations on each map - fixes lighting unbuilt when doing seamless travel Change 3145530 on 2016/09/29 by Simon.Tovey UE-36188 - Editor crash when updating hierarchical instance static mesh component Dirtied render state rather than unsafe update of bounds. Change 3145608 on 2016/09/29 by Gil.Gribb Attempt to fix a random compiler error under win32 Change 3145749 on 2016/09/29 by Uriel.Doyon Fix for static analysis warning Change 3146091 on 2016/09/29 by Zabir.Hoque RHI Interface changes to support PSO based APIs Change 3146092 on 2016/09/29 by Zabir.Hoque D3D12 RHI support for PSO based APIs. Change 3146590 on 2016/09/30 by Gil.Gribb Merging //UE4/Dev-Main@3146520 to Dev-Rendering (//UE4/Dev-Rendering) Change 3146731 on 2016/09/30 by Rolando.Caloca DR - Fix merge conflicts Change 3146778 on 2016/09/30 by Rolando.Caloca DR - More integration compile fixes Change 3146790 on 2016/09/30 by Rolando.Caloca DR - Integration fix Change 3146849 on 2016/09/30 by Rolando.Caloca DR - Final integration fix Change 3146899 on 2016/09/30 by Daniel.Wright Static analysis fix for dereferencing World Change 3147020 on 2016/09/30 by Rolando.Caloca DR - vk - Fix depth issue on AMD cards - Added VULKAN_KEEP_CREATE_INFO to help debugging creation - Added num color attachments to pipeline key Change 3147034 on 2016/09/30 by Rolando.Caloca DR - Fix Kite crash where shader pipelines were optimizing non-tessellation pipelines #jira UE-36277 #jira UE-36500 Change 3147080 on 2016/09/30 by Rolando.Caloca DR - vk - Disable debug info by default Change 3147082 on 2016/09/30 by Chris.Bunner Allow tessellation to be used with DrawTile calls by swapping fixed mesh to triangle list. #jira UE-36491 Change 3147388 on 2016/09/30 by Chris.Bunner Blacklisted Nvidia driver 372.70 as it has known stability issues skewing our top crashes list. Also updated recommended version numbers. #jira UE-35288 Change 3147394 on 2016/09/30 by Chris.Bunner Additional logging for rare error. #jira UE-35812 Change 3147459 on 2016/09/30 by Rolando.Caloca DR - vk - Some more srgb formats Change 3147537 on 2016/09/30 by Rolando.Caloca DR - vk - Standarize srgb flag like D3D11 - Minor FVulkanShader cleanup Change 3147620 on 2016/09/30 by Olaf.Piesche #jira UE=34486 particle component tick function task can be invalid during pause; add check Change 3148028 on 2016/10/01 by Daniel.Wright Renamed RenderingSettings.cpp to match header Change 3148059 on 2016/10/01 by Daniel.Wright Disabled reparenting in the profiler which is disorienting Change 3148067 on 2016/10/01 by Daniel.Wright Support for ReflectionEnvironment and light type show flags with ForwardShading Change 3148069 on 2016/10/01 by Daniel.Wright Added CapsuleIndirectShadowMinVisibility to SkinnedMeshComponent, so artists have control over indirect capsule shadow darkness without changing cvars Change 3148072 on 2016/10/01 by Daniel.Wright Added a rendering setting to disable the new lightmap mixing behavior, where smooth surfaces don't have any mixing. r.ReflectionEnvironmentLightmapMixBasedOnRoughness Change 3148073 on 2016/10/01 by Daniel.Wright r.VertexFoggingForOpaque only affects forward shading - manual copy of Ben's fix from Orion stream Change 3148074 on 2016/10/01 by Daniel.Wright Enabled planar reflection receiving on the material used for the preview of a APlanarReflection Change 3148084 on 2016/10/01 by Daniel.Wright Fixed reflections on Surface TranslucencyVolume in deferred Change 3148085 on 2016/10/01 by Daniel.Wright Fixed planar reflection composite being done too many times in stereo deferred Change 3148086 on 2016/10/01 by Daniel.Wright Clamp IndirectLightingQuality to 1 in preview builds - keeps preview useful even with IndirectLightingQuality jacked up to 10. Change 3148107 on 2016/10/01 by Daniel.Wright CIS fix Change 3148113 on 2016/10/01 by Daniel.Wright Translucency lighting modes for forward shading * Per-vertex modes use GetSimpleDynamicLighting since they can't support specular anyway Change 3148306 on 2016/10/02 by Rolando.Caloca DR - vk - Fix for some NV drivers on Win10 Change 3148307 on 2016/10/02 by Rolando.Caloca DR - vk - Compute pipeline Change 3148358 on 2016/10/02 by Rolando.Caloca DR - vk - Consolidate and renumber enum for binding types Change 3148396 on 2016/10/03 by Rolando.Caloca DR - vk - Warning fix Change 3148697 on 2016/10/03 by Benjamin.Hyder Submitting M_Chromebal after enabling planar reflectionsl Change 3148799 on 2016/10/03 by Rolando.Caloca DR - vk - static analysis fix Change 3148934 on 2016/10/03 by Chris.Bunner Added pre-skinned local position material graph node, vertex shader only. Change 3148994 on 2016/10/03 by Chris.Bunner Added missing header file. Change 3149085 on 2016/10/03 by Daniel.Wright Support for ReflectionEnvironment show flag in base pass reflections without any shader overhead Change 3149095 on 2016/10/03 by Rolando.Caloca DR - vk - Disable new render passes Change 3149125 on 2016/10/03 by Rolando.Caloca DR - vk - Fix for multiple memory types Change 3149181 on 2016/10/03 by Rolando.Caloca DR - Better message when missing pipelines Change 3149215 on 2016/10/03 by Rolando.Caloca DR - RHIClearColor -> RHIClearColorTexture #tests Orion Editor run match on Agora_P Change 3149288 on 2016/10/03 by Chris.Bunner Added PreTonemapHDRColor for buffer visualization and target output. #jira UE-36333 Change 3149402 on 2016/10/03 by Daniel.Wright Light attenuation buffer is now multisampled, fixes preshadows with MSAA (depth testing failed during stencil pass) but adds a resolve (.12ms at VR res) Change 3149403 on 2016/10/03 by Daniel.Wright Forward lighting supports lighting channels Change 3149574 on 2016/10/03 by Marcus.Wassmer PR #2817: Ansel/Photography system (Contributed by adamnv) Modified to become a plugin Change 3149615 on 2016/10/03 by Rolando.Caloca DR - vk - Fix PF_G16R16 which fixes reflections Change 3149639 on 2016/10/03 by Olaf.Piesche Adding more ensures to catch NaNs occasionally appearing in particle locations early Change 3149745 on 2016/10/03 by Uriel.Doyon Moved UVDensity computation in the staticmesh DDC. Change 3149749 on 2016/10/03 by Daniel.Wright Fixed lightmaps on BSP, which was fallout from Lighting Scenarios backwards compatibility Change 3149755 on 2016/10/03 by Benjamin.Hyder Checking in built lighting for QA-postprocessing Change 3149758 on 2016/10/03 by Benjamin.Hyder re-submitting built lighting for QA-PostProcessing Change 3149940 on 2016/10/04 by Gil.Gribb Merging //UE4/Dev-Main@3149754 to Dev-Rendering (//UE4/Dev-Rendering) Change 3150098 on 2016/10/04 by Marcus.Wassmer Fix some clang and win32 errors Change 3150323 on 2016/10/04 by Rolando.Caloca DR - vk - Static analysis fix Change 3150456 on 2016/10/04 by Daniel.Wright Revert temp logs Change 3150731 on 2016/10/04 by Daniel.Wright Static lights now add a dummy map build data entry for their ULightComponent::IsPrecomputedLightingValid Change 3150795 on 2016/10/04 by Marcus.Wassmer Fix RHIClearUAV and Drawindirect bugs on PS4. Also fix PS4 compile error from bad merge. Change 3151065 on 2016/10/04 by Ben.Marsh Merging //UE4/Dev-Main to Dev-Rendering (//UE4/Dev-Rendering) Change 3151134 on 2016/10/04 by Brian.Karis Fixed corrupt mesh generation from quadric simplifier due to uninitialized color array. Change 3151201 on 2016/10/04 by Marcus.Wassmer Nvidia approved icon for ansel plugin. Change 3151240 on 2016/10/04 by Marcus.Wassmer Fix string concat build error. Change 3151258 on 2016/10/04 by Ben.Marsh Fix compile error. Change 3151290 on 2016/10/04 by Marcus.Wassmer Bumping static mesh DDC key to hopefully fix distancefield crashes after brian's quadric simplifier fix. Change 3152104 on 2016/10/05 by Chris.Bunner Workaround for legacy BreakMA material node invalid component masks. #jira UE-36832 Change 3152130 on 2016/10/05 by Ben.Woodhouse Fix issue with skylight SH and fast semantics on DX11. We need to clear the cube scratch textures before writing to them to avoid issues when reading them back for mip downsampling #jira UE-35890 Change 3152240 on 2016/10/05 by Rolando.Caloca DR - Fix for missing gizmo colors #jira UE-36515 Change 3152338 on 2016/10/05 by Daniel.Wright Hopeful fix for FDistanceFieldVolumeTexture assert in the cooker Change 3152833 on 2016/10/05 by Brian.Karis Improved precision of quadrics. Fixes bad triangles on large meshes Change 3153376 on 2016/10/06 by Rolando.Caloca DR - Fix for SM4 missing pipelines fallout Change 3153650 on 2016/10/06 by Gil.Gribb Merging //UE4/Dev-Main@3153068 to Dev-Rendering (//UE4/Dev-Rendering) Change 3153656 on 2016/10/06 by Uriel.Doyon Fixed main integration compilation issues. Some of the Mesh UVDensity UI is temporary disabled. Change 3153725 on 2016/10/06 by Uriel.Doyon Fixed crash when source data is missing for lightmaps #jira UE-36157 Change 3153998 on 2016/10/06 by Gil.Gribb Merging //UE4/Dev-Main to Dev-Rendering@3153705 (//UE4/Dev-Rendering) Change 3154056 on 2016/10/06 by Marcus.Wassmer Fix compile errors from merge. Also restore some light scencario code Change 3154176 on 2016/10/06 by Marcus.Wassmer Fix deprecation warning Change 3154252 on 2016/10/06 by Marcus.Wassmer Fix more deprecation warnings Change 3154632 on 2016/10/07 by Chris.Bunner Fix for incorrect re-entrant detection with a function called twice in a row. The function input Preview expression is overridden when the function is called to link it into the caller graph, but it was restored too late for chained calls to the same function. #jira UE-37002 [CL 3154728 by Gil Gribb in Main branch]
2016-10-07 10:20:36 -04:00
const FVector NewCenter = View.ViewMatrices.GetViewOrigin();
FIntVector GridCenter;
GridCenter.X = FMath::FloorToInt(NewCenter.X / CellSize);
GridCenter.Y = FMath::FloorToInt(NewCenter.Y / CellSize);
GridCenter.Z = FMath::FloorToInt(NewCenter.Z / CellSize);
const FVector SnappedCenter = FVector(GridCenter) * CellSize;
const FBox ClipmapBounds(SnappedCenter - Extent, SnappedCenter + Extent);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const bool bUsePartialUpdates = GAOGlobalDistanceFieldPartialUpdates && !bForceFullUpdate;
if (!bUsePartialUpdates)
{
// Store the location of the full update
ClipmapViewState.FullUpdateOrigin = GridCenter;
Copying //UE4/Dev-Framework to Dev-Main (//UE4/Dev-Main) @ 2944217 #lockdown Nick.Penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2899855 on 2016/03/08 by Marc.Audy Merging //UE4/Dev-Main to Dev-Framework (//UE4/Dev-Framework) @ 2899785 Change 2926689 on 2016/03/29 by Jeff.Farris AAIController::SetFocus() will now implicitly clear any location focus at the same priority. UE-27975 #rb john.abercrombie Change 2926690 on 2016/03/29 by Jeff.Farris Using wildcard operator with the "KismetEvent" or "ke" console commands will now only trigger the event on objects in the world in which it was triggered. Prevents badness with running events on things like CDOs and editor actors. (UE-23106) Change 2926691 on 2016/03/29 by mason.seay Content for testing collision on scaled components Change 2926692 on 2016/03/29 by Jeff.Farris - FixupDeltaSeconds now considers time dilation when clamping. - Acceptable range for time dilation values is now a config parameter on WorldSettings - Acceptable range for undilated frame times is now a config parameter on WorldSettings (UE-27815) #rb marc.audy Change 2926711 on 2016/03/29 by Ori.Cohen Fix constraint rendering when scaling a cosntraint actor #JIRA UE-28691, UE-28700 #rb Lina.Halper Change 2926745 on 2016/03/29 by Lukasz.Furman navigation filters can now be instantiated per querier - usually AI agent required for FORT-21372 Change 2926789 on 2016/03/29 by Ori.Cohen Downgrade check to ensure for 2d physics during a hard shutdown #rb Michael.Noland Change 2926859 on 2016/03/29 by Ori.Cohen Fix red herring warnings of not locking physx scenes during hard shutdown. #JIRA UE-28747 #rb Michael.Noland Change 2927444 on 2016/03/30 by Thomas.Sarkanen Fixed Blueprint compiler errors when resetting timer handles Added basic support for 64-bit int/uint terms to Blueprint. This allows the use of opaque 64-bit integer types inside of BlueprintType structs, it in no way means that 64-bit ints are fully supported in Blueprint. Corrected a left-over formatting oversight when converting a FTimerHandle to a string. Added new by-ref "Clear and Invalidate Timer by Handle" function to Blueprint system library & deprecated old version. #rb Maciej.Mroz (and a few others!) #jira UE-28833 - Unresolved compiler error for B_Pickups blueprint in Fortnite Change 2927520 on 2016/03/30 by Jurre.deBaare Should not allow skeletal mesh components mobility to be set to static, but detach instead #fix Added CanHaveStaticMobility to SceneComponent class, and check this when trying to propogate Static mobility to parent component #jira UE-26364 Change 2927533 on 2016/03/30 by Jurre.deBaare Static Mesh Merge tool: when merging from multiple blueprints, fails to combine same materials #fix Material index remapping was part of if-clause where it shouldn't be #jira UE-23827 Static Mesh Merge tool, failed to combine physics data if using complex #fix Required copying the SectionInfoMap from source static meshes HLOD/MergeActor - Vertex Colours are not correctly propagated to negatively scaled meshes #fix had to re-order function calls #jira UE-28316 #rb James.Golding Change 2927535 on 2016/03/30 by Ori.Cohen Make sub-stepping run on game thread #JIRA UE-24011 #rb Gil.Gribb Change 2927537 on 2016/03/30 by Jurre.deBaare Warning message when HLOD mesh > 65536 vertices #jira UE-22365 #fix added messages when building proxy mesh Change 2927691 on 2016/03/30 by Jeff.Farris Fixed potential PlayerState leak (UE-22700) Change 2927692 on 2016/03/30 by Lina.Halper Allow it to select any name they want other than just restrict to what we have. - I think it may not be the best solution but with current widget built, you can't even clear name, which is problem. - Other solution is to add "Clear" as a name, and when that gets entered, we just clear it, but then the X button is odd and no purpose being there. - I think we should just allow them to choose if they don't like it but with suggestions. #rb: Ori.Cohen #jira UE-27786 #code review: Benn.Gallagher Change 2927853 on 2016/03/30 by Lina.Halper [CL 2944273 by Marc Audy in Main branch]
2016-04-14 16:25:11 -04:00
View.ViewState->bInitializedGlobalDistanceFieldOrigins = true;
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const FGlobalDFCacheType StartCacheType = GAOGlobalDistanceFieldCacheMostlyStaticSeparately ? GDF_MostlyStatic : GDF_Full;
for (uint32 CacheType = StartCacheType; CacheType < GDF_Num; CacheType++)
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
FGlobalDistanceFieldClipmap& Clipmap = *(CacheType == GDF_MostlyStatic
? &GlobalDistanceFieldInfo.MostlyStaticClipmaps[ClipmapIndex]
: &GlobalDistanceFieldInfo.Clipmaps[ClipmapIndex]);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
bool bLocalUsePartialUpdates = bUsePartialUpdates
// Only use partial updates with small numbers of primitive modifications
&& ClipmapViewState.Cache[CacheType].PrimitiveModifiedBounds.Num() < 100;
if (bLocalUsePartialUpdates)
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3219450) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3148067 on 2016/10/01 by Daniel.Wright Support for ReflectionEnvironment and light type show flags with ForwardShading Change 3149085 on 2016/10/03 by Daniel.Wright Support for ReflectionEnvironment show flag in base pass reflections without any shader overhead Change 3162206 on 2016/10/13 by Chris.Bunner Merging Dev-MaterialLayers to Dev-Rendering, CL 3161593: Material expressions; Trig, fast-trig, saturate, round, truncate, pre-skinned normal. Added CustomEyeTangent to material attributes. Resolved some hard-coded attribute typing and other minor fixes. Change 3186067 on 2016/11/03 by Daniel.Wright Updated Stationary primitive tooltip to indicate that it allows the primitive to be changed, but not moved Change 3186069 on 2016/11/03 by Daniel.Wright Using a weighted geometric mean to combine multiple Distance Field Indirect Shadows, greatly reduces over-occlusion when overlap is high Change 3186084 on 2016/11/03 by Mark.Satterthwaite Duplicate 3172511: Don't set Metal resource option fields on texture descriptors when running on an OS that doesn't support them. #jira UE-37481 Change 3186089 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3169764: Fixed automatic conversion of G8_sRGB into RGBA8_sRGB required for Mac Metal, which fixes FORT-27627. #jira FORT-27627 Change 3186113 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3183807: Change the way we access the Metal viewport's backbuffer, to reduce possible causes of FORT-31649: - Added console variable "rhi.Metal.SupportsIntermediateBackBuffer" to control whether to use an extra render-target so we can support screenshots & movie capture, or render directly to the back-buffer to save memory & GPU performance. Still defaults to ON for Mac & OFF for iOS/tvOS. - Change the way we handle updates to the back-buffer size to ensure that the different threads access their intended version. #jira FORT-31649 Change 3186116 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3183823: Record Metal resource & state objects used in a command-buffer when rhi.Metal.RuntimeDebugLevel is set to 3 or higher. The object labels, types & descriptions will be printed on failure - if the object is deleted prior to this then we have a lifetime error and it will crash at this point and can be debugged further using our -metalretainrefs command-line option or Xcode's zombie-objects. Used to verify that FORT-31649 is not a simple resource lifetime error and thereby speed up Apple/vendor investigations. #jira FORT-31649 Change 3186818 on 2016/11/04 by Chris.Bunner PR #2907 Export UMaterialExpressionNoise (contributed by kayosiii). Change 3186979 on 2016/11/04 by Rolando.Caloca DR - Misc minor cleanup Change 3187169 on 2016/11/04 by Uriel.Doyon Incremental insertion of level data between PostLoad and AddToWorld Change 3187205 on 2016/11/04 by Mark.Satterthwaite Compile fixes for iOS. Change 3187389 on 2016/11/04 by Uriel.Doyon Fix for possible stall when loading hidden level Change 3187598 on 2016/11/04 by Michael.Trepka MetalViewport compile fix Change 3187678 on 2016/11/04 by Uriel.Doyon Fix for landscape grass textures not being streamed in correctly. Change 3187731 on 2016/11/04 by Rolando.Caloca DR - Start making type safe some cross compiler enums Change 3187824 on 2016/11/04 by Rolando.Caloca DR - clang compile fix Change 3187953 on 2016/11/04 by Rolando.Caloca DR - vk - Mac compile fix Change 3188696 on 2016/11/07 by Mark.Satterthwaite Another iOS compile fix for new MetalViewport validation code. Change 3188906 on 2016/11/07 by Rolando.Caloca DR - Show permutation of LUTBlender Change 3189094 on 2016/11/07 by Chris.Bunner Fix RemoveAAJitter from projection matrix. #jira UE-37701, UE-38003 Change 3189134 on 2016/11/07 by Daniel.Wright Fix for CreateRenderTarget2D called in construction script during cooking Change 3189145 on 2016/11/07 by Chris.Bunner Follow-up to CL 3186818, export UMaterialExpressionVectorNoise. Change 3189239 on 2016/11/07 by Daniel.Wright Added show flag for Contact Shadows, disabled in planar reflections Change 3189252 on 2016/11/07 by Daniel.Wright Support for Reflection Capture intensity with simple reflections, which are the default with Forward Shading Change 3189406 on 2016/11/07 by Mark.Satterthwaite Really fix the last of the iOS compile errors from changes to the MetalViewport code. Change 3190854 on 2016/11/08 by Ben.Woodhouse XB1: Fix memory corruption with RHICreateVertexBuffer and RHICreateIndexBuffer when using initial data (Procedural Mesh Component crash) #jira UE-34264 #fyi james.golding #fyi keith.judge Change 3190962 on 2016/11/08 by Olaf.Piesche Unshelved from pending changelist '3176615' - Gil's fix for race condiiton with particle vertex factory reuse across different passes; potential to fix a number of issues Change 3191959 on 2016/11/09 by Uriel.Doyon Removed some static primitives from the dynamic primitive handler for texture streaming. Change 3193122 on 2016/11/10 by Chris.Bunner Always update non-preview material resources for use in code preview. #jira UE-38223 Change 3193190 on 2016/11/10 by Gil.Gribb UE4 - Fixed rare bug with shadow groups rendering things that have not been setup to render this frame. #jira UE-36379 Change 3193523 on 2016/11/10 by Uriel.Doyon Fixed incorrect section bounds used for texture streaming. Change 3193962 on 2016/11/10 by Uriel.Doyon Added defrag of dynamic bounds used for the texture streaming. Allows to remove unused bounds over time. Change 3193974 on 2016/11/10 by Uriel.Doyon New "Required Texture Resolution" view mode. Showing the ratio between the currently streamed texture resolution and the resolution wanted by the GPU. Change 3194109 on 2016/11/10 by Uriel.Doyon Another patch on material bounds used for texture streaming. Change 3194665 on 2016/11/11 by Chris.Bunner Duplicated behavior for inherited velocity scaling scaling to vert/surface spawned particles. Change 3194734 on 2016/11/11 by Rolando.Caloca DR - vk - Simplified some texture casting Change 3194867 on 2016/11/11 by Rolando.Caloca DR - vk - SM5 fixes Change 3195176 on 2016/11/11 by Chris.Bunner Fixed incorrectly updated NVAPI error. Change 3195425 on 2016/11/11 by Uriel.Doyon Fixed possible invalid level reference in the texture streamer Change 3196512 on 2016/11/14 by Gil.Gribb Merging //UE4/Dev-Main@3196156 to Dev-Rendering (//UE4/Dev-Rendering) Change 3196750 on 2016/11/14 by Marcus.Wassmer Fix ordering problem with GPU cache transitions Change 3196815 on 2016/11/14 by Daniel.Wright Suppressed 'Instanced stereo rendering is not supported' warning showing up in CIS Change 3196818 on 2016/11/14 by Daniel.Wright Fixed FIndirectLightingCache::UpdateCachePrimitivesInternal churning through a bunch of temporary memory Change 3196819 on 2016/11/14 by Daniel.Wright Volume lighting samples are allowed outside of the importance volume if their influence affects the volume. Fixes black indirect lighting on movable components in maps with small importance volumes. Volume lighting samples placed on surfaces use a radius that covers the layer height spacing, which prevents an uncovered region between layers Change 3197243 on 2016/11/14 by Uriel.Doyon Async Task For Updating static component LastRender time #jira UE-24268 Change 3197359 on 2016/11/14 by Daniel.Wright Added Inscattering Texture controls to ExponentialHeightFog * When InscatteringColorCubemap is specified, directional light inscattering is disabled * Lerps betwen 1x1 mip at NonDirectionalInscatteringColorDistance to mip 0 at FullyDirectionalInscatteringColorDistance * Added FogCutoffDistance, so artists can prevent fog on skyboxes (requires fog to be setup matching the fog that was rendered into the sky texture so that distant mountains match) * Fog shader permutations based on what feature is enabled Change 3198419 on 2016/11/15 by Chris.Bunner PS4 HDR: Runtime toggle (backbuffer recreation on resize matching), UI composition. Matches PC behavior and controls. HDR: Generalized buffer formats, cvar consistency pass, LUT for UI composition, refactoring common functions. Exposed RHICreateTargetableShaderResource3D. Moved some (translucent) volume rendering helpers to allow access in Slate. Change 3198822 on 2016/11/15 by Daniel.Wright Mac compile fix Change 3199509 on 2016/11/15 by Uriel.Doyon Added support for viewmode param asset name (and note just param value). Used to investigate texture streamer behavior. Change 3199578 on 2016/11/15 by Rolando.Caloca DR - Add some shader resource tables to SCW when running with -directcompile Change 3199698 on 2016/11/15 by Rolando.Caloca DR - vk - Refactor shader & descriptor bindings Change 3199712 on 2016/11/15 by Rolando.Caloca DR - vk - r.Vulkan.StripGlsl to always strip glsl at runtime to save memory per shader Change 3199717 on 2016/11/15 by Rolando.Caloca DR - vk - Show hitching PSO info again Change 3199750 on 2016/11/15 by Rolando.Caloca DR - SCW clang compile fixes Change 3200353 on 2016/11/16 by Rolando.Caloca DR - vk - Mac fix Change 3200358 on 2016/11/16 by Chris.Bunner Only allow UI composition on platforms we currently use it. Change 3200823 on 2016/11/16 by Chris.Bunner Remove expression key attribute ID when not translating an attribute output to allow intended expression sharing. #jira UE-38699 Change 3200947 on 2016/11/16 by Mark.Satterthwaite Fix UE-38695 by not trying to resize the viewport on the wrong thread. #jira UE-38695 Change 3201069 on 2016/11/16 by Daniel.Wright Fog inscattering texture limited to SM4 and above, fixes ES2 compile errors Change 3201346 on 2016/11/16 by Brian.Karis Temporal AA fix for correct edge gradients. Filtering now combined with importance sampling. Enabled Catmull-Rom resolve filter. Results are now slightly sharper. Fixed antighosting. Will yet require a dilation to be perfect. Optimized bicubic filtering to 5 taps instead of 9. Cleaned out unused code. Change 3201369 on 2016/11/16 by Brian.Karis Bicubic texture sample Change 3201522 on 2016/11/16 by Rolando.Caloca DR - vk - Fix static analysis issues Change 3201878 on 2016/11/17 by Chris.Bunner Temporarily disable Nvapi HDR error logging. #jira UE-38529 Change 3202108 on 2016/11/17 by Simon.Tovey Assets with easy repro for flickering particles bug Change 3202181 on 2016/11/17 by Rolando.Caloca DR - vk - CIS android fix Change 3202325 on 2016/11/17 by Ben.Woodhouse Integrate 4.14.1 fix from 14 //UE4/Release-4.14 (@3201850) Fix CreateVertexbuffer and CreateIndexBuffer memory corruption (Procedural Mesh Component crash) #jira UE-34264 Change 3204394 on 2016/11/18 by Guillaume.Abadie PR #2808: AlphaComposite Fog Opacity fix (Contributed by moritz-wundke) #br Ben.Woodhouse Change 3204428 on 2016/11/18 by Guillaume.Abadie Fixes a couple of issues in decals: * Crash in FDecalDrawingPolicyFactory::DrawMesh() * ActorPostion material expression * PixelNormalWS material expression * Missing renaming from DEFERRED_DECAL to DECAL_PRIMITIVE #jira UE-38327, UE-38158, UE-37818, UE-37350 Change 3204429 on 2016/11/18 by Uriel.Doyon Darker default undefined accuracy. Reenabled the texture streaming build in the build all. Change 3204458 on 2016/11/18 by Chris.Bunner Shader truncation warnings fix. Change 3204459 on 2016/11/18 by Chris.Bunner Engine 'Passthrough' material fuction fix. V4 is now actually a V4. Change 3204460 on 2016/11/18 by Chris.Bunner Correctly handle some known Nvapi warnings. #jira UE-38529 Change 3204653 on 2016/11/18 by Marc.Olano Helper functions for tiled textures Checking in for Ryan Brucks Change 3204863 on 2016/11/18 by Arne.Schober DR - Replaced ENQUEUE_UNIQUE_RENDER_COMMAND with a Debuggable template Implementation Change 3204939 on 2016/11/18 by Arne.Schober DR - Make clang happy Change 3204968 on 2016/11/18 by Arne.Schober DR - UE-38494 - Fixed SpeedTree Wind crash, when force deleting the Asset. Change 3206293 on 2016/11/21 by Uriel.Doyon New member bHasStreamingUpdatePending in UTexture2D to delay update of global distance fields. Set to true when the streamer can possibly load a mip in the near future. #jira UE-37787 Change 3206551 on 2016/11/21 by Chris.Bunner Added material update context when forcing all shaders to recompile. #jira UE-38481 Change 3206644 on 2016/11/21 by Benjamin.Hyder Updating Planar Reflection example in TM-Shadermodels. Change 3206899 on 2016/11/21 by Rolando.Caloca DR - vk - SM5 fixes Change 3206900 on 2016/11/21 by Rolando.Caloca DR - Added missing strings for shader formats Change 3206983 on 2016/11/21 by Rolando.Caloca DR - vk - Support for SV_Coverage Change 3207237 on 2016/11/22 by Simon.Tovey Exporting particle module base and a couple of child classes as it's commonly requested. #test compiles Change 3207241 on 2016/11/22 by Gil.Gribb Merging //UE4/Dev-Main@3206998 to Dev-Rendering (//UE4/Dev-Rendering) Change 3207520 on 2016/11/22 by Ben.Woodhouse Cherry picked from //Fortnite/Main@3206301 Fixed GPU hang in Zone Map view. Was an issue with RenderThread using the device context without appropriate RHIThread flushes. #jira FORT-31616 #code_review keith.judge Change 3207541 on 2016/11/22 by Ben.Woodhouse Cherry picked from //fortnite/Main@3207422 * Fix UpdateTexture3D to create a staging texture of the region to update rather than the whole texture. This prevents distance fields crashing during update (allocating 18GB per frame in some cases) * Put UpdateTexture2D DMA support onto a cvar, disabled by default (corruption issues reported by licensees, plus not sure it's actually faster - could be slower due to reduced bandwidth; issues reported by licensees) * Fix UpdateTexture2D to only create a staging texture of the region to update, saving memory #jira UE-38609 Change 3207654 on 2016/11/22 by Chris.Bunner Don't flag 16-bit PNG/JPG textures as sRGB on import. #jira UE-30279 Change 3208434 on 2016/11/22 by Rolando.Caloca DR - vk - UAV transitions Change 3208490 on 2016/11/22 by Chris.Bunner Break material code sharing when we detect an unresolvable loop. By default change IsResultMA loop detection to stop on functions as we can determine type definitively. Unified IsResultMA detection across switch nodes. Change 3208860 on 2016/11/23 by Rolando.Caloca DR - vk - Fix some format issues Change 3209265 on 2016/11/23 by Arne.Schober DR - originally unshelved from 3153924 - Made Depth and Velocity Rendering Passes to use PSO only RHI interface, We are now passing down two structs that collect all the necessary information for the drawing policies to construct a PSO object. One during construction of the Policy, which contains information abouyt the CullMode, FillMode and PrimType. And another during rendering that passes infomation like BlendState and DepthStencilState down to the low levelrenderer into SetSharedState. Performance of the static drawlist ist slightly slower (less than 0.1ms on Consoles) due to some addtional branches and copies. The branches in the FDrawingPolicyRenderState will go away as soon as everything is converted to use the PSO interface. Performace of the GPU is slightly better due to less context rolls (mainly CullMode sorts in differently now) Change 3209305 on 2016/11/23 by Guillaume.Abadie Fix contact shadow's assemption on objects thickness Change 3209334 on 2016/11/23 by Brian.Karis Fixed TAA handling of alpha. Switched the meaning of AA_ALPHA to make sense. Change 3209903 on 2016/11/24 by Guillaume.Abadie Cherry picks alpha through post processing changelists 3201959, 3204143 and 3209883 from //UE4/Private-Partner-NREAL Change 3209973 on 2016/11/24 by Ben.Woodhouse Fix D3D11 and 12 static analysis warnings reported by Rob Troughton of Coconut Lizard (http://coconutlizard.co.uk/blog/ue4/pvs-studio-part5/) Change 3210023 on 2016/11/24 by Uriel.Doyon Fixed an issue with DropDetail when FixedFrameRate was set to a value smaller than MinDesiredFrameRate. #jira UE-37210 Change 3210026 on 2016/11/24 by Ben.Woodhouse Disable renderthread hang detection if a debugger is present, so we can debug the renderthread without crashing Change 3210049 on 2016/11/24 by Ben.Woodhouse Fix mac build Change 3210071 on 2016/11/24 by Uriel.Doyon Fixed an issue with masked materials and shader complexity viewmode when DBuffer Decals are enabled. #jira UE-37542 Change 3210374 on 2016/11/25 by Ben.Woodhouse * Fix issues with fast cleared dbuffer targets not being resolved when no decals are in the scene. This caused graphical corruption on XB1 and ensure failures on PS4 (with RHIThread disabled) * Move Decal rendertarget manager function implementations out of the header. #jira UE-38879 Change 3210390 on 2016/11/25 by Uriel.Doyon Fixed cubemap resourcesize not taking into account mipgen settings #jira UE-37045 Change 3210407 on 2016/11/25 by Uriel.Doyon "resavepackages" commandlet now supports -buildtexturestreaming that rebuilds the map texture streaming data. That can be used in combination with -buildlighting. Change 3210563 on 2016/11/27 by Rolando.Caloca DR - vk - Integrate cached memory fixes and PF_D24 format fix #jira UE-39025 PR #2974 Change 3210564 on 2016/11/27 by Rolando.Caloca DR - Fix for GL linker PR #2975 #jira UE-39029 Change 3210592 on 2016/11/27 by Rolando.Caloca DR - vk - SM5 fixes Change 3210597 on 2016/11/27 by Rolando.Caloca DR - vk - Prep for staging UB copies to GPU memory Change 3210600 on 2016/11/27 by Rolando.Caloca DR - vk - Extract generic range code Change 3210613 on 2016/11/27 by Rolando.Caloca DR - vk - Added r.Vulkan.SubmitOnDispatch Change 3211054 on 2016/11/28 by Rolando.Caloca DR - vk - Missing reference Change 3211330 on 2016/11/28 by Chris.Bunner Shader compile error for max texture coordinate count on skinned meshes. Change 3211384 on 2016/11/28 by Arne.Schober DR - Enforce move on EnqueueRenderCommand Lambda Change 3211431 on 2016/11/28 by Gil.Gribb Merging //UE4/Dev-Main@3211016 to Dev-Rendering (//UE4/Dev-Rendering) Change 3211738 on 2016/11/28 by Gil.Gribb IWYU fixes after merge Change 3212231 on 2016/11/28 by Richard.Wallis Fix build errors Change 3212253 on 2016/11/28 by Richard.Wallis Remove MacGraphicsSwitching plugin. #jira UE-37640 Change 3212310 on 2016/11/28 by Rolando.Caloca DR - vk - Update glslang to 1.0.33.0 Change 3212446 on 2016/11/28 by Guillaume.Abadie Implements PreviousFrameSwitch material expression Change 3212594 on 2016/11/28 by Arne.Schober DR - Fix missing include Change 3212681 on 2016/11/29 by Rolando.Caloca DR - vk - Auto flush for compute shader Change 3213000 on 2016/11/29 by Gil.Gribb temp fix for PF_MAX Change 3213161 on 2016/11/29 by Ben.Woodhouse Integrate latest D3D12 changes from //depot/Partners/Microsoft/UE4-DX12/...@3211714 Using: - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/Runtime/D3D12RHI/...@3211714 //UE4/Dev-Rendering/Engine/Source/Runtime/D3D12RHI/... - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/ThirdParty/Windows/DirectX/...@3211714 //UE4/Dev-Rendering/Engine/Source/ThirdParty/Windows/DirectX/... - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/Programs/UnrealBuildTool/...@3211714 //UE4/Dev-Rendering/Engine/Source/Programs/UnrealBuildTool/... Changes from UE4-DX12: *** CL 3183818 *** Update D3D12 RHI to 4.14: - Merged changes from Epic up until 10/20/16 - Fixed an issue where command allocators where resetting too early. I changed to aggressive command list batching by default now that more SubmitCommandListHint calls exist in the upper engine, we don't need to worry about starving the GPU. Fewer ExecuteCommandLists calls means better performance and fewer Signals() so this change provides a GPU perf win. I had to fix an issue with aggressive batching where we would sometimes hold on to a command list long enough (in the pending list) but hadn't executed it yet. The command allocator was being put back in the queue of allocators during ReleaseCommandAllocator() without a syncpoint set and was thus being reset too early. I added a simple counter to the command allocator so it could track how many command lists were using it. It doesn't need to be thread safe since only one thread uses a command allocator at a time. I also added some stats around the # command lists and # command allocators since it would be possible to leak command allocators now if it's pending command list count isn't decremented correctly. In that case we'd keep creating new command allocators and eventually run out of memory. -Remove clear during allocate in the FD3D12FastConstantAllocator and FD3D12FastAllocator. The supplied resource locations are assumed to be new and thus don't need to be cleared. -Cleanup D3D12RHI stats. There were some unused stats as well as some missing ones. -Mark shader resource table uniform buffers as dirty only when the shader changes. Cleanup SetComputeShader calls and Dispatch calls to not set/unset the CS for each Dispatch. -Remove unused Check SRV resolved code that epic added to the D3D11 RHI and was brought over. We dont need it and we won't use this. -Remove "always on" cycle counters for high frequency RHI methods like RHISetShaderTexture. These should use the engine's stat macros as they are removed on TEST + SHIPPING builds. On Xbox a significant amount of CPU time is spent in things like QueryPerformanceCounter even when STATS aren't enabled. Currently 1% of an entire capture on XBOX is spent inside this call. I improved and cleaned up high freqency call stacks like: - RHISetShaderTexture - RHISetShaderResourceViewParameter - RHISetShaderParameter - RHISetUAVParameter In general I moved to use templated functions, removed unused parameters, unnecessary copies, etc. -Change D3D12 PIX profiling enable/disable to match Xbox and handle logic in the UEBuildWindows.cs for UBT. Also add a static assert to inform the developer when PIX profiling is requested but the engine is compiling out draw events. -Resources should be associated with the rendering thread's frame that it's currently recording command lists for and they shouldnt be cleaned up until those command lists have been translated to D3D12 command lists on the RHI thread AND completed executing on the GPU. This was confirmed to resolve an issue where CBV resources were being released too early. This work involved a couple changes: 1) Move the "frame" fence to be incremented on the rendering thread (during RHIAdvanceFrameForGetViewportBackBuffer()) so that resources that are deleted from the rendering thread are assosicated with the correct frame count 2) Queue up a command from the rendering thread to signal the "frame" fence. It needs to be queued to ensure that it's signaled at the correct time on the RHI thread (after that frame's command lists have been executed). -Disable GRHIRequiresEarlyBackBufferRenderTarget. Metal/Vulkan/Xbox11.x already do this. This is used by the Slate renderer during BeginRenderFrame and avoids a SetRenderTargets call. -Enable GRHISupportsMSAADepthSampleAccess (used in the Editor). This was enabled for D3D11 on SM5, but not for D3D12. -Delay load D3D12.dll and add root signature 1.1 support. -Add explicit flush calls to improve resource barrier batching instead of implict flushes inside FConditionalScopeResourceBarrier and FScopeResourceBarrier. Also update those classes with const members. *** CL 3183824 *** Fix the D3D12 RHI after integrating UE 4.14 updates: - Fixed a bug where we would try to get the PSO of a nullptr in SetPipelineState if we needed to reset the current PSO on the cmd list. - Fixed a spelling error - Removed the need for bForceState, we use dirty bits now *** CL 3183830 *** - GetDebugFlags RHI extension, needed by XB1 movie player. - Only query memory info if stats are enabled - Add support for the engine's new RHISubmitCommandsAndFlushGPU function - Update CommitPendingPipelineState to be Graphics/Compute specific and avoid the need for a IsCompute parameter. *** CL 3183837 *** Made PipelineState caches contain pointers to FD3D12PipelineState objects to avoid issues with using pointers to after Find/Add to the maps. TMap indicates that the pointer to the value associated with a key "is only valid until the next change to any key in the map." The lifetime of the PSO pointers is managed by the low level caches (graphics and compute). Added stat for the number of Pipeline State Objects. *** CL 3183931 *** Update Windows D3D12 headers and libs to RS1 release bits (10.0.14393.0) *** CL 3183978 *** Update UBT Windows build settings: - Change D3D12 PIX profiling enable/disable to match Xbox and handle logic in the UEBuildWindows.cs for UBT. Also add a static assert to inform the developer when PIX profiling is requested but the engine is compiling out draw events. -Delay load D3D12.dll and add root signature 1.1 support. *** CL 3184132 *** Fix Xbox PSO cache code where it could leak PSOs. Related to change 3183837. *** Changelist 3211714 *** Update D3D12 RHI with fixes: - Check if we can reserve slots in GatherUniqueSamplerTables - DirtyState more often in StateCache - Remove InternalSetSamplerState. The alternate function isn't used. - Allow MRTClear for arrays with holes in them - Fix uninitialized descriptors. This was causing a GPU hang on Xbox. We need to set dirty bits for resources bound to slots outside of the current descriptor table's range - Cleanup SetDescriptorHeap code. Move setting descriptor heap logic to the descriptor cache since it also owns things like the sampler maps. Added members to the descriptor cache to track the last heaps that were set on the command list to avoid dirtying bit unnecessarily. - Resource transitions: go through Common between queues (3D <--> Compute) - Fix initial state for placed resources. - Merging epic Change 3213250 on 2016/11/29 by Chris.Bunner GBufferHints tooltip fix. #jira UE-39103 Change 3213345 on 2016/11/29 by Gil.Gribb more IWYU fallout Change 3213676 on 2016/11/29 by Rolando.Caloca DR - Fix incorrect texture getting cleared Change 3213728 on 2016/11/29 by Rolando.Caloca DR - Lambda-ize Change 3214461 on 2016/11/29 by Ben.Woodhouse Rollout August QFE4 XDK (required for latest DX12 changes on XB1) Change 3215317 on 2016/11/30 by Daniel.Wright PS4 compile fix Change 3216343 on 2016/11/30 by Arne.Schober DR - UE-39155 - after talking to Brian it occurred to us that flipping the world space normal is non sensical. And indeed the Grass was using world space normals. Change 3216844 on 2016/12/01 by Ben.Woodhouse Fix for static analysis warnings after discussion with Microsoft Change 3216916 on 2016/12/01 by Gil.Gribb Merging //UE4/Dev-Main@3216539 to Dev-Rendering (//UE4/Dev-Rendering) Change 3217385 on 2016/12/01 by Arne.Schober DR - UE-39218, UE-39221, UE-39224 and potentially UE-39214 - The Stencil bits for Light channels and decal application were not set in the dynamic basepass Change 3217464 on 2016/12/01 by Ben.Woodhouse Fix for reflection capture resize assert. The assert is only valid in cooked builds, so disable it in editor #jira UE-39225 Change 3217534 on 2016/12/01 by Arne.Schober DR - Fix Merge conflict Change 3217581 on 2016/12/01 by Rolando.Caloca DR - Fix assert on debug Change 3217741 on 2016/12/01 by Benjamin.Hyder Duplicate audio fix. Change 3217890 on 2016/12/01 by Rolando.Caloca DR - Fix widget not rendering properly when hidden #jira UE-39221 Change 3218129 on 2016/12/01 by Arne.Schober DR - UE-39214 - Lod dither value as accidently cached accross the static draw list. Change 3218759 on 2016/12/02 by Guillaume.Abadie Fixes editor compositing bug caused by alpha through post processing change 3209903 #jira UE-39221 [CL 3219854 by Marcus Wassmer in Main branch]
2016-12-02 16:43:04 -05:00
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
FIntVector Movement = GridCenter - ClipmapViewState.LastPartialUpdateOrigin;
if (CacheType == GDF_MostlyStatic || !GAOGlobalDistanceFieldCacheMostlyStaticSeparately)
{
// Add an update region for each potential axis of camera movement
AddUpdateRegionForAxis(Movement, ClipmapBounds, CellSize, 0, Clipmap.UpdateRegions);
AddUpdateRegionForAxis(Movement, ClipmapBounds, CellSize, 1, Clipmap.UpdateRegions);
AddUpdateRegionForAxis(Movement, ClipmapBounds, CellSize, 2, Clipmap.UpdateRegions);
}
else
{
// Inherit from parent
Clipmap.UpdateRegions.Append(GlobalDistanceFieldInfo.MostlyStaticClipmaps[ClipmapIndex].UpdateRegions);
}
extern float GAOConeHalfAngle;
const float GlobalMaxSphereQueryRadius = MaxOcclusionDistance / (1 + FMath::Tan(GAOConeHalfAngle));
// Add an update region for each primitive that has been modified
for (int32 BoundsIndex = 0; BoundsIndex < ClipmapViewState.Cache[CacheType].PrimitiveModifiedBounds.Num(); BoundsIndex++)
{
AddUpdateRegionForPrimitive(ClipmapViewState.Cache[CacheType].PrimitiveModifiedBounds[BoundsIndex], GlobalMaxSphereQueryRadius, ClipmapBounds, CellSize, Clipmap.UpdateRegions);
}
int32 TotalTexelsBeingUpdated = 0;
// Trim fully contained update regions
for (int32 UpdateRegionIndex = 0; UpdateRegionIndex < Clipmap.UpdateRegions.Num(); UpdateRegionIndex++)
{
const FVolumeUpdateRegion& UpdateRegion = Clipmap.UpdateRegions[UpdateRegionIndex];
bool bCompletelyContained = false;
for (int32 OtherUpdateRegionIndex = 0; OtherUpdateRegionIndex < Clipmap.UpdateRegions.Num(); OtherUpdateRegionIndex++)
{
if (UpdateRegionIndex != OtherUpdateRegionIndex)
{
const FVolumeUpdateRegion& OtherUpdateRegion = Clipmap.UpdateRegions[OtherUpdateRegionIndex];
if (OtherUpdateRegion.Bounds.IsInsideOrOn(UpdateRegion.Bounds.Min)
&& OtherUpdateRegion.Bounds.IsInsideOrOn(UpdateRegion.Bounds.Max))
{
bCompletelyContained = true;
break;
}
}
}
if (bCompletelyContained)
{
Clipmap.UpdateRegions.RemoveAt(UpdateRegionIndex);
UpdateRegionIndex--;
}
}
// Trim overlapping regions
for (int32 UpdateRegionIndex = 0; UpdateRegionIndex < Clipmap.UpdateRegions.Num(); UpdateRegionIndex++)
{
FVolumeUpdateRegion& UpdateRegion = Clipmap.UpdateRegions[UpdateRegionIndex];
bool bEmptyRegion = false;
for (int32 OtherUpdateRegionIndex = 0; OtherUpdateRegionIndex < Clipmap.UpdateRegions.Num(); OtherUpdateRegionIndex++)
{
if (UpdateRegionIndex != OtherUpdateRegionIndex)
{
const FVolumeUpdateRegion& OtherUpdateRegion = Clipmap.UpdateRegions[OtherUpdateRegionIndex];
if (OtherUpdateRegion.Bounds.Intersect(UpdateRegion.Bounds))
{
TrimOverlappingAxis(0, CellSize, OtherUpdateRegion, UpdateRegion);
TrimOverlappingAxis(1, CellSize, OtherUpdateRegion, UpdateRegion);
TrimOverlappingAxis(2, CellSize, OtherUpdateRegion, UpdateRegion);
if (UpdateRegion.CellsSize.X == 0 || UpdateRegion.CellsSize.Y == 0 || UpdateRegion.CellsSize.Z == 0)
{
bEmptyRegion = true;
break;
}
}
}
}
if (bEmptyRegion)
{
Clipmap.UpdateRegions.RemoveAt(UpdateRegionIndex);
UpdateRegionIndex--;
}
}
// Count how many texels are being updated
for (int32 UpdateRegionIndex = 0; UpdateRegionIndex < Clipmap.UpdateRegions.Num(); UpdateRegionIndex++)
{
const FVolumeUpdateRegion& UpdateRegion = Clipmap.UpdateRegions[UpdateRegionIndex];
TotalTexelsBeingUpdated += UpdateRegion.CellsSize.X * UpdateRegion.CellsSize.Y * UpdateRegion.CellsSize.Z;
}
// Fall back to a full update if the partial updates were going to do more work
if (TotalTexelsBeingUpdated >= GAOGlobalDFResolution * GAOGlobalDFResolution * GAOGlobalDFResolution)
{
Clipmap.UpdateRegions.Reset();
bLocalUsePartialUpdates = false;
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3219450) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3148067 on 2016/10/01 by Daniel.Wright Support for ReflectionEnvironment and light type show flags with ForwardShading Change 3149085 on 2016/10/03 by Daniel.Wright Support for ReflectionEnvironment show flag in base pass reflections without any shader overhead Change 3162206 on 2016/10/13 by Chris.Bunner Merging Dev-MaterialLayers to Dev-Rendering, CL 3161593: Material expressions; Trig, fast-trig, saturate, round, truncate, pre-skinned normal. Added CustomEyeTangent to material attributes. Resolved some hard-coded attribute typing and other minor fixes. Change 3186067 on 2016/11/03 by Daniel.Wright Updated Stationary primitive tooltip to indicate that it allows the primitive to be changed, but not moved Change 3186069 on 2016/11/03 by Daniel.Wright Using a weighted geometric mean to combine multiple Distance Field Indirect Shadows, greatly reduces over-occlusion when overlap is high Change 3186084 on 2016/11/03 by Mark.Satterthwaite Duplicate 3172511: Don't set Metal resource option fields on texture descriptors when running on an OS that doesn't support them. #jira UE-37481 Change 3186089 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3169764: Fixed automatic conversion of G8_sRGB into RGBA8_sRGB required for Mac Metal, which fixes FORT-27627. #jira FORT-27627 Change 3186113 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3183807: Change the way we access the Metal viewport's backbuffer, to reduce possible causes of FORT-31649: - Added console variable "rhi.Metal.SupportsIntermediateBackBuffer" to control whether to use an extra render-target so we can support screenshots & movie capture, or render directly to the back-buffer to save memory & GPU performance. Still defaults to ON for Mac & OFF for iOS/tvOS. - Change the way we handle updates to the back-buffer size to ensure that the different threads access their intended version. #jira FORT-31649 Change 3186116 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3183823: Record Metal resource & state objects used in a command-buffer when rhi.Metal.RuntimeDebugLevel is set to 3 or higher. The object labels, types & descriptions will be printed on failure - if the object is deleted prior to this then we have a lifetime error and it will crash at this point and can be debugged further using our -metalretainrefs command-line option or Xcode's zombie-objects. Used to verify that FORT-31649 is not a simple resource lifetime error and thereby speed up Apple/vendor investigations. #jira FORT-31649 Change 3186818 on 2016/11/04 by Chris.Bunner PR #2907 Export UMaterialExpressionNoise (contributed by kayosiii). Change 3186979 on 2016/11/04 by Rolando.Caloca DR - Misc minor cleanup Change 3187169 on 2016/11/04 by Uriel.Doyon Incremental insertion of level data between PostLoad and AddToWorld Change 3187205 on 2016/11/04 by Mark.Satterthwaite Compile fixes for iOS. Change 3187389 on 2016/11/04 by Uriel.Doyon Fix for possible stall when loading hidden level Change 3187598 on 2016/11/04 by Michael.Trepka MetalViewport compile fix Change 3187678 on 2016/11/04 by Uriel.Doyon Fix for landscape grass textures not being streamed in correctly. Change 3187731 on 2016/11/04 by Rolando.Caloca DR - Start making type safe some cross compiler enums Change 3187824 on 2016/11/04 by Rolando.Caloca DR - clang compile fix Change 3187953 on 2016/11/04 by Rolando.Caloca DR - vk - Mac compile fix Change 3188696 on 2016/11/07 by Mark.Satterthwaite Another iOS compile fix for new MetalViewport validation code. Change 3188906 on 2016/11/07 by Rolando.Caloca DR - Show permutation of LUTBlender Change 3189094 on 2016/11/07 by Chris.Bunner Fix RemoveAAJitter from projection matrix. #jira UE-37701, UE-38003 Change 3189134 on 2016/11/07 by Daniel.Wright Fix for CreateRenderTarget2D called in construction script during cooking Change 3189145 on 2016/11/07 by Chris.Bunner Follow-up to CL 3186818, export UMaterialExpressionVectorNoise. Change 3189239 on 2016/11/07 by Daniel.Wright Added show flag for Contact Shadows, disabled in planar reflections Change 3189252 on 2016/11/07 by Daniel.Wright Support for Reflection Capture intensity with simple reflections, which are the default with Forward Shading Change 3189406 on 2016/11/07 by Mark.Satterthwaite Really fix the last of the iOS compile errors from changes to the MetalViewport code. Change 3190854 on 2016/11/08 by Ben.Woodhouse XB1: Fix memory corruption with RHICreateVertexBuffer and RHICreateIndexBuffer when using initial data (Procedural Mesh Component crash) #jira UE-34264 #fyi james.golding #fyi keith.judge Change 3190962 on 2016/11/08 by Olaf.Piesche Unshelved from pending changelist '3176615' - Gil's fix for race condiiton with particle vertex factory reuse across different passes; potential to fix a number of issues Change 3191959 on 2016/11/09 by Uriel.Doyon Removed some static primitives from the dynamic primitive handler for texture streaming. Change 3193122 on 2016/11/10 by Chris.Bunner Always update non-preview material resources for use in code preview. #jira UE-38223 Change 3193190 on 2016/11/10 by Gil.Gribb UE4 - Fixed rare bug with shadow groups rendering things that have not been setup to render this frame. #jira UE-36379 Change 3193523 on 2016/11/10 by Uriel.Doyon Fixed incorrect section bounds used for texture streaming. Change 3193962 on 2016/11/10 by Uriel.Doyon Added defrag of dynamic bounds used for the texture streaming. Allows to remove unused bounds over time. Change 3193974 on 2016/11/10 by Uriel.Doyon New "Required Texture Resolution" view mode. Showing the ratio between the currently streamed texture resolution and the resolution wanted by the GPU. Change 3194109 on 2016/11/10 by Uriel.Doyon Another patch on material bounds used for texture streaming. Change 3194665 on 2016/11/11 by Chris.Bunner Duplicated behavior for inherited velocity scaling scaling to vert/surface spawned particles. Change 3194734 on 2016/11/11 by Rolando.Caloca DR - vk - Simplified some texture casting Change 3194867 on 2016/11/11 by Rolando.Caloca DR - vk - SM5 fixes Change 3195176 on 2016/11/11 by Chris.Bunner Fixed incorrectly updated NVAPI error. Change 3195425 on 2016/11/11 by Uriel.Doyon Fixed possible invalid level reference in the texture streamer Change 3196512 on 2016/11/14 by Gil.Gribb Merging //UE4/Dev-Main@3196156 to Dev-Rendering (//UE4/Dev-Rendering) Change 3196750 on 2016/11/14 by Marcus.Wassmer Fix ordering problem with GPU cache transitions Change 3196815 on 2016/11/14 by Daniel.Wright Suppressed 'Instanced stereo rendering is not supported' warning showing up in CIS Change 3196818 on 2016/11/14 by Daniel.Wright Fixed FIndirectLightingCache::UpdateCachePrimitivesInternal churning through a bunch of temporary memory Change 3196819 on 2016/11/14 by Daniel.Wright Volume lighting samples are allowed outside of the importance volume if their influence affects the volume. Fixes black indirect lighting on movable components in maps with small importance volumes. Volume lighting samples placed on surfaces use a radius that covers the layer height spacing, which prevents an uncovered region between layers Change 3197243 on 2016/11/14 by Uriel.Doyon Async Task For Updating static component LastRender time #jira UE-24268 Change 3197359 on 2016/11/14 by Daniel.Wright Added Inscattering Texture controls to ExponentialHeightFog * When InscatteringColorCubemap is specified, directional light inscattering is disabled * Lerps betwen 1x1 mip at NonDirectionalInscatteringColorDistance to mip 0 at FullyDirectionalInscatteringColorDistance * Added FogCutoffDistance, so artists can prevent fog on skyboxes (requires fog to be setup matching the fog that was rendered into the sky texture so that distant mountains match) * Fog shader permutations based on what feature is enabled Change 3198419 on 2016/11/15 by Chris.Bunner PS4 HDR: Runtime toggle (backbuffer recreation on resize matching), UI composition. Matches PC behavior and controls. HDR: Generalized buffer formats, cvar consistency pass, LUT for UI composition, refactoring common functions. Exposed RHICreateTargetableShaderResource3D. Moved some (translucent) volume rendering helpers to allow access in Slate. Change 3198822 on 2016/11/15 by Daniel.Wright Mac compile fix Change 3199509 on 2016/11/15 by Uriel.Doyon Added support for viewmode param asset name (and note just param value). Used to investigate texture streamer behavior. Change 3199578 on 2016/11/15 by Rolando.Caloca DR - Add some shader resource tables to SCW when running with -directcompile Change 3199698 on 2016/11/15 by Rolando.Caloca DR - vk - Refactor shader & descriptor bindings Change 3199712 on 2016/11/15 by Rolando.Caloca DR - vk - r.Vulkan.StripGlsl to always strip glsl at runtime to save memory per shader Change 3199717 on 2016/11/15 by Rolando.Caloca DR - vk - Show hitching PSO info again Change 3199750 on 2016/11/15 by Rolando.Caloca DR - SCW clang compile fixes Change 3200353 on 2016/11/16 by Rolando.Caloca DR - vk - Mac fix Change 3200358 on 2016/11/16 by Chris.Bunner Only allow UI composition on platforms we currently use it. Change 3200823 on 2016/11/16 by Chris.Bunner Remove expression key attribute ID when not translating an attribute output to allow intended expression sharing. #jira UE-38699 Change 3200947 on 2016/11/16 by Mark.Satterthwaite Fix UE-38695 by not trying to resize the viewport on the wrong thread. #jira UE-38695 Change 3201069 on 2016/11/16 by Daniel.Wright Fog inscattering texture limited to SM4 and above, fixes ES2 compile errors Change 3201346 on 2016/11/16 by Brian.Karis Temporal AA fix for correct edge gradients. Filtering now combined with importance sampling. Enabled Catmull-Rom resolve filter. Results are now slightly sharper. Fixed antighosting. Will yet require a dilation to be perfect. Optimized bicubic filtering to 5 taps instead of 9. Cleaned out unused code. Change 3201369 on 2016/11/16 by Brian.Karis Bicubic texture sample Change 3201522 on 2016/11/16 by Rolando.Caloca DR - vk - Fix static analysis issues Change 3201878 on 2016/11/17 by Chris.Bunner Temporarily disable Nvapi HDR error logging. #jira UE-38529 Change 3202108 on 2016/11/17 by Simon.Tovey Assets with easy repro for flickering particles bug Change 3202181 on 2016/11/17 by Rolando.Caloca DR - vk - CIS android fix Change 3202325 on 2016/11/17 by Ben.Woodhouse Integrate 4.14.1 fix from 14 //UE4/Release-4.14 (@3201850) Fix CreateVertexbuffer and CreateIndexBuffer memory corruption (Procedural Mesh Component crash) #jira UE-34264 Change 3204394 on 2016/11/18 by Guillaume.Abadie PR #2808: AlphaComposite Fog Opacity fix (Contributed by moritz-wundke) #br Ben.Woodhouse Change 3204428 on 2016/11/18 by Guillaume.Abadie Fixes a couple of issues in decals: * Crash in FDecalDrawingPolicyFactory::DrawMesh() * ActorPostion material expression * PixelNormalWS material expression * Missing renaming from DEFERRED_DECAL to DECAL_PRIMITIVE #jira UE-38327, UE-38158, UE-37818, UE-37350 Change 3204429 on 2016/11/18 by Uriel.Doyon Darker default undefined accuracy. Reenabled the texture streaming build in the build all. Change 3204458 on 2016/11/18 by Chris.Bunner Shader truncation warnings fix. Change 3204459 on 2016/11/18 by Chris.Bunner Engine 'Passthrough' material fuction fix. V4 is now actually a V4. Change 3204460 on 2016/11/18 by Chris.Bunner Correctly handle some known Nvapi warnings. #jira UE-38529 Change 3204653 on 2016/11/18 by Marc.Olano Helper functions for tiled textures Checking in for Ryan Brucks Change 3204863 on 2016/11/18 by Arne.Schober DR - Replaced ENQUEUE_UNIQUE_RENDER_COMMAND with a Debuggable template Implementation Change 3204939 on 2016/11/18 by Arne.Schober DR - Make clang happy Change 3204968 on 2016/11/18 by Arne.Schober DR - UE-38494 - Fixed SpeedTree Wind crash, when force deleting the Asset. Change 3206293 on 2016/11/21 by Uriel.Doyon New member bHasStreamingUpdatePending in UTexture2D to delay update of global distance fields. Set to true when the streamer can possibly load a mip in the near future. #jira UE-37787 Change 3206551 on 2016/11/21 by Chris.Bunner Added material update context when forcing all shaders to recompile. #jira UE-38481 Change 3206644 on 2016/11/21 by Benjamin.Hyder Updating Planar Reflection example in TM-Shadermodels. Change 3206899 on 2016/11/21 by Rolando.Caloca DR - vk - SM5 fixes Change 3206900 on 2016/11/21 by Rolando.Caloca DR - Added missing strings for shader formats Change 3206983 on 2016/11/21 by Rolando.Caloca DR - vk - Support for SV_Coverage Change 3207237 on 2016/11/22 by Simon.Tovey Exporting particle module base and a couple of child classes as it's commonly requested. #test compiles Change 3207241 on 2016/11/22 by Gil.Gribb Merging //UE4/Dev-Main@3206998 to Dev-Rendering (//UE4/Dev-Rendering) Change 3207520 on 2016/11/22 by Ben.Woodhouse Cherry picked from //Fortnite/Main@3206301 Fixed GPU hang in Zone Map view. Was an issue with RenderThread using the device context without appropriate RHIThread flushes. #jira FORT-31616 #code_review keith.judge Change 3207541 on 2016/11/22 by Ben.Woodhouse Cherry picked from //fortnite/Main@3207422 * Fix UpdateTexture3D to create a staging texture of the region to update rather than the whole texture. This prevents distance fields crashing during update (allocating 18GB per frame in some cases) * Put UpdateTexture2D DMA support onto a cvar, disabled by default (corruption issues reported by licensees, plus not sure it's actually faster - could be slower due to reduced bandwidth; issues reported by licensees) * Fix UpdateTexture2D to only create a staging texture of the region to update, saving memory #jira UE-38609 Change 3207654 on 2016/11/22 by Chris.Bunner Don't flag 16-bit PNG/JPG textures as sRGB on import. #jira UE-30279 Change 3208434 on 2016/11/22 by Rolando.Caloca DR - vk - UAV transitions Change 3208490 on 2016/11/22 by Chris.Bunner Break material code sharing when we detect an unresolvable loop. By default change IsResultMA loop detection to stop on functions as we can determine type definitively. Unified IsResultMA detection across switch nodes. Change 3208860 on 2016/11/23 by Rolando.Caloca DR - vk - Fix some format issues Change 3209265 on 2016/11/23 by Arne.Schober DR - originally unshelved from 3153924 - Made Depth and Velocity Rendering Passes to use PSO only RHI interface, We are now passing down two structs that collect all the necessary information for the drawing policies to construct a PSO object. One during construction of the Policy, which contains information abouyt the CullMode, FillMode and PrimType. And another during rendering that passes infomation like BlendState and DepthStencilState down to the low levelrenderer into SetSharedState. Performance of the static drawlist ist slightly slower (less than 0.1ms on Consoles) due to some addtional branches and copies. The branches in the FDrawingPolicyRenderState will go away as soon as everything is converted to use the PSO interface. Performace of the GPU is slightly better due to less context rolls (mainly CullMode sorts in differently now) Change 3209305 on 2016/11/23 by Guillaume.Abadie Fix contact shadow's assemption on objects thickness Change 3209334 on 2016/11/23 by Brian.Karis Fixed TAA handling of alpha. Switched the meaning of AA_ALPHA to make sense. Change 3209903 on 2016/11/24 by Guillaume.Abadie Cherry picks alpha through post processing changelists 3201959, 3204143 and 3209883 from //UE4/Private-Partner-NREAL Change 3209973 on 2016/11/24 by Ben.Woodhouse Fix D3D11 and 12 static analysis warnings reported by Rob Troughton of Coconut Lizard (http://coconutlizard.co.uk/blog/ue4/pvs-studio-part5/) Change 3210023 on 2016/11/24 by Uriel.Doyon Fixed an issue with DropDetail when FixedFrameRate was set to a value smaller than MinDesiredFrameRate. #jira UE-37210 Change 3210026 on 2016/11/24 by Ben.Woodhouse Disable renderthread hang detection if a debugger is present, so we can debug the renderthread without crashing Change 3210049 on 2016/11/24 by Ben.Woodhouse Fix mac build Change 3210071 on 2016/11/24 by Uriel.Doyon Fixed an issue with masked materials and shader complexity viewmode when DBuffer Decals are enabled. #jira UE-37542 Change 3210374 on 2016/11/25 by Ben.Woodhouse * Fix issues with fast cleared dbuffer targets not being resolved when no decals are in the scene. This caused graphical corruption on XB1 and ensure failures on PS4 (with RHIThread disabled) * Move Decal rendertarget manager function implementations out of the header. #jira UE-38879 Change 3210390 on 2016/11/25 by Uriel.Doyon Fixed cubemap resourcesize not taking into account mipgen settings #jira UE-37045 Change 3210407 on 2016/11/25 by Uriel.Doyon "resavepackages" commandlet now supports -buildtexturestreaming that rebuilds the map texture streaming data. That can be used in combination with -buildlighting. Change 3210563 on 2016/11/27 by Rolando.Caloca DR - vk - Integrate cached memory fixes and PF_D24 format fix #jira UE-39025 PR #2974 Change 3210564 on 2016/11/27 by Rolando.Caloca DR - Fix for GL linker PR #2975 #jira UE-39029 Change 3210592 on 2016/11/27 by Rolando.Caloca DR - vk - SM5 fixes Change 3210597 on 2016/11/27 by Rolando.Caloca DR - vk - Prep for staging UB copies to GPU memory Change 3210600 on 2016/11/27 by Rolando.Caloca DR - vk - Extract generic range code Change 3210613 on 2016/11/27 by Rolando.Caloca DR - vk - Added r.Vulkan.SubmitOnDispatch Change 3211054 on 2016/11/28 by Rolando.Caloca DR - vk - Missing reference Change 3211330 on 2016/11/28 by Chris.Bunner Shader compile error for max texture coordinate count on skinned meshes. Change 3211384 on 2016/11/28 by Arne.Schober DR - Enforce move on EnqueueRenderCommand Lambda Change 3211431 on 2016/11/28 by Gil.Gribb Merging //UE4/Dev-Main@3211016 to Dev-Rendering (//UE4/Dev-Rendering) Change 3211738 on 2016/11/28 by Gil.Gribb IWYU fixes after merge Change 3212231 on 2016/11/28 by Richard.Wallis Fix build errors Change 3212253 on 2016/11/28 by Richard.Wallis Remove MacGraphicsSwitching plugin. #jira UE-37640 Change 3212310 on 2016/11/28 by Rolando.Caloca DR - vk - Update glslang to 1.0.33.0 Change 3212446 on 2016/11/28 by Guillaume.Abadie Implements PreviousFrameSwitch material expression Change 3212594 on 2016/11/28 by Arne.Schober DR - Fix missing include Change 3212681 on 2016/11/29 by Rolando.Caloca DR - vk - Auto flush for compute shader Change 3213000 on 2016/11/29 by Gil.Gribb temp fix for PF_MAX Change 3213161 on 2016/11/29 by Ben.Woodhouse Integrate latest D3D12 changes from //depot/Partners/Microsoft/UE4-DX12/...@3211714 Using: - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/Runtime/D3D12RHI/...@3211714 //UE4/Dev-Rendering/Engine/Source/Runtime/D3D12RHI/... - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/ThirdParty/Windows/DirectX/...@3211714 //UE4/Dev-Rendering/Engine/Source/ThirdParty/Windows/DirectX/... - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/Programs/UnrealBuildTool/...@3211714 //UE4/Dev-Rendering/Engine/Source/Programs/UnrealBuildTool/... Changes from UE4-DX12: *** CL 3183818 *** Update D3D12 RHI to 4.14: - Merged changes from Epic up until 10/20/16 - Fixed an issue where command allocators where resetting too early. I changed to aggressive command list batching by default now that more SubmitCommandListHint calls exist in the upper engine, we don't need to worry about starving the GPU. Fewer ExecuteCommandLists calls means better performance and fewer Signals() so this change provides a GPU perf win. I had to fix an issue with aggressive batching where we would sometimes hold on to a command list long enough (in the pending list) but hadn't executed it yet. The command allocator was being put back in the queue of allocators during ReleaseCommandAllocator() without a syncpoint set and was thus being reset too early. I added a simple counter to the command allocator so it could track how many command lists were using it. It doesn't need to be thread safe since only one thread uses a command allocator at a time. I also added some stats around the # command lists and # command allocators since it would be possible to leak command allocators now if it's pending command list count isn't decremented correctly. In that case we'd keep creating new command allocators and eventually run out of memory. -Remove clear during allocate in the FD3D12FastConstantAllocator and FD3D12FastAllocator. The supplied resource locations are assumed to be new and thus don't need to be cleared. -Cleanup D3D12RHI stats. There were some unused stats as well as some missing ones. -Mark shader resource table uniform buffers as dirty only when the shader changes. Cleanup SetComputeShader calls and Dispatch calls to not set/unset the CS for each Dispatch. -Remove unused Check SRV resolved code that epic added to the D3D11 RHI and was brought over. We dont need it and we won't use this. -Remove "always on" cycle counters for high frequency RHI methods like RHISetShaderTexture. These should use the engine's stat macros as they are removed on TEST + SHIPPING builds. On Xbox a significant amount of CPU time is spent in things like QueryPerformanceCounter even when STATS aren't enabled. Currently 1% of an entire capture on XBOX is spent inside this call. I improved and cleaned up high freqency call stacks like: - RHISetShaderTexture - RHISetShaderResourceViewParameter - RHISetShaderParameter - RHISetUAVParameter In general I moved to use templated functions, removed unused parameters, unnecessary copies, etc. -Change D3D12 PIX profiling enable/disable to match Xbox and handle logic in the UEBuildWindows.cs for UBT. Also add a static assert to inform the developer when PIX profiling is requested but the engine is compiling out draw events. -Resources should be associated with the rendering thread's frame that it's currently recording command lists for and they shouldnt be cleaned up until those command lists have been translated to D3D12 command lists on the RHI thread AND completed executing on the GPU. This was confirmed to resolve an issue where CBV resources were being released too early. This work involved a couple changes: 1) Move the "frame" fence to be incremented on the rendering thread (during RHIAdvanceFrameForGetViewportBackBuffer()) so that resources that are deleted from the rendering thread are assosicated with the correct frame count 2) Queue up a command from the rendering thread to signal the "frame" fence. It needs to be queued to ensure that it's signaled at the correct time on the RHI thread (after that frame's command lists have been executed). -Disable GRHIRequiresEarlyBackBufferRenderTarget. Metal/Vulkan/Xbox11.x already do this. This is used by the Slate renderer during BeginRenderFrame and avoids a SetRenderTargets call. -Enable GRHISupportsMSAADepthSampleAccess (used in the Editor). This was enabled for D3D11 on SM5, but not for D3D12. -Delay load D3D12.dll and add root signature 1.1 support. -Add explicit flush calls to improve resource barrier batching instead of implict flushes inside FConditionalScopeResourceBarrier and FScopeResourceBarrier. Also update those classes with const members. *** CL 3183824 *** Fix the D3D12 RHI after integrating UE 4.14 updates: - Fixed a bug where we would try to get the PSO of a nullptr in SetPipelineState if we needed to reset the current PSO on the cmd list. - Fixed a spelling error - Removed the need for bForceState, we use dirty bits now *** CL 3183830 *** - GetDebugFlags RHI extension, needed by XB1 movie player. - Only query memory info if stats are enabled - Add support for the engine's new RHISubmitCommandsAndFlushGPU function - Update CommitPendingPipelineState to be Graphics/Compute specific and avoid the need for a IsCompute parameter. *** CL 3183837 *** Made PipelineState caches contain pointers to FD3D12PipelineState objects to avoid issues with using pointers to after Find/Add to the maps. TMap indicates that the pointer to the value associated with a key "is only valid until the next change to any key in the map." The lifetime of the PSO pointers is managed by the low level caches (graphics and compute). Added stat for the number of Pipeline State Objects. *** CL 3183931 *** Update Windows D3D12 headers and libs to RS1 release bits (10.0.14393.0) *** CL 3183978 *** Update UBT Windows build settings: - Change D3D12 PIX profiling enable/disable to match Xbox and handle logic in the UEBuildWindows.cs for UBT. Also add a static assert to inform the developer when PIX profiling is requested but the engine is compiling out draw events. -Delay load D3D12.dll and add root signature 1.1 support. *** CL 3184132 *** Fix Xbox PSO cache code where it could leak PSOs. Related to change 3183837. *** Changelist 3211714 *** Update D3D12 RHI with fixes: - Check if we can reserve slots in GatherUniqueSamplerTables - DirtyState more often in StateCache - Remove InternalSetSamplerState. The alternate function isn't used. - Allow MRTClear for arrays with holes in them - Fix uninitialized descriptors. This was causing a GPU hang on Xbox. We need to set dirty bits for resources bound to slots outside of the current descriptor table's range - Cleanup SetDescriptorHeap code. Move setting descriptor heap logic to the descriptor cache since it also owns things like the sampler maps. Added members to the descriptor cache to track the last heaps that were set on the command list to avoid dirtying bit unnecessarily. - Resource transitions: go through Common between queues (3D <--> Compute) - Fix initial state for placed resources. - Merging epic Change 3213250 on 2016/11/29 by Chris.Bunner GBufferHints tooltip fix. #jira UE-39103 Change 3213345 on 2016/11/29 by Gil.Gribb more IWYU fallout Change 3213676 on 2016/11/29 by Rolando.Caloca DR - Fix incorrect texture getting cleared Change 3213728 on 2016/11/29 by Rolando.Caloca DR - Lambda-ize Change 3214461 on 2016/11/29 by Ben.Woodhouse Rollout August QFE4 XDK (required for latest DX12 changes on XB1) Change 3215317 on 2016/11/30 by Daniel.Wright PS4 compile fix Change 3216343 on 2016/11/30 by Arne.Schober DR - UE-39155 - after talking to Brian it occurred to us that flipping the world space normal is non sensical. And indeed the Grass was using world space normals. Change 3216844 on 2016/12/01 by Ben.Woodhouse Fix for static analysis warnings after discussion with Microsoft Change 3216916 on 2016/12/01 by Gil.Gribb Merging //UE4/Dev-Main@3216539 to Dev-Rendering (//UE4/Dev-Rendering) Change 3217385 on 2016/12/01 by Arne.Schober DR - UE-39218, UE-39221, UE-39224 and potentially UE-39214 - The Stencil bits for Light channels and decal application were not set in the dynamic basepass Change 3217464 on 2016/12/01 by Ben.Woodhouse Fix for reflection capture resize assert. The assert is only valid in cooked builds, so disable it in editor #jira UE-39225 Change 3217534 on 2016/12/01 by Arne.Schober DR - Fix Merge conflict Change 3217581 on 2016/12/01 by Rolando.Caloca DR - Fix assert on debug Change 3217741 on 2016/12/01 by Benjamin.Hyder Duplicate audio fix. Change 3217890 on 2016/12/01 by Rolando.Caloca DR - Fix widget not rendering properly when hidden #jira UE-39221 Change 3218129 on 2016/12/01 by Arne.Schober DR - UE-39214 - Lod dither value as accidently cached accross the static draw list. Change 3218759 on 2016/12/02 by Guillaume.Abadie Fixes editor compositing bug caused by alpha through post processing change 3209903 #jira UE-39221 [CL 3219854 by Marcus Wassmer in Main branch]
2016-12-02 16:43:04 -05:00
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
if (!bLocalUsePartialUpdates)
{
FVolumeUpdateRegion UpdateRegion;
UpdateRegion.Bounds = ClipmapBounds;
UpdateRegion.CellsSize = FIntVector(GAOGlobalDFResolution);
Clipmap.UpdateRegions.Add(UpdateRegion);
}
// Check if the clipmap intersects with a pending update region
bool bHasPendingStreaming = false;
for (const FBox& HeightfieldBox : PendingStreamingHeightfieldBoxes)
{
if (ClipmapBounds.Intersect(HeightfieldBox))
{
bHasPendingStreaming = true;
break;
}
}
// If some of the height fields has pending streaming regions, postpone a full update.
if (bHasPendingStreaming)
{
// Mark a pending update for this height field. It will get processed when all pending texture streaming affecting it will be completed.
View.ViewState->DeferredGlobalDistanceFieldUpdates[CacheType].AddUnique(ClipmapIndex);
// Remove the height fields from the update.
for (FVolumeUpdateRegion& UpdateRegion : Clipmap.UpdateRegions)
{
UpdateRegion.UpdateType = (EVolumeUpdateType)(UpdateRegion.UpdateType & ~VUT_Heightfields);
}
}
else if (View.ViewState->DeferredGlobalDistanceFieldUpdates[CacheType].Remove(ClipmapIndex) > 0)
{
// Remove the height fields from the current update as we are pushing a new full update.
for (FVolumeUpdateRegion& UpdateRegion : Clipmap.UpdateRegions)
{
UpdateRegion.UpdateType = (EVolumeUpdateType)(UpdateRegion.UpdateType & ~VUT_Heightfields);
}
FVolumeUpdateRegion UpdateRegion;
UpdateRegion.Bounds = ClipmapBounds;
UpdateRegion.CellsSize = FIntVector(GAOGlobalDFResolution);
UpdateRegion.UpdateType = VUT_Heightfields;
Clipmap.UpdateRegions.Add(UpdateRegion);
}
ClipmapViewState.Cache[CacheType].PrimitiveModifiedBounds.Reset();
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3219450) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3148067 on 2016/10/01 by Daniel.Wright Support for ReflectionEnvironment and light type show flags with ForwardShading Change 3149085 on 2016/10/03 by Daniel.Wright Support for ReflectionEnvironment show flag in base pass reflections without any shader overhead Change 3162206 on 2016/10/13 by Chris.Bunner Merging Dev-MaterialLayers to Dev-Rendering, CL 3161593: Material expressions; Trig, fast-trig, saturate, round, truncate, pre-skinned normal. Added CustomEyeTangent to material attributes. Resolved some hard-coded attribute typing and other minor fixes. Change 3186067 on 2016/11/03 by Daniel.Wright Updated Stationary primitive tooltip to indicate that it allows the primitive to be changed, but not moved Change 3186069 on 2016/11/03 by Daniel.Wright Using a weighted geometric mean to combine multiple Distance Field Indirect Shadows, greatly reduces over-occlusion when overlap is high Change 3186084 on 2016/11/03 by Mark.Satterthwaite Duplicate 3172511: Don't set Metal resource option fields on texture descriptors when running on an OS that doesn't support them. #jira UE-37481 Change 3186089 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3169764: Fixed automatic conversion of G8_sRGB into RGBA8_sRGB required for Mac Metal, which fixes FORT-27627. #jira FORT-27627 Change 3186113 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3183807: Change the way we access the Metal viewport's backbuffer, to reduce possible causes of FORT-31649: - Added console variable "rhi.Metal.SupportsIntermediateBackBuffer" to control whether to use an extra render-target so we can support screenshots & movie capture, or render directly to the back-buffer to save memory & GPU performance. Still defaults to ON for Mac & OFF for iOS/tvOS. - Change the way we handle updates to the back-buffer size to ensure that the different threads access their intended version. #jira FORT-31649 Change 3186116 on 2016/11/03 by Mark.Satterthwaite Duplicate CL #3183823: Record Metal resource & state objects used in a command-buffer when rhi.Metal.RuntimeDebugLevel is set to 3 or higher. The object labels, types & descriptions will be printed on failure - if the object is deleted prior to this then we have a lifetime error and it will crash at this point and can be debugged further using our -metalretainrefs command-line option or Xcode's zombie-objects. Used to verify that FORT-31649 is not a simple resource lifetime error and thereby speed up Apple/vendor investigations. #jira FORT-31649 Change 3186818 on 2016/11/04 by Chris.Bunner PR #2907 Export UMaterialExpressionNoise (contributed by kayosiii). Change 3186979 on 2016/11/04 by Rolando.Caloca DR - Misc minor cleanup Change 3187169 on 2016/11/04 by Uriel.Doyon Incremental insertion of level data between PostLoad and AddToWorld Change 3187205 on 2016/11/04 by Mark.Satterthwaite Compile fixes for iOS. Change 3187389 on 2016/11/04 by Uriel.Doyon Fix for possible stall when loading hidden level Change 3187598 on 2016/11/04 by Michael.Trepka MetalViewport compile fix Change 3187678 on 2016/11/04 by Uriel.Doyon Fix for landscape grass textures not being streamed in correctly. Change 3187731 on 2016/11/04 by Rolando.Caloca DR - Start making type safe some cross compiler enums Change 3187824 on 2016/11/04 by Rolando.Caloca DR - clang compile fix Change 3187953 on 2016/11/04 by Rolando.Caloca DR - vk - Mac compile fix Change 3188696 on 2016/11/07 by Mark.Satterthwaite Another iOS compile fix for new MetalViewport validation code. Change 3188906 on 2016/11/07 by Rolando.Caloca DR - Show permutation of LUTBlender Change 3189094 on 2016/11/07 by Chris.Bunner Fix RemoveAAJitter from projection matrix. #jira UE-37701, UE-38003 Change 3189134 on 2016/11/07 by Daniel.Wright Fix for CreateRenderTarget2D called in construction script during cooking Change 3189145 on 2016/11/07 by Chris.Bunner Follow-up to CL 3186818, export UMaterialExpressionVectorNoise. Change 3189239 on 2016/11/07 by Daniel.Wright Added show flag for Contact Shadows, disabled in planar reflections Change 3189252 on 2016/11/07 by Daniel.Wright Support for Reflection Capture intensity with simple reflections, which are the default with Forward Shading Change 3189406 on 2016/11/07 by Mark.Satterthwaite Really fix the last of the iOS compile errors from changes to the MetalViewport code. Change 3190854 on 2016/11/08 by Ben.Woodhouse XB1: Fix memory corruption with RHICreateVertexBuffer and RHICreateIndexBuffer when using initial data (Procedural Mesh Component crash) #jira UE-34264 #fyi james.golding #fyi keith.judge Change 3190962 on 2016/11/08 by Olaf.Piesche Unshelved from pending changelist '3176615' - Gil's fix for race condiiton with particle vertex factory reuse across different passes; potential to fix a number of issues Change 3191959 on 2016/11/09 by Uriel.Doyon Removed some static primitives from the dynamic primitive handler for texture streaming. Change 3193122 on 2016/11/10 by Chris.Bunner Always update non-preview material resources for use in code preview. #jira UE-38223 Change 3193190 on 2016/11/10 by Gil.Gribb UE4 - Fixed rare bug with shadow groups rendering things that have not been setup to render this frame. #jira UE-36379 Change 3193523 on 2016/11/10 by Uriel.Doyon Fixed incorrect section bounds used for texture streaming. Change 3193962 on 2016/11/10 by Uriel.Doyon Added defrag of dynamic bounds used for the texture streaming. Allows to remove unused bounds over time. Change 3193974 on 2016/11/10 by Uriel.Doyon New "Required Texture Resolution" view mode. Showing the ratio between the currently streamed texture resolution and the resolution wanted by the GPU. Change 3194109 on 2016/11/10 by Uriel.Doyon Another patch on material bounds used for texture streaming. Change 3194665 on 2016/11/11 by Chris.Bunner Duplicated behavior for inherited velocity scaling scaling to vert/surface spawned particles. Change 3194734 on 2016/11/11 by Rolando.Caloca DR - vk - Simplified some texture casting Change 3194867 on 2016/11/11 by Rolando.Caloca DR - vk - SM5 fixes Change 3195176 on 2016/11/11 by Chris.Bunner Fixed incorrectly updated NVAPI error. Change 3195425 on 2016/11/11 by Uriel.Doyon Fixed possible invalid level reference in the texture streamer Change 3196512 on 2016/11/14 by Gil.Gribb Merging //UE4/Dev-Main@3196156 to Dev-Rendering (//UE4/Dev-Rendering) Change 3196750 on 2016/11/14 by Marcus.Wassmer Fix ordering problem with GPU cache transitions Change 3196815 on 2016/11/14 by Daniel.Wright Suppressed 'Instanced stereo rendering is not supported' warning showing up in CIS Change 3196818 on 2016/11/14 by Daniel.Wright Fixed FIndirectLightingCache::UpdateCachePrimitivesInternal churning through a bunch of temporary memory Change 3196819 on 2016/11/14 by Daniel.Wright Volume lighting samples are allowed outside of the importance volume if their influence affects the volume. Fixes black indirect lighting on movable components in maps with small importance volumes. Volume lighting samples placed on surfaces use a radius that covers the layer height spacing, which prevents an uncovered region between layers Change 3197243 on 2016/11/14 by Uriel.Doyon Async Task For Updating static component LastRender time #jira UE-24268 Change 3197359 on 2016/11/14 by Daniel.Wright Added Inscattering Texture controls to ExponentialHeightFog * When InscatteringColorCubemap is specified, directional light inscattering is disabled * Lerps betwen 1x1 mip at NonDirectionalInscatteringColorDistance to mip 0 at FullyDirectionalInscatteringColorDistance * Added FogCutoffDistance, so artists can prevent fog on skyboxes (requires fog to be setup matching the fog that was rendered into the sky texture so that distant mountains match) * Fog shader permutations based on what feature is enabled Change 3198419 on 2016/11/15 by Chris.Bunner PS4 HDR: Runtime toggle (backbuffer recreation on resize matching), UI composition. Matches PC behavior and controls. HDR: Generalized buffer formats, cvar consistency pass, LUT for UI composition, refactoring common functions. Exposed RHICreateTargetableShaderResource3D. Moved some (translucent) volume rendering helpers to allow access in Slate. Change 3198822 on 2016/11/15 by Daniel.Wright Mac compile fix Change 3199509 on 2016/11/15 by Uriel.Doyon Added support for viewmode param asset name (and note just param value). Used to investigate texture streamer behavior. Change 3199578 on 2016/11/15 by Rolando.Caloca DR - Add some shader resource tables to SCW when running with -directcompile Change 3199698 on 2016/11/15 by Rolando.Caloca DR - vk - Refactor shader & descriptor bindings Change 3199712 on 2016/11/15 by Rolando.Caloca DR - vk - r.Vulkan.StripGlsl to always strip glsl at runtime to save memory per shader Change 3199717 on 2016/11/15 by Rolando.Caloca DR - vk - Show hitching PSO info again Change 3199750 on 2016/11/15 by Rolando.Caloca DR - SCW clang compile fixes Change 3200353 on 2016/11/16 by Rolando.Caloca DR - vk - Mac fix Change 3200358 on 2016/11/16 by Chris.Bunner Only allow UI composition on platforms we currently use it. Change 3200823 on 2016/11/16 by Chris.Bunner Remove expression key attribute ID when not translating an attribute output to allow intended expression sharing. #jira UE-38699 Change 3200947 on 2016/11/16 by Mark.Satterthwaite Fix UE-38695 by not trying to resize the viewport on the wrong thread. #jira UE-38695 Change 3201069 on 2016/11/16 by Daniel.Wright Fog inscattering texture limited to SM4 and above, fixes ES2 compile errors Change 3201346 on 2016/11/16 by Brian.Karis Temporal AA fix for correct edge gradients. Filtering now combined with importance sampling. Enabled Catmull-Rom resolve filter. Results are now slightly sharper. Fixed antighosting. Will yet require a dilation to be perfect. Optimized bicubic filtering to 5 taps instead of 9. Cleaned out unused code. Change 3201369 on 2016/11/16 by Brian.Karis Bicubic texture sample Change 3201522 on 2016/11/16 by Rolando.Caloca DR - vk - Fix static analysis issues Change 3201878 on 2016/11/17 by Chris.Bunner Temporarily disable Nvapi HDR error logging. #jira UE-38529 Change 3202108 on 2016/11/17 by Simon.Tovey Assets with easy repro for flickering particles bug Change 3202181 on 2016/11/17 by Rolando.Caloca DR - vk - CIS android fix Change 3202325 on 2016/11/17 by Ben.Woodhouse Integrate 4.14.1 fix from 14 //UE4/Release-4.14 (@3201850) Fix CreateVertexbuffer and CreateIndexBuffer memory corruption (Procedural Mesh Component crash) #jira UE-34264 Change 3204394 on 2016/11/18 by Guillaume.Abadie PR #2808: AlphaComposite Fog Opacity fix (Contributed by moritz-wundke) #br Ben.Woodhouse Change 3204428 on 2016/11/18 by Guillaume.Abadie Fixes a couple of issues in decals: * Crash in FDecalDrawingPolicyFactory::DrawMesh() * ActorPostion material expression * PixelNormalWS material expression * Missing renaming from DEFERRED_DECAL to DECAL_PRIMITIVE #jira UE-38327, UE-38158, UE-37818, UE-37350 Change 3204429 on 2016/11/18 by Uriel.Doyon Darker default undefined accuracy. Reenabled the texture streaming build in the build all. Change 3204458 on 2016/11/18 by Chris.Bunner Shader truncation warnings fix. Change 3204459 on 2016/11/18 by Chris.Bunner Engine 'Passthrough' material fuction fix. V4 is now actually a V4. Change 3204460 on 2016/11/18 by Chris.Bunner Correctly handle some known Nvapi warnings. #jira UE-38529 Change 3204653 on 2016/11/18 by Marc.Olano Helper functions for tiled textures Checking in for Ryan Brucks Change 3204863 on 2016/11/18 by Arne.Schober DR - Replaced ENQUEUE_UNIQUE_RENDER_COMMAND with a Debuggable template Implementation Change 3204939 on 2016/11/18 by Arne.Schober DR - Make clang happy Change 3204968 on 2016/11/18 by Arne.Schober DR - UE-38494 - Fixed SpeedTree Wind crash, when force deleting the Asset. Change 3206293 on 2016/11/21 by Uriel.Doyon New member bHasStreamingUpdatePending in UTexture2D to delay update of global distance fields. Set to true when the streamer can possibly load a mip in the near future. #jira UE-37787 Change 3206551 on 2016/11/21 by Chris.Bunner Added material update context when forcing all shaders to recompile. #jira UE-38481 Change 3206644 on 2016/11/21 by Benjamin.Hyder Updating Planar Reflection example in TM-Shadermodels. Change 3206899 on 2016/11/21 by Rolando.Caloca DR - vk - SM5 fixes Change 3206900 on 2016/11/21 by Rolando.Caloca DR - Added missing strings for shader formats Change 3206983 on 2016/11/21 by Rolando.Caloca DR - vk - Support for SV_Coverage Change 3207237 on 2016/11/22 by Simon.Tovey Exporting particle module base and a couple of child classes as it's commonly requested. #test compiles Change 3207241 on 2016/11/22 by Gil.Gribb Merging //UE4/Dev-Main@3206998 to Dev-Rendering (//UE4/Dev-Rendering) Change 3207520 on 2016/11/22 by Ben.Woodhouse Cherry picked from //Fortnite/Main@3206301 Fixed GPU hang in Zone Map view. Was an issue with RenderThread using the device context without appropriate RHIThread flushes. #jira FORT-31616 #code_review keith.judge Change 3207541 on 2016/11/22 by Ben.Woodhouse Cherry picked from //fortnite/Main@3207422 * Fix UpdateTexture3D to create a staging texture of the region to update rather than the whole texture. This prevents distance fields crashing during update (allocating 18GB per frame in some cases) * Put UpdateTexture2D DMA support onto a cvar, disabled by default (corruption issues reported by licensees, plus not sure it's actually faster - could be slower due to reduced bandwidth; issues reported by licensees) * Fix UpdateTexture2D to only create a staging texture of the region to update, saving memory #jira UE-38609 Change 3207654 on 2016/11/22 by Chris.Bunner Don't flag 16-bit PNG/JPG textures as sRGB on import. #jira UE-30279 Change 3208434 on 2016/11/22 by Rolando.Caloca DR - vk - UAV transitions Change 3208490 on 2016/11/22 by Chris.Bunner Break material code sharing when we detect an unresolvable loop. By default change IsResultMA loop detection to stop on functions as we can determine type definitively. Unified IsResultMA detection across switch nodes. Change 3208860 on 2016/11/23 by Rolando.Caloca DR - vk - Fix some format issues Change 3209265 on 2016/11/23 by Arne.Schober DR - originally unshelved from 3153924 - Made Depth and Velocity Rendering Passes to use PSO only RHI interface, We are now passing down two structs that collect all the necessary information for the drawing policies to construct a PSO object. One during construction of the Policy, which contains information abouyt the CullMode, FillMode and PrimType. And another during rendering that passes infomation like BlendState and DepthStencilState down to the low levelrenderer into SetSharedState. Performance of the static drawlist ist slightly slower (less than 0.1ms on Consoles) due to some addtional branches and copies. The branches in the FDrawingPolicyRenderState will go away as soon as everything is converted to use the PSO interface. Performace of the GPU is slightly better due to less context rolls (mainly CullMode sorts in differently now) Change 3209305 on 2016/11/23 by Guillaume.Abadie Fix contact shadow's assemption on objects thickness Change 3209334 on 2016/11/23 by Brian.Karis Fixed TAA handling of alpha. Switched the meaning of AA_ALPHA to make sense. Change 3209903 on 2016/11/24 by Guillaume.Abadie Cherry picks alpha through post processing changelists 3201959, 3204143 and 3209883 from //UE4/Private-Partner-NREAL Change 3209973 on 2016/11/24 by Ben.Woodhouse Fix D3D11 and 12 static analysis warnings reported by Rob Troughton of Coconut Lizard (http://coconutlizard.co.uk/blog/ue4/pvs-studio-part5/) Change 3210023 on 2016/11/24 by Uriel.Doyon Fixed an issue with DropDetail when FixedFrameRate was set to a value smaller than MinDesiredFrameRate. #jira UE-37210 Change 3210026 on 2016/11/24 by Ben.Woodhouse Disable renderthread hang detection if a debugger is present, so we can debug the renderthread without crashing Change 3210049 on 2016/11/24 by Ben.Woodhouse Fix mac build Change 3210071 on 2016/11/24 by Uriel.Doyon Fixed an issue with masked materials and shader complexity viewmode when DBuffer Decals are enabled. #jira UE-37542 Change 3210374 on 2016/11/25 by Ben.Woodhouse * Fix issues with fast cleared dbuffer targets not being resolved when no decals are in the scene. This caused graphical corruption on XB1 and ensure failures on PS4 (with RHIThread disabled) * Move Decal rendertarget manager function implementations out of the header. #jira UE-38879 Change 3210390 on 2016/11/25 by Uriel.Doyon Fixed cubemap resourcesize not taking into account mipgen settings #jira UE-37045 Change 3210407 on 2016/11/25 by Uriel.Doyon "resavepackages" commandlet now supports -buildtexturestreaming that rebuilds the map texture streaming data. That can be used in combination with -buildlighting. Change 3210563 on 2016/11/27 by Rolando.Caloca DR - vk - Integrate cached memory fixes and PF_D24 format fix #jira UE-39025 PR #2974 Change 3210564 on 2016/11/27 by Rolando.Caloca DR - Fix for GL linker PR #2975 #jira UE-39029 Change 3210592 on 2016/11/27 by Rolando.Caloca DR - vk - SM5 fixes Change 3210597 on 2016/11/27 by Rolando.Caloca DR - vk - Prep for staging UB copies to GPU memory Change 3210600 on 2016/11/27 by Rolando.Caloca DR - vk - Extract generic range code Change 3210613 on 2016/11/27 by Rolando.Caloca DR - vk - Added r.Vulkan.SubmitOnDispatch Change 3211054 on 2016/11/28 by Rolando.Caloca DR - vk - Missing reference Change 3211330 on 2016/11/28 by Chris.Bunner Shader compile error for max texture coordinate count on skinned meshes. Change 3211384 on 2016/11/28 by Arne.Schober DR - Enforce move on EnqueueRenderCommand Lambda Change 3211431 on 2016/11/28 by Gil.Gribb Merging //UE4/Dev-Main@3211016 to Dev-Rendering (//UE4/Dev-Rendering) Change 3211738 on 2016/11/28 by Gil.Gribb IWYU fixes after merge Change 3212231 on 2016/11/28 by Richard.Wallis Fix build errors Change 3212253 on 2016/11/28 by Richard.Wallis Remove MacGraphicsSwitching plugin. #jira UE-37640 Change 3212310 on 2016/11/28 by Rolando.Caloca DR - vk - Update glslang to 1.0.33.0 Change 3212446 on 2016/11/28 by Guillaume.Abadie Implements PreviousFrameSwitch material expression Change 3212594 on 2016/11/28 by Arne.Schober DR - Fix missing include Change 3212681 on 2016/11/29 by Rolando.Caloca DR - vk - Auto flush for compute shader Change 3213000 on 2016/11/29 by Gil.Gribb temp fix for PF_MAX Change 3213161 on 2016/11/29 by Ben.Woodhouse Integrate latest D3D12 changes from //depot/Partners/Microsoft/UE4-DX12/...@3211714 Using: - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/Runtime/D3D12RHI/...@3211714 //UE4/Dev-Rendering/Engine/Source/Runtime/D3D12RHI/... - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/ThirdParty/Windows/DirectX/...@3211714 //UE4/Dev-Rendering/Engine/Source/ThirdParty/Windows/DirectX/... - p4 integrate //depot/Partners/Microsoft/UE4-DX12/Engine/Source/Programs/UnrealBuildTool/...@3211714 //UE4/Dev-Rendering/Engine/Source/Programs/UnrealBuildTool/... Changes from UE4-DX12: *** CL 3183818 *** Update D3D12 RHI to 4.14: - Merged changes from Epic up until 10/20/16 - Fixed an issue where command allocators where resetting too early. I changed to aggressive command list batching by default now that more SubmitCommandListHint calls exist in the upper engine, we don't need to worry about starving the GPU. Fewer ExecuteCommandLists calls means better performance and fewer Signals() so this change provides a GPU perf win. I had to fix an issue with aggressive batching where we would sometimes hold on to a command list long enough (in the pending list) but hadn't executed it yet. The command allocator was being put back in the queue of allocators during ReleaseCommandAllocator() without a syncpoint set and was thus being reset too early. I added a simple counter to the command allocator so it could track how many command lists were using it. It doesn't need to be thread safe since only one thread uses a command allocator at a time. I also added some stats around the # command lists and # command allocators since it would be possible to leak command allocators now if it's pending command list count isn't decremented correctly. In that case we'd keep creating new command allocators and eventually run out of memory. -Remove clear during allocate in the FD3D12FastConstantAllocator and FD3D12FastAllocator. The supplied resource locations are assumed to be new and thus don't need to be cleared. -Cleanup D3D12RHI stats. There were some unused stats as well as some missing ones. -Mark shader resource table uniform buffers as dirty only when the shader changes. Cleanup SetComputeShader calls and Dispatch calls to not set/unset the CS for each Dispatch. -Remove unused Check SRV resolved code that epic added to the D3D11 RHI and was brought over. We dont need it and we won't use this. -Remove "always on" cycle counters for high frequency RHI methods like RHISetShaderTexture. These should use the engine's stat macros as they are removed on TEST + SHIPPING builds. On Xbox a significant amount of CPU time is spent in things like QueryPerformanceCounter even when STATS aren't enabled. Currently 1% of an entire capture on XBOX is spent inside this call. I improved and cleaned up high freqency call stacks like: - RHISetShaderTexture - RHISetShaderResourceViewParameter - RHISetShaderParameter - RHISetUAVParameter In general I moved to use templated functions, removed unused parameters, unnecessary copies, etc. -Change D3D12 PIX profiling enable/disable to match Xbox and handle logic in the UEBuildWindows.cs for UBT. Also add a static assert to inform the developer when PIX profiling is requested but the engine is compiling out draw events. -Resources should be associated with the rendering thread's frame that it's currently recording command lists for and they shouldnt be cleaned up until those command lists have been translated to D3D12 command lists on the RHI thread AND completed executing on the GPU. This was confirmed to resolve an issue where CBV resources were being released too early. This work involved a couple changes: 1) Move the "frame" fence to be incremented on the rendering thread (during RHIAdvanceFrameForGetViewportBackBuffer()) so that resources that are deleted from the rendering thread are assosicated with the correct frame count 2) Queue up a command from the rendering thread to signal the "frame" fence. It needs to be queued to ensure that it's signaled at the correct time on the RHI thread (after that frame's command lists have been executed). -Disable GRHIRequiresEarlyBackBufferRenderTarget. Metal/Vulkan/Xbox11.x already do this. This is used by the Slate renderer during BeginRenderFrame and avoids a SetRenderTargets call. -Enable GRHISupportsMSAADepthSampleAccess (used in the Editor). This was enabled for D3D11 on SM5, but not for D3D12. -Delay load D3D12.dll and add root signature 1.1 support. -Add explicit flush calls to improve resource barrier batching instead of implict flushes inside FConditionalScopeResourceBarrier and FScopeResourceBarrier. Also update those classes with const members. *** CL 3183824 *** Fix the D3D12 RHI after integrating UE 4.14 updates: - Fixed a bug where we would try to get the PSO of a nullptr in SetPipelineState if we needed to reset the current PSO on the cmd list. - Fixed a spelling error - Removed the need for bForceState, we use dirty bits now *** CL 3183830 *** - GetDebugFlags RHI extension, needed by XB1 movie player. - Only query memory info if stats are enabled - Add support for the engine's new RHISubmitCommandsAndFlushGPU function - Update CommitPendingPipelineState to be Graphics/Compute specific and avoid the need for a IsCompute parameter. *** CL 3183837 *** Made PipelineState caches contain pointers to FD3D12PipelineState objects to avoid issues with using pointers to after Find/Add to the maps. TMap indicates that the pointer to the value associated with a key "is only valid until the next change to any key in the map." The lifetime of the PSO pointers is managed by the low level caches (graphics and compute). Added stat for the number of Pipeline State Objects. *** CL 3183931 *** Update Windows D3D12 headers and libs to RS1 release bits (10.0.14393.0) *** CL 3183978 *** Update UBT Windows build settings: - Change D3D12 PIX profiling enable/disable to match Xbox and handle logic in the UEBuildWindows.cs for UBT. Also add a static assert to inform the developer when PIX profiling is requested but the engine is compiling out draw events. -Delay load D3D12.dll and add root signature 1.1 support. *** CL 3184132 *** Fix Xbox PSO cache code where it could leak PSOs. Related to change 3183837. *** Changelist 3211714 *** Update D3D12 RHI with fixes: - Check if we can reserve slots in GatherUniqueSamplerTables - DirtyState more often in StateCache - Remove InternalSetSamplerState. The alternate function isn't used. - Allow MRTClear for arrays with holes in them - Fix uninitialized descriptors. This was causing a GPU hang on Xbox. We need to set dirty bits for resources bound to slots outside of the current descriptor table's range - Cleanup SetDescriptorHeap code. Move setting descriptor heap logic to the descriptor cache since it also owns things like the sampler maps. Added members to the descriptor cache to track the last heaps that were set on the command list to avoid dirtying bit unnecessarily. - Resource transitions: go through Common between queues (3D <--> Compute) - Fix initial state for placed resources. - Merging epic Change 3213250 on 2016/11/29 by Chris.Bunner GBufferHints tooltip fix. #jira UE-39103 Change 3213345 on 2016/11/29 by Gil.Gribb more IWYU fallout Change 3213676 on 2016/11/29 by Rolando.Caloca DR - Fix incorrect texture getting cleared Change 3213728 on 2016/11/29 by Rolando.Caloca DR - Lambda-ize Change 3214461 on 2016/11/29 by Ben.Woodhouse Rollout August QFE4 XDK (required for latest DX12 changes on XB1) Change 3215317 on 2016/11/30 by Daniel.Wright PS4 compile fix Change 3216343 on 2016/11/30 by Arne.Schober DR - UE-39155 - after talking to Brian it occurred to us that flipping the world space normal is non sensical. And indeed the Grass was using world space normals. Change 3216844 on 2016/12/01 by Ben.Woodhouse Fix for static analysis warnings after discussion with Microsoft Change 3216916 on 2016/12/01 by Gil.Gribb Merging //UE4/Dev-Main@3216539 to Dev-Rendering (//UE4/Dev-Rendering) Change 3217385 on 2016/12/01 by Arne.Schober DR - UE-39218, UE-39221, UE-39224 and potentially UE-39214 - The Stencil bits for Light channels and decal application were not set in the dynamic basepass Change 3217464 on 2016/12/01 by Ben.Woodhouse Fix for reflection capture resize assert. The assert is only valid in cooked builds, so disable it in editor #jira UE-39225 Change 3217534 on 2016/12/01 by Arne.Schober DR - Fix Merge conflict Change 3217581 on 2016/12/01 by Rolando.Caloca DR - Fix assert on debug Change 3217741 on 2016/12/01 by Benjamin.Hyder Duplicate audio fix. Change 3217890 on 2016/12/01 by Rolando.Caloca DR - Fix widget not rendering properly when hidden #jira UE-39221 Change 3218129 on 2016/12/01 by Arne.Schober DR - UE-39214 - Lod dither value as accidently cached accross the static draw list. Change 3218759 on 2016/12/02 by Guillaume.Abadie Fixes editor compositing bug caused by alpha through post processing change 3209903 #jira UE-39221 [CL 3219854 by Marcus Wassmer in Main branch]
2016-12-02 16:43:04 -05:00
}
ClipmapViewState.LastPartialUpdateOrigin = GridCenter;
}
const FVector Center = FVector(ClipmapViewState.LastPartialUpdateOrigin) * CellSize;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const FGlobalDFCacheType StartCacheType = GAOGlobalDistanceFieldCacheMostlyStaticSeparately ? GDF_MostlyStatic : GDF_Full;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
for (uint32 CacheType = StartCacheType; CacheType < GDF_Num; CacheType++)
{
FGlobalDistanceFieldClipmap& Clipmap = *(CacheType == GDF_MostlyStatic
? &GlobalDistanceFieldInfo.MostlyStaticClipmaps[ClipmapIndex]
: &GlobalDistanceFieldInfo.Clipmaps[ClipmapIndex]);
// Setup clipmap properties from view state exclusively, so we can skip updating on some frames
Clipmap.RenderTarget = ClipmapViewState.Cache[CacheType].VolumeTexture;
Clipmap.Bounds = FBox(Center - Extent, Center + Extent);
// Scroll offset so the contents of the global distance field don't have to be moved as the camera moves around, only updated in slabs
Clipmap.ScrollOffset = FVector(ClipmapViewState.LastPartialUpdateOrigin - ClipmapViewState.FullUpdateOrigin) * CellSize;
}
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
ClipmapViewState.CachedMaxOcclusionDistance = MaxOcclusionDistance;
Copying //UE4/Dev-Rendering to Dev-Main (//UE4/Dev-Main) (Source: //UE4/Dev-Rendering @ 2943238) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2932679 on 2016/04/04 by Martin.Mittring remove hack/cvar that is not longer needed as we fixed the bug #rb:Bob.Tellez #code_review:Bob.Tellez Change 2932681 on 2016/04/04 by Martin.Mittring fixed cvars in consolevariables.ini can affect engine even if marked with cheat (no longer load consolevariables.ini in shipping and test), unified 3 code path, added testcase, cvars with cheat in ini file other than consolevariables.ini now trigger ensure, =on/off/true/false/.. works in all ini files, added enure if non scalability setting are used in ScalabilityIni (get now ignored) #rb:David.Hill #code_review:Marcus.Wassmer, Michael.Noland Change 2932719 on 2016/04/04 by Marcus.Wassmer Merge 3 band SH back to DevRendering #rb Daniel.Wright Change 2932760 on 2016/04/04 by Zabir.Hoque Migrating high resolution cubemaps for skylight and reflection probes. #rb: Daniel.Wright Change 2933121 on 2016/04/05 by Rolando.Caloca DR - vk - Fix free blocks not getting joined - Fix compile issue Change 2933122 on 2016/04/05 by Rolando.Caloca DR - Do not shorten dumped shaders path Change 2933126 on 2016/04/05 by Rolando.Caloca DR - vk - Index Buffers using new resource management Change 2933127 on 2016/04/05 by Rolando.Caloca DR - vk - Extract multibuffer off index buffer Change 2933131 on 2016/04/05 by Rolando.Caloca DR - vk - Transition to vb's using mutlibuffer Change 2933136 on 2016/04/05 by Rolando.Caloca DR - vk - Change staging buffers to use resource allocation system - Fix free block not getting joined - Remove define Change 2933140 on 2016/04/05 by Rolando.Caloca DR - vk - 'static' textures now use resource mgmt - Release free pages back to the OS - Remove ensure Change 2933152 on 2016/04/05 by Rolando.Caloca DR - vk - Fix aliasing granularity - Fix renderpass end/copy buffer ensure Change 2933155 on 2016/04/05 by Rolando.Caloca DR - SCW - Fix for -directcompile to directly load file for preprocessor Change 2933158 on 2016/04/05 by Rolando.Caloca DR - hlslcc - Error on Metal if trying to R & W on RWTextures - Fix indices on RW reads to be unsigned #codereview Mark.Satterthwaite, Michael.Trepka Change 2933169 on 2016/04/05 by Rolando.Caloca DR - vk - Move header to public to match changes on DevMobile Change 2933173 on 2016/04/05 by David.Hill Deferred decal rendering with negative scale #rb:Matrin.Mittring #jira:UE-27389 Change 2933273 on 2016/04/05 by Rolando.Caloca DR - vk - Fix renderdoc markers Change 2933274 on 2016/04/05 by Rolando.Caloca DR - Support for -AttachDebugger Change 2933316 on 2016/04/05 by Rolando.Caloca DR - vk - Compile fix whene enabling define Change 2933334 on 2016/04/05 by Rolando.Caloca DR - Compile fix #codereview Martin.Mittring Change 2933805 on 2016/04/05 by Brian.Karis Temporal AA dynamic antighosting. Fixed DOF Change 2933811 on 2016/04/05 by Brian.Karis Fixed area light NaNs. Improvements to area lights. Horizen handling for wrap around. Change 2933812 on 2016/04/05 by Brian.Karis Fixed fresnel on SSS skin. Change 2933813 on 2016/04/05 by Brian.Karis Tessellation fix Change 2933816 on 2016/04/05 by Brian.Karis Improved forward shading support [CL 2943241 by Gil Gribb in Main branch]
2016-04-13 21:24:38 -04:00
ClipmapViewState.CachedGlobalDistanceFieldViewDistance = Scene->GlobalDistanceFieldViewDistance;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
ClipmapViewState.CacheMostlyStaticSeparately = GAOGlobalDistanceFieldCacheMostlyStaticSeparately;
}
}
else
{
for (int32 ClipmapIndex = 0; ClipmapIndex < NumClipmaps; ClipmapIndex++)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const FGlobalDFCacheType StartCacheType = GAOGlobalDistanceFieldCacheMostlyStaticSeparately ? GDF_MostlyStatic : GDF_Full;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
for (uint32 CacheType = StartCacheType; CacheType < GDF_Num; CacheType++)
{
FGlobalDistanceFieldClipmap& Clipmap = *(CacheType == GDF_MostlyStatic
? &GlobalDistanceFieldInfo.MostlyStaticClipmaps[ClipmapIndex]
: &GlobalDistanceFieldInfo.Clipmaps[ClipmapIndex]);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
AllocateClipmapTexture(RHICmdList, ClipmapIndex, (FGlobalDFCacheType)CacheType, Clipmap.RenderTarget);
Clipmap.ScrollOffset = FVector::ZeroVector;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const float Extent = ComputeClipmapExtent(ClipmapIndex, Scene);
FVector Center = View.ViewMatrices.GetViewOrigin();
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const float CellSize = (Extent * 2) / GAOGlobalDFResolution;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
FIntVector GridCenter;
GridCenter.X = FMath::FloorToInt(Center.X / CellSize);
GridCenter.Y = FMath::FloorToInt(Center.Y / CellSize);
GridCenter.Z = FMath::FloorToInt(Center.Z / CellSize);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
Center = FVector(GridCenter) * CellSize;
FBox ClipmapBounds(Center - Extent, Center + Extent);
Clipmap.Bounds = ClipmapBounds;
FVolumeUpdateRegion UpdateRegion;
UpdateRegion.Bounds = ClipmapBounds;
UpdateRegion.CellsSize = FIntVector(GAOGlobalDFResolution);
Clipmap.UpdateRegions.Add(UpdateRegion);
}
}
}
GlobalDistanceFieldInfo.UpdateParameterData(MaxOcclusionDistance);
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3091903) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3072947 on 2016/08/01 by Uriel.Doyon Texture GUIDs are now included in cooked builds, as they are required by the texture streamer to link build data to in game texture. #review-3072934 @marcus.wassmer #jira UE-34045 Change 3073301 on 2016/08/02 by Ben.Woodhouse Fix for large spotlight culling precision issues, reported on UDN by Aaron Jacobs at Double Fine. For a full description, see the UDN post https://udn.unrealengine.com/questions/305440/shadowed-light-flicker-caused-by-floating-point-pr.html #jira UE-34052 Change 3073689 on 2016/08/02 by Ben.Woodhouse Improved skin postprocess - support for full resolution, with diffuse/spec lighting combined into single RGBA (sharing chroma) Full res lighting gives less temporal AA flickering, sharper diffuse and specular lighting in the surface (since this is now at full resolution), faster postprocessing if using a 64-bit rendertarget (on NV 980Ti). Checkerboard rendering is controlled via the r.sss.checkerboard cvar. - 0 is off/full res, 1 is checkerboard, 2 is automatic based on scenecolor (non-checkerboard requires 64bit or more rendertarget w/separate alpha) Tested/profiled on PC, PS4 Change 3074666 on 2016/08/02 by Daniel.Wright Fixed stationary skylight brightness Change 3074667 on 2016/08/02 by Daniel.Wright Fixed r.ReflectionEnvironmentLightmapMixing Change 3074687 on 2016/08/02 by Daniel.Wright Disallowed DrawMaterialToRenderTarget and Begin/EndDrawCanvasToRenderTarget in construction scripts, since they don't work in game. Blutilities can be used to do blueprint rendering in the editor. Change 3075241 on 2016/08/03 by Rolando.Caloca DR - Fix linux compile issue & static analysis warning Change 3075746 on 2016/08/03 by Daniel.Wright Removed bOverride_AntiAliasingMethod and outdated ini references to PP AntiAliasingMethod Change 3075783 on 2016/08/03 by Ryan.Brucks #code.review Marcus.Wassmer Added two material nodes that return Atmospheric Light Vector and Light Direction using: View.AtmosphericFogSunColor View.AtmosphericFogSunDirection Nodes are called: AtmosphericLightVector AtmosphericLightColor Also changed SceneRendering.cpp so that values will be grabbed from directional lights without needing an Atmospheric Fog actor in the scene. Change 3075969 on 2016/08/03 by Uriel.Doyon Material GUIDs are not updated anymore when parents or textures change. Lighting now uses a hash built from the list of parents, textures and shader functions. #review-3072980 @marcus.wassmer @daniel.wright Change 3076116 on 2016/08/03 by Ryan.Brucks #code.review marcus.wassmer Fixed typo in the Caption of new Nodes "Atmospheric Light Vector" and "Atmospheric Light Color" Change 3076456 on 2016/08/03 by Rolando.Caloca DR - Fix geometry shader gl_Layer for SPIR-V Change 3076730 on 2016/08/03 by Uriel.Doyon Added user warning logic for the texture streaming build. Ran in MapCheck, BeginPlay and PreSave. #review-3072984 @marcus.wassmer Change 3077616 on 2016/08/04 by Daniel.Wright Planar reflection show flags can now be edited Change 3077621 on 2016/08/04 by Daniel.Wright Changed default Planar Reflection DistanceFromPlaneFadeoutEnd from 600 to 100, which reduces artifacts and is a more intuitive initial setting Change 3077792 on 2016/08/04 by Daniel.Wright Fixed an unnecessary sky capture caused by the sky light component owned by the default ASkyLight Change 3077799 on 2016/08/04 by Daniel.Wright Skip RF_ArchetypeObject for reflection captures Change 3077876 on 2016/08/04 by Marc.Olano Noise material perf improvements Change random number generator for Gradient-ALU (1.7x perf boost), improve speed of Voronoi noise quality level 3. Removes integer BBS random number generators. Fewer instructions, but too slow to use (see 1.7x perf boost above) Change 3077884 on 2016/08/04 by Daniel.Wright Lighting channels can now be edited on components with static mobility, since dynamic lights can still affect them Change 3078994 on 2016/08/05 by Simon.Tovey Fix for UE-34241 Scene proxy ptr was being cached during a downcast. Inside a call to CreateDynamicData, CheckMaterialUsage_Concurrent() was causing the scene proxy to be recreated an so the cached ptr was stale. I've fixed the immediate issue but recreating the scene proxy here doesn't seem great. Maybe CheckMaterailUsage() should be rethought a bit. Change 3079162 on 2016/08/05 by Ben.Woodhouse Fix for jittering in Paper2D. Was caused by override being ignored due to a change in intiialization order for AA settings. #jira UE-34091 Change 3079613 on 2016/08/05 by Daniel.Wright New blueprint function ClearRenderTarget2D, which is the only way to set a render target alpha directly New blueprint function CreateRenderTarget2D Change 3079708 on 2016/08/05 by Uriel.Doyon Fixed crash when building texture streaming on some levels. Change 3079795 on 2016/08/05 by Uriel.Doyon Fixed issue with instanced static meshes when building texture streaming. Fixed typo with func "GetNumTextureStreamingPrimitives" Change 3079806 on 2016/08/05 by Uriel.Doyon Enabled PerTexture MipBias. The per texture mip bias now resets to 0 when the texture gets required at low resolution. New scalability setting named "r.Streaming.LimitPoolSizeToVRAM" enabling the PoolSize to be limited the available VRAM (according to GPoolSizeVRAMPercentage) #review-3074662 @marcus.wassmer Change 3082698 on 2016/08/09 by Daniel.Wright Copy - CreateRenderTarget2D uses a world context object as owner, allows use in a construction script Change 3082699 on 2016/08/09 by Daniel.Wright Changed display name for 'Two Sided' shading model to 'Two Sided Foliage' to make it clear what it's intended to be used for Change 3083909 on 2016/08/10 by Olaf.Piesche #jira UE-34106 #jira UE-32784 #jira UE-31198 Reset vertex factories on mesh emitters if mesh has been reimported (if mesh package is dirty) Change 3084645 on 2016/08/10 by Olaf.Piesche #jira UE-30398 Fix offset added to particle collision locations. Change 3084709 on 2016/08/10 by Daniel.Wright Copy - Scene capture alpha is now inverted to match DrawMaterialToRenderTarget, and to allow compositing with existing render target contents Added CompositeMode to SceneCapture2D, which can be used to addively accumulate or composite instead of the default overwrite behavior Added bCaptureOnMovement to SceneCapture, which can be disabled so the only source of scene capturing is a manual capture by calling CaptureScene() Change 3084783 on 2016/08/10 by Rolando.Caloca DR - Use the first targeted rhi shader platform as the initial RHI to load on Windows #jira UE-34510 Change 3084958 on 2016/08/10 by Daniel.Wright Copy - Reverted cl 2938543 "Lightmass now respects owner bHidden, and bCastHiddenShadow" because it did not have backwards compatibility so breaks content using hidden light cards Change 3086023 on 2016/08/11 by Marcus.Wassmer Merging //UE4/Dev-Main@3085468 to Dev-Rendering (//UE4/Dev-Rendering) #test none Change 3086778 on 2016/08/11 by Ben.Woodhouse Workaround for fortnite character rendering issue. Enable checkerboard rendering by default until we can fix properly #jira UE-34561 Change 3087404 on 2016/08/12 by Rolando.Caloca DR - Upgrade glslang to 1.0.21.1 - Added some more debug output Change 3087524 on 2016/08/12 by Rolando.Caloca DR - vk - Fixed StencilRef, fixed size of RHIReadSurfaceFloatData (but still returns dummy data) Change 3087663 on 2016/08/12 by Rolando.Caloca DR - vk - Fix for SRGB; support for mip texture views Change 3087735 on 2016/08/12 by Daniel.Wright TextureRenderTarget2D's can now be up to 8192^2. Anything over 2048 pops up an 'are you sure' dialog. Change 3087750 on 2016/08/12 by Rolando.Caloca DR - vk - Minor renaming in prep for merge Change 3087813 on 2016/08/12 by Rolando.Caloca DR - vk - More minor cleanup Change 3087819 on 2016/08/12 by Chris.Bunner Check material function input types directly, no need to traverse connected graph. #jira UE-32134 Change 3087901 on 2016/08/12 by Rolando.Caloca DR - vk - Fix RT view to use 1 mip Fix depth buffer component swizzle Change 3088193 on 2016/08/12 by Daniel.Wright DFAO and RTDF shadows are enabled in High and Epic scalability settings by default Change 3088988 on 2016/08/15 by Rolando.Caloca DR - Add Accessors Change 3089104 on 2016/08/15 by Olaf.Piesche #jira UE-34241 Sceneproxy can be nullptr in FDynamicMeshEmitterData::Init if the proxy is being recreated Change 3089208 on 2016/08/15 by Daniel.Wright Downsampled separate translucency uses a separate view uniform buffer with correct buffer sizes * Fixes WorldPosition in downsampled translucency * View uniform buffer parameters are now cached on the view, to allow recreating the uniform buffer without having to rebuild the entire struct. Currently used by global distance field, downsampled separate translucency. * Fixed the downsampled translucency depth buffer being full res used together with a smaller color target, now they are both the downsampled res Change 3089209 on 2016/08/15 by Daniel.Wright Fixed atmospheric fog on translucency Change 3089457 on 2016/08/15 by Daniel.Wright Fixed lighting build failure from UMaterialInstanceDynamic assigned to a mesh that's being exported to Lightmass. The Swarm cache entry is created using the parent's guid, causing multiple MID's with the same parent to acquire a file handle multiple times which fails after the first. Change 3089549 on 2016/08/15 by Daniel.Wright UMaterialInterface initializes LightingGuid to something valid - causes UMaterialInstanceDynamic to have a valid LightingGuid so they can be used in lighting builds Change 3089703 on 2016/08/15 by Daniel.Wright Custom expression fixup for View.RenderTargetSize Change 3090546 on 2016/08/16 by Daniel.Wright Hopeful fix for recycled snapshot view crash Change 3091202 on 2016/08/16 by Daniel.Wright Manually clear FViewInfo::CachedViewUniformShaderParameters on creating a snapshot, since memcpy is used to create the snapshot view [CL 3091931 by Gil Gribb in Main branch]
2016-08-17 11:38:13 -04:00
void FViewInfo::SetupDefaultGlobalDistanceFieldUniformBufferParameters(FViewUniformShaderParameters& ViewUniformShaderParameters) const
{
// Initialize global distance field members to defaults, because View.GlobalDistanceFieldInfo is not valid yet
for (int32 Index = 0; Index < GMaxGlobalDistanceFieldClipmaps; Index++)
{
ViewUniformShaderParameters.GlobalVolumeCenterAndExtent_UB[Index] = FVector4(0);
ViewUniformShaderParameters.GlobalVolumeWorldToUVAddAndMul_UB[Index] = FVector4(0);
}
ViewUniformShaderParameters.GlobalVolumeDimension_UB = 0.0f;
ViewUniformShaderParameters.GlobalVolumeTexelSize_UB = 0.0f;
ViewUniformShaderParameters.MaxGlobalDistance_UB = 0.0f;
Copying //UE4/Dev-Mobile to //UE4/Dev-Main (Source: //UE4/Dev-Mobile @ 3116515) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3065209 on 2016/07/26 by Steve.Cano Add Android Texture Format used for packaging/cooking to the Manifest File #ue4 #android #jira UE-33645 Change 3068915 on 2016/07/28 by Steve.Cano Add an additional Texture Compression support line in the manifest for DXT #ue4 #android #jira UE-33645 Change 3075911 on 2016/08/03 by Steve.Cano Make the "Running {ProjectName} on {Device}" toast stay up when launching a game to an Android device until we've finished running it, as it does on other platforms. This logic already existed but only ran if the "Prebuilt" flag was passed in, however we want this to always run now. Re-writing to run through waiting on each process to finish for each device launched on #jira UE-3122 #ue4 #android Change 3080981 on 2016/08/08 by Steve.Cano Clear any input before removing the TouchInterface from the screen to prevent infinite input after it is cleared #jira UE-33956 #ue4 #platform Change 3092587 on 2016/08/17 by Steve.Cano Adding "IsGamepadAttached" functionality to Android Application #jira UE-33264 #ue4 #android Change 3095840 on 2016/08/21 by Dmitriy.Dyomin Fixed: Particle Cutout Crashes On Certain Devices (Samsung Galaxy Note 2) Happens only with non-instanced path #jira UE-34604 Change 3095855 on 2016/08/22 by Dmitriy.Dyomin Allow UWorldComposition::GetTilesList to be used in runtime code Licensee request https://udn.unrealengine.com/questions/307586/world-compositions-world-dimensions.html Change 3096093 on 2016/08/22 by Allan.Bentham Allow Vulkan api logging on android Change 3096361 on 2016/08/22 by Steve.Cano Github 2663 pull - Pass any extras used to launch SplashActivity down to GameActivity (Contributed by sangpan) #jira UE-34050 #github #2663 #ue4 #android Change 3097412 on 2016/08/23 by Dmitriy.Dyomin Using BulkSerialize for cooked collision data to speed up serialization Change 3098957 on 2016/08/23 by Jack.Porter Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile) Change 3099058 on 2016/08/24 by Jack.Porter Check EXT_debug_label and EXT_debug_marker functions were found before calling as a few devices to not implement these extensions #UE-35087 Change 3099131 on 2016/08/24 by Dmitriy.Dyomin Fixed: HDR compressed texture become black in some mali devices Use sized internal format for half-float textures on ES3 devices, as ES3 spec expects it #jira UE-35018 Change 3099150 on 2016/08/24 by Dmitriy.Dyomin Enable HALF_FLOAT and UNSIGNED_INT_2_10_10_10_REV vertex formats on ES3+ devices, spec req Change 3102252 on 2016/08/26 by Dmitriy.Dyomin Prevent view uniform buffer crash on ES2 devices that do not support 3D textures Change 3102258 on 2016/08/26 by Dmitriy.Dyomin Enabled refraction on iPadMini4 #jira UE-35079 Change 3102651 on 2016/08/26 by Dmitriy.Dyomin Fixed: instanced static mesh world normals Also removed unnecessary instance matrix transposing #jira UE-35075 Change 3109397 on 2016/09/01 by Jack.Porter Fix problem with Android sessions not appearing in Session Frontend #jira UE-35261 Change 3109490 on 2016/09/01 by Jack.Porter Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile) Change 3111628 on 2016/09/02 by Jack.Porter Landscape wireframe LOD visualization with tessellation Change 3112809 on 2016/09/02 by Chris.Babcock Update cached length when file is written to on Android #jira UE-35558 #ue4 #android Change 3113245 on 2016/09/04 by Dmitriy.Dyomin Fixed: Subway Sequencer plays only a black screen when packaged for ESDSR (3.1+AEP) #jira UE-34291 Change 3113249 on 2016/09/04 by Dmitriy.Dyomin Replicated fix from 4.13: GPU particles no longer work on iOS or TVOS Metal devices #jira UE-34782 Change 3113513 on 2016/09/05 by Allan.Bentham Add vulkan version parameter to android device profile selector's source inputs . reinstate Vulkan Disable cvar functionality. Added mali no vulkan device profile. Change 3113519 on 2016/09/05 by Allan.Bentham Remove temp 4.13 hack to avoid public header changes. Change 3113535 on 2016/09/05 by Allan.Bentham Decode 32 bit HDR formats when using scene captures. #jira UE-25444 Change 3113813 on 2016/09/06 by Dmitriy.Dyomin Resend to server sub-levels visibility state right after world actors are initialized. During seamless travel client loads always-loaded sub-levels before world actors are initialized and ServerUpdateLevelVisibility calls in UWorld::AddToWorld are skipped. Change 3113870 on 2016/09/06 by Jack.Porter Fix issue with ES2 Feature Level preview and Mobile Preview PIE not limiting materials to 8 textures #jira UE-35591 Change 3115031 on 2016/09/06 by Chris.Babcock Add Vulkan version code not moved over from 4.13.1 #jira UE-35642 #ue4 #android Change 3115496 on 2016/09/07 by Dmitriy.Dyomin Use the same DDC key for source reflection data and encoded data. Fixes broken reflections on mobile #jira UE-35647 [CL 3116720 by Chris Babcock in Main branch]
2016-09-07 17:04:11 -04:00
ViewUniformShaderParameters.GlobalDistanceFieldTexture0_UB = OrBlack3DIfNull(GBlackVolumeTexture->TextureRHI.GetReference());
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3091903) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3072947 on 2016/08/01 by Uriel.Doyon Texture GUIDs are now included in cooked builds, as they are required by the texture streamer to link build data to in game texture. #review-3072934 @marcus.wassmer #jira UE-34045 Change 3073301 on 2016/08/02 by Ben.Woodhouse Fix for large spotlight culling precision issues, reported on UDN by Aaron Jacobs at Double Fine. For a full description, see the UDN post https://udn.unrealengine.com/questions/305440/shadowed-light-flicker-caused-by-floating-point-pr.html #jira UE-34052 Change 3073689 on 2016/08/02 by Ben.Woodhouse Improved skin postprocess - support for full resolution, with diffuse/spec lighting combined into single RGBA (sharing chroma) Full res lighting gives less temporal AA flickering, sharper diffuse and specular lighting in the surface (since this is now at full resolution), faster postprocessing if using a 64-bit rendertarget (on NV 980Ti). Checkerboard rendering is controlled via the r.sss.checkerboard cvar. - 0 is off/full res, 1 is checkerboard, 2 is automatic based on scenecolor (non-checkerboard requires 64bit or more rendertarget w/separate alpha) Tested/profiled on PC, PS4 Change 3074666 on 2016/08/02 by Daniel.Wright Fixed stationary skylight brightness Change 3074667 on 2016/08/02 by Daniel.Wright Fixed r.ReflectionEnvironmentLightmapMixing Change 3074687 on 2016/08/02 by Daniel.Wright Disallowed DrawMaterialToRenderTarget and Begin/EndDrawCanvasToRenderTarget in construction scripts, since they don't work in game. Blutilities can be used to do blueprint rendering in the editor. Change 3075241 on 2016/08/03 by Rolando.Caloca DR - Fix linux compile issue & static analysis warning Change 3075746 on 2016/08/03 by Daniel.Wright Removed bOverride_AntiAliasingMethod and outdated ini references to PP AntiAliasingMethod Change 3075783 on 2016/08/03 by Ryan.Brucks #code.review Marcus.Wassmer Added two material nodes that return Atmospheric Light Vector and Light Direction using: View.AtmosphericFogSunColor View.AtmosphericFogSunDirection Nodes are called: AtmosphericLightVector AtmosphericLightColor Also changed SceneRendering.cpp so that values will be grabbed from directional lights without needing an Atmospheric Fog actor in the scene. Change 3075969 on 2016/08/03 by Uriel.Doyon Material GUIDs are not updated anymore when parents or textures change. Lighting now uses a hash built from the list of parents, textures and shader functions. #review-3072980 @marcus.wassmer @daniel.wright Change 3076116 on 2016/08/03 by Ryan.Brucks #code.review marcus.wassmer Fixed typo in the Caption of new Nodes "Atmospheric Light Vector" and "Atmospheric Light Color" Change 3076456 on 2016/08/03 by Rolando.Caloca DR - Fix geometry shader gl_Layer for SPIR-V Change 3076730 on 2016/08/03 by Uriel.Doyon Added user warning logic for the texture streaming build. Ran in MapCheck, BeginPlay and PreSave. #review-3072984 @marcus.wassmer Change 3077616 on 2016/08/04 by Daniel.Wright Planar reflection show flags can now be edited Change 3077621 on 2016/08/04 by Daniel.Wright Changed default Planar Reflection DistanceFromPlaneFadeoutEnd from 600 to 100, which reduces artifacts and is a more intuitive initial setting Change 3077792 on 2016/08/04 by Daniel.Wright Fixed an unnecessary sky capture caused by the sky light component owned by the default ASkyLight Change 3077799 on 2016/08/04 by Daniel.Wright Skip RF_ArchetypeObject for reflection captures Change 3077876 on 2016/08/04 by Marc.Olano Noise material perf improvements Change random number generator for Gradient-ALU (1.7x perf boost), improve speed of Voronoi noise quality level 3. Removes integer BBS random number generators. Fewer instructions, but too slow to use (see 1.7x perf boost above) Change 3077884 on 2016/08/04 by Daniel.Wright Lighting channels can now be edited on components with static mobility, since dynamic lights can still affect them Change 3078994 on 2016/08/05 by Simon.Tovey Fix for UE-34241 Scene proxy ptr was being cached during a downcast. Inside a call to CreateDynamicData, CheckMaterialUsage_Concurrent() was causing the scene proxy to be recreated an so the cached ptr was stale. I've fixed the immediate issue but recreating the scene proxy here doesn't seem great. Maybe CheckMaterailUsage() should be rethought a bit. Change 3079162 on 2016/08/05 by Ben.Woodhouse Fix for jittering in Paper2D. Was caused by override being ignored due to a change in intiialization order for AA settings. #jira UE-34091 Change 3079613 on 2016/08/05 by Daniel.Wright New blueprint function ClearRenderTarget2D, which is the only way to set a render target alpha directly New blueprint function CreateRenderTarget2D Change 3079708 on 2016/08/05 by Uriel.Doyon Fixed crash when building texture streaming on some levels. Change 3079795 on 2016/08/05 by Uriel.Doyon Fixed issue with instanced static meshes when building texture streaming. Fixed typo with func "GetNumTextureStreamingPrimitives" Change 3079806 on 2016/08/05 by Uriel.Doyon Enabled PerTexture MipBias. The per texture mip bias now resets to 0 when the texture gets required at low resolution. New scalability setting named "r.Streaming.LimitPoolSizeToVRAM" enabling the PoolSize to be limited the available VRAM (according to GPoolSizeVRAMPercentage) #review-3074662 @marcus.wassmer Change 3082698 on 2016/08/09 by Daniel.Wright Copy - CreateRenderTarget2D uses a world context object as owner, allows use in a construction script Change 3082699 on 2016/08/09 by Daniel.Wright Changed display name for 'Two Sided' shading model to 'Two Sided Foliage' to make it clear what it's intended to be used for Change 3083909 on 2016/08/10 by Olaf.Piesche #jira UE-34106 #jira UE-32784 #jira UE-31198 Reset vertex factories on mesh emitters if mesh has been reimported (if mesh package is dirty) Change 3084645 on 2016/08/10 by Olaf.Piesche #jira UE-30398 Fix offset added to particle collision locations. Change 3084709 on 2016/08/10 by Daniel.Wright Copy - Scene capture alpha is now inverted to match DrawMaterialToRenderTarget, and to allow compositing with existing render target contents Added CompositeMode to SceneCapture2D, which can be used to addively accumulate or composite instead of the default overwrite behavior Added bCaptureOnMovement to SceneCapture, which can be disabled so the only source of scene capturing is a manual capture by calling CaptureScene() Change 3084783 on 2016/08/10 by Rolando.Caloca DR - Use the first targeted rhi shader platform as the initial RHI to load on Windows #jira UE-34510 Change 3084958 on 2016/08/10 by Daniel.Wright Copy - Reverted cl 2938543 "Lightmass now respects owner bHidden, and bCastHiddenShadow" because it did not have backwards compatibility so breaks content using hidden light cards Change 3086023 on 2016/08/11 by Marcus.Wassmer Merging //UE4/Dev-Main@3085468 to Dev-Rendering (//UE4/Dev-Rendering) #test none Change 3086778 on 2016/08/11 by Ben.Woodhouse Workaround for fortnite character rendering issue. Enable checkerboard rendering by default until we can fix properly #jira UE-34561 Change 3087404 on 2016/08/12 by Rolando.Caloca DR - Upgrade glslang to 1.0.21.1 - Added some more debug output Change 3087524 on 2016/08/12 by Rolando.Caloca DR - vk - Fixed StencilRef, fixed size of RHIReadSurfaceFloatData (but still returns dummy data) Change 3087663 on 2016/08/12 by Rolando.Caloca DR - vk - Fix for SRGB; support for mip texture views Change 3087735 on 2016/08/12 by Daniel.Wright TextureRenderTarget2D's can now be up to 8192^2. Anything over 2048 pops up an 'are you sure' dialog. Change 3087750 on 2016/08/12 by Rolando.Caloca DR - vk - Minor renaming in prep for merge Change 3087813 on 2016/08/12 by Rolando.Caloca DR - vk - More minor cleanup Change 3087819 on 2016/08/12 by Chris.Bunner Check material function input types directly, no need to traverse connected graph. #jira UE-32134 Change 3087901 on 2016/08/12 by Rolando.Caloca DR - vk - Fix RT view to use 1 mip Fix depth buffer component swizzle Change 3088193 on 2016/08/12 by Daniel.Wright DFAO and RTDF shadows are enabled in High and Epic scalability settings by default Change 3088988 on 2016/08/15 by Rolando.Caloca DR - Add Accessors Change 3089104 on 2016/08/15 by Olaf.Piesche #jira UE-34241 Sceneproxy can be nullptr in FDynamicMeshEmitterData::Init if the proxy is being recreated Change 3089208 on 2016/08/15 by Daniel.Wright Downsampled separate translucency uses a separate view uniform buffer with correct buffer sizes * Fixes WorldPosition in downsampled translucency * View uniform buffer parameters are now cached on the view, to allow recreating the uniform buffer without having to rebuild the entire struct. Currently used by global distance field, downsampled separate translucency. * Fixed the downsampled translucency depth buffer being full res used together with a smaller color target, now they are both the downsampled res Change 3089209 on 2016/08/15 by Daniel.Wright Fixed atmospheric fog on translucency Change 3089457 on 2016/08/15 by Daniel.Wright Fixed lighting build failure from UMaterialInstanceDynamic assigned to a mesh that's being exported to Lightmass. The Swarm cache entry is created using the parent's guid, causing multiple MID's with the same parent to acquire a file handle multiple times which fails after the first. Change 3089549 on 2016/08/15 by Daniel.Wright UMaterialInterface initializes LightingGuid to something valid - causes UMaterialInstanceDynamic to have a valid LightingGuid so they can be used in lighting builds Change 3089703 on 2016/08/15 by Daniel.Wright Custom expression fixup for View.RenderTargetSize Change 3090546 on 2016/08/16 by Daniel.Wright Hopeful fix for recycled snapshot view crash Change 3091202 on 2016/08/16 by Daniel.Wright Manually clear FViewInfo::CachedViewUniformShaderParameters on creating a snapshot, since memcpy is used to create the snapshot view [CL 3091931 by Gil Gribb in Main branch]
2016-08-17 11:38:13 -04:00
ViewUniformShaderParameters.GlobalDistanceFieldSampler0_UB = TStaticSamplerState<SF_Bilinear, AM_Wrap, AM_Wrap, AM_Wrap>::GetRHI();
Copying //UE4/Dev-Mobile to //UE4/Dev-Main (Source: //UE4/Dev-Mobile @ 3116515) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3065209 on 2016/07/26 by Steve.Cano Add Android Texture Format used for packaging/cooking to the Manifest File #ue4 #android #jira UE-33645 Change 3068915 on 2016/07/28 by Steve.Cano Add an additional Texture Compression support line in the manifest for DXT #ue4 #android #jira UE-33645 Change 3075911 on 2016/08/03 by Steve.Cano Make the "Running {ProjectName} on {Device}" toast stay up when launching a game to an Android device until we've finished running it, as it does on other platforms. This logic already existed but only ran if the "Prebuilt" flag was passed in, however we want this to always run now. Re-writing to run through waiting on each process to finish for each device launched on #jira UE-3122 #ue4 #android Change 3080981 on 2016/08/08 by Steve.Cano Clear any input before removing the TouchInterface from the screen to prevent infinite input after it is cleared #jira UE-33956 #ue4 #platform Change 3092587 on 2016/08/17 by Steve.Cano Adding "IsGamepadAttached" functionality to Android Application #jira UE-33264 #ue4 #android Change 3095840 on 2016/08/21 by Dmitriy.Dyomin Fixed: Particle Cutout Crashes On Certain Devices (Samsung Galaxy Note 2) Happens only with non-instanced path #jira UE-34604 Change 3095855 on 2016/08/22 by Dmitriy.Dyomin Allow UWorldComposition::GetTilesList to be used in runtime code Licensee request https://udn.unrealengine.com/questions/307586/world-compositions-world-dimensions.html Change 3096093 on 2016/08/22 by Allan.Bentham Allow Vulkan api logging on android Change 3096361 on 2016/08/22 by Steve.Cano Github 2663 pull - Pass any extras used to launch SplashActivity down to GameActivity (Contributed by sangpan) #jira UE-34050 #github #2663 #ue4 #android Change 3097412 on 2016/08/23 by Dmitriy.Dyomin Using BulkSerialize for cooked collision data to speed up serialization Change 3098957 on 2016/08/23 by Jack.Porter Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile) Change 3099058 on 2016/08/24 by Jack.Porter Check EXT_debug_label and EXT_debug_marker functions were found before calling as a few devices to not implement these extensions #UE-35087 Change 3099131 on 2016/08/24 by Dmitriy.Dyomin Fixed: HDR compressed texture become black in some mali devices Use sized internal format for half-float textures on ES3 devices, as ES3 spec expects it #jira UE-35018 Change 3099150 on 2016/08/24 by Dmitriy.Dyomin Enable HALF_FLOAT and UNSIGNED_INT_2_10_10_10_REV vertex formats on ES3+ devices, spec req Change 3102252 on 2016/08/26 by Dmitriy.Dyomin Prevent view uniform buffer crash on ES2 devices that do not support 3D textures Change 3102258 on 2016/08/26 by Dmitriy.Dyomin Enabled refraction on iPadMini4 #jira UE-35079 Change 3102651 on 2016/08/26 by Dmitriy.Dyomin Fixed: instanced static mesh world normals Also removed unnecessary instance matrix transposing #jira UE-35075 Change 3109397 on 2016/09/01 by Jack.Porter Fix problem with Android sessions not appearing in Session Frontend #jira UE-35261 Change 3109490 on 2016/09/01 by Jack.Porter Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile) Change 3111628 on 2016/09/02 by Jack.Porter Landscape wireframe LOD visualization with tessellation Change 3112809 on 2016/09/02 by Chris.Babcock Update cached length when file is written to on Android #jira UE-35558 #ue4 #android Change 3113245 on 2016/09/04 by Dmitriy.Dyomin Fixed: Subway Sequencer plays only a black screen when packaged for ESDSR (3.1+AEP) #jira UE-34291 Change 3113249 on 2016/09/04 by Dmitriy.Dyomin Replicated fix from 4.13: GPU particles no longer work on iOS or TVOS Metal devices #jira UE-34782 Change 3113513 on 2016/09/05 by Allan.Bentham Add vulkan version parameter to android device profile selector's source inputs . reinstate Vulkan Disable cvar functionality. Added mali no vulkan device profile. Change 3113519 on 2016/09/05 by Allan.Bentham Remove temp 4.13 hack to avoid public header changes. Change 3113535 on 2016/09/05 by Allan.Bentham Decode 32 bit HDR formats when using scene captures. #jira UE-25444 Change 3113813 on 2016/09/06 by Dmitriy.Dyomin Resend to server sub-levels visibility state right after world actors are initialized. During seamless travel client loads always-loaded sub-levels before world actors are initialized and ServerUpdateLevelVisibility calls in UWorld::AddToWorld are skipped. Change 3113870 on 2016/09/06 by Jack.Porter Fix issue with ES2 Feature Level preview and Mobile Preview PIE not limiting materials to 8 textures #jira UE-35591 Change 3115031 on 2016/09/06 by Chris.Babcock Add Vulkan version code not moved over from 4.13.1 #jira UE-35642 #ue4 #android Change 3115496 on 2016/09/07 by Dmitriy.Dyomin Use the same DDC key for source reflection data and encoded data. Fixes broken reflections on mobile #jira UE-35647 [CL 3116720 by Chris Babcock in Main branch]
2016-09-07 17:04:11 -04:00
ViewUniformShaderParameters.GlobalDistanceFieldTexture1_UB = OrBlack3DIfNull(GBlackVolumeTexture->TextureRHI.GetReference());
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3091903) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3072947 on 2016/08/01 by Uriel.Doyon Texture GUIDs are now included in cooked builds, as they are required by the texture streamer to link build data to in game texture. #review-3072934 @marcus.wassmer #jira UE-34045 Change 3073301 on 2016/08/02 by Ben.Woodhouse Fix for large spotlight culling precision issues, reported on UDN by Aaron Jacobs at Double Fine. For a full description, see the UDN post https://udn.unrealengine.com/questions/305440/shadowed-light-flicker-caused-by-floating-point-pr.html #jira UE-34052 Change 3073689 on 2016/08/02 by Ben.Woodhouse Improved skin postprocess - support for full resolution, with diffuse/spec lighting combined into single RGBA (sharing chroma) Full res lighting gives less temporal AA flickering, sharper diffuse and specular lighting in the surface (since this is now at full resolution), faster postprocessing if using a 64-bit rendertarget (on NV 980Ti). Checkerboard rendering is controlled via the r.sss.checkerboard cvar. - 0 is off/full res, 1 is checkerboard, 2 is automatic based on scenecolor (non-checkerboard requires 64bit or more rendertarget w/separate alpha) Tested/profiled on PC, PS4 Change 3074666 on 2016/08/02 by Daniel.Wright Fixed stationary skylight brightness Change 3074667 on 2016/08/02 by Daniel.Wright Fixed r.ReflectionEnvironmentLightmapMixing Change 3074687 on 2016/08/02 by Daniel.Wright Disallowed DrawMaterialToRenderTarget and Begin/EndDrawCanvasToRenderTarget in construction scripts, since they don't work in game. Blutilities can be used to do blueprint rendering in the editor. Change 3075241 on 2016/08/03 by Rolando.Caloca DR - Fix linux compile issue & static analysis warning Change 3075746 on 2016/08/03 by Daniel.Wright Removed bOverride_AntiAliasingMethod and outdated ini references to PP AntiAliasingMethod Change 3075783 on 2016/08/03 by Ryan.Brucks #code.review Marcus.Wassmer Added two material nodes that return Atmospheric Light Vector and Light Direction using: View.AtmosphericFogSunColor View.AtmosphericFogSunDirection Nodes are called: AtmosphericLightVector AtmosphericLightColor Also changed SceneRendering.cpp so that values will be grabbed from directional lights without needing an Atmospheric Fog actor in the scene. Change 3075969 on 2016/08/03 by Uriel.Doyon Material GUIDs are not updated anymore when parents or textures change. Lighting now uses a hash built from the list of parents, textures and shader functions. #review-3072980 @marcus.wassmer @daniel.wright Change 3076116 on 2016/08/03 by Ryan.Brucks #code.review marcus.wassmer Fixed typo in the Caption of new Nodes "Atmospheric Light Vector" and "Atmospheric Light Color" Change 3076456 on 2016/08/03 by Rolando.Caloca DR - Fix geometry shader gl_Layer for SPIR-V Change 3076730 on 2016/08/03 by Uriel.Doyon Added user warning logic for the texture streaming build. Ran in MapCheck, BeginPlay and PreSave. #review-3072984 @marcus.wassmer Change 3077616 on 2016/08/04 by Daniel.Wright Planar reflection show flags can now be edited Change 3077621 on 2016/08/04 by Daniel.Wright Changed default Planar Reflection DistanceFromPlaneFadeoutEnd from 600 to 100, which reduces artifacts and is a more intuitive initial setting Change 3077792 on 2016/08/04 by Daniel.Wright Fixed an unnecessary sky capture caused by the sky light component owned by the default ASkyLight Change 3077799 on 2016/08/04 by Daniel.Wright Skip RF_ArchetypeObject for reflection captures Change 3077876 on 2016/08/04 by Marc.Olano Noise material perf improvements Change random number generator for Gradient-ALU (1.7x perf boost), improve speed of Voronoi noise quality level 3. Removes integer BBS random number generators. Fewer instructions, but too slow to use (see 1.7x perf boost above) Change 3077884 on 2016/08/04 by Daniel.Wright Lighting channels can now be edited on components with static mobility, since dynamic lights can still affect them Change 3078994 on 2016/08/05 by Simon.Tovey Fix for UE-34241 Scene proxy ptr was being cached during a downcast. Inside a call to CreateDynamicData, CheckMaterialUsage_Concurrent() was causing the scene proxy to be recreated an so the cached ptr was stale. I've fixed the immediate issue but recreating the scene proxy here doesn't seem great. Maybe CheckMaterailUsage() should be rethought a bit. Change 3079162 on 2016/08/05 by Ben.Woodhouse Fix for jittering in Paper2D. Was caused by override being ignored due to a change in intiialization order for AA settings. #jira UE-34091 Change 3079613 on 2016/08/05 by Daniel.Wright New blueprint function ClearRenderTarget2D, which is the only way to set a render target alpha directly New blueprint function CreateRenderTarget2D Change 3079708 on 2016/08/05 by Uriel.Doyon Fixed crash when building texture streaming on some levels. Change 3079795 on 2016/08/05 by Uriel.Doyon Fixed issue with instanced static meshes when building texture streaming. Fixed typo with func "GetNumTextureStreamingPrimitives" Change 3079806 on 2016/08/05 by Uriel.Doyon Enabled PerTexture MipBias. The per texture mip bias now resets to 0 when the texture gets required at low resolution. New scalability setting named "r.Streaming.LimitPoolSizeToVRAM" enabling the PoolSize to be limited the available VRAM (according to GPoolSizeVRAMPercentage) #review-3074662 @marcus.wassmer Change 3082698 on 2016/08/09 by Daniel.Wright Copy - CreateRenderTarget2D uses a world context object as owner, allows use in a construction script Change 3082699 on 2016/08/09 by Daniel.Wright Changed display name for 'Two Sided' shading model to 'Two Sided Foliage' to make it clear what it's intended to be used for Change 3083909 on 2016/08/10 by Olaf.Piesche #jira UE-34106 #jira UE-32784 #jira UE-31198 Reset vertex factories on mesh emitters if mesh has been reimported (if mesh package is dirty) Change 3084645 on 2016/08/10 by Olaf.Piesche #jira UE-30398 Fix offset added to particle collision locations. Change 3084709 on 2016/08/10 by Daniel.Wright Copy - Scene capture alpha is now inverted to match DrawMaterialToRenderTarget, and to allow compositing with existing render target contents Added CompositeMode to SceneCapture2D, which can be used to addively accumulate or composite instead of the default overwrite behavior Added bCaptureOnMovement to SceneCapture, which can be disabled so the only source of scene capturing is a manual capture by calling CaptureScene() Change 3084783 on 2016/08/10 by Rolando.Caloca DR - Use the first targeted rhi shader platform as the initial RHI to load on Windows #jira UE-34510 Change 3084958 on 2016/08/10 by Daniel.Wright Copy - Reverted cl 2938543 "Lightmass now respects owner bHidden, and bCastHiddenShadow" because it did not have backwards compatibility so breaks content using hidden light cards Change 3086023 on 2016/08/11 by Marcus.Wassmer Merging //UE4/Dev-Main@3085468 to Dev-Rendering (//UE4/Dev-Rendering) #test none Change 3086778 on 2016/08/11 by Ben.Woodhouse Workaround for fortnite character rendering issue. Enable checkerboard rendering by default until we can fix properly #jira UE-34561 Change 3087404 on 2016/08/12 by Rolando.Caloca DR - Upgrade glslang to 1.0.21.1 - Added some more debug output Change 3087524 on 2016/08/12 by Rolando.Caloca DR - vk - Fixed StencilRef, fixed size of RHIReadSurfaceFloatData (but still returns dummy data) Change 3087663 on 2016/08/12 by Rolando.Caloca DR - vk - Fix for SRGB; support for mip texture views Change 3087735 on 2016/08/12 by Daniel.Wright TextureRenderTarget2D's can now be up to 8192^2. Anything over 2048 pops up an 'are you sure' dialog. Change 3087750 on 2016/08/12 by Rolando.Caloca DR - vk - Minor renaming in prep for merge Change 3087813 on 2016/08/12 by Rolando.Caloca DR - vk - More minor cleanup Change 3087819 on 2016/08/12 by Chris.Bunner Check material function input types directly, no need to traverse connected graph. #jira UE-32134 Change 3087901 on 2016/08/12 by Rolando.Caloca DR - vk - Fix RT view to use 1 mip Fix depth buffer component swizzle Change 3088193 on 2016/08/12 by Daniel.Wright DFAO and RTDF shadows are enabled in High and Epic scalability settings by default Change 3088988 on 2016/08/15 by Rolando.Caloca DR - Add Accessors Change 3089104 on 2016/08/15 by Olaf.Piesche #jira UE-34241 Sceneproxy can be nullptr in FDynamicMeshEmitterData::Init if the proxy is being recreated Change 3089208 on 2016/08/15 by Daniel.Wright Downsampled separate translucency uses a separate view uniform buffer with correct buffer sizes * Fixes WorldPosition in downsampled translucency * View uniform buffer parameters are now cached on the view, to allow recreating the uniform buffer without having to rebuild the entire struct. Currently used by global distance field, downsampled separate translucency. * Fixed the downsampled translucency depth buffer being full res used together with a smaller color target, now they are both the downsampled res Change 3089209 on 2016/08/15 by Daniel.Wright Fixed atmospheric fog on translucency Change 3089457 on 2016/08/15 by Daniel.Wright Fixed lighting build failure from UMaterialInstanceDynamic assigned to a mesh that's being exported to Lightmass. The Swarm cache entry is created using the parent's guid, causing multiple MID's with the same parent to acquire a file handle multiple times which fails after the first. Change 3089549 on 2016/08/15 by Daniel.Wright UMaterialInterface initializes LightingGuid to something valid - causes UMaterialInstanceDynamic to have a valid LightingGuid so they can be used in lighting builds Change 3089703 on 2016/08/15 by Daniel.Wright Custom expression fixup for View.RenderTargetSize Change 3090546 on 2016/08/16 by Daniel.Wright Hopeful fix for recycled snapshot view crash Change 3091202 on 2016/08/16 by Daniel.Wright Manually clear FViewInfo::CachedViewUniformShaderParameters on creating a snapshot, since memcpy is used to create the snapshot view [CL 3091931 by Gil Gribb in Main branch]
2016-08-17 11:38:13 -04:00
ViewUniformShaderParameters.GlobalDistanceFieldSampler1_UB = TStaticSamplerState<SF_Bilinear, AM_Wrap, AM_Wrap, AM_Wrap>::GetRHI();
Copying //UE4/Dev-Mobile to //UE4/Dev-Main (Source: //UE4/Dev-Mobile @ 3116515) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3065209 on 2016/07/26 by Steve.Cano Add Android Texture Format used for packaging/cooking to the Manifest File #ue4 #android #jira UE-33645 Change 3068915 on 2016/07/28 by Steve.Cano Add an additional Texture Compression support line in the manifest for DXT #ue4 #android #jira UE-33645 Change 3075911 on 2016/08/03 by Steve.Cano Make the "Running {ProjectName} on {Device}" toast stay up when launching a game to an Android device until we've finished running it, as it does on other platforms. This logic already existed but only ran if the "Prebuilt" flag was passed in, however we want this to always run now. Re-writing to run through waiting on each process to finish for each device launched on #jira UE-3122 #ue4 #android Change 3080981 on 2016/08/08 by Steve.Cano Clear any input before removing the TouchInterface from the screen to prevent infinite input after it is cleared #jira UE-33956 #ue4 #platform Change 3092587 on 2016/08/17 by Steve.Cano Adding "IsGamepadAttached" functionality to Android Application #jira UE-33264 #ue4 #android Change 3095840 on 2016/08/21 by Dmitriy.Dyomin Fixed: Particle Cutout Crashes On Certain Devices (Samsung Galaxy Note 2) Happens only with non-instanced path #jira UE-34604 Change 3095855 on 2016/08/22 by Dmitriy.Dyomin Allow UWorldComposition::GetTilesList to be used in runtime code Licensee request https://udn.unrealengine.com/questions/307586/world-compositions-world-dimensions.html Change 3096093 on 2016/08/22 by Allan.Bentham Allow Vulkan api logging on android Change 3096361 on 2016/08/22 by Steve.Cano Github 2663 pull - Pass any extras used to launch SplashActivity down to GameActivity (Contributed by sangpan) #jira UE-34050 #github #2663 #ue4 #android Change 3097412 on 2016/08/23 by Dmitriy.Dyomin Using BulkSerialize for cooked collision data to speed up serialization Change 3098957 on 2016/08/23 by Jack.Porter Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile) Change 3099058 on 2016/08/24 by Jack.Porter Check EXT_debug_label and EXT_debug_marker functions were found before calling as a few devices to not implement these extensions #UE-35087 Change 3099131 on 2016/08/24 by Dmitriy.Dyomin Fixed: HDR compressed texture become black in some mali devices Use sized internal format for half-float textures on ES3 devices, as ES3 spec expects it #jira UE-35018 Change 3099150 on 2016/08/24 by Dmitriy.Dyomin Enable HALF_FLOAT and UNSIGNED_INT_2_10_10_10_REV vertex formats on ES3+ devices, spec req Change 3102252 on 2016/08/26 by Dmitriy.Dyomin Prevent view uniform buffer crash on ES2 devices that do not support 3D textures Change 3102258 on 2016/08/26 by Dmitriy.Dyomin Enabled refraction on iPadMini4 #jira UE-35079 Change 3102651 on 2016/08/26 by Dmitriy.Dyomin Fixed: instanced static mesh world normals Also removed unnecessary instance matrix transposing #jira UE-35075 Change 3109397 on 2016/09/01 by Jack.Porter Fix problem with Android sessions not appearing in Session Frontend #jira UE-35261 Change 3109490 on 2016/09/01 by Jack.Porter Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile) Change 3111628 on 2016/09/02 by Jack.Porter Landscape wireframe LOD visualization with tessellation Change 3112809 on 2016/09/02 by Chris.Babcock Update cached length when file is written to on Android #jira UE-35558 #ue4 #android Change 3113245 on 2016/09/04 by Dmitriy.Dyomin Fixed: Subway Sequencer plays only a black screen when packaged for ESDSR (3.1+AEP) #jira UE-34291 Change 3113249 on 2016/09/04 by Dmitriy.Dyomin Replicated fix from 4.13: GPU particles no longer work on iOS or TVOS Metal devices #jira UE-34782 Change 3113513 on 2016/09/05 by Allan.Bentham Add vulkan version parameter to android device profile selector's source inputs . reinstate Vulkan Disable cvar functionality. Added mali no vulkan device profile. Change 3113519 on 2016/09/05 by Allan.Bentham Remove temp 4.13 hack to avoid public header changes. Change 3113535 on 2016/09/05 by Allan.Bentham Decode 32 bit HDR formats when using scene captures. #jira UE-25444 Change 3113813 on 2016/09/06 by Dmitriy.Dyomin Resend to server sub-levels visibility state right after world actors are initialized. During seamless travel client loads always-loaded sub-levels before world actors are initialized and ServerUpdateLevelVisibility calls in UWorld::AddToWorld are skipped. Change 3113870 on 2016/09/06 by Jack.Porter Fix issue with ES2 Feature Level preview and Mobile Preview PIE not limiting materials to 8 textures #jira UE-35591 Change 3115031 on 2016/09/06 by Chris.Babcock Add Vulkan version code not moved over from 4.13.1 #jira UE-35642 #ue4 #android Change 3115496 on 2016/09/07 by Dmitriy.Dyomin Use the same DDC key for source reflection data and encoded data. Fixes broken reflections on mobile #jira UE-35647 [CL 3116720 by Chris Babcock in Main branch]
2016-09-07 17:04:11 -04:00
ViewUniformShaderParameters.GlobalDistanceFieldTexture2_UB = OrBlack3DIfNull(GBlackVolumeTexture->TextureRHI.GetReference());
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3091903) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3072947 on 2016/08/01 by Uriel.Doyon Texture GUIDs are now included in cooked builds, as they are required by the texture streamer to link build data to in game texture. #review-3072934 @marcus.wassmer #jira UE-34045 Change 3073301 on 2016/08/02 by Ben.Woodhouse Fix for large spotlight culling precision issues, reported on UDN by Aaron Jacobs at Double Fine. For a full description, see the UDN post https://udn.unrealengine.com/questions/305440/shadowed-light-flicker-caused-by-floating-point-pr.html #jira UE-34052 Change 3073689 on 2016/08/02 by Ben.Woodhouse Improved skin postprocess - support for full resolution, with diffuse/spec lighting combined into single RGBA (sharing chroma) Full res lighting gives less temporal AA flickering, sharper diffuse and specular lighting in the surface (since this is now at full resolution), faster postprocessing if using a 64-bit rendertarget (on NV 980Ti). Checkerboard rendering is controlled via the r.sss.checkerboard cvar. - 0 is off/full res, 1 is checkerboard, 2 is automatic based on scenecolor (non-checkerboard requires 64bit or more rendertarget w/separate alpha) Tested/profiled on PC, PS4 Change 3074666 on 2016/08/02 by Daniel.Wright Fixed stationary skylight brightness Change 3074667 on 2016/08/02 by Daniel.Wright Fixed r.ReflectionEnvironmentLightmapMixing Change 3074687 on 2016/08/02 by Daniel.Wright Disallowed DrawMaterialToRenderTarget and Begin/EndDrawCanvasToRenderTarget in construction scripts, since they don't work in game. Blutilities can be used to do blueprint rendering in the editor. Change 3075241 on 2016/08/03 by Rolando.Caloca DR - Fix linux compile issue & static analysis warning Change 3075746 on 2016/08/03 by Daniel.Wright Removed bOverride_AntiAliasingMethod and outdated ini references to PP AntiAliasingMethod Change 3075783 on 2016/08/03 by Ryan.Brucks #code.review Marcus.Wassmer Added two material nodes that return Atmospheric Light Vector and Light Direction using: View.AtmosphericFogSunColor View.AtmosphericFogSunDirection Nodes are called: AtmosphericLightVector AtmosphericLightColor Also changed SceneRendering.cpp so that values will be grabbed from directional lights without needing an Atmospheric Fog actor in the scene. Change 3075969 on 2016/08/03 by Uriel.Doyon Material GUIDs are not updated anymore when parents or textures change. Lighting now uses a hash built from the list of parents, textures and shader functions. #review-3072980 @marcus.wassmer @daniel.wright Change 3076116 on 2016/08/03 by Ryan.Brucks #code.review marcus.wassmer Fixed typo in the Caption of new Nodes "Atmospheric Light Vector" and "Atmospheric Light Color" Change 3076456 on 2016/08/03 by Rolando.Caloca DR - Fix geometry shader gl_Layer for SPIR-V Change 3076730 on 2016/08/03 by Uriel.Doyon Added user warning logic for the texture streaming build. Ran in MapCheck, BeginPlay and PreSave. #review-3072984 @marcus.wassmer Change 3077616 on 2016/08/04 by Daniel.Wright Planar reflection show flags can now be edited Change 3077621 on 2016/08/04 by Daniel.Wright Changed default Planar Reflection DistanceFromPlaneFadeoutEnd from 600 to 100, which reduces artifacts and is a more intuitive initial setting Change 3077792 on 2016/08/04 by Daniel.Wright Fixed an unnecessary sky capture caused by the sky light component owned by the default ASkyLight Change 3077799 on 2016/08/04 by Daniel.Wright Skip RF_ArchetypeObject for reflection captures Change 3077876 on 2016/08/04 by Marc.Olano Noise material perf improvements Change random number generator for Gradient-ALU (1.7x perf boost), improve speed of Voronoi noise quality level 3. Removes integer BBS random number generators. Fewer instructions, but too slow to use (see 1.7x perf boost above) Change 3077884 on 2016/08/04 by Daniel.Wright Lighting channels can now be edited on components with static mobility, since dynamic lights can still affect them Change 3078994 on 2016/08/05 by Simon.Tovey Fix for UE-34241 Scene proxy ptr was being cached during a downcast. Inside a call to CreateDynamicData, CheckMaterialUsage_Concurrent() was causing the scene proxy to be recreated an so the cached ptr was stale. I've fixed the immediate issue but recreating the scene proxy here doesn't seem great. Maybe CheckMaterailUsage() should be rethought a bit. Change 3079162 on 2016/08/05 by Ben.Woodhouse Fix for jittering in Paper2D. Was caused by override being ignored due to a change in intiialization order for AA settings. #jira UE-34091 Change 3079613 on 2016/08/05 by Daniel.Wright New blueprint function ClearRenderTarget2D, which is the only way to set a render target alpha directly New blueprint function CreateRenderTarget2D Change 3079708 on 2016/08/05 by Uriel.Doyon Fixed crash when building texture streaming on some levels. Change 3079795 on 2016/08/05 by Uriel.Doyon Fixed issue with instanced static meshes when building texture streaming. Fixed typo with func "GetNumTextureStreamingPrimitives" Change 3079806 on 2016/08/05 by Uriel.Doyon Enabled PerTexture MipBias. The per texture mip bias now resets to 0 when the texture gets required at low resolution. New scalability setting named "r.Streaming.LimitPoolSizeToVRAM" enabling the PoolSize to be limited the available VRAM (according to GPoolSizeVRAMPercentage) #review-3074662 @marcus.wassmer Change 3082698 on 2016/08/09 by Daniel.Wright Copy - CreateRenderTarget2D uses a world context object as owner, allows use in a construction script Change 3082699 on 2016/08/09 by Daniel.Wright Changed display name for 'Two Sided' shading model to 'Two Sided Foliage' to make it clear what it's intended to be used for Change 3083909 on 2016/08/10 by Olaf.Piesche #jira UE-34106 #jira UE-32784 #jira UE-31198 Reset vertex factories on mesh emitters if mesh has been reimported (if mesh package is dirty) Change 3084645 on 2016/08/10 by Olaf.Piesche #jira UE-30398 Fix offset added to particle collision locations. Change 3084709 on 2016/08/10 by Daniel.Wright Copy - Scene capture alpha is now inverted to match DrawMaterialToRenderTarget, and to allow compositing with existing render target contents Added CompositeMode to SceneCapture2D, which can be used to addively accumulate or composite instead of the default overwrite behavior Added bCaptureOnMovement to SceneCapture, which can be disabled so the only source of scene capturing is a manual capture by calling CaptureScene() Change 3084783 on 2016/08/10 by Rolando.Caloca DR - Use the first targeted rhi shader platform as the initial RHI to load on Windows #jira UE-34510 Change 3084958 on 2016/08/10 by Daniel.Wright Copy - Reverted cl 2938543 "Lightmass now respects owner bHidden, and bCastHiddenShadow" because it did not have backwards compatibility so breaks content using hidden light cards Change 3086023 on 2016/08/11 by Marcus.Wassmer Merging //UE4/Dev-Main@3085468 to Dev-Rendering (//UE4/Dev-Rendering) #test none Change 3086778 on 2016/08/11 by Ben.Woodhouse Workaround for fortnite character rendering issue. Enable checkerboard rendering by default until we can fix properly #jira UE-34561 Change 3087404 on 2016/08/12 by Rolando.Caloca DR - Upgrade glslang to 1.0.21.1 - Added some more debug output Change 3087524 on 2016/08/12 by Rolando.Caloca DR - vk - Fixed StencilRef, fixed size of RHIReadSurfaceFloatData (but still returns dummy data) Change 3087663 on 2016/08/12 by Rolando.Caloca DR - vk - Fix for SRGB; support for mip texture views Change 3087735 on 2016/08/12 by Daniel.Wright TextureRenderTarget2D's can now be up to 8192^2. Anything over 2048 pops up an 'are you sure' dialog. Change 3087750 on 2016/08/12 by Rolando.Caloca DR - vk - Minor renaming in prep for merge Change 3087813 on 2016/08/12 by Rolando.Caloca DR - vk - More minor cleanup Change 3087819 on 2016/08/12 by Chris.Bunner Check material function input types directly, no need to traverse connected graph. #jira UE-32134 Change 3087901 on 2016/08/12 by Rolando.Caloca DR - vk - Fix RT view to use 1 mip Fix depth buffer component swizzle Change 3088193 on 2016/08/12 by Daniel.Wright DFAO and RTDF shadows are enabled in High and Epic scalability settings by default Change 3088988 on 2016/08/15 by Rolando.Caloca DR - Add Accessors Change 3089104 on 2016/08/15 by Olaf.Piesche #jira UE-34241 Sceneproxy can be nullptr in FDynamicMeshEmitterData::Init if the proxy is being recreated Change 3089208 on 2016/08/15 by Daniel.Wright Downsampled separate translucency uses a separate view uniform buffer with correct buffer sizes * Fixes WorldPosition in downsampled translucency * View uniform buffer parameters are now cached on the view, to allow recreating the uniform buffer without having to rebuild the entire struct. Currently used by global distance field, downsampled separate translucency. * Fixed the downsampled translucency depth buffer being full res used together with a smaller color target, now they are both the downsampled res Change 3089209 on 2016/08/15 by Daniel.Wright Fixed atmospheric fog on translucency Change 3089457 on 2016/08/15 by Daniel.Wright Fixed lighting build failure from UMaterialInstanceDynamic assigned to a mesh that's being exported to Lightmass. The Swarm cache entry is created using the parent's guid, causing multiple MID's with the same parent to acquire a file handle multiple times which fails after the first. Change 3089549 on 2016/08/15 by Daniel.Wright UMaterialInterface initializes LightingGuid to something valid - causes UMaterialInstanceDynamic to have a valid LightingGuid so they can be used in lighting builds Change 3089703 on 2016/08/15 by Daniel.Wright Custom expression fixup for View.RenderTargetSize Change 3090546 on 2016/08/16 by Daniel.Wright Hopeful fix for recycled snapshot view crash Change 3091202 on 2016/08/16 by Daniel.Wright Manually clear FViewInfo::CachedViewUniformShaderParameters on creating a snapshot, since memcpy is used to create the snapshot view [CL 3091931 by Gil Gribb in Main branch]
2016-08-17 11:38:13 -04:00
ViewUniformShaderParameters.GlobalDistanceFieldSampler2_UB = TStaticSamplerState<SF_Bilinear, AM_Wrap, AM_Wrap, AM_Wrap>::GetRHI();
Copying //UE4/Dev-Mobile to //UE4/Dev-Main (Source: //UE4/Dev-Mobile @ 3116515) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3065209 on 2016/07/26 by Steve.Cano Add Android Texture Format used for packaging/cooking to the Manifest File #ue4 #android #jira UE-33645 Change 3068915 on 2016/07/28 by Steve.Cano Add an additional Texture Compression support line in the manifest for DXT #ue4 #android #jira UE-33645 Change 3075911 on 2016/08/03 by Steve.Cano Make the "Running {ProjectName} on {Device}" toast stay up when launching a game to an Android device until we've finished running it, as it does on other platforms. This logic already existed but only ran if the "Prebuilt" flag was passed in, however we want this to always run now. Re-writing to run through waiting on each process to finish for each device launched on #jira UE-3122 #ue4 #android Change 3080981 on 2016/08/08 by Steve.Cano Clear any input before removing the TouchInterface from the screen to prevent infinite input after it is cleared #jira UE-33956 #ue4 #platform Change 3092587 on 2016/08/17 by Steve.Cano Adding "IsGamepadAttached" functionality to Android Application #jira UE-33264 #ue4 #android Change 3095840 on 2016/08/21 by Dmitriy.Dyomin Fixed: Particle Cutout Crashes On Certain Devices (Samsung Galaxy Note 2) Happens only with non-instanced path #jira UE-34604 Change 3095855 on 2016/08/22 by Dmitriy.Dyomin Allow UWorldComposition::GetTilesList to be used in runtime code Licensee request https://udn.unrealengine.com/questions/307586/world-compositions-world-dimensions.html Change 3096093 on 2016/08/22 by Allan.Bentham Allow Vulkan api logging on android Change 3096361 on 2016/08/22 by Steve.Cano Github 2663 pull - Pass any extras used to launch SplashActivity down to GameActivity (Contributed by sangpan) #jira UE-34050 #github #2663 #ue4 #android Change 3097412 on 2016/08/23 by Dmitriy.Dyomin Using BulkSerialize for cooked collision data to speed up serialization Change 3098957 on 2016/08/23 by Jack.Porter Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile) Change 3099058 on 2016/08/24 by Jack.Porter Check EXT_debug_label and EXT_debug_marker functions were found before calling as a few devices to not implement these extensions #UE-35087 Change 3099131 on 2016/08/24 by Dmitriy.Dyomin Fixed: HDR compressed texture become black in some mali devices Use sized internal format for half-float textures on ES3 devices, as ES3 spec expects it #jira UE-35018 Change 3099150 on 2016/08/24 by Dmitriy.Dyomin Enable HALF_FLOAT and UNSIGNED_INT_2_10_10_10_REV vertex formats on ES3+ devices, spec req Change 3102252 on 2016/08/26 by Dmitriy.Dyomin Prevent view uniform buffer crash on ES2 devices that do not support 3D textures Change 3102258 on 2016/08/26 by Dmitriy.Dyomin Enabled refraction on iPadMini4 #jira UE-35079 Change 3102651 on 2016/08/26 by Dmitriy.Dyomin Fixed: instanced static mesh world normals Also removed unnecessary instance matrix transposing #jira UE-35075 Change 3109397 on 2016/09/01 by Jack.Porter Fix problem with Android sessions not appearing in Session Frontend #jira UE-35261 Change 3109490 on 2016/09/01 by Jack.Porter Merging //UE4/Dev-Main to Dev-Mobile (//UE4/Dev-Mobile) Change 3111628 on 2016/09/02 by Jack.Porter Landscape wireframe LOD visualization with tessellation Change 3112809 on 2016/09/02 by Chris.Babcock Update cached length when file is written to on Android #jira UE-35558 #ue4 #android Change 3113245 on 2016/09/04 by Dmitriy.Dyomin Fixed: Subway Sequencer plays only a black screen when packaged for ESDSR (3.1+AEP) #jira UE-34291 Change 3113249 on 2016/09/04 by Dmitriy.Dyomin Replicated fix from 4.13: GPU particles no longer work on iOS or TVOS Metal devices #jira UE-34782 Change 3113513 on 2016/09/05 by Allan.Bentham Add vulkan version parameter to android device profile selector's source inputs . reinstate Vulkan Disable cvar functionality. Added mali no vulkan device profile. Change 3113519 on 2016/09/05 by Allan.Bentham Remove temp 4.13 hack to avoid public header changes. Change 3113535 on 2016/09/05 by Allan.Bentham Decode 32 bit HDR formats when using scene captures. #jira UE-25444 Change 3113813 on 2016/09/06 by Dmitriy.Dyomin Resend to server sub-levels visibility state right after world actors are initialized. During seamless travel client loads always-loaded sub-levels before world actors are initialized and ServerUpdateLevelVisibility calls in UWorld::AddToWorld are skipped. Change 3113870 on 2016/09/06 by Jack.Porter Fix issue with ES2 Feature Level preview and Mobile Preview PIE not limiting materials to 8 textures #jira UE-35591 Change 3115031 on 2016/09/06 by Chris.Babcock Add Vulkan version code not moved over from 4.13.1 #jira UE-35642 #ue4 #android Change 3115496 on 2016/09/07 by Dmitriy.Dyomin Use the same DDC key for source reflection data and encoded data. Fixes broken reflections on mobile #jira UE-35647 [CL 3116720 by Chris Babcock in Main branch]
2016-09-07 17:04:11 -04:00
ViewUniformShaderParameters.GlobalDistanceFieldTexture3_UB = OrBlack3DIfNull(GBlackVolumeTexture->TextureRHI.GetReference());
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3091903) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3072947 on 2016/08/01 by Uriel.Doyon Texture GUIDs are now included in cooked builds, as they are required by the texture streamer to link build data to in game texture. #review-3072934 @marcus.wassmer #jira UE-34045 Change 3073301 on 2016/08/02 by Ben.Woodhouse Fix for large spotlight culling precision issues, reported on UDN by Aaron Jacobs at Double Fine. For a full description, see the UDN post https://udn.unrealengine.com/questions/305440/shadowed-light-flicker-caused-by-floating-point-pr.html #jira UE-34052 Change 3073689 on 2016/08/02 by Ben.Woodhouse Improved skin postprocess - support for full resolution, with diffuse/spec lighting combined into single RGBA (sharing chroma) Full res lighting gives less temporal AA flickering, sharper diffuse and specular lighting in the surface (since this is now at full resolution), faster postprocessing if using a 64-bit rendertarget (on NV 980Ti). Checkerboard rendering is controlled via the r.sss.checkerboard cvar. - 0 is off/full res, 1 is checkerboard, 2 is automatic based on scenecolor (non-checkerboard requires 64bit or more rendertarget w/separate alpha) Tested/profiled on PC, PS4 Change 3074666 on 2016/08/02 by Daniel.Wright Fixed stationary skylight brightness Change 3074667 on 2016/08/02 by Daniel.Wright Fixed r.ReflectionEnvironmentLightmapMixing Change 3074687 on 2016/08/02 by Daniel.Wright Disallowed DrawMaterialToRenderTarget and Begin/EndDrawCanvasToRenderTarget in construction scripts, since they don't work in game. Blutilities can be used to do blueprint rendering in the editor. Change 3075241 on 2016/08/03 by Rolando.Caloca DR - Fix linux compile issue & static analysis warning Change 3075746 on 2016/08/03 by Daniel.Wright Removed bOverride_AntiAliasingMethod and outdated ini references to PP AntiAliasingMethod Change 3075783 on 2016/08/03 by Ryan.Brucks #code.review Marcus.Wassmer Added two material nodes that return Atmospheric Light Vector and Light Direction using: View.AtmosphericFogSunColor View.AtmosphericFogSunDirection Nodes are called: AtmosphericLightVector AtmosphericLightColor Also changed SceneRendering.cpp so that values will be grabbed from directional lights without needing an Atmospheric Fog actor in the scene. Change 3075969 on 2016/08/03 by Uriel.Doyon Material GUIDs are not updated anymore when parents or textures change. Lighting now uses a hash built from the list of parents, textures and shader functions. #review-3072980 @marcus.wassmer @daniel.wright Change 3076116 on 2016/08/03 by Ryan.Brucks #code.review marcus.wassmer Fixed typo in the Caption of new Nodes "Atmospheric Light Vector" and "Atmospheric Light Color" Change 3076456 on 2016/08/03 by Rolando.Caloca DR - Fix geometry shader gl_Layer for SPIR-V Change 3076730 on 2016/08/03 by Uriel.Doyon Added user warning logic for the texture streaming build. Ran in MapCheck, BeginPlay and PreSave. #review-3072984 @marcus.wassmer Change 3077616 on 2016/08/04 by Daniel.Wright Planar reflection show flags can now be edited Change 3077621 on 2016/08/04 by Daniel.Wright Changed default Planar Reflection DistanceFromPlaneFadeoutEnd from 600 to 100, which reduces artifacts and is a more intuitive initial setting Change 3077792 on 2016/08/04 by Daniel.Wright Fixed an unnecessary sky capture caused by the sky light component owned by the default ASkyLight Change 3077799 on 2016/08/04 by Daniel.Wright Skip RF_ArchetypeObject for reflection captures Change 3077876 on 2016/08/04 by Marc.Olano Noise material perf improvements Change random number generator for Gradient-ALU (1.7x perf boost), improve speed of Voronoi noise quality level 3. Removes integer BBS random number generators. Fewer instructions, but too slow to use (see 1.7x perf boost above) Change 3077884 on 2016/08/04 by Daniel.Wright Lighting channels can now be edited on components with static mobility, since dynamic lights can still affect them Change 3078994 on 2016/08/05 by Simon.Tovey Fix for UE-34241 Scene proxy ptr was being cached during a downcast. Inside a call to CreateDynamicData, CheckMaterialUsage_Concurrent() was causing the scene proxy to be recreated an so the cached ptr was stale. I've fixed the immediate issue but recreating the scene proxy here doesn't seem great. Maybe CheckMaterailUsage() should be rethought a bit. Change 3079162 on 2016/08/05 by Ben.Woodhouse Fix for jittering in Paper2D. Was caused by override being ignored due to a change in intiialization order for AA settings. #jira UE-34091 Change 3079613 on 2016/08/05 by Daniel.Wright New blueprint function ClearRenderTarget2D, which is the only way to set a render target alpha directly New blueprint function CreateRenderTarget2D Change 3079708 on 2016/08/05 by Uriel.Doyon Fixed crash when building texture streaming on some levels. Change 3079795 on 2016/08/05 by Uriel.Doyon Fixed issue with instanced static meshes when building texture streaming. Fixed typo with func "GetNumTextureStreamingPrimitives" Change 3079806 on 2016/08/05 by Uriel.Doyon Enabled PerTexture MipBias. The per texture mip bias now resets to 0 when the texture gets required at low resolution. New scalability setting named "r.Streaming.LimitPoolSizeToVRAM" enabling the PoolSize to be limited the available VRAM (according to GPoolSizeVRAMPercentage) #review-3074662 @marcus.wassmer Change 3082698 on 2016/08/09 by Daniel.Wright Copy - CreateRenderTarget2D uses a world context object as owner, allows use in a construction script Change 3082699 on 2016/08/09 by Daniel.Wright Changed display name for 'Two Sided' shading model to 'Two Sided Foliage' to make it clear what it's intended to be used for Change 3083909 on 2016/08/10 by Olaf.Piesche #jira UE-34106 #jira UE-32784 #jira UE-31198 Reset vertex factories on mesh emitters if mesh has been reimported (if mesh package is dirty) Change 3084645 on 2016/08/10 by Olaf.Piesche #jira UE-30398 Fix offset added to particle collision locations. Change 3084709 on 2016/08/10 by Daniel.Wright Copy - Scene capture alpha is now inverted to match DrawMaterialToRenderTarget, and to allow compositing with existing render target contents Added CompositeMode to SceneCapture2D, which can be used to addively accumulate or composite instead of the default overwrite behavior Added bCaptureOnMovement to SceneCapture, which can be disabled so the only source of scene capturing is a manual capture by calling CaptureScene() Change 3084783 on 2016/08/10 by Rolando.Caloca DR - Use the first targeted rhi shader platform as the initial RHI to load on Windows #jira UE-34510 Change 3084958 on 2016/08/10 by Daniel.Wright Copy - Reverted cl 2938543 "Lightmass now respects owner bHidden, and bCastHiddenShadow" because it did not have backwards compatibility so breaks content using hidden light cards Change 3086023 on 2016/08/11 by Marcus.Wassmer Merging //UE4/Dev-Main@3085468 to Dev-Rendering (//UE4/Dev-Rendering) #test none Change 3086778 on 2016/08/11 by Ben.Woodhouse Workaround for fortnite character rendering issue. Enable checkerboard rendering by default until we can fix properly #jira UE-34561 Change 3087404 on 2016/08/12 by Rolando.Caloca DR - Upgrade glslang to 1.0.21.1 - Added some more debug output Change 3087524 on 2016/08/12 by Rolando.Caloca DR - vk - Fixed StencilRef, fixed size of RHIReadSurfaceFloatData (but still returns dummy data) Change 3087663 on 2016/08/12 by Rolando.Caloca DR - vk - Fix for SRGB; support for mip texture views Change 3087735 on 2016/08/12 by Daniel.Wright TextureRenderTarget2D's can now be up to 8192^2. Anything over 2048 pops up an 'are you sure' dialog. Change 3087750 on 2016/08/12 by Rolando.Caloca DR - vk - Minor renaming in prep for merge Change 3087813 on 2016/08/12 by Rolando.Caloca DR - vk - More minor cleanup Change 3087819 on 2016/08/12 by Chris.Bunner Check material function input types directly, no need to traverse connected graph. #jira UE-32134 Change 3087901 on 2016/08/12 by Rolando.Caloca DR - vk - Fix RT view to use 1 mip Fix depth buffer component swizzle Change 3088193 on 2016/08/12 by Daniel.Wright DFAO and RTDF shadows are enabled in High and Epic scalability settings by default Change 3088988 on 2016/08/15 by Rolando.Caloca DR - Add Accessors Change 3089104 on 2016/08/15 by Olaf.Piesche #jira UE-34241 Sceneproxy can be nullptr in FDynamicMeshEmitterData::Init if the proxy is being recreated Change 3089208 on 2016/08/15 by Daniel.Wright Downsampled separate translucency uses a separate view uniform buffer with correct buffer sizes * Fixes WorldPosition in downsampled translucency * View uniform buffer parameters are now cached on the view, to allow recreating the uniform buffer without having to rebuild the entire struct. Currently used by global distance field, downsampled separate translucency. * Fixed the downsampled translucency depth buffer being full res used together with a smaller color target, now they are both the downsampled res Change 3089209 on 2016/08/15 by Daniel.Wright Fixed atmospheric fog on translucency Change 3089457 on 2016/08/15 by Daniel.Wright Fixed lighting build failure from UMaterialInstanceDynamic assigned to a mesh that's being exported to Lightmass. The Swarm cache entry is created using the parent's guid, causing multiple MID's with the same parent to acquire a file handle multiple times which fails after the first. Change 3089549 on 2016/08/15 by Daniel.Wright UMaterialInterface initializes LightingGuid to something valid - causes UMaterialInstanceDynamic to have a valid LightingGuid so they can be used in lighting builds Change 3089703 on 2016/08/15 by Daniel.Wright Custom expression fixup for View.RenderTargetSize Change 3090546 on 2016/08/16 by Daniel.Wright Hopeful fix for recycled snapshot view crash Change 3091202 on 2016/08/16 by Daniel.Wright Manually clear FViewInfo::CachedViewUniformShaderParameters on creating a snapshot, since memcpy is used to create the snapshot view [CL 3091931 by Gil Gribb in Main branch]
2016-08-17 11:38:13 -04:00
ViewUniformShaderParameters.GlobalDistanceFieldSampler3_UB = TStaticSamplerState<SF_Bilinear, AM_Wrap, AM_Wrap, AM_Wrap>::GetRHI();
}
void FViewInfo::SetupGlobalDistanceFieldUniformBufferParameters(FViewUniformShaderParameters& ViewUniformShaderParameters) const
{
check(GlobalDistanceFieldInfo.bInitialized);
for (int32 Index = 0; Index < GMaxGlobalDistanceFieldClipmaps; Index++)
{
ViewUniformShaderParameters.GlobalVolumeCenterAndExtent_UB[Index] = GlobalDistanceFieldInfo.ParameterData.CenterAndExtent[Index];
ViewUniformShaderParameters.GlobalVolumeWorldToUVAddAndMul_UB[Index] = GlobalDistanceFieldInfo.ParameterData.WorldToUVAddAndMul[Index];
}
ViewUniformShaderParameters.GlobalVolumeDimension_UB = GlobalDistanceFieldInfo.ParameterData.GlobalDFResolution;
ViewUniformShaderParameters.GlobalVolumeTexelSize_UB = 1.0f / GlobalDistanceFieldInfo.ParameterData.GlobalDFResolution;
ViewUniformShaderParameters.MaxGlobalDistance_UB = GlobalDistanceFieldInfo.ParameterData.MaxDistance;
ViewUniformShaderParameters.GlobalDistanceFieldTexture0_UB = OrBlack3DIfNull(GlobalDistanceFieldInfo.ParameterData.Textures[0]);
ViewUniformShaderParameters.GlobalDistanceFieldSampler0_UB = TStaticSamplerState<SF_Bilinear, AM_Wrap, AM_Wrap, AM_Wrap>::GetRHI();
ViewUniformShaderParameters.GlobalDistanceFieldTexture1_UB = OrBlack3DIfNull(GlobalDistanceFieldInfo.ParameterData.Textures[1]);
ViewUniformShaderParameters.GlobalDistanceFieldSampler1_UB = TStaticSamplerState<SF_Bilinear, AM_Wrap, AM_Wrap, AM_Wrap>::GetRHI();
ViewUniformShaderParameters.GlobalDistanceFieldTexture2_UB = OrBlack3DIfNull(GlobalDistanceFieldInfo.ParameterData.Textures[2]);
ViewUniformShaderParameters.GlobalDistanceFieldSampler2_UB = TStaticSamplerState<SF_Bilinear, AM_Wrap, AM_Wrap, AM_Wrap>::GetRHI();
ViewUniformShaderParameters.GlobalDistanceFieldTexture3_UB = OrBlack3DIfNull(GlobalDistanceFieldInfo.ParameterData.Textures[3]);
ViewUniformShaderParameters.GlobalDistanceFieldSampler3_UB = TStaticSamplerState<SF_Bilinear, AM_Wrap, AM_Wrap, AM_Wrap>::GetRHI();
}
/**
* Updates the global distance field for a view.
* Typically issues updates for just the newly exposed regions of the volume due to camera movement.
* In the worst case of a camera cut or large distance field scene changes, a full update of the global distance field will be done.
**/
void UpdateGlobalDistanceFieldVolume(
FRHICommandListImmediate& RHICmdList,
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
FViewInfo& View,
const FScene* Scene,
float MaxOcclusionDistance,
FGlobalDistanceFieldInfo& GlobalDistanceFieldInfo)
{
if (Scene->DistanceFieldSceneData.NumObjectsInBuffer > 0)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
ComputeUpdateRegionsAndUpdateViewState(RHICmdList, View, Scene, GlobalDistanceFieldInfo, GMaxGlobalDistanceFieldClipmaps, MaxOcclusionDistance);
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
// Recreate the view uniform buffer now that we have updated GlobalDistanceFieldInfo
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3091903) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3072947 on 2016/08/01 by Uriel.Doyon Texture GUIDs are now included in cooked builds, as they are required by the texture streamer to link build data to in game texture. #review-3072934 @marcus.wassmer #jira UE-34045 Change 3073301 on 2016/08/02 by Ben.Woodhouse Fix for large spotlight culling precision issues, reported on UDN by Aaron Jacobs at Double Fine. For a full description, see the UDN post https://udn.unrealengine.com/questions/305440/shadowed-light-flicker-caused-by-floating-point-pr.html #jira UE-34052 Change 3073689 on 2016/08/02 by Ben.Woodhouse Improved skin postprocess - support for full resolution, with diffuse/spec lighting combined into single RGBA (sharing chroma) Full res lighting gives less temporal AA flickering, sharper diffuse and specular lighting in the surface (since this is now at full resolution), faster postprocessing if using a 64-bit rendertarget (on NV 980Ti). Checkerboard rendering is controlled via the r.sss.checkerboard cvar. - 0 is off/full res, 1 is checkerboard, 2 is automatic based on scenecolor (non-checkerboard requires 64bit or more rendertarget w/separate alpha) Tested/profiled on PC, PS4 Change 3074666 on 2016/08/02 by Daniel.Wright Fixed stationary skylight brightness Change 3074667 on 2016/08/02 by Daniel.Wright Fixed r.ReflectionEnvironmentLightmapMixing Change 3074687 on 2016/08/02 by Daniel.Wright Disallowed DrawMaterialToRenderTarget and Begin/EndDrawCanvasToRenderTarget in construction scripts, since they don't work in game. Blutilities can be used to do blueprint rendering in the editor. Change 3075241 on 2016/08/03 by Rolando.Caloca DR - Fix linux compile issue & static analysis warning Change 3075746 on 2016/08/03 by Daniel.Wright Removed bOverride_AntiAliasingMethod and outdated ini references to PP AntiAliasingMethod Change 3075783 on 2016/08/03 by Ryan.Brucks #code.review Marcus.Wassmer Added two material nodes that return Atmospheric Light Vector and Light Direction using: View.AtmosphericFogSunColor View.AtmosphericFogSunDirection Nodes are called: AtmosphericLightVector AtmosphericLightColor Also changed SceneRendering.cpp so that values will be grabbed from directional lights without needing an Atmospheric Fog actor in the scene. Change 3075969 on 2016/08/03 by Uriel.Doyon Material GUIDs are not updated anymore when parents or textures change. Lighting now uses a hash built from the list of parents, textures and shader functions. #review-3072980 @marcus.wassmer @daniel.wright Change 3076116 on 2016/08/03 by Ryan.Brucks #code.review marcus.wassmer Fixed typo in the Caption of new Nodes "Atmospheric Light Vector" and "Atmospheric Light Color" Change 3076456 on 2016/08/03 by Rolando.Caloca DR - Fix geometry shader gl_Layer for SPIR-V Change 3076730 on 2016/08/03 by Uriel.Doyon Added user warning logic for the texture streaming build. Ran in MapCheck, BeginPlay and PreSave. #review-3072984 @marcus.wassmer Change 3077616 on 2016/08/04 by Daniel.Wright Planar reflection show flags can now be edited Change 3077621 on 2016/08/04 by Daniel.Wright Changed default Planar Reflection DistanceFromPlaneFadeoutEnd from 600 to 100, which reduces artifacts and is a more intuitive initial setting Change 3077792 on 2016/08/04 by Daniel.Wright Fixed an unnecessary sky capture caused by the sky light component owned by the default ASkyLight Change 3077799 on 2016/08/04 by Daniel.Wright Skip RF_ArchetypeObject for reflection captures Change 3077876 on 2016/08/04 by Marc.Olano Noise material perf improvements Change random number generator for Gradient-ALU (1.7x perf boost), improve speed of Voronoi noise quality level 3. Removes integer BBS random number generators. Fewer instructions, but too slow to use (see 1.7x perf boost above) Change 3077884 on 2016/08/04 by Daniel.Wright Lighting channels can now be edited on components with static mobility, since dynamic lights can still affect them Change 3078994 on 2016/08/05 by Simon.Tovey Fix for UE-34241 Scene proxy ptr was being cached during a downcast. Inside a call to CreateDynamicData, CheckMaterialUsage_Concurrent() was causing the scene proxy to be recreated an so the cached ptr was stale. I've fixed the immediate issue but recreating the scene proxy here doesn't seem great. Maybe CheckMaterailUsage() should be rethought a bit. Change 3079162 on 2016/08/05 by Ben.Woodhouse Fix for jittering in Paper2D. Was caused by override being ignored due to a change in intiialization order for AA settings. #jira UE-34091 Change 3079613 on 2016/08/05 by Daniel.Wright New blueprint function ClearRenderTarget2D, which is the only way to set a render target alpha directly New blueprint function CreateRenderTarget2D Change 3079708 on 2016/08/05 by Uriel.Doyon Fixed crash when building texture streaming on some levels. Change 3079795 on 2016/08/05 by Uriel.Doyon Fixed issue with instanced static meshes when building texture streaming. Fixed typo with func "GetNumTextureStreamingPrimitives" Change 3079806 on 2016/08/05 by Uriel.Doyon Enabled PerTexture MipBias. The per texture mip bias now resets to 0 when the texture gets required at low resolution. New scalability setting named "r.Streaming.LimitPoolSizeToVRAM" enabling the PoolSize to be limited the available VRAM (according to GPoolSizeVRAMPercentage) #review-3074662 @marcus.wassmer Change 3082698 on 2016/08/09 by Daniel.Wright Copy - CreateRenderTarget2D uses a world context object as owner, allows use in a construction script Change 3082699 on 2016/08/09 by Daniel.Wright Changed display name for 'Two Sided' shading model to 'Two Sided Foliage' to make it clear what it's intended to be used for Change 3083909 on 2016/08/10 by Olaf.Piesche #jira UE-34106 #jira UE-32784 #jira UE-31198 Reset vertex factories on mesh emitters if mesh has been reimported (if mesh package is dirty) Change 3084645 on 2016/08/10 by Olaf.Piesche #jira UE-30398 Fix offset added to particle collision locations. Change 3084709 on 2016/08/10 by Daniel.Wright Copy - Scene capture alpha is now inverted to match DrawMaterialToRenderTarget, and to allow compositing with existing render target contents Added CompositeMode to SceneCapture2D, which can be used to addively accumulate or composite instead of the default overwrite behavior Added bCaptureOnMovement to SceneCapture, which can be disabled so the only source of scene capturing is a manual capture by calling CaptureScene() Change 3084783 on 2016/08/10 by Rolando.Caloca DR - Use the first targeted rhi shader platform as the initial RHI to load on Windows #jira UE-34510 Change 3084958 on 2016/08/10 by Daniel.Wright Copy - Reverted cl 2938543 "Lightmass now respects owner bHidden, and bCastHiddenShadow" because it did not have backwards compatibility so breaks content using hidden light cards Change 3086023 on 2016/08/11 by Marcus.Wassmer Merging //UE4/Dev-Main@3085468 to Dev-Rendering (//UE4/Dev-Rendering) #test none Change 3086778 on 2016/08/11 by Ben.Woodhouse Workaround for fortnite character rendering issue. Enable checkerboard rendering by default until we can fix properly #jira UE-34561 Change 3087404 on 2016/08/12 by Rolando.Caloca DR - Upgrade glslang to 1.0.21.1 - Added some more debug output Change 3087524 on 2016/08/12 by Rolando.Caloca DR - vk - Fixed StencilRef, fixed size of RHIReadSurfaceFloatData (but still returns dummy data) Change 3087663 on 2016/08/12 by Rolando.Caloca DR - vk - Fix for SRGB; support for mip texture views Change 3087735 on 2016/08/12 by Daniel.Wright TextureRenderTarget2D's can now be up to 8192^2. Anything over 2048 pops up an 'are you sure' dialog. Change 3087750 on 2016/08/12 by Rolando.Caloca DR - vk - Minor renaming in prep for merge Change 3087813 on 2016/08/12 by Rolando.Caloca DR - vk - More minor cleanup Change 3087819 on 2016/08/12 by Chris.Bunner Check material function input types directly, no need to traverse connected graph. #jira UE-32134 Change 3087901 on 2016/08/12 by Rolando.Caloca DR - vk - Fix RT view to use 1 mip Fix depth buffer component swizzle Change 3088193 on 2016/08/12 by Daniel.Wright DFAO and RTDF shadows are enabled in High and Epic scalability settings by default Change 3088988 on 2016/08/15 by Rolando.Caloca DR - Add Accessors Change 3089104 on 2016/08/15 by Olaf.Piesche #jira UE-34241 Sceneproxy can be nullptr in FDynamicMeshEmitterData::Init if the proxy is being recreated Change 3089208 on 2016/08/15 by Daniel.Wright Downsampled separate translucency uses a separate view uniform buffer with correct buffer sizes * Fixes WorldPosition in downsampled translucency * View uniform buffer parameters are now cached on the view, to allow recreating the uniform buffer without having to rebuild the entire struct. Currently used by global distance field, downsampled separate translucency. * Fixed the downsampled translucency depth buffer being full res used together with a smaller color target, now they are both the downsampled res Change 3089209 on 2016/08/15 by Daniel.Wright Fixed atmospheric fog on translucency Change 3089457 on 2016/08/15 by Daniel.Wright Fixed lighting build failure from UMaterialInstanceDynamic assigned to a mesh that's being exported to Lightmass. The Swarm cache entry is created using the parent's guid, causing multiple MID's with the same parent to acquire a file handle multiple times which fails after the first. Change 3089549 on 2016/08/15 by Daniel.Wright UMaterialInterface initializes LightingGuid to something valid - causes UMaterialInstanceDynamic to have a valid LightingGuid so they can be used in lighting builds Change 3089703 on 2016/08/15 by Daniel.Wright Custom expression fixup for View.RenderTargetSize Change 3090546 on 2016/08/16 by Daniel.Wright Hopeful fix for recycled snapshot view crash Change 3091202 on 2016/08/16 by Daniel.Wright Manually clear FViewInfo::CachedViewUniformShaderParameters on creating a snapshot, since memcpy is used to create the snapshot view [CL 3091931 by Gil Gribb in Main branch]
2016-08-17 11:38:13 -04:00
View.SetupGlobalDistanceFieldUniformBufferParameters(*View.CachedViewUniformShaderParameters);
View.ViewUniformBuffer = TUniformBufferRef<FViewUniformShaderParameters>::CreateUniformBufferImmediate(*View.CachedViewUniformShaderParameters, UniformBuffer_SingleFrame);
Copying //UE4/Dev-Rendering to Dev-Main (Source //UE4/Dev-Rendering@2932636) #lockdown nick.penwarden ========================== MAJOR FEATURES + CHANGES ========================== Change 2917472 on 2016/03/21 by Rolando.Caloca DR - Fix SCW directcompile arguments, add -pipeline Change 2919580 on 2016/03/23 by Rolando.Caloca DR - HlslParser - Fix for used elements (sparrow's arrow was showing when it shouldn't) Arrays of input/outputs are now flattened so disjoint entries can be optimized out (and fixes a bug) #jira OR-15380 #tests Run game with sparrow, test with slomo to check for gfx glitches Change 2919660 on 2016/03/23 by Rolando.Caloca DR - Latest vk changes (from dev mobile's 2916881 to 2919157) Change 2919902 on 2016/03/23 by Rolando.Caloca DR - Fix skeletal meshes decrementing stats twice #codereview Marcus.Wassmer #jira UE-28478 Change 2920020 on 2016/03/23 by David.Hill #Jira UE-28503 EyeAdapation when used in material shader may not be initialized. #rb olaf.piesche Change 2920071 on 2016/03/23 by Rolando.Caloca DR - Remove old vk define - Started moving around direct calls to queue submit Change 2920252 on 2016/03/23 by Rolando.Caloca DR - Changes vk structs to classes Change 2920314 on 2016/03/23 by Olaf.Piesche Add -windowed to standalone game PIE command line to avoid PIE launching in full screen #jira UE-27870 #codereview michael.trepka Change 2920745 on 2016/03/24 by Uriel.Doyon Texture streaming build now takes into account the material texcoord scales applied to the texture sampling. Also finds out which texcoord is being used when sampling textures (between 0 and 3 currently). TexCoord analysis debug view shaders is now working with SM4 ane SM5. StaticMeshComponents hold persistent data coming from the texture streaming build. #tests tested with different Paragon assets. Editor SM4 & SM5. Cooked maps #codereview marcus.wassmer Change 2921335 on 2016/03/24 by Uriel.Doyon Added missing static keyword for locally defined console variable. #codereview rolando.caloca Change 2921416 on 2016/03/24 by Uriel.Doyon Revert enabling debugview shaders on non PC platforms (until properly tested and debugged) Change 2921446 on 2016/03/24 by Daniel.Wright Planar reflection mesh Change 2921530 on 2016/03/24 by Daniel.Wright Manual revert of Ronin planar reflections #codereview Ryan.Vance Change 2921608 on 2016/03/24 by Uriel.Doyon Updated texture streamer to take into account the new HLOD texture group. Change 2921677 on 2016/03/24 by Daniel.Wright Distance Field Specular Occlusion * Prototype - disabled by default Change 2921681 on 2016/03/24 by Daniel.Wright UnmappedTexelsPercentage is now 100 based Change 2921682 on 2016/03/24 by Daniel.Wright Planar reflections * New actor and component * The scene is rendered to texture with a mirrored camera and a clip plane each frame * The reflection texture is then applied to opaque pixels in a deferred pass, with distance and angle from plane fades * Translucent materials apply the nearest reflection plane in the base pass * Planar reflections require the project setting 'Support global clip plane for Planar Reflections' to be enabled, since writing to SV_ClipDistance all the time adds about 15% BasePass GPU time on PS4 * Fixed global distance field in materials which had been broken since moving global distance field properties into the view uniform buffer * Fixed PS4 removing system-value semantics when output from vertex shader and not read in next stage Change 2921734 on 2016/03/24 by Uriel.Doyon Fixed tessellated cube having wrong UVs #jira UE-28379 Change 2922063 on 2016/03/24 by Daniel.Wright Removed planar reflection debug code Change 2922428 on 2016/03/25 by Chris.Bunner Delete FShaderPipeline objects when clearing TMaterialShaderMaps. #rb Rolando.Caloca #jira UE-28621 Change 2922803 on 2016/03/25 by Rolando.Caloca DR - New cmd buffer management (disabled) - Move cmd buffer out of pending state and into context - Do not hardcode # cmd buffers - Move back buffer image mgmt into swapchain - Fixed some image layout transition bugs Change 2923056 on 2016/03/25 by Rolando.Caloca DR - Initial fix for canvas locking inside a render pass [CL 2932649 by Gil Gribb in Main branch]
2016-04-04 18:44:59 -04:00
bool bHasUpdateRegions = false;
for (int32 ClipmapIndex = 0; ClipmapIndex < GlobalDistanceFieldInfo.Clipmaps.Num(); ClipmapIndex++)
{
bHasUpdateRegions = bHasUpdateRegions || GlobalDistanceFieldInfo.Clipmaps[ClipmapIndex].UpdateRegions.Num() > 0;
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
for (int32 ClipmapIndex = 0; ClipmapIndex < GlobalDistanceFieldInfo.MostlyStaticClipmaps.Num(); ClipmapIndex++)
{
bHasUpdateRegions = bHasUpdateRegions || GlobalDistanceFieldInfo.MostlyStaticClipmaps[ClipmapIndex].UpdateRegions.Num() > 0;
}
if (bHasUpdateRegions && GAOUpdateGlobalDistanceField)
{
SCOPED_DRAW_EVENT(RHICmdList, UpdateGlobalDistanceFieldVolume);
if (GGlobalDistanceFieldCulledObjectBuffers.Buffers.MaxObjects < Scene->DistanceFieldSceneData.NumObjectsInBuffer
|| GGlobalDistanceFieldCulledObjectBuffers.Buffers.MaxObjects > 3 * Scene->DistanceFieldSceneData.NumObjectsInBuffer)
{
GGlobalDistanceFieldCulledObjectBuffers.Buffers.MaxObjects = Scene->DistanceFieldSceneData.NumObjectsInBuffer * 5 / 4;
GGlobalDistanceFieldCulledObjectBuffers.Buffers.Release();
GGlobalDistanceFieldCulledObjectBuffers.Buffers.Initialize();
}
const uint32 MaxCullGridDimension = GAOGlobalDFResolution / GCullGridTileSize;
if (GObjectGridBuffers.GridDimension != MaxCullGridDimension)
{
GObjectGridBuffers.GridDimension = MaxCullGridDimension;
GObjectGridBuffers.UpdateRHI();
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const FGlobalDFCacheType StartCacheType = GAOGlobalDistanceFieldCacheMostlyStaticSeparately ? GDF_MostlyStatic : GDF_Full;
for (int32 CacheType = StartCacheType; CacheType < GDF_Num; CacheType++)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
TArray<FGlobalDistanceFieldClipmap>& Clipmaps = CacheType == GDF_MostlyStatic
? GlobalDistanceFieldInfo.MostlyStaticClipmaps
: GlobalDistanceFieldInfo.Clipmaps;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
for (int32 ClipmapIndex = 0; ClipmapIndex < Clipmaps.Num(); ClipmapIndex++)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
SCOPED_DRAW_EVENTF(RHICmdList, Clipmap, TEXT("CacheType %u Clipmap %u"), CacheType, ClipmapIndex);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
FGlobalDistanceFieldClipmap& Clipmap = Clipmaps[ClipmapIndex];
for (int32 UpdateRegionIndex = 0; UpdateRegionIndex < Clipmap.UpdateRegions.Num(); UpdateRegionIndex++)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
const FVolumeUpdateRegion& UpdateRegion = Clipmap.UpdateRegions[UpdateRegionIndex];
if (UpdateRegion.UpdateType & VUT_MeshDistanceFields)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
SCOPED_DRAW_EVENT(RHICmdList, GridCull);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
// Cull the global objects to the volume being updated
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3357411) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3244756 on 2017/01/03 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering) Change 3248667 on 2017/01/05 by Olaf.Piesche Resaving default asset because of engine verison issue; maybe unnecessary, but resaving niagara engine content to be sure #jira UE-40160 Change 3249324 on 2017/01/06 by Marcus.Wassmer Resave with an actual version to stop cook warning Change 3249611 on 2017/01/06 by Marcus.Wassmer Just remove warning-causing niagara data for now. Change 3308052 on 2017/02/16 by Rolando.Caloca DR - Check for Vulkan SDK, and only use it if it's newer or the same as the headers we distribute Change 3308109 on 2017/02/16 by Rolando.Caloca DR - Upgrade glslang to 1.0.39.1 Change 3308111 on 2017/02/16 by Rolando.Caloca DR - Update Vulkan distribution to 1.0.39.1 Change 3308153 on 2017/02/16 by Rolando.Caloca DR - Updated glslang libs Change 3308842 on 2017/02/17 by Rolando.Caloca DR - Fixed copy/paste Change 3310007 on 2017/02/17 by Chris.Bunner Back out CL 3221219 - causing MIC generation issues and superseded by CL 3273971. #jira UE-37792 Change 3310154 on 2017/02/17 by Chris.Bunner Assert when attempting to add a custom material attribute already in the base attributes list. Change 3310155 on 2017/02/17 by Chris.Bunner PR #3231: Validate material index before accessing (Contributed by projectgheist) #jira UE-41774, UE-41788 Change 3310162 on 2017/02/17 by Chris.Bunner PR #3252: Added MobileMaterialInterface to UsedMaterials (Contributed by projectgheist) #jira UE-41823, UE-41950 Change 3310176 on 2017/02/17 by Chris.Bunner Merging CL 3233886: AMD HDR support (requires r.AMDSupportsHDRDisplayOutput=1 in ini). Update to AGS 5.0.5. Partial code tidy up. Change 3310187 on 2017/02/17 by Chris.Bunner Preserve constant expressions rather than always casting after translating a material attribute. Losing the notion of constant means we can't correctly detect used properties and falsely enable e.g. PDO. Happened because of the incorrect component masks in BreakMaterialNodes which then had to be downcast to the correct type which is done as an inline fragment rather than swizzle expression. #jira UE-41594 Change 3310215 on 2017/02/17 by Chris.Bunner Prevent SpeedTree node compiling for skeletal meshes (not supported as uses more UV sets than available). More descriptive error for missing Cubemap UV input on TextureSample material node . #jira UE-33098 Change 3310838 on 2017/02/18 by Joe.Graf Moved some private functions to public for a licensee #CodeReview: matt.kuhlenschmidt #rb: n/a Change 3311876 on 2017/02/20 by Rolando.Caloca DR - Expose skin cache cvar r.SkinCache.AccumulationBufferSizeInKB #jira UE-42014 Change 3314139 on 2017/02/21 by Rolando.Caloca DR - Minor cleanup pass - Remove FVulkanPendingState - Renamed some classes for clarity - Hoist pending UAVs for flush out to pending compute state Change 3314642 on 2017/02/21 by Rolando.Caloca DR - Some more renaming Change 3315431 on 2017/02/21 by Ben.Salem Properly set default values for test time out and tick. We now will default to ticking once per second, and tracking the macro stats of GPU/Render/Game thread time. #tests Ran showdown demo several times Change 3316710 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Fix refract intrinsic Change 3316718 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Built libs to pick up change from 3316710 - refract fix Change 3316820 on 2017/02/22 by Benjamin.Hyder updating Tm-TrigNodes map Change 3317192 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317528 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317639 on 2017/02/22 by Benjamin.Hyder Updating Decal on Complex Mesh example in QA-Decals Change 3317764 on 2017/02/22 by Benjamin.Hyder Final updates to QA-Decals Change 3318319 on 2017/02/22 by Rolando.Caloca DR - minor reorg/rename Change 3318379 on 2017/02/22 by Rolando.Caloca DR - more cleanup Change 3321181 on 2017/02/24 by Rolando.Caloca DR - Fix GL bug Change 3321247 on 2017/02/24 by Rolando.Caloca DR - Fix misc bugs Change 3321898 on 2017/02/24 by Chris.Bunner Only issue clear TLV dispatch if required. #jira UERNDR-193 Change 3321904 on 2017/02/24 by Chris.Bunner Added comment for potential future optimization. Change 3322013 on 2017/02/24 by Uriel.Doyon Fixed separate translucency being affected by Gaussian DOF #jira UE-40489 Change 3322517 on 2017/02/24 by Uriel.Doyon Fixed issue with InvestigateTexture command removing budget limit. Fixed StreamingBounds show flag not working. It nows shows the streaming bound for the currently selected textures. #jira UE-40485 Change 3323470 on 2017/02/27 by Chad.Garyet Removing DDC job from dev-rendering Change 3323479 on 2017/02/27 by Chad.Garyet Removing RDU agent type Change 3323519 on 2017/02/27 by Chad.Garyet removing NCL/LHR/SEA agent types to clean up space Change 3323639 on 2017/02/27 by Benjamin.Hyder More updates to QA-Decals Change 3324207 on 2017/02/27 by Uriel.Doyon Fixed typo ScaleTexturesByGlobalMyBias -> ScaleTexturesByGlobalMipBias Removed bad merge in FStreamingTextureLevelContext::GetBuildDataIndexRef Change 3324396 on 2017/02/27 by Uriel.Doyon Fixed an issue with the Streaming Bounds show flag interferring with the static level data initialization #jira UE-40485 Change 3325227 on 2017/02/28 by Chris.Bunner Fix-up AMD AGS libs. Change 3325566 on 2017/02/28 by Uriel.Doyon Fixed possible out-of-bound access in GetUsedTexture() when passing ERHIFeatureLevel::Num Change 3326009 on 2017/02/28 by Uriel.Doyon Better fix for 3325566, as the previous fix would ignore the material instance overrides. Change 3327058 on 2017/03/01 by Benjamin.Hyder Preparing TM_Shadermodels map for automation Change 3328222 on 2017/03/01 by Chris.Bunner Prevent decals from drawing in separate translucency pass. Whilst user control and material relevance were already removed, if the flag was checked before being disabled (by swapping to decal domain) this was still being read in the render loop, now explicitly ignores decals. #jira UE-42449, UE-42446 Change 3329848 on 2017/03/02 by Uriel.Doyon Added some extra logs to help track UE-42168 Change 3329977 on 2017/03/02 by Rolando.Caloca DR - Fix bad clear value Change 3330008 on 2017/03/02 by Benjamin.Hyder More preparations for QA-Decals automation Change 3330754 on 2017/03/02 by Daniel.Wright Prominent comment explaining reflection env async compute usage and why it's not overlapped with anything Change 3331451 on 2017/03/03 by Marc.Olano Manually unroll simplex noise loop to avoid PSO bug on AMD/Metal Change 3331839 on 2017/03/03 by Rolando.Caloca DR - hlslcc - add missing file to project Change 3332247 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel PR #3305 #jira UE-42393 Change 3332259 on 2017/03/03 by Rolando.Caloca DR - Fix bad index into pixel formats PR #3237 #jira UE-41855 Change 3332305 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers PR #3271 #jira UE-32618 Change 3332313 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel (properly) PR #3305 #jira UE-42393 Change 3332317 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers (properly) PR #3271 #jira UE-32618 Change 3332368 on 2017/03/03 by Rolando.Caloca DR - Minor fixes so -sm4 and -sm5 can be used on windows with OpenGL/Vulkan Change 3333690 on 2017/03/06 by Daniel.Wright [Copy] Changing movable skylight properties no longer affects static draw lists Change 3333693 on 2017/03/06 by Daniel.Wright [Copy] Added 'r.AOListMeshDistanceFields' which dumps out mesh distance fields sorted by memory size, useful for directing content optimizations Change 3333705 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance fields are now 8 bit fixed point by default, but can be changed back to 16 bit floating piont with a project setting. * 8 bit uses half memory but introduces error for thin surfaces or large meshes. Change 3333721 on 2017/03/06 by David.Hill DecalProxy: Copy float FadeScreenSize to FDeferredDecalProxy for use in the render thread. This avoids pointer chasing to the UDecalComponent (game thread component). Change 3333772 on 2017/03/06 by Daniel.Wright [Copy] Scene motion blur data is only updated for the main renderer frames. Fixes scene captures and planar reflections breaking object motion blur. Change 3333790 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance field generation uses Embree, for a 2.5x speedup * Can switch back to old kDOP generation with 'r.DistanceFieldBuild.UseEmbree 0' for debugging Change 3333822 on 2017/03/06 by Daniel.Wright [Copy] Moved mesh distance field code into MeshDistanceFieldUtilities.cpp * Moved FMeshUtilities to its own header so the 8k line MeshUtilites.cpp file can be further split up Change 3333827 on 2017/03/06 by Daniel.Wright [Copy] Range compress 8bit distance fields - gets one extra bit of precision on average Change 3333828 on 2017/03/06 by Daniel.Wright [Copy] Raised High ShadowQuality to 2048 as 1024 for CSM is way too low Change 3333831 on 2017/03/06 by Daniel.Wright Non-editor compile fix Change 3333836 on 2017/03/06 by Daniel.Wright [Copy] Workaround for gobal distance field volume textures being bloated by 4x on PS4 due to the recommended tiling modes. They now use a 2d tiling mode which avoids the bloat, saving 96Mb. Change 3333843 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionExponent to skylight component * Useful for brightening up indoors without losing contact shadows as MinOcclusion does Change 3333845 on 2017/03/06 by Daniel.Wright [Copy] Capsule shadow BP functions Change 3333850 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionCombineMode to skylight component Change 3333854 on 2017/03/06 by Daniel.Wright [Copy] Gnm properly registers clears as GPU work so those events show up in profilegpu Change 3333857 on 2017/03/06 by Daniel.Wright [Copy] Clear light attenuation for local lights with a quad covering their screen extents * Clearing the entire light attenuation buffer costs .1ms on PS4. This optimization lowers the minimum cost of a shadow casting light from .15ms -> .03ms. * Shadowed lights in Fortnite with 25 lights 3.7ms -> 1.42ms on PS4 Change 3333860 on 2017/03/06 by Daniel.Wright [Copy] Flush deferred deletes when reallocating distance field atlas to reduce peak memory Change 3333861 on 2017/03/06 by Daniel.Wright [Copy] Disable all distance field features on Intel cards as HD 4000 hangs in the RHICreateTexture3D call to allocate the large atlas Change 3333869 on 2017/03/06 by Daniel.Wright [Copy] Volumetric Fog using a volume texture mapped to the camera frustum * Volumetric fog can be enabled on an Exponential Height Fog component with additional controls * Lights have a VolumetricScatteringIntensity * New cvars r.VolumetricFog, r.VolumetricFog.GridPixelSize, r.VolumetricFog.GridSizeZ, r.VolumetricFog.DepthDistributionScale * Lighting features supported: * Directional light with CSM and a light function * Point / spot lights without shadows / light functions / IES profiles * Skylight with occlusion from distance fields * Analytical height fog covers the view range past where the volumetric fog ends * Temporal reprojection is used on the volumetric fog scattering and extinction to achieve stability * Translucency integrates properly into volumetric fog * Height fog StartDistance is not supported by volumetric fog and should be set to 0. Change 3333894 on 2017/03/06 by Daniel.Wright [Copy] Initialize GDummyVolumetricFogGlobalDataUniformBuffer outside of parallel rendering Change 3333902 on 2017/03/06 by Daniel.Wright [Copy] Better handling of volumetric fog enabled with distance of 0 Change 3333903 on 2017/03/06 by Daniel.Wright [Copy] Fixed volumetric fog trying to render light functions for a point light Change 3333908 on 2017/03/06 by Daniel.Wright [Copy] Volumetric materials * Added new material domain Volume, which can output Scattering, Absorption and Emissive. All properties are in world space densities. * Particle systems using the Volume domain are voxelized based on their ParticlePosition and ParticleRadius * Volumetric fog integration is now energy conservative - scattering is integrated against transmission over the depth of each slice. * Added bOverrideLightColorsWithFogInscatteringColors to exponential height fog, which can be enabled to make Volumetric Fog match Height fog more closely Change 3334134 on 2017/03/06 by Daniel.Wright [Copy from Michael Trepka] Added Embree 2.14.0 and changed MeshUtilities to use it as this solves issues with Embree leaking TLS keys. UnrealLightmass is still using older Embree 2.7.0 until we can find time to properly test it with the new version. Also, invalidated distance field DDC to force it to rebuild with updated Embree. Change 3334420 on 2017/03/06 by Daniel.Wright Fixed RTDF shadows Change 3335467 on 2017/03/07 by Benjamin.Hyder Initial submission of QA-Decals map to EngineTest Change 3335556 on 2017/03/07 by Daniel.Wright Changed mesh distance field default format back to R16f Change 3338020 on 2017/03/08 by Daniel.Wright Disable volumetric fog in vertex shaders for feature levels which don't support it Change 3339394 on 2017/03/09 by Chris.Bunner Correctly handle material texture translation error edge case. #jira UE-42579, UE-42670 Change 3339992 on 2017/03/09 by Daniel.Wright Only compile volumetric fog shaders on supporting platforms Change 3341858 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) #RB Rolando.Caloca, Marcus.Wassmer, Daniel.Wright, Nick.Penwarden, Mark.Satterthwaite Change 3342004 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) Fix unity build #RB Marcus.Wassmer Change 3343307 on 2017/03/13 by Marcus.Wassmer Update showflags when we are guaranteed it will happen in all possible ways to spawn the scenecapture. (drag into editor, PIE, -game, etc) Change 3343732 on 2017/03/13 by Rolando.Caloca DR - Vulkan compute pipeline & refactor Change 3344846 on 2017/03/14 by Rolando.Caloca DR - Android compile fixes Change 3344883 on 2017/03/14 by Rolando.Caloca DR - Add missing stencil load/store to PSO initializer Change 3344985 on 2017/03/14 by Rolando.Caloca DR - Made load/store actions uint8 Change 3345141 on 2017/03/14 by Rolando.Caloca DR - vk - Rework render pass hash Change 3345304 on 2017/03/14 by Benjamin.Hyder Updating TM-Distancefields map to include TemplateFloor mesh Change 3345387 on 2017/03/14 by Rolando.Caloca DR - Add _RenderThread calls for Create*Shader so RHIs can choose not to stall when creating Change 3345388 on 2017/03/14 by Rolando.Caloca DR - Do not stall when creating shaders on Vulkan Change 3345722 on 2017/03/14 by Chris.Bunner PR #3357: MinimalAPI add to many material expressions (Contributed by DeanoC) #jira UE-42752 Change 3345723 on 2017/03/14 by Chris.Bunner Reduce log verbosity causing spamming during landscape editing. #jira UE-42714 Change 3345725 on 2017/03/14 by Chris.Bunner [Duplicate 3341860] Fixed material translation error with multiple connections from custom interpolator nodes. Change 3345726 on 2017/03/14 by Chris.Bunner Typo fixes. Change 3345732 on 2017/03/14 by Rolando.Caloca DR - Decouple vertex declaration off BSS Change 3345746 on 2017/03/14 by Chris.Bunner Added sign() intrinsic material graph node and delisted material function workaround. Change 3346042 on 2017/03/14 by Chris.Bunner Implement missing size query interface for FRenderTargetResources. #jira UE-41672 Change 3346387 on 2017/03/14 by Daniel.Wright [Copy] Added VolumetricScatteringIntensity to particle lights Change 3346389 on 2017/03/14 by Daniel.Wright [Copy] Clamp Volumetric material attributes to fp16 range to avoid INFs Disable volumetric fog when the fog show flag is disabled Change 3346392 on 2017/03/14 by Daniel.Wright [Copy] Fixed skylight being much too bright on volumetric fog Change 3346406 on 2017/03/14 by Daniel.Wright [Copy] CSM resolution is now controlled by r.Shadow.MaxCSMResolution. * Changed HighPC to use 1024 MaxShadowResolution (max for all non-CSM shadows), saves 60Mb in Fortnite Change 3346412 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for translucency lighting 3d textures, saves 13Mb Change 3346414 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for volumetric fog 3d textures, saves 13Mb Change 3346415 on 2017/03/14 by Daniel.Wright [Copy] Missing file from cl 3338451 Change 3346421 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaNs in volumetric fog due to rendering when height fog is disabled * Volumetric fog converts NaNs to black now so they don't spread Change 3346422 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaN in volumetric fog with low density values Change 3346423 on 2017/03/14 by Daniel.Wright [Copy] Changed default VolumetricFogScatteringDistribution to .2 Change 3346430 on 2017/03/14 by Daniel.Wright [Copy] New translucent material option to compute fog per pixel instead of the default per vertex Change 3346432 on 2017/03/14 by Daniel.Wright [Copy] Moved Volumetric Fog parameters to view uniform buffer for translucency pass Fixed lifetimes of temporary Volumetric Fog render targets Change 3346526 on 2017/03/14 by Daniel.Wright [Copy] Volumetric Fog supports point and spot light shadows * These lights are injected separately so that per-light resources can be bound (shadow depth map, static shadow depth map) * Forward lighting of local lights can be forced with 'r.VolumetricFog.InjectShadowedLightsSeparately 0' * Shadowed lights come at a cost: 2.9ms for volumetric fog on 970 -> 4.2ms with shadowing Change 3347053 on 2017/03/15 by Rolando.Caloca DR - android compile fix Change 3347384 on 2017/03/15 by Rolando.Caloca DR - Fix merge issue Change 3347643 on 2017/03/15 by Marcus.Wassmer Fix some bugs with the 'disable stationary skylight ffor the project' feature. Fixes lighting in Persona on Paragon. Change 3347979 on 2017/03/15 by Rolando.Caloca DR - Allow to automatically apply cached rendertargets to PSO initializer Change 3348024 on 2017/03/15 by Rolando.Caloca DR - Remove NullPS on Vulkan to avoid deadlock Change 3348303 on 2017/03/15 by Rolando.Caloca DR - Fix for debugging SCW with material SRT Change 3348357 on 2017/03/15 by Marcus.Wassmer Fix stencildither and a stencilref bug that was probably breaking decals sometimes. Change 3348549 on 2017/03/15 by Marcus.Wassmer Hopefully fix static analysis for potential nullptr access. Change 3348614 on 2017/03/15 by Marcus.Wassmer Duplicate some switch changes to fix crash on launch. Change 3349369 on 2017/03/16 by Gil.Gribb Fixed botched merge Change 3349947 on 2017/03/16 by Rolando.Caloca DR - Fix for mismatched primitive type Change 3349956 on 2017/03/16 by Benjamin.Hyder initial updates to TM-DistanceFields map Change 3350151 on 2017/03/16 by Rolando.Caloca DR - Fix UT compile issue Change 3350155 on 2017/03/16 by Rolando.Caloca DR - Catch mismatched primitive type on PSOs on D3D11 Change 3350192 on 2017/03/16 by Daniel.Wright Fix for point light shadow depths rendering with wrong cull mode due to PSO refactor Change 3350736 on 2017/03/16 by Daniel.Wright Fixed formatting from merge Change 3350881 on 2017/03/16 by Rolando.Caloca DR - Fix texture arrays as UAVs on Metal Change 3350927 on 2017/03/16 by Rolando.Caloca DR - Fix warning Change 3350935 on 2017/03/16 by Daniel.Wright Fix for materials with non-Surface domains being skipped in mesh passes Change 3351583 on 2017/03/17 by Marcus.Wassmer Fix clang platforms Change 3351917 on 2017/03/17 by Marcus.Wassmer Fix linux compile Change 3351973 on 2017/03/17 by Marcus.Wassmer Fix mismatched rendertargetformat Change 3352038 on 2017/03/17 by Daniel.Wright Enabled GetAndOrCreateGraphicsPipelineState ensures in Development for testing Change 3352110 on 2017/03/17 by Marcus.Wassmer Fix missing RT PSO apply Change 3352695 on 2017/03/17 by Arne.Schober DR - Remove PSO Rendertarget check in DX12 Resolve with Shader. #RB Rolando.Caloca Change 3352960 on 2017/03/17 by Arne.Schober DR - Fix some things that slipped trough the PSO merge #RB none Change 3353150 on 2017/03/18 by Rolando.Caloca DR - compile fix Change 3353205 on 2017/03/18 by Arne.Schober DR - Fix Incremental Compile and PS4 runtime error where CMASK is not allowed for ThickTile Mode #RB none Change 3353207 on 2017/03/18 by Arne.Schober DR - Fix Confusion #RB none Change 3355183 on 2017/03/20 by Nick.Bullard Fixed up Content orginzation for Decals automation tests in EngineTest Change 3355627 on 2017/03/20 by Arne.Schober DR - [UE-43094] - removed ensure in comporiton graph as control of the clear color cannot be gurantueed. Change 3356342 on 2017/03/21 by Marcus.Wassmer Fix clang errors Change 3356591 on 2017/03/21 by Arne.Schober DR - Fix ensure message #RB none Change 3356873 on 2017/03/21 by Arne.Schober DR - Fix comparission of undefined values in RendertargetApply Check Change 3357261 on 2017/03/21 by Marcus.Wassmer Fix LinuxEditor compile Change 3357294 on 2017/03/21 by Marcus.Wassmer Add missing SSE functions Change 3357351 on 2017/03/21 by Frank.Fella Fix win32 and linux compiler errors Change 3357370 on 2017/03/21 by Arne.Schober DR - disable ensure in test builds #RB Marcus.Wassmer [CL 3357449 by Marcus Wassmer in Main branch]
2017-03-21 17:46:52 -04:00
ClearUAV(RHICmdList, GMaxRHIFeatureLevel, GGlobalDistanceFieldCulledObjectBuffers.Buffers.ObjectIndirectArguments, 0);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
TShaderMapRef<FCullObjectsForVolumeCS> ComputeShader(View.ShaderMap);
RHICmdList.SetComputeShader(ComputeShader->GetComputeShader());
const FVector4 VolumeBounds(UpdateRegion.Bounds.GetCenter(), UpdateRegion.Bounds.GetExtent().Size());
ComputeShader->SetParameters(RHICmdList, Scene, View, MaxOcclusionDistance, VolumeBounds, (FGlobalDFCacheType)CacheType);
DispatchComputeShader(RHICmdList, *ComputeShader, FMath::DivideAndRoundUp<uint32>(Scene->DistanceFieldSceneData.NumObjectsInBuffer, CullObjectsGroupSize), 1, 1);
ComputeShader->UnsetParameters(RHICmdList, Scene);
}
// Further cull the objects into a low resolution grid
{
TShaderMapRef<FCullObjectsToGridCS> ComputeShader(View.ShaderMap);
RHICmdList.SetComputeShader(ComputeShader->GetComputeShader());
ComputeShader->SetParameters(RHICmdList, Scene, View, MaxOcclusionDistance, GlobalDistanceFieldInfo, ClipmapIndex, UpdateRegion);
const uint32 NumGroupsX = FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.X, GCullGridTileSize);
const uint32 NumGroupsY = FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.Y, GCullGridTileSize);
const uint32 NumGroupsZ = FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.Z, GCullGridTileSize);
DispatchComputeShader(RHICmdList, *ComputeShader, NumGroupsX, NumGroupsY, NumGroupsZ);
ComputeShader->UnsetParameters(RHICmdList);
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
// Further cull the objects to the dispatch tile and composite the global distance field by computing the min distance from intersecting per-object distance fields
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
SCOPED_DRAW_EVENTF(RHICmdList, TileCullAndComposite, TEXT("TileCullAndComposite %ux%ux%u"), UpdateRegion.CellsSize.X, UpdateRegion.CellsSize.Y, UpdateRegion.CellsSize.Z);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
//@todo - match typical update sizes. Camera movement creates narrow slabs.
const uint32 NumGroupsX = FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.X, CompositeTileSize);
const uint32 NumGroupsY = FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.Y, CompositeTileSize);
const uint32 NumGroupsZ = FMath::DivideAndRoundUp<int32>(UpdateRegion.CellsSize.Z, CompositeTileSize);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
IPooledRenderTarget* ParentDistanceField = GlobalDistanceFieldInfo.MostlyStaticClipmaps[ClipmapIndex].RenderTarget;
if (CacheType == GDF_Full && GAOGlobalDistanceFieldCacheMostlyStaticSeparately && ParentDistanceField)
{
TShaderMapRef<TCompositeObjectDistanceFieldsCS<true>> ComputeShader(View.ShaderMap);
RHICmdList.SetComputeShader(ComputeShader->GetComputeShader());
ComputeShader->SetParameters(RHICmdList, Scene, View, MaxOcclusionDistance, GlobalDistanceFieldInfo.ParameterData, Clipmap, ParentDistanceField, ClipmapIndex, UpdateRegion);
DispatchComputeShader(RHICmdList, *ComputeShader, NumGroupsX, NumGroupsY, NumGroupsZ);
ComputeShader->UnsetParameters(RHICmdList, Clipmap);
}
else
{
TShaderMapRef<TCompositeObjectDistanceFieldsCS<false>> ComputeShader(View.ShaderMap);
RHICmdList.SetComputeShader(ComputeShader->GetComputeShader());
ComputeShader->SetParameters(RHICmdList, Scene, View, MaxOcclusionDistance, GlobalDistanceFieldInfo.ParameterData, Clipmap, NULL, ClipmapIndex, UpdateRegion);
DispatchComputeShader(RHICmdList, *ComputeShader, NumGroupsX, NumGroupsY, NumGroupsZ);
ComputeShader->UnsetParameters(RHICmdList, Clipmap);
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3185985) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3170391 on 2016/10/21 by Ben.Woodhouse Remove the wait on end of frame ensure, because we can't rely on all the the underlying codepaths to never miss a call to flush RHI resources. The consequences of missing a flush on a given frame are not serious now, since we enforce the synchronisation with a fence, preventing the rendering thread from getting too far ahead. We will simply accumulate resources for an additional frame when this happens. #jira UE-37437 #fyi rolando.caloca, marcus.wassmer Change 3170659 on 2016/10/21 by Rolando.Caloca DR - vk - Prep work for state key changes Change 3170676 on 2016/10/21 by Rolando.Caloca DR - vk - Reworked blend state keys - Added depth/stencil to pipeline key Change 3170848 on 2016/10/21 by Daniel.Wright Level viewport 'show stats' option is now enabled by default, which avoids confusion with artists thinking lighting is built, when really the message is hidden. Change 3170849 on 2016/10/21 by Daniel.Wright Split FProjectedShadowInfo::RenderProjection into smaller functions which make the algorithm structure clear Change 3170995 on 2016/10/21 by Rolando.Caloca DR - vk - Show object on vulkan validation msgs Change 3171085 on 2016/10/21 by Rolando.Caloca DR - vk - Fix pipelines being used with incompatible renderpasses Change 3171159 on 2016/10/21 by Rolando.Caloca DR - vk - Fix layout when reading textures on CPU Change 3171167 on 2016/10/21 by Rolando.Caloca DR - vk - compile fix Change 3172462 on 2016/10/24 by Daniel.Wright Added a warning about shader compile times to the material tooltip Change 3172463 on 2016/10/24 by Daniel.Wright Reduced MinUnoccludedFraction to avoid artitfacts when a stationary light touches only a tiny part of a mesh Change 3172716 on 2016/10/24 by Brian.Karis Fix for crash UE-37369 when reimporting over a generated LOD. Change 3172967 on 2016/10/24 by Rolando.Caloca DR - vk - Fix writing buffers while GPU was using them Change 3174187 on 2016/10/25 by Olaf.Piesche UE-37020 Change 3174718 on 2016/10/26 by Rolando.Caloca DR - vk - Remove old timestamp queries, increase occlusion queries per pool to 4k Change 3175960 on 2016/10/26 by Rolando.Caloca DR - Added support for hlslcc header to have custom parsing Change 3176611 on 2016/10/27 by David.Hill DrawWireCone confusion: In response to a UDN, I'm updating confusing parameter names and comments for DrawWireCone() and DrawWireSphereCappedCone() Change 3177111 on 2016/10/27 by Rolando.Caloca DR - vk - Fix timestamps for frame Change 3177192 on 2016/10/27 by Arne.Schober DR - DitherLOD refactor - moved computation of the DepthStencil state out of SetMeshRenderState into GetDitheredLODTransitionState this is a prerequisite of further PSO work where we want to move up State setting in a similar war and reuse FMeshDrawingRenderState Change 3177278 on 2016/10/27 by Olaf.Piesche UE-37484 Change 3177297 on 2016/10/27 by Rolando.Caloca DR - vk - Enable GRHISupportsBaseVertexIndex Change 3177607 on 2016/10/27 by Rolando.Caloca DR - vk - SM4 UB prep Change 3178052 on 2016/10/28 by Arne.Schober DR - fix WebGL - the WebGL compiler is very picky on double underscores and does want the presission to be defined before any function definition. Change 3178156 on 2016/10/28 by Rolando.Caloca DR - vk - Added query timer - Fixed inline issues Change 3178158 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for out of stencil bits Change 3178462 on 2016/10/28 by Rolando.Caloca DR - vk - Fixes for Elemental Change 3179131 on 2016/10/28 by Rolando.Caloca DR - vk - Fix for r.Vulkan.UseRealUBs Change 3179139 on 2016/10/28 by Rolando.Caloca DR - vk - Move UB ring buffer to context Change 3179145 on 2016/10/28 by Rolando.Caloca DR - vk - Fix buffer barriers Change 3179888 on 2016/10/31 by Rolando.Caloca DR - vk - Align buffers to 16 bytes as we sometimes write to them with SIMD Change 3179923 on 2016/10/31 by Rolando.Caloca DR - vk - Wait for swapchain counter Change 3180430 on 2016/10/31 by Rolando.Caloca DR - vk - Properly wait for occlusion queries/cmd buffer - Actual log error if trying to use occlusion queries out of order Change 3180746 on 2016/10/31 by Rolando.Caloca DR - vk - Undo some waiting as it was on the wrong thread Change 3182115 on 2016/11/01 by Rolando.Caloca DR - hlslcc Linux path fix Change 3182118 on 2016/11/01 by Daniel.Wright Fixed global distance field seam artifacts from landscapes with no subsections Change 3182368 on 2016/11/01 by Daniel.Wright Dynamic Indirect Shadows for static meshes using distance fields * These Distance Field indirect shadows use the same tile culled and downsampled framework that capsule shadows use, with similar GPU cost * Individual StaticMesh assets can enable bGenerateMeshDistanceField to compute a distance field, without the memory cost of enabling for the whole project * New StaticMeshComponent properties bCastDynamicIndirectShadow and DynamicIndirectShadowMinVisibility * New WorldSettings property DynamicIndirectShadowsSelfShadowingIntensity which replaces the cvar * The GBuffer now stores HasDynamicIndirectShadowCasterRepresentation instead of HasHeightfieldRepresentation * DFAO from landscape is now done through the global distance field entirely. Landscape contribution to the global distance field is deferred to attempt to workaround texture streaming issues. Change 3182408 on 2016/11/01 by Rolando.Caloca DR - vk - Reworked occlusion queries, fixes flickering on AMD Change 3182585 on 2016/11/01 by Daniel.Wright PS4 compile fix Change 3183151 on 2016/11/02 by Rolando.Caloca DR - vk - Fix issue when processing super quick cmd buffers Change 3183160 on 2016/11/02 by Rolando.Caloca Dr - vk - Call reset queries outside render pass Change 3183182 on 2016/11/02 by Rolando.Caloca DR - Switch clear Change 3183194 on 2016/11/02 by Rolando.Caloca DR - Try to catch crash ahead of time Change 3183268 on 2016/11/02 by Rolando.Caloca DR - vk - Rename RenderPassState to TransitionState Change 3183440 on 2016/11/02 by Daniel.Wright Renamed 'Dynamic Indirect Shadow' to 'Distance Field Indirect Shadow' Change 3183793 on 2016/11/02 by Daniel.Wright Added ShadowResolutionScale to lightcomponent Change 3183796 on 2016/11/02 by Daniel.Wright Improved bSimulatePhysics comment, with info on why it might be greyed out Change 3183797 on 2016/11/02 by Daniel.Wright Precomputed shadowmaps no longer enable Force2To1Aspect, which is only needed for lightmaps. Improves shadowmap utilization. Change 3183915 on 2016/11/02 by Rolando.Caloca DR - vk - Remove redundant renderpasses Change 3183991 on 2016/11/02 by Daniel.Wright Added r.ReflectionEnvironmentLightmapMixLargestWeight, useful for restricting lightmap mixing to darkening only Change 3184001 on 2016/11/02 by Daniel.Wright Better draw event for IndirectCapsuleShadows in stereo Change 3184096 on 2016/11/02 by Chris.Bunner HDR for D3D11 - NVAPI toggle and encoding, UI compositing. Removed some outdated tonemamping cvars and modes. Change 3184399 on 2016/11/02 by Daniel.Wright Static analysis workaround Change 3184455 on 2016/11/02 by Mark.Satterthwaite Fix missing log10 from FCompositePS on hlslcc shader platforms so that QA can continue their integration. #jira UE-38164 Change 3184953 on 2016/11/03 by Chris.Bunner Fixing CIS warnings. [CL 3186011 by Marcus Wassmer in Main branch]
2016-11-03 16:55:27 -04:00
}
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
if (UpdateRegion.UpdateType & VUT_Heightfields)
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3250856 on 2017/01/09 by Daniel.Wright Only showing instruction count for 'Base pass shader' now Change 3250943 on 2017/01/09 by Rolando.Caloca DR - Async Compute PSO creation Change 3251036 on 2017/01/09 by Rolando.Caloca DR - Add r.AsyncPipelineCompile - Dispatch on any thread - Wait for completion event Change 3251058 on 2017/01/09 by Ben.Woodhouse Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN) #jira UE-40332 Change 3251141 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite CL 3243458: D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory. On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage. Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX Change 3251142 on 2017/01/09 by Ben.Woodhouse Duplicated from Fortnite 3243496 memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time. Change 3252323 on 2017/01/10 by Rolando.Caloca DR - Gfx async PSO creation prep Change 3252474 on 2017/01/10 by Daniel.Wright Added 'Compile Unreal Lightmass' to error message Change 3252589 on 2017/01/10 by Daniel.Wright Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite Change 3252790 on 2017/01/10 by Daniel.Wright Added InscatteringColorCubemapAngle to exponential height fog Change 3252843 on 2017/01/10 by Uriel.Doyon Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways. The bound defrag is now done outside of the async work logic. Change 3252866 on 2017/01/10 by Mark.Satterthwaite Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future. #jira UE-40357 Change 3254511 on 2017/01/11 by Rolando.Caloca DR - PSO stats Change 3255958 on 2017/01/12 by Mark.Satterthwaite Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed. #jira UE-40554 Change 3256329 on 2017/01/12 by Olaf.Piesche #jira UE-38615 Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime. Change 3256371 on 2017/01/12 by Uriel.Doyon Reenabled texture streaming bound defrag as the fix is in CL 3252843 Change 3257032 on 2017/01/13 by Daniel.Wright Added fastClamp to fastmath.usf Change 3257111 on 2017/01/13 by Daniel.Wright Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game Change 3257112 on 2017/01/13 by Daniel.Wright DFAO optimizations * Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms) * Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms) * Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms * Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms) Change 3257113 on 2017/01/13 by Daniel.Wright Better distance field memory stats Change 3257326 on 2017/01/13 by Uriel.Doyon Workaround to support cases where several textures have the same lighting GUID. Change 3257448 on 2017/01/13 by Daniel.Wright Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles Change 3257616 on 2017/01/13 by Daniel.Wright Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable Change 3257657 on 2017/01/13 by Daniel.Wright Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU * 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms Change 3258063 on 2017/01/14 by Rolando.Caloca DR - vk - Refactor descriptor set reuse in prep for more changes Change 3258715 on 2017/01/16 by Daniel.Wright Added VisualizeGlobalDistanceField show flag Change 3258827 on 2017/01/16 by Daniel.Wright Global distance field update regions are clipped against others to reduce redundant updates. Change 3258959 on 2017/01/16 by Benjamin.Hyder Updating Planar Reflection example material in TM-Shadermodels Change 3259270 on 2017/01/16 by Daniel.Wright [Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons. Change 3259652 on 2017/01/16 by Uriel.Doyon Better support for static primitive becoming dynamic. Change 3260107 on 2017/01/17 by Ben.Woodhouse Fix FMonitoredProcess to prevent infinite loop in -nothreading mode #jira UE-40717 Change 3260594 on 2017/01/17 by Daniel.Wright Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary) * The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field * Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same. * Adds 12Mb for the new volume textures Change 3260956 on 2017/01/17 by Daniel.Wright Structured buffers for DF object data * Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads Change 3261296 on 2017/01/17 by Daniel.Wright Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms Change 3262036 on 2017/01/18 by Ben.Salem V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details. Change 3262056 on 2017/01/18 by Chris.Bunner Remove inverse tonemapping when rendering HDR output. #jira UE-40728 Change 3262661 on 2017/01/18 by Rolando.Caloca DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs - Fix hash for PSOs Change 3263674 on 2017/01/19 by Chris.Bunner PR #3144: Improved error messages (Contributed by DarkSlot) #jira UE-40835 Change 3264150 on 2017/01/19 by Ben.Woodhouse Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode #jira UE-40841 Change 3264153 on 2017/01/19 by Ben.Woodhouse Integrate latest changes from MS-DX12 CLs 3231395-3262526 - Added WinPixEventRuntime.tps - Includes PIX support, various optimizations (saved 1.3ms in testbed scene) CL 3262343: Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes: - Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case. CL 3231395: Update D3D12 RHI: - Fix deferred MSAA path in RHI - Add Pix3.h support - Cleanup SetName usage and remove it from shipping builds. - Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value. - Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64 - Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version. - Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident. Change 3264251 on 2017/01/19 by Mark.Satterthwaite Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions. #jira UE-40803 Change 3264642 on 2017/01/19 by Daniel.Wright Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096. Change 3265330 on 2017/01/20 by Ben.Salem Stop performance plugin from building in Win32. #tests recompiled and preflighted Change 3265678 on 2017/01/20 by Marcus.Wassmer Fix bad declaration. #3055 Change 3266656 on 2017/01/20 by Mark.Satterthwaite Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging). Duplicate & amend CL #3266053 from Trepka: Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled. Amend & integrate RCO's CL #3197085. Change 3267741 on 2017/01/23 by Rolando.Caloca DR - Detect duplicated shader and pipeline types Change 3268600 on 2017/01/23 by Uriel.Doyon Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings. Integrated CL 3227368 from Orion stream Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months. Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing, Added th MaxEffectiveScreenSize settings in the investigate texture command. Change 3269512 on 2017/01/24 by Richard.Wallis Fix for shader binary cache uncompress data size during internal shader log. Change 3271237 on 2017/01/25 by Ben.Woodhouse D3D12 updateTexture2D crash fix #jira UE-41059 Change 3271564 on 2017/01/25 by Olaf.Piesche #jira UE-40980 #udn 325525 Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe Change 3271594 on 2017/01/25 by Ben.Woodhouse ESRAM support stage 1: Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range. Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR #fyi rolando.caloca Change 3272616 on 2017/01/25 by Rolando.Caloca DR - Update shader version Change 3273138 on 2017/01/26 by Ben.Woodhouse Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main) [CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
View.HeightfieldLightingViewInfo.CompositeHeightfieldsIntoGlobalDistanceField(RHICmdList, Scene, View, MaxOcclusionDistance, GlobalDistanceFieldInfo, ClipmapIndex, UpdateRegion);
}
}
}
}
}
}
}
void ListGlobalDistanceFieldMemory()
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3357411) #lockdown Nick.Penwarden #rb none ========================== MAJOR FEATURES + CHANGES ========================== Change 3244756 on 2017/01/03 by Marcus.Wassmer Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering) Change 3248667 on 2017/01/05 by Olaf.Piesche Resaving default asset because of engine verison issue; maybe unnecessary, but resaving niagara engine content to be sure #jira UE-40160 Change 3249324 on 2017/01/06 by Marcus.Wassmer Resave with an actual version to stop cook warning Change 3249611 on 2017/01/06 by Marcus.Wassmer Just remove warning-causing niagara data for now. Change 3308052 on 2017/02/16 by Rolando.Caloca DR - Check for Vulkan SDK, and only use it if it's newer or the same as the headers we distribute Change 3308109 on 2017/02/16 by Rolando.Caloca DR - Upgrade glslang to 1.0.39.1 Change 3308111 on 2017/02/16 by Rolando.Caloca DR - Update Vulkan distribution to 1.0.39.1 Change 3308153 on 2017/02/16 by Rolando.Caloca DR - Updated glslang libs Change 3308842 on 2017/02/17 by Rolando.Caloca DR - Fixed copy/paste Change 3310007 on 2017/02/17 by Chris.Bunner Back out CL 3221219 - causing MIC generation issues and superseded by CL 3273971. #jira UE-37792 Change 3310154 on 2017/02/17 by Chris.Bunner Assert when attempting to add a custom material attribute already in the base attributes list. Change 3310155 on 2017/02/17 by Chris.Bunner PR #3231: Validate material index before accessing (Contributed by projectgheist) #jira UE-41774, UE-41788 Change 3310162 on 2017/02/17 by Chris.Bunner PR #3252: Added MobileMaterialInterface to UsedMaterials (Contributed by projectgheist) #jira UE-41823, UE-41950 Change 3310176 on 2017/02/17 by Chris.Bunner Merging CL 3233886: AMD HDR support (requires r.AMDSupportsHDRDisplayOutput=1 in ini). Update to AGS 5.0.5. Partial code tidy up. Change 3310187 on 2017/02/17 by Chris.Bunner Preserve constant expressions rather than always casting after translating a material attribute. Losing the notion of constant means we can't correctly detect used properties and falsely enable e.g. PDO. Happened because of the incorrect component masks in BreakMaterialNodes which then had to be downcast to the correct type which is done as an inline fragment rather than swizzle expression. #jira UE-41594 Change 3310215 on 2017/02/17 by Chris.Bunner Prevent SpeedTree node compiling for skeletal meshes (not supported as uses more UV sets than available). More descriptive error for missing Cubemap UV input on TextureSample material node . #jira UE-33098 Change 3310838 on 2017/02/18 by Joe.Graf Moved some private functions to public for a licensee #CodeReview: matt.kuhlenschmidt #rb: n/a Change 3311876 on 2017/02/20 by Rolando.Caloca DR - Expose skin cache cvar r.SkinCache.AccumulationBufferSizeInKB #jira UE-42014 Change 3314139 on 2017/02/21 by Rolando.Caloca DR - Minor cleanup pass - Remove FVulkanPendingState - Renamed some classes for clarity - Hoist pending UAVs for flush out to pending compute state Change 3314642 on 2017/02/21 by Rolando.Caloca DR - Some more renaming Change 3315431 on 2017/02/21 by Ben.Salem Properly set default values for test time out and tick. We now will default to ticking once per second, and tracking the macro stats of GPU/Render/Game thread time. #tests Ran showdown demo several times Change 3316710 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Fix refract intrinsic Change 3316718 on 2017/02/22 by Rolando.Caloca DR - hlslcc - Built libs to pick up change from 3316710 - refract fix Change 3316820 on 2017/02/22 by Benjamin.Hyder updating Tm-TrigNodes map Change 3317192 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317528 on 2017/02/22 by Benjamin.Hyder Updating QA-Decals map Change 3317639 on 2017/02/22 by Benjamin.Hyder Updating Decal on Complex Mesh example in QA-Decals Change 3317764 on 2017/02/22 by Benjamin.Hyder Final updates to QA-Decals Change 3318319 on 2017/02/22 by Rolando.Caloca DR - minor reorg/rename Change 3318379 on 2017/02/22 by Rolando.Caloca DR - more cleanup Change 3321181 on 2017/02/24 by Rolando.Caloca DR - Fix GL bug Change 3321247 on 2017/02/24 by Rolando.Caloca DR - Fix misc bugs Change 3321898 on 2017/02/24 by Chris.Bunner Only issue clear TLV dispatch if required. #jira UERNDR-193 Change 3321904 on 2017/02/24 by Chris.Bunner Added comment for potential future optimization. Change 3322013 on 2017/02/24 by Uriel.Doyon Fixed separate translucency being affected by Gaussian DOF #jira UE-40489 Change 3322517 on 2017/02/24 by Uriel.Doyon Fixed issue with InvestigateTexture command removing budget limit. Fixed StreamingBounds show flag not working. It nows shows the streaming bound for the currently selected textures. #jira UE-40485 Change 3323470 on 2017/02/27 by Chad.Garyet Removing DDC job from dev-rendering Change 3323479 on 2017/02/27 by Chad.Garyet Removing RDU agent type Change 3323519 on 2017/02/27 by Chad.Garyet removing NCL/LHR/SEA agent types to clean up space Change 3323639 on 2017/02/27 by Benjamin.Hyder More updates to QA-Decals Change 3324207 on 2017/02/27 by Uriel.Doyon Fixed typo ScaleTexturesByGlobalMyBias -> ScaleTexturesByGlobalMipBias Removed bad merge in FStreamingTextureLevelContext::GetBuildDataIndexRef Change 3324396 on 2017/02/27 by Uriel.Doyon Fixed an issue with the Streaming Bounds show flag interferring with the static level data initialization #jira UE-40485 Change 3325227 on 2017/02/28 by Chris.Bunner Fix-up AMD AGS libs. Change 3325566 on 2017/02/28 by Uriel.Doyon Fixed possible out-of-bound access in GetUsedTexture() when passing ERHIFeatureLevel::Num Change 3326009 on 2017/02/28 by Uriel.Doyon Better fix for 3325566, as the previous fix would ignore the material instance overrides. Change 3327058 on 2017/03/01 by Benjamin.Hyder Preparing TM_Shadermodels map for automation Change 3328222 on 2017/03/01 by Chris.Bunner Prevent decals from drawing in separate translucency pass. Whilst user control and material relevance were already removed, if the flag was checked before being disabled (by swapping to decal domain) this was still being read in the render loop, now explicitly ignores decals. #jira UE-42449, UE-42446 Change 3329848 on 2017/03/02 by Uriel.Doyon Added some extra logs to help track UE-42168 Change 3329977 on 2017/03/02 by Rolando.Caloca DR - Fix bad clear value Change 3330008 on 2017/03/02 by Benjamin.Hyder More preparations for QA-Decals automation Change 3330754 on 2017/03/02 by Daniel.Wright Prominent comment explaining reflection env async compute usage and why it's not overlapped with anything Change 3331451 on 2017/03/03 by Marc.Olano Manually unroll simplex noise loop to avoid PSO bug on AMD/Metal Change 3331839 on 2017/03/03 by Rolando.Caloca DR - hlslcc - add missing file to project Change 3332247 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel PR #3305 #jira UE-42393 Change 3332259 on 2017/03/03 by Rolando.Caloca DR - Fix bad index into pixel formats PR #3237 #jira UE-41855 Change 3332305 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers PR #3271 #jira UE-32618 Change 3332313 on 2017/03/03 by Rolando.Caloca DR - Fix for integrated intel (properly) PR #3305 #jira UE-42393 Change 3332317 on 2017/03/03 by Rolando.Caloca DR - OpenGL SRV for index buffers (properly) PR #3271 #jira UE-32618 Change 3332368 on 2017/03/03 by Rolando.Caloca DR - Minor fixes so -sm4 and -sm5 can be used on windows with OpenGL/Vulkan Change 3333690 on 2017/03/06 by Daniel.Wright [Copy] Changing movable skylight properties no longer affects static draw lists Change 3333693 on 2017/03/06 by Daniel.Wright [Copy] Added 'r.AOListMeshDistanceFields' which dumps out mesh distance fields sorted by memory size, useful for directing content optimizations Change 3333705 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance fields are now 8 bit fixed point by default, but can be changed back to 16 bit floating piont with a project setting. * 8 bit uses half memory but introduces error for thin surfaces or large meshes. Change 3333721 on 2017/03/06 by David.Hill DecalProxy: Copy float FadeScreenSize to FDeferredDecalProxy for use in the render thread. This avoids pointer chasing to the UDecalComponent (game thread component). Change 3333772 on 2017/03/06 by Daniel.Wright [Copy] Scene motion blur data is only updated for the main renderer frames. Fixes scene captures and planar reflections breaking object motion blur. Change 3333790 on 2017/03/06 by Daniel.Wright [Copy] Mesh distance field generation uses Embree, for a 2.5x speedup * Can switch back to old kDOP generation with 'r.DistanceFieldBuild.UseEmbree 0' for debugging Change 3333822 on 2017/03/06 by Daniel.Wright [Copy] Moved mesh distance field code into MeshDistanceFieldUtilities.cpp * Moved FMeshUtilities to its own header so the 8k line MeshUtilites.cpp file can be further split up Change 3333827 on 2017/03/06 by Daniel.Wright [Copy] Range compress 8bit distance fields - gets one extra bit of precision on average Change 3333828 on 2017/03/06 by Daniel.Wright [Copy] Raised High ShadowQuality to 2048 as 1024 for CSM is way too low Change 3333831 on 2017/03/06 by Daniel.Wright Non-editor compile fix Change 3333836 on 2017/03/06 by Daniel.Wright [Copy] Workaround for gobal distance field volume textures being bloated by 4x on PS4 due to the recommended tiling modes. They now use a 2d tiling mode which avoids the bloat, saving 96Mb. Change 3333843 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionExponent to skylight component * Useful for brightening up indoors without losing contact shadows as MinOcclusion does Change 3333845 on 2017/03/06 by Daniel.Wright [Copy] Capsule shadow BP functions Change 3333850 on 2017/03/06 by Daniel.Wright [Copy] Added OcclusionCombineMode to skylight component Change 3333854 on 2017/03/06 by Daniel.Wright [Copy] Gnm properly registers clears as GPU work so those events show up in profilegpu Change 3333857 on 2017/03/06 by Daniel.Wright [Copy] Clear light attenuation for local lights with a quad covering their screen extents * Clearing the entire light attenuation buffer costs .1ms on PS4. This optimization lowers the minimum cost of a shadow casting light from .15ms -> .03ms. * Shadowed lights in Fortnite with 25 lights 3.7ms -> 1.42ms on PS4 Change 3333860 on 2017/03/06 by Daniel.Wright [Copy] Flush deferred deletes when reallocating distance field atlas to reduce peak memory Change 3333861 on 2017/03/06 by Daniel.Wright [Copy] Disable all distance field features on Intel cards as HD 4000 hangs in the RHICreateTexture3D call to allocate the large atlas Change 3333869 on 2017/03/06 by Daniel.Wright [Copy] Volumetric Fog using a volume texture mapped to the camera frustum * Volumetric fog can be enabled on an Exponential Height Fog component with additional controls * Lights have a VolumetricScatteringIntensity * New cvars r.VolumetricFog, r.VolumetricFog.GridPixelSize, r.VolumetricFog.GridSizeZ, r.VolumetricFog.DepthDistributionScale * Lighting features supported: * Directional light with CSM and a light function * Point / spot lights without shadows / light functions / IES profiles * Skylight with occlusion from distance fields * Analytical height fog covers the view range past where the volumetric fog ends * Temporal reprojection is used on the volumetric fog scattering and extinction to achieve stability * Translucency integrates properly into volumetric fog * Height fog StartDistance is not supported by volumetric fog and should be set to 0. Change 3333894 on 2017/03/06 by Daniel.Wright [Copy] Initialize GDummyVolumetricFogGlobalDataUniformBuffer outside of parallel rendering Change 3333902 on 2017/03/06 by Daniel.Wright [Copy] Better handling of volumetric fog enabled with distance of 0 Change 3333903 on 2017/03/06 by Daniel.Wright [Copy] Fixed volumetric fog trying to render light functions for a point light Change 3333908 on 2017/03/06 by Daniel.Wright [Copy] Volumetric materials * Added new material domain Volume, which can output Scattering, Absorption and Emissive. All properties are in world space densities. * Particle systems using the Volume domain are voxelized based on their ParticlePosition and ParticleRadius * Volumetric fog integration is now energy conservative - scattering is integrated against transmission over the depth of each slice. * Added bOverrideLightColorsWithFogInscatteringColors to exponential height fog, which can be enabled to make Volumetric Fog match Height fog more closely Change 3334134 on 2017/03/06 by Daniel.Wright [Copy from Michael Trepka] Added Embree 2.14.0 and changed MeshUtilities to use it as this solves issues with Embree leaking TLS keys. UnrealLightmass is still using older Embree 2.7.0 until we can find time to properly test it with the new version. Also, invalidated distance field DDC to force it to rebuild with updated Embree. Change 3334420 on 2017/03/06 by Daniel.Wright Fixed RTDF shadows Change 3335467 on 2017/03/07 by Benjamin.Hyder Initial submission of QA-Decals map to EngineTest Change 3335556 on 2017/03/07 by Daniel.Wright Changed mesh distance field default format back to R16f Change 3338020 on 2017/03/08 by Daniel.Wright Disable volumetric fog in vertex shaders for feature levels which don't support it Change 3339394 on 2017/03/09 by Chris.Bunner Correctly handle material texture translation error edge case. #jira UE-42579, UE-42670 Change 3339992 on 2017/03/09 by Daniel.Wright Only compile volumetric fog shaders on supporting platforms Change 3341858 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) #RB Rolando.Caloca, Marcus.Wassmer, Daniel.Wright, Nick.Penwarden, Mark.Satterthwaite Change 3342004 on 2017/03/10 by Arne.Schober Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering) Fix unity build #RB Marcus.Wassmer Change 3343307 on 2017/03/13 by Marcus.Wassmer Update showflags when we are guaranteed it will happen in all possible ways to spawn the scenecapture. (drag into editor, PIE, -game, etc) Change 3343732 on 2017/03/13 by Rolando.Caloca DR - Vulkan compute pipeline & refactor Change 3344846 on 2017/03/14 by Rolando.Caloca DR - Android compile fixes Change 3344883 on 2017/03/14 by Rolando.Caloca DR - Add missing stencil load/store to PSO initializer Change 3344985 on 2017/03/14 by Rolando.Caloca DR - Made load/store actions uint8 Change 3345141 on 2017/03/14 by Rolando.Caloca DR - vk - Rework render pass hash Change 3345304 on 2017/03/14 by Benjamin.Hyder Updating TM-Distancefields map to include TemplateFloor mesh Change 3345387 on 2017/03/14 by Rolando.Caloca DR - Add _RenderThread calls for Create*Shader so RHIs can choose not to stall when creating Change 3345388 on 2017/03/14 by Rolando.Caloca DR - Do not stall when creating shaders on Vulkan Change 3345722 on 2017/03/14 by Chris.Bunner PR #3357: MinimalAPI add to many material expressions (Contributed by DeanoC) #jira UE-42752 Change 3345723 on 2017/03/14 by Chris.Bunner Reduce log verbosity causing spamming during landscape editing. #jira UE-42714 Change 3345725 on 2017/03/14 by Chris.Bunner [Duplicate 3341860] Fixed material translation error with multiple connections from custom interpolator nodes. Change 3345726 on 2017/03/14 by Chris.Bunner Typo fixes. Change 3345732 on 2017/03/14 by Rolando.Caloca DR - Decouple vertex declaration off BSS Change 3345746 on 2017/03/14 by Chris.Bunner Added sign() intrinsic material graph node and delisted material function workaround. Change 3346042 on 2017/03/14 by Chris.Bunner Implement missing size query interface for FRenderTargetResources. #jira UE-41672 Change 3346387 on 2017/03/14 by Daniel.Wright [Copy] Added VolumetricScatteringIntensity to particle lights Change 3346389 on 2017/03/14 by Daniel.Wright [Copy] Clamp Volumetric material attributes to fp16 range to avoid INFs Disable volumetric fog when the fog show flag is disabled Change 3346392 on 2017/03/14 by Daniel.Wright [Copy] Fixed skylight being much too bright on volumetric fog Change 3346406 on 2017/03/14 by Daniel.Wright [Copy] CSM resolution is now controlled by r.Shadow.MaxCSMResolution. * Changed HighPC to use 1024 MaxShadowResolution (max for all non-CSM shadows), saves 60Mb in Fortnite Change 3346412 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for translucency lighting 3d textures, saves 13Mb Change 3346414 on 2017/03/14 by Daniel.Wright [Copy] TexCreate_ReduceMemoryWithTilingMode for volumetric fog 3d textures, saves 13Mb Change 3346415 on 2017/03/14 by Daniel.Wright [Copy] Missing file from cl 3338451 Change 3346421 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaNs in volumetric fog due to rendering when height fog is disabled * Volumetric fog converts NaNs to black now so they don't spread Change 3346422 on 2017/03/14 by Daniel.Wright [Copy] Fixed NaN in volumetric fog with low density values Change 3346423 on 2017/03/14 by Daniel.Wright [Copy] Changed default VolumetricFogScatteringDistribution to .2 Change 3346430 on 2017/03/14 by Daniel.Wright [Copy] New translucent material option to compute fog per pixel instead of the default per vertex Change 3346432 on 2017/03/14 by Daniel.Wright [Copy] Moved Volumetric Fog parameters to view uniform buffer for translucency pass Fixed lifetimes of temporary Volumetric Fog render targets Change 3346526 on 2017/03/14 by Daniel.Wright [Copy] Volumetric Fog supports point and spot light shadows * These lights are injected separately so that per-light resources can be bound (shadow depth map, static shadow depth map) * Forward lighting of local lights can be forced with 'r.VolumetricFog.InjectShadowedLightsSeparately 0' * Shadowed lights come at a cost: 2.9ms for volumetric fog on 970 -> 4.2ms with shadowing Change 3347053 on 2017/03/15 by Rolando.Caloca DR - android compile fix Change 3347384 on 2017/03/15 by Rolando.Caloca DR - Fix merge issue Change 3347643 on 2017/03/15 by Marcus.Wassmer Fix some bugs with the 'disable stationary skylight ffor the project' feature. Fixes lighting in Persona on Paragon. Change 3347979 on 2017/03/15 by Rolando.Caloca DR - Allow to automatically apply cached rendertargets to PSO initializer Change 3348024 on 2017/03/15 by Rolando.Caloca DR - Remove NullPS on Vulkan to avoid deadlock Change 3348303 on 2017/03/15 by Rolando.Caloca DR - Fix for debugging SCW with material SRT Change 3348357 on 2017/03/15 by Marcus.Wassmer Fix stencildither and a stencilref bug that was probably breaking decals sometimes. Change 3348549 on 2017/03/15 by Marcus.Wassmer Hopefully fix static analysis for potential nullptr access. Change 3348614 on 2017/03/15 by Marcus.Wassmer Duplicate some switch changes to fix crash on launch. Change 3349369 on 2017/03/16 by Gil.Gribb Fixed botched merge Change 3349947 on 2017/03/16 by Rolando.Caloca DR - Fix for mismatched primitive type Change 3349956 on 2017/03/16 by Benjamin.Hyder initial updates to TM-DistanceFields map Change 3350151 on 2017/03/16 by Rolando.Caloca DR - Fix UT compile issue Change 3350155 on 2017/03/16 by Rolando.Caloca DR - Catch mismatched primitive type on PSOs on D3D11 Change 3350192 on 2017/03/16 by Daniel.Wright Fix for point light shadow depths rendering with wrong cull mode due to PSO refactor Change 3350736 on 2017/03/16 by Daniel.Wright Fixed formatting from merge Change 3350881 on 2017/03/16 by Rolando.Caloca DR - Fix texture arrays as UAVs on Metal Change 3350927 on 2017/03/16 by Rolando.Caloca DR - Fix warning Change 3350935 on 2017/03/16 by Daniel.Wright Fix for materials with non-Surface domains being skipped in mesh passes Change 3351583 on 2017/03/17 by Marcus.Wassmer Fix clang platforms Change 3351917 on 2017/03/17 by Marcus.Wassmer Fix linux compile Change 3351973 on 2017/03/17 by Marcus.Wassmer Fix mismatched rendertargetformat Change 3352038 on 2017/03/17 by Daniel.Wright Enabled GetAndOrCreateGraphicsPipelineState ensures in Development for testing Change 3352110 on 2017/03/17 by Marcus.Wassmer Fix missing RT PSO apply Change 3352695 on 2017/03/17 by Arne.Schober DR - Remove PSO Rendertarget check in DX12 Resolve with Shader. #RB Rolando.Caloca Change 3352960 on 2017/03/17 by Arne.Schober DR - Fix some things that slipped trough the PSO merge #RB none Change 3353150 on 2017/03/18 by Rolando.Caloca DR - compile fix Change 3353205 on 2017/03/18 by Arne.Schober DR - Fix Incremental Compile and PS4 runtime error where CMASK is not allowed for ThickTile Mode #RB none Change 3353207 on 2017/03/18 by Arne.Schober DR - Fix Confusion #RB none Change 3355183 on 2017/03/20 by Nick.Bullard Fixed up Content orginzation for Decals automation tests in EngineTest Change 3355627 on 2017/03/20 by Arne.Schober DR - [UE-43094] - removed ensure in comporiton graph as control of the clear color cannot be gurantueed. Change 3356342 on 2017/03/21 by Marcus.Wassmer Fix clang errors Change 3356591 on 2017/03/21 by Arne.Schober DR - Fix ensure message #RB none Change 3356873 on 2017/03/21 by Arne.Schober DR - Fix comparission of undefined values in RendertargetApply Check Change 3357261 on 2017/03/21 by Marcus.Wassmer Fix LinuxEditor compile Change 3357294 on 2017/03/21 by Marcus.Wassmer Add missing SSE functions Change 3357351 on 2017/03/21 by Frank.Fella Fix win32 and linux compiler errors Change 3357370 on 2017/03/21 by Arne.Schober DR - disable ensure in test builds #RB Marcus.Wassmer [CL 3357449 by Marcus Wassmer in Main branch]
2017-03-21 17:46:52 -04:00
UE_LOG(LogRenderer, Log, TEXT(" Global DF culled objects %.3fMb"), (GGlobalDistanceFieldCulledObjectBuffers.Buffers.GetSizeBytes() + GObjectGridBuffers.GetSizeBytes()) / 1024.0f / 1024.0f);
}