2019-12-26 14:45:42 -05:00
// Copyright Epic Games, Inc. All Rights Reserved.
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
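The counting step in this change can be sketched as follows. This is a minimal illustration only, assuming that "valid" means any format other than DXGI_UNKNOWN and that the count must extend to the highest valid slot. The `Format` enum and `ComputeNumValidRenderTargets` are hypothetical stand-ins, not the D3D12RHI code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for DXGI_FORMAT; only the Unknown value matters here.
enum class Format : std::uint32_t
{
    Unknown = 0,
    R16G16B16A16_FLOAT = 10,
    R8G8B8A8_UNORM = 28
};

constexpr std::size_t MaxRenderTargets = 8;

// Returns one past the index of the last slot with a known format, so a
// descriptor whose slots are all Unknown reports zero render targets.
inline std::uint32_t ComputeNumValidRenderTargets(
    const std::array<Format, MaxRenderTargets>& Formats,
    std::uint32_t NumRenderTargets)
{
    std::uint32_t NumValid = 0;
    for (std::uint32_t Index = 0; Index < NumRenderTargets && Index < MaxRenderTargets; ++Index)
    {
        if (Formats[Index] != Format::Unknown)
        {
            NumValid = Index + 1;
        }
    }
    return NumValid;
}
```

With every slot set to Unknown the function returns 0 even when NumRenderTargets is 4, which is the case the change describes.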
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
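One common source of waste in a buddy allocator is power-of-two rounding: each request is served from a block whose size is the next power of two. The sketch below shows only that arithmetic. Whether rounding is the dominant cause of the waste measured above is not stated in the change, and `BuddyBlockSize` and `BuddyWaste` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Size up to the block a buddy allocator would hand out, assuming
// MinBlockSize is a power of two and blocks double in size per level.
inline std::uint64_t BuddyBlockSize(std::uint64_t Size, std::uint64_t MinBlockSize)
{
    assert(Size > 0 && MinBlockSize > 0);
    std::uint64_t Block = MinBlockSize;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to rounding for a single allocation.
inline std::uint64_t BuddyWaste(std::uint64_t Size, std::uint64_t MinBlockSize)
{
    return BuddyBlockSize(Size, MinBlockSize) - Size;
}
```

Under these assumptions a 5KB request with a 256-byte minimum block occupies an 8KB block and wastes 3KB, while a request of exactly 4KB wastes nothing.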
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
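The per-object tile list from the first bullet can be sketched on the CPU as below. This is an illustration under stated assumptions, not the shader code: objects are represented by inclusive screen-space pixel bounds, and `ScreenRect`, `TileIntersection` and `BuildTileIntersections` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct ScreenRect { int MinX, MinY, MaxX, MaxY; };  // inclusive pixel bounds
struct TileIntersection { std::uint32_t ObjectIndex; std::uint32_t TileIndex; };

// Emits one entry per (object, tile) overlap, iterating objects in the outer
// loop. Each entry could then be given its own unit of cone tracing work.
inline std::vector<TileIntersection> BuildTileIntersections(
    const std::vector<ScreenRect>& ObjectBounds, int ViewWidth, int ViewHeight, int TileSize)
{
    const int TilesX = (ViewWidth + TileSize - 1) / TileSize;
    std::vector<TileIntersection> Result;
    for (std::uint32_t Object = 0; Object < ObjectBounds.size(); ++Object)
    {
        const ScreenRect& R = ObjectBounds[Object];
        // Skip objects entirely outside the view.
        if (R.MaxX < 0 || R.MaxY < 0 || R.MinX >= ViewWidth || R.MinY >= ViewHeight)
        {
            continue;
        }
        const int TileMinX = std::max(R.MinX, 0) / TileSize;
        const int TileMaxX = std::min(R.MaxX, ViewWidth - 1) / TileSize;
        const int TileMinY = std::max(R.MinY, 0) / TileSize;
        const int TileMaxY = std::min(R.MaxY, ViewHeight - 1) / TileSize;
        for (int Y = TileMinY; Y <= TileMaxY; ++Y)
        {
            for (int X = TileMinX; X <= TileMaxX; ++X)
            {
                Result.push_back({ Object, static_cast<std::uint32_t>(Y * TilesX + X) });
            }
        }
    }
    return Result;
}
```

The output size is proportional to the number of actual overlaps, so a small object contributes a handful of entries rather than being tested against every tile.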
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
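The fence reuse bug in CL 3231395 can be illustrated with a toy model. The sketch assumes the usual fence contract, where a wait on value V is satisfied once the completed value is at least V. `ToyFence` and its members are hypothetical and do not mirror the D3D12RHI classes.

```cpp
#include <cstdint>

// Toy model of a GPU fence with a monotonically increasing value.
struct ToyFence
{
    std::uint64_t Completed = 0;  // last value the GPU has reached
    std::uint64_t NextValue = 0;  // last value handed out by Signal()

    // Hands out strictly increasing values, so no value is ever reused
    // and the fence never needs resetting to an earlier value.
    std::uint64_t Signal() { return ++NextValue; }

    // Called when the GPU reaches a previously signalled value.
    void GpuReached(std::uint64_t Value)
    {
        if (Value > Completed)
        {
            Completed = Value;
        }
    }

    // A wait on Value is satisfied once Completed has caught up with it.
    bool IsComplete(std::uint64_t Value) const { return Completed >= Value; }
};
```

In this model a wait on 0 is satisfied from the start, which is why waiting on 0 gave no synchronization at all.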
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
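The allocation scheme described in this change can be sketched as a free list of physical pages plus a per-resource page table. This is a simplified CPU-side illustration with hypothetical names (`ToyPageAllocator`). It does not use any platform API and ignores the deferred deallocation issue noted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

class ToyPageAllocator
{
public:
    explicit ToyPageAllocator(std::uint32_t NumPages)
    {
        for (std::uint32_t Page = NumPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Takes any Count free pages, contiguous or not. The returned vector acts
    // as the page table: entry i is the physical page backing virtual page i,
    // so the resource still sees one contiguous virtual range.
    std::optional<std::vector<std::uint32_t>> Allocate(std::size_t Count)
    {
        if (Count > FreePages.size())
        {
            return std::nullopt;
        }
        const auto First = FreePages.end() - static_cast<std::ptrdiff_t>(Count);
        std::vector<std::uint32_t> PageTable(First, FreePages.end());
        FreePages.erase(First, FreePages.end());
        return PageTable;
    }

    // Returns the pages to the free list when the resource is destroyed.
    void Free(const std::vector<std::uint32_t>& PageTable)
    {
        FreePages.insert(FreePages.end(), PageTable.begin(), PageTable.end());
    }

    std::size_t NumFree() const { return FreePages.size(); }

private:
    std::vector<std::uint32_t> FreePages;
};
```

Because any free pages can back an allocation, fragmentation of the physical page set does not block a request as long as enough pages are free in total.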
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
/*=============================================================================
	DistanceFieldObjectCulling.cpp
=============================================================================*/

#include "DistanceFieldAmbientOcclusion.h"
#include "DeferredShadingRenderer.h"
#include "PostProcess/PostProcessing.h"
#include "PostProcess/SceneFilterRendering.h"
#include "DistanceFieldLightingShared.h"
#include "ScreenRendering.h"
#include "DistanceFieldLightingPost.h"
#include "OneColorShader.h"
#include "GlobalDistanceField.h"
#include "FXSystem.h"
#include "PostProcess/PostProcessSubsurface.h"
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3357411)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3244756 on 2017/01/03 by Marcus.Wassmer
Copying //Tasks/UE4/Dev-Niagara@3244743 to Dev-Rendering (//UE4/Dev-Rendering)
Change 3248667 on 2017/01/05 by Olaf.Piesche
Resaving default asset because of engine version issue; maybe unnecessary, but resaving niagara engine content to be sure
#jira UE-40160
Change 3249324 on 2017/01/06 by Marcus.Wassmer
Resave with an actual version to stop cook warning
Change 3249611 on 2017/01/06 by Marcus.Wassmer
Just remove warning-causing niagara data for now.
Change 3308052 on 2017/02/16 by Rolando.Caloca
DR - Check for Vulkan SDK, and only use it if it's newer or the same as the headers we distribute
Change 3308109 on 2017/02/16 by Rolando.Caloca
DR - Upgrade glslang to 1.0.39.1
Change 3308111 on 2017/02/16 by Rolando.Caloca
DR - Update Vulkan distribution to 1.0.39.1
Change 3308153 on 2017/02/16 by Rolando.Caloca
DR - Updated glslang libs
Change 3308842 on 2017/02/17 by Rolando.Caloca
DR - Fixed copy/paste
Change 3310007 on 2017/02/17 by Chris.Bunner
Back out CL 3221219 - causing MIC generation issues and superseded by CL 3273971.
#jira UE-37792
Change 3310154 on 2017/02/17 by Chris.Bunner
Assert when attempting to add a custom material attribute already in the base attributes list.
Change 3310155 on 2017/02/17 by Chris.Bunner
PR #3231: Validate material index before accessing (Contributed by projectgheist)
#jira UE-41774, UE-41788
Change 3310162 on 2017/02/17 by Chris.Bunner
PR #3252: Added MobileMaterialInterface to UsedMaterials (Contributed by projectgheist)
#jira UE-41823, UE-41950
Change 3310176 on 2017/02/17 by Chris.Bunner
Merging CL 3233886: AMD HDR support (requires r.AMDSupportsHDRDisplayOutput=1 in ini).
Update to AGS 5.0.5.
Partial code tidy up.
Change 3310187 on 2017/02/17 by Chris.Bunner
Preserve constant expressions rather than always casting after translating a material attribute. Losing the notion of constant means we can't correctly detect used properties and falsely enable e.g. PDO. Happened because of the incorrect component masks in BreakMaterialNodes which then had to be downcast to the correct type which is done as an inline fragment rather than swizzle expression.
#jira UE-41594
Change 3310215 on 2017/02/17 by Chris.Bunner
Prevent SpeedTree node compiling for skeletal meshes (not supported as uses more UV sets than available).
More descriptive error for missing Cubemap UV input on TextureSample material node.
#jira UE-33098
Change 3310838 on 2017/02/18 by Joe.Graf
Moved some private functions to public for a licensee
#CodeReview: matt.kuhlenschmidt
#rb: n/a
Change 3311876 on 2017/02/20 by Rolando.Caloca
DR - Expose skin cache cvar r.SkinCache.AccumulationBufferSizeInKB
#jira UE-42014
Change 3314139 on 2017/02/21 by Rolando.Caloca
DR - Minor cleanup pass
- Remove FVulkanPendingState
- Renamed some classes for clarity
- Hoist pending UAVs for flush out to pending compute state
Change 3314642 on 2017/02/21 by Rolando.Caloca
DR - Some more renaming
Change 3315431 on 2017/02/21 by Ben.Salem
Properly set default values for test time out and tick. We now will default to ticking once per second, and tracking the macro stats of GPU/Render/Game thread time.
#tests Ran showdown demo several times
Change 3316710 on 2017/02/22 by Rolando.Caloca
DR - hlslcc - Fix refract intrinsic
Change 3316718 on 2017/02/22 by Rolando.Caloca
DR - hlslcc - Built libs to pick up change from 3316710 - refract fix
Change 3316820 on 2017/02/22 by Benjamin.Hyder
updating Tm-TrigNodes map
Change 3317192 on 2017/02/22 by Benjamin.Hyder
Updating QA-Decals map
Change 3317528 on 2017/02/22 by Benjamin.Hyder
Updating QA-Decals map
Change 3317639 on 2017/02/22 by Benjamin.Hyder
Updating Decal on Complex Mesh example in QA-Decals
Change 3317764 on 2017/02/22 by Benjamin.Hyder
Final updates to QA-Decals
Change 3318319 on 2017/02/22 by Rolando.Caloca
DR - minor reorg/rename
Change 3318379 on 2017/02/22 by Rolando.Caloca
DR - more cleanup
Change 3321181 on 2017/02/24 by Rolando.Caloca
DR - Fix GL bug
Change 3321247 on 2017/02/24 by Rolando.Caloca
DR - Fix misc bugs
Change 3321898 on 2017/02/24 by Chris.Bunner
Only issue clear TLV dispatch if required.
#jira UERNDR-193
Change 3321904 on 2017/02/24 by Chris.Bunner
Added comment for potential future optimization.
Change 3322013 on 2017/02/24 by Uriel.Doyon
Fixed separate translucency being affected by Gaussian DOF
#jira UE-40489
Change 3322517 on 2017/02/24 by Uriel.Doyon
Fixed issue with InvestigateTexture command removing budget limit.
Fixed StreamingBounds show flag not working. It now shows the streaming bound for the currently selected textures.
#jira UE-40485
Change 3323470 on 2017/02/27 by Chad.Garyet
Removing DDC job from dev-rendering
Change 3323479 on 2017/02/27 by Chad.Garyet
Removing RDU agent type
Change 3323519 on 2017/02/27 by Chad.Garyet
removing NCL/LHR/SEA agent types to clean up space
Change 3323639 on 2017/02/27 by Benjamin.Hyder
More updates to QA-Decals
Change 3324207 on 2017/02/27 by Uriel.Doyon
Fixed typo ScaleTexturesByGlobalMyBias -> ScaleTexturesByGlobalMipBias
Removed bad merge in FStreamingTextureLevelContext::GetBuildDataIndexRef
Change 3324396 on 2017/02/27 by Uriel.Doyon
Fixed an issue with the Streaming Bounds show flag interfering with the static level data initialization
#jira UE-40485
Change 3325227 on 2017/02/28 by Chris.Bunner
Fix-up AMD AGS libs.
Change 3325566 on 2017/02/28 by Uriel.Doyon
Fixed possible out-of-bound access in GetUsedTexture() when passing ERHIFeatureLevel::Num
Change 3326009 on 2017/02/28 by Uriel.Doyon
Better fix for 3325566, as the previous fix would ignore the material instance overrides.
Change 3327058 on 2017/03/01 by Benjamin.Hyder
Preparing TM_Shadermodels map for automation
Change 3328222 on 2017/03/01 by Chris.Bunner
Prevent decals from drawing in separate translucency pass. Whilst user control and material relevance were already removed, if the flag was checked before being disabled (by swapping to decal domain) this was still being read in the render loop, now explicitly ignores decals.
#jira UE-42449, UE-42446
Change 3329848 on 2017/03/02 by Uriel.Doyon
Added some extra logs to help track UE-42168
Change 3329977 on 2017/03/02 by Rolando.Caloca
DR - Fix bad clear value
Change 3330008 on 2017/03/02 by Benjamin.Hyder
More preparations for QA-Decals automation
Change 3330754 on 2017/03/02 by Daniel.Wright
Prominent comment explaining reflection env async compute usage and why it's not overlapped with anything
Change 3331451 on 2017/03/03 by Marc.Olano
Manually unroll simplex noise loop to avoid PSO bug on AMD/Metal
Change 3331839 on 2017/03/03 by Rolando.Caloca
DR - hlslcc - add missing file to project
Change 3332247 on 2017/03/03 by Rolando.Caloca
DR - Fix for integrated intel
PR #3305
#jira UE-42393
Change 3332259 on 2017/03/03 by Rolando.Caloca
DR - Fix bad index into pixel formats
PR #3237
#jira UE-41855
Change 3332305 on 2017/03/03 by Rolando.Caloca
DR - OpenGL SRV for index buffers
PR #3271
#jira UE-32618
Change 3332313 on 2017/03/03 by Rolando.Caloca
DR - Fix for integrated intel (properly)
PR #3305
#jira UE-42393
Change 3332317 on 2017/03/03 by Rolando.Caloca
DR - OpenGL SRV for index buffers (properly)
PR #3271
#jira UE-32618
Change 3332368 on 2017/03/03 by Rolando.Caloca
DR - Minor fixes so -sm4 and -sm5 can be used on windows with OpenGL/Vulkan
Change 3333690 on 2017/03/06 by Daniel.Wright
[Copy] Changing movable skylight properties no longer affects static draw lists
Change 3333693 on 2017/03/06 by Daniel.Wright
[Copy] Added 'r.AOListMeshDistanceFields' which dumps out mesh distance fields sorted by memory size, useful for directing content optimizations
Change 3333705 on 2017/03/06 by Daniel.Wright
[Copy] Mesh distance fields are now 8 bit fixed point by default, but can be changed back to 16 bit floating point with a project setting.
* 8 bit uses half memory but introduces error for thin surfaces or large meshes.
Change 3333721 on 2017/03/06 by David.Hill
DecalProxy:
Copy float FadeScreenSize to FDeferredDecalProxy for use in the render thread. This avoids pointer chasing to the UDecalComponent (game thread component).
Change 3333772 on 2017/03/06 by Daniel.Wright
[Copy] Scene motion blur data is only updated for the main renderer frames. Fixes scene captures and planar reflections breaking object motion blur.
Change 3333790 on 2017/03/06 by Daniel.Wright
[Copy] Mesh distance field generation uses Embree, for a 2.5x speedup
* Can switch back to old kDOP generation with 'r.DistanceFieldBuild.UseEmbree 0' for debugging
Change 3333822 on 2017/03/06 by Daniel.Wright
[Copy] Moved mesh distance field code into MeshDistanceFieldUtilities.cpp
* Moved FMeshUtilities to its own header so the 8k line MeshUtilities.cpp file can be further split up
Change 3333827 on 2017/03/06 by Daniel.Wright
[Copy] Range compress 8bit distance fields - gets one extra bit of precision on average
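Range compression of this kind can be sketched as quantizing against the range a mesh actually uses instead of a fixed nominal range. The code below is a minimal illustration under that assumption. `DistanceRange`, `EncodeDistance` and `DecodeDistance` are hypothetical names, and the engine's actual encoding may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Range of distances actually present in one mesh's distance field.
struct DistanceRange { float Min; float Max; };

// Maps Distance into [0, 255] across the stored range.
inline std::uint8_t EncodeDistance(float Distance, DistanceRange Range)
{
    assert(Range.Max > Range.Min);
    const float Normalized = (Distance - Range.Min) / (Range.Max - Range.Min);
    const float Clamped = std::min(std::max(Normalized, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround(Clamped * 255.0f));
}

// Inverse mapping; the result is accurate to half a quantization step.
inline float DecodeDistance(std::uint8_t Encoded, DistanceRange Range)
{
    return Range.Min + (static_cast<float>(Encoded) / 255.0f) * (Range.Max - Range.Min);
}
```

If a mesh only spans half of the nominal range, each of the 256 steps covers half the distance, which is equivalent to one extra bit of precision.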
Change 3333828 on 2017/03/06 by Daniel.Wright
[Copy] Raised High ShadowQuality to 2048 as 1024 for CSM is way too low
Change 3333831 on 2017/03/06 by Daniel.Wright
Non-editor compile fix
Change 3333836 on 2017/03/06 by Daniel.Wright
[Copy] Workaround for global distance field volume textures being bloated by 4x on PS4 due to the recommended tiling modes. They now use a 2d tiling mode which avoids the bloat, saving 96Mb.
Change 3333843 on 2017/03/06 by Daniel.Wright
[Copy] Added OcclusionExponent to skylight component
* Useful for brightening up indoors without losing contact shadows as MinOcclusion does
Change 3333845 on 2017/03/06 by Daniel.Wright
[Copy] Capsule shadow BP functions
Change 3333850 on 2017/03/06 by Daniel.Wright
[Copy] Added OcclusionCombineMode to skylight component
Change 3333854 on 2017/03/06 by Daniel.Wright
[Copy] Gnm properly registers clears as GPU work so those events show up in profilegpu
Change 3333857 on 2017/03/06 by Daniel.Wright
[Copy] Clear light attenuation for local lights with a quad covering their screen extents
* Clearing the entire light attenuation buffer costs .1ms on PS4. This optimization lowers the minimum cost of a shadow casting light from .15ms -> .03ms.
* Shadowed lights in Fortnite with 25 lights 3.7ms -> 1.42ms on PS4
Change 3333860 on 2017/03/06 by Daniel.Wright
[Copy] Flush deferred deletes when reallocating distance field atlas to reduce peak memory
Change 3333861 on 2017/03/06 by Daniel.Wright
[Copy] Disable all distance field features on Intel cards as HD 4000 hangs in the RHICreateTexture3D call to allocate the large atlas
Change 3333869 on 2017/03/06 by Daniel.Wright
[Copy] Volumetric Fog using a volume texture mapped to the camera frustum
* Volumetric fog can be enabled on an Exponential Height Fog component with additional controls
* Lights have a VolumetricScatteringIntensity
* New cvars r.VolumetricFog, r.VolumetricFog.GridPixelSize, r.VolumetricFog.GridSizeZ, r.VolumetricFog.DepthDistributionScale
* Lighting features supported:
* Directional light with CSM and a light function
* Point / spot lights without shadows / light functions / IES profiles
* Skylight with occlusion from distance fields
* Analytical height fog covers the view range past where the volumetric fog ends
* Temporal reprojection is used on the volumetric fog scattering and extinction to achieve stability
* Translucency integrates properly into volumetric fog
* Height fog StartDistance is not supported by volumetric fog and should be set to 0.
Change 3333894 on 2017/03/06 by Daniel.Wright
[Copy] Initialize GDummyVolumetricFogGlobalDataUniformBuffer outside of parallel rendering
Change 3333902 on 2017/03/06 by Daniel.Wright
[Copy] Better handling of volumetric fog enabled with distance of 0
Change 3333903 on 2017/03/06 by Daniel.Wright
[Copy] Fixed volumetric fog trying to render light functions for a point light
Change 3333908 on 2017/03/06 by Daniel.Wright
[Copy] Volumetric materials
* Added new material domain Volume, which can output Scattering, Absorption and Emissive. All properties are in world space densities.
* Particle systems using the Volume domain are voxelized based on their ParticlePosition and ParticleRadius
* Volumetric fog integration is now energy conservative - scattering is integrated against transmission over the depth of each slice.
* Added bOverrideLightColorsWithFogInscatteringColors to exponential height fog, which can be enabled to make Volumetric Fog match Height fog more closely
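Integrating scattering against transmission over a slice has a standard closed form when extinction and in-scattered light are treated as constant within the slice: the integral of S * exp(-sigma * x) from 0 to d is S * (1 - exp(-sigma * d)) / sigma. The sketch below shows that form only. It is an assumption that the engine uses this exact expression, and `SliceResult` and `IntegrateSlice` are hypothetical names.

```cpp
#include <cmath>

struct SliceResult
{
    float Scattering;     // light scattered toward the camera within the slice
    float Transmittance;  // fraction of light surviving the slice
};

// ScatteredLight is in-scattered radiance per unit length, Extinction is the
// extinction coefficient (world space density), Depth is the slice thickness.
inline SliceResult IntegrateSlice(float ScatteredLight, float Extinction, float Depth)
{
    const float Transmittance = std::exp(-Extinction * Depth);
    // Fall back to the limit S * d as extinction approaches zero.
    const float Scattering = (Extinction > 1e-6f)
        ? ScatteredLight * (1.0f - Transmittance) / Extinction
        : ScatteredLight * Depth;
    return { Scattering, Transmittance };
}
```

The integrated scattering is bounded by S / sigma however thick the slice is, so a dense slice cannot add more light than its own extinction allows.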
Change 3334134 on 2017/03/06 by Daniel.Wright
[Copy from Michael Trepka] Added Embree 2.14.0 and changed MeshUtilities to use it as this solves issues with Embree leaking TLS keys. UnrealLightmass is still using older Embree 2.7.0 until we can find time to properly test it with the new version. Also, invalidated distance field DDC to force it to rebuild with updated Embree.
Change 3334420 on 2017/03/06 by Daniel.Wright
Fixed RTDF shadows
Change 3335467 on 2017/03/07 by Benjamin.Hyder
Initial submission of QA-Decals map to EngineTest
Change 3335556 on 2017/03/07 by Daniel.Wright
Changed mesh distance field default format back to R16f
Change 3338020 on 2017/03/08 by Daniel.Wright
Disable volumetric fog in vertex shaders for feature levels which don't support it
Change 3339394 on 2017/03/09 by Chris.Bunner
Correctly handle material texture translation error edge case.
#jira UE-42579, UE-42670
Change 3339992 on 2017/03/09 by Daniel.Wright
Only compile volumetric fog shaders on supporting platforms
Change 3341858 on 2017/03/10 by Arne.Schober
Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering)
#RB Rolando.Caloca, Marcus.Wassmer, Daniel.Wright, Nick.Penwarden, Mark.Satterthwaite
Change 3342004 on 2017/03/10 by Arne.Schober
Copying //UE4/Dev-Rendering-PSO to Dev-Rendering (//UE4/Dev-Rendering)
Fix unity build
#RB Marcus.Wassmer
Change 3343307 on 2017/03/13 by Marcus.Wassmer
Update showflags when we are guaranteed it will happen in all possible ways to spawn the scenecapture. (drag into editor, PIE, -game, etc)
Change 3343732 on 2017/03/13 by Rolando.Caloca
DR - Vulkan compute pipeline & refactor
Change 3344846 on 2017/03/14 by Rolando.Caloca
DR - Android compile fixes
Change 3344883 on 2017/03/14 by Rolando.Caloca
DR - Add missing stencil load/store to PSO initializer
Change 3344985 on 2017/03/14 by Rolando.Caloca
DR - Made load/store actions uint8
Change 3345141 on 2017/03/14 by Rolando.Caloca
DR - vk - Rework render pass hash
Change 3345304 on 2017/03/14 by Benjamin.Hyder
Updating TM-Distancefields map to include TemplateFloor mesh
Change 3345387 on 2017/03/14 by Rolando.Caloca
DR - Add _RenderThread calls for Create*Shader so RHIs can choose not to stall when creating
Change 3345388 on 2017/03/14 by Rolando.Caloca
DR - Do not stall when creating shaders on Vulkan
Change 3345722 on 2017/03/14 by Chris.Bunner
PR #3357: MinimalAPI add to many material expressions (Contributed by DeanoC)
#jira UE-42752
Change 3345723 on 2017/03/14 by Chris.Bunner
Reduce log verbosity causing spamming during landscape editing.
#jira UE-42714
Change 3345725 on 2017/03/14 by Chris.Bunner
[Duplicate 3341860] Fixed material translation error with multiple connections from custom interpolator nodes.
Change 3345726 on 2017/03/14 by Chris.Bunner
Typo fixes.
Change 3345732 on 2017/03/14 by Rolando.Caloca
DR - Decouple vertex declaration off BSS
Change 3345746 on 2017/03/14 by Chris.Bunner
Added sign() intrinsic material graph node and delisted material function workaround.
Change 3346042 on 2017/03/14 by Chris.Bunner
Implement missing size query interface for FRenderTargetResources.
#jira UE-41672
Change 3346387 on 2017/03/14 by Daniel.Wright
[Copy] Added VolumetricScatteringIntensity to particle lights
Change 3346389 on 2017/03/14 by Daniel.Wright
[Copy] Clamp Volumetric material attributes to fp16 range to avoid INFs
Disable volumetric fog when the fog show flag is disabled
Change 3346392 on 2017/03/14 by Daniel.Wright
[Copy] Fixed skylight being much too bright on volumetric fog
Change 3346406 on 2017/03/14 by Daniel.Wright
[Copy] CSM resolution is now controlled by r.Shadow.MaxCSMResolution.
* Changed HighPC to use 1024 MaxShadowResolution (max for all non-CSM shadows), saves 60Mb in Fortnite
Change 3346412 on 2017/03/14 by Daniel.Wright
[Copy] TexCreate_ReduceMemoryWithTilingMode for translucency lighting 3d textures, saves 13Mb
Change 3346414 on 2017/03/14 by Daniel.Wright
[Copy] TexCreate_ReduceMemoryWithTilingMode for volumetric fog 3d textures, saves 13Mb
Change 3346415 on 2017/03/14 by Daniel.Wright
[Copy] Missing file from cl 3338451
Change 3346421 on 2017/03/14 by Daniel.Wright
[Copy] Fixed NaNs in volumetric fog due to rendering when height fog is disabled
* Volumetric fog converts NaNs to black now so they don't spread
Change 3346422 on 2017/03/14 by Daniel.Wright
[Copy] Fixed NaN in volumetric fog with low density values
Change 3346423 on 2017/03/14 by Daniel.Wright
[Copy] Changed default VolumetricFogScatteringDistribution to .2
Change 3346430 on 2017/03/14 by Daniel.Wright
[Copy] New translucent material option to compute fog per pixel instead of the default per vertex
Change 3346432 on 2017/03/14 by Daniel.Wright
[Copy] Moved Volumetric Fog parameters to view uniform buffer for translucency pass
Fixed lifetimes of temporary Volumetric Fog render targets
Change 3346526 on 2017/03/14 by Daniel.Wright
[Copy] Volumetric Fog supports point and spot light shadows
* These lights are injected separately so that per-light resources can be bound (shadow depth map, static shadow depth map)
* Forward lighting of local lights can be forced with 'r.VolumetricFog.InjectShadowedLightsSeparately 0'
* Shadowed lights come at a cost: 2.9ms for volumetric fog on 970 -> 4.2ms with shadowing
Change 3347053 on 2017/03/15 by Rolando.Caloca
DR - android compile fix
Change 3347384 on 2017/03/15 by Rolando.Caloca
DR - Fix merge issue
Change 3347643 on 2017/03/15 by Marcus.Wassmer
Fix some bugs with the 'disable stationary skylight for the project' feature.
Fixes lighting in Persona on Paragon.
Change 3347979 on 2017/03/15 by Rolando.Caloca
DR - Allow to automatically apply cached rendertargets to PSO initializer
Change 3348024 on 2017/03/15 by Rolando.Caloca
DR - Remove NullPS on Vulkan to avoid deadlock
Change 3348303 on 2017/03/15 by Rolando.Caloca
DR - Fix for debugging SCW with material SRT
Change 3348357 on 2017/03/15 by Marcus.Wassmer
Fix stencildither and a stencilref bug that was probably breaking decals sometimes.
Change 3348549 on 2017/03/15 by Marcus.Wassmer
Hopefully fix static analysis for potential nullptr access.
Change 3348614 on 2017/03/15 by Marcus.Wassmer
Duplicate some switch changes to fix crash on launch.
Change 3349369 on 2017/03/16 by Gil.Gribb
Fixed botched merge
Change 3349947 on 2017/03/16 by Rolando.Caloca
DR - Fix for mismatched primitive type
Change 3349956 on 2017/03/16 by Benjamin.Hyder
initial updates to TM-DistanceFields map
Change 3350151 on 2017/03/16 by Rolando.Caloca
DR - Fix UT compile issue
Change 3350155 on 2017/03/16 by Rolando.Caloca
DR - Catch mismatched primitive type on PSOs on D3D11
Change 3350192 on 2017/03/16 by Daniel.Wright
Fix for point light shadow depths rendering with wrong cull mode due to PSO refactor
Change 3350736 on 2017/03/16 by Daniel.Wright
Fixed formatting from merge
Change 3350881 on 2017/03/16 by Rolando.Caloca
DR - Fix texture arrays as UAVs on Metal
Change 3350927 on 2017/03/16 by Rolando.Caloca
DR - Fix warning
Change 3350935 on 2017/03/16 by Daniel.Wright
Fix for materials with non-Surface domains being skipped in mesh passes
Change 3351583 on 2017/03/17 by Marcus.Wassmer
Fix clang platforms
Change 3351917 on 2017/03/17 by Marcus.Wassmer
Fix linux compile
Change 3351973 on 2017/03/17 by Marcus.Wassmer
Fix mismatched rendertargetformat
Change 3352038 on 2017/03/17 by Daniel.Wright
Enabled GetAndOrCreateGraphicsPipelineState ensures in Development for testing
Change 3352110 on 2017/03/17 by Marcus.Wassmer
Fix missing RT PSO apply
Change 3352695 on 2017/03/17 by Arne.Schober
DR - Remove PSO Rendertarget check in DX12 Resolve with Shader.
#RB Rolando.Caloca
Change 3352960 on 2017/03/17 by Arne.Schober
DR - Fix some things that slipped through the PSO merge
#RB none
Change 3353150 on 2017/03/18 by Rolando.Caloca
DR - compile fix
Change 3353205 on 2017/03/18 by Arne.Schober
DR - Fix Incremental Compile and PS4 runtime error where CMASK is not allowed for ThickTile Mode
#RB none
Change 3353207 on 2017/03/18 by Arne.Schober
DR - Fix Confusion
#RB none
Change 3355183 on 2017/03/20 by Nick.Bullard
Fixed up Content organization for Decals automation tests in EngineTest
Change 3355627 on 2017/03/20 by Arne.Schober
DR - [UE-43094] - removed ensure in composition graph as control of the clear color cannot be guaranteed.
Change 3356342 on 2017/03/21 by Marcus.Wassmer
Fix clang errors
Change 3356591 on 2017/03/21 by Arne.Schober
DR - Fix ensure message
#RB none
Change 3356873 on 2017/03/21 by Arne.Schober
DR - Fix comparison of undefined values in RendertargetApply Check
Change 3357261 on 2017/03/21 by Marcus.Wassmer
Fix LinuxEditor compile
Change 3357294 on 2017/03/21 by Marcus.Wassmer
Add missing SSE functions
Change 3357351 on 2017/03/21 by Frank.Fella
Fix win32 and linux compiler errors
Change 3357370 on 2017/03/21 by Arne.Schober
DR - disable ensure in test builds
#RB Marcus.Wassmer
[CL 3357449 by Marcus Wassmer in Main branch]
2017-03-21 17:46:52 -04:00
#include "PipelineStateCache.h"
#include "ClearQuad.h"
2020-06-23 18:40:00 -04:00
#include "ShaderCompilerCore.h"
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
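One plausible reading of "compute the correct number of valid rendertargets" is sketched below: report one past the last slot whose format is known, and zero when every slot is unknown. The enum, the constant, and the function name are stand-ins invented for this sketch; the actual D3D RHI code may count differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for DXGI_FORMAT; only the "unknown" value matters for this sketch.
enum class EFormat : uint32_t { Unknown = 0, R8G8B8A8, R16G16B16A16F };

constexpr std::size_t kMaxRenderTargets = 8;

// Hypothetical helper: returns one past the last slot with a known format,
// or 0 if every slot in [0, NumRenderTargets) is Unknown.
inline std::size_t ComputeNumValidRenderTargets(
	const EFormat (&Formats)[kMaxRenderTargets], std::size_t NumRenderTargets)
{
	assert(NumRenderTargets <= kMaxRenderTargets);
	std::size_t NumValid = 0;
	for (std::size_t Index = 0; Index < NumRenderTargets; ++Index)
	{
		if (Formats[Index] != EFormat::Unknown)
		{
			NumValid = Index + 1;
		}
	}
	return NumValid;
}
```

Keeping the index of the last valid slot, rather than a plain count, preserves slot positions when an unknown format sits between two valid ones.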
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
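A rough illustration of why a size threshold matters for a buddy allocator: buddy schemes in general hand out power-of-two blocks, so a request just above a power of two wastes almost half of its block. The sketch below assumes that rounding is the source of the waste described above; all function names are hypothetical and none of this is taken from the D3D12 RHI.

```cpp
#include <cassert>
#include <cstdint>

// Rounds a request up to the power-of-two block a buddy allocator would use.
inline uint64_t RoundUpToPowerOfTwo(uint64_t Size)
{
	assert(Size > 0 && Size <= (uint64_t(1) << 63));
	uint64_t Block = 1;
	while (Block < Size)
	{
		Block <<= 1;
	}
	return Block;
}

// Bytes lost to rounding for a single request.
inline uint64_t BuddyWasteBytes(uint64_t RequestedSize)
{
	return RoundUpToPowerOfTwo(RequestedSize) - RequestedSize;
}

// Hypothetical routing rule: only requests at or below the threshold go to
// the buddy allocator; larger ones would use some other allocation path.
inline bool ShouldUseBuddyAllocator(uint64_t RequestedSize, uint64_t MaxBuddyAllocationSize)
{
	return RequestedSize <= MaxBuddyAllocationSize;
}
```

For example, a 5KB request occupies an 8KB block and wastes 3KB, which is the kind of loss a lower threshold avoids by sending such requests elsewhere.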
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
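The idea of a non-contiguous page allocator that releases pages for reuse can be sketched as follows. This is a simplified model under stated assumptions: pages are plain indices, the returned vector plays the role of the contiguous virtual range (entry i is the physical page backing virtual page i), and `FPageAllocator` is a hypothetical name. No platform memory-mapping API is used or implied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class FPageAllocator
{
public:
	explicit FPageAllocator(uint32_t NumPhysicalPages)
		: TotalPages(NumPhysicalPages)
	{
		// Push in reverse so that page 0 is handed out first.
		for (uint32_t Page = NumPhysicalPages; Page > 0; --Page)
		{
			FreePages.push_back(Page - 1);
		}
	}

	// Returns a mapping where entry i is the physical page backing virtual
	// page i. The physical pages need not be adjacent. Returns an empty
	// mapping if there are not enough free pages.
	std::vector<uint32_t> Allocate(std::size_t NumPages)
	{
		std::vector<uint32_t> Mapping;
		if (NumPages > FreePages.size())
		{
			return Mapping;
		}
		for (std::size_t Index = 0; Index < NumPages; ++Index)
		{
			Mapping.push_back(FreePages.back());
			FreePages.pop_back();
		}
		return Mapping;
	}

	// Returns the pages of a destroyed resource to the free list for reuse.
	void Free(const std::vector<uint32_t>& Mapping)
	{
		assert(FreePages.size() + Mapping.size() <= TotalPages);
		FreePages.insert(FreePages.end(), Mapping.begin(), Mapping.end());
	}

	std::size_t NumFreePages() const { return FreePages.size(); }

private:
	std::size_t TotalPages;
	std::vector<uint32_t> FreePages;
};
```

Because any free page can back any virtual page, a request succeeds whenever enough pages are free in total, even if the free pages are scattered.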
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
int32 GAOScatterTileCulling = 1;
FAutoConsoleVariableRef CVarAOScatterTileCulling(
	TEXT("r.AOScatterTileCulling"),
	GAOScatterTileCulling,
	TEXT("Whether to use the rasterizer for binning occluder objects into screenspace tiles."),
	ECVF_RenderThreadSafe
	);
2018-02-22 11:25:06 -05:00
int32 GAverageDistanceFieldObjectsPerCullTile = 512;
FAutoConsoleVariableRef CVarMaxDistanceFieldObjectsPerCullTile(
	TEXT("r.AOAverageObjectsPerCullTile"),
	GAverageDistanceFieldObjectsPerCullTile,
	TEXT("Determines how much memory should be allocated in distance field object culling data structures. Too much = memory waste, too little = flickering due to buffer overflow."),
	ECVF_RenderThreadSafe | ECVF_ReadOnly
	);
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
class FCullObjectsForViewCS : public FGlobalShader
{
2021-06-14 12:46:26 -04:00
DECLARE_GLOBAL_SHADER ( FCullObjectsForViewCS ) ;
SHADER_USE_PARAMETER_STRUCT ( FCullObjectsForViewCS , FGlobalShader ) ;
2017-01-26 19:20:49 -05:00
public :
2021-06-14 12:46:26 -04:00
BEGIN_SHADER_PARAMETER_STRUCT ( FParameters , )
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldCulledObjectBufferParameters , CulledObjectBufferParameters )
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldObjectBufferParameters , ObjectBufferParameters )
SHADER_PARAMETER_STRUCT_REF ( FViewUniformShaderParameters , View )
SHADER_PARAMETER ( uint32 , NumConvexHullPlanes )
2021-09-22 10:01:48 -04:00
SHADER_PARAMETER_ARRAY ( FVector4f , ViewFrustumConvexHull , [ 6 ] )
2021-06-14 12:46:26 -04:00
SHADER_PARAMETER ( uint32 , ObjectBoundingGeometryIndexCount )
SHADER_PARAMETER ( float , AOObjectMaxDistance )
SHADER_PARAMETER ( float , AOMaxViewDistance )
END_SHADER_PARAMETER_STRUCT ( )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash to use with containers and stuff. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. The bug was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was setup to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oups.... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assumption commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fallback to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already setup such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
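The mul-to-fma rewrite in the change above can be illustrated in plain C++. This is a sketch only: MulAsFma is a hypothetical helper, and std::fma stands in for the Metal shading language fma intrinsic. The point is that a single fused operation with a zero addend returns the same rounded product for ordinary inputs, while leaving the compiler no separate multiply to reorder.

```cpp
#include <cmath>

// Hypothetical helper mirroring the rewrite: a * b expressed as fma(a, b, 0).
// The fused operation rounds once, so for ordinary finite inputs the result
// equals the correctly rounded product.
inline float MulAsFma(float A, float B)
{
    return std::fma(A, B, 0.0f);
}
```

One caveat of this sketch: a product of negative zero becomes positive zero after adding 0, so the two forms are not bit-identical in that corner case.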
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix for code that was fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This allows fixing spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that was actually doing a Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
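As a rough illustration of what SH windowing does, the sketch below scales each band of a 3-band (9 coefficient) SH vector so that higher bands are attenuated more. The window formula used here, 1 / (1 + Lambda * l^2 * (l+1)^2), is an assumption about which window from 'Stupid Spherical Harmonics (SH) Tricks' is meant; the changelist does not say. WindowSH9Sketch, FSH9Sketch and Lambda are hypothetical names, and how the smoothing property maps to Lambda is not specified here.

```cpp
#include <array>

// 3-band SH has 9 coefficients: band l holds 2l+1 of them.
using FSH9Sketch = std::array<float, 9>;

// Assumed window (not confirmed by the changelist):
// band l is scaled by 1 / (1 + Lambda * l^2 * (l+1)^2).
// Lambda == 0 leaves the input unchanged; larger values smooth more.
inline FSH9Sketch WindowSH9Sketch(const FSH9Sketch& In, float Lambda)
{
    FSH9Sketch Out = In;
    int Index = 0;
    for (int Band = 0; Band < 3; ++Band)
    {
        const float L = static_cast<float>(Band);
        const float Scale = 1.0f / (1.0f + Lambda * L * L * (L + 1.0f) * (L + 1.0f));
        for (int M = 0; M < 2 * Band + 1; ++M, ++Index)
        {
            Out[Index] = In[Index] * Scale;
        }
    }
    return Out;
}
```

Band 0 (the constant term) is never attenuated under this window, so the average lighting is preserved while ringing from the higher bands is damped.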
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you moved away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes means a failure, but this is actually due to m.v.fetch. Without reading from the shader, we don't have the information available to know which buffers were removed from the input.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters)
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
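As a rough sketch of the idea behind the NumRenderTargets fix above, assuming the valid count is taken as one past the highest slot whose format is known. The enum and function names are hypothetical stand-ins rather than the D3D RHI code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for DXGI_FORMAT; only "unknown vs. known" matters here.
enum class EFormat : uint8_t { Unknown = 0, RGBA8, RGBA16F };

constexpr std::size_t MaxRenderTargets = 8;

// Returns one past the highest slot with a known format, so a state with
// NumRenderTargets > 0 but all-unknown formats reports 0 valid targets.
inline std::size_t ComputeNumValidRenderTargets(
    const std::array<EFormat, MaxRenderTargets>& Formats,
    std::size_t NumRenderTargets)
{
    std::size_t NumValid = 0;
    for (std::size_t Index = 0; Index < NumRenderTargets && Index < MaxRenderTargets; ++Index)
    {
        if (Formats[Index] != EFormat::Unknown)
        {
            NumValid = Index + 1;
        }
    }
    return NumValid;
}
```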
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
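To show where the waste described above comes from, here is a minimal sketch assuming a classic buddy scheme that rounds every request up to the next power-of-two block. The function names and the threshold routing are hypothetical; the real D3D12 allocator has additional constraints such as minimum block sizes and alignment.

```cpp
#include <cstdint>

// Smallest power of two >= Size (for Size > 0). A classic buddy allocator
// hands out a block of this size for the request.
inline uint64_t RoundUpToPowerOfTwo(uint64_t Size)
{
    uint64_t Block = 1;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to internal fragmentation for a single request.
inline uint64_t BuddyWaste(uint64_t Size)
{
    return RoundUpToPowerOfTwo(Size) - Size;
}

// Hypothetical routing: requests above the threshold bypass the buddy
// allocator, so lowering the threshold caps the per-request waste.
inline bool UseBuddyAllocator(uint64_t Size, uint64_t MaxBuddySize)
{
    return Size <= MaxBuddySize;
}
```

Under these assumptions a 300KB request occupies a 512KB block and wastes 212KB, which is the kind of loss a lower threshold avoids.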
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in incoherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
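The fence reuse bug in the list above can be illustrated with a toy model, assuming the usual rule that a wait for a value is satisfied once the completed value is greater than or equal to it. The struct below is a hypothetical sketch, not the D3D12RHI fence class.

```cpp
#include <cstdint>

// Toy model of a monotonically increasing fence. A wait for Value is
// satisfied as soon as Completed >= Value, so waiting for 0 never blocks.
struct FToyFence
{
    uint64_t Completed = 0;
    uint64_t NextToSignal = 1; // start at 1 so a pending value is never 0

    // Hands out a fresh, strictly increasing value for each submission;
    // values are never reset to an earlier number.
    uint64_t Signal()
    {
        return NextToSignal++;
    }

    // Called when the GPU reaches the signal for Value.
    void Complete(uint64_t Value)
    {
        if (Value > Completed)
        {
            Completed = Value;
        }
    }

    bool IsComplete(uint64_t Value) const
    {
        return Completed >= Value;
    }
};
```

In this model, waiting on 0 always returns immediately, which matches the "always signaled" behaviour described in the fix.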
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command, which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize setting to the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
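A minimal sketch of the non-contiguous page allocation idea in the ESRAM change above: physical pages are taken from a free list in whatever order is available, and the order in which they are returned stands in for a contiguous virtual range. The class and its methods are hypothetical and do not reflect the actual ESRAM allocator or any platform API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy page allocator. Physical pages come from a free list in any order;
// element i of an allocation backs virtual page i of that allocation.
class FToyPageAllocator
{
public:
    explicit FToyPageAllocator(uint32_t NumPages)
    {
        for (uint32_t Page = NumPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Fills OutPhysicalPages with Count pages, or returns false and leaves
    // it untouched when not enough pages are free.
    bool Allocate(std::size_t Count, std::vector<uint32_t>& OutPhysicalPages)
    {
        if (Count > FreePages.size())
        {
            return false;
        }
        OutPhysicalPages.clear();
        for (std::size_t Index = 0; Index < Count; ++Index)
        {
            OutPhysicalPages.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return true;
    }

    // Returns the pages to the free list so later allocations can reuse them.
    void Free(std::vector<uint32_t>& PhysicalPages)
    {
        FreePages.insert(FreePages.end(), PhysicalPages.begin(), PhysicalPages.end());
        PhysicalPages.clear();
    }

    std::size_t NumFreePages() const
    {
        return FreePages.size();
    }

private:
    std::vector<uint32_t> FreePages;
};
```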
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
2021-07-12 10:24:46 -04:00
return ShouldCompileDistanceFieldShaders(Parameters.Platform);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing,
Added th MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
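A minimal sketch of the idea behind the ESRAM change above, not the engine's allocator: any free physical pages are picked, in whatever order they are available, and the returned list is the page table for one contiguous virtual range. Freed pages go back to the pool for reuse. The class name, method names and page counts are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only: names and sizes are not taken from the engine.
class FPagePoolSketch
{
public:
    explicit FPagePoolSketch(uint32_t InNumPages) : TotalPages(InNumPages)
    {
        // Initially every physical page is free; page 0 is handed out first.
        for (uint32_t Page = InNumPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Fills OutMapping with the physical pages backing a virtual range of
    // NumPages pages, in virtual order. The pages need not be adjacent.
    bool Allocate(std::size_t NumPages, std::vector<uint32_t>& OutMapping)
    {
        if (NumPages > FreePages.size())
        {
            return false; // Not enough free pages anywhere in the pool.
        }
        OutMapping.clear();
        for (std::size_t Index = 0; Index < NumPages; ++Index)
        {
            OutMapping.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return true;
    }

    // Returns pages to the pool so that later allocations can reuse them.
    void Free(const std::vector<uint32_t>& Mapping)
    {
        assert(FreePages.size() + Mapping.size() <= TotalPages); // Catches double frees.
        FreePages.insert(FreePages.end(), Mapping.begin(), Mapping.end());
    }

    std::size_t NumFreePages() const { return FreePages.size(); }

private:
    uint32_t TotalPages;
    std::vector<uint32_t> FreePages;
};
```

In this sketch, an allocation made after some frees can succeed even when the free pages are scattered, which is what a contiguous-only allocator could not do.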
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash to use with containers and stuff. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. The bug was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was set up to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oops... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to the render thread rather than assuming commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fall back to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already set up such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
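A small C++ sketch of why the mul-to-fma rewrite above is value-preserving: a fused multiply-add with a zero addend rounds once, like the plain product, so only the compiler's freedom to regroup the operation changes. `std::fma` is used here as a stand-in for the Metal intrinsic, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for the shader-side rewrite: the product expressed as a fused
// multiply-add with a zero addend. Hypothetical helper, not engine code.
inline float MulAsFma(float A, float B)
{
    assert(!std::isnan(A) && !std::isnan(B)); // Keeps the comparison below meaningful.
    return std::fma(A, B, 0.0f);
}
```

For ordinary finite inputs, `MulAsFma(A, B)` compares equal to `A * B`.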
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix that were fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTables - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as parameter. This fixes spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that was actually doing a Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
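For context on the windowing change above, a hedged sketch of one window from that family: each SH band l is scaled by 1 / (1 + lambda * l^2 * (l + 1)^2), which damps higher bands more strongly and leaves band 0 untouched. Whether Lightmass uses exactly this form, and how the WorldSettings property maps to lambda, are assumptions; the function name is hypothetical.

```cpp
#include <array>
#include <cassert>

// Order-3 SH vector (9 coefficients). Band L holds the coefficients with
// indices L*L .. (L+1)*(L+1) - 1, addressed below as L*(L+1) + M.
// Hypothetical sketch, not the Lightmass implementation.
inline void ApplySHWindowSketch(std::array<float, 9>& Coefficients, float Lambda)
{
    assert(Lambda >= 0.0f); // Negative values would amplify instead of smooth.
    for (int L = 0; L < 3; ++L)
    {
        const float Scale = 1.0f / (1.0f + Lambda * float(L * L * (L + 1) * (L + 1)));
        for (int M = -L; M <= L; ++M)
        {
            Coefficients[L * (L + 1) + M] *= Scale;
        }
    }
}
```

With Lambda set to 0 the vector is returned unchanged, so the parameter acts as a global smoothing amount in this sketch.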
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
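As a rough illustration of what range-based-for support requires of a wrapper type, the sketch below exposes begin() and end(), which is all the language needs. It wraps a plain C array rather than an NSArray, and the class name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning wrapper; the mtlpp class wraps an NSArray instead.
template <typename T>
class TArrayViewSketch
{
public:
    TArrayViewSketch(const T* InData, std::size_t InCount)
        : Data(InData), Count(InCount)
    {
        assert(InData != nullptr || InCount == 0);
    }

    // begin()/end() are what the compiler looks up for `for (auto X : View)`.
    const T* begin() const { return Data; }
    const T* end() const { return Data + Count; }

private:
    const T* Data;
    std::size_t Count;
};
```

With these two members in place, `for (int X : View)` visits each element in order.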
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes means a failure, but this is actually due to m.v.fetch. Without reading from the shader, we don't have the information available to know which buffers were removed from the input.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
static void ModifyCompilationEnvironment ( const FGlobalShaderPermutationParameters & Parameters , FShaderCompilerEnvironment & OutEnvironment )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
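One plausible source of the waste described above is a general property of buddy allocators: every request is rounded up to a power-of-two block, so the unused tail can approach half the block. The sketch below computes that per-allocation waste. It is not D3D12RHI code, and the function names and minimum block size are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power-of-two block, no smaller than MinBlockSize, that fits Size bytes.
// MinBlockSize is assumed to be a power of two.
inline uint64_t BuddyBlockSize(uint64_t Size, uint64_t MinBlockSize)
{
    assert(Size > 0 && MinBlockSize > 0);
    assert((MinBlockSize & (MinBlockSize - 1)) == 0);
    uint64_t Block = MinBlockSize;
    while (Block < Size)
    {
        Block <<= 1; // Each buddy level doubles the block size.
    }
    return Block;
}

// Bytes lost to rounding for a single allocation.
inline uint64_t BuddyWaste(uint64_t Size, uint64_t MinBlockSize)
{
    return BuddyBlockSize(Size, MinBlockSize) - Size;
}
```

For example, a 5KB request with a 256-byte minimum block lands in an 8KB block in this model, so 3KB of it is unused.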
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single-threaded mode in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
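A minimal sketch of why polling fails without a worker thread, assuming a hypothetical FSketchProcess stand-in (this is not the real FMonitoredProcess class). Nothing advances the work unless the caller does it, so a loop that only reads a running flag never exits, while an Update()-style call does the work and reports the status in one step.

```cpp
#include <cassert>

// Hypothetical stand-in for a monitored process; not the engine class.
class FSketchProcess
{
public:
    explicit FSketchProcess(int InStepsRemaining)
        : StepsRemaining(InStepsRemaining)
    {
        assert(InStepsRemaining >= 0);
    }

    // Advances the work by one step on the calling thread.
    // Returns true while there is still work left.
    bool Update()
    {
        if (StepsRemaining > 0)
        {
            --StepsRemaining;
        }
        return StepsRemaining > 0;
    }

private:
    int StepsRemaining;
};

// Drives the process to completion and returns how many Update() calls it took.
inline int RunToCompletion(FSketchProcess& Process)
{
    int Calls = 0;
    bool bRunning = true;
    while (bRunning)
    {
        bRunning = Process.Update();
        ++Calls;
    }
    return Calls;
}
```

With a worker thread the same loop could simply poll a flag; in a single-threaded run the caller has to tick the process itself.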
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
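The fence reuse bug above can be illustrated with a small sketch, assuming a hypothetical FSketchFence type (not the D3D12RHI class). A fence whose completed value only moves forward treats any wait on 0 as already satisfied, so each submission should wait on a fresh, strictly increasing value and the fence never needs resetting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a GPU fence: the completed value only moves forward.
struct FSketchFence
{
    uint64_t CompletedValue = 0; // last value the GPU has reached
    uint64_t NextValue = 0;      // last value handed out by the CPU

    // Returns the value to wait on for the work just submitted.
    uint64_t Signal()
    {
        return ++NextValue;
    }

    // Simulates the GPU reaching a previously issued signal.
    void Complete(uint64_t Value)
    {
        assert(Value <= NextValue);
        if (Value > CompletedValue)
        {
            CompletedValue = Value;
        }
    }

    // A wait on Value returns immediately once this is true.
    bool IsComplete(uint64_t Value) const
    {
        return CompletedValue >= Value;
    }
};
```

Waiting on 0 is always complete in this model, which is the failure mode the changelist describes.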
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize setting to the InvestigateTexture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
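A rough sketch of the noncontiguous page allocation idea, assuming a hypothetical FSketchPageAllocator with a simple free list (the real allocator and its virtual memory mapping calls are not shown in this log). Each allocation takes whatever physical pages are free and records them in order, so index i of the mapping is the physical page backing virtual page i of a contiguous range.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page allocator sketch; page indices stand in for physical pages.
class FSketchPageAllocator
{
public:
    explicit FSketchPageAllocator(uint32_t NumPages)
    {
        // Push in reverse so the lowest page index is handed out first.
        for (uint32_t Page = NumPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // On success, OutMapping[i] is the physical page backing virtual page i.
    bool Allocate(uint32_t NumPages, std::vector<uint32_t>& OutMapping)
    {
        if (NumPages > FreePages.size())
        {
            return false;
        }
        OutMapping.clear();
        for (uint32_t Index = 0; Index < NumPages; ++Index)
        {
            OutMapping.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return true;
    }

    // Returns the pages to the free list so later allocations can reuse them.
    void Free(std::vector<uint32_t>& Mapping)
    {
        for (uint32_t Page : Mapping)
        {
            FreePages.push_back(Page);
        }
        Mapping.clear();
    }

    size_t NumFree() const { return FreePages.size(); }

private:
    std::vector<uint32_t> FreePages;
};
```

After a few allocations and frees the free list is fragmented, yet a later allocation still succeeds because its pages do not have to be adjacent.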
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash to use with containers and stuff. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. It was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was setup to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oups.... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to the render thread rather than assuming commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fallback to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already setup such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
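The mul-to-fma rewrite can be sketched on the CPU, assuming a hypothetical MulAsFma helper (the actual transform happens inside the shader cross-compiler and is not shown here). Expressing the product as a fused multiply-add with a zero addend keeps the same numeric result while fixing the form of the operation, which is what stops the two passes from diverging.

```cpp
#include <cmath>

// Hypothetical helper: writes a * b as a fused multiply-add with a zero addend.
// For ordinary finite inputs this yields the same value as the plain product.
inline float MulAsFma(float A, float B)
{
    return std::fma(A, B, 0.0f);
}

// Plain product, for comparison with the fused form.
inline float MulPlain(float A, float B)
{
    return A * B;
}
```

In C++ this only illustrates the value equivalence; the reordering protection described above is a property of the Metal shader compiler, not of this sketch.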
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions, which leaves us on Mac with a NULL resource initially after creation, which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix that were fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old material that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This fixes spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that was actually doing a Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
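As a hedged illustration of spherical harmonic windowing, the sketch below applies one of the windows described in that paper, scaling each band l by 1 / (1 + lambda * l^2 * (l + 1)^2). The function name WindowSH and the Lambda parameter are assumptions for this example, and the log does not say which window or constants Lightmass actually uses.

```cpp
#include <array>

// Hypothetical sketch: attenuates higher SH bands of a 3-band (9 coefficient)
// projection. Band 0 is untouched; larger Lambda means stronger smoothing.
inline void WindowSH(std::array<float, 9>& Coefficients, float Lambda)
{
    int Index = 0;
    for (int Band = 0; Band < 3; ++Band)
    {
        const float L = static_cast<float>(Band * (Band + 1));
        const float Scale = 1.0f / (1.0f + Lambda * L * L);

        // Band l holds 2l + 1 coefficients.
        for (int M = 0; M < 2 * Band + 1; ++M)
        {
            Coefficients[Index++] *= Scale;
        }
    }
}
```

With Lambda set to 0 the coefficients are unchanged, so a single scalar can act as the global smoothing amount.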
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes means a failure, but this is actually due to manual vertex fetch. Without reading from the shader we don't have the information available to know which buffers were removed from the input.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
FGlobalShader::ModifyCompilationEnvironment(Parameters, OutEnvironment);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
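A back-of-the-envelope sketch of where buddy allocator waste comes from, assuming the textbook scheme in which each request is rounded up to a power-of-two block (the helper names are hypothetical and the real D3D12RHI allocator may differ in minimum block size and alignment). Lowering the size threshold routes larger requests to a different allocator so they stop paying this rounding cost.

```cpp
#include <cstdint>

// Rounds Size up to the next power of two, as a textbook buddy allocator does.
inline uint64_t RoundUpToPowerOfTwo(uint64_t Size)
{
    uint64_t Block = 1;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to rounding when Size is served from a buddy block.
inline uint64_t BuddyWaste(uint64_t Size)
{
    return RoundUpToPowerOfTwo(Size) - Size;
}

// Hypothetical routing rule: only requests up to MaxBuddySize use the buddy allocator.
inline bool UsesBuddyAllocator(uint64_t Size, uint64_t MaxBuddySize)
{
    return Size <= MaxBuddySize;
}
```

For example, a 300KB request rounded up to a 512KB block wastes 212KB, while a request that is already a power of two wastes nothing.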
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
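The first optimization above inverts the culling relationship: rather than each screen tile gathering the objects that overlap it, each object emits the tiles it overlaps, and every tile / object pair becomes an independent unit of work. The following is only a CPU-side sketch of that idea, assuming axis-aligned screen-space bounds per object. `FScreenRect`, `FTileObjectPair` and `BuildTileObjectPairs` are hypothetical names made up for illustration and are not the engine's shader code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical screen-space bounds of one object, in pixels (min inclusive, max exclusive).
struct FScreenRect
{
	int32_t MinX, MinY, MaxX, MaxY;
};

// One unit of cone tracing work: a single object intersecting a single tile.
struct FTileObjectPair
{
	uint32_t TileIndex;
	uint32_t ObjectIndex;
};

// For each object, append one pair per screen tile that its bounds overlap.
std::vector<FTileObjectPair> BuildTileObjectPairs(
	const std::vector<FScreenRect>& ObjectBounds,
	int32_t ViewWidth, int32_t ViewHeight, int32_t TileSize)
{
	assert(TileSize > 0);
	const int32_t TilesX = (ViewWidth + TileSize - 1) / TileSize;

	std::vector<FTileObjectPair> Pairs;
	for (uint32_t ObjectIndex = 0; ObjectIndex < ObjectBounds.size(); ++ObjectIndex)
	{
		const FScreenRect& Rect = ObjectBounds[ObjectIndex];

		// Clamp the bounds to the view and skip objects that end up empty or off screen.
		const int32_t MinX = std::max(Rect.MinX, 0);
		const int32_t MinY = std::max(Rect.MinY, 0);
		const int32_t MaxX = std::min(Rect.MaxX, ViewWidth);
		const int32_t MaxY = std::min(Rect.MaxY, ViewHeight);
		if (MinX >= MaxX || MinY >= MaxY)
		{
			continue;
		}

		// Convert the pixel range to an inclusive tile range.
		const int32_t TileMinX = MinX / TileSize;
		const int32_t TileMinY = MinY / TileSize;
		const int32_t TileMaxX = (MaxX - 1) / TileSize;
		const int32_t TileMaxY = (MaxY - 1) / TileSize;

		for (int32_t TileY = TileMinY; TileY <= TileMaxY; ++TileY)
		{
			for (int32_t TileX = TileMinX; TileX <= TileMaxX; ++TileX)
			{
				Pairs.push_back({static_cast<uint32_t>(TileY * TilesX + TileX), ObjectIndex});
			}
		}
	}
	return Pairs;
}
```

Each entry of the returned list could then be handed to its own thread group, which is the property the changelist credits for the smaller, better scheduled wavefronts.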
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single-threaded mode in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning() is not compatible with -nothreading mode.
#jira UE-40841
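To illustrate why polling a pure state query cannot work without a worker thread, here is a minimal sketch, assuming the monitored work can be advanced in small slices on the calling thread. `FSketchProcess` and `RunToCompletion` are hypothetical stand-ins and do not reproduce the real FMonitoredProcess interface.

```cpp
#include <cassert>

// Hypothetical stand-in for a monitored process that has no worker thread.
class FSketchProcess
{
public:
	explicit FSketchProcess(int InStepsOfWork)
		: RemainingSteps(InStepsOfWork)
	{
		assert(InStepsOfWork >= 0);
	}

	// Performs one slice of work on the calling thread.
	// Returns true while there is still work left.
	bool Update()
	{
		if (RemainingSteps > 0)
		{
			--RemainingSteps;
		}
		return RemainingSteps > 0;
	}

	// Only reports state. With no worker thread, a loop that polls this
	// without calling Update() would never see the state change.
	bool IsRunning() const
	{
		return RemainingSteps > 0;
	}

private:
	int RemainingSteps;
};

// Drives the process to completion and returns how many Update() calls it took.
int RunToCompletion(FSketchProcess& Process)
{
	int NumUpdates = 0;
	bool bRunning = true;
	while (bRunning)
	{
		bRunning = Process.Update();
		++NumUpdates;
	}
	return NumUpdates;
}
```

The design point is that the caller's loop is what makes progress, so the same calling code behaves the same with or without threads.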
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
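The fence reuse bug listed under CL 3231395 comes down to how fence waits compare values: a wait for 0 is already satisfied by any fence value, so it never blocks. The sketch below models only the counter logic with a hypothetical `FSketchFence`; it is not the D3D12 fence API, and the monotonically increasing signal value is an assumption about the shape of the fix, not a quote of it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a fence: a counter that only moves forward.
class FSketchFence
{
public:
	// Issues the next signal value. Callers wait on the returned value.
	uint64_t Signal()
	{
		++LastSignaled;
		return LastSignaled;
	}

	// Models the GPU reaching a previously issued signal.
	void CompleteUpTo(uint64_t Value)
	{
		assert(Value <= LastSignaled);
		if (Value > Completed)
		{
			Completed = Value;
		}
	}

	// A wait for Value is satisfied once the completed value has reached it.
	// Note that IsComplete(0) is always true, which is why waiting for 0 never blocks.
	bool IsComplete(uint64_t Value) const
	{
		return Completed >= Value;
	}

private:
	uint64_t LastSignaled = 0;
	uint64_t Completed = 0;
};
```

Because the counter never moves backwards, there is nothing to reset between uses, which matches the note that the fence value reset code was removed.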
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
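The allocator described above hands out physical pages that need not be adjacent and relies on a mapping to present them as one contiguous virtual range. A minimal sketch of the bookkeeping side, assuming a simple free list of page indices; `FSketchPageAllocator` is a hypothetical name, and the actual virtual address mapping (platform specific) is not modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical page allocator: tracks which physical pages are free.
class FSketchPageAllocator
{
public:
	explicit FSketchPageAllocator(uint32_t InNumPhysicalPages)
		: TotalPages(InNumPhysicalPages)
	{
		// Push in descending order so the lowest page index is handed out first.
		for (uint32_t Page = InNumPhysicalPages; Page > 0; --Page)
		{
			FreePages.push_back(Page - 1);
		}
	}

	// Returns the physical page backing each virtual page of the allocation,
	// in virtual order. The physical pages need not be contiguous.
	// Returns an empty mapping if there are not enough free pages.
	std::vector<uint32_t> Allocate(uint32_t NumPages)
	{
		std::vector<uint32_t> Mapping;
		if (NumPages > FreePages.size())
		{
			return Mapping;
		}
		for (uint32_t Index = 0; Index < NumPages; ++Index)
		{
			Mapping.push_back(FreePages.back());
			FreePages.pop_back();
		}
		return Mapping;
	}

	// Returns the pages of a destroyed resource to the free list for reuse.
	void Free(const std::vector<uint32_t>& Mapping)
	{
		assert(FreePages.size() + Mapping.size() <= TotalPages);
		for (uint32_t Page : Mapping)
		{
			FreePages.push_back(Page);
		}
	}

	uint32_t NumFreePages() const
	{
		return static_cast<uint32_t>(FreePages.size());
	}

private:
	uint32_t TotalPages;
	std::vector<uint32_t> FreePages;
};
```

After a few allocations and frees the free list becomes fragmented, yet a later multi-page request can still succeed because only the page count matters, not adjacency.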
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
		OutEnvironment.SetDefine(TEXT("UPDATEOBJECTS_THREADGROUP_SIZE"), UpdateObjectsGroupSize);
	}
};
2021-06-14 12:46:26 -04:00
IMPLEMENT_GLOBAL_SHADER(FCullObjectsForViewCS, "/Engine/Private/DistanceFieldObjectCulling.usf", "CullObjectsForViewCS", SF_Compute);
2017-01-26 19:20:49 -05:00
2021-06-14 12:46:26 -04:00
void CullObjectsToView(FRDGBuilder& GraphBuilder, FScene* Scene, const FViewInfo& View, const FDistanceFieldAOParameters& Parameters, FDistanceFieldCulledObjectBufferParameters& CulledObjectBufferParameters)
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings for use. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single-threaded mode in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning() is not compatible with -nothreading mode.
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing,
Added th MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
	AddClearUAVPass(GraphBuilder, CulledObjectBufferParameters.RWObjectIndirectArguments, 0);
{
	const int32 NumObjectsInBuffer = Scene->DistanceFieldSceneData.NumObjectsInBuffer;
	auto* PassParameters = GraphBuilder.AllocParameters<FCullObjectsForViewCS::FParameters>();
	PassParameters->CulledObjectBufferParameters = CulledObjectBufferParameters;
	PassParameters->ObjectBufferParameters = DistanceField::SetupObjectBufferParameters(GraphBuilder, Scene->DistanceFieldSceneData);
	PassParameters->View = View.ViewUniformBuffer;
	PassParameters->NumConvexHullPlanes = View.ViewFrustum.Planes.Num();
	// Offset each frustum plane by the pre-view translation before passing it to the culling shader.
	for (int32 i = 0; i < View.ViewFrustum.Planes.Num(); i++)
	{
		const FPlane4f Plane(View.ViewFrustum.Planes[i].TranslateBy(View.ViewMatrices.GetPreViewTranslation()));
		PassParameters->ViewFrustumConvexHull[i] = FVector4f(FVector3f(Plane), Plane.W);
	}
PassParameters->ObjectBoundingGeometryIndexCount = StencilingGeometry::GLowPolyStencilSphereIndexBuffer.GetIndexCount();
PassParameters->AOObjectMaxDistance = Parameters.ObjectMaxOcclusionDistance;
PassParameters->AOMaxViewDistance = GetMaxAOViewDistance();
auto ComputeShader = View.ShaderMap->GetShader<FCullObjectsForViewCS>();
const int32 GroupSize = FMath::DivideAndRoundUp<uint32>(NumObjectsInBuffer, UpdateObjectsGroupSize);
FComputeShaderUtils::AddPass(
	GraphBuilder,
	RDG_EVENT_NAME("ObjectFrustumCulling"),
	ComputeShader,
	PassParameters,
	FIntVector(GroupSize, 1, 1));
}
}
/** */
class FBuildTileConesCS : public FGlobalShader
{
public:
	DECLARE_GLOBAL_SHADER(FBuildTileConesCS);
	SHADER_USE_PARAMETER_STRUCT(FBuildTileConesCS, FGlobalShader);
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
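The fence reuse bug in the list above can be sketched with a toy CPU-side model. ToyFence and FenceValueAllocator are hypothetical names, not D3D12 or RHI types; the sketch only assumes that a fence counts as complete for a value once its completed value is greater than or equal to it.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a GPU fence: remembers the last value that was signalled.
struct ToyFence {
    std::uint64_t completedValue = 0;

    void Signal(std::uint64_t value)
    {
        assert(value >= completedValue); // values only move forward
        completedValue = value;
    }

    bool IsComplete(std::uint64_t value) const { return completedValue >= value; }
};

// Hands out strictly increasing values starting at 1, so a wait never targets
// a value (such as 0) that the fence already satisfies.
class FenceValueAllocator {
public:
    std::uint64_t Next() { return ++lastValue_; }

private:
    std::uint64_t lastValue_ = 0;
};
```

In this model waiting for 0 returns immediately on a fresh fence, which mirrors the "waiting for 0, which was always signaled" failure. With monotonically increasing values there is also nothing to reset.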
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
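A rough sketch of the allocation scheme described above: physical pages come from a free list in any order, and each resource gets a table mapping its contiguous virtual page indices onto them. PagePool and its methods are hypothetical; the real allocator, its page size and its virtual memory mapping calls are not shown in this changelist.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy pool of fixed-size physical pages.
class PagePool {
public:
    explicit PagePool(std::uint32_t pageCount) : capacity_(pageCount)
    {
        // Push in reverse so that page 0 is handed out first.
        for (std::uint32_t i = pageCount; i > 0; --i) {
            freePages_.push_back(i - 1);
        }
    }

    // Returns a mapping where entry v is the physical page backing virtual
    // page v of the resource. The physical pages need not be adjacent.
    // Returns an empty mapping if there are not enough free pages.
    std::vector<std::uint32_t> Allocate(std::size_t count)
    {
        std::vector<std::uint32_t> mapping;
        if (count > freePages_.size()) {
            return mapping;
        }
        for (std::size_t i = 0; i < count; ++i) {
            mapping.push_back(freePages_.back());
            freePages_.pop_back();
        }
        return mapping;
    }

    // Returns a resource's pages to the pool so later allocations can reuse them.
    void Free(const std::vector<std::uint32_t>& mapping)
    {
        assert(freePages_.size() + mapping.size() <= capacity_);
        freePages_.insert(freePages_.end(), mapping.begin(), mapping.end());
    }

    std::size_t FreeCount() const { return freePages_.size(); }

private:
    std::size_t capacity_;
    std::vector<std::uint32_t> freePages_;
};
```

Because any free page can back any virtual page, freeing two resources that were not neighbours still leaves room for one two-page resource. A contiguous-only allocator would fail that request.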
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
2021-06-14 12:46:26 -04:00
BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
2021-12-09 04:49:36 -05:00
SHADER_PARAMETER_STRUCT_REF(FViewUniformShaderParameters, View)
2021-09-22 10:01:48 -04:00
SHADER_PARAMETER_RDG_BUFFER_UAV(RWStructuredBuffer<FVector4f>, RWTileConeAxisAndCos)
SHADER_PARAMETER_RDG_BUFFER_UAV(RWStructuredBuffer<FVector4f>, RWTileConeDepthRanges)
2021-06-14 12:46:26 -04:00
SHADER_PARAMETER_RDG_UNIFORM_BUFFER(FSceneTextureUniformParameters, SceneTextures)
SHADER_PARAMETER_RDG_TEXTURE(Texture2D, DistanceFieldNormalTexture)
SHADER_PARAMETER_SAMPLER(SamplerState, DistanceFieldNormalSampler)
2021-12-09 04:49:36 -05:00
SHADER_PARAMETER_STRUCT_INCLUDE(FAOParameters, AOParameters)
SHADER_PARAMETER(FUintVector4, ViewDimensions)
SHADER_PARAMETER(FVector2f, NumGroups)
2021-06-14 12:46:26 -04:00
END_SHADER_PARAMETER_STRUCT()
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash to use with containers and stuff. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. The bug was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was setup to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oups.... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assumption commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fallback to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already setup such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
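As an arithmetic illustration of the mul(a,b) to fma(a,b,0) rewrite above, the C++ sketch below uses std::fma with a zero addend. MulAsFma is a hypothetical helper name. The sketch only shows that the rewritten form yields the same product; it does not model how the Metal compiler schedules or reorders instructions.

```cpp
#include <cassert>
#include <cmath>

// a * b expressed as a single fused multiply-add with a zero addend.
// The result is rounded once, so for finite inputs it compares equal to
// the plain product.
float MulAsFma(float a, float b)
{
    return std::fma(a, b, 0.0f);
}
```

The point of the rewrite, as described in the changelist, is that the same expression compiles to the same operation in both the base and depth passes, so the position outputs match.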
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix that were fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This allows fixing spotlight unit conversion when using lumens.
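For context on why the cone angle matters, here is the textbook relation between luminous flux and intensity for a light that emits uniformly inside a cone. This is an assumption for illustration, not necessarily what GetUnitsConversionFactor() computes; ConeSolidAngle and LumensToCandela are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Solid angle in steradians of a cone, given the cosine of its half angle.
// A full sphere (cosine of -1) gives 4*pi.
double ConeSolidAngle(double cosHalfConeAngle)
{
    assert(cosHalfConeAngle >= -1.0 && cosHalfConeAngle <= 1.0);
    return 2.0 * kPi * (1.0 - cosHalfConeAngle);
}

// Luminous intensity (candela) of a source that spreads the given flux
// (lumens) uniformly over the cone.
double LumensToCandela(double lumens, double cosHalfConeAngle)
{
    const double solidAngle = ConeSolidAngle(cosHalfConeAngle);
    assert(solidAngle > 0.0);
    return lumens / solidAngle;
}
```

Under this model the same lumen value produces a higher intensity for a narrow spotlight than for a point light, so a conversion factor that ignores the cone angle cannot be right for both.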
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that was actually doing a Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
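A hedged sketch of one windowing scheme discussed in 'Stupid Spherical Harmonics (SH) Tricks': each band l is scaled by 1 / (1 + lambda * l^2 * (l+1)^2), which damps higher bands more strongly. Whether this is the exact variant used here, and how VolumetricLightmapSphericalHarmonicSmoothing maps onto lambda, are assumptions. ApplyShWindowing is a hypothetical name and the sketch covers only 3 bands (9 coefficients).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Applies per-band windowing to a 3-band SH vector. Coefficients are laid
// out band by band: 1 for l = 0, then 3 for l = 1, then 5 for l = 2.
std::array<float, 9> ApplyShWindowing(std::array<float, 9> sh, float lambda)
{
    assert(lambda >= 0.0f);
    std::size_t index = 0;
    for (int l = 0; l < 3; ++l) {
        const float ll1 = static_cast<float>(l * (l + 1));
        const float scale = 1.0f / (1.0f + lambda * ll1 * ll1);
        for (int m = -l; m <= l; ++m) {
            sh[index++] *= scale;
        }
    }
    return sh;
}
```

With lambda = 0 the vector is unchanged. Larger values leave the constant band alone and progressively suppress bands 1 and 2, which reduces ringing at the cost of directional detail.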
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
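A rough sketch of mask-driven validation, under the assumption that bit i of the in-out mask is set when the vertex shader reads stream i. ValidateBoundStreams and MaxVertexStreamsSketch are hypothetical names for illustration and do not come from MetalRHI.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed stream limit for this sketch only.
constexpr int MaxVertexStreamsSketch = 16;

// Returns true when every stream the shader reads (bit set in InOutMask)
// has a buffer bound. Streams the shader never reads are ignored, so an
// unbound but unused stream does not count as an error.
inline bool ValidateBoundStreams(std::uint32_t InOutMask,
                                 const std::array<bool, MaxVertexStreamsSketch>& bBound)
{
    for (int Index = 0; Index < MaxVertexStreamsSketch; ++Index)
    {
        const bool bUsedByShader = (InOutMask & (1u << Index)) != 0;
        if (bUsedByShader && !bBound[Index])
        {
            return false;
        }
    }
    return true;
}
```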
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes means a failure, but they are actually unbound due to m.v.fetch. Without reading from the shader we don't have the information to know which buffers were removed from the input.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
static bool ShouldCompilePermutation ( const FGlobalShaderPermutationParameters & Parameters )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
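One way to compute a valid render target count is sketched below. It assumes the count is the index of the last slot with a known format plus one, so a state where every format is unknown yields zero. EPixelFormatSketch is a hypothetical stand-in for DXGI_FORMAT, and the real D3D RHI code may count differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for DXGI_FORMAT; Unknown plays the role of DXGI_UNKNOWN.
enum class EPixelFormatSketch { Unknown, RGBA8, RGBA16F };

// Returns the number of render target slots that must be described in the PSO:
// one past the last slot whose format is not Unknown, or 0 if all are Unknown.
inline std::size_t ComputeValidRenderTargetCount(
    const std::array<EPixelFormatSketch, 8>& Formats,
    std::size_t NumRenderTargets)
{
    assert(NumRenderTargets <= Formats.size());
    std::size_t ValidCount = 0;
    for (std::size_t Index = 0; Index < NumRenderTargets; ++Index)
    {
        if (Formats[Index] != EPixelFormatSketch::Unknown)
        {
            ValidCount = Index + 1;
        }
    }
    return ValidCount;
}
```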
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
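To illustrate why a size threshold matters, the sketch below models a textbook buddy allocator that rounds every request up to a power-of-two block. That rounding is an assumption about where the waste comes from; the D3D12 allocator may lose memory in other ways too. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= Size, i.e. the block a textbook
// buddy allocator would hand out for a request of Size bytes.
inline std::uint64_t RoundUpToPowerOfTwo(std::uint64_t Size)
{
    assert(Size > 0 && Size <= (std::uint64_t{1} << 63));
    std::uint64_t Block = 1;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to rounding for a single request.
inline std::uint64_t BuddyWaste(std::uint64_t Size)
{
    return RoundUpToPowerOfTwo(Size) - Size;
}

// Hypothetical policy: requests above MaxBuddySize bypass the buddy
// allocator and are served by some other allocation path.
inline bool UseBuddyAllocator(std::uint64_t Size, std::uint64_t MaxBuddySize)
{
    return Size <= MaxBuddySize;
}
```

Under this model a 5KB request occupies an 8KB block and wastes 3KB, which is the kind of loss a 4KB threshold avoids by routing the request elsewhere.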
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single-threaded mode in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning() is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
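The fence reuse bug in the list above can be illustrated with a simplified model in which a wait on value V completes once the fence's completed value is at least V. FFenceSketch is a hypothetical class, not the D3D12 RHI fence, and GPU completion is modeled as immediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fence model: values only ever increase, so a wait on a
// freshly issued value can never be satisfied by an older signal.
class FFenceSketch
{
public:
    // Signals the next value and returns it. Completion is modeled as
    // immediate, which a real GPU fence would not guarantee.
    std::uint64_t Signal()
    {
        assert(NextValue != UINT64_MAX); // guard against wrap-around
        Completed = NextValue;
        return NextValue++;
    }

    // A wait on Value is satisfied once the completed value reaches it.
    bool IsComplete(std::uint64_t Value) const
    {
        return Completed >= Value;
    }

private:
    std::uint64_t Completed = 0;
    std::uint64_t NextValue = 1;
};
```

In this model a wait on 0 is satisfied from the start, which matches the description of the original bug: waiting for 0 always returned immediately. Issuing strictly increasing values also removes any need to reset the fence.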
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
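The page allocation scheme described above can be sketched with a free list of physical page indices. FPagePoolSketch is a hypothetical class for illustration: it models only the bookkeeping, not ESRAM or virtual address mapping, and it ignores the deferred deallocation issue noted in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical page pool: physical pages are tracked in a free list and
// handed out in whatever order they are available.
class FPagePoolSketch
{
public:
    explicit FPagePoolSketch(std::size_t NumPages) : Capacity(NumPages)
    {
        // Push in reverse so page 0 is handed out first.
        for (std::size_t Index = NumPages; Index > 0; --Index)
        {
            FreePages.push_back(Index - 1);
        }
    }

    // Returns a page table: entry i is the physical page backing virtual
    // page i of a contiguous virtual range. Empty if too few pages remain.
    std::vector<std::size_t> Allocate(std::size_t Count)
    {
        std::vector<std::size_t> PageTable;
        if (Count > FreePages.size())
        {
            return PageTable;
        }
        for (std::size_t Index = 0; Index < Count; ++Index)
        {
            PageTable.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return PageTable;
    }

    // Returns the pages of a destroyed resource to the free list for reuse.
    void Free(const std::vector<std::size_t>& PageTable)
    {
        assert(FreePages.size() + PageTable.size() <= Capacity);
        FreePages.insert(FreePages.end(), PageTable.begin(), PageTable.end());
    }

    std::size_t NumFree() const { return FreePages.size(); }

private:
    std::size_t Capacity;
    std::vector<std::size_t> FreePages;
};
```

Once a resource in the middle of the pool is freed, a later allocation can be served by pages that are not adjacent, while the resource still sees one contiguous range through its page table.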
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
2021-07-12 10:24:46 -04:00
return ShouldCompileDistanceFieldShaders ( Parameters . Platform ) ;
2017-01-26 19:20:49 -05:00
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash to use with containers and stuff. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. The bug was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was set up to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oops... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assuming commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fall back to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already set up such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
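A minimal C++ sketch of the arithmetic identity behind the mul(a,b) -> fma(a,b,0) rewrite described in change 3769703. It is not the hlslcc or Metal code; MulAsFma and DemoMulAsFma are hypothetical names used only for illustration. For ordinary finite inputs a fused multiply-add with a zero addend rounds once and gives the same value as the plain product, while presenting the operation to the compiler as a single instruction.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: express a multiply as a fused multiply-add with a
// zero addend. The product is rounded once, so the value matches A * B for
// ordinary finite inputs.
inline float MulAsFma(float A, float B)
{
    return std::fma(A, B, 0.0f);
}

// Small self-check comparing the fused form against the plain product.
inline void DemoMulAsFma()
{
    assert(MulAsFma(3.0f, 4.0f) == 3.0f * 4.0f);
    assert(MulAsFma(1.5f, -2.0f) == 1.5f * -2.0f);
    assert(MulAsFma(0.1f, 0.3f) == 0.1f * 0.3f);
}
```

The only value-level difference in this sketch is the sign of a zero result: a product of -0 becomes +0 after adding +0 under round-to-nearest, and the two still compare equal.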
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. Need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix for code that was fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This allows fixing spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that was actually doing a Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes means a failure, but this is actually due to m.v.fetch. We don't have the information available to know which buffers were removed from the input without reading from the shader.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
static void ModifyCompilationEnvironment ( const FGlobalShaderPermutationParameters & Parameters , FShaderCompilerEnvironment & OutEnvironment )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
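A generic C++ sketch of why lowering the buddy allocator's max allocation size can cut waste, as described in change 3251141. It assumes the usual buddy scheme in which each request is rounded up to a power-of-two block; it is not the D3D12RHI implementation, and BuddyBlockSize, BuddyWaste and UseBuddyAllocator are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: size of the buddy block that would back a request.
// Assumes MinBlockSize is a power of two and blocks double in size per level.
inline uint64_t BuddyBlockSize(uint64_t Size, uint64_t MinBlockSize)
{
    assert(Size > 0 && MinBlockSize > 0);
    uint64_t Block = MinBlockSize;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to rounding for a single request.
inline uint64_t BuddyWaste(uint64_t Size, uint64_t MinBlockSize)
{
    return BuddyBlockSize(Size, MinBlockSize) - Size;
}

// Hypothetical routing rule: only requests at or below the threshold are
// served by the buddy allocator; larger ones go to some other path.
inline bool UseBuddyAllocator(uint64_t Size, uint64_t MaxBuddyAllocSize)
{
    return Size <= MaxBuddyAllocSize;
}
```

Under these assumptions a 5000-byte request is backed by an 8192-byte block and wastes 3192 bytes. With a 512KB threshold it would go through the buddy path; with a 4KB threshold it would not.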
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in incoherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
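A toy C++ model of the fence reuse bug listed above under CL 3231395 (signaling MAX UINT and then waiting for 0). It assumes the common fence semantics in which a wait is satisfied once the completed value has reached the requested value; FToyFence and FToyFenceCounter are hypothetical types, not the D3D12 RHI classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a GPU fence: remembers the last signaled value.
struct FToyFence
{
    uint64_t CompletedValue = 0;

    void Signal(uint64_t Value)
    {
        // Values only move forward; the fence is never reset to an older value.
        assert(Value >= CompletedValue);
        CompletedValue = Value;
    }

    // A wait is satisfied once the completed value has reached the target.
    bool IsComplete(uint64_t WaitValue) const
    {
        return CompletedValue >= WaitValue;
    }
};

// Hypothetical owner that hands out strictly increasing signal values,
// starting at 1 so that no wait ever targets 0.
struct FToyFenceCounter
{
    FToyFence Fence;
    uint64_t NextValue = 1;

    uint64_t AllocateSignalValue() { return NextValue++; }
};
```

In this model a wait for 0 passes on a fence that has never been signaled, which is why waiting for 0 cannot synchronize anything. Waiting on a value from the monotonic counter only passes after that value has actually been signaled.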
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
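A minimal sketch of the idea behind a noncontiguous page allocator, assuming a simple free list of physical page indices. The returned vector plays the role of the mapping: element i is the physical page backing virtual page i of a contiguous range. All names are hypothetical and nothing here reflects the actual ESRAM or platform API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page allocator sketch; not the engine's ESRAM allocator.
class FSketchPageAllocator
{
public:
    explicit FSketchPageAllocator(uint32_t InTotalPages)
        : TotalPages(InTotalPages)
    {
        // Push in reverse so that page 0 is handed out first.
        for (uint32_t Page = InTotalPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Returns the physical pages backing a contiguous virtual range of NumPages pages.
    // The physical pages need not be adjacent. Returns an empty vector on failure.
    std::vector<uint32_t> Allocate(uint32_t NumPages)
    {
        std::vector<uint32_t> Mapping;
        if (NumPages > FreePages.size())
        {
            return Mapping;
        }
        for (uint32_t Index = 0; Index < NumPages; ++Index)
        {
            Mapping.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return Mapping;
    }

    // Returns the pages of a destroyed resource to the free list for reuse.
    void Free(const std::vector<uint32_t>& Mapping)
    {
        for (uint32_t Page : Mapping)
        {
            assert(Page < TotalPages);
            FreePages.push_back(Page);
        }
    }

    std::size_t NumFree() const { return FreePages.size(); }

private:
    uint32_t TotalPages;
    std::vector<uint32_t> FreePages;
};
```

Because freed pages go straight back onto the free list, a later allocation can be satisfied from scattered pages even when no contiguous physical run of that size exists.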
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash to use with containers and stuff. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. The bug was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was setup to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oups.... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assumption commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fall back to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already set up such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix that were fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation because compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old material that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as parameter. This fixes spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that was actually doing a Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers which it assumes means a failure but is actually due to m.v.fetch. We don't have the information available to know which are which removed from the input without reading from the shader.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
FGlobalShader : : ModifyCompilationEnvironment ( Parameters , OutEnvironment ) ;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
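One plausible reading of "compute the correct number of valid rendertargets" is sketched below: take the index of the last slot with a known format, plus one, so that a pipeline whose formats are all unknown reports zero render targets. The enum, the constant and the function name are hypothetical stand-ins, not the D3D12RHI types, and the actual change may count differently.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a pixel format enum where Unknown mirrors DXGI_FORMAT_UNKNOWN.
enum class ESketchFormat : uint8_t { Unknown, RGBA8, RGBA16F };

constexpr uint32_t SketchMaxRenderTargets = 8;

// Returns one past the index of the last slot with a known format,
// or 0 when every slot in the first NumRenderTargets entries is Unknown.
inline uint32_t ComputeNumValidRenderTargets(
    const std::array<ESketchFormat, SketchMaxRenderTargets>& Formats,
    uint32_t NumRenderTargets)
{
    assert(NumRenderTargets <= SketchMaxRenderTargets);
    uint32_t NumValid = 0;
    for (uint32_t Index = 0; Index < NumRenderTargets; ++Index)
    {
        if (Formats[Index] != ESketchFormat::Unknown)
        {
            NumValid = Index + 1;
        }
    }
    return NumValid;
}
```

With this rule, a description that claims two render targets but only has unknown formats is passed to pipeline creation with a count of zero.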
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
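As a rough illustration of where buddy allocator waste comes from, the sketch below assumes the textbook scheme in which every request is rounded up to a power-of-two block. The helper names and the minimum block size parameter are hypothetical, and the engine's suballocator may differ in its details.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Size up to the next power of two (textbook buddy block sizing).
inline uint64_t RoundUpToPowerOfTwo(uint64_t Size)
{
    assert(Size > 0 && Size <= (uint64_t{1} << 63));
    uint64_t Block = 1;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to internal fragmentation for one request of Size bytes,
// given the smallest block the allocator hands out.
inline uint64_t BuddyWaste(uint64_t Size, uint64_t MinBlockSize)
{
    uint64_t Block = RoundUpToPowerOfTwo(Size);
    if (Block < MinBlockSize)
    {
        Block = MinBlockSize;
    }
    return Block - Size;
}
```

Under these assumptions a 5KB request occupies an 8KB block and wastes 3KB, and the absolute waste per allocation grows with the block size, which is consistent with routing larger allocations away from the buddy allocator.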
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
2017-01-26 19:20:49 -05:00
		OutEnvironment.SetDefine(TEXT("THREADGROUP_SIZEX"), GDistanceFieldAOTileSizeX);
		OutEnvironment.SetDefine(TEXT("THREADGROUP_SIZEY"), GDistanceFieldAOTileSizeY);
		OutEnvironment.SetDefine(TEXT("DOWNSAMPLE_FACTOR"), GAODownsampleFactor);
		// To reduce shader compile time of compute shaders with shared memory, doesn't have an impact on generated code with current compiler (June 2010 DX SDK)
		OutEnvironment.CompilerFlags.Add(CFLAG_StandardOptimization);
	}
};
2021-06-14 12:46:26 -04:00
IMPLEMENT_GLOBAL_SHADER(FBuildTileConesCS, "/Engine/Private/DistanceFieldObjectCulling.usf", "BuildTileConesMain", SF_Compute);
2017-01-26 19:20:49 -05:00
/** */
class FObjectCullVS : public FGlobalShader
{
public:
2021-12-09 04:49:36 -05:00
	DECLARE_GLOBAL_SHADER(FObjectCullVS);
	SHADER_USE_PARAMETER_STRUCT(FObjectCullVS, FGlobalShader);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize setting to the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
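A minimal sketch of the idea behind the noncontiguous page allocator: physical pages come from a free list, and the order in which they are returned defines their position in a contiguous virtual range. `FMockPagePool` is a hypothetical name, and the sketch makes no claim about the actual ESRAM allocator or its mapping calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page pool. Element i of an allocation is the physical page
// backing virtual page i, so the virtual range is contiguous even when the
// physical pages are not.
class FMockPagePool
{
public:
    explicit FMockPagePool(uint32_t InNumPages) : NumPages(InNumPages)
    {
        // Push in reverse so that page 0 is handed out first.
        for (uint32_t Page = InNumPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Returns an empty vector when fewer than Count pages are free.
    std::vector<uint32_t> Allocate(uint32_t Count)
    {
        std::vector<uint32_t> Pages;
        if (Count > FreePages.size())
        {
            return Pages;
        }
        for (uint32_t Index = 0; Index < Count; ++Index)
        {
            Pages.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return Pages;
    }

    // Returns pages to the free list so later allocations can reuse them.
    void Free(const std::vector<uint32_t>& Pages)
    {
        for (uint32_t Page : Pages)
        {
            assert(Page < NumPages);
            FreePages.push_back(Page);
        }
    }

    size_t NumFree() const { return FreePages.size(); }

private:
    uint32_t NumPages;
    std::vector<uint32_t> FreePages;
};
```

Freeing a resource in the middle of the pool and allocating again yields a page list such as {1, 3}: physically scattered, yet usable as one two-page virtual range.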
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
2021-06-14 12:46:26 -04:00
BEGIN_SHADER_PARAMETER_STRUCT ( FParameters , )
2021-12-09 04:49:36 -05:00
SHADER_PARAMETER_STRUCT_REF ( FViewUniformShaderParameters , View )
2022-01-31 10:23:36 -05:00
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldObjectBufferParameters , DistanceFieldObjectBuffers )
2021-06-14 12:46:26 -04:00
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldCulledObjectBufferParameters , DistanceFieldCulledObjectBuffers )
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldAtlasParameters , DistanceFieldAtlas )
2021-12-09 04:49:36 -05:00
SHADER_PARAMETER_STRUCT_INCLUDE ( FAOParameters , AOParameters )
SHADER_PARAMETER ( float , ConservativeRadiusScale )
2021-06-14 12:46:26 -04:00
END_SHADER_PARAMETER_STRUCT ( )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash for use with containers and similar code. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. The bug was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was setup to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oups.... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assumption commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fall back to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already set up such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix for code that was fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This allows fixing spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that was actually doing a Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers which it assumes means a failure but is actually due to m.v.fetch. We don't have the information available to know which are which removed from the input without reading from the shader.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
static bool ShouldCompilePermutation ( const FGlobalShaderPermutationParameters & Parameters )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
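One way to compute the number of valid render targets described in Change 3251058 is sketched below. `EMockFormat` and `ComputeNumValidRenderTargets` are hypothetical names, with `EMockFormat::Unknown` standing in for DXGI_UNKNOWN; the sketch assumes the count should cover every slot up to and including the last valid format, which may differ from the actual D3D RHI code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical format enum; Unknown plays the role of DXGI_UNKNOWN.
enum class EMockFormat : uint8_t { Unknown = 0, RGBA8, RGBA16F };

constexpr uint32_t MaxRenderTargets = 8;

// Returns the index of the last valid format plus one, or 0 when every
// format in the first NumRenderTargets slots is Unknown.
uint32_t ComputeNumValidRenderTargets(const EMockFormat* Formats, uint32_t NumRenderTargets)
{
    assert(NumRenderTargets <= MaxRenderTargets);
    uint32_t NumValid = 0;
    for (uint32_t Index = 0; Index < NumRenderTargets; ++Index)
    {
        if (Formats[Index] != EMockFormat::Unknown)
        {
            NumValid = Index + 1;
        }
    }
    return NumValid;
}
```

When NumRenderTargets is greater than 0 but all formats are unknown, the function returns 0, which is the case the changelist says caused the PSO creation error.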
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
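A minimal sketch of why a buddy allocator can waste memory on larger requests: in the textbook scheme each request is rounded up to a power-of-two block, so a request just over a block size leaves almost half the next block unused. The helper names are hypothetical, and the D3D12 buddy suballocator may round or align differently.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power-of-two multiple of MinBlockSize that can hold Size bytes.
// MinBlockSize is assumed to be a power of two itself.
uint64_t BuddyBlockSize(uint64_t Size, uint64_t MinBlockSize)
{
    assert(Size > 0 && MinBlockSize > 0);
    uint64_t Block = MinBlockSize;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to rounding for a single allocation.
uint64_t BuddyWaste(uint64_t Size, uint64_t MinBlockSize)
{
    return BuddyBlockSize(Size, MinBlockSize) - Size;
}
```

Under these assumptions a 4097-byte request occupies an 8192-byte block and wastes 4095 bytes, while a 4096-byte request wastes nothing.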
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single-threaded mode in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning() is not compatible with -nothreading mode
#jira UE-40841
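Why polling a pure query can spin forever without a worker thread, while an Update()-style call cannot, can be shown with a toy class. FToyMonitoredTask is hypothetical and only mimics the shape of the change; the real FMonitoredProcess signatures are not shown in this changelist and are not assumed here.

```cpp
#include <cassert>

// Hypothetical task that needs a fixed number of work slices to finish.
class FToyMonitoredTask
{
public:
    explicit FToyMonitoredTask(int InStepsRemaining) : StepsRemaining(InStepsRemaining)
    {
        assert(InStepsRemaining >= 0);
    }

    // Performs one slice of work on the calling thread.
    // Returns true while the task still has work left.
    bool Update()
    {
        if (StepsRemaining > 0)
        {
            --StepsRemaining;
        }
        return StepsRemaining > 0;
    }

    // Pure query: never advances the task, so with no background thread
    // a loop of the form `while (Task.IsRunning()) {}` would never exit.
    bool IsRunning() const { return StepsRemaining > 0; }

private:
    int StepsRemaining;
};
```

A single-threaded caller would loop on `while (Task.Update()) { ... }`, which makes progress on every iteration.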
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was that HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
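A toy model of the fence reuse bug described in CL 3231395 above, assuming a fence behaves as a monotonically increasing counter. FToyFence and its members are hypothetical names for illustration, not the D3D12RHI types. A fresh fence already reports 0 as reached, so a wait on 0 can never block; waiting only on values that were actually handed out, and never resetting them, avoids that:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical fence: the completed value only ever moves forward.
struct FToyFence
{
    uint64_t CompletedValue = 0; // highest value the "GPU" has reached so far
    uint64_t NextValue = 1;      // next value the CPU will hand out

    // CPU side: reserve a new, strictly larger value to wait on later.
    uint64_t Enqueue() { return NextValue++; }

    // "GPU" side: mark everything up to Value as finished.
    void Complete(uint64_t Value)
    {
        assert(Value < NextValue); // only values that were handed out can complete
        CompletedValue = std::max(CompletedValue, Value);
    }

    // True once the fence has reached Value. Always true for Value == 0.
    bool IsComplete(uint64_t Value) const { return CompletedValue >= Value; }
};
```

Because values are never reused, there is no need to reset the fence to an earlier value between uses.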
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompressed data size during internal shader logging.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
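The idea of handing out non-contiguous physical pages behind a contiguous virtual range, and returning them for reuse on destruction, can be sketched as below. This is a minimal illustration under assumptions: FToyPageAllocator is a hypothetical name, and the returned vector stands in for the virtual mapping (index = virtual page offset, value = physical page); it is not the ESRAM allocator itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page allocator backed by a simple free list.
class FToyPageAllocator
{
public:
    explicit FToyPageAllocator(uint32_t InTotalPages) : TotalPages(InTotalPages)
    {
        // Push in reverse so that page 0 is handed out first.
        for (uint32_t Page = InTotalPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Returns one physical page per virtual page, in virtual order.
    // The physical pages need not be adjacent. Empty result means out of pages.
    std::vector<uint32_t> Allocate(uint32_t NumPages)
    {
        std::vector<uint32_t> Mapping;
        if (NumPages > FreePages.size())
        {
            return Mapping;
        }
        for (uint32_t Index = 0; Index < NumPages; ++Index)
        {
            Mapping.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return Mapping;
    }

    // Returns the pages of a destroyed resource to the free list for reuse.
    void Free(const std::vector<uint32_t>& Mapping)
    {
        assert(FreePages.size() + Mapping.size() <= TotalPages);
        FreePages.insert(FreePages.end(), Mapping.begin(), Mapping.end());
    }

    std::size_t NumFree() const { return FreePages.size(); }

private:
    uint32_t TotalPages;
    std::vector<uint32_t> FreePages;
};
```

After a few allocate / free cycles a single allocation may cover scattered physical pages, which is acceptable because the resource only ever sees the contiguous virtual range.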
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in Dev-Rendering, which confused Perforce when the change was subsequently integrated from Main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
2021-07-12 10:24:46 -04:00
return ShouldCompileDistanceFieldShaders ( Parameters . Platform ) ;
2017-01-26 19:20:49 -05:00
}
} ;
2021-06-14 12:46:26 -04:00
IMPLEMENT_GLOBAL_SHADER ( FObjectCullVS , " /Engine/Private/DistanceFieldObjectCulling.usf " , " ObjectCullVS " , SF_Vertex ) ;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize setting to the InvestigateTexture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
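The allocator idea described above (hand out whichever pages are free, let the caller see them as one contiguous virtual range, and return pages for reuse on destruction) can be sketched as follows. This is an illustrative model only: FPageAllocatorSketch and its methods are hypothetical names, and the real ESRAM mapping step is platform-specific and not shown. Here the "mapping" is simply a table in which entry i names the physical page backing virtual page i.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical page allocator sketch; names and behavior are illustrative only.
class FPageAllocatorSketch
{
public:
    explicit FPageAllocatorSketch(uint32_t InTotalPages)
        : TotalPages(InTotalPages)
    {
        // Push in reverse so that page 0 is handed out first.
        for (uint32_t Page = InTotalPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Fills OutPhysicalPages so that OutPhysicalPages[i] backs virtual page i.
    // The physical pages need not be adjacent. Returns false, allocating
    // nothing, if too few pages are free.
    bool Allocate(uint32_t NumPages, std::vector<uint32_t>& OutPhysicalPages)
    {
        OutPhysicalPages.clear();
        if (NumPages > FreePages.size())
        {
            return false;
        }
        for (uint32_t Index = 0; Index < NumPages; ++Index)
        {
            OutPhysicalPages.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return true;
    }

    // Returns pages to the free list so later allocations can reuse them.
    void Free(const std::vector<uint32_t>& PhysicalPages)
    {
        for (uint32_t Page : PhysicalPages)
        {
            assert(Page < TotalPages);
            FreePages.push_back(Page);
        }
    }

    uint32_t NumFree() const { return static_cast<uint32_t>(FreePages.size()); }

private:
    uint32_t TotalPages;
    std::vector<uint32_t> FreePages;
};
```

Because any free page can satisfy a request, fragmentation of the physical pool does not block a large allocation as long as enough pages are free in total.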
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
2021-05-24 14:08:15 -04:00
class FObjectCullPS : public FGlobalShader
2017-01-26 19:20:49 -05:00
{
public :
2021-12-09 04:49:36 -05:00
DECLARE_GLOBAL_SHADER ( FObjectCullPS ) ;
SHADER_USE_PARAMETER_STRUCT ( FObjectCullPS , FGlobalShader ) ;
2017-01-26 19:20:49 -05:00
2021-06-14 12:46:26 -04:00
BEGIN_SHADER_PARAMETER_STRUCT ( FParameters , )
2021-12-09 04:49:36 -05:00
SHADER_PARAMETER_STRUCT_REF ( FViewUniformShaderParameters , View )
2021-06-14 12:46:26 -04:00
SHADER_PARAMETER_STRUCT_INCLUDE ( FTileIntersectionParameters , TileIntersectionParameters )
2022-01-31 10:23:36 -05:00
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldObjectBufferParameters , DistanceFieldObjectBuffers )
2021-06-14 12:46:26 -04:00
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldCulledObjectBufferParameters , DistanceFieldCulledObjectBuffers )
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldAtlasParameters , DistanceFieldAtlas )
2021-12-09 04:49:36 -05:00
SHADER_PARAMETER_STRUCT_INCLUDE ( FAOParameters , AOParameters )
SHADER_PARAMETER ( FVector2f , NumGroups )
2021-06-14 12:46:26 -04:00
END_SHADER_PARAMETER_STRUCT ( )
2021-05-24 14:08:15 -04:00
class FCountingPass : SHADER_PERMUTATION_BOOL ( "SCATTER_CULLING_COUNT_PASS" ) ;
using FPermutationDomain = TShaderPermutationDomain < FCountingPass > ;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash for use with containers and similar code. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. The bug was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was set up to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oups.... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assumption commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fall back to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already set up such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
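The mul-to-fma rewrite above can be illustrated in plain C++. This is only a sketch under stated assumptions: std::fma stands in for the Metal fma intrinsic, mul is treated as a scalar multiply for simplicity, and MulAsFma and Dot3Fma are hypothetical names that do not come from the engine.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: expresses a plain multiply as a fused multiply-add
// with a zero addend, mirroring the mul(a,b) -> fma(a,b,0) rewrite.
inline float MulAsFma(float A, float B)
{
    return std::fma(A, B, 0.0f);
}

// Hypothetical 3-component dot product written as an explicit chain of
// fma operations, so the evaluation order is fixed in the source.
inline float Dot3Fma(const float A[3], const float B[3])
{
    float Result = MulAsFma(A[0], B[0]);
    Result = std::fma(A[1], B[1], Result);
    Result = std::fma(A[2], B[2], Result);
    return Result;
}
```

If the base and depth passes both use the same fixed chain, they perform the same operations in the same order, which is the property the change is after.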
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix for code that was fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
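A rough, self-contained illustration of the idea behind shader permutations: each boolean dimension maps to one bit of a permutation id. FSketchPermutationDomain and its members are hypothetical and far simpler than the engine's TShaderPermutationDomain; this is not the engine API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a shader permutation domain:
// each boolean dimension occupies one bit of the permutation id.
struct FSketchPermutationDomain
{
    static constexpr int32_t NumDimensions = 2;
    static constexpr int32_t PermutationCount = 1 << NumDimensions;

    bool bCountingPass = false; // e.g. a define such as SCATTER_CULLING_COUNT_PASS
    bool bHighQuality = false;  // purely illustrative second dimension

    // Packs the dimension values into a single integer id.
    int32_t ToPermutationId() const
    {
        return (bCountingPass ? 1 : 0) | (bHighQuality ? 2 : 0);
    }

    // Rebuilds the dimension values from an id produced by ToPermutationId().
    static FSketchPermutationDomain FromPermutationId(int32_t Id)
    {
        FSketchPermutationDomain Domain;
        Domain.bCountingPass = (Id & 1) != 0;
        Domain.bHighQuality = (Id & 2) != 0;
        return Domain;
    }
};
```

Two boolean dimensions give four permutations, and the id round-trips, so it can be stored and used to look up a compiled variant.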
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This allows fixing spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that were actually doing Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
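A sketch of one windowing form discussed in that paper, in which band l of the SH vector is scaled by 1 / (1 + lambda * l^2 * (l+1)^2). Whether Lightmass uses exactly this form is an assumption. FSketchSH3, WindowScale and ApplyWindowing are hypothetical names, and Lambda stands in for the smoothing amount.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 3-band (9 coefficient) SH vector, stored band by band.
using FSketchSH3 = std::array<float, 9>;

// Attenuation for one band: 1 / (1 + Lambda * l^2 * (l + 1)^2).
inline float WindowScale(int Band, float Lambda)
{
    const float L = static_cast<float>(Band);
    return 1.0f / (1.0f + Lambda * L * L * (L + 1.0f) * (L + 1.0f));
}

// Scales every coefficient by the window factor of the band it belongs to.
inline FSketchSH3 ApplyWindowing(const FSketchSH3& Coefficients, float Lambda)
{
    FSketchSH3 Result = Coefficients;
    int Index = 0;
    for (int Band = 0; Band < 3; ++Band)
    {
        const float Scale = WindowScale(Band, Lambda);
        for (int M = 0; M < 2 * Band + 1; ++M)
        {
            Result[Index++] *= Scale;
        }
    }
    return Result;
}
```

Band 0 (the constant term) is never attenuated, so average lighting is preserved while higher bands, which carry the ringing, are damped more strongly as Lambda grows.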
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
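For illustration, range-based for only needs begin() and end() returning an iterator with operator*, operator++ and operator!=. The sketch below wraps a std::vector as a stand-in for the wrapped NSArray. TSketchArrayWrapper and its members are hypothetical and not taken from mtlpp.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical index-based array wrapper; std::vector stands in for the
// wrapped Objective-C array.
template <typename T>
class TSketchArrayWrapper
{
public:
    explicit TSketchArrayWrapper(std::vector<T> InItems) : Items(std::move(InItems)) {}

    std::size_t GetSize() const { return Items.size(); }
    const T& operator[](std::size_t Index) const { return Items[Index]; }

    // Minimal forward iterator built on top of the index accessor.
    class FIterator
    {
    public:
        FIterator(const TSketchArrayWrapper& InArray, std::size_t InIndex)
            : Array(&InArray), Index(InIndex) {}
        const T& operator*() const { return (*Array)[Index]; }
        FIterator& operator++() { ++Index; return *this; }
        bool operator!=(const FIterator& Other) const { return Index != Other.Index; }
    private:
        const TSketchArrayWrapper* Array;
        std::size_t Index;
    };

    // begin()/end() are what the range-based for statement looks up.
    FIterator begin() const { return FIterator(*this, 0); }
    FIterator end() const { return FIterator(*this, GetSize()); }

private:
    std::vector<T> Items;
};
```

With this in place, `for (int Value : Array)` visits the elements in index order.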
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes means a failure, but this is actually due to manual vertex fetch (m.v.fetch). Without reading from the shader, we don't have the information available to know which buffers were removed from the input.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
static bool ShouldCompilePermutation ( const FGlobalShaderPermutationParameters & Parameters )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
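One plausible way to compute the number of valid render targets described above is to report the index of the last slot with a known format, plus one. This is a sketch only. ESketchFormat and ComputeNumValidRenderTargets are hypothetical, and the real D3D RHI code may count differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for DXGI_FORMAT; only the "unknown" value matters here.
enum class ESketchFormat : uint8_t { Unknown = 0, RGBA8, RGBA16F };

// Returns the index of the last slot with a known format plus one,
// or 0 when every slot is Unknown.
inline uint32_t ComputeNumValidRenderTargets(const ESketchFormat* Formats, uint32_t NumRenderTargets)
{
    uint32_t NumValid = 0;
    for (uint32_t Index = 0; Index < NumRenderTargets; ++Index)
    {
        if (Formats[Index] != ESketchFormat::Unknown)
        {
            NumValid = Index + 1;
        }
    }
    return NumValid;
}
```

When every format is unknown the result is 0, which avoids passing a non-zero NumRenderTargets with no valid formats to PSO creation.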
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
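The waste described above follows from how buddy allocators generally work: each request is rounded up to a power-of-two block. A small sketch of that arithmetic, assuming plain power-of-two rounding. The function names and the minimum block size are hypothetical and do not reflect the D3D12 RHI code.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= Size (Size >= 1).
inline uint64_t RoundUpToPowerOfTwo(uint64_t Size)
{
    uint64_t Block = 1;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost when a buddy allocator serves RequestedSize from a
// power-of-two block no smaller than MinBlockSize.
inline uint64_t BuddyWaste(uint64_t RequestedSize, uint64_t MinBlockSize)
{
    uint64_t Block = RoundUpToPowerOfTwo(RequestedSize);
    if (Block < MinBlockSize)
    {
        Block = MinBlockSize;
    }
    return Block - RequestedSize;
}

// Hypothetical policy: only requests at or below the threshold are routed
// to the buddy suballocator; larger ones would go elsewhere.
inline bool UseBuddyAllocator(uint64_t RequestedSize, uint64_t MaxBuddyAllocationSize)
{
    return RequestedSize <= MaxBuddyAllocationSize;
}
```

For example, a 5000-byte request occupies an 8192-byte block and wastes 3192 bytes. With a 4KB threshold that request bypasses the buddy allocator, while with a 512KB threshold it would not.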
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing,
Added th MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
2021-07-12 10:24:46 -04:00
return ShouldCompileDistanceFieldShaders ( Parameters . Platform ) ;
2017-01-26 19:20:49 -05:00
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash for use with containers and similar structures. It provides good performance and high quality across multiple platforms.
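A minimal sketch of plugging a custom byte hash into a standard container. The changelist does not show how CityHash is called, so 64-bit FNV-1a stands in for it here. `Fnv1a64`, `FStringHasher` and `FShaderCountMap` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in byte hash (64-bit FNV-1a). This is NOT CityHash; it only marks
// the place where a CityHash call would go.
inline std::uint64_t Fnv1a64(const char* Data, std::size_t Len)
{
    assert(Data != nullptr || Len == 0);
    std::uint64_t Hash = 14695981039346656037ull; // FNV offset basis
    for (std::size_t i = 0; i < Len; ++i)
    {
        Hash ^= static_cast<unsigned char>(Data[i]);
        Hash *= 1099511628211ull; // FNV prime
    }
    return Hash;
}

// Hypothetical adapter so the container uses the custom hash.
struct FStringHasher
{
    std::size_t operator()(const std::string& Key) const
    {
        return static_cast<std::size_t>(Fnv1a64(Key.data(), Key.size()));
    }
};

using FShaderCountMap = std::unordered_map<std::string, int, FStringHasher>;
```

Swapping the body of `Fnv1a64` for a real CityHash call would leave the container code unchanged, which is the point of routing the hash through one adapter.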
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. It was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was set up to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
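A rough sketch of the sharing idea behind these savings, assuming the generated include text can be held by shared pointer. `FShaderJobSketch`, `MakeJobsForShaderType` and `ResidentIncludeBytes` are hypothetical names, not the engine's types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical job record: the generated include text is held by shared
// pointer, so every job of one shader type references the same string.
struct FShaderJobSketch
{
    std::shared_ptr<const std::string> SharedIncludes;
    int PermutationIndex;
};

// Builds NumJobs jobs that all point at one copy of the include text.
std::vector<FShaderJobSketch> MakeJobsForShaderType(const std::string& GeneratedIncludes, int NumJobs)
{
    assert(NumJobs >= 0);
    auto Shared = std::make_shared<const std::string>(GeneratedIncludes);
    std::vector<FShaderJobSketch> Jobs;
    Jobs.reserve(static_cast<std::size_t>(NumJobs));
    for (int i = 0; i < NumJobs; ++i)
    {
        Jobs.push_back(FShaderJobSketch{Shared, i});
    }
    return Jobs;
}

// Bytes of include text resident for jobs built by MakeJobsForShaderType:
// one copy in total, however many jobs reference it.
std::size_t ResidentIncludeBytes(const std::vector<FShaderJobSketch>& Jobs)
{
    return Jobs.empty() ? 0 : Jobs.front().SharedIncludes->size();
}
```

With per-job copies the resident size would grow linearly with the job count. Here it stays at one copy per shader type.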
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oops... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assuming commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
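A minimal sketch of the clamp, assuming the requested UV channel is an integer index and the mesh reports how many channels it has. `ClampTexCoordIndex` and its parameter names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Clamps a requested UV channel index to the channels the mesh provides.
inline int ClampTexCoordIndex(int RequestedIndex, int NumTexCoords)
{
    assert(NumTexCoords > 0); // a mesh with no UV channels has nothing to clamp to
    return std::min(std::max(RequestedIndex, 0), NumTexCoords - 1);
}
```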
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
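A sketch of the early-out pattern for a parameter setter, assuming the costly part is a refresh that only needs to run when the value actually changes. `FMaterialInstanceSketch` is a hypothetical stand-in and does not reflect the real UMaterialInstance interface.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a material instance holding scalar parameters.
class FMaterialInstanceSketch
{
public:
    // Returns true when the value changed and dependent state was refreshed.
    bool SetScalarParameter(const std::string& Name, float Value)
    {
        assert(!Name.empty());
        auto It = Values.find(Name);
        if (It != Values.end() && It->second == Value)
        {
            return false; // early-out: nothing changed, skip the refresh
        }
        Values[Name] = Value;
        ++NumRefreshes; // placeholder for the costly update work
        return true;
    }

    int GetNumRefreshes() const { return NumRefreshes; }

private:
    std::unordered_map<std::string, float> Values;
    int NumRefreshes = 0;
};
```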
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fall back to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already set up such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
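An illustration of the mul(a,b) -> fma(a,b,0) rewrite in plain C++ rather than Metal shading language. `std::fma` is the standard `<cmath>` function. `MulAsFma` and `DotFma` are hypothetical helper names, and this sketch says nothing about how hlslcc performs the transform.

```cpp
#include <cassert>
#include <cmath>

// One product written as a fused multiply-add with a zero addend.
inline float MulAsFma(float A, float B)
{
    return std::fma(A, B, 0.0f);
}

// Dot product built from chained fma steps, so the order of operations is
// spelled out in the code instead of being left to the compiler.
inline float DotFma(const float* A, const float* B, int Count)
{
    assert(Count >= 0);
    float Sum = 0.0f;
    for (int i = 0; i < Count; ++i)
    {
        Sum = std::fma(A[i], B[i], Sum);
    }
    return Sum;
}
```

If the base pass and depth pass both evaluate position through the same explicit fma chain, there is less room for the two passes to round differently.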
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions, which leaves us on Mac with a NULL resource initially after creation, which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. Need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix; it built fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
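A sketch of version-gated loading, assuming older archives simply lack the PermutationId field and should read it as zero. `FReaderSketch`, `FShaderRecordSketch`, `LoadShaderRecord` and the version constant are all hypothetical; the real archive and version types are not shown in this changelist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical version at which PermutationId started being written.
const int VersionWithPermutationId = 2;

// Minimal read-only archive: a version stamp plus a stream of ints.
struct FReaderSketch
{
    int RenderingVersion;
    std::vector<int> Stream;
    std::size_t Pos;

    int ReadInt()
    {
        assert(Pos < Stream.size());
        return Stream[Pos++];
    }
};

struct FShaderRecordSketch
{
    int TypeId;
    int PermutationId;
};

// Reads PermutationId only when the archive is new enough to contain it;
// older data falls back to permutation 0.
FShaderRecordSketch LoadShaderRecord(FReaderSketch& Ar)
{
    FShaderRecordSketch Record;
    Record.TypeId = Ar.ReadInt();
    Record.PermutationId = (Ar.RenderingVersion >= VersionWithPermutationId) ? Ar.ReadInt() : 0;
    return Record;
}
```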
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as parameter. This allows fixing spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that was actually doing Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
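The log does not say which window from 'Stupid Spherical Harmonics (SH) Tricks' was used. As a rough sketch, assuming the squared-Laplacian penalty form, each band l is scaled by 1 / (1 + smoothing * l^2 * (l+1)^2). The function name and the flat coefficient layout are hypothetical.

```python
import math

def window_sh(coeffs, smoothing):
    """Attenuate higher SH bands to reduce ringing.

    coeffs is a flat list of (num_bands ** 2) coefficients, where band l
    occupies indices l*l .. l*l + 2*l. Band 0 is left untouched.
    """
    num_bands = math.isqrt(len(coeffs))
    if num_bands * num_bands != len(coeffs):
        raise ValueError("coefficient count must be a perfect square")
    out = []
    for l in range(num_bands):
        # Assumed window: penalise the squared Laplacian of the signal.
        scale = 1.0 / (1.0 + smoothing * (l * (l + 1)) ** 2)
        for m in range(2 * l + 1):
            out.append(coeffs[l * l + m] * scale)
    return out
```

With smoothing set to 0 the coefficients pass through unchanged, and larger values progressively damp the higher bands while leaving band 0 untouched.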
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
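A small sketch of the selection rule, assuming axis-aligned boxes and a single numeric density per volume. The types and names here are hypothetical; the real volume settings are not described in this log.

```python
from dataclasses import dataclass

@dataclass
class Box:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

    def intersects(self, other: "Box") -> bool:
        """True when the two boxes overlap or touch on every axis."""
        return all(
            self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
            for i in range(3)
        )

def density_for_cell(cell: Box, volumes, default_density):
    """Pick the highest density among volumes touching any part of the cell.

    volumes is a list of (Box, density) pairs.
    """
    touching = [density for box, density in volumes if box.intersects(cell)]
    return max(touching, default=default_density)
```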
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes means a failure, but this is actually due to m.v.fetch. We don't have the information available to know which buffers were removed from the input without reading from the shader.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
static void ModifyCompilationEnvironment ( const FGlobalShaderPermutationParameters & Parameters , FShaderCompilerEnvironment & OutEnvironment )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
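One way to compute a valid render-target count, sketched under the assumption that the count runs through the last slot with a known format. The actual rule used by the fix is not stated in the log, and the function name is hypothetical.

```python
DXGI_FORMAT_UNKNOWN = 0  # value of DXGI_FORMAT_UNKNOWN in the DXGI_FORMAT enum

def valid_render_target_count(formats, num_render_targets):
    """Return the number of slots up to and including the last valid format.

    Returns 0 when every requested slot is DXGI_FORMAT_UNKNOWN, which is the
    case the changelist describes as breaking PSO creation.
    """
    count = 0
    for i in range(min(num_render_targets, len(formats))):
        if formats[i] != DXGI_FORMAT_UNKNOWN:
            count = i + 1
    return count
```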
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
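To illustrate where buddy-allocator waste can come from in general, here is a sketch assuming blocks are rounded up to a power-of-two multiple of a minimum block size, and that requests above the threshold bypass the buddy allocator with no rounding. Only the 4KB and 512KB thresholds come from the text above; the minimum block size and all names are hypothetical.

```python
def buddy_block_size(size, min_block=256):
    """Smallest power-of-two multiple of min_block that fits size."""
    block = min_block
    while block < size:
        block *= 2
    return block

def buddy_waste(sizes, min_block=256):
    """Total bytes lost to rounding when every request uses the buddy allocator."""
    return sum(buddy_block_size(s, min_block) - s for s in sizes)

def waste_with_threshold(sizes, max_buddy_size, min_block=256):
    """Waste when only requests up to max_buddy_size use the buddy allocator.

    Larger requests are assumed to be allocated elsewhere with no rounding,
    which is a simplification.
    """
    return buddy_waste([s for s in sizes if s <= max_buddy_size], min_block)
```

For example, under these assumptions a 5KB request occupies an 8KB block, so lowering the threshold to 4KB moves that request (and its 3KB of rounding) out of the buddy allocator.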
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
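The timed-wait behaviour can be sketched with a shared event object standing in for the dispatch semaphore. The class and function names are hypothetical; only the 2s and 0.5s intervals come from the description above.

```python
import threading

class CommandBufferFence:
    """Hypothetical stand-in for one fence object shared by several queries."""

    def __init__(self):
        self._done = threading.Event()

    def signal(self):
        """Called from the command-buffer completion handler."""
        self._done.set()

    def wait(self, timeout_seconds):
        """Return True if the buffer completed, False if the wait timed out."""
        return self._done.wait(timeout_seconds)

def query_timeout(is_absolute_time):
    """Timeout in seconds: 2s for RQT_AbsoluteTime, otherwise 0.5s."""
    return 2.0 if is_absolute_time else 0.5
```

A query would call `fence.wait(query_timeout(...))` and report failure on a False result instead of blocking indefinitely.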
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
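A minimal sketch of keeping the data compressed until upload, assuming the voxel data is a plain byte buffer. The class name is hypothetical; the zlib calls are from the Python standard library.

```python
import zlib

class CompressedDistanceField:
    """Holds distance field voxels zlib-compressed until an upload needs them."""

    def __init__(self, voxels: bytes):
        self.uncompressed_size = len(voxels)
        self._compressed = zlib.compress(voxels)

    @property
    def resident_bytes(self) -> int:
        """Memory held while the field is idle."""
        return len(self._compressed)

    def decompress_for_upload(self) -> bytes:
        """Pay the decompression cost only when the data goes to the GPU."""
        return zlib.decompress(self._compressed)
```

The trade-off is less resident memory in exchange for decompression work at upload time.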
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
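The fence reuse bug in the list above can be modelled with a monotonic counter. This is a simplified sketch with hypothetical names, not the D3D12 API: a wait on value 0 is always satisfied, so signal values must only ever increase.

```python
class Fence:
    """Toy monotonic fence: values only increase and are never reset."""

    def __init__(self):
        self.last_signaled = 0   # highest value the CPU has requested
        self.last_completed = 0  # highest value the GPU has reached

    def signal(self):
        """CPU side: request the next fence value and return it."""
        self.last_signaled += 1
        return self.last_signaled

    def gpu_complete(self, value):
        """GPU side: mark a value as reached; never moves backwards."""
        self.last_completed = max(self.last_completed, value)

    def is_complete(self, value):
        return self.last_completed >= value
```

Because `is_complete(0)` is true on a fresh fence, a wait on 0 never blocks. Handing out strictly increasing values avoids that and removes any need to reset the fence.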
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
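A sketch of the page-allocator idea: physical pages are handed out from a free list in any order, and each resource records them in virtual order so that it sees a contiguous range. All names are hypothetical and deferred deallocation is not modelled.

```python
class PageAllocator:
    """Maps contiguous virtual pages onto whatever physical pages are free."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))
        self.mappings = {}  # resource id -> physical pages, in virtual order

    def allocate(self, resource_id, page_count):
        """Return the physical pages backing the resource, or None if full."""
        if page_count > len(self.free_pages):
            return None
        pages = [self.free_pages.pop() for _ in range(page_count)]
        self.mappings[resource_id] = pages
        return pages

    def free(self, resource_id):
        """Return a destroyed resource's pages to the free list for reuse."""
        self.free_pages.extend(self.mappings.pop(resource_id))

    def physical_page(self, resource_id, virtual_page):
        """Translate a resource-relative virtual page to its physical page."""
        return self.mappings[resource_id][virtual_page]
```

Because pages need not be adjacent, a freed page in the middle of the pool can be reused by a later, larger allocation.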
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash to use with containers and stuff. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. It is caused by checking an uninitialized vertex buffer so the check always fails.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was setup to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oups.... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assumption commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fallback to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already setup such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix that were fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where the permutation was not actually serialised in FShader, so the shader ended up being recompiled at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This allows fixing spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler, which were actually doing point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes indicates a failure, but this is actually due to m.v.fetch. Without reading from the shader, we don't have the information available to know which buffers were removed from the input.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
FGlobalShader::ModifyCompilationEnvironment(Parameters, OutEnvironment);
2021-06-14 12:46:26 -04:00
TileIntersectionModifyCompilationEnvironment(Parameters.Platform, OutEnvironment);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize setting in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
OutEnvironment.SetDefine(TEXT("DOWNSAMPLE_FACTOR"), GAODownsampleFactor);
}
};
2021-05-24 14:08:15 -04:00
IMPLEMENT_GLOBAL_SHADER(FObjectCullPS, "/Engine/Private/DistanceFieldObjectCulling.usf", "ObjectCullPS", SF_Pixel);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
2017-01-26 19:20:49 -05:00
2021-06-14 12:46:26 -04:00
BEGIN_SHADER_PARAMETER_STRUCT(FObjectCullParameters, )
	SHADER_PARAMETER_STRUCT_INCLUDE(FObjectCullVS::FParameters, VS)
	SHADER_PARAMETER_STRUCT_INCLUDE(FObjectCullPS::FParameters, PS)
	SHADER_PARAMETER_RDG_UNIFORM_BUFFER(FSceneTextureUniformParameters, SceneTextures)
	RDG_BUFFER_ACCESS(ObjectIndirectArguments, ERHIAccess::IndirectArgs)
	RENDER_TARGET_BINDING_SLOTS()
END_SHADER_PARAMETER_STRUCT()
2017-01-26 19:20:49 -05:00
const uint32 ComputeStartOffsetGroupSize = 64;
/** */
class FComputeCulledTilesStartOffsetCS : public FGlobalShader
{
2021-06-14 12:46:26 -04:00
	DECLARE_GLOBAL_SHADER(FComputeCulledTilesStartOffsetCS);
	SHADER_USE_PARAMETER_STRUCT(FComputeCulledTilesStartOffsetCS, FGlobalShader);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
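The fence reuse bug in the list above can be illustrated with a small stand-alone model. This is only a sketch, not the D3D12RHI code: `FSimFence`, `BuggyWaitIsTrivial` and `NextFenceValue` are hypothetical names, and the model assumes only that a wait on a value is satisfied once the completed value is at least that value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a GPU fence: a wait on Value is satisfied
// once the last signaled (completed) value is >= Value.
struct FSimFence
{
    uint64_t Completed = 0;

    void Signal(uint64_t Value) { Completed = Value; }
    bool IsComplete(uint64_t Value) const { return Completed >= Value; }
};

// Pattern from the description: signal the maximum value, then wait for 0.
// The wait is already satisfied, so it never synchronizes anything.
inline bool BuggyWaitIsTrivial()
{
    FSimFence Fence;
    Fence.Signal(UINT64_MAX);
    return Fence.IsComplete(0);
}

// Monotonic alternative: every submission gets a strictly larger value,
// and the value is never reset to an earlier one.
inline uint64_t NextFenceValue(uint64_t& LastIssued)
{
    return ++LastIssued;
}
```

With monotonically increasing values, a wait on the most recently issued value only completes after that specific signal, which is why resetting the fence to an earlier value is unnecessary.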
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize setting to the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompressed data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
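The allocator behaviour described in this change (arbitrary free pages mapped onto a contiguous virtual range, with pages returned for reuse on destruction) can be sketched roughly as follows. This is an illustrative model only: `FSimPageAllocator` and its methods are hypothetical, and nothing here reflects the actual ESRAM API or page size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page allocator: hands out physical pages from a free list in
// whatever order they are available. Mapping[i] is the physical page that
// backs virtual page i of a resource, so the virtual range stays contiguous.
class FSimPageAllocator
{
public:
    explicit FSimPageAllocator(uint32_t NumPages)
    {
        assert(NumPages > 0);
        // Push in reverse so that page 0 is handed out first.
        for (uint32_t Page = NumPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Fills OutMapping with NumPages physical pages; returns false if not enough are free.
    bool Allocate(uint32_t NumPages, std::vector<uint32_t>& OutMapping)
    {
        if (NumPages > FreePages.size())
        {
            return false;
        }
        OutMapping.clear();
        for (uint32_t Index = 0; Index < NumPages; ++Index)
        {
            OutMapping.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return true;
    }

    // Returns a resource's pages to the free list so later allocations can reuse them.
    void Free(std::vector<uint32_t>& Mapping)
    {
        for (uint32_t Page : Mapping)
        {
            FreePages.push_back(Page);
        }
        Mapping.clear();
    }

    std::size_t NumFree() const { return FreePages.size(); }

private:
    std::vector<uint32_t> FreePages;
};
```

After a resource in the middle of the pool is freed, a later allocation can be backed by physical pages that are not adjacent, which is the case a purely contiguous allocator cannot serve.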
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
public :
2021-06-14 12:46:26 -04:00
BEGIN_SHADER_PARAMETER_STRUCT ( FParameters , )
SHADER_PARAMETER_STRUCT_REF ( FViewUniformShaderParameters , View )
SHADER_PARAMETER_STRUCT_INCLUDE ( FTileIntersectionParameters , TileIntersectionParameters )
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldCulledObjectBufferParameters , DistanceFieldCulledObjectBuffers )
SHADER_PARAMETER_STRUCT_INCLUDE ( FDistanceFieldAtlasParameters , DistanceFieldAtlas )
SHADER_PARAMETER_RDG_UNIFORM_BUFFER ( FSceneTextureUniformParameters , SceneTextures )
END_SHADER_PARAMETER_STRUCT ( )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash to use with containers and stuff. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. The bug was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was setup to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oups.... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assuming commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fall back to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already set up such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
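The mul-to-fma rewrite in this change rests on a simple arithmetic identity, sketched below in C++. This only illustrates the identity; the real transformation is applied to generated Metal shader code, and `MulAsFma` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Express a plain multiply as a fused multiply-add with a zero addend.
// The result has the same value as A * B, but the operation is a single
// fma, leaving the compiler no separate multiply to reorder or regroup.
inline float MulAsFma(float A, float B)
{
    return std::fma(A, B, 0.0f);
}
```

Because adding zero does not change the rounded product, both passes compute the same position as long as they both go through the same fused operation.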
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix for something that was fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where the permutation was not actually serialised in FShader, so shaders ended up being recompiled at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTables - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This fixes spotlight unit conversion when using lumens.
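Why the cone angle matters for a lumens conversion can be seen from the solid angle of a cone. The sketch below uses the standard formula for a cone's solid angle; it is not the engine's implementation, and `ConeSolidAngle` and `LumensToCandela` are hypothetical helpers that assume light is emitted uniformly into the cone.

```cpp
#include <cassert>
#include <cmath>

// Solid angle (steradians) of a cone with the given half-angle in radians.
inline double ConeSolidAngle(double HalfAngleRadians)
{
    const double Pi = 3.14159265358979323846;
    return 2.0 * Pi * (1.0 - std::cos(HalfAngleRadians));
}

// Hypothetical helper: luminous intensity (candela) of a light emitting
// Lumens uniformly into a cone with the given half-angle.
inline double LumensToCandela(double Lumens, double HalfAngleRadians)
{
    const double SolidAngle = ConeSolidAngle(HalfAngleRadians);
    assert(SolidAngle > 0.0);
    return Lumens / SolidAngle;
}
```

A half-angle of pi gives the full sphere (4 pi steradians), the point light case; a narrower cone concentrates the same lumens into a smaller solid angle and so yields a higher intensity.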
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler, which were actually doing point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes indicate a failure, but they are actually due to manual vertex fetch. Without reading from the shader we don't have the information to know which streams were removed from the input.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
static bool ShouldCompilePermutation ( const FGlobalShaderPermutationParameters & Parameters )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
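One way a buddy allocator wastes memory is internal fragmentation from power-of-two block sizes, sketched below. This is an assumption about where the waste comes from, not a description of the D3D12RHI allocator; `BuddyBlockSize` and `BuddyWaste` are hypothetical helpers, and `MinBlockBytes` is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// A buddy allocator serves each request from a power-of-two sized block,
// so the block is the request rounded up to the next power-of-two multiple
// of the minimum block size.
inline uint64_t BuddyBlockSize(uint64_t RequestBytes, uint64_t MinBlockBytes)
{
    assert(RequestBytes > 0 && MinBlockBytes > 0);
    uint64_t Block = MinBlockBytes;
    while (Block < RequestBytes)
    {
        Block *= 2;
    }
    return Block;
}

// Bytes lost inside the block for a single request.
inline uint64_t BuddyWaste(uint64_t RequestBytes, uint64_t MinBlockBytes)
{
    return BuddyBlockSize(RequestBytes, MinBlockBytes) - RequestBytes;
}
```

In this model a 5KB request lands in an 8KB block and loses 3KB, and the absolute loss grows with request size, which is consistent with routing only small allocations through the buddy allocator.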
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
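The first DFAO item above inverts the culling loop: each object emits the screen tiles it overlaps, instead of each tile scanning every object. A minimal CPU-side sketch of that idea, assuming axis-aligned screen-space bounds in pixels; FScreenRect, FTileObjectPair and BuildTileObjectPairs are hypothetical names for illustration, not engine code:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical screen-space bounds of one object, in pixels (min inclusive, max exclusive).
struct FScreenRect { int MinX, MinY, MaxX, MaxY; };

// One tile / object intersection; the changelist gives each such pair its own thread group.
struct FTileObjectPair { int TileX, TileY, ObjectIndex; };

// For every object, emit the tiles its bounds overlap (object -> tiles, not tile -> objects).
std::vector<FTileObjectPair> BuildTileObjectPairs(
    const std::vector<FScreenRect>& Objects, int ViewWidth, int ViewHeight, int TileSize)
{
    assert(TileSize > 0);
    std::vector<FTileObjectPair> Pairs;
    for (int ObjectIndex = 0; ObjectIndex < static_cast<int>(Objects.size()); ++ObjectIndex)
    {
        const FScreenRect& R = Objects[ObjectIndex];
        // Clamp the bounds to the view and skip objects that are entirely off screen.
        const int MinX = std::max(R.MinX, 0), MinY = std::max(R.MinY, 0);
        const int MaxX = std::min(R.MaxX, ViewWidth), MaxY = std::min(R.MaxY, ViewHeight);
        if (MinX >= MaxX || MinY >= MaxY) continue;
        // Convert the clamped pixel range to an inclusive tile range.
        const int TileX0 = MinX / TileSize, TileY0 = MinY / TileSize;
        const int TileX1 = (MaxX - 1) / TileSize, TileY1 = (MaxY - 1) / TileSize;
        for (int TileY = TileY0; TileY <= TileY1; ++TileY)
            for (int TileX = TileX0; TileX <= TileX1; ++TileX)
                Pairs.push_back({TileX, TileY, ObjectIndex});
    }
    return Pairs;
}
```

The output length is the number of tile / object intersections, so work scales with actual overlap rather than with tiles times objects.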
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was that HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
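The fence reuse bug above follows from fence semantics: a wait is satisfied as soon as the completed value is at least the target, so a wait for 0 never blocks. A minimal sketch under that assumption; FSketchFence is a hypothetical stand-in, not the D3D12RHI class, and completion is immediate here where a real GPU would write the value later:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a GPU fence whose completed value only moves forward.
struct FSketchFence
{
    uint64_t CompletedValue = 0;
    uint64_t NextValue = 1; // next value to signal; never reset to a previous value

    // Signal a strictly increasing value and return it so the caller can wait on it.
    uint64_t Signal()
    {
        assert(NextValue > CompletedValue); // invariant: values are monotonic
        CompletedValue = NextValue;         // sketch only: completes immediately
        return NextValue++;
    }

    // A wait on Target is satisfied once the completed value has reached it.
    bool IsComplete(uint64_t Target) const { return CompletedValue >= Target; }
};
```

With this model, IsComplete(0) is true on a fresh fence, which is why waiting for 0 after signaling the maximum value gave no synchronization.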
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
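The allocator described above can be pictured as a free list of physical pages plus a per-allocation mapping table. A minimal sketch, assuming fixed-size pages identified by index; FSketchPageAllocator is a hypothetical illustration and does not use or model the platform's ESRAM API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page allocator: the physical pages of an allocation need not be
// contiguous, but the returned mapping presents them as one contiguous virtual range.
class FSketchPageAllocator
{
public:
    explicit FSketchPageAllocator(uint32_t NumPages)
    {
        // Push in reverse so that low page indices are handed out first.
        for (uint32_t Page = NumPages; Page > 0; --Page)
            FreePages.push_back(Page - 1);
    }

    // Mapping[i] is the physical page backing virtual page i.
    // Returns an empty mapping if not enough pages are free.
    std::vector<uint32_t> Allocate(uint32_t NumPages)
    {
        assert(NumPages > 0);
        std::vector<uint32_t> Mapping;
        if (NumPages > FreePages.size()) return Mapping;
        for (uint32_t i = 0; i < NumPages; ++i)
        {
            Mapping.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return Mapping;
    }

    // Return pages to the free list for reuse when a resource is destroyed.
    void Free(const std::vector<uint32_t>& Mapping)
    {
        FreePages.insert(FreePages.end(), Mapping.begin(), Mapping.end());
    }

    std::size_t NumFree() const { return FreePages.size(); }

private:
    std::vector<uint32_t> FreePages;
};
```

Because any free page can back any virtual page, freed pages are reusable even when they are scattered between live allocations.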
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
2021-07-12 10:24:46 -04:00
return ShouldCompileDistanceFieldShaders ( Parameters . Platform ) ;
2017-01-26 19:20:49 -05:00
}
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash for use with containers and similar code. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. It was broken by a check against an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was setup to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oops... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assumption commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fallback to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already setup such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
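The mul(a,b) to fma(a,b,0) rewrite above keeps the value of the product while expressing it as a single fused operation. A minimal C++ sketch of that equivalence using std::fma; MulAsFma is a hypothetical helper, and this says nothing about how the Metal compiler schedules the result:

```cpp
#include <cmath>

// Same value as A * B for finite inputs: adding zero leaves the rounded product
// unchanged, but the multiply is written as one fused multiply-add.
inline float MulAsFma(float A, float B)
{
    return std::fma(A, B, 0.0f);
}
```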
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. Need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix for code that was fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083 (global shader permutation) because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where the permutation was not actually serialised in FShader, so the shader ended up being recompiled at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This allows fixing spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler, which were actually doing point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
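A minimal sketch of band-wise SH windowing, assuming the curvature-penalty form in which every coefficient of band l is scaled by 1 / (1 + lambda * l^2 * (l+1)^2). The function name WindowSH, the band-major coefficient layout, and the lambda parameter are illustrative assumptions; this is not the Lightmass code, and lambda is not claimed to match the units of VolumetricLightmapSphericalHarmonicSmoothing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scales SH coefficients band by band to damp ringing.
// Coefficients are assumed to be stored band-major: index = l*l + (m + l).
// lambda >= 0 is a hypothetical smoothing strength; 0 leaves the input unchanged.
std::vector<float> WindowSH(const std::vector<float>& coeffs, int numBands, float lambda)
{
    assert(numBands >= 0 && lambda >= 0.0f);
    assert(coeffs.size() == static_cast<std::size_t>(numBands) * static_cast<std::size_t>(numBands));

    std::vector<float> out(coeffs);
    for (int l = 0; l < numBands; ++l)
    {
        // Band l is attenuated by 1 / (1 + lambda * (l * (l + 1))^2).
        const float ll1 = static_cast<float>(l * (l + 1));
        const float scale = 1.0f / (1.0f + lambda * ll1 * ll1);
        for (int m = -l; m <= l; ++m)
        {
            out[static_cast<std::size_t>(l * l + m + l)] *= scale;
        }
    }
    return out;
}
```

Band 0 (the constant term) is never attenuated, so average lighting is preserved while higher bands are smoothed progressively harder.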
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
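As an illustration only: a range-based for statement works on any type that exposes begin() and end(). The sketch below uses a hypothetical ArrayWrapper over a std::vector standing in for a wrapped array; the actual mtlpp NSArray wrapper and its members are not shown in this changelist.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array wrapper; not the mtlpp class.
template <typename T>
class ArrayWrapper
{
public:
    explicit ArrayWrapper(std::vector<T> items) : Items(std::move(items)) {}

    std::size_t Size() const { return Items.size(); }

    const T& operator[](std::size_t index) const
    {
        assert(index < Items.size());
        return Items[index];
    }

    // begin()/end() are all that a range-based for statement needs.
    typename std::vector<T>::const_iterator begin() const { return Items.begin(); }
    typename std::vector<T>::const_iterator end() const { return Items.end(); }

private:
    std::vector<T> Items;
};

// Example consumer: sums the elements with a range-based for loop.
inline int Sum(const ArrayWrapper<int>& values)
{
    int total = 0;
    for (int v : values)
    {
        total += v;
    }
    return total;
}
```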
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes indicate a failure but which are actually due to manual vertex fetch (m.v.fetch). We don't have the information available to know which buffers were removed from the input without reading from the shader.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
static void ModifyCompilationEnvironment ( const FGlobalShaderPermutationParameters & Parameters , FShaderCompilerEnvironment & OutEnvironment )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
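A rough sketch of where buddy-allocator waste comes from, assuming a classic buddy scheme that rounds every request up to the next power of two. The helper names are hypothetical and the real D3D12 suballocator may differ in detail; the point is only that lowering the size threshold keeps large, badly-fitting requests out of the pool.

```cpp
#include <cassert>
#include <cstdint>

// Rounds a request up to the next power of two, as a classic buddy allocator does.
inline std::uint64_t BuddyBlockSize(std::uint64_t requested)
{
    assert(requested > 0 && requested <= (std::uint64_t{1} << 63));
    std::uint64_t block = 1;
    while (block < requested)
    {
        block <<= 1;
    }
    return block;
}

// Bytes lost to rounding for a single allocation.
inline std::uint64_t BuddyWaste(std::uint64_t requested)
{
    return BuddyBlockSize(requested) - requested;
}

// Hypothetical routing rule: only requests at or below the threshold use the buddy pool.
inline bool UsesBuddyPool(std::uint64_t requested, std::uint64_t maxBuddySize)
{
    return requested <= maxBuddySize;
}
```

Under these assumptions a 5KB request placed in the pool occupies an 8KB block and wastes 3KB, whereas with a 4KB threshold the same request bypasses the pool entirely.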
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
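A minimal sketch of the allocation idea described above, assuming a simple free list of physical pages. PagePool and its members are hypothetical names; no ESRAM or platform API is used, and the table returned by Allocate merely stands in for mapping scattered physical pages onto a contiguous virtual range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page pool: hands out physical pages in any order and records
// them in a table indexed by contiguous virtual page number.
class PagePool
{
public:
    explicit PagePool(std::uint32_t pageCount) : Capacity(pageCount)
    {
        for (std::uint32_t i = pageCount; i > 0; --i)
        {
            FreePages.push_back(i - 1);
        }
    }

    // Returns one physical page per virtual page, or an empty table when
    // the pool does not hold enough free pages.
    std::vector<std::uint32_t> Allocate(std::size_t numPages)
    {
        std::vector<std::uint32_t> mapping;
        if (numPages > FreePages.size())
        {
            return mapping;
        }
        for (std::size_t i = 0; i < numPages; ++i)
        {
            mapping.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return mapping;
    }

    // Returns the pages to the pool so later allocations can reuse them.
    void Free(const std::vector<std::uint32_t>& mapping)
    {
        assert(FreePages.size() + mapping.size() <= Capacity);
        for (std::uint32_t page : mapping)
        {
            FreePages.push_back(page);
        }
    }

    std::size_t NumFree() const { return FreePages.size(); }

private:
    std::size_t Capacity;
    std::vector<std::uint32_t> FreePages;
};
```

After some allocations are freed, a later request can be satisfied from pages that are not adjacent, which a purely contiguous allocator could not do without defragmenting.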
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
{
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3809756)
#rb None
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3629223 by Rolando.Caloca
DR - Rollback //UE4/Dev-Rendering/Engine/Source/Runtime/VulkanRHI to changelist 3627847
Change 3629708 by Rolando.Caloca
DR - vk - Redo some changes from DevMobile
3601439
3604186
3606672
3617383
3617474
3617483
Change 3761370 by Arne.Schober
DR - Added CityHash to use with containers and stuff. It provides good performance and high quality across multiple platforms.
Change 3761437 by Guillaume.Abadie
Optimises motion blur compute shader for consoles.
Change 3761483 by Guillaume.Abadie
Fixes D3D11 RHI lying to dynamic resolution heuristic with t.MaxFPS.
Change 3761995 by Mark.Satterthwaite
Add the Metal compiler path to the local .pch filename to avoid problems when Xcode moves.
Change 3761996 by Mark.Satterthwaite
Emit more details when a pixel shader is found to have no outputs at all which Metal doesn't permit. More likely this is a bug in the shader compiler not configuring the in-out mask correctly...
#jira UE-52292
Change 3761999 by Mark.Satterthwaite
No need to avoid tessellation for FMetalRHICommandContext::RHIEndDrawIndexedPrimitiveUP anymore - that was from back when the tessellation logic was replicated in each RHI*Draw* implementation.
#jira UE-51937
Change 3762181 by Joe.Graf
Changed MaxShaderJobBatchSize to 25 on Mac as it reduced shader compile time by 21%
Change 3762607 by Mark.Satterthwaite
Remove accidentally included changes from 3761995.
Change 3762612 by Mark.Satterthwaite
Enable the explicit sincos intrinsic for Metal to avoid instances of UE-52477 that can cause shaders to compile incorrectly through hlslcc.
#jira UE-52477
Change 3762772 by Michael.Lentine
Move RHI calls to render thread.
Change 3763021 by Richard.Wallis
Remove shader cache tool project and implementation.
#jira UE-51613
Change 3763082 by Guillaume.Abadie
More SceneTexture, SceneColor and SceneDepth automated tests
Change 3763111 by Richard.Wallis
Clone of CL 3763033 (Release-4.18):
Fix for crash upon launching packaged game on Mac with Share Material Shader Code enabled.
#jira UE-52121
Change 3763657 by Michael.Lentine
Invalidate ddc for skeletal mesh render data so that the duplicated vertex render structures are properly serialized.
Change 3763727 by Jian.Ru
Fix Player Collision view mode. The bug was caused by checking an uninitialized vertex buffer, so the check always failed.
#jira UE-52052
Change 3763738 by Guillaume.Abadie
Implements SSR input post process material location.
Change 3764271 by Mark.Satterthwaite
Allow ControlPointPatch lists to flow through MetalRHI as it was setup to handle this transparently - the VSHS compute shader will convert them to triangles to draw. Report the same warning as in the pipeline creation stage as this hasn't been formally validated.
#jira UE-52454
Change 3764316 by Daniel.Wright
Added AVolumetricLightmapDensityVolume - gives local control over Volumetric Lightmap density. Dropping the top mip outside of the play area in Monolith saves 20Mb (35Mb original).
Volumetric Lightmap no longer refines around static translucent geometry - saves 5Mb in Monolith
Reworked brick culling by error mechanism. Now compares error to interpolated parent lighting instead of the brick average - prevents dropping constant value bricks which are near a wall and cause leaking due to parent interpolation after being culled.
Change 3764318 by Daniel.Wright
Missing file
Change 3764321 by Daniel.Wright
Shader compiling memory optimizations
* Editor memory: Sharing uniform buffer includes and GeneratedInstancedStereo.ush per FShaderType (was previously duplicated per FShader job)
* SCW input size: Sharing uniform buffer includes and SharedEnvironment per batch
* 7.6Gb of shader job inputs in memory -> .5Gb (13x less) when doing a full shader compile of Paragon Editor
* 13.8Gb written into worker input files -> 2.9Gb (4.7x less). Global shaders are never batched when sent to SCW so unoptimized by these changes.
Change 3764595 by Daniel.Wright
Added VolumetricLightmapDensityVolume asset icons
Change 3764701 by Michael.Lentine
Add duplicated vertices merging for meshmerge.
Change 3766002 by Guillaume.Abadie
Fixes a crash in translucency.
Change 3766007 by Guillaume.Abadie
Oops... Fixes compilation failure.
Change 3766697 by Guillaume.Abadie
Giant refactor of global shader interface for upcoming native support of permutation.
CL generated by python script.
Change 3767205 by Chris.Bunner
Deferring FMaterial::RenderingThreadShaderMap update to render-thread rather than assuming commands have been flushed.
#jira UE-50652
Change 3767207 by Chris.Bunner
Clamp fetched texture coordinates to those available on the mesh.
Change 3767209 by Chris.Bunner
PR #4203: Early-outs in UMaterialInstance parameter setters (Contributed by stefanzimecki)
#jira UE-52193
Change 3767772 by Mark.Satterthwaite
MetalShaderFormat will no longer fall back to text shaders when you ask it to compile to bytecode but the bytecode compiler is not available (either locally or remotely) - this ensures that the DDC can't be poisoned by incorrectly configured clients. The Editor is already set up such that if the remote shader compiler is not configured & Xcode is not available locally the shader-compiler will be invoked to generate text shaders.
#jira UE-52554
Change 3768604 by Guillaume.Abadie
Polish up with new global shader function signature.
Change 3768993 by Guillaume.Abadie
Fixes r.Upscale.Panini cvars
Change 3769478 by Mark.Satterthwaite
Move the ue4_stdlib.metal & PCH into a temporary directory that exists for the lifetime of the SCW on the remote side as well as the local one and add this path as an include directory.
#jira UE-52587
Change 3769703 by Mark.Satterthwaite
For all Metal platforms >= Metal v1.2 transform mul(a,b) into fma(a,b,0) to prevent the Apple compiler reordering operations differently between the base & depth passes which results in variance in the position output.
For iOS disable fast-math when the vertex-shader uses World-Position-Offset because there are additional problems on the iOS shader compiler that result in position variance even with the above fix - WPO performance will suffer but I don't have any alternatives.
Remove the depth-offset hack from the depth-only vertex shader again.
#jira UES-5651
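A rough CPU-side illustration of the mul(a,b) -> fma(a,b,0) rewrite described above, written in C++ rather than Metal shading language. `MulAsFma` is a hypothetical name used only for this sketch; the actual transform lives in the shader cross-compiler and is not shown here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: express a plain multiply as a fused multiply-add
// with a zero addend, mirroring the mul(a,b) -> fma(a,b,0) rewrite.
inline float MulAsFma(float A, float B)
{
    // std::fma rounds once, so with a zero addend the result is the
    // correctly rounded product A * B. The value is unchanged; the point
    // of the rewrite is that the operation is one fused instruction that
    // a compiler cannot split and reorder differently between passes.
    return std::fma(A, B, 0.0f);
}
```

For example, `MulAsFma(3.0f, 4.0f)` yields `12.0f`, the same value as the plain product.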
Change 3769763 by Mark.Satterthwaite
Handle swizzles in the hlslcc fma identification pass so that we reduce the number of instructions and the platform compiler can't break the instructions up.
Change 3769849 by Mark.Satterthwaite
Fix CIS error.
Change 3770517 by Richard.Wallis
Fix for crash when creating a new media texture (AppleIntelHD5000GraphicsMTLDriver!SamplerStage::bindSamplerToTexture()). Missing texture resource for binding. Old InitDynamicRHI() code has been refactored out into separate functions which leaves us on Mac with a NULL resource initially after creation which Metal doesn't like. This fix puts InitDynamicRHI down the default setup/clear path which inits default resources - I don't think we should use a global dummy in this instance as this is a render target.
#jira UE-51940
Change 3770688 by Uriel.Doyon
Fixed texture resolution returning 0 when running blueprint construction scripts at cook time.
Change 3771115 by Mark.Satterthwaite
Report errors from failed attempts to compile global shaders or we can't see why things fail on non-Windows platforms.
Change 3771263 by Mark.Satterthwaite
Change the way ManualVertexFetch is enabled on Metal platforms so that it is enabled when targeting Metal v1.2 and higher (macOS 10.12+/iOS 10+). This brings iOS in the Desktop Forward renderer back into line with the Mac.
#jira UERNDR-300
Change 3773472 by Guillaume.Abadie
Fixes a crash on PIE of SimpleComposure project.
Change 3773475 by Guillaume.Abadie
Fixes bug in editor viewport caused by SSR input changes.
Change 3774677 by Arne.Schober
DR - Deprecated SetLocal from the RHICmdlist
Fixed some unnecessary PSO collisions.
Change 3777037 by Mark.Satterthwaite
Remove incorrect change that caused a reference to "accurate::sincos" to appear in some Metal shaders rather than "precise::sincos".
Change 3777122 by Mark.Satterthwaite
Back out changelist 3777037 - I'm blind and wasn't seeing the real problem was a stale shader cache...
Change 3777196 by Mark.Satterthwaite
Fix text-shader compilation on iOS 10 - maybe iOS 9 too (untested!).
We need our own make_scalar type-trait template for ue4_stdlib.metal so that we still compile with older iOS runtime compilers and we can't use as_type to directly implement the packHalf2x16/unpackHalf2x16 intrinsics for these older runtime compilers either.
Change 3779098 by Rolando.Caloca
DR - vk - Fix query index
Change 3779275 by Mark.Satterthwaite
Silence the Metal runtime compiler warning caused by use of a deprecated enum value when running text shaders compiled for Metal v1.0/1.1 on a Metal v1.2+ OS.
#jira UE-52554
Change 3779427 by Rolando.Caloca
DR - vk - Fix for allocator contention
Change 3779608 by Uriel.Doyon
Fixed invalid access in the resave package commandlet when building texture streaming material data for materials enabling tessellation.
Change 3784496 by Mark.Satterthwaite
Temporarily disable USE_OBJECT_COMPOSITING_TILE_CULLING for Metal shader compilation only - other platforms are unaffected - as it isn't working properly for some reason. Need to work out what's up but don't want Distance Fields to be completely snookered in the interim.
#jira UE-52952
Change 3784608 by Rolando.Caloca
DR - Copy 3784588
- Fix for drivers returning out of date swapchains during resizes
Change 3784734 by Mark.Satterthwaite
Real fix for UE-52952 - MetalShaderFormat wasn't propagating the full thread-group value.
#jira UE-52952
Change 3784741 by Mark.Satterthwaite
More Metal debugging commandline options "-metalfastmath" & "-metalnofastmath" to force fast-math on or off for all shaders, must be using runtime-compiled shaders (i.e. -metalshaderdebug or r.Shaders.Optimise=0) to take effect.
Change 3787103 by Guillaume.Abadie
Kills BuiltinSamplers UB
Change 3787207 by Guillaume.Abadie
Sorry, compile fix that were fine with local changes...
Change 3787396 by Marcus.Wassmer
PR #4271: UE-52901: Set VIS_Max meta to hidden (Contributed by projectgheist)
Change 3788028 by Peter.Sumanaseni
Working linear HDR exr output from sequencer
Change 3788536 by Mark.Satterthwaite
Track whether the Metal shader uses the discard_fragment function as when this is used but without any other outputs we know we need to bind at least one render-target or a depth-stencil surface but we don't know which. This lets us correctly error when we encounter a shader with no outputs at all which Metal doesn't permit.
#jira UE-52292
Change 3788538 by Mark.Satterthwaite
Let's try mitigating UE-46604 on Nvidia by retaining resource references in the command-buffer. This shouldn't be necessary and isn't typically on other vendors but we haven't been able to reproduce this reliably enough to get to the bottom of it.
#jira UE-46604
Change 3789083 by Guillaume.Abadie
Implements global shader permutations. Example in ScreenSpaceReflections.cpp.
Change 3789090 by Guillaume.Abadie
Fixes linux build.
Change 3789106 by Guillaume.Abadie
Fixes compilation failure in niagara plugin.
Change 3789274 by Guillaume.Abadie
Prevent hit proxies from clobbering TAA's history.
#jira UE-52968
Change 3789380 by Guillaume.Abadie
Back out changelist 3789083: global shader permutation, because of a compilation failure in clang.
Change 3789648 by Guillaume.Abadie
Relands global shader permutation, with clang support.
Change 3789712 by Guillaume.Abadie
Fixes TestImage show flag with TAAU on.
#jira UE-53061
Change 3791593 by Guillaume.Abadie
Reinvalidates shaders with shader permutations.
Change 3791884 by Daniel.Wright
Added BP setter for LowerHemisphereColor
Change 3791886 by Daniel.Wright
Added LightmapType to PrimitiveComponent
* ForceVolumetric allows forcing static geometry to use Volumetric Lightmaps, which can be useful on instanced foliage where seams are prevalent. Lightmass internal caching still requires lightmap UVs and reasonable lightmap resolution.
* ForceSurface replaces bLightAsIfStatic
Improvements to Volumetric Lightmap quality needed for static geometry
* Stationary light shadowing is now dilated inside geometry
* Now doing two dilation passes since samples near geometry see inside due to ray start bias
* Refinement around geometry uses an expanded cell bounds when the geometry is going to use Volumetric Lightmaps, since cross-resolution stitching causes leaking
Lightmass debug primitives are now tied to a swarm task instead of global - allows debugging of Volumetric Lightmap tasks
Change 3792256 by Guillaume.Abadie
Fixes a bug where permutation was not actually serialised in FShader, so was ending up recompiling shader at every load.
Change 3792884 by Marcus.Wassmer
Copying //UE4/Partner-AMD to Dev-Rendering (//UE4/Dev-Rendering)
Change 3793200 by Marcus.Wassmer
Copying //UE4/Partner-IDV-SpeedTree to Dev-Rendering (//UE4/Dev-Rendering)
Speedtree 8 support
Change 3793206 by Brian.Karis
Added color grading control BlueCorrection to correct for artifacts with "electric" blues due to the ACEScg color space. Bright blue desaturates instead of going to violet.
Added color grading control ExpandGamut which expands bright saturated colors outside the sRGB gamut to fake wide gamut rendering.
ACES changes.
Change 3793344 by Marcus.Wassmer
Fix editortest compile
Change 3794285 by Guillaume.Abadie
Serializes PermutationId according to archive rendering version to avoid issues with old materials that were serializing a shader map into UObject.
Change 3794307 by Guillaume.Abadie
Resaves uassets that were modified between 3789648 and 3794285
Change 3794627 by Mark.Satterthwaite
Implement two components for MTLPP, an IMP cache for Objective-C selector implementations & an interposition framework for those same selectors:
- imp_SelectorCache & friends provide the IMP caching for each of the Metal protocols which constitute most of the API, so far I've not covered the Metal classes used for the various descriptor/initializer types. Each type has its own IMPTable which caches the selector's implementation pointer and provides the mechanism to hook that implementation. As Objective-C is runtime dynamic this look up must be performed on the actual Class value returned by an object at runtime - you can't do this at compile time. Even things like NSString which appear compile-time static are really not as NSString is an alias for a class-cluster (NSString, NSMutableString, __NSInlineString and more).
- The interpose directory contains MTI* files which are the framework for interposing all the functions in Metal's runtime API - I deliberately omit the descriptor classes & read-only functions as there's no benefit to interposing them - which I can build off to create a trace tool or a superior validation layer. Right now this is Mac only as there'll be some problems to solve for iOS/tvOS due to difference in linking requirements - not insurmountable.
- Rebuild MTLPP's implementation of the C++ wrapper classes around the IMPTable's - this means we avoid all the objc_msgSend overhead for all the classes and functions whose implementations are cached. Right now the IMPTable is going to incur a look-up for all non-copy/move constructors which is suboptimal - ideally the Metal IMPTables would be cached in the Device object as they will be consistent within a single Device.
- Sort out the MTLPP availability logic - it now exports the availability warnings to the caller and internally just blithely assumes it may call the functions, the caller is responsible for ensuring that calls are made only on appropriate devices & OSes. This reduces MTLPP complexity and better fits how MetalRHI works.
- Fix a number of retain/release bugs that were lying dormant in MTLPP but exposed by the switch to IMPTables.
- Add tvOS support.
Next up, put this into MetalRHI and start fixing all the fallout.
Change 3794631 by Mark.Satterthwaite
Missed updating mtlpp's build.cs for TVOS.
Change 3794651 by Uriel.Doyon
UPointLightComponent::GetUnitsConversionFactor() now takes the cone angle as a parameter. This allows fixing spotlight unit conversion when using lumens.
Change 3794720 by Guillaume.Abadie
Fixes a bug in Global{Bilinear,Trilinear}ClampedSampler that was actually doing a Point sampling.
Change 3794749 by Mark.Satterthwaite
Fix mtlpp.build.cs paths.
Change 3794856 by Mark.Satterthwaite
Fix some shadowing warnings.
Change 3795484 by Daniel.Wright
Implemented the Spherical Harmonic windowing algorithm from 'Stupid Spherical Harmonics (SH) Tricks'
New WorldSettings Lightmass property VolumetricLightmapSphericalHarmonicSmoothing controls the global amount of smoothing applied
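A minimal sketch of spherical harmonic windowing, assuming the variant meant here is the Laplacian-penalty form from that paper, in which every coefficient in band l is scaled by 1 / (1 + lambda * l^2 * (l+1)^2). The names `FSHCoefficients` and `ApplySHWindowing` are hypothetical, and how VolumetricLightmapSphericalHarmonicSmoothing maps to lambda is not stated in this changelist.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Order-3 SH (bands l = 0..2): 9 coefficients, index = l*l + l + m.
using FSHCoefficients = std::array<float, 9>;

// Hypothetical helper: damp higher bands more strongly as Lambda grows.
// Lambda = 0 leaves the coefficients untouched; band 0 is never changed.
inline void ApplySHWindowing(FSHCoefficients& SH, float Lambda)
{
    for (int L = 0; L < 3; ++L)
    {
        // Squared eigenvalue of the Laplacian for band L: (L * (L + 1))^2.
        const float Penalty = static_cast<float>(L * L * (L + 1) * (L + 1));
        const float Scale = 1.0f / (1.0f + Lambda * Penalty);
        for (int M = -L; M <= L; ++M)
        {
            SH[L * L + L + M] *= Scale;
        }
    }
}
```

With Lambda = 0.25, band 1 is halved and band 2 is scaled by 0.1, which shows how quickly the higher-frequency terms are suppressed.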
Change 3795590 by Brian.Karis
Area light fixes
Fixed order of operations. This helps mixing of SourceRadius, SourceLength, and SoftSourceRadius.
Change 3796832 by Marcus.Wassmer
Correct shouldcache condition for new resolve shader
Change 3796884 by Marcus.Wassmer
Doing it right this time.
Change 3797196 by Mark.Satterthwaite
More updates to MTLPP to make things simpler and reduce the number of spurious Objective-C warnings that are emitted because of the way we are using the runtime.
Change 3797200 by Daniel.Wright
Lightmass now uses the highest density VolumetricLightmapDensityVolume settings that affect any part of a cell
Change 3797221 by Daniel.Wright
Reduced default SphericalHarmonicSmoothing based on RoboRecall tests. Now only active with strong direct lighting from static lights by default.
Change 3797411 by Brian.Karis
Disable ExpandGamut for old tone mapper.
Change 3797462 by Mark.Satterthwaite
More build warnings silenced after changing to the lowest possible deployment target OS for each library.
Change 3797585 by Mark.Satterthwaite
Range-based-For support in the NSArray wrapper.
Change 3797836 by Mark.Satterthwaite
Even more forward-declarations to avoid system headers poking through to the including code from mtlpp.
Change 3798027 by Mark.Satterthwaite
Fix handling of nil objects, on which no functions may be called, command-buffer retention and IMP declaration.
Change 3798154 by Mark.Satterthwaite
Fix some egregious memory leaks that rewriting to use mtlpp exposed before we carry on - don't want these slipping into 4.19.
Change 3800990 by Mark.Satterthwaite
Typedef all the completion-handler callback types in mtlpp to make future me's life easier.
Change 3801400 by Chris.Bunner
Improving automated test errors on failure to generate report data.
Change 3801726 by Mark.Satterthwaite
Correct some function availability and the command-buffer error status in mtlpp.
Change 3801808 by Chris.Bunner
Added DefaultScalability.ini to EngineTest that forces all quality levels to Engine default Epic for now to improve consistency.
Change 3801862 by Marcus.Wassmer
Update automated tests with color gamut change
Change 3802214 by Chris.Bunner
When running automated tests in an editor-locked PIE viewport, skip resizing as the editor can't handle this.
Added bindable delegate called when ScreenshotRequest is processed - Useful to allow screenshots to override and restore settings per capture.
#jira UE-53188
Change 3802243 by Chris.Bunner
Added button to automated test screenshot browser to add or replace all outstanding test reports if appropriate.
DeleteAllReports button is now only enabled whilst there are reports in the list.
Change 3802372 by Chris.Bunner
Updating more test screenshots.
Change 3803683 by Chris.Bunner
Adding more logging and multiple attempts to automated test report network save.
Added small wait on repeated operations that are known to fail.
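The retry-with-wait pattern described above could look roughly like the following. This is a generic sketch, not the engine's code: `RetryWithWait`, its parameters, and the wait duration are all hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: run Operation up to MaxAttempts times, sleeping for
// Wait between failed attempts. Returns true as soon as one attempt succeeds.
inline bool RetryWithWait(const std::function<bool()>& Operation,
                          int MaxAttempts,
                          std::chrono::milliseconds Wait)
{
    for (int Attempt = 1; Attempt <= MaxAttempts; ++Attempt)
    {
        if (Operation())
        {
            return true;
        }
        // No point in sleeping after the final failed attempt.
        if (Attempt < MaxAttempts)
        {
            std::this_thread::sleep_for(Wait);
        }
    }
    return false;
}
```

A caller would wrap the network save in the callable and log each failure inside it, so that every attempt leaves a trace.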
Change 3803826 by Rolando.Caloca
DR - vk - Fix merge issue
Change 3804181 by Chris.Bunner
Tentative fix for CIS test failure.
Change 3804236 by Chris.Bunner
Additional logging for case where file write silently fails, report platform-specific error.
Change 3804303 by zachary.wilson
Cleaning up assets in QAGame saved with empty engine versions to resolve warnings seen when launching on
Change 3804410 by Chris.Bunner
Added additional logging when automated screenshot test fails due to size mismatch.
Mismatched bounds are colored red in the delta.
Change 3804455 by Mark.Satterthwaite
Fix a small number of persistent memory leaks on the Mac build that slowly consume more and more memory as you use the Editor - interacting with menus was particularly egregious as each NSMenu would leak after you move away.
#jira NA
Change 3804667 by Chris.Bunner
Speculative CIS fixes.
Change 3806008 by Chris.Bunner
Partially reimplementing backed-out CL 3804181 to improve consistency of how automated screenshot test settings are applied/restored.
#tests CIS preflight job 8174412
Change 3806909 by Mark.Satterthwaite
Use the vertex-shader's in-out mask to ensure that we only validate legitimate vertex-streams in Metal's DrawIndexedPrimitive implementation.
#jira UE-53046
Change 3807059 by laz.matech
Checking in QAGame Rendering Map, QA-PhysicalLightingUnits, for testing Physical Light Units.
Wanted to get this in before copy up.
#Jira none
Change 3807726 by Chris.Bunner
Removed a check that we can't fix up. The check hits unbound buffers, which it assumes means a failure, but this is actually due to manual vertex fetch. We don't have the information available to know which buffers were removed from the input without reading from the shader.
#jira UE-53046
Change 3807800 by Guillaume.Abadie
Fixes some warnings in shader headers.
Change 3807804 by Guillaume.Abadie
Back out changelist 3807800
Change 3807807 by Guillaume.Abadie
Relands shader header warnings.
Change 3808046 by Chris.Bunner
Dropping a new automated test error back to a warning as this may lead to genuine issues being ignored in the short term.
Change 3809579 by Chris.Bunner
Back out changelist 3774677.
#jira UE-53483
Change 3809620 by Chris.Bunner
Updating animated cloth test screenshot.
Change 3803629 by Chris.Bunner
Rebuilt CornellBox and DistanceField test maps, updated screenshots.
Change 3787045 by Guillaume.Abadie
Moves some global samplers to Common.ush
Change 3809756 by Chris.Bunner
Updating animated cloth test screenshot.
[CL 3809764 by Chris Bunner in Main branch]
2017-12-15 12:47:47 -05:00
FGlobalShader::ModifyCompilationEnvironment(Parameters, OutEnvironment);
2021-06-14 12:46:26 -04:00
TileIntersectionModifyCompilationEnvironment(Parameters.Platform, OutEnvironment);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
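To see why a buddy suballocator wastes memory on larger requests, here is a simplified sketch. It assumes a textbook buddy scheme in which every request is rounded up to the next power-of-two block; the function names and the threshold routing are hypothetical and do not reflect the actual D3D12RHI code, which also has a minimum block size.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power-of-two block that can hold Size bytes (Size > 0).
inline uint64_t BuddyBlockSize(uint64_t Size)
{
    uint64_t Block = 1;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to rounding when Size is served from a buddy block.
inline uint64_t BuddyWaste(uint64_t Size)
{
    return BuddyBlockSize(Size) - Size;
}

// Hypothetical routing: only requests at or below the threshold go to the
// buddy allocator; larger ones would be served by some other allocator.
inline bool UseBuddyAllocator(uint64_t Size, uint64_t MaxBuddyAllocationSize)
{
    return Size <= MaxBuddyAllocationSize;
}
```

A 5000-byte request lands in an 8192-byte block and loses 3192 bytes. With the threshold at 512KB such a request uses the buddy allocator; with the threshold at 4KB it is routed elsewhere.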
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
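The saving comes from character width. The sketch below compares storage for the same symbol name as a 1-byte-per-character string and as a 2-byte-per-character string. It uses std::string and std::u16string as stand-ins and assumes 2-byte wide characters; FString's own container overhead is not modelled, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed for the characters plus the NULL terminator, 1 byte each.
inline std::size_t AnsiBytes(const std::string& Symbol)
{
    return (Symbol.size() + 1) * sizeof(char);
}

// Same text stored with 2-byte characters (assumed wide-character width).
inline std::size_t WideBytes(const std::u16string& Symbol)
{
    return (Symbol.size() + 1) * sizeof(char16_t);
}
```

For a six-character symbol this is 7 bytes against 14, so character data alone halves before counting any per-string bookkeeping.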
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in incoherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single-threaded mode in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
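The idea of a noncontiguous page allocator can be sketched as follows. This is an illustrative model only: `FPageAllocator` is a hypothetical class, page indices stand in for physical ESRAM pages, and the returned vector stands in for the mapping onto a contiguous virtual range (entry i backs virtual page i). Real page mapping calls are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a noncontiguous page allocator.
class FPageAllocator
{
public:
    explicit FPageAllocator(uint32_t NumPages)
    {
        // Push in reverse so the lowest page index is handed out first.
        for (uint32_t Page = NumPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Returns the physical pages backing a contiguous virtual range of
    // NumPages pages. The pages need not be adjacent. Returns an empty
    // vector if not enough pages are free.
    std::vector<uint32_t> Allocate(uint32_t NumPages)
    {
        std::vector<uint32_t> Mapping;
        if (NumPages > FreePages.size())
        {
            return Mapping;
        }
        for (uint32_t Index = 0; Index < NumPages; ++Index)
        {
            Mapping.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return Mapping;
    }

    // Returns pages to the free list so later allocations can reuse them.
    void Free(const std::vector<uint32_t>& Mapping)
    {
        FreePages.insert(FreePages.end(), Mapping.begin(), Mapping.end());
    }

    std::size_t NumFree() const { return FreePages.size(); }

private:
    std::vector<uint32_t> FreePages;
};
```

With four pages, allocating three single-page resources and freeing the middle one leaves pages 1 and 3 free. A following two-page allocation still succeeds by combining them, which a contiguous-only allocator could not do.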
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
OutEnvironment.SetDefine(TEXT("COMPUTE_START_OFFSET_GROUP_SIZE"), ComputeStartOffsetGroupSize);
}
};
2021-06-14 12:46:26 -04:00
IMPLEMENT_GLOBAL_SHADER(FComputeCulledTilesStartOffsetCS, "/Engine/Private/DistanceFieldObjectCulling.usf", "ComputeCulledTilesStartOffsetCS", SF_Compute);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing,
Added th MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
2021-06-14 12:46:26 -04:00
void ScatterTilesToObjects(
	FRDGBuilder& GraphBuilder,
	bool bCountingPass,
	const FViewInfo& View,
	const FDistanceFieldSceneData& DistanceFieldSceneData,
	FIntPoint TileListGroupSize,
	const FDistanceFieldAOParameters& Parameters,
	FRDGBufferRef ObjectIndirectArguments,
	const FDistanceFieldCulledObjectBufferParameters& CulledObjectBufferParameters,
	const FTileIntersectionParameters& TileIntersectionParameters,
	TRDGUniformBufferRef<FSceneTextureUniformParameters> SceneTexturesUniformBuffer)
2017-01-26 19:20:49 -05:00
{
2022-04-26 14:37:07 -04:00
	FDistanceFieldObjectBufferParameters DistanceFieldObjectBuffers = DistanceField::SetupObjectBufferParameters(GraphBuilder, DistanceFieldSceneData);
2022-01-31 10:23:36 -05:00
2021-05-24 14:08:15 -04:00
	FObjectCullPS::FPermutationDomain PermutationVector;
	PermutationVector.Set<FObjectCullPS::FCountingPass>(bCountingPass);
	auto VertexShader = View.ShaderMap->GetShader<FObjectCullVS>();
	auto PixelShader = View.ShaderMap->GetShader<FObjectCullPS>(PermutationVector);
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single-threaded mode in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method, because polling IsRunning() is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was that HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
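The fence reuse bug in the list above can be illustrated with a toy model. This is only a sketch, not the D3D12RHI code: FToyFence, FToyFenceOwner and their members are hypothetical names, and the model assumes the usual convention that a fence counts as complete for a value once its completed value is greater than or equal to that value.

```cpp
#include <cstdint>

// Hypothetical toy fence: models only the "completed value >= waited value" rule.
struct FToyFence
{
    std::uint64_t CompletedValue = 0;

    void Signal(std::uint64_t Value) { CompletedValue = Value; }
    bool IsComplete(std::uint64_t Value) const { return CompletedValue >= Value; }
};

// Buggy pattern from the description: the code waits for 0.
// Waiting for 0 succeeds on any fence, so the wait never blocks.
inline bool BuggyWaitIsTrivial()
{
    FToyFence Fence;
    return Fence.IsComplete(0); // true even before any Signal()
}

// Monotonic pattern: every submission gets the next value and the fence
// is never reset to a previous one.
struct FToyFenceOwner
{
    FToyFence Fence;
    std::uint64_t NextValue = 1;

    // Returns the value that will be signaled when this submission finishes.
    std::uint64_t Submit() { return NextValue++; }

    // Stands in for the GPU reaching the signal command.
    void GpuReached(std::uint64_t Value) { Fence.Signal(Value); }
};
```

With monotonically increasing values a wait on the value returned by Submit() only succeeds after the matching signal, which is why resetting the fence to an earlier value is unnecessary.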
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return an invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
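The idea of a non-contiguous page allocator that frees pages for reuse can be sketched with a toy model. Everything here is an assumption made for illustration: FToyPageAllocator is a hypothetical class, pages are plain indices, and no real ESRAM or platform API is involved. The returned vector plays the role of the contiguous virtual range, where entry i is the physical page backing virtual page i.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: satisfies a request with any free pages and records them in
// order, the way a contiguous virtual range would map onto them.
class FToyPageAllocator
{
public:
    explicit FToyPageAllocator(std::uint32_t NumPages)
    {
        // Push in reverse so that page 0 is handed out first.
        for (std::uint32_t Page = NumPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Returns the physical pages backing virtual pages 0..Count-1,
    // or an empty vector if not enough pages are free.
    std::vector<std::uint32_t> Allocate(std::uint32_t Count)
    {
        std::vector<std::uint32_t> Mapping;
        if (Count > FreePages.size())
        {
            return Mapping;
        }
        for (std::uint32_t Index = 0; Index < Count; ++Index)
        {
            Mapping.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return Mapping;
    }

    // Returns pages to the free list so later allocations can reuse them.
    void Free(const std::vector<std::uint32_t>& Mapping)
    {
        FreePages.insert(FreePages.end(), Mapping.begin(), Mapping.end());
    }

    std::size_t NumFreePages() const { return FreePages.size(); }

private:
    std::vector<std::uint32_t> FreePages;
};
```

Because an allocation only needs enough free pages, not a contiguous run of them, freeing two unrelated resources is enough to make room for a larger one even when a live resource sits between them.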
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
2021-06-14 12:46:26 -04:00
auto * PassParameters = GraphBuilder . AllocParameters < FObjectCullParameters > ( ) ;
2021-12-09 04:49:36 -05:00
PassParameters - > VS . View = GetShaderBinding ( View . ViewUniformBuffer ) ;
2022-01-31 10:23:36 -05:00
PassParameters - > VS . DistanceFieldObjectBuffers = DistanceFieldObjectBuffers ;
2021-06-14 12:46:26 -04:00
PassParameters - > VS . DistanceFieldCulledObjectBuffers = CulledObjectBufferParameters ;
2022-04-26 14:37:07 -04:00
PassParameters - > VS . DistanceFieldAtlas = DistanceField : : SetupAtlasParameters ( GraphBuilder , DistanceFieldSceneData ) ;
2021-12-09 04:49:36 -05:00
PassParameters - > VS . AOParameters = DistanceField : : SetupAOShaderParameters ( Parameters ) ;
{
const int32 NumRings = StencilingGeometry : : GLowPolyStencilSphereVertexBuffer . GetNumRings ( ) ;
const float RadiansPerRingSegment = PI / ( float ) NumRings ;
// Boost the effective radius so that the edges of the sphere approximation lie on the sphere, instead of the vertices
PassParameters - > VS . ConservativeRadiusScale = 1.0f / FMath : : Cos ( RadiansPerRingSegment ) ;
}
PassParameters - > PS . View = GetShaderBinding ( View . ViewUniformBuffer ) ;
2021-06-14 12:46:26 -04:00
PassParameters - > PS . TileIntersectionParameters = TileIntersectionParameters ;
2022-01-31 10:23:36 -05:00
PassParameters - > PS . DistanceFieldObjectBuffers = DistanceFieldObjectBuffers ;
2021-06-14 12:46:26 -04:00
PassParameters - > PS . DistanceFieldCulledObjectBuffers = CulledObjectBufferParameters ;
2022-04-26 14:37:07 -04:00
PassParameters - > PS . DistanceFieldAtlas = DistanceField : : SetupAtlasParameters ( GraphBuilder , DistanceFieldSceneData ) ;
2021-12-09 04:49:36 -05:00
PassParameters - > PS . AOParameters = DistanceField : : SetupAOShaderParameters ( Parameters ) ;
2022-02-02 01:43:41 -05:00
PassParameters - > PS . NumGroups = FVector2f ( TileListGroupSize . X , TileListGroupSize . Y ) ;
2021-12-09 04:49:36 -05:00
2021-06-14 12:46:26 -04:00
PassParameters - > SceneTextures = SceneTexturesUniformBuffer ;
PassParameters - > ObjectIndirectArguments = ObjectIndirectArguments ;
2020-06-23 18:40:00 -04:00
2020-10-09 22:42:26 -04:00
if ( GRHIRequiresRenderTargetForPixelShaderUAVs )
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3607928)
#lockdown Nick.Penwarden
============================
MAJOR FEATURES & CHANGES
============================
Change 3441680 by Uriel.Doyon
Added units to point light intensity, to allow the user to specify the value in candelas or lumens.
New point light actors now configure the intensity in candelas by default.
Replaced viewport exposure settings by an EV100 slider.
Hiding the tone mapper in the show flag now still applies the exposure.
Added a new AutoExposure method called EV100 which allows specifying:
- MinEV100, MaxEV100
- Calibration Constant
- Exposure Compensation
#jira UE-42783
Change 3454934 by Chris.Bunner
Backing out changelists 3441680, 3454636 and 3454844 for the sake of integration stability.
Change 3512118 by Marc.Olano
Fix rare Sobol shader data problem. Mismatch with CPU code after a large number of points
Resubmit of portion of //UE4/Dev-Rendering@3509854 that was rolled back to avoid massive shader recompiles during integration testing
Change 3512129 by Benjamin.Hyder
Fixing up content in TM-SobolNoise
Change 3512151 by Rolando.Caloca
DR - Fixed some layouts that were general
- Added some extra dump information
Change 3512160 by Benjamin.Hyder
Still Fixing TM-Sobol
Change 3512180 by Marc.Olano
PCSS for spotlights. Like directional PCSS this is experimental, activated by r.Shadow.FilterMethod.
Change 3512261 by Michael.Lentine
Move Subsurface to shared properties.
Previously the same code could be executed multiple times without being optimized out if multiple inputs used the same subsurface output.
#jira UE-44405
Change 3512288 by Rolando.Caloca
DR - Fix issue when recycling image handles
Change 3512338 by Michael.Lentine
Fix precision if user enters a multiple of 90 degree rotation for transforms.
This will only work for exact values. Generally, comparing floating point numbers using == is unsafe, but it should be OK in this case as they are exact values entered from the UI. We may want to later expand this to include thresholding using a value ~1e-7.
#jira UE-46137
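A minimal sketch of the exact-value approach described above, assuming the fix special-cases angles that are exact multiples of 90 degrees. SinCosDegrees and FSinCos are hypothetical names introduced for illustration, not the engine's functions.

```cpp
#include <cmath>

struct FSinCos
{
    float Sin;
    float Cos;
};

// Hypothetical helper: returns exact sine/cosine for exact multiples of
// 90 degrees and falls back to the standard library otherwise.
inline FSinCos SinCosDegrees(float Degrees)
{
    // fmod is computed exactly, so == 0 only matches true multiples of 90.
    if (std::fmod(Degrees, 90.0f) == 0.0f)
    {
        // Wrap into (-360, 360) first so the integer conversion is always safe.
        const float Wrapped = std::fmod(Degrees, 360.0f);
        const int Quadrant = (static_cast<int>(Wrapped / 90.0f) + 4) % 4;
        static const FSinCos Table[4] = {
            { 0.0f, 1.0f }, { 1.0f, 0.0f }, { 0.0f, -1.0f }, { -1.0f, 0.0f }
        };
        return Table[Quadrant];
    }

    const float Radians = Degrees * 3.14159265358979323846f / 180.0f;
    return { std::sin(Radians), std::cos(Radians) };
}
```

An input such as 89.9999f takes the fallback path, which is the limitation the entry mentions when it suggests thresholding later.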
Change 3512424 by Michael.Lentine
Regenerate BaseColor.uasset and Specular.uasset to not have the notforclient flags set.
#jira UE-44315
Change 3512686 by Brian.Karis
Fix for quadric assert in infiltrator. Due to bad tangents in source mesh.
Change 3512696 by Brian.Karis
Unrevert TAA. Fixed DOF NaN artifacts
Change 3512717 by Marcus.Wassmer
PR #3714: Fix typo in EOcclusionCombineMode (Contributed by Mumbles4)
Change 3513112 by Richard.Wallis
Crash when packaging for iOS with Shared Material Native Libraries and Share Material Shader Code from windows platform. Offline shader compile for archiving not done - shader header has missing offline compile flag for native Metal library archiving.
Fix includes:
- Handle offline compile failure when not running on Mac and no remote is configured (or remote fails). (I think this is the point at which the crash in the bug report occurs.)
- Make sure remote can build for native Metal libraries and archive correctly - this should now support Linux platforms or Mac to Mac (if enabled in MetalShaderCompiler.cpp) for testing if required.
- Updated to include remote calling into the xcode 9 Metal pch fix already submitted by Mark Satt.
#jira UE-45657
Change 3513357 by Richard.Wallis
Windows compile fix.
Change 3513375 by Guillaume.Abadie
Exposes the possibility to manually destroy the GPU resource of UTextureRenderTarget2D.
Change 3513685 by Richard.Hinckley
#jira UEDOC-3822
Fixing a comment that refers to a non-existent function, for documentation purposes.
Change 3513705 by Marc.Olano
Updates to Sobol test levels in RenderTest project
Change 3513730 by Rolando.Caloca
DR - Fix mip size copying resolve targets
- Fix compute fence
- Fix descriptor set texture layout
- More dump info
Change 3513742 by Marc.Olano
Texture-free numeric print for shader debugging
Change 3513777 by Daniel.Wright
Handled edge case where no furthest samples are found in precomputed visibility
Change 3514852 by Rolando.Caloca
DR - Fix -directcompile on SCW
Change 3515049 by Rolando.Caloca
DR - hlslcc dump crash fix
Change 3515167 by Rolando.Caloca
DR - hlslcc - Fix bogus string pointer
- Allow reading from non-scalar UAVs
Change 3515745 by Rolando.Caloca
DR - Linux compile fix
Change 3515862 by Rolando.Caloca
DR - Remove old reference to CCT
- Link with hlslcc debug libs on SCW debug config for easier debugging
Change 3516292 by Rolando.Caloca
DR - glslang exe fixes
Change 3516568 by Rolando.Caloca
DR - hlslcc - Copy fix for *Buffer as functionparameters
Change 3516659 by Marcus.Wassmer
Fix some d3derrors with distance fields
Change 3516801 by Daniel.Wright
Fixed crash when doing editor 'Force Delete' on a static mesh whose distance field is still being built. Any UObject reference that is to an asset can be NULL'ed by the editor.
Change 3516825 by Rolando.Caloca
DR - Some initial fixes for structured buffers
Change 3516843 by Rolando.Caloca
DR - Fix for Vulkan dist fields
Change 3516869 by Marcus.Wassmer
Add format to the createrendertarget blueprint node
Change 3516957 by Daniel.Wright
Fixed bUsesDistortion being editable
Change 3516965 by Daniel.Wright
Still mark the distance field task completed, even if the static mesh has been deleted
Change 3517039 by Yujiang.Wang
GitHub #2655: Optimization for shadow map resolution selection for spot lights
* Use the radius of the inscribed sphere at the cone end as the spot light's screen radius
Note: slight drop of shadow quality of spot lights may occur when they are far away from the camera. This is intended, since before this optimization they tend to be always rendered with the maximum shadow map resolution (2048), which is very costly
#jira UE-33982
Change 3517069 by Yujiang.Wang
Fix for ScissorRect settings in d3d11 being lost under certain scenarios
* Scissor rectangle is always enabled in the low-level d3d11 pipeline, and it is expected that at least one ScissorRect is present no matter whether RHISetScissorRect is called with bEnable=false (when it is false we just use a big rect to make it effectively disabled)
* However FD3D11StateCacheBase::ClearState() clears all the states, which removes scissor rectangles and causes problems for certain routines (FScene::UpdateSkyCaptureContents)
* Now SetScissorRectIfRequiredWhenSettingViewport will always set an effectively disabled ScissorRect on each FD3D11DynamicRHI::RHISetViewport call, just like d3d12 does
#jira UE-45465 UE-44760
Change 3517134 by Yujiang.Wang
CIS fix
Change 3517662 by Rolando.Caloca
DR - Execute upload Vulkan cmds on the RHI thread
- Fix crash with structured buffer
Change 3517677 by Rolando.Caloca
DR - Update/copy textures on RHI thread
Change 3517680 by Rolando.Caloca
DR - Copy texture bulk data on rhi thread
Change 3517748 by Marcus.Wassmer
temporary workaround for one class of GPU crashes
Change 3518832 by Rolando.Caloca
DR - Copy & extend 3518077
- Fix for movable skylight shader missing on simple forward (low lighting quality mode)
Change 3519973 by Richard.Wallis
Jittering in Engine Menu Dropdown Options. Jitter fix: Fix some areas that hadn't been changed from RoundToInt (from previous CL's) to CeilToInt.
#jira UE-46505
Change 3520849 by Uriel.Doyon
Fixed issue with investigate texture command and dynamic component entries.
Change 3521064 by Guillaume.Abadie
Returns the absolute path of shader files on error to avoid work loss in Visual Studio, which can't figure out that an sln-relative path and an absolute path might lead to the same file on disk.
Change 3521834 by Rolando.Caloca
DR - Fix decals on Vulkan
Change 3521892 by Rolando.Caloca
DR - Fix Vulkan texture streaming
Change 3523181 by Rolando.Caloca
DR - Copy from 3523176
UE4.17 - Fix Vulkan scissor causing text to not clip
Change 3523534 by Yujiang.Wang
UE-46631: Implement a scalable LongGPUTask to fix ProfileGPU
* A new, scalable, platform-independent IssueLongGPUTask is now implemented in UtilityShaders
* Removed IssueLongGPUTask and G*Vector4VertexDeclaration from RHI implementations
* The measurement of the execution time of a basic LongGPUTask unit is kicked off on the very first frame
#jira UE-46631
Change 3524552 by Yujiang.Wang
Fix iteration number calculation of LongGPUTask
Change 3524975 by Joe.Graf
Moved the Hamming-weight function from StaticMeshDrawList.inl to FGenericPlatformMath
Added SSE versions using _mm_popcnt_u64 for platforms that support it
Added a SSE check to gracefully exit when missing the instruction and it was expected to be there
#CodeReview: arciel.rekman, brian.karis
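For reference, a Hamming weight (population count) can be computed without any special instruction. The sketch below is a standard portable bit-trick version and is not claimed to be the code that was moved into FGenericPlatformMath; CountBitsPortable is a hypothetical name. The SSE path mentioned above would use _mm_popcnt_u64 instead on platforms that support it.

```cpp
#include <cstdint>

// Portable Hamming weight for a 64-bit value: sums bits in parallel within
// 2-bit, 4-bit and 8-bit groups, then adds the eight byte counts together.
inline std::uint32_t CountBitsPortable(std::uint64_t Bits)
{
    Bits = Bits - ((Bits >> 1) & 0x5555555555555555ull);
    Bits = (Bits & 0x3333333333333333ull) + ((Bits >> 2) & 0x3333333333333333ull);
    Bits = (Bits + (Bits >> 4)) & 0x0F0F0F0F0F0F0F0Full;
    // The multiply accumulates all byte counts into the top byte.
    return static_cast<std::uint32_t>((Bits * 0x0101010101010101ull) >> 56);
}
```

A fallback like this is what makes a runtime check for the instruction worthwhile: the fast path is only valid when the CPU actually supports it.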
Change 3525306 by Daniel.Wright
Fixed ensure from LPV
Change 3525346 by Rolando.Caloca
DR - Fix linking issue
Change 3525459 by Daniel.Wright
Volumetric Lightmaps - higher quality precomputed GI on dynamic objects and GI on Volumetric Fog
* Enabled by default on all maps, effective after a lighting build. This replaces the existing Precomputed Light Volume and Indirect Lighting Cache features.
* New Lightmass World Settings: VolumeLightingMethod, VolumetricLightmapDetailCellSize and VolumetricLightmapMaximumBrickMemoryMb.
* Lightmass computes lighting samples in an adaptive grid, with higher density around geometry inside the importance volume. Positions outside the importance volume get lit with the border texels.
* Improved Lightmass volume solver to use importance photons and full adaptive final gather, so volume samples have similar quality to 2d lightmaps.
* A static indirection texture is built covering the importance volume and flattening the brick tree by storing the offset to the highest density brick at each indirection cell.
* Seamless and efficient GPU interpolation across density levels is achieved by adding a single row of padding to bricks, copied from neighbors, and stitching up bricks with lower density neighbors
* The Volumetric lightmap stores Irradiance as a 3 band SH, which is 27 floats, quantized into 28 bytes, 7 texture lookups.
* A full screen barebones material using Volumetric Lightmaps costs .42ms on 970 GTX, while Indirect Lighting Cache Point costs .32ms
* Sky bent normal is also stored for stationary skylights and Directional Light Shadowing for Single Sample Shadow receiving.
* Volumetric fog, Movable components, unbuilt Static Components, SingleSampleShadow receiving and Capsule Shadows use Volumetric Lightmaps if available
* New Visualization show flag for Volumetric Lightmap sample points
* Level streaming of volume light data is not currently supported with this method
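The storage figures quoted above for the 3 band SH can be checked with a little arithmetic. The constant names are hypothetical, and the bytes-per-lookup value is only what the quoted numbers imply when divided evenly; the actual texture formats are not stated in this entry.

```cpp
// Spherical harmonics with N bands have N * N coefficients per color channel.
constexpr int NumSHBands = 3;
constexpr int NumCoefficientsPerChannel = NumSHBands * NumSHBands; // 9
constexpr int NumColorChannels = 3;                                 // RGB
constexpr int NumFloats = NumCoefficientsPerChannel * NumColorChannels;

// Figures quoted in the entry.
constexpr int QuantizedBytes = 28;
constexpr int NumTextureLookups = 7;

// Assumption: the bytes are spread evenly over the lookups.
constexpr int BytesPerLookup = QuantizedBytes / NumTextureLookups;

static_assert(NumFloats == 27, "3 band SH for RGB is 27 floats");
static_assert(BytesPerLookup == 4, "28 bytes over 7 lookups is 4 bytes each");
```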
Change 3525461 by Daniel.Wright
Lowered default r.Shadow.RadiusThreshold for Epic shadow settings as it was causing a lot of visible artifacts from small objects popping out. This will increase shadowmap cost slightly (13.5ms RT -> 14.3ms RT in Fortnite on PS4, no measurable GPU difference).
Change 3526459 by Rolando.Caloca
DR - Fix validation error
Change 3526474 by Rolando.Caloca
DR - Integrate from GV
Change 3526487 by Daniel.Wright
Disabled Volumetric Lightmap filtering with neighbors due to artifacts
Fix linux compile errors
Change 3526833 by Rolando.Caloca
DR - Workaround for hlslcc
Change 3526991 by Uriel.Doyon
Integrated 3526859 : Texture mip bias is now reset whenever the streaming budget increases. This fixes an issue where textures persistently become low res after a memory spike.
Change 3527574 by Rolando.Caloca
DR - Added some missing resource entries for SCW direct mode
Change 3527625 by Rolando.Caloca
DR - Copy from 3527113
UE4.17 - Fix Vulkan not calling Present
Change 3528461 by Brian.Karis
Support larger hash sizes. Added uint list hashing function.
Change 3528780 by Rolando.Caloca
DR - Default Vulkan resources
Change 3528818 by Rolando.Caloca
DR - glslang - Added missing accessor
Change 3528839 by Rolando.Caloca
DR - Fix virtual path issue when using non-engine relative absolute paths
Change 3528900 by Daniel.Wright
Fixed variable shadowing
Change 3529039 by Rolando.Caloca
DR - Read Spirv reflection data (not used yet)
Change 3529040 by Joe.Graf
Fixed the 32bit compile failures for the popcnt optimization
#CodeReview: arciel.rekman
Change 3529060 by Rolando.Caloca
DR - hlslcc - New flag for keeping resource names
Change 3529344 by Rolando.Caloca
DR - Delete unused file
Change 3529723 by Brian.Karis
Fixed static analysis cleaner.
Change 3531357 by Michael.Trepka
Updated Mac glslang libraries with latest changes. Also, updated the Xcode project (generated with CMake) and moved it to a different location so that it no longer uses hardcoded absolute paths. It should be easy to rebuild these libraries in the future.
Change 3531517 by Joe.Graf
Added support for ddx_fine, ddy_fine, ddx_coarse, ddy_coarse to hlslcc
#CodeReview: arciel.rekman, mark.satterthwaite, rolando.caloca
Change 3531626 by Joe.Graf
Mac version of the popcount optimization
Changed Linux version to use the same builtin
#CodeReview: mark.satterthwaite, arciel.rekman
Change 3531837 by Chris.Bunner
SetScissorRectIfRequiredWhenSettingViewport sets the viewport size by default rather than disabling the scissor rect.
#jira UE-46753
Change 3533415 by Joe.Graf
Renamed the SSSE3 checks per feedback
#CodeReview: arciel.rekman
Change 3533480 by Michael.Lentine
Use more accurate descriptions for shader recompile options
Change 3533511 by Joe.Graf
Updated the GenericPlatformMisc to match the SSSE3 name change
#CodeReview: arciel.rekman
Change 3533521 by Marcus.Wassmer
Fix scenerenderer leak when updating out of view planar reflections
Change 3533528 by Joe.Graf
Updated comments
#CodeReview: n/a
Change 3533608 by Mark.Satterthwaite
New manual Xcode project for glslang so that we include all the necessary code and can link again.
Change 3534260 by Mark.Satterthwaite
Fix the Xcode 9 Beta 3 compile errors in MetalRHI without breaking Xcode 8.3.3.
Change 3535789 by Yujiang.Wang
Fix for wrong hair shading in forward shading
* IBL reflections should be turned off for hairs
Change 3537059 by Ben.Marsh
Fixing case of iOS directories, pt1
Change 3537060 by Ben.Marsh
Fixing case of iOS directories, pt2
Change 3538297 by Michael.Lentine
Add shader comparison test.
Adding the basic test case.
Adding logic to Common.ush to enable FP16 conditionally on a define (which is not set by default)
Adding more exported functionality to automation for use in the shader test.
Change 3538309 by Michael.Lentine
Add missing file from Shader Test CL.
Change 3538751 by Michael.Lentine
Add missing pragma once.
Change 3539236 by Michael.Lentine
Do not ignore return values.
Change 3539237 by Michael.Lentine
Check in the correct file
Change 3540343 by Rolando.Caloca
DR - Added t.DumpHitches.AllThreads
Change 3540661 by Yujiang.Wang
Fix spot tube light direction
* The tube direction for a spot light was pointing along the light direction, now it is along the local Z axis which is perpendicular to the light direction. Lightmass is also touched
* A new LightTangent is added to FDeferredLightData
* Packed all the values from LightSceneProxy->GetParameters into a single FLightParameters struct to avoid copy-pasting them everywhere
Change 3541129 by Rolando.Caloca
DR - vk - Copy all Vulkan fixes from 4.17
Change 3541347 by Yujiang.Wang
Fix wrong ViewFlags being set between objects when rendering shadow depth maps
* Bug caused by trying to share DrawRenderState between objects, but SetViewFlagsForShadowPass was designed to start from a fresh render state
* Now SetViewFlagsForShadowPass recalculates and sets the flags on each call
Change 3542603 by Rolando.Caloca
DR - vk - Allow sharing samplers on Vulkan
Change 3542639 by Jian.Ru
Changed warning text to better indicate that global clip plane needs to be enabled for planar reflection
#RB Marcus.Wassmer
Change 3543167 by Michael.Lentine
Fix naming for the shader comparison tests.
Change 3543210 by Uriel.Doyon
Fixed an issue when computing material scales where the default material ends up being used instead of the required material.
In that case, we used the default settings for texture streaming (assuming a scale of 1).
Change 3543221 by Brian.Karis
Simplifier optimizations
Change 3543239 by Arciel.Rekman
hlslcc: remove FCustomStd* workarounds.
- This was a previous attempt to work around problems arising from different STL used for building libhlslcc (in the cross-toolchain) and possibly different STL used for building engine (on the system).
- The same problem has been resolved by bundling libc++.
Change 3543946 by Michael.Lentine
Add comparison output.
Change 3544277 by Brian.Karis
Fixed uninitialized var error
Change 3544404 by Rolando.Caloca
DR - Fix broken textures
Change 3544503 by Jian.Ru
Ensure lighting failure delegates are always called
#RB Marcus.Wassmer,Daniel.Wright
#3689
Change 3545241 by Daniel.Wright
Fixed spotlight whole scene shadows using a radius 2x too long
Change 3545347 by Daniel.Wright
Fixed shadow occlusion culling broken by shadowmap caching change. FProjectedShadowKey is now computed correctly for whole scene shadows and SDCM_StaticPrimitivesOnly shadowmaps will fall back to the query for a SDCM_MovablePrimitivesOnly, since the static primitives shadowmap's query is not issued every frame.
Change 3546196 by Marcus.Wassmer
Fix minor typo
Change 3546459 by Daniel.Wright
ULevel::PostEditChangeProperty recreates rendering resources if MapBuildData is modified - fixes a crash when Force Deleting the MapBuildData package.
Change 3546469 by Jian.Ru
Take into account CVarStaticMeshLODDistanceScale during static mesh LOD calculation
Change 3546804 by Daniel.Wright
[Copy] Added SendAllEndOfFrameUpdates draw event to wrap skin cache events
Change 3546814 by Daniel.Wright
[Copy] Only use skylight OcclusionMaxDistance for the global distance field if it casts shadows
Change 3546815 by Daniel.Wright
[Copy] Snap volumetric fog light function target resolution to a factor of 32 to avoid constant texture reallocation
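Snapping a resolution to a multiple of 32 can be sketched as below. This is an illustration under assumptions: SnapUpToMultiple is a hypothetical helper, and the entry does not say whether the engine rounds up, down or to nearest, so rounding up is chosen here.

```cpp
#include <cstdint>

// Rounds a non-negative Size up to the next multiple of Multiple (Multiple > 0).
inline std::int32_t SnapUpToMultiple(std::int32_t Size, std::int32_t Multiple)
{
    return ((Size + Multiple - 1) / Multiple) * Multiple;
}
```

With a multiple of 32, every requested size from 481 to 512 maps to 512, so small frame-to-frame changes in the desired resolution no longer trigger a new texture allocation.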
Change 3546817 by Daniel.Wright
[Copy] Warmup time warning
Change 3546828 by Daniel.Wright
[Copy] Fixed UWorld::DestroyActor in PIE calling InvalidateLightingCacheDetailed which can do a FlushRenderingCommands and cause a large hitch
Change 3546836 by Daniel.Wright
[Copy] ULightComponent::InvalidateLightingCacheInner uses MarkRenderStateDirty instead of slow reregister + FlushRenderingCommands, and only for lights which might have static lighting data
Change 3546849 by Rolando.Caloca
DR - vk - Fix missing samplerstates
- Fixes for structured buffers
- Add missing Draw and Dispatch Indirect
Change 3547516 by Brian.Karis
Linear time 5-coloring for planar graphs.
Brought in the Planarity library written by John Boyer, heavily edited and trimmed down to only include code necessary for graph coloring. Put behind a simple wrapper.
Change 3547542 by Brian.Karis
Linear time 5-coloring for planar graphs.
Brought in the Planarity library written by John Boyer, heavily edited and trimmed down to only include code necessary for graph coloring. Put behind a simple wrapper.
Change 3547563 by Brian.Karis
Fixed some compiler warnings and hopefully some errors.
Change 3547610 by Brian.Karis
Replaced macros with inlined functions
Change 3547620 by Brian.Karis
Clean up includes
Change 3547770 by Marcus.Wassmer
GPU Crash for MTBF analytics
Change 3547773 by Marcus.Wassmer
Updated doxygen comment for new analytic
Change 3548244 by Rolando.Caloca
DR - Fix for translucency
Change 3548352 by Yujiang.Wang
Added soft source radius for point and spot lights
* Soft source radius controls how 'blurry' the shape of specular lighting looks
* Implemented by LobeRoughness modification
* Better approximation for spherical lights so that they don't look sharp when the radius is large using 'smoothed representative point' method
* Supported LightTangent in forward shading
Change 3548530 by Brian.Karis
Fix for mac build
Change 3548770 by Rolando.Caloca
DR - vk - Prereq work for Vulkan parallel RHI contexts
Change 3548772 by Jian.Ru
Fixed an issue that caused an ensure when switching levels in D3D10. #rb Marcus.Wassmer
Change 3548865 by Daniel.Wright
With shadowmap caching of whole scene shadows, only one of the cache modes issues an occlusion query. Fixes a crash where the static primitive shadowmap is culled but the movable primitive shadowmap is visible, which is normally not possible.
Change 3548952 by Rolando.Caloca
DR - Allow separate samplers in the shaders on Vulkan
Change 3549197 by Marcus.Wassmer
Fix DX12 PIX not working in cooked builds
Change 3549209 by Daniel.Wright
Occlusion culling for CSM, from the main camera, controlled by 'r.Shadow.OcclusionCullCascadedShadowMaps'. Disabled by default as rapid view changes don't work well with latent occlusion queries.
Change 3549943 by Ben.Marsh
Include better diagnostic information when a modified build product is detected after running a build step.
Change 3550546 by Rolando.Caloca
DR - Fix merge issue
Change 3550962 by Marcus.Wassmer
EarlyZ Masking requires full depth prepass, so just force it to.
Change 3551062 by Daniel.Wright
Handle NULL skylight
Change 3551104 by Rolando.Caloca
DR - vk - Remove assert to match other platforms
Change 3551221 by Rolando.Caloca
DR - vk - Add mirror clamp to edge extension
- Fix framebuffer deletion
Change 3551224 by Daniel.Wright
Volumetric lightmap increase density around static lights affecting a voxel brighter than LightBrightnessSubdivideThreshold.
Change 3551495 by Rolando.Caloca
DR - vk - Initial support for async queue
Change 3552101 by Rolando.Caloca
DR - vk - Fix for async
Change 3552102 by Rolando.Caloca
DR - SkinCache - Fix potential leak on staging buffers for recompute tangents
- Integrate changes from 4.17 for memory optimizations
Change 3552104 by Rolando.Caloca
DR - vk - Support for SRVs for index buffers
Change 3552838 by Rolando.Caloca
DR - vk - Enable debug markers if found
Change 3553106 by Rolando.Caloca
DR - vk - Fixes for index buffer SRVs
Change 3553107 by Rolando.Caloca
DR - vk - Enable recompute tangents on Vulkan
Change 3553154 by Rolando.Caloca
DR - vk - Fix crash with null uav
Change 3553342 by Yujiang.Wang
Fix redundant skylights in AdvancedPreviewScene
* PreviewScene was changed to using a skylight instead of ambient cubemap to support forward shading
* AdvancedPreviewScene originally had a skylight, now it is changed to using the one inherited from PreviewScene
Change 3553481 by Rolando.Caloca
DR - Integrate fix for D3D12 support of index buffers SRVs
#jira UE-47674
Change 3553715 by Rolando.Caloca
DR - Fix crash when launching PC with -featureleveles31
Change 3553725 by Rolando.Caloca
DR - Redo fix
Change 3553803 by Rolando.Caloca
DR - Shader compile fixes for ES3.1
Change 3553963 by Rolando.Caloca
DR - vk - Remove extra IRDump
Change 3554741 by Ben.Marsh
CIS fix.
Change 3555222 by Rolando.Caloca
DR - vk - static analysis fix
Change 3555362 by Rolando.Caloca
DR - vk - Prep work for separate present queue
Change 3556800 by Daniel.Wright
Fixed screenshot for simple volume material doc
Change 3556942 by Brian.Karis
Fixed Bokeh DOF regression.
Change 3556959 by Rolando.Caloca
DR - vk - Rework staging buffer peak usage
Change 3557497 by Daniel.Wright
Better display name for Unbound property on post process volume
Change 3557499 by Daniel.Wright
Disable r.GenerateLandscapeGIData by default, opt-in for kite demo. Projects that want to use heightfield GI need to opt-in to r.GenerateLandscapeGIData.
Change 3557068 by Olaf.Piesche
Configurable spawn rate scaling reference value; sets the zero-scale reference value (default: 2), so additional quality levels can be added and scaling customized further.
IMPORTANT: This sets the reference to 3 in PS4Scalability.ini; effects on PS4 are again going to have reduced spawn rates versus PC and Neo, as intended by the FX artists starting with this change.
#tests QAGame test maps
Change 3558123 by Rolando.Caloca
DR - vk - static analysis fix
Change 3558685 by Yujiang.Wang
Github #3323: Two sided foliage lightmap directionality fix
* Subsurface is not intended to work with lightmaps that don't have directionality, however we still want it to look similar to a directional one
* Now it uses a constant directionality value
#jira UE-42523
Change 3559052 by Brian.Karis
Hopefully fix static analysis
Change 3559113 by Rolando.Caloca
DR - Fix crash with planar reflections
Change 3559275 by Yujiang.Wang
Fix race condition on several scalability CVars between rendering thread and game thread
Change 3559612 by Rolando.Caloca
DR - vk - SM5 with uniform buffers backend support
Change 3559716 by Rolando.Caloca
DR - hlslcc - Fix linker warning on SCW debug
Change 3559768 by Rolando.Caloca
DR - vk - Keep ub names for bindings
Change 3560195 by Rolando.Caloca
DR - accessor
Change 3560275 by Rolando.Caloca
DR - vk - Support for uniform buffers
Change 3560913 by Rolando.Caloca
DR - vk - Fix static analysis
Change 3561145 by Rolando.Caloca
DR - Don't crash if out of resource table bits
Change 3561194 by Rolando.Caloca
DR - vk - Integrate timestamp fixes
Change 3562009 by Rolando.Caloca
DR - vk - Workaround for bad UTexture data
Change 3563884 by Chris.Bunner
VK_NULL_HANDLE fix.
Change 3563885 by Jian.Ru
Ignore a warning caused by enabling distance field generation so that test Cube_Blue and Cube_Section don't fail. #rb Chris.Bunner
Change 3565943 by Jian.Ru
Add extra warning log triggered when attempting to create an FRWBuffer greater than 256MB in ComputeLightGrid() #rb Chris.Bunner
Change 3569479 by Michael.Lentine
Integrate rhino shader changes to dev-rendering
Change 3569511 by Michael.Lentine
Fix formatting and string out on windows.
Change 3569572 by Yujiang.Wang
Fix MeasureLongGPUTaskExecutionTime crashing on AMD on Macs
Change 3569614 by Yujiang.Wang
Flush rendering commands before measuring the long GPU task's execution time to get accurate results
Change 3570524 by Jian.Ru
Add extra parentheses to avoid compilation warning #rb Chris.Bunner
Change 3570722 by Chris.Bunner
Static analysis workaround - same code, just validating compile-time assumptions a little further.
Change 3570880 by Jian.Ru
Add small depth offset to avoid depth test failing during velocity pass
#jira UE-37556
Change 3572532 by Jian.Ru
Disable a warning to let tests pass
#jira UE-48021
Change 3573109 by Michael.Lentine
Checkin Michael.Trepka's fix for external dynamic libraries on mac.
This is needed to make the build go green on mac.
Change 3573995 by Jian.Ru
Move an include out of define to let nightly build pass
Change 3574777 by Chris.Bunner
Continued merge fixes.
Change 3574792 by Rolando.Caloca
DR - Rename todo
Change 3574794 by Chris.Bunner
Re-adding includes lost in a pre-merge merge.
Change 3574879 by Michael.Trepka
Disabled a couple of Mac deprecation warnings
Change 3574932 by Chris.Bunner
Merge fix.
Change 3575048 by Michael.Trepka
Fixed iOS compile warnings
Change 3575530 by Chris.Bunner
Duplicating static analysis fix CL 3539836.
Change 3575582 by Chris.Bunner
Fixed GetDimensions return type in depth resolve shaders.
Compile error fix.
Change 3576326 by Chris.Bunner
Static analysis fixes.
Change 3576513 by Michael.Trepka
Updated Mac MCPP lib to be compatible with OS X 10.9
Change 3576555 by Richard.Wallis
Metal Validation Errors. Dummy black volume texture is in the wrong format in the Metal shader for the VolumetricLightmapIndirectionTexture. Create a new dummy texture with pixel format PF_R8G8B8A8_UINT.
#jira UE-47549
Change 3576562 by Chris.Bunner
OpenGL SetStreamSource stride updates.
Change 3576589 by Michael.Trepka
Fixed Mac CIS warnings and errors in Dev-Rendering
Change 3576708 by Jian.Ru
Fix cascade preview viewport background color not changing
#jira UE-39687
Change 3576827 by Rolando.Caloca
DR - Minor fix for licensee
Change 3576973 by Chris.Bunner
Fixing up HLSLCC language spec mismatch (potential shader compile crashes in GL and Vulkan).
Change 3577729 by Rolando.Caloca
DR - Fix for info on SCW crashes
Change 3578723 by Chris.Bunner
Fixed issue where custom material attribute was using display name as hlsl function name.
Change 3578797 by Chris.Bunner
Fixed pixel inspector crashing on high-precision normals gbuffer format.
#jira UE-48094
Change 3578815 by Yujiang.Wang
Fix for UE-48207 Orion cooked windows server crash on startup
* Crash caused by rendering features not available in a dedicated server build
* Skip over MeasureLongGPUTaskExecutionTime when !FApp::CanEvenRender()
#jira UE-48207
Change 3578828 by Daniel.Wright
Disable volumetric lightmap 3d texture creation on mobile
Change 3579473 by Daniel.Wright
Added View.SharedBilinearClampSampler and View.SharedBilinearWrapSampler. Used these to reduce base pass sampler counts with volumetric lightmaps.
Change 3580088 by Jian.Ru
Fix QAGame TM-CharacterMovement crashing on PIE
#jira UE-48031
Change 3580388 by Daniel.Wright
Fixed shadowed light injection into volumetric fog fallout from Rhino merge
Change 3580407 by Michael.Trepka
Updated Mac UnrealPak binaries
Change 3581094 by Michael.Trepka
Fix for ScreenSpaceReflections not working properly on iOS 11
Change 3581242 by Michael.Trepka
Fixed a crash on startup on Mac when launching TM-ShaderModels in QAGame
#jira UE-48255
Change 3581489 by Olaf.Piesche
Replicating CL 3578030 from Fortnite-Main to fix #jira UE-46475
#jira FORT-47068, FORT-49705
Don't inappropriately touch game thread data on the render thread. Push SubUV cutout data into a RT side object owned by the sprite dynamic data.
#tests FN LastPerfTest
Change 3581544 by Simon.Tovey
Fix for ensure accessing cvar from task thread.
#tests no more ensure
Change 3581934 by Chris.Bunner
Fixed ConsoleVariables.ini break from merge.
Change 3581968 by Jian.Ru
Fix QAGame TM-ShaderModels PIE crash when resizing game viewport
#jira UE-48251
Change 3581989 by Richard.Wallis
Fix for NULL PrecomputedLightingBuffer. It is null for first frame request in forward rendering so should have the GEmptyPrecomputedLightingUniformBuffer set in these cases after it's been initially tried to be set not before.
#jira UE-46955
Change 3582632 by Chris.Bunner
Resolved merge error.
Change 3582722 by Rolando.Caloca
DR - Workaround for PF_R8G8B8A8_UINT on GL
#jira UE-48208
Change 3584096 by Rolando.Caloca
DR - Fix for renderdoc crashing in shipping
#jira UE-46867
Change 3584245 by Jian.Ru
Fix System.Promotion.Editor.Particle Editor test crash
#jira UE-48235
Change 3584359 by Yujiang.Wang
Fix for UE-48315 Wall behind base in Monolith is flickering white in -game Orion
* Caused by dot(N, V) being negative
* Clamp to (0, 1)
#jira UE-48315
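The clamp described in this change can be illustrated with a minimal standalone sketch. It assumes plain C++ stand-ins for the shader math: `Vec3`, `Dot`, `Saturate`, and `ClampedNoV` are hypothetical names for illustration only, not the engine's shader code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal vector type for illustration.
struct Vec3 { float X, Y, Z; };

inline float Dot(const Vec3& A, const Vec3& B)
{
    return A.X * B.X + A.Y * B.Y + A.Z * B.Z;
}

// Same behaviour as the HLSL saturate() intrinsic: clamp to [0, 1].
inline float Saturate(float Value)
{
    return std::min(std::max(Value, 0.0f), 1.0f);
}

// dot(N, V) goes negative when the normal faces away from the view vector;
// clamping keeps later lighting terms from receiving a negative input.
inline float ClampedNoV(const Vec3& N, const Vec3& V)
{
    return Saturate(Dot(N, V));
}
```

With a normal of (0, 0, 1) and a view vector of (0, 0, -1), the raw dot product is -1 and the clamped value is 0.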
Change 3587864 by Mark.Satterthwaite
Fix the GPU hang on iOS caused by changes to the Depth-Stencil MSAA handling: you can't store the MSAA stencil results on iOS < 10 unless you use the slower MTLStoreActionStoreAndMultisampleResolve which we don't need for the mobile renderer.
#jira UE-48342
Change 3587866 by Mark.Satterthwaite
Correctly fix iOS compilation errors against Xcode 9 Beta 5 and Xcode 8.3.3 - duplicating function definitions is guaranteed to be wrong.
Change 3588168 by Mark.Satterthwaite
Move the Xcode version into the Metal shader format header, not the DDC key, so that we can handle bad compiler/driver combinations in the runtime and don't force all users to recompile every time the Xcode version changes.
Change 3588192 by Rolando.Caloca
DR - Fix d3d12 linker error when EXECUTE_DEBUG_COMMAND_LISTS is enabled
Change 3588291 by Rolando.Caloca
DR - Fix for d3d12 command list crash: Committed resources can not have aliasing barriers
#jira UE-48299
Change 3590134 by Michael.Trepka
Copy of CL 3578963
Reset automation tests timer after shader compilation when preparing for screenshots taking to make sure tests don't time out.
Change 3590405 by Rolando.Caloca
DR - hlslcc - support for sqrt(uint)
Change 3590436 by Mark.Satterthwaite
Rebuild Mac hlslcc for CL #3590405 - without the various compiler workarounds left over from before the recent code changes.
Change 3590674 by Rolando.Caloca
DR - vk - Integration from working branch
- Fixes distance field maps
- Compute pipelines stored in saved file
- Adds GRHIRequiresRenderTargetForPixelShaderUAVs for platforms that need dummy render targets
Change 3590699 by Rolando.Caloca
DR - Fix distance fields mem leak
Change 3590815 by Rolando.Caloca
DR - vk - Fixes for uniform buffers and empty resource tables
Change 3590818 by Mark.Satterthwaite
Temporarily switch back to OpenVR v1.0.6 for Mac only until I can clarify what to do about a required but missing API hook for Metal. Re-enabled and fixed compile errors with Mac SteamVR plugin code.
Change 3590905 by Mark.Satterthwaite
For Metal shader compilation where the bytecode compiler is unavailable force the debug compiler flag and disable the archiving flag because storing text requires this.
#jira UE-48163
Change 3590961 by Mark.Satterthwaite
Submitted on Richard Wallis's behalf as he's on holiday:
Mac fixes for Compute Skin Cache rendering issues (resulting in incorrect positions and tangents) and for recomputing tangents. Problem sampling from buffers/textures as floats with packed data. Some of the data appears as denorms so get flushed to zero then reinterpreted as uints via asuint or in Metal as_type<uint>(). Fix here for Metal seems to be to use uint types for the skin cache SRV's and as_type<> to floats instead.
There could be some other areas where we're unpacking via floats that could affect Metal and I'm not sure how this will impact on other platforms.
#jira UE-46688, UE-39256, UE-47215
Change 3590965 by Mark.Satterthwaite
Remove the Z-bias workaround from Metal MRT as it isn't required and actually causes more problems.
Change 3590969 by Mark.Satterthwaite
Make all Metal shader platforms compile such that half may be used, unless the material specifies full precision.
Change 3591871 by Rolando.Caloca
DR - Enable PCSS on Vulkan & Metal
- Enable capsule shadows on Vulkan
Change 3592014 by Mark.Satterthwaite
Remove support for Mac OS X El Capitan (10.11) including the stencil view workaround.
Bump the minimum Metal shader standard for Metal SM4, SM5 & Metal MRT to v1.2 (macOS 10.12 Sierra & iOS 10) so we can use FMAs and other newer shader language features globally.
Enable the new GRHIRequiresRenderTargetForPixelShaderUAVs flag as Metal is like Vulkan and needs a target for fragment rendering.
Also fix the filename for direct-compile & remove the old batch file generation in the Metal shader compiler.
Change 3592171 by Rolando.Caloca
DR - CIS fix
Change 3592753 by Jian.Ru
repeat Daniel's fix on xb1 profilegpu crash (draw events cannot live beyond present)
Change 3594595 by Rolando.Caloca
DR - Fix D3D shader compiling run time stack corruption failure on debug triggering falsely
Change 3594794 by Michael.Trepka
Call FPlatformMisc::PumpMessages() before attempting to toggle fullscreen on Mac to fix an issue on some Macs running 10.13 beta that would ignore the toggle fullscreen call freezing the app
Change 3594999 by Mark.Satterthwaite
Disable MallocBinned2 for iOS as on Rhino it worked but on iOS 10.0.2 there are bugs (munmap uses 64kb granularity, not the 4096 the code expects given the reported page-size).
While we are here remove the spurious FORCE_MALLOC_ANSI from the iOS platform header.
#jira UE-48342
Change 3595004 by Mark.Satterthwaite
Disable Metal's Deferred Store Actions and combined Depth/Stencil formats on iOS < 10.3 as there are bugs on earlier versions of iOS 10.
#jira UE-48342
Change 3595386 by Mark.Satterthwaite
Silence the deprecation warning for kIOSurfaceIsGlobal until SteamVR switches to one of the newer IOSurface sharing mechanisms.
Change 3595394 by Rolando.Caloca
DR - Added function for tracking down errors in the hlsl parser
- Added support for simple #if 0...#endif
Change 3599352 by Rolando.Caloca
DR - Fixes for HlslParser
- Added missing attributes for functions
- Fixed nested assignment
Change 3602440 by Michael.Trepka
Fixed Metal shader compilation from Windows with remote compilation disabled
#jira UE-48163
Change 3602898 by Chris.Bunner
Resaving assets.
Change 3603731 by Jian.Ru
fix a crash caused by a material destroyed before the decal component
#jira UE-48587
Change 3604629 by Rolando.Caloca
DR - Workaround for PF_R8G8B8A8_UINT on Android
#jira UE-48208
Change 3604984 by Peter.Sauerbrei
fix for orientation not being limited to that specified in the plist
#jira UE-48360
Change 3605738 by Chris.Bunner
Allow functional screenshot tests to request a camera cut (e.g. tests relying on temporal aa history).
#jira UE-48748
Change 3606009 by Mark.Satterthwaite
Correctly implement ClipDistance for Metal as an array of floats as required by the spec. and fix a few irritating issues from the merge that should not have.
- When compiling a tessellation vertex shader in the SCW direct mode we can't evaluate non-existent defines and we don't actually need to.
- The define names, values & shader file name are irrelevant to the Metal output key, but the shader format name & Metal standard really do matter - should speed up Metal shader compilation a bit.
- Move the shader vertex layer clip-distance to index 2 to avoid conflicts.
- Don't default initialise the debug code string for Metal shaders or it won't print out the actual code....
#jira UE-47663
Change 3606108 by Mark.Satterthwaite
Temporary hack to avoid a crash in AVPlayer.
#jira UE-48758
Change 3606121 by Mark.Satterthwaite
Fix Windows compilation.
Change 3606992 by Chris.Bunner
Static analysis fix.
[CL 3608256 by Marcus Wassmer in Main branch]
2017-08-24 15:38:57 -04:00
{
2021-06-14 12:46:26 -04:00
FRDGTextureDesc DummyDesc = FRDGTextureDesc : : Create2D ( TileListGroupSize , PF_B8G8R8A8 , FClearValueBinding : : Black , TexCreate_RenderTargetable ) ;
PassParameters - > RenderTargets [ 0 ] = FRenderTargetBinding ( GraphBuilder . CreateTexture ( DummyDesc , TEXT ( " Dummy " ) ) , ERenderTargetLoadAction : : ENoAction ) ;
}
ClearUnusedGraphResources ( VertexShader , & PassParameters - > VS ) ;
ClearUnusedGraphResources ( PixelShader , & PassParameters - > PS ) ;
GraphBuilder . AddPass (
bCountingPass ? RDG_EVENT_NAME ( " CountTileObjectIntersections " ) : RDG_EVENT_NAME ( " CullTilesToObjects " ) ,
PassParameters ,
ERDGPassFlags : : Raster ,
2022-04-13 18:48:32 -04:00
[ PassParameters , VertexShader , PixelShader , & View , TileListGroupSize , ObjectIndirectArguments ] ( FRHICommandList & RHICmdList )
2020-09-24 00:43:27 -04:00
{
2021-06-14 12:46:26 -04:00
FGraphicsPipelineStateInitializer GraphicsPSOInit ;
RHICmdList . ApplyCachedRenderTargets ( GraphicsPSOInit ) ;
2018-11-06 14:45:07 -05:00
2021-06-14 12:46:26 -04:00
RHICmdList . SetViewport ( 0 , 0 , 0.0f , TileListGroupSize . X , TileListGroupSize . Y , 1.0f ) ;
2018-11-06 14:45:07 -05:00
2021-06-14 12:46:26 -04:00
// Render backfaces since camera may intersect
GraphicsPSOInit . RasterizerState = View . bReverseCulling ? TStaticRasterizerState < FM_Solid , CM_CW > : : GetRHI ( ) : TStaticRasterizerState < FM_Solid , CM_CCW > : : GetRHI ( ) ;
GraphicsPSOInit . DepthStencilState = TStaticDepthStencilState < false , CF_Always > : : GetRHI ( ) ;
GraphicsPSOInit . BlendState = TStaticBlendState < > : : GetRHI ( ) ;
GraphicsPSOInit . PrimitiveType = PT_TriangleList ;
2018-11-06 14:45:07 -05:00
2021-06-14 12:46:26 -04:00
GraphicsPSOInit . BoundShaderState . VertexDeclarationRHI = GetVertexDeclarationFVector4 ( ) ;
GraphicsPSOInit . BoundShaderState . VertexShaderRHI = VertexShader . GetVertexShader ( ) ;
GraphicsPSOInit . BoundShaderState . PixelShaderRHI = PixelShader . GetPixelShader ( ) ;
2018-11-06 14:45:07 -05:00
2021-09-03 12:04:52 -04:00
SetGraphicsPipelineState ( RHICmdList , GraphicsPSOInit , 0 ) ;
2018-11-06 14:45:07 -05:00
2021-06-14 12:46:26 -04:00
SetShaderParameters ( RHICmdList , VertexShader , VertexShader . GetVertexShader ( ) , PassParameters - > VS ) ;
SetShaderParameters ( RHICmdList , PixelShader , PixelShader . GetPixelShader ( ) , PassParameters - > PS ) ;
2018-11-06 14:45:07 -05:00
2021-06-14 12:46:26 -04:00
RHICmdList . SetStreamSource ( 0 , StencilingGeometry : : GLowPolyStencilSphereVertexBuffer . VertexBufferRHI , 0 ) ;
2018-11-06 14:45:07 -05:00
2021-06-14 12:46:26 -04:00
RHICmdList . DrawIndexedPrimitiveIndirect (
StencilingGeometry : : GLowPolyStencilSphereIndexBuffer . IndexBufferRHI ,
ObjectIndirectArguments - > GetIndirectRHICallBuffer ( ) ,
0 ) ;
} ) ;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
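One common source of waste in buddy allocators in general is that each request is rounded up to a power-of-two block. The sketch below assumes that classic scheme. `RoundUpToPowerOfTwo` and `BuddyWaste` are hypothetical helpers, not the D3D12 RHI allocator, and the change above does not say that rounding is the only cause of the measured waste.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= Size (Size must be > 0).
inline uint64_t RoundUpToPowerOfTwo(uint64_t Size)
{
    assert(Size > 0);
    uint64_t Block = 1;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Bytes lost to internal fragmentation when a classic buddy allocator
// serves a request of Size bytes from a power-of-two block.
inline uint64_t BuddyWaste(uint64_t Size)
{
    return RoundUpToPowerOfTwo(Size) - Size;
}
```

Under this assumption a 4KB request wastes nothing, while a 5KB request occupies an 8KB block and wastes 3KB.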
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
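The fence reuse bug in the list above can be modelled with a small sketch. It assumes the usual fence rule that a wait on value V is satisfied once the completed value is at least V. `FSketchFence` and `FSketchFenceOwner` are hypothetical types for illustration, not the D3D12 API or the RHI's fence classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a GPU fence: a wait on Value is satisfied
// once the completed value is >= Value.
struct FSketchFence
{
    uint64_t CompletedValue = 0;

    void Signal(uint64_t Value) { CompletedValue = Value; }
    bool IsComplete(uint64_t Value) const { return CompletedValue >= Value; }
};

// Hypothetical owner that hands out strictly increasing fence values,
// so a value is never reused and never needs resetting.
struct FSketchFenceOwner
{
    FSketchFence Fence;
    uint64_t NextValue = 1;

    // Returns the value a caller should wait on for this signal.
    uint64_t AllocateSignalValue() { return NextValue++; }
};
```

In this model a wait on 0 is satisfied on a fresh fence, which is why waiting for 0 never blocks. Waiting on a value from `AllocateSignalValue()` only succeeds after that value has been signaled.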
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
}
2018-05-23 21:04:31 -04:00
FIntPoint GetTileListGroupSizeForView(const FViewInfo& View)
{
	return FIntPoint(
		FMath::DivideAndRoundUp(FMath::Max(View.ViewRect.Size().X / GAODownsampleFactor, 1), GDistanceFieldAOTileSizeX),
		FMath::DivideAndRoundUp(FMath::Max(View.ViewRect.Size().Y / GAODownsampleFactor, 1), GDistanceFieldAOTileSizeY));
}
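The tile-group sizing above can be sketched outside the engine as follows. This is only an illustration, not the engine's implementation: the constants kAODownsampleFactor, kTileSizeX and kTileSizeY and the IntPoint struct are hypothetical stand-ins for GAODownsampleFactor, GDistanceFieldAOTileSizeX/Y and FIntPoint, and their values are assumed for the example.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the engine globals; the values are assumptions.
constexpr int kAODownsampleFactor = 2;
constexpr int kTileSizeX = 16;
constexpr int kTileSizeY = 16;

// Minimal replacement for FIntPoint, for illustration only.
struct IntPoint
{
	int X;
	int Y;
};

// Integer ceiling division; assumes Dividend >= 0 and Divisor > 0.
constexpr int DivideAndRoundUp(int Dividend, int Divisor)
{
	return (Dividend + Divisor - 1) / Divisor;
}

// Number of tile groups needed to cover the downsampled view.
// The max(..., 1) clamp keeps very small views from producing zero tiles.
constexpr IntPoint TileListGroupSize(int ViewWidth, int ViewHeight)
{
	return IntPoint{
		DivideAndRoundUp(std::max(ViewWidth / kAODownsampleFactor, 1), kTileSizeX),
		DivideAndRoundUp(std::max(ViewHeight / kAODownsampleFactor, 1), kTileSizeY)};
}

// 1920x1080 downsamples to 960x540, which needs 60 x 34 tiles of 16x16.
static_assert(TileListGroupSize(1920, 1080).X == 60, "960 / 16 tiles wide");
static_assert(TileListGroupSize(1920, 1080).Y == 34, "ceil(540 / 16) tiles high");
```

Under these assumed values a 1x1 view still yields one tile group per axis, because the downsampled size is clamped to 1 before the rounding-up division; the same property can be checked at run time with assert.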
2021-06-14 12:46:26 -04:00
void BuildTileObjectLists(
	FRDGBuilder& GraphBuilder,
	FScene* Scene,
UE5_MAIN: Multi-view-family scene renderer refactor, part 1. Major structural change to allow scene renderer to accept multiple view families, with otherwise negligible changes in internal behavior.
* Added "BeginRenderingViewFamilies" render interface call that accepts multiple view families. Original "BeginRenderingViewFamily" falls through to this.
* FSceneRenderer modified to include an array of view families, plus an active view family and the Views for that family.
* Swap ViewFamily to ActiveViewFamily.
* Swap Views array from TArray<FViewInfo> to TArrayView<FViewInfo>, including where the Views array is passed to functions.
* FSceneRenderer iterates over the view families, rendering each one at a time, as separate render graph executions.
* Some frame setup and cleanup logic outside the render graph runs once.
* Moved stateful FSceneRenderer members to FViewFamilyInfo, to preserve existing one-at-a-time view family rendering behavior.
* Display Cluster (Virtual Production) uses new API.
Next step will push everything into one render graph, which requires handling per-family external resources and cleaning up singletons (like FSceneTextures and FSceneTexturesConfig). Once that's done, we'll be in a position to further interleave rendering, properly handle once per frame work, and solve artifacts in various systems.
#jira none
#rnx
#rb zach.bethel
#preflight 625df821b21bb49791d377c9
[CL 19813996 by jason hoerner in ue5-main branch]
2022-04-19 14:45:26 -04:00
	TArrayView<FViewInfo>& Views,
2021-06-14 12:46:26 -04:00
	FRDGBufferRef ObjectIndirectArguments,
	const FDistanceFieldCulledObjectBufferParameters& CulledObjectBufferParameters,
	FTileIntersectionParameters TileIntersectionParameters,
	FRDGTextureRef DistanceFieldNormal,
	const FDistanceFieldAOParameters& Parameters)
2017-01-26 19:20:49 -05:00
{
2021-06-14 12:46:26 -04:00
	ensure(GAOScatterTileCulling);
2020-09-24 00:43:27 -04:00
	RDG_EVENT_SCOPE(GraphBuilder, "BuildTileList");
UE5_MAIN: Multi-view-family scene renderer refactor, part 2. Move FSceneTextures singleton out of RDG blackboard and FSceneTexturesConfig global variable singleton, into FViewFamilyInfo. This is necessary to allow multiple view families to render in a single render graph and a single scene renderer call.
* Existing calls to CreateSceneTextureShaderParameters and similar functions use "GetSceneTexturesChecked", which allows for the possibility that they are reached in a code path where scene textures haven't been initialized, and nullptr is returned instead of asserting. The shader parameter setup functions then fill in dummy defaults for that case. The goal was to precisely match the original behavior, which queried the RDG blackboard, and gracefully handled null if scene textures weren't there. This definitely appears to occur in FNiagaraGpuComputeDispatch::ProcessPendingTicksFlush, which can be called with a dummy scene with no scene textures. In the future, I may change this so dummy defaults are filled in for FSceneTextures at construction time, so the structure is never in an uninitialized state, but I would like to set up a test case for the Niagara code path before doing that, and the checks aren't harmful in the meantime.
* I marked as deprecated global functions which query values from FSceneTexturesConfig, but they'll still work with the caveat that if you use multi-view-family rendering, the results will be indeterminate (whatever view family rendered last). There was only one case outside the scene renderer that accessed the globals (depth clear value), which I removed, noting that there is nowhere in the code where we modify the depth clear value from its global default. I would like to permanently deprecate or remove these at some point. Display Cluster is the only code that's currently using the multi-view-family code path, and as a new (still incomplete) feature, third party code can't be using it, and won't be affected.
#jira NONE
#rb chris.kulla zach.bethel mihnea.balta
#preflight 6261aca76119a1a496bd2644
[CL 19873983 by jason hoerner in ue5-main branch]
2022-04-22 17:33:02 -04:00
	TRDGUniformBufferRef<FSceneTextureUniformParameters> SceneTexturesUniformBuffer = GetViewFamily(Views).GetSceneTextures().UniformBuffer;
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller . In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of memory wastage memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted them back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Propper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in coherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits wern't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing,
Added th MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
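The page-allocator idea described in change 3271594 can be illustrated with a small CPU-side sketch. It is not the engine's ESRAM code: the class name FPagePoolSketch and its methods are hypothetical, and a plain vector stands in for the virtual address mapping. Entry i of the returned page table is the physical page that backs virtual page i, so the caller sees a contiguous range even when the physical pages are scattered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a non-contiguous page allocator; names are illustrative only.
class FPagePoolSketch
{
public:
    explicit FPagePoolSketch(uint32_t NumPages)
    {
        // Push in reverse so that low-numbered pages are handed out first.
        for (uint32_t Page = NumPages; Page > 0; --Page)
        {
            FreePages.push_back(Page - 1);
        }
    }

    // Fills OutPageTable so that entry i is the physical page backing virtual page i.
    // Returns false (and leaves the pool untouched) if not enough pages are free.
    bool Allocate(uint32_t NumPages, std::vector<uint32_t>& OutPageTable)
    {
        OutPageTable.clear();
        if (NumPages > FreePages.size())
        {
            return false;
        }
        for (uint32_t Index = 0; Index < NumPages; ++Index)
        {
            OutPageTable.push_back(FreePages.back());
            FreePages.pop_back();
        }
        return true;
    }

    // Returns the pages of a destroyed resource to the pool for reuse.
    void Free(const std::vector<uint32_t>& PageTable)
    {
        FreePages.insert(FreePages.end(), PageTable.begin(), PageTable.end());
    }

    size_t NumFreePages() const { return FreePages.size(); }

private:
    std::vector<uint32_t> FreePages;
};
```

After a resource is freed, a later allocation may receive pages that are neither adjacent nor in order, which is why the mapping to a contiguous virtual range matters.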
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
{
const FViewInfo& View = Views[ViewIndex];
2020-09-24 00:43:27 -04:00
RDG_GPU_MASK_SCOPE(GraphBuilder, View.GPUMask);
2017-01-26 19:20:49 -05:00
2021-06-14 12:46:26 -04:00
const FIntPoint TileListGroupSize = GetTileListGroupSizeForView(View);
2017-01-26 19:20:49 -05:00
{
2021-06-14 12:46:26 -04:00
auto* PassParameters = GraphBuilder.AllocParameters<FBuildTileConesCS::FParameters>();
2021-12-09 04:49:36 -05:00
PassParameters->View = View.ViewUniformBuffer;
2021-06-14 12:46:26 -04:00
PassParameters->RWTileConeAxisAndCos = TileIntersectionParameters.RWTileConeAxisAndCos;
PassParameters->RWTileConeDepthRanges = TileIntersectionParameters.RWTileConeDepthRanges;
PassParameters->SceneTextures = SceneTexturesUniformBuffer;
PassParameters->DistanceFieldNormalTexture = DistanceFieldNormal;
PassParameters->DistanceFieldNormalSampler = TStaticSamplerState<SF_Point, AM_Clamp, AM_Clamp, AM_Clamp>::GetRHI();
2021-12-09 04:49:36 -05:00
PassParameters->AOParameters = DistanceField::SetupAOShaderParameters(Parameters);
PassParameters->NumGroups = FVector2f(TileListGroupSize.X, TileListGroupSize.Y);
2020-09-24 00:43:27 -04:00
2021-06-14 12:46:26 -04:00
auto ComputeShader = View.ShaderMap->GetShader<FBuildTileConesCS>();
2020-09-24 00:43:27 -04:00
2021-12-09 04:49:36 -05:00
FComputeShaderUtils::AddPass(GraphBuilder, RDG_EVENT_NAME("BuildTileCones"), ComputeShader, PassParameters, FIntVector(TileListGroupSize.X, TileListGroupSize.Y, 1));
2021-06-14 12:46:26 -04:00
}
2020-09-24 00:43:27 -04:00
2021-06-14 12:46:26 -04:00
// Start at 0 tiles per object
AddClearUAVPass(GraphBuilder, TileIntersectionParameters.RWNumCulledTilesArray, 0);
2020-09-24 00:43:27 -04:00
2021-06-14 12:46:26 -04:00
// Rasterize object bounding shapes and intersect with screen tiles to compute how many tiles intersect each object
ScatterTilesToObjects(GraphBuilder, true, View, Scene->DistanceFieldSceneData, TileListGroupSize, Parameters, ObjectIndirectArguments, CulledObjectBufferParameters, TileIntersectionParameters, SceneTexturesUniformBuffer);
2020-09-24 00:43:27 -04:00
2021-06-14 12:46:26 -04:00
// Start at 0 threadgroups
AddClearUAVPass(GraphBuilder, TileIntersectionParameters.RWObjectTilesIndirectArguments, 0);
2020-09-24 00:43:27 -04:00
2021-06-14 12:46:26 -04:00
{
auto* PassParameters = GraphBuilder.AllocParameters<FComputeCulledTilesStartOffsetCS::FParameters>();
PassParameters->View = View.ViewUniformBuffer;
PassParameters->TileIntersectionParameters = TileIntersectionParameters;
PassParameters->DistanceFieldCulledObjectBuffers = CulledObjectBufferParameters;
2022-04-26 14:37:07 -04:00
PassParameters->DistanceFieldAtlas = DistanceField::SetupAtlasParameters(GraphBuilder, Scene->DistanceFieldSceneData);
2021-06-14 12:46:26 -04:00
PassParameters->SceneTextures = SceneTexturesUniformBuffer;
2020-09-24 00:43:27 -04:00
2021-06-14 12:46:26 -04:00
auto ComputeShader = View.ShaderMap->GetShader<FComputeCulledTilesStartOffsetCS>();
const int32 GroupSize = FMath::DivideAndRoundUp<uint32>(Scene->DistanceFieldSceneData.NumObjectsInBuffer, ComputeStartOffsetGroupSize);
2020-09-24 00:43:27 -04:00
2021-06-14 12:46:26 -04:00
FComputeShaderUtils::AddPass(
GraphBuilder,
RDG_EVENT_NAME("ComputeStartOffsets"),
ComputeShader,
PassParameters,
FIntVector(GroupSize, 1, 1));
}
2020-09-24 00:43:27 -04:00
2021-06-14 12:46:26 -04:00
// Start at 0 tiles per object
AddClearUAVPass(GraphBuilder, TileIntersectionParameters.RWNumCulledTilesArray, 0);
2020-09-24 00:43:27 -04:00
2021-06-14 12:46:26 -04:00
// Rasterize object bounding shapes and intersect with screen tiles, and write out intersecting tile indices for the cone tracing pass
ScatterTilesToObjects(GraphBuilder, false, View, Scene->DistanceFieldSceneData, TileListGroupSize, Parameters, ObjectIndirectArguments, CulledObjectBufferParameters, TileIntersectionParameters, SceneTexturesUniformBuffer);
Copying //UE4/Dev-Rendering to //UE4/Dev-Main (Source: //UE4/Dev-Rendering @ 3274304)
#lockdown Nick.Penwarden
#rb none
==========================
MAJOR FEATURES + CHANGES
==========================
Change 3250856 on 2017/01/09 by Daniel.Wright
Only showing instruction count for 'Base pass shader' now
Change 3250943 on 2017/01/09 by Rolando.Caloca
DR - Async Compute PSO creation
Change 3251036 on 2017/01/09 by Rolando.Caloca
DR - Add r.AsyncPipelineCompile
- Dispatch on any thread
- Wait for completion event
Change 3251058 on 2017/01/09 by Ben.Woodhouse
Fix for PSO creation D3D error with NumRenderTargets. Add code to compute the correct number of valid rendertargets to prevent an issue during PSO creation when NumRenderTargets is >0, but none of the formats are valid (all formats are DXGI_UNKNOWN)
#jira UE-40332
Change 3251141 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite CL 3243458:
D3D12 memory optimization - The d3d12 buddy suballocator is very wasteful for allocations above 4KB, but the vast majority of allocations are smaller. In the default buffer allocator this was causing 149MB of waste in 340MB of allocations. Moving the max allocation size threshold down to 4KB from 512KB saved 100MB of wasted memory.
On PC, buffers are 64KB aligned, so we need the threshold to be higher to avoid additional wastage.
Add PIX memory tracking instrumentation for buddy allocators so we can track the memory properly in PIX
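A minimal sketch of where buddy-allocator waste comes from, assuming the textbook scheme in which each request is rounded up to the next power-of-two block. The helper names below are hypothetical and are not taken from the D3D12 RHI:

```cpp
#include <cstdint>

// Hypothetical helper: the smallest power-of-two block that can hold Size bytes.
constexpr std::uint64_t RoundUpToPowerOfTwo(std::uint64_t Size)
{
    std::uint64_t Block = 1;
    while (Block < Size)
    {
        Block <<= 1;
    }
    return Block;
}

// Hypothetical helper: bytes lost to rounding for a single allocation.
constexpr std::uint64_t BuddyWaste(std::uint64_t Size)
{
    return RoundUpToPowerOfTwo(Size) - Size;
}

// A request that is exactly a power of two wastes nothing...
static_assert(BuddyWaste(4096) == 0, "exact fit");
// ...but one byte more doubles the block and wastes almost half of it.
static_assert(BuddyWaste(4097) == 4095, "just over a block boundary");
```

Under this assumption the absolute waste per allocation grows with block size, which is consistent with routing larger requests away from the buddy suballocator.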
Change 3251142 on 2017/01/09 by Ben.Woodhouse
Duplicated from Fortnite 3243496
memory optimisation: use NULL-terminated ansi strings instead of unicode FStrings for symbols, saving 118MB. Previously the strings were loaded from disk as ansi and then converted to FStrings (slowly), before finally being converted back to ansi strings before being used. In addition to reducing memory overhead, this change reduces complexity and improves startup time.
Change 3252323 on 2017/01/10 by Rolando.Caloca
DR - Gfx async PSO creation prep
Change 3252474 on 2017/01/10 by Daniel.Wright
Added 'Compile Unreal Lightmass' to error message
Change 3252589 on 2017/01/10 by Daniel.Wright
Back out bulk data for distance fields from cl 3241990 which causes distance fields to be corrupt in Fortnite
Change 3252790 on 2017/01/10 by Daniel.Wright
Added InscatteringColorCubemapAngle to exponential height fog
Change 3252843 on 2017/01/10 by Uriel.Doyon
Proper fix for UE-40211, where texture streaming bound defrag and async tasks could interact in incoherent ways.
The bound defrag is now done outside of the async work logic.
Change 3252866 on 2017/01/10 by Mark.Satterthwaite
Fix Metal shader pipeline hash collisions caused by deferring MTLFunction construction until PrepareToDraw so that we may use Function-Constants to specialise the shader source without generating additional permutations. This is required to generate proper tessellation shaders which are specialised against the index-buffer usage & type (none, uint16, uint32). While we're here amend the hash functions to make better use of the existing hash functions to improve the distribution and hopefully reduce the possibility of collisions in future.
#jira UE-40357
Change 3254511 on 2017/01/11 by Rolando.Caloca
DR - PSO stats
Change 3255958 on 2017/01/12 by Mark.Satterthwaite
Reimplement RQT_AbsoluteTime for Metal - pretty sure I did this before, but somehow it got lost. When a RQT_AbsoluteTime is inserted into the command-stream, insert a command-buffer completion handler to record the time of completion & submit the command-buffer immediately. This breaks command-buffers so is noticeably slower and if inserted in a pass that can't be restarted will fail but is currently the only option available. This is sufficient to support the GPUBenchmark used by Scalability. To make this more efficient I've refactored the FMetalCommandBufferFence implementation so that we use a single shared-ptr object containing the command-buffer and a dispatch semaphore, rather than allocating one for each query. The semaphore allows for timed-waits where previously we'd block until completion, unlike the other APIs that report failure after a fixed interval (2s for RQT_AbsoluteTime, otherwise 0.5s). Sadly not all drivers support this abuse of the Metal API, so replace the GL-based workaround for not having time queries with one that just guesses based on RHI device details. Radars will be filed.
#jira UE-40554
Change 3256329 on 2017/01/12 by Olaf.Piesche
#jira UE-38615
Assert shouldn't be necessary; in fact, it causes a crash when exporting emitters, since in that case we're changing the template at runtime.
Change 3256371 on 2017/01/12 by Uriel.Doyon
Reenabled texture streaming bound defrag as the fix is in CL 3252843
Change 3257032 on 2017/01/13 by Daniel.Wright
Added fastClamp to fastmath.usf
Change 3257111 on 2017/01/13 by Daniel.Wright
Disabled bAffectDistanceFieldLighting on DefaultPawn, fixes VisualizeMeshDistanceFields in game
Change 3257112 on 2017/01/13 by Daniel.Wright
DFAO optimizations
* Changed the culling algorithm to produce a list of intersecting screen tiles for each object, instead of the other way around. Each tile / object intersection gets its own cone tracing thread group so wavefronts are much smaller and scheduled better. 3.63ms -> 3.48ms (.15ms)
* Replace slow instructions in inner loop with fast approximations (exp2 -> sqr + 1, rcpFast, lengthFast) 3.25ms -> 3.09ms (.16ms)
* Moved transform from world to local space out of the inner loop (sample position constructed from local space position + direction) 3.09ms -> 3.04ms
* Compute shader for ClearUAV 3.04ms -> 2.62ms (.42ms)
Change 3257113 on 2017/01/13 by Daniel.Wright
Better distance field memory stats
Change 3257326 on 2017/01/13 by Uriel.Doyon
Workaround to support cases where several textures have the same lighting GUID.
Change 3257448 on 2017/01/13 by Daniel.Wright
Removed legacy features Distance Field Specular Occlusion, Distance Field Surface Cache AO, PreCullTriangles
Change 3257616 on 2017/01/13 by Daniel.Wright
Distance field mesh visualization now uses a cone containing the entire tile to cull objects with, making the results stable
Change 3257657 on 2017/01/13 by Daniel.Wright
Mesh distance fields are stored zlib compressed in memory until needed for uploading to GPU
* 81Mb of backing memory -> 32Mb in GPUPerfTest, atlas upload time 29ms -> 893ms
Change 3258063 on 2017/01/14 by Rolando.Caloca
DR - vk - Refactor descriptor set reuse in prep for more changes
Change 3258715 on 2017/01/16 by Daniel.Wright
Added VisualizeGlobalDistanceField show flag
Change 3258827 on 2017/01/16 by Daniel.Wright
Global distance field update regions are clipped against others to reduce redundant updates.
Change 3258959 on 2017/01/16 by Benjamin.Hyder
Updating Planar Reflection example material in TM-Shadermodels
Change 3259270 on 2017/01/16 by Daniel.Wright
[Copy] 'r.MSAACount 1' now produces no MSAA or TAA. 'r.MSAACount 0' can be used to toggle TAA on for comparisons.
Change 3259652 on 2017/01/16 by Uriel.Doyon
Better support for static primitive becoming dynamic.
Change 3260107 on 2017/01/17 by Ben.Woodhouse
Fix FMonitoredProcess to prevent infinite loop in -nothreading mode
#jira UE-40717
Change 3260594 on 2017/01/17 by Daniel.Wright
Added a new global distance field (4x 128^3 clipmaps) which caches mostly static primitives (Mobility set to Static or Stationary)
* The full global distance field inherits from the mostly static cache, so when a Movable primitive is modified, only other movable primitives in the vicinity need to be re-composited into the global distance field
* Global distance field update cost with one large rotating object went from 2.5ms -> .2ms on 970GTX and 4.6ms -> .3ms. Worst case full volume update is mostly the same.
* Adds 12Mb for the new volume textures
Change 3260956 on 2017/01/17 by Daniel.Wright
Structured buffers for DF object data
* Full global distance field clipmap composite 3.0ms -> 2.0ms due to scalarized loads
Change 3261296 on 2017/01/17 by Daniel.Wright
Exposed MaxObjectsPerTile with 'r.AOMaxObjectsPerCullTile' and lowered the default from 512 to 256, saves 17Mb of object tile culling data structures
Removed unnecessary UAV transitions preventing object and global cone tracing from overlapping, saves ~.1ms
Change 3262036 on 2017/01/18 by Ben.Salem
V0 of Perf monitor plugin for easily consumable stat csvs. With plugin enabled, enter PerformanceMonitor help into the console to get usage details.
Change 3262056 on 2017/01/18 by Chris.Bunner
Remove inverse tonemapping when rendering HDR output.
#jira UE-40728
Change 3262661 on 2017/01/18 by Rolando.Caloca
DR - Add missing SetStencilRef() and SetBlendFactor() on most RHIs
- Fix hash for PSOs
Change 3263674 on 2017/01/19 by Chris.Bunner
PR #3144: Improved error messages (Contributed by DarkSlot)
#jira UE-40835
Change 3264150 on 2017/01/19 by Ben.Woodhouse
Add support for single threaded in FMonitoredProcess. Deprecated IsRunning() in favour of a new Update() method because polling IsRunning is not compatible with -nothreading mode
#jira UE-40841
Change 3264153 on 2017/01/19 by Ben.Woodhouse
Integrate latest changes from MS-DX12 CLs 3231395-3262526
- Added WinPixEventRuntime.tps
- Includes PIX support, various optimizations (saved 1.3ms in testbed scene)
CL 3262343:
Fix depth testing on translucency not working correctly after cl 3231395. This change reapplies the D3D12RHI changes from CL 3231395 because those changes were lost when integrating from //Dev-Rendering/ but also includes the depth fixes:
- Fix depth state not being in DEPTH_READ for use as depth read. The issue was HasDepthBits and HasStencilBits weren't intended for SRV formats and always returned false in the SRV case.
CL 3231395:
Update D3D12 RHI:
- Fix deferred MSAA path in RHI
- Add Pix3.h support
- Cleanup SetName usage and remove it from shipping builds.
- Fix fence reuse bug. We were signaling MAX UINT (-1) and then waiting for 0, which was always signaled. This change also removes the fence value reset code, there is no need to reset a fence to a previous value.
- Use FPlatformAtomics::InterlockedIncrement instead of InterlockedIncrement64
- Use InterlockedIncrement() instead of _InterlockedIncrement() and use the FPlatformAtomics:: version.
- Fix possible readback heap being evicted while in use. GetQueryData happens on the render thread and isn't tied to a command list so we should always have readback heaps resident.
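The fence reuse bug above can be illustrated with a small sketch. It assumes a monotonic 64-bit fence whose wait succeeds once the completed value reaches the requested value. FFenceSketch and NextFenceValue are hypothetical names, not the D3D12 RHI types:

```cpp
#include <cstdint>

// Hypothetical stand-in for a GPU fence: the completed value only moves forward.
struct FFenceSketch
{
    std::uint64_t CompletedValue = 0;

    void Signal(std::uint64_t Value)
    {
        if (Value > CompletedValue)
        {
            CompletedValue = Value;
        }
    }

    // A wait on Value returns immediately once CompletedValue has reached it.
    bool IsComplete(std::uint64_t Value) const
    {
        return CompletedValue >= Value;
    }
};

// Hypothetical helper: the value a caller would wait on after signaling Current.
constexpr std::uint64_t NextFenceValue(std::uint64_t Current)
{
    return Current + 1; // unsigned arithmetic wraps, so the maximum value maps to 0
}

// Signaling the maximum value and then advancing wraps the wait target to 0,
// which every fence already satisfies, so the wait never blocks.
static_assert(NextFenceValue(UINT64_MAX) == 0, "wraps to zero");
```

In this model a wait on 0 is satisfied even by a fence that has never been signaled, which is why signaling the maximum value and then waiting for 0 provides no synchronization.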
Change 3264251 on 2017/01/19 by Mark.Satterthwaite
Modify some asserts in MetalRHI - technically using a store-action of ENoAction on Stencil buffers should make it invalid to restart a render-pass but on Mac it will work because ENoAction won't invalidate anything written. In future we need to use deferred store-actions in Metal so that we can "restart" passes while enforcing correct Load/Store actions.
#jira UE-40803
Change 3264642 on 2017/01/19 by Daniel.Wright
Raised GMaxShadowDepthBufferSizeX to max texture resolution on most platforms, was previously 4096.
Change 3265330 on 2017/01/20 by Ben.Salem
Stop performance plugin from building in Win32.
#tests recompiled and preflighted
Change 3265678 on 2017/01/20 by Marcus.Wassmer
Fix bad declaration.
#3055
Change 3266656 on 2017/01/20 by Mark.Satterthwaite
Changes to the FShaderCache to restore it and extend it to optionally report on shader de-duplication when generating a binary shader cache (Console Variable: r.BinaryShaderCacheLogging).
Duplicate & amend CL #3266053 from Trepka:
Fixed issues with shader cache not working properly with Mac Metal (but it still requires -norhithread to work at all). Enabled the shader cache by default if RHI thread is disabled.
Amend & integrate RCO's CL #3197085.
Change 3267741 on 2017/01/23 by Rolando.Caloca
DR - Detect duplicated shader and pipeline types
Change 3268600 on 2017/01/23 by Uriel.Doyon
Added missing r.Streaming.MaxEffectiveScreenSize config to base texture scalability settings.
Integrated CL 3227368 from Orion stream
Enabled r.Streaming.UsePerTextureBias by default as this has been tested in Orion for several months.
Fixed issue with the InvestigateTexture command which could return invalid reference depending on the timing.
Added the MaxEffectiveScreenSize settings in the investigate texture command.
Change 3269512 on 2017/01/24 by Richard.Wallis
Fix for shader binary cache uncompress data size during internal shader log.
Change 3271237 on 2017/01/25 by Ben.Woodhouse
D3D12 updateTexture2D crash fix
#jira UE-41059
Change 3271564 on 2017/01/25 by Olaf.Piesche
#jira UE-40980
#udn 325525
Fix uniform buffers for mesh particles; these should really be on the mesh collector, so allocating them as a one frame resource is safe
Change 3271594 on 2017/01/25 by Ben.Woodhouse
ESRAM support stage 1:
Implemented noncontiguous ESRAM page allocator replacing XgMemoryLayout API. The allocator allocates non-contiguous ranges of pages and maps them onto a contiguous virtual address range.
Unlike the previous implementation, this allocator frees pages for reuse when resources are destroyed
Note: issues with deferred deallocation may prevent reuse in many cases - that will be addressed in the next stage
Support for the old allocator is still available (for now) via the define NEW_ESRAM_ALLOCATOR
#fyi rolando.caloca
Change 3272616 on 2017/01/25 by Rolando.Caloca
DR - Update shader version
Change 3273138 on 2017/01/26 by Ben.Woodhouse
Fix merge issue with MonitoredProcess.cpp (this arose from an integration made as an edit in dev-rendering, which confused perforce when the change was subsequently integrated from main)
[CL 3274498 by Rolando Caloca in Main branch]
2017-01-26 19:20:49 -05:00
}
}