Mirror of https://github.com/izzy2lost/UnrealEngineUWP.git (synced 2026-03-26 18:15:20 -07:00)
#lockdown Nick.Penwarden
#rb none

==========================
MAJOR FEATURES + CHANGES
==========================

Change 3134663 on 2016/09/21 by Chris.Bunner
	Merging Dev-MaterialLayers to Dev-Rendering, CL 3134208. Initial material attribute extensibility changes. #jira UE-34347

Change 3142292 on 2016/09/27 by Rolando.Caloca
	DR - hlslcc - Fix for warning X3206: implicit truncation of vector type causing error #jira UE-31438

Change 3143557 on 2016/09/28 by Rolando.Caloca
	DR - Back out changelist 3142292

Change 3145354 on 2016/09/29 by Benjamin.Hyder
	Updating Tm-ContactShadows

Change 3154832 on 2016/10/07 by Rolando.Caloca
	DR - vk - Fix crash on framebuffers with missing textures

Change 3154838 on 2016/10/07 by Rolando.Caloca
	DR - vk - Enable clip distance

Change 3154840 on 2016/10/07 by Rolando.Caloca
	DR - Remove branch per codereview

Change 3155118 on 2016/10/07 by Rolando.Caloca
	DR - vk - Compute pipeline fixes

Change 3155129 on 2016/10/07 by Rolando.Caloca
	DR - Added draw events for reflection captures

Change 3155167 on 2016/10/07 by Rolando.Caloca
	DR - Use shader clear for platforms that can't use viewport or scissor

Change 3155168 on 2016/10/07 by Rolando.Caloca
	DR - vk - Added submit gpu - Some fixes for Geometry and Compute

Change 3155595 on 2016/10/07 by Rolando.Caloca
	DR - vk - Use new render pass system

Change 3155720 on 2016/10/07 by Rolando.Caloca
	DR - vk - static analysis fix

Change 3155732 on 2016/10/07 by Rolando.Caloca
	DR - Fix clears for platforms that can't use viewports, excluderects or scissor on clear

Change 3156787 on 2016/10/10 by Rolando.Caloca
	DR - Fix mem leaks

Change 3156805 on 2016/10/10 by Rolando.Caloca
	DR - Improve check msg per licensee

Change 3156815 on 2016/10/10 by Rolando.Caloca
	DR - Fix infinite recursion

Change 3157041 on 2016/10/10 by Rolando.Caloca
	DR - vk - Fix key access from multiple threads

Change 3158253 on 2016/10/11 by Rolando.Caloca
	DR - Fix comment #jira UE-37128 PR #2852

Change 3158606 on 2016/10/11 by Rolando.Caloca
	DR - vk - Accessors

Change 3160418 on 2016/10/12 by Daniel.Wright
	Lightmap textures are now outered to UMapBuildDataRegistry so that the UMapBuildDataRegistry can be moved in the content browser

Change 3160644 on 2016/10/12 by Arne.Schober
	DR - [UE-32613] - OpenGL used to have custom code in the compiler to modify the source so that the same data and matrices can be used as in DirectX; unfortunately that causes precision problems. Fortunately there is an extension available (glClipControl) which enables DirectX behaviour in OpenGL and it is widely supported. We only tested Linux and Windows and therefore only enable it by default on those platforms.

Change 3161219 on 2016/10/13 by Luke.Thatcher
	[RENDERING] [!] Fix incorrect shader used in GPU Benchmark causing crash in OpenGL.

Change 3161838 on 2016/10/13 by Daniel.Wright
	Fixed level getting added to the dirty list twice when legacy lightmaps are present

Change 3161884 on 2016/10/13 by Arne.Schober
	DR - Fix Mac and DCC build

Change 3162206 on 2016/10/13 by Chris.Bunner
	Merging Dev-MaterialLayers to Dev-Rendering, CL 3161593: Material expressions; Trig, fast-trig, saturate, round, truncate, pre-skinned normal. Added CustomEyeTangent to material attributes. Resolved some hard-coded attribute typing and other minor fixes.

Change 3162491 on 2016/10/13 by Chris.Bunner
	Merging Dev-MaterialLayers to Dev-Rendering, CL 3162397: More fixed type-casting on material attributes. Swapped compiler::forcecast booleans to flags (and fixed a regression).

Change 3163266 on 2016/10/14 by Daniel.Wright
	Fixed sublevels with legacy lighting data being added to the dirty packages list redundantly

Change 3163524 on 2016/10/14 by Mark.Satterthwaite
	Bring over specific changes from Unicorn branch that increase the size of shader optional data so that it is considerably more useful.

Change 3163529 on 2016/10/14 by Mark.Satterthwaite
	Move the Metal shader source code and compilation path into the newly enlarged shader optional data.

Change 3163553 on 2016/10/14 by Mark.Satterthwaite
	Speculative fix for FORT-31590 also seen by a licensee - the Metal command buffer handler will be called from a dispatch queue thread that won't be registered with the stats system. #jira FORT-31590

Change 3163562 on 2016/10/14 by Mark.Satterthwaite
	Tidy up and extend the Metal debugging options:
	- Added rhi.Metal.BufferScribble which when enabled will fill freed buffer regions with 0xCD to help identify any areas where we are writing to a buffer while it is still being processed on the GPU.
	- Added rhi.Metal.BufferZeroFill which will zero-fill newly allocated buffer regions before any other data is read/written. Useful for catching cases where we might be reading uninitialised memory.
	- Added rhi.Metal.ResourcePurgeOnDelete which will purge the backing store of resources prior to releasing them back to the system or the respective pool. This will make any use-after-free conditions much more likely.
	- Added rhi.Metal.ResourceDeferDeleteNumFrames to defer releasing resources to the system or the resource pool by the specified number of frames (in addition to the current policy of waiting for the current end of frame & command-buffer completion). Useful for tracking down resource lifetime errors.
	- Fixed a number of bugs related to the modifications to vertex stream handling and addition of the SetShaderBytes API.
	- Track the start & end of FRingBuffer ranges - it appeared that the ring-buffer usage was invalid but it was in fact only my assumptions about the range that needed to be scribbled for rhi.Metal.BufferScribble. There is still the possibility that command-buffers that are implicitly parallelised by the driver may cause the ring-buffer range tracking to go awry - but with our data dependencies and the separation of the async. compute context I don't believe this is likely.
	- Fix up the "nometalv2" flag so that we can disable the features only available on iOS/tvOS-10/macOS-10.12 on newer devices to save having to reboot all the time.
	- Fixed the flickering geometry when enabling rhi.Metal.RuntimeDebugLevel=4 which breaks render passes into separate command-buffers - the occlusion query was waiting on the wrong command buffer in this case.

Change 3163752 on 2016/10/14 by Mark.Satterthwaite
	Add missing parenthesis to fix compile error on iOS.

Change 3164151 on 2016/10/16 by Benjamin.Hyder
	Submitting TM-AutoLOD level to QAGame #jira UE-29618

Change 3164190 on 2016/10/16 by Uriel.Doyon
	Materials now hold texture streaming data in the form of (UV scale X UV channel) for each texture. This data can be disabled through "r.Streaming.UseMaterialData".
	Defined a common framework in MeshComponent for texture streaming, used by both StaticMeshes and SkeletalMeshes.
	Simplified component interface for using the texture streaming build framework.
	Removed intermediate texture streaming build data from the static mesh components.
	Fixed shader compilation errors with the decals (from merge with main).

Change 3164636 on 2016/10/17 by Rolando.Caloca
	DR - vk - Fix validation spam

Change 3164679 on 2016/10/17 by Arne.Schober
	DR - [OR-28457] Part1, Scene View Refactoring - Removed Previous ViewMatrices from SceneInfo and pass in Previous and Current ViewMatrices into Uniform Buffer creation to uniform UseCase for Shadows and CustomDepth. Fixed a bug in Shadows with help of Daniel where the SceneView was unnecessarily copied again. Also simplified the code in that area.

Change 3164705 on 2016/10/17 by Daniel.Wright
	When new levels are loaded, only the Indirect Lighting Cache Allocations intersecting the level's light probes are updated to minimize hitches. This optimization requires a lighting build to compute PrecomputedLightVolume bounds.

Change 3164834 on 2016/10/17 by Daniel.Wright
	Support directional light dynamic shadows in any channel with forward shading, which can happen with multiple shadow casting stationary directional lights (even though only the lighting of one will appear)

Change 3164870 on 2016/10/17 by Arne.Schober
	DR - [OR-28457] Part2, Custom Depth Jitter - Allowed to overwrite the viewconstant buffer in the custom depth pass. There is also a new Project Setting available. The default constructor of the ContextDataType has been explicitly deleted to enforce compile errors when templated code like the StaticMeshDrawList accidentally tries to create a context without ViewUniformBuffer.

Change 3164949 on 2016/10/17 by Rolando.Caloca
	DR - vk - First version of pooled occlusion queries

Change 3165100 on 2016/10/17 by Rolando.Caloca
	DR - vk - Added driver version for Nvidia. AMD doesn't have one yet.

Change 3165160 on 2016/10/17 by Rolando.Caloca
	DR - vk - Fix for queries not ready

Change 3165230 on 2016/10/17 by Rolando.Caloca
	DR - vk - More fixes for occlusion queries

Change 3165839 on 2016/10/18 by Rolando.Caloca
	DR - hlslcc - Fix default parameters getting wrong values

Change 3166029 on 2016/10/18 by Rolando.Caloca
	DR - Switch some clears to DrawClearQuad()

Change 3166066 on 2016/10/18 by Mark.Satterthwaite
	Update ShaderVersion due to CL #3163524

Change 3166067 on 2016/10/18 by Mark.Satterthwaite
	Update Mac hlslcc for RCO's 3165839.

Change 3166370 on 2016/10/18 by Brian.Karis
	Improved hair AA

Change 3166389 on 2016/10/18 by Uriel.Doyon
	Fixed lightmap having bigger resolutions than the engine can handle #jira UE-34737 #review-3166193 @daniel.wright

Change 3166495 on 2016/10/18 by Rolando.Caloca
	DR - vk - Fix occlusion queries

Change 3166516 on 2016/10/18 by Arne.Schober
	DR - Fix shaderbuild issue

Change 3166650 on 2016/10/18 by Rolando.Caloca
	DR - vk - Enable GRHISupportsFirstInstance

Change 3166799 on 2016/10/18 by Arne.Schober
	DR - [OR-28508] - The velocity rendering pass was missing the adjustment for the PDO

Change 3167855 on 2016/10/19 by Rolando.Caloca
	DR - vk - Implemented texture streaming

Change 3168365 on 2016/10/19 by Rolando.Caloca
	DR - Fix static analysis

Change 3168405 on 2016/10/19 by Mark.Satterthwaite
	Fix the optional shader data changes from Unicorn to prevent FindOptionalData from erroneously testing against the trailing optional data size, which can match the tag for optional data entries if you are unlucky. #jira UE-37489

Change 3169467 on 2016/10/20 by Arne.Schober
	DR - UE-28039 - Fixed flickering cached shadows on dynamic objects: Adding preshadows whose depths are cached so that GatherDynamicMeshElements will still happen, which is necessary for preshadow receiver stenciling.

Change 3169478 on 2016/10/20 by Arne.Schober
	DR - UE-28039 - missing comment

Change 3169845 on 2016/10/20 by Arne.Schober
	DR - UE-35937 - re-add merged-out check

Change 3169859 on 2016/10/20 by Rolando.Caloca
	DR - vk - Stop popping up dialog on every run as the device name in the API doesn't match our driver database

[CL 3170066 by Marcus Wassmer in Main branch]
670 lines
21 KiB
C++
// Copyright 1998-2016 Epic Games, Inc. All Rights Reserved.

/*=============================================================================
	GPUBenchmark.cpp: GPUBenchmark to compute performance index to set video options automatically
=============================================================================*/

#include "RendererPrivate.h"
#include "ScenePrivate.h"
#include "SceneFilterRendering.h"
#include "GPUBenchmark.h"
#include "SceneUtils.h"
#include "GPUProfiler.h"

static const uint32 GBenchmarkResolution = 512;
static const uint32 GBenchmarkPrimitives = 200000;
static const uint32 GBenchmarkVertices = GBenchmarkPrimitives * 3;

DEFINE_LOG_CATEGORY_STATIC(LogSynthBenchmark, Log, All);

/** Encapsulates the GPU benchmark pixel shader; PsMethod selects the variation through the PS_METHOD define. */
template <uint32 PsMethod>
class FPostProcessBenchmarkPS : public FGlobalShader
{
	DECLARE_SHADER_TYPE(FPostProcessBenchmarkPS, Global);

	static bool ShouldCache(EShaderPlatform Platform)
	{
		return IsFeatureLevelSupported(Platform, ERHIFeatureLevel::SM4);
	}

	static void ModifyCompilationEnvironment(EShaderPlatform Platform, FShaderCompilerEnvironment& OutEnvironment)
	{
		FGlobalShader::ModifyCompilationEnvironment(Platform, OutEnvironment);
		OutEnvironment.SetDefine(TEXT("PS_METHOD"), PsMethod);
	}

	/** Default constructor. */
	FPostProcessBenchmarkPS() {}

public:
	FShaderResourceParameter InputTexture;
	FShaderResourceParameter InputTextureSampler;

	/** Initialization constructor. */
	FPostProcessBenchmarkPS(const ShaderMetaType::CompiledShaderInitializerType& Initializer)
		: FGlobalShader(Initializer)
	{
		InputTexture.Bind(Initializer.ParameterMap,TEXT("InputTexture"));
		InputTextureSampler.Bind(Initializer.ParameterMap,TEXT("InputTextureSampler"));
	}

	// FShader interface.
	virtual bool Serialize(FArchive& Ar) override
	{
		bool bShaderHasOutdatedParameters = FGlobalShader::Serialize(Ar);
		Ar << InputTexture << InputTextureSampler;
		return bShaderHasOutdatedParameters;
	}

	void SetParameters(FRHICommandList& RHICmdList, const FSceneView& View, TRefCountPtr<IPooledRenderTarget>& Src)
	{
		const FPixelShaderRHIParamRef ShaderRHI = GetPixelShader();

		FGlobalShader::SetParameters(RHICmdList, ShaderRHI, View);

		SetTextureParameter(RHICmdList, ShaderRHI, InputTexture, InputTextureSampler, TStaticSamplerState<>::GetRHI(), Src->GetRenderTargetItem().ShaderResourceTexture);
	}

	static const TCHAR* GetSourceFilename()
	{
		return TEXT("GPUBenchmark");
	}

	static const TCHAR* GetFunctionName()
	{
		return TEXT("MainPS");
	}
};

// #define avoids a lot of code duplication
#define VARIATION1(A) typedef FPostProcessBenchmarkPS<A> FPostProcessBenchmarkPS##A; \
	IMPLEMENT_SHADER_TYPE2(FPostProcessBenchmarkPS##A, SF_Pixel);

VARIATION1(0) VARIATION1(1) VARIATION1(2) VARIATION1(3) VARIATION1(4) VARIATION1(5)
#undef VARIATION1


/** Encapsulates the GPU benchmark vertex shader; VsMethod selects the variation through the VS_METHOD define. */
template <uint32 VsMethod>
class FPostProcessBenchmarkVS : public FGlobalShader
{
	DECLARE_SHADER_TYPE(FPostProcessBenchmarkVS,Global);
public:

	static bool ShouldCache(EShaderPlatform Platform)
	{
		return true;
	}

	static void ModifyCompilationEnvironment(EShaderPlatform Platform, FShaderCompilerEnvironment& OutEnvironment)
	{
		FGlobalShader::ModifyCompilationEnvironment(Platform, OutEnvironment);
		OutEnvironment.SetDefine(TEXT("VS_METHOD"), VsMethod);
	}

	/** Default constructor. */
	FPostProcessBenchmarkVS() {}

	/** Initialization constructor. */
	FPostProcessBenchmarkVS(const ShaderMetaType::CompiledShaderInitializerType& Initializer):
		FGlobalShader(Initializer)
	{
	}

	/** Serializer */
	virtual bool Serialize(FArchive& Ar) override
	{
		bool bShaderHasOutdatedParameters = FGlobalShader::Serialize(Ar);

		return bShaderHasOutdatedParameters;
	}

	void SetParameters(FRHICommandList& RHICmdList, const FSceneView& View)
	{
		const FVertexShaderRHIParamRef ShaderRHI = GetVertexShader();

		FGlobalShader::SetParameters(RHICmdList, ShaderRHI, View);
	}
};

typedef FPostProcessBenchmarkVS<0> FPostProcessBenchmarkVS0;
typedef FPostProcessBenchmarkVS<1> FPostProcessBenchmarkVS1;
typedef FPostProcessBenchmarkVS<2> FPostProcessBenchmarkVS2;

IMPLEMENT_SHADER_TYPE(template<>,FPostProcessBenchmarkVS0,TEXT("GPUBenchmark"),TEXT("MainBenchmarkVS"),SF_Vertex);
IMPLEMENT_SHADER_TYPE(template<>,FPostProcessBenchmarkVS1,TEXT("GPUBenchmark"),TEXT("MainBenchmarkVS"),SF_Vertex);
IMPLEMENT_SHADER_TYPE(template<>,FPostProcessBenchmarkVS2,TEXT("GPUBenchmark"),TEXT("MainBenchmarkVS"),SF_Vertex);

struct FBenchmarkVertex
{
	FVector4 Arg0;
	FVector4 Arg1;
	FVector4 Arg2;
	FVector4 Arg3;
	FVector4 Arg4;

	FBenchmarkVertex(uint32 VertexID)
		: Arg0(VertexID, 0.0f, 0.0f, 0.0f)
		, Arg1()
		, Arg2()
		, Arg3()
		, Arg4()
	{}
};

struct FVertexThroughputDeclaration : public FRenderResource
{
	FVertexDeclarationRHIRef DeclRHI;

	virtual void InitRHI() override
	{
		FVertexDeclarationElementList Elements =
		{
			{ 0, 0 * sizeof(FVector4), VET_Float4, 0, sizeof(FBenchmarkVertex) },
			{ 0, 1 * sizeof(FVector4), VET_Float4, 1, sizeof(FBenchmarkVertex) },
			{ 0, 2 * sizeof(FVector4), VET_Float4, 2, sizeof(FBenchmarkVertex) },
			{ 0, 3 * sizeof(FVector4), VET_Float4, 3, sizeof(FBenchmarkVertex) },
			{ 0, 4 * sizeof(FVector4), VET_Float4, 4, sizeof(FBenchmarkVertex) },
		};

		DeclRHI = RHICreateVertexDeclaration(Elements);
	}

	virtual void ReleaseRHI() override
	{
		DeclRHI = nullptr;
	}
};

TGlobalResource<FVertexThroughputDeclaration> GVertexThroughputDeclaration;

template <uint32 VsMethod, uint32 PsMethod>
void RunBenchmarkShader(FRHICommandList& RHICmdList, FVertexBufferRHIParamRef VertexThroughputBuffer, const FSceneView& View, TRefCountPtr<IPooledRenderTarget>& Src, float WorkScale)
{
	auto ShaderMap = GetGlobalShaderMap(View.GetFeatureLevel());

	TShaderMapRef<FPostProcessBenchmarkVS<VsMethod>> VertexShader(ShaderMap);
	TShaderMapRef<FPostProcessBenchmarkPS<PsMethod>> PixelShader(ShaderMap);

	bool bVertexTest = VsMethod != 0;
	FVertexDeclarationRHIParamRef VertexDeclaration = bVertexTest
		? GVertexThroughputDeclaration.DeclRHI
		: GFilterVertexDeclaration.VertexDeclarationRHI;

	static FGlobalBoundShaderState BoundShaderState;
	SetGlobalBoundShaderState(RHICmdList, View.GetFeatureLevel(), BoundShaderState, VertexDeclaration, *VertexShader, *PixelShader);

	PixelShader->SetParameters(RHICmdList, View, Src);
	VertexShader->SetParameters(RHICmdList, View);

	if (bVertexTest)
	{
		// Vertex Tests

		uint32 TotalNumPrimitives = FMath::CeilToInt(GBenchmarkPrimitives * WorkScale);
		uint32 TotalNumVertices = TotalNumPrimitives * 3;

		while (TotalNumVertices != 0)
		{
			uint32 VerticesThisPass = FMath::Min(TotalNumVertices, GBenchmarkVertices);
			uint32 PrimitivesThisPass = VerticesThisPass / 3;

			RHICmdList.SetStreamSource(0, VertexThroughputBuffer, VertexThroughputBuffer ? sizeof(FBenchmarkVertex) : 0, 0);

			RHICmdList.DrawPrimitive(PT_TriangleList, 0, PrimitivesThisPass, 1);

			TotalNumVertices -= VerticesThisPass;
		}
	}
	else
	{
		// Pixel Tests

		// single pass was not fine grained enough so we reduce the pass size based on the fractional part of WorkScale
		float TotalHeight = GBenchmarkResolution * WorkScale;

		// rounds up
		uint32 PassCount = (uint32)FMath::CeilToFloat(TotalHeight / GBenchmarkResolution);

		for (uint32 i = 0; i < PassCount; ++i)
		{
			float Top = i * GBenchmarkResolution;
			float Bottom = FMath::Min(Top + GBenchmarkResolution, TotalHeight);
			float LocalHeight = Bottom - Top;

			DrawRectangle(
				RHICmdList,
				0, 0,
				GBenchmarkResolution, LocalHeight,
				0, 0,
				GBenchmarkResolution, LocalHeight,
				FIntPoint(GBenchmarkResolution, GBenchmarkResolution),
				FIntPoint(GBenchmarkResolution, GBenchmarkResolution),
				*VertexShader,
				EDRF_Default);
		}
	}
}

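The pixel-test branch above splits the work into full-height passes plus one shorter pass for the fractional part of WorkScale. The following is a minimal standalone sketch of that arithmetic only, written against the standard library rather than the engine; `ComputePassHeights` is a hypothetical helper name that does not exist in the engine source.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the engine): returns the height of each
// rectangle pass for a given WorkScale, mirroring the pixel-test loop above.
std::vector<float> ComputePassHeights(std::uint32_t Resolution, float WorkScale)
{
	// Total number of pixel rows to draw across all passes.
	const float TotalHeight = Resolution * WorkScale;

	// Round up so the fractional remainder gets its own (shorter) pass.
	const std::uint32_t PassCount = static_cast<std::uint32_t>(std::ceil(TotalHeight / Resolution));

	std::vector<float> Heights;
	Heights.reserve(PassCount);
	for (std::uint32_t i = 0; i < PassCount; ++i)
	{
		const float Top = static_cast<float>(i * Resolution);
		const float Bottom = std::min(Top + Resolution, TotalHeight);
		Heights.push_back(Bottom - Top);
	}
	return Heights;
}
```

Under these assumptions, a resolution of 512 with a WorkScale of 2.5 yields two passes of 512 rows followed by one pass of 256 rows.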
void RunBenchmarkShader(FRHICommandListImmediate& RHICmdList, FVertexBufferRHIParamRef VertexThroughputBuffer, const FSceneView& View, uint32 MethodId, TRefCountPtr<IPooledRenderTarget>& Src, float WorkScale)
{
	SCOPED_DRAW_EVENTF(RHICmdList, Benchmark, TEXT("Benchmark Method:%d"), MethodId);

	switch(MethodId)
	{
		case 0: RunBenchmarkShader<0, 0>(RHICmdList, VertexThroughputBuffer, View, Src, WorkScale); return;
		case 1: RunBenchmarkShader<0, 1>(RHICmdList, VertexThroughputBuffer, View, Src, WorkScale); return;
		case 2: RunBenchmarkShader<0, 2>(RHICmdList, VertexThroughputBuffer, View, Src, WorkScale); return;
		case 3: RunBenchmarkShader<0, 3>(RHICmdList, VertexThroughputBuffer, View, Src, WorkScale); return;
		case 4: RunBenchmarkShader<0, 4>(RHICmdList, VertexThroughputBuffer, View, Src, WorkScale); return;
		case 5: RunBenchmarkShader<1, 5>(RHICmdList, VertexThroughputBuffer, View, Src, WorkScale); return;
		case 6: RunBenchmarkShader<2, 5>(RHICmdList, nullptr, View, Src, WorkScale); return;
		default:
			check(0);
	}
}

// Stores many benchmark timings in an array so that a good value can be extracted
// after dropping outliers and the unreliable early samples.
class FTimingSeries
{
public:
	// @param ArraySize number of timing samples to allocate, must be > 0
	void Init(uint32 ArraySize)
	{
		check(ArraySize > 0);

		TimingValues.AddZeroed(ArraySize);
	}

	// @param Index must be smaller than the ArraySize passed to Init()
	// @param TimingValue sample to store
	void SetEntry(uint32 Index, float TimingValue)
	{
		check(Index < (uint32)TimingValues.Num());

		TimingValues[Index] = TimingValue;
	}

	// @param Index must be smaller than the ArraySize passed to Init()
	// @return the sample stored at Index
	float GetEntry(uint32 Index) const
	{
		check(Index < (uint32)TimingValues.Num());

		return TimingValues[Index];
	}

	// @param OutConfidence 0..100, percentage of the sorted samples that contributed to the result
	// @return TimingValue, smaller is better
	float ComputeValue(float& OutConfidence) const
	{
		float Ret = 0.0f;

		TArray<float> SortedValues;
		{
			// a lot of values in the beginning are wrong, so we cut off the first third of the samples
			uint32 StartIndex = TimingValues.Num() / 3;

			for(uint32 Index = StartIndex; Index < (uint32)TimingValues.Num(); ++Index)
			{
				SortedValues.Add(TimingValues[Index]);
			}
			SortedValues.Sort();
		}

		OutConfidence = 0.0f;

		uint32 Passes = 10;

		// slow but simple: widen the range around the median step by step
		for(uint32 Pass = 0; Pass < Passes; ++Pass)
		{
			// 0..1, 0 not included
			float Alpha = (Pass + 1) / (float)Passes;

			int32 MidIndex = SortedValues.Num() / 2;
			int32 FromIndex = (int32)FMath::Lerp(MidIndex, 0, Alpha);
			int32 ToIndex = (int32)FMath::Lerp(MidIndex, SortedValues.Num(), Alpha);

			float Delta = 0.0f;
			float Confidence = 0.0f;

			float TimingValue = ComputeTimingFromSortedRange(FromIndex, ToIndex, SortedValues, Delta, Confidence);

			// stop widening once the +/- delta exceeds 50% of the timing value
			if(Pass > 0 && Delta > TimingValue * 0.5f)
			{
				// it gets worse, take the best one we had so far
				break;
			}

			OutConfidence = Confidence;
			Ret = TimingValue;
		}

		return Ret;
	}

private:
	// @param FromIndex within SortedValues
	// @param ToIndex within SortedValues
	// @param OutDelta +/-
	// @param OutConfidence 0..100 0=not at all, 100=fully, meaning how many samples are considered useful
	// @return TimingValue, smaller is better
	static float ComputeTimingFromSortedRange(int32 FromIndex, int32 ToIndex, const TArray<float>& SortedValues, float& OutDelta, float& OutConfidence)
	{
		float ValidSum = 0;
		uint32 ValidCount = 0;
		float Min = FLT_MAX;
		float Max = -FLT_MAX;
		{
			for(int32 Index = FromIndex; Index < ToIndex; ++Index)
			{
				float Value = SortedValues[Index];

				Min = FMath::Min(Min, Value);
				Max = FMath::Max(Max, Value);

				ValidSum += Value;
				++ValidCount;
			}
		}

		if(ValidCount)
		{
			OutDelta = (Max - Min) * 0.5f;

			OutConfidence = 100.0f * ValidCount / (float)SortedValues.Num();

			return ValidSum / ValidCount;
		}
		else
		{
			OutDelta = 0.0f;
			OutConfidence = 0.0f;
			return 0.0f;
		}
	}

	TArray<float> TimingValues;
};

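The outlier rejection in FTimingSeries can be hard to follow from the index arithmetic alone. Below is a rough standalone sketch of the same idea using `std::vector` instead of `TArray`: drop the first third of the samples, sort the rest, then widen a window around the median until the spread becomes too large. `FTimingEstimate` and `EstimateTiming` are hypothetical names introduced for illustration; they are not engine types.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <vector>

// Hypothetical result type (not part of the engine).
struct FTimingEstimate
{
	float Value;      // mean of the accepted samples, smaller is better
	float Confidence; // 0..100, percentage of the sorted samples that were accepted
};

// Hypothetical standalone version of the widening-window estimate.
FTimingEstimate EstimateTiming(const std::vector<float>& TimingValues)
{
	// Drop the first third (unreliable early samples) and sort the remainder.
	std::vector<float> Sorted(TimingValues.begin() + TimingValues.size() / 3, TimingValues.end());
	std::sort(Sorted.begin(), Sorted.end());

	FTimingEstimate Result{0.0f, 0.0f};
	const int Count = static_cast<int>(Sorted.size());
	const int Mid = Count / 2;
	const int Passes = 10;

	for (int Pass = 0; Pass < Passes; ++Pass)
	{
		// Window grows from the median outwards as Alpha goes from 0.1 to 1.0.
		const float Alpha = (Pass + 1) / static_cast<float>(Passes);
		const int From = static_cast<int>(Mid + Alpha * (0 - Mid));
		const int To = static_cast<int>(Mid + Alpha * (Count - Mid));

		float Sum = 0.0f, Min = FLT_MAX, Max = -FLT_MAX;
		for (int Index = From; Index < To; ++Index)
		{
			Min = std::min(Min, Sorted[Index]);
			Max = std::max(Max, Sorted[Index]);
			Sum += Sorted[Index];
		}
		const int Valid = To - From;
		const float Mean = Valid > 0 ? Sum / Valid : 0.0f;
		const float Delta = Valid > 0 ? (Max - Min) * 0.5f : 0.0f;

		// Spread exceeds 50% of the mean: keep the previous, tighter estimate.
		if (Pass > 0 && Delta > Mean * 0.5f)
		{
			break;
		}
		Result.Value = Mean;
		Result.Confidence = Valid > 0 ? 100.0f * Valid / Count : 0.0f;
	}
	return Result;
}
```

With the samples {99, 99, 1, 1, 1, 10}, the first two are dropped as early samples, the window grows over the three 1.0 values, and the final widening that would include 10 is rejected, so the sketch reports a value of 1.0 with 75% confidence.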
void RendererGPUBenchmark(FRHICommandListImmediate& RHICmdList, FSynthBenchmarkResults& InOut, const FSceneView& View, float WorkScale, bool bDebugOut)
{
	check(IsInRenderingThread());

	FRenderQueryPool TimerQueryPool(RQT_AbsoluteTime);

	bool bValidGPUTimer = (FGPUTiming::GetTimingFrequency() / (1000 * 1000)) != 0;

	if(!bValidGPUTimer)
	{
		UE_LOG(LogSynthBenchmark, Warning, TEXT("RendererGPUBenchmark failed, look for \"GPU Timing Frequency\" in the log"));
		return;
	}

	TResourceArray<FBenchmarkVertex> Vertices;
	Vertices.Reserve(GBenchmarkVertices);
	for (uint32 Index = 0; Index < GBenchmarkVertices; ++Index)
	{
		Vertices.Emplace(Index);
	}

	FRHIResourceCreateInfo CreateInfo(&Vertices);
	FVertexBufferRHIRef VertexBuffer = RHICreateVertexBuffer(GBenchmarkVertices * sizeof(FBenchmarkVertex), BUF_Static, CreateInfo);

	// two RTs to ping-pong between so we force the GPU to flush its pipeline, plus a 1x1 CPU readback target
	TRefCountPtr<IPooledRenderTarget> RTItems[3];
	{
		FPooledRenderTargetDesc Desc(FPooledRenderTargetDesc::Create2DDesc(FIntPoint(GBenchmarkResolution, GBenchmarkResolution), PF_B8G8R8A8, FClearValueBinding::None, TexCreate_None, TexCreate_RenderTargetable | TexCreate_ShaderResource, false));
		Desc.AutoWritable = false;

		GRenderTargetPool.FindFreeElement(RHICmdList, Desc, RTItems[0], TEXT("Benchmark0"));
		GRenderTargetPool.FindFreeElement(RHICmdList, Desc, RTItems[1], TEXT("Benchmark1"));

		Desc.Extent = FIntPoint(1, 1);
		Desc.Flags = TexCreate_CPUReadback;	// needs TexCreate_ResolveTargetable?
		Desc.TargetableFlags = TexCreate_None;

		GRenderTargetPool.FindFreeElement(RHICmdList, Desc, RTItems[2], TEXT("BenchmarkReadback"));
	}

	// set the state
	RHICmdList.SetBlendState(TStaticBlendState<>::GetRHI());
	RHICmdList.SetRasterizerState(TStaticRasterizerState<>::GetRHI());
	RHICmdList.SetDepthStencilState(TStaticDepthStencilState<false,CF_Always>::GetRHI());

	{
		// larger number means more accuracy but slower, some slower GPUs might time out with a number too large
		const uint32 IterationCount = 70;
		const uint32 MethodCount = ARRAY_COUNT(InOut.GPUStats);

		enum class EMethodType
		{
			Vertex,
			Pixel
		};

		struct FBenchmarkMethod
		{
			const TCHAR* Desc;
			float IndexNormalizedTime;
			const TCHAR* ValueType;
			float Weight;
			EMethodType Type;
		};

		const FBenchmarkMethod Methods[] =
		{
			// e.g. on NV670: Method3 (mostly fill rate) -> 26GP/s (seems realistic)
			// reference: http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units theoretical: 29.3G/s
			{ TEXT("ALUHeavyNoise"),   1.0f / 4.601f,  TEXT("s/GigaPix"),  1.0f, EMethodType::Pixel },
			{ TEXT("TexHeavy"),        1.0f / 7.447f,  TEXT("s/GigaPix"),  0.1f, EMethodType::Pixel },
			{ TEXT("DepTexHeavy"),     1.0f / 3.847f,  TEXT("s/GigaPix"),  0.1f, EMethodType::Pixel },
			{ TEXT("FillOnly"),        1.0f / 25.463f, TEXT("s/GigaPix"),  3.0f, EMethodType::Pixel },
			{ TEXT("Bandwidth"),       1.0f / 1.072f,  TEXT("s/GigaPix"),  1.0f, EMethodType::Pixel },
			{ TEXT("VertThroughPut1"), 1.0f / 1.537f,  TEXT("s/GigaVert"), 0.0f, EMethodType::Vertex }, // TODO: Set weights
			{ TEXT("VertThroughPut2"), 1.0f / 1.767f,  TEXT("s/GigaVert"), 0.0f, EMethodType::Vertex }, // TODO: Set weights
		};

		static_assert(ARRAY_COUNT(Methods) == ARRAY_COUNT(InOut.GPUStats), "Benchmark methods descriptor array lengths should match.");

		// Initialize the GPU benchmark stats
		for (int32 Index = 0; Index < ARRAY_COUNT(Methods); ++Index)
		{
			auto& Method = Methods[Index];
			InOut.GPUStats[Index] = FSynthBenchmarkStat(Method.Desc, Method.IndexNormalizedTime, Method.ValueType, Method.Weight);
		}

		// 0 / 1
		uint32 DestRTIndex = 0;

		const uint32 TimerSampleCount = IterationCount * MethodCount + 1;

		static FRenderQueryRHIRef TimerQueries[TimerSampleCount];
		static float LocalWorkScale[IterationCount];

		for(uint32 i = 0; i < TimerSampleCount; ++i)
		{
			TimerQueries[i] = TimerQueryPool.AllocateQuery();
		}

		const bool bSupportsTimerQueries = (TimerQueries[0] != NULL);
		if(!bSupportsTimerQueries)
		{
			UE_LOG(LogSynthBenchmark, Warning, TEXT("GPU driver does not support timer queries."));

			// Temporary workaround for GL_TIMESTAMP being unavailable and GL_TIME_ELAPSED workaround breaking drivers
#if PLATFORM_MAC
			GLint RendererID = 0;
			float PerfScale = 1.0f;
			[[NSOpenGLContext currentContext] getValues:&RendererID forParameter:NSOpenGLCPCurrentRendererID];
			{
				switch((RendererID & kCGLRendererIDMatchingMask))
				{
					case kCGLRendererATIRadeonX4000ID: // AMD 7xx0 & Dx00 series - should be pretty beefy
						PerfScale = 1.2f;
						break;
					case kCGLRendererATIRadeonX3000ID: // AMD 5xx0, 6xx0 series - mostly OK
					case kCGLRendererGeForceID: // Nvidia 6x0 & 7x0 series - mostly OK
						PerfScale = 2.0f;
						break;
					case kCGLRendererIntelHD5000ID: // Intel HD 5000, Iris, Iris Pro - not dreadful
						PerfScale = 4.2f;
						break;
					case kCGLRendererIntelHD4000ID: // Intel HD 4000 - quite slow
						PerfScale = 7.5f;
						break;
					case kCGLRendererATIRadeonX2000ID: // ATi 4xx0, 3xx0, 2xx0 - almost all very slow and drivers are now very buggy
					case kCGLRendererGeForce8xxxID: // Nvidia 3x0, 2x0, 1x0, 9xx0, 8xx0 - almost all very slow
					case kCGLRendererIntelHDID: // Intel HD 3000 - very, very slow and very buggy driver
					default:
						PerfScale = 10.0f;
						break;
				}
			}

			for (int32 Index = 0; Index < MethodCount; ++Index)
			{
				FSynthBenchmarkStat& Stat = InOut.GPUStats[Index];
				Stat.SetMeasuredTime(FTimeSample(PerfScale, PerfScale * Methods[Index].IndexNormalizedTime));
			}
#endif
			return;
		}

		// TimingValues are in seconds
		FTimingSeries TimingSeries[MethodCount];
		// in microseconds (1/1000000 seconds)
		uint64 TotalTimes[MethodCount];

		for(uint32 MethodIterator = 0; MethodIterator < MethodCount; ++MethodIterator)
		{
			TotalTimes[MethodIterator] = 0;
			TimingSeries[MethodIterator].Init(IterationCount);
		}

		RHICmdList.EndRenderQuery(TimerQueries[0]);

		// multiple iterations to see how trustworthy the values are
		for(uint32 Iteration = 0; Iteration < IterationCount; ++Iteration)
		{
			for(uint32 MethodIterator = 0; MethodIterator < MethodCount; ++MethodIterator)
			{
				// alternate between forward and backward (should give the same number)
				//			uint32 MethodId = (Iteration % 2) ? MethodIterator : (MethodCount - 1 - MethodIterator);
				uint32 MethodId = MethodIterator;

				uint32 QueryIndex = 1 + Iteration * MethodCount + MethodId;

				// 0 / 1
				const uint32 SrcRTIndex = 1 - DestRTIndex;

				GRenderTargetPool.VisualizeTexture.SetCheckPoint(RHICmdList, RTItems[DestRTIndex]);

				SetRenderTarget(RHICmdList, RTItems[DestRTIndex]->GetRenderTargetItem().TargetableTexture, FTextureRHIRef(), true);

				// decide how much work we do in this pass
				LocalWorkScale[Iteration] = (Iteration / 10.f + 1.f) * WorkScale;

				RunBenchmarkShader(RHICmdList, VertexBuffer, View, MethodId, RTItems[SrcRTIndex], LocalWorkScale[Iteration]);

				RHICmdList.CopyToResolveTarget(RTItems[DestRTIndex]->GetRenderTargetItem().TargetableTexture, RTItems[DestRTIndex]->GetRenderTargetItem().ShaderResourceTexture, false, FResolveParams());

				/*if(bGPUCPUSync)
				{
					// more consistent timing but strangely much faster to the level that is unrealistic

					FResolveParams Param;

					Param.Rect = FResolveRect(0, 0, 1, 1);
					RHICmdList.CopyToResolveTarget(
						RTItems[DestRTIndex]->GetRenderTargetItem().TargetableTexture,
						RTItems[2]->GetRenderTargetItem().ShaderResourceTexture,
						false,
						Param);

					void* Data = 0;
					int Width = 0;
					int Height = 0;

					RHIMapStagingSurface(RTItems[2]->GetRenderTargetItem().ShaderResourceTexture, Data, Width, Height);
					RHIUnmapStagingSurface(RTItems[2]->GetRenderTargetItem().ShaderResourceTexture);
				}*/

				RHICmdList.EndRenderQuery(TimerQueries[QueryIndex]);

				// ping pong
				DestRTIndex = 1 - DestRTIndex;
			}
		}

		{
			uint64 OldAbsTime = 0;
			// flushes the RHI thread to make sure all RHICmdList.EndRenderQuery() commands got executed.
			RHICmdList.ImmediateFlush(EImmediateFlushType::FlushRHIThread);
			RHICmdList.GetRenderQueryResult(TimerQueries[0], OldAbsTime, true);
			TimerQueryPool.ReleaseQuery(TimerQueries[0]);

			for(uint32 Iteration = 0; Iteration < IterationCount; ++Iteration)
			{
				// per-method elapsed time of this iteration, in microseconds
				uint32 Results[MethodCount];

				for(uint32 MethodId = 0; MethodId < MethodCount; ++MethodId)
				{
					uint32 QueryIndex = 1 + Iteration * MethodCount + MethodId;

					uint64 AbsTime;
					RHICmdList.GetRenderQueryResult(TimerQueries[QueryIndex], AbsTime, true);
					TimerQueryPool.ReleaseQuery(TimerQueries[QueryIndex]);

					uint64 RelTime = FMath::Max(AbsTime - OldAbsTime, 1ull);

					TotalTimes[MethodId] += RelTime;
					Results[MethodId] = RelTime;

					OldAbsTime = AbsTime;
				}

				for(uint32 MethodId = 0; MethodId < MethodCount; ++MethodId)
				{
					float TimeInSec = Results[MethodId] / 1000000.0f;

					if (Methods[MethodId].Type == EMethodType::Vertex)
					{
						// to normalize from seconds to seconds per GVert
						float SamplesInGVert = LocalWorkScale[Iteration] * GBenchmarkVertices / 1000000000.0f;
						TimingSeries[MethodId].SetEntry(Iteration, TimeInSec / SamplesInGVert);
					}
					else
					{
						check(Methods[MethodId].Type == EMethodType::Pixel);

						// to normalize from seconds to seconds per GPixel
						float SamplesInGPix = LocalWorkScale[Iteration] * GBenchmarkResolution * GBenchmarkResolution / 1000000000.0f;

						// TimingValue in Seconds per GPixel
						TimingSeries[MethodId].SetEntry(Iteration, TimeInSec / SamplesInGPix);
					}
				}
			}

			if(bSupportsTimerQueries)
			{
				for(uint32 MethodId = 0; MethodId < MethodCount; ++MethodId)
				{
					float Confidence = 0.0f;
					// in seconds per GPixel (pixel methods) or seconds per GVert (vertex methods)
					float NormalizedTime = TimingSeries[MethodId].ComputeValue(Confidence);

					if(Confidence > 0)
					{
						FTimeSample TimeSample(TotalTimes[MethodId] / 1000000.0f, NormalizedTime);

						InOut.GPUStats[MethodId].SetMeasuredTime(TimeSample, Confidence);
					}
				}
			}
		}
	}
}
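The readback code above turns each microsecond timer delta into seconds per GPixel so that samples taken at different work scales are comparable. A small standalone sketch of that conversion follows; `SecondsPerGigaPixel` is a hypothetical function name used only for illustration, and the parameter names are assumptions rather than engine API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of the engine): converts an elapsed GPU time
// in microseconds into seconds per GPixel for one pixel-test sample.
float SecondsPerGigaPixel(std::uint64_t ElapsedMicroseconds, float WorkScale, std::uint32_t Resolution)
{
	// Timer deltas are in microseconds.
	const float TimeInSec = ElapsedMicroseconds / 1000000.0f;

	// Pixels shaded in this sample, expressed in GPixel.
	const float SamplesInGPix = WorkScale * Resolution * Resolution / 1000000000.0f;

	return TimeInSec / SamplesInGPix;
}
```

As a sanity check on the units, a 512 x 512 target at WorkScale 1 covers 262144 pixels, so a sample that takes 262144 microseconds corresponds to roughly 1000 seconds per GPixel.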