Files
UnrealEngineUWP/Engine/Source/Runtime/RenderCore/Public/RenderGraphEvent.h

566 lines
20 KiB
C
Raw Normal View History

// Copyright Epic Games, Inc. All Rights Reserved.
#pragma once
#include "Containers/Array.h"
#include "Containers/StaticArray.h"
#include "Containers/UnrealString.h"
#include "DynamicRenderScaling.h"
#include "HAL/Platform.h"
#include "HAL/PlatformCrt.h"
#include "Misc/AssertionMacros.h"
#include "MultiGPU.h"
#include "ProfilingDebugging/CsvProfiler.h"
#include "ProfilingDebugging/CsvProfilerConfig.h"
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#include "ProfilingDebugging/CpuProfilerTrace.h"
#include "ProfilingDebugging/RealtimeGPUProfiler.h"
#include "RHI.h"
#include "RHICommandList.h"
#include "RenderGraphAllocator.h"
#include "RenderGraphDefinitions.h"
#include "RendererInterface.h"
#include "Stats/Stats2.h"
#include "Templates/UnrealTemplate.h"
#include "UObject/NameTypes.h"
//////////////////////////////////////////////////////////////////////////
//
// GPU Events - Named hierarchical events emitted to external profiling tools.
//
//////////////////////////////////////////////////////////////////////////
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
class FRDGScopeState;
/** Stores a GPU event name for the render graph. Draw events can be compiled out entirely from
* a release build for performance.
*/
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
class RENDERCORE_API FRDGEventName final
{
public:
FRDGEventName() = default;
// Constructors require a string that matches the RDG builder lifetime, as copies are not made in all configurations.
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
explicit FRDGEventName(const TCHAR* EventFormat, ...);
FRDGEventName(int32 NonVariadic, const TCHAR* EventName);
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
FRDGEventName(const FRDGEventName& Other) = default;
FRDGEventName& operator=(const FRDGEventName& Other) = default;
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
const TCHAR* GetTCHAR() const;
#if WITH_RHI_BREADCRUMBS
FRHIBreadcrumbNode* AllocBreadcrumb(FRHIBreadcrumbData&& Data, FRHIBreadcrumbAllocator& Allocator) const;
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#endif // WITH_RHI_BREADCRUMBS
private:
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#if RDG_EVENTS >= RDG_EVENTS_STRING_REF
// Event format kept around to still have a clue what error might be causing the problem in error messages.
const TCHAR* EventFormat = TEXT("");
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#endif
#if RDG_EVENTS == RDG_EVENTS_STRING_COPY
FString FormattedEventName;
#endif
};
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
enum class ERDGScopeFlags : uint8
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
None = 0,
// Disables any nested scopes of the same type.
Final = 1 << 0,
// Ensures the scope is always emitted (ignores cvars that disable scopes)
AlwaysEnable = 1 << 1,
// The scope includes a GPU stat, so may need to be enabled even when cvars are disabling scopes.
Stat = 1 << 2,
};
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
ENUM_CLASS_FLAGS(ERDGScopeFlags);
#if HAS_GPU_STATS && (RHI_NEW_GPU_PROFILER == 0)
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
// Scope type for the legacy "realtime" GPU profiler and draw call counter stats
struct FRDGScope_GPU
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
FRealtimeGPUProfilerQuery StartQuery;
FRealtimeGPUProfilerQuery StopQuery;
FName StatName;
TStatId StatId;
FString StatDescription;
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
TOptional<FRHIDrawStatsCategory const*> PreviousCategory {};
FRHIDrawStatsCategory const* CurrentCategory = nullptr;
bool bEmitDuringExecute;
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
inline FRDGScope_GPU(FRDGScopeState& State, FRHIGPUMask GPUMask, const FName& CsvStatName, const TStatId& Stat, const TCHAR* Description, FRHIDrawStatsCategory const& Category);
inline ~FRDGScope_GPU();
inline void ImmediateEnd(FRDGScopeState& State);
inline void BeginCPU(FRHIComputeCommandList& RHICmdList, bool bPreScope);
inline void EndCPU (FRHIComputeCommandList& RHICmdList, bool bPreScope);
inline void BeginGPU(FRHIComputeCommandList& RHICmdList);
inline void EndGPU (FRHIComputeCommandList& RHICmdList);
};
#endif // HAS_GPU_STATS && (RHI_NEW_GPU_PROFILER == 0)
#if CSV_PROFILER_STATS
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
struct FRDGScope_CSVExclusive
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
const char* const StatName;
FRDGScope_CSVExclusive(FRDGScopeState&, const char* StatName)
: StatName(StatName)
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
FCsvProfiler::BeginExclusiveStat(StatName);
}
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
void ImmediateEnd(FRDGScopeState&)
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
FCsvProfiler::EndExclusiveStat(StatName);
}
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
void BeginCPU(FRHIComputeCommandList& RHICmdList, bool bPreScope)
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
FCsvProfiler::BeginExclusiveStat(StatName);
}
void EndCPU(FRHIComputeCommandList& RHICmdList, bool bPreScope)
{
FCsvProfiler::EndExclusiveStat(StatName);
}
void BeginGPU(FRHIComputeCommandList& RHICmdList)
{
}
void EndGPU(FRHIComputeCommandList& RHICmdList)
{
}
};
#endif // CSV_PROFILER_STATS
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
struct FRDGScope_Budget
{
class FRDGTimingFrame* Frame = nullptr;
int32 ScopeId;
bool bPop;
RENDERCORE_API FRDGScope_Budget(FRDGScopeState& State, DynamicRenderScaling::FBudget const& Budget);
RENDERCORE_API void ImmediateEnd(FRDGScopeState& State);
void BeginCPU(FRHIComputeCommandList& RHICmdList, bool bPreScope) { }
void EndCPU (FRHIComputeCommandList& RHICmdList, bool bPreScope) { }
RENDERCORE_API void BeginGPU(FRHIComputeCommandList& RHICmdList);
RENDERCORE_API void EndGPU (FRHIComputeCommandList& RHICmdList);
};
#if RDG_EVENTS
// Scope type for inserting named events on the CPU and GPU timelines.
struct FRDGScope_RHI
{
FRDGEventName Name;
#if WITH_RHI_BREADCRUMBS
FRHIBreadcrumbNode* Node = nullptr;
#endif
inline FRDGScope_RHI(FRDGScopeState& State, FRHIBreadcrumbData&& Data, FRDGEventName&& Name);
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
inline void ImmediateEnd(FRDGScopeState& State);
#if WITH_RHI_BREADCRUMBS
void BeginCPU(FRHIComputeCommandList& RHICmdList, bool bPreScope)
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
if (Node)
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
RHICmdList.BeginBreadcrumbCPU(Node, !bPreScope);
if (!bPreScope)
{
Remove use of SwitchPipeline at the bottom-of-pipe (i.e. during RHI command list execution). - This was mainly used for bindless descriptor updates, where updates need to be applied to all GPU pipelines. - Switching pipeline can change the current active breadcrumb on the new pipeline. Doing this at the bottom-of-pipe is not possible as the start/end breadcrumbs for each command list must be known at dispatch time (before execution / translation). - Added EnqueueLambdaMultiPipe which passes an array of RHI contexts to the lambda. This generally replaces the FRHICommandListBase& which is handed down through the platform RHI. - EnqueueLambdaMultiPipe may only be called at the top-of-pipe. Replace RHITransfer[...]UnderlyingResource with RHIReplaceResources on FDynamicRHI / platform implementations - Old function was always bottom-of-pipe, so couldn't call EnqueueLambdaMultiPipe. New function takes the RHICmdList and is called at top-of-pipe. - All resource types are merged into the same function (currently buffers and raytracing geometry). Remove use of RHILockBuffer and RHIUnlockBuffer at the bottom-of-pipe - Since platform RHIs need to use EnqueueLambdaMultiPipe for buffer locks/unlocks, it is no longer possible to call RHILock/UnlockBuffer at the bottom-of-pipe. - Also, buffers locked on parallel translating command lists are broken. Lock/unlock calls RHIThreadFence(true), which flags the command list for single-thread translate, however calling this at the bottom-of-pipe is too late, as the decision to dispatch the command list in parallel has already been made. - Added checks in these functions to catch future use. #rb zach.bethel #jira UE-208823 [FYI] christopher.waters [CL 32220227 by luke thatcher in ue5-main branch]
2024-03-13 13:49:59 -04:00
RHICmdList.BeginBreadcrumbGPU(Node, RHICmdList.GetPipeline());
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
}
}
}
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
void EndCPU(FRHIComputeCommandList& RHICmdList, bool bPreScope)
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
if (Node)
{
if (!bPreScope)
{
Remove use of SwitchPipeline at the bottom-of-pipe (i.e. during RHI command list execution). - This was mainly used for bindless descriptor updates, where updates need to be applied to all GPU pipelines. - Switching pipeline can change the current active breadcrumb on the new pipeline. Doing this at the bottom-of-pipe is not possible as the start/end breadcrumbs for each command list must be known at dispatch time (before execution / translation). - Added EnqueueLambdaMultiPipe which passes an array of RHI contexts to the lambda. This generally replaces the FRHICommandListBase& which is handed down through the platform RHI. - EnqueueLambdaMultiPipe may only be called at the top-of-pipe. Replace RHITransfer[...]UnderlyingResource with RHIReplaceResources on FDynamicRHI / platform implementations - Old function was always bottom-of-pipe, so couldn't call EnqueueLambdaMultiPipe. New function takes the RHICmdList and is called at top-of-pipe. - All resource types are merged into the same function (currently buffers and raytracing geometry). Remove use of RHILockBuffer and RHIUnlockBuffer at the bottom-of-pipe - Since platform RHIs need to use EnqueueLambdaMultiPipe for buffer locks/unlocks, it is no longer possible to call RHILock/UnlockBuffer at the bottom-of-pipe. - Also, buffers locked on parallel translating command lists are broken. Lock/unlock calls RHIThreadFence(true), which flags the command list for single-thread translate, however calling this at the bottom-of-pipe is too late, as the decision to dispatch the command list in parallel has already been made. - Added checks in these functions to catch future use. #rb zach.bethel #jira UE-208823 [FYI] christopher.waters [CL 32220227 by luke thatcher in ue5-main branch]
2024-03-13 13:49:59 -04:00
RHICmdList.EndBreadcrumbGPU(Node, RHICmdList.GetPipeline());
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
}
RHICmdList.EndBreadcrumbCPU(Node, !bPreScope);
}
}
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#else
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
void BeginCPU(FRHIComputeCommandList& RHICmdList, bool bPreScope) {}
void EndCPU (FRHIComputeCommandList& RHICmdList, bool bPreScope) {}
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#endif
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
// Nothing to do for Begin/EndGPU. The RHI API only requires breadcrumbs to be
// begun/ended once, and will automatically fixup other pipelines whenever we switch.
void BeginGPU(FRHIComputeCommandList& RHICmdList) {}
void EndGPU (FRHIComputeCommandList& RHICmdList) {}
};
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#endif // RDG_EVENTS
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
//
// Main RDG scope class.
//
// A tree of these scopes is created by the render thread as the RenderGraph is built.
// Each scope type implementation uses the following functions, which are called during different RDG phases:
//
// Constructor / ImmediateEnd() - Render thread timeline. Called once, either side of scoped graph building work.
//
// BeginCPU / EndCPU - Parallel threads. Called during RDG pass lambdas execution. Scopes may be
// entered / exited multiple times depending on parallel pass set bucketing.
//
// BeginGPU / EndGPU - Parallel threads. Called once for each GPU pipeline the scope covers.
// Used for inserting commands on the RHICmdList. The command list passed to
// Begin / End may be different in each, depending on parallel pass set bucketing.
//
struct FRDGScope
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
FRDGScope* const Parent;
FRDGPass* CPUFirstPass = nullptr;
FRDGPass* CPULastPass = nullptr;
TRHIPipelineArray<FRDGPass*> GPUFirstPass { InPlace, nullptr };
TRHIPipelineArray<FRDGPass*> GPULastPass { InPlace, nullptr };
template <typename... TTypes>
class TStorage
{
typedef TVariant<FEmptyVariantState, TTypes...> TImpl;
TImpl Impl;
public:
template <typename TCallback>
void Dispatch(TCallback&& Callback)
{
size_t Index = Impl.GetIndex();
check(Index > 0);
((Index == Impl.template IndexOfType<TTypes>() ? Callback(Impl.template Get<TTypes>()),0 : 0), ...);
}
template <typename TScopeType, typename... TArgs>
void Emplace(TArgs&&... Args)
{
Impl.template Emplace<TScopeType>(Forward<TArgs>(Args)...);
}
template <typename TScopeType>
static constexpr SIZE_T GetTypeIndex()
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
{
return TImpl::template IndexOfType<TScopeType>();
}
template <typename TScopeType>
TScopeType* Get()
{
return Impl.GetIndex() == Impl.template IndexOfType<TScopeType>()
? &Impl.template Get<TScopeType>()
: nullptr;
}
template <typename TScopeType>
TScopeType const* Get() const
{
return const_cast<TStorage&>(*this).Get<TScopeType>();
}
};
typedef TStorage<
FRDGScope_Budget
#if RDG_EVENTS
, FRDGScope_RHI
#endif
#if HAS_GPU_STATS && (RHI_NEW_GPU_PROFILER == 0)
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
, FRDGScope_GPU
#endif
#if CSV_PROFILER_STATS
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
, FRDGScope_CSVExclusive
#endif
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
> FStorage;
FStorage Impl;
#if RDG_ENABLE_TRACE
bool bVisited = false;
#endif
FRDGScope(FRDGScope* Parent)
: Parent(Parent)
{}
void ImmediateEnd(FRDGScopeState& State) { Impl.Dispatch([&](auto& Scope) { Scope.ImmediateEnd(State); }); }
void BeginCPU(FRHIComputeCommandList& RHICmdList, bool bPreScope) { Impl.Dispatch([&](auto& Scope) { Scope.BeginCPU(RHICmdList, bPreScope); }); }
void BeginGPU(FRHIComputeCommandList& RHICmdList ) { Impl.Dispatch([&](auto& Scope) { Scope.BeginGPU(RHICmdList ); }); }
void EndCPU (FRHIComputeCommandList& RHICmdList, bool bPreScope) { Impl.Dispatch([&](auto& Scope) { Scope.EndCPU (RHICmdList, bPreScope); }); }
void EndGPU (FRHIComputeCommandList& RHICmdList ) { Impl.Dispatch([&](auto& Scope) { Scope.EndGPU (RHICmdList ); }); }
template <typename TScopeType> TScopeType * Get() { return Impl.Get<TScopeType>(); }
template <typename TScopeType> TScopeType const* Get() const { return Impl.Get<TScopeType>(); }
FString GetFullPath(FRDGEventName const& PassName);
};
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
template <typename TScopeType>
class TRDGEventScopeGuard
{
FRDGScopeState& State;
FRDGScope* const Scope;
public:
template <typename... TArgs>
inline TRDGEventScopeGuard(FRDGScopeState& State, ERDGScopeFlags Flags, TArgs&&... Args);
inline ~TRDGEventScopeGuard();
private:
static constexpr uint32 TypeMask = 1u << FRDGScope::FStorage::GetTypeIndex<TScopeType>();
static inline FRDGScope* Allocate(FRDGScopeState& State, ERDGScopeFlags Flags);
};
/** Macros for create render graph event names and scopes.
*
* FRDGEventName Name = RDG_EVENT_NAME("MyPass %sx%s", ViewRect.Width(), ViewRect.Height());
*
* RDG_EVENT_SCOPE(GraphBuilder, "MyProcessing %sx%s", ViewRect.Width(), ViewRect.Height());
*/
#if RDG_EVENTS
// Skip expensive string formatting for the relatively common case of no varargs. We detect this by stringizing the varargs and checking if the string is non-empty (more than just a null terminator).
#define RDG_EVENT_NAME(Format, ...) (sizeof(#__VA_ARGS__ "") > 1 ? FRDGEventName(TEXT(Format), ##__VA_ARGS__) : FRDGEventName(1, TEXT(Format)))
#define RDG_EVENT_SCOPE(GraphBuilder, Format, ...) \
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
TRDGEventScopeGuard<FRDGScope_RHI> PREPROCESSOR_JOIN(__RDG_ScopeRef_,__LINE__)( \
(GraphBuilder) \
, ERDGScopeFlags::None \
, FRHIBreadcrumbData(__FILE__, __LINE__, TStatId(), NAME_None) \
, RDG_EVENT_NAME(Format, ##__VA_ARGS__) \
)
#if HAS_GPU_STATS
#define RDG_EVENT_SCOPE_STAT(GraphBuilder, StatName, Format, ...) \
TRDGEventScopeGuard<FRDGScope_RHI> PREPROCESSOR_JOIN(__RDG_ScopeRef_,__LINE__)( \
(GraphBuilder) \
, ERDGScopeFlags::Stat \
, FRHIBreadcrumbData( \
__FILE__ \
, __LINE__ \
, GET_STATID(Stat_GPU_##StatName) \
, CSV_STAT_FNAME(StatName) \
) \
, RDG_EVENT_NAME(Format, ##__VA_ARGS__) \
)
#else
#define RDG_EVENT_SCOPE_STAT(GraphBuilder, StatName, Format, ...) \
RDG_EVENT_SCOPE(GraphBuilder, Format, ##__VA_ARGS__)
#endif
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#define RDG_EVENT_SCOPE_CONDITIONAL(GraphBuilder, Condition, Format, ...) \
TOptional<TRDGEventScopeGuard<FRDGScope_RHI>> PREPROCESSOR_JOIN(__RDG_ScopeRef_,__LINE__); \
do \
{ \
if (Condition) \
{ \
PREPROCESSOR_JOIN(__RDG_ScopeRef_,__LINE__).Emplace( \
(GraphBuilder) \
, ERDGScopeFlags::None \
, FRHIBreadcrumbData( \
__FILE__ \
, __LINE__ \
, TStatId() \
, NAME_None \
) \
, RDG_EVENT_NAME(Format, ##__VA_ARGS__) \
); \
} \
} while (false)
#if HAS_GPU_STATS
#define RDG_EVENT_SCOPE_CONDITIONAL_STAT(GraphBuilder, Condition, StatName, Format, ...) \
TOptional<TRDGEventScopeGuard<FRDGScope_RHI>> PREPROCESSOR_JOIN(__RDG_ScopeRef_,__LINE__); \
do \
{ \
if (Condition) \
{ \
PREPROCESSOR_JOIN(__RDG_ScopeRef_,__LINE__).Emplace( \
(GraphBuilder) \
, ERDGScopeFlags::Stat \
, FRHIBreadcrumbData( \
__FILE__ \
, __LINE__ \
, GET_STATID(Stat_GPU_##StatName) \
, CSV_STAT_FNAME(StatName) \
) \
, RDG_EVENT_NAME(Format, ##__VA_ARGS__) \
); \
} \
} while (false)
#else
#define RDG_EVENT_SCOPE_CONDITIONAL_STAT(GraphBuilder, Condition, StatName, Format, ...) \
RDG_EVENT_SCOPE_CONDITIONAL(GraphBuilder, Condition, Format, ##__VA_ARGS__)
#endif
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
// The 'Final' version disables any further child scopes or pass events. It is intended
// to group overlapping passes as events can disable overlap on certain GPUs.
#define RDG_EVENT_SCOPE_FINAL(GraphBuilder, Format, ...) \
TOptional<TRDGEventScopeGuard<FRDGScope_RHI>> PREPROCESSOR_JOIN(__RDG_ScopeRef_,__LINE__); \
do \
{ \
PREPROCESSOR_JOIN(__RDG_ScopeRef_,__LINE__).Emplace( \
(GraphBuilder) \
, ERDGScopeFlags::Final \
, FRHIBreadcrumbData(__FILE__, __LINE__, TStatId(), NAME_None) \
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
, RDG_EVENT_NAME(Format, ##__VA_ARGS__) \
); \
} while (false)
#else
#define RDG_EVENT_NAME(...) FRDGEventName()
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#define RDG_EVENT_SCOPE(...) do { } while (false)
#define RDG_EVENT_SCOPE_STAT(...) do { } while (false)
#define RDG_EVENT_SCOPE_CONDITIONAL(...) do { } while (false)
#define RDG_EVENT_SCOPE_CONDITIONAL_STAT(...) do { } while (false)
#define RDG_EVENT_SCOPE_FINAL(...) do { } while (false)
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#endif
#if HAS_GPU_STATS && (RHI_NEW_GPU_PROFILER == 0)
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#define RDG_GPU_STAT_SCOPE(GraphBuilder, StatName) TRDGEventScopeGuard<FRDGScope_GPU> PREPROCESSOR_JOIN(__RDG_GPUStatEvent_##StatName,__LINE__) ((GraphBuilder), ERDGScopeFlags::AlwaysEnable, (GraphBuilder).RHICmdList.GetGPUMask(), CSV_STAT_FNAME(StatName), GET_STATID(Stat_GPU_##StatName), nullptr , DrawcallCountCategory_##StatName);
#define RDG_GPU_STAT_SCOPE_VERBOSE(GraphBuilder, StatName, Description) TRDGEventScopeGuard<FRDGScope_GPU> PREPROCESSOR_JOIN(__RDG_GPUStatEvent_##StatName,__LINE__) ((GraphBuilder), ERDGScopeFlags::AlwaysEnable, (GraphBuilder).RHICmdList.GetGPUMask(), CSV_STAT_FNAME(StatName), GET_STATID(Stat_GPU_##StatName), Description, DrawcallCountCategory_##StatName);
#else
#define RDG_GPU_STAT_SCOPE(GraphBuilder, StatName)
#define RDG_GPU_STAT_SCOPE_VERBOSE(GraphBuilder, StatName, Description)
#endif
#if CSV_PROFILER_STATS
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#define RDG_CSV_STAT_EXCLUSIVE_SCOPE(GraphBuilder, StatName) TRDGEventScopeGuard<FRDGScope_CSVExclusive> PREPROCESSOR_JOIN(__RDG_CSVStat_##StatName,__LINE__) ((GraphBuilder), ERDGScopeFlags::AlwaysEnable, #StatName);
#else
#define RDG_CSV_STAT_EXCLUSIVE_SCOPE(GraphBuilder, StatName)
#endif
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
/** Injects a scope onto both the RDG and RHI timeline. */
#define RDG_RHI_EVENT_SCOPE( GraphBuilder, Name) RDG_EVENT_SCOPE(GraphBuilder, #Name); RHI_BREADCRUMB_EVENT(GraphBuilder.RHICmdList, #Name)
#define RDG_RHI_EVENT_SCOPE_STAT(GraphBuilder, Stat, Name) RDG_EVENT_SCOPE_STAT(GraphBuilder, Stat, #Name); RHI_BREADCRUMB_EVENT_STAT(GraphBuilder.RHICmdList, Stat, #Name)
#define RDG_RHI_GPU_STAT_SCOPE( GraphBuilder, StatName) RDG_GPU_STAT_SCOPE(GraphBuilder, StatName); SCOPED_GPU_STAT(GraphBuilder.RHICmdList, StatName);
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
namespace DynamicRenderScaling
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
class FRDGScope final : public TRDGEventScopeGuard<FRDGScope_Budget>
{
public:
FRDGScope(FRDGScopeState& State, FBudget const& Budget)
: TRDGEventScopeGuard(State, ERDGScopeFlags::AlwaysEnable, Budget)
{}
};
} // namespace DynamicRenderScaling
enum class ERDGScopeMode : uint8
{
Disabled = 0,
TopLevelOnly = 1,
AllEvents = 2,
AllEventsAndPassNames = 3
};
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
class FRDGScopeState
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
protected:
struct FState
{
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
struct FRDGScope* Current = nullptr;
DynamicRenderScaling::FBudget const* ActiveBudget = nullptr;
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
uint32 Mask = 0;
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
bool const bImmediate;
bool const bParallelExecute;
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
#if RDG_EVENTS == RDG_EVENTS_NONE
static constexpr ERDGScopeMode const ScopeMode = ERDGScopeMode::Disabled;
#else
ERDGScopeMode const ScopeMode;
#endif
FState(bool bInImmediate, bool bInParallelExecute);
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
} ScopeState;
public:
/** The RHI command list used for the render graph. */
FRHICommandListImmediate& RHICmdList;
#if WITH_RHI_BREADCRUMBS
protected:
FRHIBreadcrumbNode* LocalCurrentBreadcrumb = FRHIBreadcrumbNode::Sentinel;
FRHIBreadcrumbList LocalBreadcrumbList {};
TSharedPtr<FRHIBreadcrumbAllocator> LocalBreadcrumbAllocator;
public:
FRHIBreadcrumbNode*& CurrentBreadcrumbRef;
FRHIBreadcrumbAllocator& GetBreadcrumbAllocator()
{
if (ScopeState.bImmediate)
{
return RHICmdList.GetBreadcrumbAllocator();
}
else
{
if (!LocalBreadcrumbAllocator.IsValid())
{
LocalBreadcrumbAllocator = MakeShared<FRHIBreadcrumbAllocator>();
}
return *LocalBreadcrumbAllocator;
}
}
#endif // WITH_RHI_BREADCRUMBS
public:
FRDGScopeState(FRHICommandListImmediate& InRHICmdList, bool bImmediate, bool bParallelExecute)
: ScopeState(bImmediate, bParallelExecute)
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
, RHICmdList(InRHICmdList)
#if WITH_RHI_BREADCRUMBS
, CurrentBreadcrumbRef(bImmediate ? InRHICmdList.GetCurrentBreadcrumbRef() : LocalCurrentBreadcrumb)
#endif
{}
bool ShouldEmitEvents() const
{
return ScopeState.ScopeMode != ERDGScopeMode::Disabled;
}
template <typename TScopeType>
friend class TRDGEventScopeGuard;
friend FRDGScope_Budget;
#if HAS_GPU_STATS && (RHI_NEW_GPU_PROFILER == 0)
Merging //UE5/Dev-ParallelRendering/... (up to CL 30965645) to //UE5/Main/... (base CL 30962637) Significant refactor of RHI command list management and submission, and RHI breadcrumbs / RenderGraph (RDG) scopes, to allow for parallel translation of most RHI command lists. See individual changelists in //UE5/Dev-ParallelRendering for details. A summary of the changes is as follows: This work's primary goal was to allow as many RHI command lists as possible to be parallel translated, to make more efficient use of many-core systems. To achieve this: - The submission code paths for the immediate and parallel RHI command lists have been merged into a single function: FRHICommandListExecutor::Submit(). - A "dispatch thread" (which is simply a series of chained task graph tasks) is used to decide which command lists are batched together in a single parallel translate job. - Individual command lists can disable parallel translate, which forces them to be executed on the RHI thread. This happens automatically if an RHI command list performs an operation that is not thread safe (e.g. buffer lock, or low-level resource transition). One of the primary blockers for parallel translation was the RHI breadcrumb system, and the way RDG builds scopes. This was also refactored to remove these limitations: - RDG could only push/pop events on the immediate command list, which resulted in parallel and immediate work being interleaved, breaking any opportunity for parallelism. - Platform RHI implementations of breadcrumbs (e.g. in D3D12 RHI) was not correct across multiple RHI contexts. Push/pop operations aren't necessarily balanced within any one RHI context given that RDG builds "parallel pass sets" containing arbitrary ranges of renderer passes. A summary of the new RHI breadcrumb system is as follows: - A tree of breadcrumb nodes is built by the render thread and RDG. Each node contains the node name, and pointers to the parent and next nodes. When fully built, the nodes form a depth-first linked list which is used for traversing the tree for GPU crash debugging. - The memory for breadcrumb nodes is provided by ref-counted allocator objects. These allocators are pipelined through the RHI, allowing the platform RHI implementation to extend their lifetime for GPU crash debugging purposes. - RHIPushEvent / RHIPopEvent have been removed, replaced with RHIBeginBreadcrumbGPU / RHIEndBreadcrumbGPU. Platform RHIs implement these functions to perform GPU immediate writes using the unique ID of each node, for tracking GPU progress. - Format string arguments are captured by-value to remove the cost of string formatting while building the breadcrumb tree. String formatting only occurs when the actual formatted string is required (e.g. during GPU crash breadcrumb stack traversal, or when calling platform GPU profiling APIs). RenderGraph scopes have been simplified: - The separate scope trees / arrays of ops have been combined. There is now a single tree of RDG scopes containing all types. - Each RDG pass holds a pointer to the scope it was created under. - BeginCPU / EndCPU is called on each RDG scope as the various RDG threads enter / exit them. This allows us to mark-up each worker thread with the relevant Unreal Insights scopes. Other changes include: - Fixes for bugs uncovered when parallel translate was enabled. - Adjusted platform affinities necessary due to the new layout of thread tasks in the renderer. - Refactored RHI draw call stats to better fit the new pipeline design. #rb jeannoe.morissette, zach.bethel #jira UE-139543 [CL 30973133 by Luke Thatcher in ue5-main branch]
2024-01-29 12:47:28 -05:00
friend FRDGScope_GPU;
#endif
#if RDG_EVENTS
friend FRDGScope_RHI;
#endif
friend DynamicRenderScaling::FRDGScope;
};
#include "RenderGraphEvent.inl" // IWYU pragma: export