Render Command Pipes are dedicated asynchronous task pipes for render commands. Users can easily define new pipes and enqueue commands into them. Pipes can be synchronized using a scope to run serial render commands on the render thread, but initially pipes cannot be synchronized individually with each other. Render command overhead is reduced by recording command lambdas into MPSC queues which are serviced by the task graph, both for pipes and for the render thread. This reduces the task overhead as commands are no longer 1-to-1 with tasks.
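The batching idea above can be sketched in standalone C++. This is an illustration only, not the engine code: FCommandPipeSketch, Enqueue and Drain are hypothetical names, and a mutex-guarded vector stands in for the MPSC queue to keep the sketch short. The point it shows is that one drain task services many recorded commands.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a render command pipe; not the engine's class.
class FCommandPipeSketch
{
public:
    using FCommand = std::function<void()>;

    // Callable from any producer thread. Returns true only when the caller
    // needs to launch a drain task, i.e. when no drain is already scheduled.
    bool Enqueue(FCommand Command)
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        Pending.push_back(std::move(Command));
        if (bDrainScheduled)
        {
            return false;
        }
        bDrainScheduled = true;
        return true;
    }

    // Body of the single consumer task. Executes every command recorded so
    // far in FIFO order, including commands recorded while it is running.
    // Returns the number of commands executed.
    std::size_t Drain()
    {
        std::size_t NumExecuted = 0;
        for (;;)
        {
            std::vector<FCommand> Batch;
            {
                std::lock_guard<std::mutex> Lock(Mutex);
                assert(bDrainScheduled); // Drain only runs after Enqueue asked for it.
                if (Pending.empty())
                {
                    bDrainScheduled = false;
                    return NumExecuted;
                }
                Batch.swap(Pending);
            }
            for (FCommand& Command : Batch)
            {
                Command();
                ++NumExecuted;
            }
        }
    }

private:
    std::mutex Mutex;
    std::vector<FCommand> Pending;
    bool bDrainScheduled = false;
};
```

In this sketch, enqueuing four commands back to back asks for a single drain task rather than four, which is the reduction in task overhead described above.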
Pipe behavior is controlled with new CVars. `r.RenderCommandPipeMode` controls overall behavior:
0 - Legacy render thread tasks,
1 - Render thread MPSC queue,
2 - Render thread and async pipe MPSC queues.
To define a Render Command Pipe, use DEFINE_RENDER_COMMAND_PIPE(MyPipe), or DECLARE_RENDER_COMMAND_PIPE(MyPipe, MODULE_API) to declare an extern reference.
Enqueue a command into the pipe like so:
ENQUEUE_RENDER_COMMAND(MyCommand)(UE::RenderCommandPipe::MyPipe, [] (FRHICommandList&) {}).
Omitting a pipe will fall back to the 'general' pipe, which is the render thread.
Eventually pipes need to be synced back to the general pipe for scene renders and other GPU work. On the game thread timeline, use UE::RenderCommandPipe::FSyncScope to synchronize the pipes. This waits for pipes and disables recording of new pipe commands until the scope completes, at which point pipe recording is restarted. This creates a 'sync point', so render commands issued prior to a sync scope will be waited on at the start of the scope, and render commands issued after the scope ends will not be able to start until the render thread finishes processing prior commands.
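The sync point semantics above can be modelled with a small single-threaded sketch. All names here (FPipeModel, FRendererModel, FSyncScopeModel) are hypothetical and are not the engine API. The sketch assumes that a command aimed at a pipe that is not recording is redirected to the general pipe, and it flushes synchronously where the real system would wait on tasks.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using FCommand = std::function<void()>;

// Hypothetical model of one pipe: a queue plus a "recording" switch.
struct FPipeModel
{
    std::vector<FCommand> Queue;
    bool bRecording = true;

    // Runs everything queued so far, in order.
    void Flush()
    {
        std::vector<FCommand> Batch;
        Batch.swap(Queue);
        for (FCommand& Command : Batch)
        {
            Command();
        }
    }
};

struct FRendererModel
{
    FPipeModel General;                  // stands in for the render thread
    std::vector<FPipeModel*> AsyncPipes; // user-defined pipes

    // Assumption: while a pipe is not recording, its commands go to General.
    void Enqueue(FPipeModel& Pipe, FCommand Command)
    {
        FPipeModel& Target = Pipe.bRecording ? Pipe : General;
        Target.Queue.push_back(std::move(Command));
    }
};

// RAII scope: waits for the pipes on entry, restarts recording on exit.
class FSyncScopeModel
{
public:
    explicit FSyncScopeModel(FRendererModel& InRenderer) : Renderer(InRenderer)
    {
        for (FPipeModel* Pipe : Renderer.AsyncPipes)
        {
            Pipe->Flush();            // wait for commands issued before the scope
            Pipe->bRecording = false; // stop recording new pipe commands
        }
    }

    ~FSyncScopeModel()
    {
        // Later pipe commands must not start before prior render thread work.
        Renderer.General.Flush();
        for (FPipeModel* Pipe : Renderer.AsyncPipes)
        {
            assert(Pipe->Queue.empty()); // nothing was recorded during the scope
            Pipe->bRecording = true;
        }
    }

    FSyncScopeModel(const FSyncScopeModel&) = delete;
    FSyncScopeModel& operator=(const FSyncScopeModel&) = delete;

private:
    FRendererModel& Renderer;
};
```

A command enqueued to a pipe before the scope has run by the time the scope is entered, a command enqueued inside the scope lands on the general pipe, and a command enqueued after the scope is recorded into the pipe again.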
#rb christopher.waters, luke.thatcher
[CL 27074956 by zach bethel in ue5-main branch]
[FYI] zach.bethel
Original CL Desc
-----------------------------------------------------------------
Render Command Pipe Implementation and API
#rb christopher.waters, luke.thatcher
[CL 27054009 by bob tellez in ue5-main branch]
#rb christopher.waters, luke.thatcher
[CL 27042459 by zach bethel in ue5-main branch]
+ pins in the primary group of a kernel determine the dispatch type of the kernel
+ Added additional validation code for resource expressions. Expressions that evaluate to different results for unified/non-unified dispatch now invalidate the deformer.
TODO: perform this validation during compile instead of runtime and throw a compile error
+ compile error if a data interface with no unified dispatch support is connected to secondary group
+ compile error if a kernel does not have execution data interface
+ support higher dispatch group counts, implying that in the long run we will probably only support 1D dispatch of thread groups
#jira none
#rb halfdan.ingvarsson, Jeremy.Moore
#preflight https://horde.devtools.epicgames.com/job/6439dd22ec219759f540f526
[CL 25055283 by jack cai in ue5-main branch]
CPU performance improvements for using half edge buffers in deformer graph data interface.
The half edge buffer was being uploaded per frame. Now it is stored in a resource owned by the data provider and uploaded once.
This is still less than ideal. We want the resource to be owned by the skel mesh, so that it is cooked instead of created at runtime. But that is work for another day.
Also test if we have any compute worker jobs before kicking off RDG graph. This allows for a fairer CPU comparison between skin cache and deformer graph. (We don't want to add RDG overhead to skin cache only work).
Also added support for reordering the compute work for optimal GPU execution. Ordering by kernel index instead of graph index allows greater overlap of work on GPU when multiple compute graphs are running.
Added a per graph sort priority so that work sorting doesn't cause any setup graphs to run later than execution graphs.
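A minimal sketch of the ordering described above, assuming each queued kernel invocation carries a graph sort priority, a kernel index and a graph index. FKernelWorkSketch, RunsBefore and SortForOverlap are hypothetical names, and "lower priority value runs first" is an assumption of the sketch. The likely reasoning is that kernels within one graph depend on each other, so placing kernel 0 of every graph next to each other puts independent work side by side.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical record for one queued kernel invocation.
struct FKernelWorkSketch
{
    int GraphSortPriority; // assumption: lower values run first (e.g. setup graphs)
    int KernelIndex;       // position of the kernel inside its graph
    int GraphIndex;        // which graph the kernel belongs to
};

// Priority first, then kernel index, then graph index as a stable tie-break.
inline bool RunsBefore(const FKernelWorkSketch& A, const FKernelWorkSketch& B)
{
    return std::tie(A.GraphSortPriority, A.KernelIndex, A.GraphIndex)
         < std::tie(B.GraphSortPriority, B.KernelIndex, B.GraphIndex);
}

inline void SortForOverlap(std::vector<FKernelWorkSketch>& Work)
{
    std::stable_sort(Work.begin(), Work.end(), RunsBefore);
    assert(std::is_sorted(Work.begin(), Work.end(), RunsBefore));
}
```

With two execution graphs of two kernels each and one setup graph of higher priority, the setup graph's kernels come first, followed by kernel 0 of both execution graphs, then kernel 1 of both.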
#preflight 63f40694977ceed915769bfe
[CL 24343458 by jeremy moore in ue5-main branch]
Fix bug where dispatch sizes were being added in all dimensions, causing way too many threads to launch.
Fix is fine for all current execution data interfaces which launch 1d thread groups. Will need refining in the future for 2d or 3d thread groups.
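One plausible reading of the bug, sketched in standalone C++ with hypothetical names (FIntVector3Sketch, SumAllDimensions, SumFirstDimension). It assumes dispatch sizes of the form (N, 1, 1) were summed component-wise, which grows Y and Z as well as X. The actual engine code may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 3D group count; not an engine type.
struct FIntVector3Sketch
{
    int X, Y, Z;
};

// Buggy accumulation: a component-wise sum also grows Y and Z.
inline FIntVector3Sketch SumAllDimensions(const std::vector<FIntVector3Sketch>& Sizes)
{
    FIntVector3Sketch Total{0, 0, 0};
    for (const FIntVector3Sketch& Size : Sizes)
    {
        Total.X += Size.X;
        Total.Y += Size.Y;
        Total.Z += Size.Z;
    }
    return Total;
}

// 1D-only accumulation: only X grows, Y and Z stay at 1.
inline FIntVector3Sketch SumFirstDimension(const std::vector<FIntVector3Sketch>& Sizes)
{
    FIntVector3Sketch Total{0, 1, 1};
    for (const FIntVector3Sketch& Size : Sizes)
    {
        assert(Size.Y == 1 && Size.Z == 1); // only valid for 1D thread groups
        Total.X += Size.X;
    }
    return Total;
}

// Total number of thread groups launched for a given group count.
inline long long NumGroups(const FIntVector3Sketch& Count)
{
    return 1LL * Count.X * Count.Y * Count.Z;
}
```

For three sizes (4,1,1), (8,1,1) and (2,1,1), the component-wise sum is (14,3,3), or 126 groups, while the 1D sum is (14,1,1), or 14 groups. The assert marks where 2D or 3D thread groups would need a different scheme.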
#preflight 63ed944b0a06073fef033645
#rbx
[CL 24253413 by jeremy moore in ue5-main branch]
ComputeFramework: Support for unified dispatches.
When a kernel supports unified dispatch it combines any subinvocations into a single dispatch invocation.
This can only happen if all data interfaces support unified dispatch, and all subinvocation shaders are the same.
Some data interfaces, such as skeleton, don't support this because each section keeps its own bone buffer to keep bone count to a minimum.
In future we should look to change the underlying structures to support unified dispatch as much as possible, since it will be more performant.
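The eligibility rule above can be sketched as follows. The types and functions (FSubInvocationSketch, FDataInterfaceSketch, CanUseUnifiedDispatch, BuildDispatches) are hypothetical, and identifying a shader by a string key is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one subinvocation (e.g. one mesh section).
struct FSubInvocationSketch
{
    std::string ShaderKey; // assumption: identifies the compiled shader
    int NumThreads;
};

// Hypothetical data interface capability flag.
struct FDataInterfaceSketch
{
    bool bSupportsUnifiedDispatch;
};

// Unified dispatch needs every data interface to support it and every
// subinvocation to use the same shader.
inline bool CanUseUnifiedDispatch(
    const std::vector<FDataInterfaceSketch>& Interfaces,
    const std::vector<FSubInvocationSketch>& SubInvocations)
{
    for (const FDataInterfaceSketch& Interface : Interfaces)
    {
        if (!Interface.bSupportsUnifiedDispatch)
        {
            return false;
        }
    }
    for (const FSubInvocationSketch& Sub : SubInvocations)
    {
        if (Sub.ShaderKey != SubInvocations.front().ShaderKey)
        {
            return false;
        }
    }
    return true;
}

// Returns the thread count of each dispatch that would be issued.
inline std::vector<int> BuildDispatches(
    const std::vector<FDataInterfaceSketch>& Interfaces,
    const std::vector<FSubInvocationSketch>& SubInvocations)
{
    std::vector<int> Dispatches;
    if (SubInvocations.empty())
    {
        return Dispatches;
    }
    const bool bUnified = CanUseUnifiedDispatch(Interfaces, SubInvocations);
    int UnifiedThreads = 0;
    for (const FSubInvocationSketch& Sub : SubInvocations)
    {
        assert(Sub.NumThreads > 0);
        if (bUnified)
        {
            UnifiedThreads += Sub.NumThreads; // fold into one dispatch
        }
        else
        {
            Dispatches.push_back(Sub.NumThreads); // one dispatch per subinvocation
        }
    }
    if (bUnified)
    {
        Dispatches.push_back(UnifiedThreads);
    }
    return Dispatches;
}
```

Three sections sharing one shader collapse into a single dispatch when every interface supports it. Adding one interface that does not (like the skeleton case above), or a section with a different shader, falls back to one dispatch per section.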
#preflight 63ed39c7514832b242bce7a7
#rb jack.cai
[CL 24251753 by jeremy moore in ue5-main branch]
Change OptimusSource files to set virtual path through virtual file name instead of through (less robust) line directives.
#preflight 63cc5a8d574ab9cae49dc676
[CL 23803742 by Jeremy Moore in ue5-main branch]
This allows us to jump to compile errors in source for data interfaces that have simple source templates.
#preflight 63cc5a0bd83c1837b1b02af6
[CL 23803741 by Jeremy Moore in ue5-main branch]
Tidied logging during shader compilation.
Optimus: Add support for opening file in visual studio from shader compile errors in Editor or VS log.
#preflight 63c8c39f8168e8b2526e1c66
[CL 23772051 by Jeremy Moore in ue5-main branch]
Existing system was single threaded. With this change we can make use of shader workers.
#preflight 63c01769d862fdd347e46cc0
[CL 23662505 by Jeremy Moore in ue5-main branch]
Move them to ComputeFramework virtual path by default and use valid paths with .ush suffix.
This is in preparation for migrating to the core shared compiler manager which validates these things.
Compile error path->object resolution for source libraries will be broken, but fixed in a second pass.
#preflight 63c0123e577437afe639b2f5
[CL 23662136 by Jeremy Moore in ue5-main branch]
Source is included from there in generically named generated file.
#preflight 63bf6109de27f9bc450b69b1
[CL 23658267 by Jeremy Moore in ue5-main branch]
Failure can happen immediately or later in the pipe (for example when we try to fetch shaders).
Optimus: Use fallback for mesh deformers. Fallback is to simply reset the gpu passthrough vertex factory so that we see the bind pose. This is the same behavior as before, but now can happen at top or bottom of the pipe.
#preflight 63bda31aaf3ebedd992348eb
[CL 23629300 by Jeremy Moore in ue5-main branch]