Commit Graph

39 Commits

Author SHA1 Message Date
Nirmoy Das
d41a39dda1 drm/scheduler: improve job distribution with multiple queues
This patch uses score to select a new drm scheduler for better
loadbalance between multiple drm schedulers instead of num_jobs.

Below are test results after running amdgpu_test for ~10 times.

Before this patch:

sched_name     num of many times it got schedule
=========      ==================================
sdma0          1463
sdma1          198
comp_1.0.1     280

After this patch:

sched_name     num of many times it got schedule
=========      ==================================
sdma0          925
sdma1          928
comp_1.0.1     177
comp_1.1.1     44
comp_1.2.1     43
comp_1.3.1     44

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/373000/
Signed-off-by: Christian König <christian.koenig@amd.com>
2020-06-26 14:16:29 +02:00
Sam Ravnborg
fa3d55a14a drm/sched: fix kernel-doc in gpu_scheduler.h
Fix following warning:
gpu_scheduler.h:103: warning: Function parameter or member 'priority' not described in 'drm_sched_entity'

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Nirmoy Das <nirmoy.das@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200328132025.19910-4-sam@ravnborg.org
2020-04-05 22:22:37 +02:00
Nirmoy Das
ec2edcc279 drm/sched: implement and export drm_sched_pick_best
Remove drm_sched_entity_get_free_sched() and use the logic of picking
the least loaded drm scheduler from a drm scheduler list to implement
drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so
that it can be utilized by other drm drivers.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:21:32 -04:00
changzhu
d164bebb95 Revert "drm/scheduler: improve job distribution with multiple queues"
It needs to revert this patch to avoid amdgpu_test compute hang problem
on picasso.

This reverts commit 56822db194.

Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16 16:21:32 -04:00
Nirmoy Das
b37aced31e drm/scheduler: implement a function to modify sched list
Implement drm_sched_entity_modify_sched() which modifies existing
sched_list with a different one. This is going to be helpful when
userspace changes priority of a ctx/entity then the driver can switch
to the corresponding HW scheduler list for that priority.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09 13:51:35 -04:00
Nirmoy Das
2639f453f2 drm/amdgpu: fix doc by clarifying sched_list definition
expand sched_list definition for better understanding.
Also fix a typo atleast -> at least

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27 16:46:44 -05:00
Nirmoy Das
9e3e90c50d drm/scheduler: fix documentation by replacing rq_list with sched_list
This also replaces old artifacts with a correct one in drm_sched_entity_init()
declaration

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:41:00 -05:00
Nirmoy Das
56822db194 drm/scheduler: improve job distribution with multiple queues
This patch uses score based logic to select a new rq for better
loadbalance between multiple rq/scheds instead of num_jobs.

Below are test results after running amdgpu_test from mesa drm

Before this patch:

sched_name     num of many times it got scheduled
=========      ==================================
sdma0          314
sdma1          32
comp_1.0.0     56
comp_1.0.1     0
comp_1.1.0     0
comp_1.1.1     0
comp_1.2.0     0
comp_1.2.1     0
comp_1.3.0     0
comp_1.3.1     0
After this patch:

sched_name     num of many times it got scheduled
=========      ==================================
sdma0          216
sdma1          185
comp_1.0.0     39
comp_1.0.1     9
comp_1.1.0     12
comp_1.1.1     0
comp_1.2.0     12
comp_1.2.1     0
comp_1.3.0     12
comp_1.3.1     0

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16 13:37:54 -05:00
Nirmoy Das
b3ac17667f drm/scheduler: rework entity creation
Entity currently keeps a copy of run_queue list and modify it in
drm_sched_entity_set_priority(). Entities shouldn't modify run_queue
list. Use drm_gpu_scheduler list instead of drm_sched_rq list
in drm_sched_entity struct. In this way we can select a runqueue based
on entity/ctx's priority for a  drm scheduler.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18 16:09:12 -05:00
Stephen Rothwell
dc10218da8 drm/sched: struct completion requires linux/completion.h inclusion
Fixes: 83a7772ba2 ("drm/sched: Use completion to wait for sched->thread idle v2.")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-08 12:29:52 -05:00
Andrey Grodzovsky
83a7772ba2 drm/sched: Use completion to wait for sched->thread idle v2.
Removes thread park/unpark hack from drm_sched_entity_fini and
by this fixes reactivation of scheduler thread while the thread
is supposed to be stopped.

v2: Per sched entity completion.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-07 18:08:07 -05:00
Andrey Grodzovsky
a5343b8a2c drm/scheduler: Add flag to hint the release of guilty job.
Problem:
Sched thread's cleanup function races against TO handler
and removes the guilty job from mirror list and we
have no way of differentiating if the job was removed from within the
TO handler or from the sched thread's clean-up function.

Fix:
Add a flag to scheduler to hint the TO handler that the guilty job needs
to be explicitly released.

v2: whitespace fix

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-5-git-send-email-andrey.grodzovsky@amd.com
2019-05-02 15:50:55 -05:00
Christian König
5918045c4e drm/scheduler: rework job destruction
We now destroy finished jobs from the worker thread to make sure that
we never destroy a job currently in timeout processing.
By this we avoid holding lock around ring mirror list in drm_sched_stop
which should solve a deadlock reported by a user.

v2: Remove unused variable.
v4: Move guilty job free into sched code.
v5:
Move sched->hw_rq_count to drm_sched_start to account for counter
decrement in drm_sched_stop even when we don't call resubmit jobs
if guily job did signal.
v6: remove unused variable

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692

Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-3-git-send-email-andrey.grodzovsky@amd.com
2019-05-02 15:45:48 -05:00
Andrey Grodzovsky
3741540e04 drm/sched: Rework HW fence processing.
Expedite job deletion from ring mirror list to the HW fence signal
callback instead from finish_work, together with waiting for all
such fences to signal in drm_sched_stop we garantee that
already signaled job will not be processed twice.
Remove the sched finish fence callback and just submit finish_work
directly from the HW fence callback.

v2: Fix comments.
v3: Attach  hw fence cb to sched_job
v5: Rebase

Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-25 16:15:36 -05:00
Andrey Grodzovsky
222b5f0441 drm/sched: Refactor ring mirror list handling.
Decauple sched threads stop and start and ring mirror
list handling from the policy of what to do about the
guilty jobs.
When stoppping the sched thread and detaching sched fences
from non signaled HW fenes wait for all signaled HW fences
to complete before rerunning the jobs.

v2: Fix resubmission of guilty job into HW after refactoring.

v4:
Full restart for all the jobs, not only from guilty ring.
Extract karma increase into standalone function.

v5:
Rework waiting for signaled jobs without relying on the job
struct itself as those might already be freed for non 'guilty'
job's schedulers.
Expose karma increase to drivers.

v6:
Use list_for_each_entry_safe_continue and drm_sched_process_job
in case fence already signaled.
Call drm_sched_increase_karma only once for amdgpu and add documentation.

v7:
Wait only for the latest job's fence.

Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-01-25 16:15:36 -05:00
Sharat Masetty
1db8c142b6 drm/scheduler: Add drm_sched_suspend/resume_timeout()
This patch adds two new functions to help client drivers suspend and
resume the scheduler job timeout. This can be useful in cases where the
hardware has preemption support enabled. Using this, it is possible to have
the timeout active only for the ring which is active on the ringbuffer.
This patch also makes the job_list_lock IRQ safe.

Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-05 17:56:16 -05:00
Sharat Masetty
26efecf955 drm/scheduler: Add drm_sched_job_cleanup
This patch adds a new API to clean up the scheduler job resources. This
is primarliy needed in cases the job was created but was not queued to
the scheduler queue. Additionally with this change, the layer which
creates the scheduler job also gets to free up the job's resources and
this entails moving the dma_fence_put(finished_fence) to the drivers
ops free handler routines.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:27 -05:00
Andrey Grodzovsky
faf6e1a87e drm/sched: Add boolean to mark if sched is ready to work v5
Problem:
A particular scheduler may become unsuable (underlying HW) after
some event (e.g. GPU reset). If it's later chosen by
the get free sched. policy a command will fail to be
submitted.

Fix:
Add a driver specific callback to report the sched status so
rq with bad sched can be avoided in favor of working one or
none in which case job init will fail.

v2: Switch from driver callback to flag in scheduler.

v3: rebase

v4: Remove ready paramter from drm_sched_init, set
uncoditionally to true once init done.

v5: fix missed change in v3d in v4 (Alex)

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:22 -05:00
Christian König
8fe159b014 drm/sched: add drm_sched_fault
Add a helper to immediately start timeout handling in case of a hardware
fault.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05 14:21:02 -05:00
Nayan Deshmukh
6a96243056 drm/scheduler: remove timeout work_struct from drm_sched_job (v3)
having a delayed work item per job is redundant as we only need one
per scheduler to track the time out the currently executing job.

v2: the first element of the ring mirror list is the currently
executing job so we don't need a additional variable for it

v3: squash in fixes for v3d and etnaviv

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-09-27 09:55:45 -05:00
Andrey Grodzovsky
62347a3300 drm/scheduler: Add stopped flag to drm_sched_entity
The flag will prevent another thread from same process to
reinsert the entity queue into scheduler's rq after it was already
removewd from there by another thread during drm_sched_entity_flush.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:11:10 -05:00
Christian König
620e762f9a drm/scheduler: move entity handling into separate file
This is complex enough on it's own. Move it into a separate C file.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:10:43 -05:00
Christian König
7febe4bfd5 drm/scheduler: fix setting the priorty for entities (v2)
Since we now deal with multiple rq we need to update all of them, not
just the current one.

v2: Trivial: Removed unused variable (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:10:00 -05:00
Nayan Deshmukh
249a07c05a drm/scheduler: add counter for total jobs in scheduler
To keep track of the scheduler load.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:09:45 -05:00
Nayan Deshmukh
ac0a6cf1c6 drm/scheduler: add a list of run queues to the entity
These are the potential run queues on which the jobs from this
entity can be scheduled. We will use this to do load balancing.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27 11:09:44 -05:00