Files
linux-rockchip/drivers/gpu/drm/drm_gpuvm.c
Liviu Dudau c4f16b4db2 dt-bindings: gpu: mali-valhall-csf: Add support for Arm Mali CSF GPUs
Arm has introduced a new v10 GPU architecture that replaces the Job Manager
interface with a new Command Stream Frontend. It adds firmware driven
command stream queues that can be used by kernel and user space to submit
jobs to the GPU.

Add the initial schema for the device tree that is based on support for
RK3588 SoC. The minimum number of clocks is one for the IP, but on Rockchip
platforms they will tend to expose the semi-independent clocks for better
power management.

v5:
- Move the opp-table node under the gpu node

v4:
- Fix formatting issue

v3:
- Cleanup commit message to remove redundant text
- Added opp-table property and re-ordered entries
- Clarified power-domains and power-domain-names requirements for RK3588.
- Cleaned up example

Note: power-domains and power-domain-names requirements for other platforms
are still work in progress, hence the bindings are left incomplete here.

v2:
- New commit

Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Conor Dooley <conor+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rob Herring <robh@kernel.org>

drm: execution context for GEM buffers v7

This adds the infrastructure for an execution context for GEM buffers
which is similar to the existing TTMs execbuf util and intended to replace
it in the long term.

The basic functionality is that we abstracts the necessary loop to lock
many different GEM buffers with automated deadlock and duplicate handling.

v2: drop xarray and use dynamic resized array instead, the locking
    overhead is unnecessary and measurable.
v3: drop duplicate tracking, radeon is really the only one needing that.
v4: fixes issues pointed out by Danilo, some typos in comments and a
    helper for lock arrays of GEM objects.
v5: some suggestions by Boris Brezillon, especially just use one retry
    macro, drop loop in prepare_array, use flags instead of bool
v6: minor changes suggested by Thomas, Boris and Danilo
v7: minor typos pointed out by checkpatch.pl fixed

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Tested-by: Danilo Krummrich <dakr@redhat.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711133122.3710-2-christian.koenig@amd.com

drm: manager to keep track of GPUs VA mappings

Add infrastructure to keep track of GPU virtual address (VA) mappings
with a decicated VA space manager implementation.

New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers
start implementing, allow userspace applications to request multiple and
arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is
intended to serve the following purposes in this context.

1) Provide infrastructure to track GPU VA allocations and mappings,
   using an interval tree (RB-tree).

2) Generically connect GPU VA mappings to their backing buffers, in
   particular DRM GEM objects.

3) Provide a common implementation to perform more complex mapping
   operations on the GPU VA space. In particular splitting and merging
   of GPU VA mappings, e.g. for intersecting mapping requests or partial
   unmap requests.

Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Donald Robson <donald.robson@imgtec.com>
Suggested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720001443.2380-2-dakr@redhat.com

drm: manager: Fix printk format for size_t

sizeof() returns a size_t which may be different to an unsigned long.
Use the correct format specifier of '%zu' to prevent compiler warnings.

Fixes: e6303f323b1a ("drm: manager to keep track of GPUs VA mappings")
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2bf64010-c40a-8b84-144c-5387412b579e@arm.com

drm/gpuva_mgr: remove unused prev pointer in __drm_gpuva_sm_map()

The prev pointer in __drm_gpuva_sm_map() was used to implement automatic
merging of mappings. Since automatic merging did not make its way
upstream, remove this leftover.

Fixes: e6303f323b1a ("drm: manager to keep track of GPUs VA mappings")
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230823233119.2891-1-dakr@redhat.com

drm/gpuvm: rename struct drm_gpuva_manager to struct drm_gpuvm

Rename struct drm_gpuva_manager to struct drm_gpuvm including
corresponding functions. This way the GPUVA manager's structures align
very well with the documentation of VM_BIND [1] and VM_BIND locking [2].

It also provides a better foundation for the naming of data structures
and functions introduced for implementing a common dma-resv per GPU-VM
including tracking of external and evicted objects in subsequent
patches.

[1] Documentation/gpu/drm-vm-bind-async.rst
[2] Documentation/gpu/drm-vm-bind-locking.rst

Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230920144343.64830-2-dakr@redhat.com

drm/gpuvm: allow building as module

HB:
drivers/gpu/drm/nouveau/Kconfig
skipped because there is no gpuvm support of nouveau in 6.1

Currently, the DRM GPUVM does not have any core dependencies preventing
a module build.

Also, new features from subsequent patches require helpers (namely
drm_exec) which can be built as module.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230920144343.64830-3-dakr@redhat.com

drm/gpuvm: convert WARN() to drm_WARN() variants

HB:
drivers/gpu/drm/nouveau/nouveau_uvmm.c
skipped since 6.1 does not support gpuvm on nv

Use drm_WARN() and drm_WARN_ON() variants to indicate drivers the
context the failing VM resides in.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-2-dakr@redhat.com

drm/gpuvm: don't always WARN in drm_gpuvm_check_overflow()

Don't always WARN in drm_gpuvm_check_overflow() and separate it into a
drm_gpuvm_check_overflow() and a dedicated
drm_gpuvm_warn_check_overflow() variant.

This avoids printing warnings due to invalid userspace requests.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-3-dakr@redhat.com

drm/gpuvm: export drm_gpuvm_range_valid()

Drivers may use this function to validate userspace requests in advance,
hence export it.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-4-dakr@redhat.com

drm/gpuvm: add common dma-resv per struct drm_gpuvm

hb:
drivers/gpu/drm/nouveau/nouveau_uvmm.c
skipped

Provide a common dma-resv for GEM objects not being used outside of this
GPU-VM. This is used in a subsequent patch to generalize dma-resv,
external and evicted object handling and GEM validation.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-6-dakr@redhat.com

drm/gpuvm: add drm_gpuvm_flags to drm_gpuvm

HB:
drivers/gpu/drm/nouveau/nouveau_uvmm.c
skipped

Introduce flags for struct drm_gpuvm, this required by subsequent
commits.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-8-dakr@redhat.com

drm/gpuvm: reference count drm_gpuvm structures

HB:
drivers/gpu/drm/nouveau/nouveau_uvmm.c
skipped

Implement reference counting for struct drm_gpuvm.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-10-dakr@redhat.com

drm/gpuvm: add an abstraction for a VM / BO combination

HB:
drivers/gpu/drm/nouveau/nouveau_uvmm.c
skipped

Add an abstraction layer between the drm_gpuva mappings of a particular
drm_gem_object and this GEM object itself. The abstraction represents a
combination of a drm_gem_object and drm_gpuvm. The drm_gem_object holds
a list of drm_gpuvm_bo structures (the structure representing this
abstraction), while each drm_gpuvm_bo contains list of mappings of this
GEM object.

This has multiple advantages:

1) We can use the drm_gpuvm_bo structure to attach it to various lists
   of the drm_gpuvm. This is useful for tracking external and evicted
   objects per VM, which is introduced in subsequent patches.

2) Finding mappings of a certain drm_gem_object mapped in a certain
   drm_gpuvm becomes much cheaper.

3) Drivers can derive and extend the structure to easily represent
   driver specific states of a BO for a certain GPUVM.

The idea of this abstraction was taken from amdgpu, hence the credit for
this idea goes to the developers of amdgpu.

Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-11-dakr@redhat.com

drm/gpuvm: track/lock/validate external/evicted objects

Currently the DRM GPUVM offers common infrastructure to track GPU VA
allocations and mappings, generically connect GPU VA mappings to their
backing buffers and perform more complex mapping operations on the GPU VA
space.

However, there are more design patterns commonly used by drivers, which
can potentially be generalized in order to make the DRM GPUVM represent
a basis for GPU-VM implementations. In this context, this patch aims
at generalizing the following elements.

1) Provide a common dma-resv for GEM objects not being used outside of
   this GPU-VM.

2) Provide tracking of external GEM objects (GEM objects which are
   shared with other GPU-VMs).

3) Provide functions to efficiently lock all GEM objects dma-resv the
   GPU-VM contains mappings of.

4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
   of, such that validation of evicted GEM objects is accelerated.

5) Provide some convinience functions for common patterns.

Big thanks to Boris Brezillon for his help to figure out locking for
drivers updating the GPU VA space within the fence signalling path.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-12-dakr@redhat.com

drm/gpuvm: fall back to drm_exec_lock_obj()

Fall back to drm_exec_lock_obj() if num_fences is zero for the
drm_gpuvm_prepare_* function family.

Otherwise dma_resv_reserve_fences() would actually allocate slots even
though num_fences is zero.

Cc: Christian König <christian.koenig@amd.com>
Acked-by: Donald Robson <donald.robson@imgtec.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231129220835.297885-2-dakr@redhat.com

drm/gpuvm: Let drm_gpuvm_bo_put() report when the vm_bo object is destroyed

Some users need to release resources attached to the vm_bo object when
it's destroyed. In Panthor's case, we need to release the pin ref so
BO pages can be returned to the system when all GPU mappings are gone.

This could be done through a custom drm_gpuvm::vm_bo_free() hook, but
this has all sort of locking implications that would force us to expose
a drm_gem_shmem_unpin_locked() helper, not to mention the fact that
having a ::vm_bo_free() implementation without a ::vm_bo_alloc() one
seems odd. So let's keep things simple, and extend drm_gpuvm_bo_put()
to report when the object is destroyed.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231204151406.1977285-1-boris.brezillon@collabora.com

drm/exec: Pass in initial # of objects

HB: skipped
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c
drivers/gpu/drm/amd/amdkfd/kfd_svm.c
drivers/gpu/drm/imagination/pvr_job.c
drivers/gpu/drm/nouveau/nouveau_uvmm.c

In cases where the # is known ahead of time, it is silly to do the table
resize dance.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Patchwork: https://patchwork.freedesktop.org/patch/568338/

drm/gem-shmem: When drm_gem_object_init failed, should release object

when goto err_free, the object had init, so it should be release when fail.

Signed-off-by: ChunyouTang <tangchunyou@163.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20221119064131.364-1-tangchunyou@163.com

drm: Remove usage of deprecated DRM_DEBUG_PRIME

drm_print.h says DRM_DEBUG_PRIME is deprecated in favor of
drm_dbg_prime().

Signed-off-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: Simon Ser <contact@emersion.fr>
Signed-off-by: Simon Ser <contact@emersion.fr>
Link: https://patchwork.freedesktop.org/patch/msgid/cd663b1bc42189e55898cddecdb3b73c591b341a.1673269059.git.code@siddh.me

drm/shmem: Cleanup drm_gem_shmem_create_with_handle()

Once we create the handle, the handle owns the reference.  Currently
nothing was doing anything with the shmem ptr after the handle was
created, but let's change drm_gem_shmem_create_with_handle() to not
return the pointer, so-as to not encourage problematic use of this
function in the future.  As a bonus, it makes the code a bit cleaner.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230123154831.3191821-1-robdclark@gmail.com

drm/shmem-helper: Fix locking for drm_gem_shmem_get_pages_sgt()

Other functions touching shmem->sgt take the pages lock, so do that here
too. drm_gem_shmem_get_pages() & co take the same lock, so move to the
_locked() variants to avoid recursive locking.

Discovered while auditing locking to write the Rust abstractions.

Fixes: 2194a63a81 ("drm: Add library for shmem backed GEM objects")
Fixes: 4fa3d66f13 ("drm/shmem: Do dma_unmap_sg before purging pages")
Signed-off-by: Asahi Lina <lina@asahilina.net>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230205125124.2260-1-lina@asahilina.net

drm/shmem-helper: Switch to use drm_* debug helpers

Ease debugging of a multi-GPU system by using drm_WARN_*() and
drm_dbg_kms() helpers that print out DRM device name corresponding
to shmem GEM.

Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Suggested-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://lore.kernel.org/all/20230108210445.3948344-6-dmitry.osipenko@collabora.com/

drm/shmem-helper: Don't use vmap_use_count for dma-bufs

DMA-buf core has its own refcounting of vmaps, use it instead of drm-shmem
counting. This change prepares drm-shmem for addition of memory shrinker
support where drm-shmem will use a single dma-buf reservation lock for
all operations performed over dma-bufs.

Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://lore.kernel.org/all/20230108210445.3948344-7-dmitry.osipenko@collabora.com/

drm/shmem-helper: Switch to reservation lock

Replace all drm-shmem locks with a GEM reservation lock. This makes locks
consistent with dma-buf locking convention where importers are responsible
for holding reservation lock for all operations performed over dma-bufs,
preventing deadlock between dma-buf importers and exporters.

Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://lore.kernel.org/all/20230108210445.3948344-8-dmitry.osipenko@collabora.com/

drm/shmem-helper: Revert accidental non-GPL export

The referenced commit added a wrapper for drm_gem_shmem_get_pages_sgt(),
but in the process it accidentally changed the export type from GPL to
non-GPL. Switch it back to GPL.

Reported-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Fixes: ddddedaa0db9 ("drm/shmem-helper: Fix locking for drm_gem_shmem_get_pages_sgt()")
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230227-shmem-export-fix-v1-1-8880b2c25e81@asahilina.net

Revert "drm/shmem-helper: Switch to reservation lock"

This reverts commit 67b7836d4458790f1261e31fe0ce3250989784f0.

The locking appears incomplete. A caller of SHMEM helper's pin
function never acquires the dma-buf reservation lock. So we get

  WARNING: CPU: 3 PID: 967 at drivers/gpu/drm/drm_gem_shmem_helper.c:243 drm_gem_shmem_pin+0x42/0x90 [drm_shmem_helper]

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230228152612.19971-1-tzimmermann@suse.de

drm/shmem-helper: Switch to reservation lock

Replace all drm-shmem locks with a GEM reservation lock. This makes locks
consistent with dma-buf locking convention where importers are responsible
for holding reservation lock for all operations performed over dma-bufs,
preventing deadlock between dma-buf importers and exporters.

Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230529223935.2672495-7-dmitry.osipenko@collabora.com

drm/shmem-helper: Reset vma->vm_ops before calling dma_buf_mmap()

The dma-buf backend is supposed to provide its own vm_ops, but some
implementation just have nothing special to do and leave vm_ops
untouched, probably expecting this field to be zero initialized (this
is the case with the system_heap implementation for instance).
Let's reset vma->vm_ops to NULL to keep things working with these
implementations.

Fixes: 26d3ac3cb0 ("drm/shmem-helpers: Redirect mmap for imported dma-buf")
Cc: <stable@vger.kernel.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230724112610.60974-1-boris.brezillon@collabora.com

iommu: Allow passing custom allocators to pgtable drivers

This will be useful for GPU drivers who want to keep page tables in a
pool so they can:

- keep freed page tables in a free pool and speed-up upcoming page
  table allocations
- batch page table allocation instead of allocating one page at a time
- pre-reserve pages for page tables needed for map/unmap operations,
  to ensure map/unmap operations don't try to allocate memory in paths
  they're allowed to block or fail

It might also be valuable for other aspects of GPU and similar
use-cases, like fine-grained memory accounting and resource limiting.

We will extend the Arm LPAE format to support custom allocators in a
separate commit.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20231124142434.1577550-2-boris.brezillon@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu: Extend LPAE page table format to support custom allocators

We need that in order to implement the VM_BIND ioctl in the GPU driver
targeting new Mali GPUs.

VM_BIND is about executing MMU map/unmap requests asynchronously,
possibly after waiting for external dependencies encoded as dma_fences.
We intend to use the drm_sched framework to automate the dependency
tracking and VM job dequeuing logic, but this comes with its own set
of constraints, one of them being the fact we are not allowed to
allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this
sort of deadlocks:

- VM_BIND map job needs to allocate a page table to map some memory
  to the VM. No memory available, so kswapd is kicked
- GPU driver shrinker backend ends up waiting on the fence attached to
  the VM map job or any other job fence depending on this VM operation.

With custom allocators, we will be able to pre-reserve enough pages to
guarantee the map/unmap operations we queued will take place without
going through the system allocator. But we can also optimize
allocation/reservation by not free-ing pages immediately, so any
upcoming page table allocation requests can be serviced by some free
page table pool kept at the driver level.

I might also be valuable for other aspects of GPU and similar
use-cases, like fine-grained memory accounting and resource limiting.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20231124142434.1577550-3-boris.brezillon@collabora.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

drm/sched: Add FIFO sched policy to run queue

When many entities are competing for the same run queue
on the same scheduler, we observe an unusually long wait
times and some jobs get starved. This has been observed on GPUVis.

The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queues within entity some
jobs could be stuck for very long time in it's entity's
queue before being popped from the queue and executed
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
then the job in the long queue.

Fix:
Add FIFO selection policy to entities in run queue, chose next entity
on run queue in such order that if job on one entity arrived
earlier then job on another entity the first job will start
executing earlier regardless of the length of the entity's job
queue.

v2:
Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue

Drop default option in module control parameter.

v3:
Various cosmetical fixes and minor refactoring of fifo update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)

v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)

v6: Add missing drm_sched_rq_remove_fifo_locked

v7: Fix ts sampling bug and more cosmetic stuff (Luben)

v8: Fix module parameter string (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Tested-by: Yunxiang Li (Teddy) <Yunxiang.Li@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220930041258.1050247-1-luben.tuikov@amd.com

drm/scheduler: Set the FIFO scheduling policy as the default

The currently default Round-Robin GPU scheduling can result in starvation
of entities which have a large number of jobs, over entities which have
a very small number of jobs (single digit).

This can be illustrated in the following diagram, where jobs are
alphabetized to show their chronological order of arrival, where job A is
the oldest, B is the second oldest, and so on, to J, the most recent job to
arrive.

    ---> entities
j | H-F-----A--E--I--
o | --G-----B-----J--
b | --------C--------
s\/ --------D--------

WLOG, assuming all jobs are "ready", then a R-R scheduling will execute them
in the following order (a slice off of the top of the entities' list),

H, F, A, E, I, G, B, J, C, D.

However, to mitigate job starvation, we'd rather execute C and D before E,
and so on, given, of course, that they're all ready to be executed.

So, if all jobs are ready at this instant, the order of execution for this
and the next 9 instances of picking the next job to execute, should really
be,

A, B, C, D, E, F, G, H, I, J,

which is their chronological order. The only reason for this order to be
broken, is if an older job is not yet ready, but a younger job is ready, at
an instant of picking a new job to execute. For instance if job C wasn't
ready at time 2, but job D was ready, then we'd pick job D, like this:

0 +1 +2  ...
A, B, D, ...

And from then on, C would be preferred before all other jobs, if it is ready
at the time when a new job for execution is picked. So, if C became ready
two steps later, the execution order would look like this:

......0 +1 +2  ...
A, B, D, E, C, F, G, H, I, J

This is what the FIFO GPU scheduling algorithm achieves. It uses a
Red-Black tree to keep jobs sorted in chronological order, where picking
the oldest job is O(1) (we use the "cached" structure), and balancing the
tree is O(log n). IOW, it picks the *oldest ready* job to execute now.

The implementation is already in the kernel, and this commit only changes
the default GPU scheduling algorithm to use.

This was tested and achieves about 1% faster performance over the Round
Robin algorithm.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221024212634.27230-1-luben.tuikov@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>

drm/scheduler: add drm_sched_job_add_resv_dependencies

Add a new function to update job dependencies from a resv obj.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-3-christian.koenig@amd.com

drm/scheduler: remove drm_sched_dependency_optimized

Not used any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-12-christian.koenig@amd.com

drm/scheduler: rework entity flush, kill and fini

This was buggy because when we had to wait for entities which were
killed as well we would just deadlock.

Instead move all the dependency handling into the callbacks so that
will all happen asynchronously.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-13-christian.koenig@amd.com

drm/scheduler: rename dependency callback into prepare_job

This now matches much better what this is doing.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-14-christian.koenig@amd.com

drm/amdgpu: revert "implement tdr advanced mode"

This reverts commit e6c6338f39.

This feature basically re-submits one job after another to
figure out which one was the one causing a hang.

This is obviously incompatible with gang-submit which requires
that multiple jobs run at the same time. It's also absolutely
not helpful to crash the hardware multiple times if a clean
recovery is desired.

For testing and debugging environments we should rather disable
recovery alltogether to be able to inspect the state with a hw
debugger.

Additional to that the sw implementation is clearly buggy and causes
reference count issues for the hardware fence.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/scheduler: Fix lockup in drm_sched_entity_kill()

The drm_sched_entity_kill() is invoked twice by drm_sched_entity_destroy()
while userspace process is exiting or being killed. First time it's invoked
when sched entity is flushed and second time when entity is released. This
causes a lockup within wait_for_completion(entity_idle) due to how completion
API works.

Calling wait_for_completion() more times than complete() was invoked is a
error condition that causes lockup because completion internally uses
counter for complete/wait calls. The complete_all() must be used instead
in such cases.

This patch fixes lockup of Panfrost driver that is reproducible by killing
any application in a middle of 3d drawing operation.

Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221123001303.533968-1-dmitry.osipenko@collabora.com

drm/scheduler: deprecate drm_sched_resubmit_jobs

This interface is not working as it should.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221109095010.141189-5-christian.koenig@amd.com

drm/scheduler: track GPU active time per entity

Track the accumulated time that jobs from this entity were active
on the GPU. This allows drivers using the scheduler to trivially
implement the DRM fdinfo when the hardware doesn't provide more
specific information than signalling job completion anyways.

[Bagas: Append missing colon to @elapsed_ns]
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

drm/sched: Create wrapper to add a syncobj dependency to job

In order to add a syncobj's fence as a dependency to a job, it is
necessary to call drm_syncobj_find_fence() to find the fence and then
add the dependency with drm_sched_job_add_dependency(). So, wrap these
steps in one single function, drm_sched_job_add_syncobj_dependency().

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20230209124447.467867-2-mcanal@igalia.com

drm/scheduler: Fix variable name in function description

Compiling AMD GPU drivers displays two warnings:

drivers/gpu/drm/scheduler/sched_main.c:738: warning: Function parameter or member 'file' not described in 'drm_sched_job_add_syncobj_dependency'
drivers/gpu/drm/scheduler/sched_main.c:738: warning: Excess function
parameter 'file_private' description in
'drm_sched_job_add_syncobj_dependency'

Get rid of them by renaming the variable name on the function description

Signed-off-by: Caio Novais <caionovais@usp.br>
Link: https://lore.kernel.org/r/20230325131532.6356-1-caionovais@usp.br
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>

drm/scheduler: Add fence deadline support

As the finished fence is the one that is exposed to userspace, and
therefore the one that other operations, like atomic update, would
block on, we need to propagate the deadline from from the finished
fence to the actual hw fence.

v2: Split into drm_sched_fence_set_parent() (ckoenig)
v3: Ensure a thread calling drm_sched_fence_set_deadline_finished() sees
    fence->parent set before drm_sched_fence_set_parent() does this
    test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>

Revert "drm/scheduler: track GPU active time per entity"

This reverts commit df622729ddbf as it introduces a use-after-free,
which isn't easy to fix without going back to the design drawing board.

Reported-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

drm/scheduler: Fix UAF race in drm_sched_entity_push_job()

After a job is pushed into the queue, it is owned by the scheduler core
and may be freed at any time, so we can't write nor read the submit
timestamp after that point.

Fixes oopses observed with the drm/asahi driver, found with kASAN.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Link: https://lore.kernel.org/r/20230406-scheduler-uaf-2-v1-1-972531cf0a81@asahilina.net
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>

drm/sched: Check scheduler ready before calling timeout handling

During an IGT GPU reset test we see the following oops,

[  +0.000003] ------------[ cut here ]------------
[  +0.000000] WARNING: CPU: 9 PID: 0 at kernel/workqueue.c:1656 __queue_delayed_work+0x6d/0xa0
[  +0.000004] Modules linked in: iptable_filter bpfilter amdgpu(OE) nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr ledtrig_audio snd_hda_codec_hdmi intel_rapl_common snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core iommu_v2 gpu_sched(OE) kvm_amd drm_buddy snd_hwdep kvm video drm_ttm_helper snd_pcm ttm snd_seq_midi drm_display_helper snd_seq_midi_event snd_rawmidi cec crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_seq aesni_intel rc_core crypto_simd cryptd binfmt_misc drm_kms_helper rapl snd_seq_device input_leds joydev snd_timer i2c_algo_bit syscopyarea snd ccp sysfillrect sysimgblt wmi_bmof k10temp soundcore mac_hid sch_fq_codel msr parport_pc ppdev drm lp parport ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid r8169 ahci xhci_pci gpio_amdpt realtek i2c_piix4 wmi crc32_pclmul xhci_pci_renesas libahci gpio_generic
[  +0.000070] CPU: 9 PID: 0 Comm: swapper/9 Tainted: G        W OE      6.1.11+ #2
[  +0.000003] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming 3/AB350-Gaming 3-CF, BIOS F7 06/16/2017
[  +0.000001] RIP: 0010:__queue_delayed_work+0x6d/0xa0
[  +0.000003] Code: 7a 50 48 01 c1 48 89 4a 30 81 ff 00 20 00 00 75 38 4c 89 cf e8 64 3e 0a 00 5d e9 1e c5 11 01 e8 99 f7 ff ff 5d e9 13 c5 11 01 <0f> 0b eb c1 0f 0b 48 81 7a 38 70 5c 0e 81 74 9f 0f 0b 48 8b 42 28
[  +0.000002] RSP: 0018:ffffc90000398d60 EFLAGS: 00010007
[  +0.000002] RAX: ffff88810d589c60 RBX: 0000000000000000 RCX: 0000000000000000
[  +0.000002] RDX: ffff88810d589c58 RSI: 0000000000000000 RDI: 0000000000002000
[  +0.000001] RBP: ffffc90000398d60 R08: 0000000000000000 R09: ffff88810d589c78
[  +0.000002] R10: 72705f305f39765f R11: 7866673a6d72645b R12: ffff88810d589c58
[  +0.000001] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000000000
[  +0.000002] FS:  0000000000000000(0000) GS:ffff8887fee40000(0000) knlGS:0000000000000000
[  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000002] CR2: 00005562c4797fa0 CR3: 0000000110da0000 CR4: 00000000003506e0
[  +0.000002] Call Trace:
[  +0.000001]  <IRQ>
[  +0.000001]  mod_delayed_work_on+0x5e/0xa0
[  +0.000004]  drm_sched_fault+0x23/0x30 [gpu_sched]
[  +0.000007]  gfx_v9_0_fault.isra.0+0xa6/0xd0 [amdgpu]
[  +0.000258]  gfx_v9_0_priv_reg_irq+0x29/0x40 [amdgpu]
[  +0.000254]  amdgpu_irq_dispatch+0x1ac/0x2b0 [amdgpu]
[  +0.000243]  amdgpu_ih_process+0x89/0x130 [amdgpu]
[  +0.000245]  amdgpu_irq_handler+0x24/0x60 [amdgpu]
[  +0.000165]  __handle_irq_event_percpu+0x4f/0x1a0
[  +0.000003]  handle_irq_event_percpu+0x15/0x50
[  +0.000001]  handle_irq_event+0x39/0x60
[  +0.000002]  handle_edge_irq+0xa8/0x250
[  +0.000003]  __common_interrupt+0x7b/0x150
[  +0.000002]  common_interrupt+0xc1/0xe0
[  +0.000003]  </IRQ>
[  +0.000000]  <TASK>
[  +0.000001]  asm_common_interrupt+0x27/0x40
[  +0.000002] RIP: 0010:native_safe_halt+0xb/0x10
[  +0.000003] Code: 46 ff ff ff cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d 69 f2 5e 00 f4 e9 f1 3b 3e 00 90 eb 07 0f 00 2d 59 f2 5e 00 fb f4 <e9> e0 3b 3e 00 0f 1f 44 00 00 55 48 89 e5 53 e8 b1 d4 fe ff 66 90
[  +0.000002] RSP: 0018:ffffc9000018fdc8 EFLAGS: 00000246
[  +0.000002] RAX: 0000000000004000 RBX: 000000000002e5a8 RCX: 000000000000001f
[  +0.000001] RDX: 0000000000000001 RSI: ffff888101298800 RDI: ffff888101298864
[  +0.000001] RBP: ffffc9000018fdd0 R08: 000000527f64bd8b R09: 000000000001dc90
[  +0.000001] R10: 000000000001dc90 R11: 0000000000000003 R12: 0000000000000001
[  +0.000001] R13: ffff888101298864 R14: ffffffff832d9e20 R15: ffff888193aa8c00
[  +0.000003]  ? acpi_idle_do_entry+0x5e/0x70
[  +0.000002]  acpi_idle_enter+0xd1/0x160
[  +0.000003]  cpuidle_enter_state+0x9a/0x6e0
[  +0.000003]  cpuidle_enter+0x2e/0x50
[  +0.000003]  call_cpuidle+0x23/0x50
[  +0.000002]  do_idle+0x1de/0x260
[  +0.000002]  cpu_startup_entry+0x20/0x30
[  +0.000002]  start_secondary+0x120/0x150
[  +0.000003]  secondary_startup_64_no_verify+0xe5/0xeb
[  +0.000004]  </TASK>
[  +0.000000] ---[ end trace 0000000000000000 ]---
[  +0.000003] BUG: kernel NULL pointer dereference, address: 0000000000000102
[  +0.006233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=3, emitted seq=4
[  +0.000734] #PF: supervisor read access in kernel mode
[  +0.009670] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process amd_deadlock pid 2002 thread amd_deadlock pid 2002
[  +0.005135] #PF: error_code(0x0000) - not-present page
[  +0.000002] PGD 0 P4D 0
[  +0.000002] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  +0.000002] CPU: 9 PID: 0 Comm: swapper/9 Tainted: G        W OE      6.1.11+ #2
[  +0.000002] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming 3/AB350-Gaming 3-CF, BIOS F7 06/16/2017
[  +0.012101] amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
[  +0.005136] RIP: 0010:__queue_work+0x1f/0x4e0
[  +0.000004] Code: 87 cd 11 01 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 49 89 d5 41 54 49 89 f4 53 48 83 ec 10 89 7d d4 <f6> 86 02 01 00 00 01 0f 85 6c 03 00 00 e8 7f 36 08 00 8b 45 d4 48

For gfx_rings the schedulers may not be initialized by
amdgpu_device_init_schedulers() due to ring->no_scheduler flag being set to
true and thus the timeout_wq is NULL. As a result, since all ASICs call
drm_sched_fault() unconditionally even for schedulers which have not been
initialized, it is simpler to use the ready condition which indicates whether
the given scheduler worker thread runs and whether the timeout_wq of the reset
domain has been initialized.

Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20230406200054.633379-1-luben.tuikov@amd.com

drm/scheduler: set entity to NULL in drm_sched_entity_pop_job()

It already happend a few times that patches slipped through which
implemented access to an entity through a job that was already removed
from the entities queue. Since jobs and entities might have different
lifecycles, this can potentially cause UAF bugs.

In order to make it obvious that a jobs entity pointer shouldn't be
accessed after drm_sched_entity_pop_job() was called successfully, set
the jobs entity pointer to NULL once the job is removed from the entity
queue.

Moreover, debugging a potential NULL pointer dereference is way easier
than potentially corrupted memory through a UAF.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://lore.kernel.org/r/20230418100453.4433-1-dakr@redhat.com
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>

drm/scheduler: properly forward fence errors

When a hw fence is signaled with an error properly forward that to the
finished fence.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230420115752.31470-1-christian.koenig@amd.com

drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled

Switch to using RCU handling for the last scheduled job and add a
function to return the error code of it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230420115752.31470-2-christian.koenig@amd.com

drm/scheduler: mark jobs without fence as canceled

When no hw fence is provided for a job that means that the job didn't executed.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230427122726.1290170-1-christian.koenig@amd.com

drm/sched: Check scheduler work queue before calling timeout handling

During an IGT GPU reset test we see again oops despite of
commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling
timeout handling).

It uses ready condition whether to call drm_sched_fault which unwind
the TDR leads to GPU reset.
However it looks the ready condition is overloaded with other meanings,
for example, for the following stack is related GPU reset :

0  gfx_v9_0_cp_gfx_start
1  gfx_v9_0_cp_gfx_resume
2  gfx_v9_0_cp_resume
3  gfx_v9_0_hw_init
4  gfx_v9_0_resume
5  amdgpu_device_ip_resume_phase2

does the following:
	/* start the ring */
	gfx_v9_0_cp_gfx_start(adev);
	ring->sched.ready = true;

The same approach is for other ASICs as well :
gfx_v8_0_cp_gfx_resume
gfx_v10_0_kiq_resume, etc...

As a result, our GPU reset test causes GPU fault which calls unconditionally gfx_v9_0_fault
and then drm_sched_fault. However now it depends on whether the interrupt service routine
drm_sched_fault is executed after gfx_v9_0_cp_gfx_start is completed which sets the ready
field of the scheduler to true even  for uninitialized schedulers and causes oops vs
no fault or when ISR  drm_sched_fault is completed prior  gfx_v9_0_cp_gfx_start and
NULL pointer dereference does not occur.

Use the field timeout_wq  to prevent oops for uninitialized schedulers.
The field could be initialized by the work queue of resetting the domain.

v1: Corrections to commit message (Luben)

Fixes: 11b3b9f461c5c4 ("drm/sched: Check scheduler ready before calling timeout handling")
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Link: https://lore.kernel.org/r/20230510135111.58631-1-vitaly.prosyak@amd.com
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>

drm/sched: Remove redundant check

The rq pointer points inside the drm_gpu_scheduler structure. Thus
it can't be NULL.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: c61cdbdbff ("drm/scheduler: Fix hang when sched_entity released")
Signed-off-by: Vladislav Efanov <VEfanov@ispras.ru>
Link: https://lore.kernel.org/r/20230517125247.434103-1-VEfanov@ispras.ru
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>

drm/sched: Rename to drm_sched_can_queue()

Rename drm_sched_ready() to drm_sched_can_queue(). "ready" can mean many
things and is thus meaningless in this context. Instead, rename to a name
which precisely conveys what is being checked.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Link: https://lore.kernel.org/r/20230517233550.377847-1-luben.tuikov@amd.com

drm/sched: Rename to drm_sched_wakeup_if_can_queue()

Rename drm_sched_wakeup() to drm_sched_wakeup_if_canqueue() since the former
is misleading, as it wakes up the GPU scheduler _only if_ more jobs can be
queued to the underlying hardware.

This distinction is important to make, since the wake conditional in the GPU
scheduler thread wakes up when other conditions are also true, e.g. when there
are jobs to be cleaned. For instance, a user might want to wake up the
scheduler only because there are more jobs to clean, but whether we can queue
more jobs is irrelevant.

v2: Separate "canqueue" to "can_queue". (Alex D.)

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20230517233550.377847-2-luben.tuikov@amd.com
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>

drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence

[Why]
drm_sched_entity_add_dependency_cb ignores the scheduled fence and return false.
If entity's dependency is a scheduler error fence and drm_sched_stop is called
due to TDR, drm_sched_entity_pop_job will wait for the dependency infinitely.

[How]
Do not wait or ignore the scheduled error fence, add drm_sched_entity_wakeup
callback for the dependency with scheduled error fence.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()

drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
from the dependency array that was waited upon before
drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
so we're basically waiting for all dependencies except one.

In theory, this wait shouldn't be needed because resources should have
their users registered to the dma_resv object, thus guaranteeing that
future jobs wanting to access these resources wait on all the previous
users (depending on the access type, of course). But we want to keep
these explicit waits in the kill entity path just in case.

Let's make sure we keep all dependencies in the array in
drm_sched_job_dependency(), so we can iterate over the array and wait
in drm_sched_entity_kill_jobs_cb().

We also make sure we wait on drm_sched_fence::finished if we were
originally asked to wait on drm_sched_fence::scheduled. In that case,
we assume the intent was to delegate the wait to the firmware/GPU or
rely on the pipelining done at the entity/scheduler level, but when
killing jobs, we really want to wait for completion not just scheduling.

v2:
- Don't evict deps in drm_sched_job_dependency()

v3:
- Always wait for drm_sched_fence::finished fences in
  drm_sched_entity_kill_jobs_cb() when we see a sched_fence

v4:
- Fix commit message
- Fix a use-after-free bug

v5:
- Flag deps on which we should only wait for the scheduled event
  at insertion time

v6:
- Back to v4 implementation
- Add Christian's R-b

Cc: Frank Binns <frank.binns@imgtec.com>
Cc: Sarah Walker <sarah.walker@imgtec.com>
Cc: Donald Robson <donald.robson@imgtec.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: "Christian König" <christian.koenig@amd.com>
Reviewed-by: "Christian König" <christian.koenig@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230619071921.3465992-1-boris.brezillon@collabora.com

drm/sched: Call drm_sched_fence_set_parent() from drm_sched_fence_scheduled()

Drivers that can delegate waits to the firmware/GPU pass the scheduled
fence to drm_sched_job_add_dependency(), and issue wait commands to
the firmware/GPU at job submission time. For this to be possible, they
need all their 'native' dependencies to have a valid parent since this
is where the actual HW fence information are encoded.

In drm_sched_main(), we currently call drm_sched_fence_set_parent()
after drm_sched_fence_scheduled(), leaving a short period of time
during which the job depending on this fence can be submitted.

Since setting parent and signaling the fence are two things that are
kinda related (you can't have a parent if the job hasn't been
scheduled),
it probably makes sense to pass the parent fence to
drm_sched_fence_scheduled() and let it call drm_sched_fence_set_parent()
before it signals the scheduled fence.

Here is a detailed description of the race we are fixing here:

Thread A				Thread B

- calls drm_sched_fence_scheduled()
- signals s_fence->scheduled which
  wakes up thread B

					- entity dep signaled, checking
					  the next dep
					- no more deps waiting
					- entity is picked for job
					  submission by drm_gpu_scheduler
					- run_job() is called
					- run_job() tries to
					  collect native fence info from
					  s_fence->parent, but it's
					  NULL =>
					  BOOM, we can't do our native
					  wait

- calls drm_sched_fence_set_parent()

v2:
* Fix commit message

v3:
* Add a detailed description of the race to the commit message
* Add Luben's R-b

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Frank Binns <frank.binns@imgtec.com>
Cc: Sarah Walker <sarah.walker@imgtec.com>
Cc: Donald Robson <donald.robson@imgtec.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230623075204.382350-1-boris.brezillon@collabora.com

dma-buf: add dma_fence_timestamp helper

When a fence signals there is a very small race window where the timestamp
isn't updated yet. sync_file solves this by busy waiting for the
timestamp to appear, but on other ocassions didn't handled this
correctly.

Provide a dma_fence_timestamp() helper function for this and use it in
all appropriate cases.

Another alternative would be to grab the spinlock when that happens.

v2 by teddy: add a wait parameter to wait for the timestamp to show up, in case
   the accurate timestamp is needed and/or the timestamp is not based on
   ktime (e.g. hw timestamp)
v3 chk: drop the parameter again for unified handling

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 1774baa64f ("drm/scheduler: Change scheduled fence track v2")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20230929104725.2358-1-christian.koenig@amd.com

drm/sched: Convert the GPU scheduler to variable number of run-queues

The GPU scheduler has now a variable number of run-queues, which are set
up at
drm_sched_init() time. This way, each driver announces how many
run-queues it
requires (supports) per each GPU scheduler it creates. Note, that
run-queues
correspond to scheduler "priorities", thus if the number of run-queues
is set
to 1 at drm_sched_init(), then that scheduler supports a single
run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler
and
a scheduled entity.

Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Emma Anholt <emma@anholt.net>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com

drm/sched: Add drm_sched_wqueue_* helpers

Add scheduler wqueue ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.

v2:
  - s/sched_wqueue/sched_wqueue (Luben)
  - Remove the extra white line after the return-statement (Luben)
  - update drm_sched_wqueue_ready comment (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Convert drm scheduler to use a work queue rather than kthread

In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.

1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.

2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.

A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.

v2:
  - (Rob Clark) Fix msm build
  - Pass in run work queue
v3:
  - (Boris) don't have loop in worker
v4:
  - (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
  - (Boris) default to ordered work queue
v6:
  - (Luben / checkpatch) fix alignment in msm_ringbuffer.c
  - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
  - (Luben) Update comment for drm_sched_wqueue_enqueue
  - (Luben) Positive check for submit_wq in drm_sched_init
  - (Luben) s/alloc_submit_wq/own_submit_wq
v7:
  - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
  - (Luben) Adjust var names / comments

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Split free_job into own work item

Rather than call free_job and run_job in same work item have a dedicated
work item for each. This aligns with the design and intended use of work
queues.

v2:
   - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
     timestamp in free_job() work item (Danilo)
v3:
  - Drop forward dec of drm_sched_select_entity (Boris)
  - Return in drm_sched_run_job_work if entity NULL (Boris)
v4:
  - Replace dequeue with peek and invert logic (Luben)
  - Wrap to 100 lines (Luben)
  - Update comments for *_queue / *_queue_if_ready functions (Luben)
v5:
  - Drop peek argument, blindly reinit idle (Luben)
  - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
  - Update work_run_job & work_free_job kernel doc (Luben)
v6:
  - Do not move drm_sched_select_entity in file (Luben)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-4-matthew.brost@intel.com
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Add drm_sched_start_timeout_unlocked helper

Also add a lockdep assert to drm_sched_start_timeout.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-5-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Add a helper to queue TDR immediately

Add a helper whereby a driver can invoke TDR immediately.

v2:
 - Drop timeout args, rename function, use mod delayed work (Luben)
v3:
 - s/XE/Xe (Luben)
 - present tense in commit message (Luben)
 - Adjust comment for drm_sched_tdr_queue_imm (Luben)
v4:
 - Adjust commit message (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-6-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Rename drm_sched_get_cleanup_job to be more descriptive

"Get cleanup job" makes it sound like helper is returning a job which will
execute some cleanup, or something, while the kerneldoc itself accurately
says "fetch the next _finished_ job". So lets rename the helper to be self
documenting.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-2-tvrtko.ursulin@linux.intel.com
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Move free worker re-queuing out of the if block

Whether or not there are more jobs to clean up does not depend on the
existance of the current job, given both drm_sched_get_finished_job and
drm_sched_free_job_queue_if_done take and drop the job list lock.
Therefore it is confusing to make it read like there is a dependency.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-3-tvrtko.ursulin@linux.intel.com
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Rename drm_sched_free_job_queue to be more descriptive

The current name makes it sound like helper will free a queue, while what
it does is it enqueues the free job worker.

Rename it to drm_sched_run_free_queue to align with existing
drm_sched_run_job_queue.

Despite that creating an illusion there are two queues, while in reality
there is only one, at least it creates a consistent naming for the two
enqueuing helpers.

At the same time simplify the "if done" helper by dropping the suffix and
adding a double underscore prefix to the one which just enqueues.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-4-tvrtko.ursulin@linux.intel.com
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Rename drm_sched_run_job_queue_if_ready and clarify kerneldoc

"If ready" is not immediately clear what it means - is the scheduler
ready or something else? Drop the suffix, clarify kerneldoc, and employ
the same naming scheme as in drm_sched_run_free_queue:

 - drm_sched_run_job_queue   - enqueues if there is something to enqueue
                               *and* scheduler is ready (can queue)
 - __drm_sched_run_job_queue - low-level helper to simply queue the job

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-5-tvrtko.ursulin@linux.intel.com
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Drop suffix from drm_sched_wakeup_if_can_queue

Because a) helper is exported to other parts of the scheduler and
b) there isn't a plain drm_sched_wakeup to begin with, I think we can
drop the suffix and by doing so separate the intimiate knowledge
between the scheduler components a bit better.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-6-tvrtko.ursulin@linux.intel.com
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Don't disturb the entity when in RR-mode scheduling

Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
it do just that, schedule the work item for execution.

The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
to determine if the scheduler has an entity ready in one of its run-queues,
and in the case of the Round-Robin (RR) scheduling, the function
drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
which is ready, sets up the run-queue and completion and returns that
entity. The FIFO scheduling algorithm is unaffected.

Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
in the case of RR scheduling, that would result in drm_sched_select_entity()
having been called twice, which may result in skipping a ready entity if more
than one entity is ready. This commit fixes this by eliminating the call to
drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
in drm_sched_run_job_work().

v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
    Add fixes-tag. (Tvrtko)

Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231107041020.10035-2-ltuikov89@gmail.com

drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()

Don't "wake up" the GPU scheduler unless the entity is ready, as well as we
can queue to the scheduler, i.e. there is no point in waking up the scheduler
for the entity unless the entity is ready.

Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
Fixes: bc8d6a9df99038 ("drm/sched: Don't disturb the entity when in RR-mode scheduling")
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110000123.72565-2-ltuikov89@gmail.com

drm/sched: implement dynamic job-flow control

Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.

This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.

However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring run dry.

In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com

drm/sched: Fix bounds limiting when given a malformed entity

If we're given a malformed entity in drm_sched_entity_init()--shouldn't
happen, but we verify--with out-of-bounds priority value, we set it to an
allowed value. Fix the expression which sets this limit.

Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
Fixes: 56e449603f0ac5 ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Link: https://patchwork.freedesktop.org/patch/msgid/20231123122422.167832-2-ltuikov89@gmail.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/dbb91dbe-ef77-4d79-aaf9-2adb171c1d7a@amd.com

drm/sched: Rename priority MIN to LOW

Rename DRM_SCHED_PRIORITY_MIN to DRM_SCHED_PRIORITY_LOW.

This mirrors DRM_SCHED_PRIORITY_HIGH, for a list of DRM scheduler priorities
in ascending order,
  DRM_SCHED_PRIORITY_LOW,
  DRM_SCHED_PRIORITY_NORMAL,
  DRM_SCHED_PRIORITY_HIGH,
  DRM_SCHED_PRIORITY_KERNEL.

Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-5-ltuikov89@gmail.com

drm/sched: Reverse run-queue priority enumeration

Reverse run-queue priority enumeration such that the higest priority is now 0,
and for each consecutive integer the prioirty diminishes.

Run-queues correspond to priorities. To an external observer a scheduler
created with a single run-queue, and another created with
DRM_SCHED_PRIORITY_COUNT number of run-queues, should always schedule
sched->sched_rq[0] with the same "priority", as that index run-queue exists in
both schedulers, i.e. a scheduler with one run-queue or many. This patch makes
it so.

In other words, the "priority" of sched->sched_rq[n], n >= 0, is the same for
any scheduler created with any allowable number of run-queues (priorities), 0
to DRM_SCHED_PRIORITY_COUNT.

Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-6-ltuikov89@gmail.com

drm/sched: Partial revert of "Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()"

Commit f3123c2590005c, in combination with the use of work queues by the GPU
scheduler, leads to random lock-ups of the GUI.

This is a partial revert of of commit f3123c2590005c since drm_sched_wakeup() still
needs its entity argument to pass it to drm_sched_can_queue().

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2994
Link: https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html

Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20231127160955.87879-1-spasswolf@web.de
Link: https://lore.kernel.org/r/36bece178ff5dc705065e53d1e5e41f6db6d87e4.camel@web.de
Fixes: f3123c2590005c ("drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()")
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: One function call less in drm_sched_init() after error detection

The kfree() function was called in one case by the
drm_sched_init() function during error handling
even if the passed data structure member contained a null pointer.
This issue was detected by using the Coccinelle software.

Thus adjust a jump target.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: https://patchwork.freedesktop.org/patch/msgid/85066512-983d-480c-a44d-32405ab1b80e@web.de
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Return an error code only as a constant in drm_sched_init()

Return an error code without storing it in an intermediate variable.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: https://patchwork.freedesktop.org/patch/msgid/85f8004e-f0c9-42d9-8c59-30f1b4e0b89e@web.de
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

drm/sched: Drain all entities in DRM sched run job worker

All entities must be drained in the DRM scheduler run job worker to
avoid the following case. An entity found that is ready, no job found
ready on entity, and run job worker goes idle with other entities + jobs
ready. Draining all ready entities (i.e. loop over all ready entities)
in the run job worker ensures all job that are ready will be scheduled.

Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuOWtoeeE+q26zE+Q@mail.gmail.com/
Reported-and-tested-by: Mario Limonciello <mario.limonciello@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3124
Link: https://lore.kernel.org/all/20240123021155.2775-1-mario.limonciello@amd.com/
Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz>
Closes: https://lore.kernel.org/dri-devel/05ddb2da-b182-4791-8ef7-82179fd159a8@amd.com/T/#m0c31d4d1b9ae9995bb880974c4f1dbaddc33a48a
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124210811.1639040-1-matthew.brost@intel.com

drm/drm_exec: Work around a WW mutex lockdep oddity

If *any* object of a certain WW mutex class is locked, lockdep will
consider *all* mutexes of that class as locked. Also the lock allocation
tracking code will apparently register only the address of the first
mutex of a given class locked in a sequence.
This has the odd consequence that if that first mutex is unlocked while
other mutexes of the same class remain locked and then its memory then
freed, the lock alloc tracking code will incorrectly assume that memory
is freed with a held lock in there.

For now, work around that for drm_exec by releasing the first grabbed
object lock last.

v2:
- Fix a typo (Danilo Krummrich)
- Reword the commit message a bit.
- Add a Fixes: tag

Related lock alloc tracking warning:
[  322.660067] =========================
[  322.660070] WARNING: held lock freed!
[  322.660074] 6.5.0-rc7+ #155 Tainted: G     U           N
[  322.660078] -------------------------
[  322.660081] kunit_try_catch/4981 is freeing memory ffff888112adc000-ffff888112adc3ff, with a lock still held there!
[  322.660089] ffff888112adc1a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x11a/0x600 [drm_exec]
[  322.660104] 2 locks held by kunit_try_catch/4981:
[  322.660108]  #0: ffffc9000343fe18 (reservation_ww_class_acquire){+.+.}-{0:0}, at: test_early_put+0x22f/0x490 [drm_exec_test]
[  322.660123]  #1: ffff888112adc1a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x11a/0x600 [drm_exec]
[  322.660135]
               stack backtrace:
[  322.660139] CPU: 7 PID: 4981 Comm: kunit_try_catch Tainted: G     U           N 6.5.0-rc7+ #155
[  322.660146] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
[  322.660152] Call Trace:
[  322.660155]  <TASK>
[  322.660158]  dump_stack_lvl+0x57/0x90
[  322.660164]  debug_check_no_locks_freed+0x20b/0x2b0
[  322.660172]  slab_free_freelist_hook+0xa1/0x160
[  322.660179]  ? drm_exec_unlock_all+0x168/0x2a0 [drm_exec]
[  322.660186]  __kmem_cache_free+0xb2/0x290
[  322.660192]  drm_exec_unlock_all+0x168/0x2a0 [drm_exec]
[  322.660200]  drm_exec_fini+0xf/0x1c0 [drm_exec]
[  322.660206]  test_early_put+0x289/0x490 [drm_exec_test]
[  322.660215]  ? __pfx_test_early_put+0x10/0x10 [drm_exec_test]
[  322.660222]  ? __kasan_check_byte+0xf/0x40
[  322.660227]  ? __ksize+0x63/0x140
[  322.660233]  ? drmm_add_final_kfree+0x3e/0xa0 [drm]
[  322.660289]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[  322.660294]  ? lockdep_hardirqs_on+0x7d/0x100
[  322.660301]  ? __pfx_kunit_try_run_case+0x10/0x10 [kunit]
[  322.660310]  ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [kunit]
[  322.660319]  kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit]
[  322.660328]  kthread+0x2e7/0x3c0
[  322.660334]  ? __pfx_kthread+0x10/0x10
[  322.660339]  ret_from_fork+0x2d/0x70
[  322.660345]  ? __pfx_kthread+0x10/0x10
[  322.660349]  ret_from_fork_asm+0x1b/0x30
[  322.660358]  </TASK>
[  322.660818]     ok 8 test_early_put

Cc: Christian König <christian.koenig@amd.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Fixes: 09593216bff1 ("drm: execution context for GEM buffers v7")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230906095039.3320-4-thomas.hellstrom@linux.intel.com

drm/panthor: Add uAPI

Panthor follows the lead of other recently submitted drivers with
ioctls allowing us to support modern Vulkan features, like sparse memory
binding:

- Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET),
  with the 'exclusive-VM' bit to speed-up BO reservation on job
submission
- VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND
  ioctl is loosely based on the Xe model, and can handle both
  asynchronous and synchronous requests
- GPU execution context creation/destruction, tiler heap context
creation
  and job submission. Those ioctls reflect how the hardware/scheduler
  works and are thus driver specific.

We also have a way to expose IO regions, such that the usermode driver
can directly access specific/well-isolate registers, like the
LATEST_FLUSH register used to implement cache-flush reduction.

This uAPI intentionally keeps usermode queues out of the scope, which
explains why doorbell registers and command stream ring-buffers are not
directly exposed to userspace.

v6:
- Add Maxime's and Heiko's acks

v5:
- Fix typo
- Add Liviu's R-b

v4:
- Add a VM_GET_STATE ioctl
- Fix doc
- Expose the CORE_FEATURES register so we can deal with variants in the
  UMD
- Add Steve's R-b

v3:
- Add the concept of sync-only VM operation
- Fix support for 32-bit userspace
- Rework drm_panthor_vm_create to pass the user VA size instead of
  the kernel VA size (suggested by Robin Murphy)
- Typo fixes
- Explicitly cast enums with top bit set to avoid compiler warnings in
  -pedantic mode.
- Drop property core_group_count as it can be easily calculated by the
  number of bits set in l2_present.

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-2-boris.brezillon@collabora.com

drm/panthor: Add GPU register definitions

Those are the registers directly accessible through the MMIO range.

FW registers are exposed in panthor_fw.h.

v6:
- Add Maxime's and Heiko's acks

v4:
- Add the CORE_FEATURES register (needed for GPU variants)
- Add Steve's R-b

v3:
- Add macros to extract GPU ID info
- Formatting changes
- Remove AS_TRANSCFG_ADRMODE_LEGACY - it doesn't exist post-CSF
- Remove CSF_GPU_LATEST_FLUSH_ID_DEFAULT
- Add GPU_L2_FEATURES_LINE_SIZE for extracting the GPU cache line size

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-3-boris.brezillon@collabora.com

drm/panthor: Add the device logical block

The panthor driver is designed in a modular way, where each logical
block is dealing with a specific HW-block or software feature. In order
for those blocks to communicate with each other, we need a central
panthor_device collecting all the blocks, and exposing some common
features, like interrupt handling, power management, reset, ...

This what this panthor_device logical block is about.

v6:
- Add Maxime's and Heiko's acks
- Keep header inclusion alphabetically ordered

v5:
- Suspend the MMU/GPU blocks if panthor_fw_resume() fails in
  panthor_device_resume()
- Move the pm_runtime_use_autosuspend() call before drm_dev_register()
- Add Liviu's R-b

v4:
- Check drmm_mutex_init() return code
- Fix panthor_device_reset_work() out path
- Fix the race in the unplug logic
- Fix typos
- Unplug blocks when something fails in panthor_device_init()
- Add Steve's R-b

v3:
- Add acks for the MIT+GPL2 relicensing
- Fix 32-bit support
- Shorten the sections protected by panthor_device::pm::mmio_lock to fix
  lock ordering issues.
- Rename panthor_device::pm::lock into panthor_device::pm::mmio_lock to
  better reflect what this lock is protecting
- Use dev_err_probe()
- Make sure we call drm_dev_exit() when something fails half-way in
  panthor_device_reset_work()
- Replace CSF_GPU_LATEST_FLUSH_ID_DEFAULT with a constant '1' and a
  comment to explain. Also remove setting the dummy flush ID on suspend.
- Remove drm_WARN_ON() in panthor_exception_name()
- Check pirq->suspended in panthor_xxx_irq_raw_handler()

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-4-boris.brezillon@collabora.com

drm/panthor: Add the GPU logical block

Handles everything that's not related to the FW, the MMU or the
scheduler. This is the block dealing with the GPU property retrieval,
the GPU block power on/off logic, and some global operations, like
global cache flushing.

v6:
- Add Maxime's and Heiko's acks

v5:
- Fix GPU_MODEL() kernel doc
- Fix test in panthor_gpu_block_power_off()
- Add Steve's R-b

v4:
- Expose CORE_FEATURES through DEV_QUERY

v3:
- Add acks for the MIT/GPL2 relicensing
- Use macros to extract GPU ID info
- Make sure we reset clear pending_reqs bits when wait_event_timeout()
  times out but the corresponding bit is cleared in GPU_INT_RAWSTAT
  (can happen if the IRQ is masked or HW takes to long to call the IRQ
  handler)
- GPU_MODEL now takes separate arch and product majors to be more
  readable.
- Drop GPU_IRQ_MCU_STATUS_CHANGED from interrupt mask.
- Handle GPU_IRQ_PROTM_FAULT correctly (don't output registers that are
  not updated for protected interrupts).
- Minor code tidy ups

Cc: Alexey Sheplyakov <asheplyakov@basealt.ru> # MIT+GPL2 relicensing
Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-5-boris.brezillon@collabora.com

drm/panthor: Add GEM logical block

Anything relating to GEM object management is placed here. Nothing
particularly interesting here, given the implementation is based on
drm_gem_shmem_object, which is doing most of the work.

v6:
- Add Maxime's and Heiko's acks
- Return a page-aligned BO size to userspace when creating a BO
- Keep header inclusion alphabetically ordered

v5:
- Add Liviu's and Steve's R-b

v4:
- Force kernel BOs to be GPU mapped
- Make panthor_kernel_bo_destroy() robust against ERR/NULL BO pointers
  to simplify the call sites

v3:
- Add acks for the MIT/GPL2 relicensing
- Provide a panthor_kernel_bo abstraction for buffer objects managed by
  the kernel (will replace panthor_fw_mem and be used everywhere we were
  using panthor_gem_create_and_map() before)
- Adjust things to match drm_gpuvm changes
- Change return of panthor_gem_create_with_handle() to int

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-6-boris.brezillon@collabora.com

drm/panthor: Add the devfreq logical block

Every thing related to devfreq in placed in panthor_devfreq.c, and
helpers that can be called by other logical blocks are exposed through
panthor_devfreq.h.

This implementation is loosely based on the panfrost implementation,
the only difference being that we don't count device users, because
the idle/active state will be managed by the scheduler logic.

v6:
- Add Maxime's and Heiko's acks
- Keep header inclusion alphabetically ordered

v4:
- Add Clément's A-b for the relicensing

v3:
- Add acks for the MIT/GPL2 relicensing

v2:
- Added in v2

Cc: Clément Péron <peron.clem@gmail.com> # MIT+GPL2 relicensing
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
Acked-by: Clément Péron <peron.clem@gmail.com> # MIT+GPL2 relicensing
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-7-boris.brezillon@collabora.com

drm/panthor: Add the MMU/VM logical block

MMU and VM management is related and placed in the same source file.

Page table updates are delegated to the io-pgtable-arm driver that's in
the iommu subsystem.

The VM management logic is based on drm_gpuva_mgr, and is assuming the
VA space is mostly managed by the usermode driver, except for a reserved
portion of this VA-space that's used for kernel objects (like the heap
contexts/chunks).

Both asynchronous and synchronous VM operations are supported, and
internal helpers are exposed to allow other logical blocks to map their
buffers in the GPU VA space.

There's one VM_BIND queue per-VM (meaning the Vulkan driver can only
expose one sparse-binding queue), and this bind queue is managed with
a 1:1 drm_sched_entity:drm_gpu_scheduler, such that each VM gets its own
independent execution queue, avoiding VM operation serialization at the
device level (things are still serialized at the VM level).

The rest is just implementation details that are hopefully well explained
in the documentation.

v6:
- Add Maxime's and Heiko's acks
- Add Steve's R-b
- Adjust the TRANSCFG value to account for SW VA space limitation on
  32-bit systems
- Keep header inclusion alphabetically ordered

v5:
- Fix a double panthor_vm_cleanup_op_ctx() call
- Fix a race between panthor_vm_prepare_map_op_ctx() and
  panthor_vm_bo_put()
- Fix panthor_vm_pool_destroy_vm() kernel doc
- Fix paddr adjustment in panthor_vm_map_pages()
- Fix bo_offset calculation in panthor_vm_get_bo_for_va()

v4:
- Add an helper to return the VM state
- Check drmm_mutex_init() return code
- Remove the VM from the AS reclaim list when panthor_vm_active() is
  called
- Count the number of active VM users instead of considering there's
  at most one user (several scheduling groups can point to the same
  vM)
- Pre-allocate a VMA object for unmap operations (unmaps can trigger
  a sm_step_remap() call)
- Check vm->root_page_table instead of vm->pgtbl_ops to detect if
  the io-pgtable is trying to allocate the root page table
- Don't memset() the va_node in panthor_vm_alloc_va(), make it a
  caller requirement
- Fix the kernel doc in a few places
- Drop the panthor_vm::base offset constraint and modify
  panthor_vm_put() to explicitly check for a NULL value
- Fix unbalanced vm_bo refcount in panthor_gpuva_sm_step_remap()
- Drop stale comments about the shared_bos list
- Patch mmu_features::va_bits on 32-bit builds to reflect the
  io_pgtable limitation and let the UMD know about it

v3:
- Add acks for the MIT/GPL2 relicensing
- Propagate MMU faults to the scheduler
- Move pages pinning/unpinning out of the dma_signalling path
- Fix 32-bit support
- Rework the user/kernel VA range calculation
- Make the auto-VA range explicit (auto-VA range doesn't cover the full
  kernel-VA range on the MCU VM)
- Let callers of panthor_vm_alloc_va() allocate the drm_mm_node
  (embedded in panthor_kernel_bo now)
- Adjust things to match the latest drm_gpuvm changes (extobj tracking,
  resv prep and more)
- Drop the per-AS lock and use slots_lock (fixes a race on vm->as.id)
- Set as.id to -1 when reusing an address space from the LRU list
- Drop misleading comment about page faults
- Remove check for irq being assigned in panthor_mmu_unplug()

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-8-boris.brezillon@collabora.com

drm/panthor: Add the FW logical block

Contains everything that's FW related, that includes the code dealing
with the microcontroller unit (MCU) that's running the FW, and anything
related to allocating memory shared between the FW and the CPU.

A few global FW events are processed in the IRQ handler, the rest is
forwarded to the scheduler, since scheduling is the primary reason for
the FW existence, and also the main source of FW <-> kernel
interactions.

v6:
- Add Maxime's and Heiko's acks
- Keep header inclusion alphabetically ordered

v5:
- Fix typo in GLB_PERFCNT_SAMPLE definition
- Fix unbalanced panthor_vm_idle/active() calls
- Fallback to a slow reset when the fast reset fails
- Add extra information when reporting a FW boot failure

v4:
- Add a MODULE_FIRMWARE() entry for gen 10.8
- Fix a wrong return ERR_PTR() in panthor_fw_load_section_entry()
- Fix typos
- Add Steve's R-b

v3:
- Make the FW path more future-proof (Liviu)
- Use one waitqueue for all FW events
- Simplify propagation of FW events to the scheduler logic
- Drop the panthor_fw_mem abstraction and use panthor_kernel_bo instead
- Account for the panthor_vm changes
- Replace magic number with 0x7fffffff with ~0 to better signify that
  it's the maximum permitted value.
- More accurate rounding when computing the firmware timeout.
- Add a 'sub iterator' helper function. This also adds a check that a
  firmware entry doesn't overflow the firmware image.
- Drop __packed from FW structures, natural alignment is good enough.
- Other minor code improvements.

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-9-boris.brezillon@collabora.com

drm/panthor: Add the heap logical block

Tiler heap growing requires some kernel driver involvement: when the
tiler runs out of heap memory, it will raise an exception which is
either directly handled by the firmware if some free heap chunks are
available in the heap context, or passed back to the kernel otherwise.
The heap helpers will be used by the scheduler logic to allocate more
heap chunks to a heap context, when such a situation happens.

Heap context creation is explicitly requested by userspace (using
the TILER_HEAP_CREATE ioctl), and the returned context is attached to a
queue through some command stream instruction.

All the kernel does is keep the list of heap chunks allocated to a
context, so they can be freed when TILER_HEAP_DESTROY is called, or
extended when the FW requests a new chunk.

v6:
- Add Maxime's and Heiko's acks

v5:
- Fix FIXME comment
- Add Steve's R-b

v4:
- Rework locking to allow concurrent calls to panthor_heap_grow()
- Add a helper to return a heap chunk if we couldn't pass it to the
  FW because the group was scheduled out

v3:
- Add a FIXME for the heap OOM deadlock
- Use the panthor_kernel_bo abstraction for the heap context and heap
  chunks
- Drop the panthor_heap_gpu_ctx struct as it is opaque to the driver
- Ensure that the heap context is aligned to the GPU cache line size
- Minor code tidy ups

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-10-boris.brezillon@collabora.com

drm/panthor: Add the scheduler logical block

This is the piece of software interacting with the FW scheduler, and
taking care of some scheduling aspects when the FW comes short of slots
scheduling slots. Indeed, the FW only expose a few slots, and the kernel
has to give all submission contexts, a chance to execute their jobs.

The kernel-side scheduler is timeslice-based, with a round-robin queue
per priority level.

Job submission is handled with a 1:1 drm_sched_entity:drm_gpu_scheduler,
allowing us to delegate the dependency tracking to the core.

All the gory details should be documented inline.

v6:
- Add Maxime's and Heiko's acks
- Make sure the scheduler is initialized before queueing the tick work
  in the MMU fault handler
- Keep header inclusion alphabetically ordered

v5:
- Fix typos
- Call panthor_kernel_bo_destroy(group->syncobjs) unconditionally
- Don't move the group to the waiting list tail when it was already
  waiting for a different syncobj
- Fix fatal_queues flagging in the tiler OOM path
- Don't warn when more than one job timesout on a group
- Add a warning message when we fail to allocate a heap chunk
- Add Steve's R-b

v4:
- Check drmm_mutex_init() return code
- s/drm_gem_vmap_unlocked/drm_gem_vunmap_unlocked/ in
  panthor_queue_put_syncwait_obj()
- Drop unneeded WARN_ON() in cs_slot_sync_queue_state_locked()
- Use atomic_xchg() instead of atomic_fetch_and(0)
- Fix typos
- Let panthor_kernel_bo_destroy() check for IS_ERR_OR_NULL() BOs
- Defer TILER_OOM event handling to a separate workqueue to prevent
  deadlocks when the heap chunk allocation is blocked on mem-reclaim.
  This is just a temporary solution, until we add support for
  non-blocking/failable allocations
- Pass the scheduler workqueue to drm_sched instead of instantiating
  a separate one (no longer needed now that heap chunk allocation
  happens on a dedicated wq)
- Set WQ_MEM_RECLAIM on the scheduler workqueue, so we can handle
  job timeouts when the system is under mem pressure, and hopefully
  free up some memory retained by these jobs

v3:
- Rework the FW event handling logic to avoid races
- Make sure MMU faults kill the group immediately
- Use the panthor_kernel_bo abstraction for group/queue buffers
- Make in_progress an atomic_t, so we can check it without the reset lock
  held
- Don't limit the number of groups per context to the FW scheduler
  capacity. Fix the limit to 128 for now.
- Add a panthor_job_vm() helper
- Account for panthor_vm changes
- Add our job fence as DMA_RESV_USAGE_WRITE to all external objects
  (was previously DMA_RESV_USAGE_BOOKKEEP). I don't get why, given
  we're supposed to be fully-explicit, but other drivers do that, so
  there must be a good reason
- Account for drm_sched changes
- Provide a panthor_queue_put_syncwait_obj()
- Unconditionally return groups to their idle list in
  panthor_sched_suspend()
- Condition of sched_queue_{,delayed_}work fixed to be only when a reset
  isn't pending or in progress.
- Several typos in comments fixed.

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-11-boris.brezillon@collabora.com

drm/panthor: Add the driver frontend block

This is the last piece missing to expose the driver to the outside
world.

This is basically a wrapper between the ioctls and the other logical
blocks.

v6:
- Add Maxime's and Heiko's acks
- Return a page-aligned BO size to userspace
- Keep header inclusion alphabetically ordered

v5:
- Account for the drm_exec_init() prototype change
- Include platform_device.h

v4:
- Add an ioctl to let the UMD query the VM state
- Fix kernel doc
- Let panthor_device_init() call panthor_device_init()
- Fix cleanup ordering in the panthor_init() error path
- Add Steve's and Liviu's R-b

v3:
- Add acks for the MIT/GPL2 relicensing
- Fix 32-bit support
- Account for panthor_vm and panthor_sched changes
- Simplify the resv preparation/update logic
- Use a linked list rather than xarray for list of signals.
- Simplify panthor_get_uobj_array by returning the newly allocated
  array.
- Drop the "DOC" for job submission helpers and move the relevant
  comments to panthor_ioctl_group_submit().
- Add helpers sync_op_is_signal()/sync_op_is_wait().
- Simplify return type of panthor_submit_ctx_add_sync_signal() and
  panthor_submit_ctx_get_sync_signal().
- Drop WARN_ON from panthor_submit_ctx_add_job().
- Fix typos in comments.

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-12-boris.brezillon@collabora.com

drm/panthor: Allow driver compilation

Now that all blocks are available, we can add/update Kconfig/Makefile
files to allow compilation.

v6:
- Add Maxime's and Heiko's acks
- Keep source files alphabetically ordered in the Makefile

v4:
- Add Steve's R-b

v3:
- Add a dep on DRM_GPUVM
- Fix dependencies in Kconfig
- Expand help text to (hopefully) describe which GPUs are to be
  supported by this driver and which are for panfrost.

Co-developed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Steven Price <steven.price@arm.com> # MIT+GPL2 relicensing,Arm
Acked-by: Grant Likely <grant.likely@linaro.org> # MIT+GPL2 relicensing,Linaro
Acked-by: Boris Brezillon <boris.brezillon@collabora.com> # MIT+GPL2 relicensing,Collabora
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-13-boris.brezillon@collabora.com

dt-bindings: gpu: mali-valhall-csf: Add support for Arm Mali CSF GPUs

Arm has introduced a new v10 GPU architecture that replaces the Job
Manager
interface with a new Command Stream Frontend. It adds firmware driven
command stream queues that can be used by kernel and user space to
submit
jobs to the GPU.

Add the initial schema for the device tree that is based on support for
RK3588 SoC. The minimum number of clocks is one for the IP, but on
Rockchip
platforms they will tend to expose the semi-independent clocks for
better
power management.

v6:
- Add Maxime's and Heiko's acks

v5:
- Move the opp-table node under the gpu node

v4:
- Fix formatting issue

v3:
- Cleanup commit message to remove redundant text
- Added opp-table property and re-ordered entries
- Clarified power-domains and power-domain-names requirements for
RK3588.
- Cleaned up example

Note: power-domains and power-domain-names requirements for other
platforms
are still work in progress, hence the bindings are left incomplete here.

v2:
- New commit

Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Conor Dooley <conor+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-14-boris.brezillon@collabora.com

drm/panthor: Add an entry to MAINTAINERS

Add an entry for the Panthor driver to the MAINTAINERS file.

v6:
- Add Maxime's and Heiko's acks

v4:
- Add Steve's R-b

v3:
- Add bindings document as an 'F:' line.
- Add Steven and Liviu as co-maintainers.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229162230.2634044-15-boris.brezillon@collabora.com

drm/panthor: remove debugfs

dma-buf/dma-fence: Add deadline awareness

Add a way to hint to the fence signaler of an upcoming deadline, such as
vblank, which the fence waiter would prefer not to miss.  This is to aid
the fence signaler in making power management decisions, like boosting
frequency as the deadline approaches and awareness of missing deadlines
so that can be factored in to the frequency scaling.

v2: Drop dma_fence::deadline and related logic to filter duplicate
    deadlines, to avoid increasing dma_fence size.  The fence-context
    implementation will need similar logic to track deadlines of all
    the fences on the same timeline.  [ckoenig]
v3: Clarify locking wrt. set_deadline callback
v4: Clarify in docs comment that this is a hint
v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v6: More docs
v7: Fix typo, clarify past deadlines

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>

drm/gem: Take reservation lock for vmap/vunmap operations

The new common dma-buf locking convention will require buffer importers
to hold the reservation lock around mapping operations. Make DRM GEM core
to take the lock around the vmapping operations and update DRM drivers to
use the locked functions for the case where DRM core now holds the lock.
This patch prepares DRM core and drivers to the common dynamic dma-buf
locking convention.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-4-dmitry.osipenko@collabora.com

dma-buf: Add unlocked variant of vmapping functions

Add unlocked variant of dma_buf_vmap/vunmap() that will be utilized
by drivers that don't take the reservation lock explicitly.

Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-2-dmitry.osipenko@collabora.com

dma-buf: Add unlocked variant of attachment-mapping functions

Add unlocked variant of dma_buf_map/unmap_attachment() that will
be used by drivers that don't take the reservation lock explicitly.

Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-3-dmitry.osipenko@collabora.com

dma-buf: Move dma_buf_vmap() to dynamic locking specification

Move dma_buf_vmap/vunmap() functions to the dynamic locking
specification by asserting that the reservation lock is held.

Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-16-dmitry.osipenko@collabora.com

drm/gpuvm: Helper to get range of unmap from a remap op.

Determining the start and range of the unmap stage of a remap op is a
common piece of code currently implemented by multiple drivers. Add a
helper for this.

Changes since v7:
- Renamed helper to drm_gpuva_op_remap_to_unmap_range()
- Improved documentation

Changes since v6:
- Remove use of __always_inline

Signed-off-by: Donald Robson <donald.robson@imgtec.com>
Signed-off-by: Sarah Walker <sarah.walker@imgtec.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Link: https://lore.kernel.org/r/8a0a5b5eeec459d3c60fcdaa5a638ad14a18a59e.1700668843.git.donald.robson@imgtec.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>

drm: Enable PRIME import/export for all drivers

Call drm_gem_prime_handle_to_fd() and drm_gem_prime_fd_to_handle() by
default if no PRIME import/export helpers have been set. Both functions
are the default for almost all drivers.

DRM drivers implement struct drm_driver.gem_prime_import_sg_table
to import dma-buf objects from other drivers. Having the function
drm_gem_prime_fd_to_handle() functions set by default allows each
driver to import dma-buf objects to itself, even without support for
other drivers.

For drm_gem_prime_handle_to_fd() it is similar: using it by default
allows each driver to export to itself, even without support for other
drivers.

This functionality enables userspace to share per-driver buffers
across process boundaries via PRIME (e.g., wlroots requires this
functionality). The patch generalizes a pattern that has previously
been implemented by GEM VRAM helpers [1] to work with any driver.
For example, gma500 can now run the wlroots-based sway compositor.

v2:
	* clean up docs and TODO comments (Simon, Zack)
	* clean up style in drm_getcap()

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/dri-devel/20230302143502.500661-1-contact@emersion.fr/ # 1
Reviewed-by: Simon Ser <contact@emersion.fr>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230620080252.16368-2-tzimmermann@suse.de

drm: Clear fd/handle callbacks in struct drm_driver

Clear all assignments of struct drm_driver's fd/handle callbacks to
drm_gem_prime_fd_to_handle() and drm_gem_prime_handle_to_fd(). These
functions are called by default. Add a TODO item to convert vmwgfx
to the defaults as well.

v2:
	* remove TODO item (Zack)
	* also update amdgpu's amdgpu_partition_driver

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simon Ser <contact@emersion.fr>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Jeffrey Hugo <quic_jhugo@quicinc.com> # qaic
Link: https://patchwork.freedesktop.org/patch/msgid/20230620080252.16368-3-tzimmermann@suse.de

drm/panthor: Fix IO-page mmap() for 32-bit userspace on 64-bit kernel

When mapping an IO region, the pseudo-file offset is dependent on the
userspace architecture. panthor_device_mmio_offset() abstracts that
away for us by turning a userspace MMIO offset into its kernel
equivalent, but we were not updating vm_area_struct::vm_pgoff
accordingly, leading us to attach the MMIO region to the wrong file
offset.

This has implications when we start mixing 64 bit and 32 bit apps, but
that's only really a problem when we start having more that 2^43 bytes of
memory allocated, which is very unlikely to happen.

What's more problematic is the fact this turns our
unmap_mapping_range(DRM_PANTHOR_USER_MMIO_OFFSET) calls, which are
supposed to kill the MMIO mapping when entering suspend, into NOPs.
Which means we either keep the dummy flush_id mapping active at all
times, or we risk a BUS_FAULT if the MMIO region was mapped, and the
GPU is suspended after that.

Solve that by patching vm_pgoff early in panthor_mmap(). With
this in place, we no longer need the panthor_device_mmio_offset()
helper.

v3:
- No changes

v2:
- Kill panthor_device_mmio_offset()

Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block")
Reported-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reported-by: Lukas F. Hartmann <lukas@mntmn.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10835
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

drm/panthor: Fix ordering in _irq_suspend()

Make sure we set suspended=true last to avoid generating an irq storm
in the unlikely case where an IRQ happens between the suspended=true
assignment and the _INT_MASK update.

We also move the mask=0 assignment before writing to the _INT_MASK
register to prevent the thread handler from unmasking the interrupt
behind our back. This means we might lose events if there were some
pending when we get to suspend the IRQ, but that's fine.
The synchronize_irq() we have in the _irq_suspend() path was not
there to make sure all IRQs are processed, just to make sure we don't
have registers accesses coming from the irq handlers after
_irq_suspend() has been called. If there's a need to have all pending
IRQs processed, it should happen before _irq_suspend() is called.

v3:
- Add Steve's R-b

v2:
- New patch

Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block")
Reported-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>

drm/panthor: Drop the dev_enter/exit() sections in _irq_suspend/resume()

There's no reason for _irq_suspend/resume() to be called after the
device has been unplugged, and keeping this dev_enter/exit()
section in _irq_suspend() is turns _irq_suspend() into a NOP
when called from the _unplug() functions, which we don't want.

v3:
- New patch

Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
2025-04-15 20:18:19 +02:00

2753 lines
78 KiB
C

// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (c) 2022 Red Hat.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
* OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*
* Authors:
* Danilo Krummrich <dakr@redhat.com>
*
*/
#include <drm/drm_gpuvm.h>
#include <linux/interval_tree_generic.h>
#include <linux/mm.h>
/**
* DOC: Overview
*
* The DRM GPU VA Manager, represented by struct drm_gpuvm keeps track of a
* GPU's virtual address (VA) space and manages the corresponding virtual
* mappings represented by &drm_gpuva objects. It also keeps track of the
* mapping's backing &drm_gem_object buffers.
*
* &drm_gem_object buffers maintain a list of &drm_gpuva objects representing
* all existent GPU VA mappings using this &drm_gem_object as backing buffer.
*
* GPU VAs can be flagged as sparse, such that drivers may use GPU VAs to also
* keep track of sparse PTEs in order to support Vulkan 'Sparse Resources'.
*
* The GPU VA manager internally uses a rb-tree to manage the
* &drm_gpuva mappings within a GPU's virtual address space.
*
* The &drm_gpuvm structure contains a special &drm_gpuva representing the
* portion of VA space reserved by the kernel. This node is initialized together
* with the GPU VA manager instance and removed when the GPU VA manager is
* destroyed.
*
* In a typical application drivers would embed struct drm_gpuvm and
* struct drm_gpuva within their own driver specific structures, there won't be
* any memory allocations of its own nor memory allocations of &drm_gpuva
* entries.
*
* The data structures needed to store &drm_gpuvas within the &drm_gpuvm are
* contained within struct drm_gpuva already. Hence, for inserting &drm_gpuva
* entries from within dma-fence signalling critical sections it is enough to
* pre-allocate the &drm_gpuva structures.
*
* &drm_gem_objects which are private to a single VM can share a common
* &dma_resv in order to improve locking efficiency (e.g. with &drm_exec).
* For this purpose drivers must pass a &drm_gem_object to drm_gpuvm_init(), in
* the following called 'resv object', which serves as the container of the
* GPUVM's shared &dma_resv. This resv object can be a driver specific
* &drm_gem_object, such as the &drm_gem_object containing the root page table,
* but it can also be a 'dummy' object, which can be allocated with
* drm_gpuvm_resv_object_alloc().
*
* In order to connect a struct drm_gpuva its backing &drm_gem_object each
* &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and each
* &drm_gpuvm_bo contains a list of &drm_gpuva structures.
*
* A &drm_gpuvm_bo is an abstraction that represents a combination of a
* &drm_gpuvm and a &drm_gem_object. Every such combination should be unique.
* This is ensured by the API through drm_gpuvm_bo_obtain() and
* drm_gpuvm_bo_obtain_prealloc() which first look into the corresponding
* &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
* particular combination. If not existent a new instance is created and linked
* to the &drm_gem_object.
*
* &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
* as entry for the &drm_gpuvm's lists of external and evicted objects. Those
* lists are maintained in order to accelerate locking of dma-resv locks and
* validation of evicted objects bound in a &drm_gpuvm. For instance, all
* &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
* drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
* order to validate all evicted &drm_gem_objects. It is also possible to lock
* additional &drm_gem_objects by providing the corresponding parameters to
* drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
* use of helper functions such as drm_gpuvm_prepare_range() or
* drm_gpuvm_prepare_objects().
*
* Every bound &drm_gem_object is treated as external object when its &dma_resv
* structure is different than the &drm_gpuvm's common &dma_resv structure.
*/
/**
* DOC: Split and Merge
*
* Besides its capability to manage and represent a GPU VA space, the
* GPU VA manager also provides functions to let the &drm_gpuvm calculate a
* sequence of operations to satisfy a given map or unmap request.
*
* Therefore the DRM GPU VA manager provides an algorithm implementing splitting
* and merging of existent GPU VA mappings with the ones that are requested to
* be mapped or unmapped. This feature is required by the Vulkan API to
* implement Vulkan 'Sparse Memory Bindings' - drivers UAPIs often refer to this
* as VM BIND.
*
* Drivers can call drm_gpuvm_sm_map() to receive a sequence of callbacks
* containing map, unmap and remap operations for a given newly requested
* mapping. The sequence of callbacks represents the set of operations to
* execute in order to integrate the new mapping cleanly into the current state
* of the GPU VA space.
*
* Depending on how the new GPU VA mapping intersects with the existent mappings
* of the GPU VA space the &drm_gpuvm_ops callbacks contain an arbitrary amount
* of unmap operations, a maximum of two remap operations and a single map
* operation. The caller might receive no callback at all if no operation is
* required, e.g. if the requested mapping already exists in the exact same way.
*
* The single map operation represents the original map operation requested by
* the caller.
*
* &drm_gpuva_op_unmap contains a 'keep' field, which indicates whether the
* &drm_gpuva to unmap is physically contiguous with the original mapping
* request. Optionally, if 'keep' is set, drivers may keep the actual page table
* entries for this &drm_gpuva, adding the missing page table entries only and
* update the &drm_gpuvm's view of things accordingly.
*
* Drivers may do the same optimization, namely delta page table updates, also
* for remap operations. This is possible since &drm_gpuva_op_remap consists of
* one unmap operation and one or two map operations, such that drivers can
* derive the page table update delta accordingly.
*
* Note that there can't be more than two existent mappings to split up, one at
* the beginning and one at the end of the new mapping, hence there is a
* maximum of two remap operations.
*
* Analogous to drm_gpuvm_sm_map() drm_gpuvm_sm_unmap() uses &drm_gpuvm_ops to
* call back into the driver in order to unmap a range of GPU VA space. The
* logic behind this function is way simpler though: For all existent mappings
* enclosed by the given range unmap operations are created. For mappings which
* are only partically located within the given range, remap operations are
* created such that those mappings are split up and re-mapped partically.
*
* As an alternative to drm_gpuvm_sm_map() and drm_gpuvm_sm_unmap(),
* drm_gpuvm_sm_map_ops_create() and drm_gpuvm_sm_unmap_ops_create() can be used
* to directly obtain an instance of struct drm_gpuva_ops containing a list of
* &drm_gpuva_op, which can be iterated with drm_gpuva_for_each_op(). This list
* contains the &drm_gpuva_ops analogous to the callbacks one would receive when
* calling drm_gpuvm_sm_map() or drm_gpuvm_sm_unmap(). While this way requires
* more memory (to allocate the &drm_gpuva_ops), it provides drivers a way to
* iterate the &drm_gpuva_op multiple times, e.g. once in a context where memory
* allocations are possible (e.g. to allocate GPU page tables) and once in the
* dma-fence signalling critical path.
*
* To update the &drm_gpuvm's view of the GPU VA space drm_gpuva_insert() and
* drm_gpuva_remove() may be used. These functions can safely be used from
* &drm_gpuvm_ops callbacks originating from drm_gpuvm_sm_map() or
* drm_gpuvm_sm_unmap(). However, it might be more convenient to use the
* provided helper functions drm_gpuva_map(), drm_gpuva_remap() and
* drm_gpuva_unmap() instead.
*
* The following diagram depicts the basic relationships of existent GPU VA
* mappings, a newly requested mapping and the resulting mappings as implemented
* by drm_gpuvm_sm_map() - it doesn't cover any arbitrary combinations of these.
*
* 1) Requested mapping is identical. Replace it, but indicate the backing PTEs
* could be kept.
*
* ::
*
* 0 a 1
* old: |-----------| (bo_offset=n)
*
* 0 a 1
* req: |-----------| (bo_offset=n)
*
* 0 a 1
* new: |-----------| (bo_offset=n)
*
*
* 2) Requested mapping is identical, except for the BO offset, hence replace
* the mapping.
*
* ::
*
* 0 a 1
* old: |-----------| (bo_offset=n)
*
* 0 a 1
* req: |-----------| (bo_offset=m)
*
* 0 a 1
* new: |-----------| (bo_offset=m)
*
*
* 3) Requested mapping is identical, except for the backing BO, hence replace
* the mapping.
*
* ::
*
* 0 a 1
* old: |-----------| (bo_offset=n)
*
* 0 b 1
* req: |-----------| (bo_offset=n)
*
* 0 b 1
* new: |-----------| (bo_offset=n)
*
*
* 4) Existent mapping is a left aligned subset of the requested one, hence
* replace the existent one.
*
* ::
*
* 0 a 1
* old: |-----| (bo_offset=n)
*
* 0 a 2
* req: |-----------| (bo_offset=n)
*
* 0 a 2
* new: |-----------| (bo_offset=n)
*
* .. note::
* We expect to see the same result for a request with a different BO
* and/or non-contiguous BO offset.
*
*
* 5) Requested mapping's range is a left aligned subset of the existent one,
* but backed by a different BO. Hence, map the requested mapping and split
* the existent one adjusting its BO offset.
*
* ::
*
* 0 a 2
* old: |-----------| (bo_offset=n)
*
* 0 b 1
* req: |-----| (bo_offset=n)
*
* 0 b 1 a' 2
* new: |-----|-----| (b.bo_offset=n, a.bo_offset=n+1)
*
* .. note::
* We expect to see the same result for a request with a different BO
* and/or non-contiguous BO offset.
*
*
* 6) Existent mapping is a superset of the requested mapping. Split it up, but
* indicate that the backing PTEs could be kept.
*
* ::
*
* 0 a 2
* old: |-----------| (bo_offset=n)
*
* 0 a 1
* req: |-----| (bo_offset=n)
*
* 0 a 1 a' 2
* new: |-----|-----| (a.bo_offset=n, a'.bo_offset=n+1)
*
*
* 7) Requested mapping's range is a right aligned subset of the existent one,
* but backed by a different BO. Hence, map the requested mapping and split
* the existent one, without adjusting the BO offset.
*
* ::
*
* 0 a 2
* old: |-----------| (bo_offset=n)
*
* 1 b 2
* req: |-----| (bo_offset=m)
*
* 0 a 1 b 2
* new: |-----|-----| (a.bo_offset=n,b.bo_offset=m)
*
*
* 8) Existent mapping is a superset of the requested mapping. Split it up, but
* indicate that the backing PTEs could be kept.
*
* ::
*
* 0 a 2
* old: |-----------| (bo_offset=n)
*
* 1 a 2
* req: |-----| (bo_offset=n+1)
*
* 0 a' 1 a 2
* new: |-----|-----| (a'.bo_offset=n, a.bo_offset=n+1)
*
*
* 9) Existent mapping is overlapped at the end by the requested mapping backed
* by a different BO. Hence, map the requested mapping and split up the
* existent one, without adjusting the BO offset.
*
* ::
*
* 0 a 2
* old: |-----------| (bo_offset=n)
*
* 1 b 3
* req: |-----------| (bo_offset=m)
*
* 0 a 1 b 3
* new: |-----|-----------| (a.bo_offset=n,b.bo_offset=m)
*
*
* 10) Existent mapping is overlapped by the requested mapping, both having the
* same backing BO with a contiguous offset. Indicate the backing PTEs of
* the old mapping could be kept.
*
* ::
*
* 0 a 2
* old: |-----------| (bo_offset=n)
*
* 1 a 3
* req: |-----------| (bo_offset=n+1)
*
* 0 a' 1 a 3
* new: |-----|-----------| (a'.bo_offset=n, a.bo_offset=n+1)
*
*
* 11) Requested mapping's range is a centered subset of the existent one
* having a different backing BO. Hence, map the requested mapping and split
* up the existent one in two mappings, adjusting the BO offset of the right
* one accordingly.
*
* ::
*
* 0 a 3
* old: |-----------------| (bo_offset=n)
*
* 1 b 2
* req: |-----| (bo_offset=m)
*
* 0 a 1 b 2 a' 3
* new: |-----|-----|-----| (a.bo_offset=n,b.bo_offset=m,a'.bo_offset=n+2)
*
*
* 12) Requested mapping is a contiguous subset of the existent one. Split it
* up, but indicate that the backing PTEs could be kept.
*
* ::
*
* 0 a 3
* old: |-----------------| (bo_offset=n)
*
* 1 a 2
* req: |-----| (bo_offset=n+1)
*
* 0 a' 1 a 2 a'' 3
* old: |-----|-----|-----| (a'.bo_offset=n, a.bo_offset=n+1, a''.bo_offset=n+2)
*
*
* 13) Existent mapping is a right aligned subset of the requested one, hence
* replace the existent one.
*
* ::
*
* 1 a 2
* old: |-----| (bo_offset=n+1)
*
* 0 a 2
* req: |-----------| (bo_offset=n)
*
* 0 a 2
* new: |-----------| (bo_offset=n)
*
* .. note::
* We expect to see the same result for a request with a different bo
* and/or non-contiguous bo_offset.
*
*
* 14) Existent mapping is a centered subset of the requested one, hence
* replace the existent one.
*
* ::
*
* 1 a 2
* old: |-----| (bo_offset=n+1)
*
* 0 a 3
* req: |----------------| (bo_offset=n)
*
* 0 a 3
* new: |----------------| (bo_offset=n)
*
* .. note::
* We expect to see the same result for a request with a different bo
* and/or non-contiguous bo_offset.
*
*
* 15) Existent mappings is overlapped at the beginning by the requested mapping
* backed by a different BO. Hence, map the requested mapping and split up
* the existent one, adjusting its BO offset accordingly.
*
* ::
*
* 1 a 3
* old: |-----------| (bo_offset=n)
*
* 0 b 2
* req: |-----------| (bo_offset=m)
*
* 0 b 2 a' 3
* new: |-----------|-----| (b.bo_offset=m,a.bo_offset=n+2)
*/
/**
* DOC: Locking
*
* In terms of managing &drm_gpuva entries DRM GPUVM does not take care of
* locking itself, it is the drivers responsibility to take care about locking.
* Drivers might want to protect the following operations: inserting, removing
* and iterating &drm_gpuva objects as well as generating all kinds of
* operations, such as split / merge or prefetch.
*
* DRM GPUVM also does not take care of the locking of the backing
* &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo abstractions by
* itself; drivers are responsible to enforce mutual exclusion using either the
* GEMs dma_resv lock or alternatively a driver specific external lock. For the
* latter see also drm_gem_gpuva_set_lock().
*
* However, DRM GPUVM contains lockdep checks to ensure callers of its API hold
* the corresponding lock whenever the &drm_gem_objects GPU VA list is accessed
* by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but also
* drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put().
*
* The latter is required since on creation and destruction of a &drm_gpuvm_bo
* the &drm_gpuvm_bo is attached / removed from the &drm_gem_objects gpuva list.
* Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
* &drm_gem_object must be able to observe previous creations and destructions
* of &drm_gpuvm_bos in order to keep instances unique.
*
* The &drm_gpuvm's lists for keeping track of external and evicted objects are
* protected against concurrent insertion / removal and iteration internally.
*
* However, drivers still need ensure to protect concurrent calls to functions
* iterating those lists, namely drm_gpuvm_prepare_objects() and
* drm_gpuvm_validate().
*
* Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate
* that the corresponding &dma_resv locks are held in order to protect the
* lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and
* the corresponding lockdep checks are enabled. This is an optimization for
* drivers which are capable of taking the corresponding &dma_resv locks and
* hence do not require internal locking.
*/
/**
* DOC: Examples
*
* This section gives two examples on how to let the DRM GPUVA Manager generate
* &drm_gpuva_op in order to satisfy a given map or unmap request and how to
* make use of them.
*
* The below code is strictly limited to illustrate the generic usage pattern.
* To maintain simplicitly, it doesn't make use of any abstractions for common
* code, different (asyncronous) stages with fence signalling critical paths,
* any other helpers or error handling in terms of freeing memory and dropping
* previously taken locks.
*
* 1) Obtain a list of &drm_gpuva_op to create a new mapping::
*
* // Allocates a new &drm_gpuva.
* struct drm_gpuva * driver_gpuva_alloc(void);
*
* // Typically drivers would embedd the &drm_gpuvm and &drm_gpuva
* // structure in individual driver structures and lock the dma-resv with
* // drm_exec or similar helpers.
* int driver_mapping_create(struct drm_gpuvm *gpuvm,
* u64 addr, u64 range,
* struct drm_gem_object *obj, u64 offset)
* {
* struct drm_gpuva_ops *ops;
* struct drm_gpuva_op *op
* struct drm_gpuvm_bo *vm_bo;
*
* driver_lock_va_space();
* ops = drm_gpuvm_sm_map_ops_create(gpuvm, addr, range,
* obj, offset);
* if (IS_ERR(ops))
* return PTR_ERR(ops);
*
* vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);
* if (IS_ERR(vm_bo))
* return PTR_ERR(vm_bo);
*
* drm_gpuva_for_each_op(op, ops) {
* struct drm_gpuva *va;
*
* switch (op->op) {
* case DRM_GPUVA_OP_MAP:
* va = driver_gpuva_alloc();
* if (!va)
* ; // unwind previous VA space updates,
* // free memory and unlock
*
* driver_vm_map();
* drm_gpuva_map(gpuvm, va, &op->map);
* drm_gpuva_link(va, vm_bo);
*
* break;
* case DRM_GPUVA_OP_REMAP: {
* struct drm_gpuva *prev = NULL, *next = NULL;
*
* va = op->remap.unmap->va;
*
* if (op->remap.prev) {
* prev = driver_gpuva_alloc();
* if (!prev)
* ; // unwind previous VA space
* // updates, free memory and
* // unlock
* }
*
* if (op->remap.next) {
* next = driver_gpuva_alloc();
* if (!next)
* ; // unwind previous VA space
* // updates, free memory and
* // unlock
* }
*
* driver_vm_remap();
* drm_gpuva_remap(prev, next, &op->remap);
*
* if (prev)
* drm_gpuva_link(prev, va->vm_bo);
* if (next)
* drm_gpuva_link(next, va->vm_bo);
* drm_gpuva_unlink(va);
*
* break;
* }
* case DRM_GPUVA_OP_UNMAP:
* va = op->unmap->va;
*
* driver_vm_unmap();
* drm_gpuva_unlink(va);
* drm_gpuva_unmap(&op->unmap);
*
* break;
* default:
* break;
* }
* }
* drm_gpuvm_bo_put(vm_bo);
* driver_unlock_va_space();
*
* return 0;
* }
*
* 2) Receive a callback for each &drm_gpuva_op to create a new mapping::
*
* struct driver_context {
* struct drm_gpuvm *gpuvm;
* struct drm_gpuvm_bo *vm_bo;
* struct drm_gpuva *new_va;
* struct drm_gpuva *prev_va;
* struct drm_gpuva *next_va;
* };
*
* // ops to pass to drm_gpuvm_init()
* static const struct drm_gpuvm_ops driver_gpuvm_ops = {
* .sm_step_map = driver_gpuva_map,
* .sm_step_remap = driver_gpuva_remap,
* .sm_step_unmap = driver_gpuva_unmap,
* };
*
* // Typically drivers would embedd the &drm_gpuvm and &drm_gpuva
* // structure in individual driver structures and lock the dma-resv with
* // drm_exec or similar helpers.
* int driver_mapping_create(struct drm_gpuvm *gpuvm,
* u64 addr, u64 range,
* struct drm_gem_object *obj, u64 offset)
* {
* struct driver_context ctx;
* struct drm_gpuvm_bo *vm_bo;
* struct drm_gpuva_ops *ops;
* struct drm_gpuva_op *op;
* int ret = 0;
*
* ctx.gpuvm = gpuvm;
*
* ctx.new_va = kzalloc(sizeof(*ctx.new_va), GFP_KERNEL);
* ctx.prev_va = kzalloc(sizeof(*ctx.prev_va), GFP_KERNEL);
* ctx.next_va = kzalloc(sizeof(*ctx.next_va), GFP_KERNEL);
* ctx.vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
* if (!ctx.new_va || !ctx.prev_va || !ctx.next_va || !vm_bo) {
* ret = -ENOMEM;
* goto out;
* }
*
* // Typically protected with a driver specific GEM gpuva lock
* // used in the fence signaling path for drm_gpuva_link() and
* // drm_gpuva_unlink(), hence pre-allocate.
* ctx.vm_bo = drm_gpuvm_bo_obtain_prealloc(ctx.vm_bo);
*
* driver_lock_va_space();
* ret = drm_gpuvm_sm_map(gpuvm, &ctx, addr, range, obj, offset);
* driver_unlock_va_space();
*
* out:
* drm_gpuvm_bo_put(ctx.vm_bo);
* kfree(ctx.new_va);
* kfree(ctx.prev_va);
* kfree(ctx.next_va);
* return ret;
* }
*
* int driver_gpuva_map(struct drm_gpuva_op *op, void *__ctx)
* {
* struct driver_context *ctx = __ctx;
*
* drm_gpuva_map(ctx->vm, ctx->new_va, &op->map);
*
* drm_gpuva_link(ctx->new_va, ctx->vm_bo);
*
* // prevent the new GPUVA from being freed in
* // driver_mapping_create()
* ctx->new_va = NULL;
*
* return 0;
* }
*
* int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
* {
* struct driver_context *ctx = __ctx;
* struct drm_gpuva *va = op->remap.unmap->va;
*
* drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
*
* if (op->remap.prev) {
* drm_gpuva_link(ctx->prev_va, va->vm_bo);
* ctx->prev_va = NULL;
* }
*
* if (op->remap.next) {
* drm_gpuva_link(ctx->next_va, va->vm_bo);
* ctx->next_va = NULL;
* }
*
* drm_gpuva_unlink(va);
* kfree(va);
*
* return 0;
* }
*
* int driver_gpuva_unmap(struct drm_gpuva_op *op, void *__ctx)
* {
* drm_gpuva_unlink(op->unmap.va);
* drm_gpuva_unmap(&op->unmap);
* kfree(op->unmap.va);
*
* return 0;
* }
*/
/**
* get_next_vm_bo_from_list() - get the next vm_bo element
* @__gpuvm: the &drm_gpuvm
* @__list_name: the name of the list we're iterating on
* @__local_list: a pointer to the local list used to store already iterated items
* @__prev_vm_bo: the previous element we got from get_next_vm_bo_from_list()
*
* This helper is here to provide lockless list iteration. Lockless as in, the
* iterator releases the lock immediately after picking the first element from
* the list, so list insertion deletion can happen concurrently.
*
* Elements popped from the original list are kept in a local list, so removal
* and is_empty checks can still happen while we're iterating the list.
*/
#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \
({ \
struct drm_gpuvm_bo *__vm_bo = NULL; \
\
drm_gpuvm_bo_put(__prev_vm_bo); \
\
spin_lock(&(__gpuvm)->__list_name.lock); \
if (!(__gpuvm)->__list_name.local_list) \
(__gpuvm)->__list_name.local_list = __local_list; \
else \
drm_WARN_ON((__gpuvm)->drm, \
(__gpuvm)->__list_name.local_list != __local_list); \
\
while (!list_empty(&(__gpuvm)->__list_name.list)) { \
__vm_bo = list_first_entry(&(__gpuvm)->__list_name.list, \
struct drm_gpuvm_bo, \
list.entry.__list_name); \
if (kref_get_unless_zero(&__vm_bo->kref)) { \
list_move_tail(&(__vm_bo)->list.entry.__list_name, \
__local_list); \
break; \
} else { \
list_del_init(&(__vm_bo)->list.entry.__list_name); \
__vm_bo = NULL; \
} \
} \
spin_unlock(&(__gpuvm)->__list_name.lock); \
\
__vm_bo; \
})
/**
* for_each_vm_bo_in_list() - internal vm_bo list iterator
* @__gpuvm: the &drm_gpuvm
* @__list_name: the name of the list we're iterating on
* @__local_list: a pointer to the local list used to store already iterated items
* @__vm_bo: the struct drm_gpuvm_bo to assign in each iteration step
*
* This helper is here to provide lockless list iteration. Lockless as in, the
* iterator releases the lock immediately after picking the first element from the
* list, hence list insertion and deletion can happen concurrently.
*
* It is not allowed to re-assign the vm_bo pointer from inside this loop.
*
* Typical use:
*
* struct drm_gpuvm_bo *vm_bo;
* LIST_HEAD(my_local_list);
*
* ret = 0;
* for_each_vm_bo_in_list(gpuvm, <list_name>, &my_local_list, vm_bo) {
* ret = do_something_with_vm_bo(..., vm_bo);
* if (ret)
* break;
* }
* // Drop ref in case we break out of the loop.
* drm_gpuvm_bo_put(vm_bo);
* restore_vm_bo_list(gpuvm, <list_name>, &my_local_list);
*
*
* Only used for internal list iterations, not meant to be exposed to the outside
* world.
*/
#define for_each_vm_bo_in_list(__gpuvm, __list_name, __local_list, __vm_bo) \
for (__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
__local_list, NULL); \
__vm_bo; \
__vm_bo = get_next_vm_bo_from_list(__gpuvm, __list_name, \
__local_list, __vm_bo))
static void
__restore_vm_bo_list(struct drm_gpuvm *gpuvm, spinlock_t *lock,
struct list_head *list, struct list_head **local_list)
{
/* Merge back the two lists, moving local list elements to the
* head to preserve previous ordering, in case it matters.
*/
spin_lock(lock);
if (*local_list) {
list_splice(*local_list, list);
*local_list = NULL;
}
spin_unlock(lock);
}
/**
* restore_vm_bo_list() - move vm_bo elements back to their original list
* @__gpuvm: the &drm_gpuvm
* @__list_name: the name of the list we're iterating on
*
* When we're done iterating a vm_bo list, we should call restore_vm_bo_list()
* to restore the original state and let new iterations take place.
*/
#define restore_vm_bo_list(__gpuvm, __list_name) \
__restore_vm_bo_list((__gpuvm), &(__gpuvm)->__list_name.lock, \
&(__gpuvm)->__list_name.list, \
&(__gpuvm)->__list_name.local_list)
static void
cond_spin_lock(spinlock_t *lock, bool cond)
{
if (cond)
spin_lock(lock);
}
static void
cond_spin_unlock(spinlock_t *lock, bool cond)
{
if (cond)
spin_unlock(lock);
}
static void
__drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
struct list_head *entry, struct list_head *list)
{
cond_spin_lock(lock, !!lock);
if (list_empty(entry))
list_add_tail(entry, list);
cond_spin_unlock(lock, !!lock);
}
/**
* drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
* @__vm_bo: the &drm_gpuvm_bo
* @__list_name: the name of the list to insert into
* @__lock: whether to lock with the internal spinlock
*
* Inserts the given @__vm_bo into the list specified by @__list_name.
*/
#define drm_gpuvm_bo_list_add(__vm_bo, __list_name, __lock) \
__drm_gpuvm_bo_list_add((__vm_bo)->vm, \
__lock ? &(__vm_bo)->vm->__list_name.lock : \
NULL, \
&(__vm_bo)->list.entry.__list_name, \
&(__vm_bo)->vm->__list_name.list)
static void
__drm_gpuvm_bo_list_del(struct drm_gpuvm *gpuvm, spinlock_t *lock,
struct list_head *entry, bool init)
{
cond_spin_lock(lock, !!lock);
if (init) {
if (!list_empty(entry))
list_del_init(entry);
} else {
list_del(entry);
}
cond_spin_unlock(lock, !!lock);
}
/**
* drm_gpuvm_bo_list_del_init() - remove a vm_bo from the given list
* @__vm_bo: the &drm_gpuvm_bo
* @__list_name: the name of the list to insert into
* @__lock: whether to lock with the internal spinlock
*
* Removes the given @__vm_bo from the list specified by @__list_name.
*/
#define drm_gpuvm_bo_list_del_init(__vm_bo, __list_name, __lock) \
__drm_gpuvm_bo_list_del((__vm_bo)->vm, \
__lock ? &(__vm_bo)->vm->__list_name.lock : \
NULL, \
&(__vm_bo)->list.entry.__list_name, \
true)
/**
* drm_gpuvm_bo_list_del() - remove a vm_bo from the given list
* @__vm_bo: the &drm_gpuvm_bo
* @__list_name: the name of the list to insert into
* @__lock: whether to lock with the internal spinlock
*
* Removes the given @__vm_bo from the list specified by @__list_name.
*/
#define drm_gpuvm_bo_list_del(__vm_bo, __list_name, __lock) \
__drm_gpuvm_bo_list_del((__vm_bo)->vm, \
__lock ? &(__vm_bo)->vm->__list_name.lock : \
NULL, \
&(__vm_bo)->list.entry.__list_name, \
false)
#define to_drm_gpuva(__node) container_of((__node), struct drm_gpuva, rb.node)
#define GPUVA_START(node) ((node)->va.addr)
#define GPUVA_LAST(node) ((node)->va.addr + (node)->va.range - 1)
/* We do not actually use drm_gpuva_it_next(), tell the compiler to not complain
* about this.
*/
INTERVAL_TREE_DEFINE(struct drm_gpuva, rb.node, u64, rb.__subtree_last,
GPUVA_START, GPUVA_LAST, static __maybe_unused,
drm_gpuva_it)
static int __drm_gpuva_insert(struct drm_gpuvm *gpuvm,
struct drm_gpuva *va);
static void __drm_gpuva_remove(struct drm_gpuva *va);
static bool
drm_gpuvm_check_overflow(u64 addr, u64 range)
{
u64 end;
return check_add_overflow(addr, range, &end);
}
static bool
drm_gpuvm_warn_check_overflow(struct drm_gpuvm *gpuvm, u64 addr, u64 range)
{
return drm_WARN(gpuvm->drm, drm_gpuvm_check_overflow(addr, range),
"GPUVA address limited to %zu bytes.\n", sizeof(addr));
}
static bool
drm_gpuvm_in_mm_range(struct drm_gpuvm *gpuvm, u64 addr, u64 range)
{
u64 end = addr + range;
u64 mm_start = gpuvm->mm_start;
u64 mm_end = mm_start + gpuvm->mm_range;
return addr >= mm_start && end <= mm_end;
}
static bool
drm_gpuvm_in_kernel_node(struct drm_gpuvm *gpuvm, u64 addr, u64 range)
{
u64 end = addr + range;
u64 kstart = gpuvm->kernel_alloc_node.va.addr;
u64 krange = gpuvm->kernel_alloc_node.va.range;
u64 kend = kstart + krange;
return krange && addr < kend && kstart < end;
}
/**
* drm_gpuvm_range_valid() - checks whether the given range is valid for the
* given &drm_gpuvm
* @gpuvm: the GPUVM to check the range for
* @addr: the base address
* @range: the range starting from the base address
*
* Checks whether the range is within the GPUVM's managed boundaries.
*
* Returns: true for a valid range, false otherwise
*/
bool
drm_gpuvm_range_valid(struct drm_gpuvm *gpuvm,
u64 addr, u64 range)
{
return !drm_gpuvm_check_overflow(addr, range) &&
drm_gpuvm_in_mm_range(gpuvm, addr, range) &&
!drm_gpuvm_in_kernel_node(gpuvm, addr, range);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_range_valid);
static void
drm_gpuvm_gem_object_free(struct drm_gem_object *obj)
{
drm_gem_object_release(obj);
kfree(obj);
}
static const struct drm_gem_object_funcs drm_gpuvm_object_funcs = {
.free = drm_gpuvm_gem_object_free,
};
/**
* drm_gpuvm_resv_object_alloc() - allocate a dummy &drm_gem_object
* @drm: the drivers &drm_device
*
* Allocates a dummy &drm_gem_object which can be passed to drm_gpuvm_init() in
* order to serve as root GEM object providing the &drm_resv shared across
* &drm_gem_objects local to a single GPUVM.
*
* Returns: the &drm_gem_object on success, NULL on failure
*/
struct drm_gem_object *
drm_gpuvm_resv_object_alloc(struct drm_device *drm)
{
struct drm_gem_object *obj;
obj = kzalloc(sizeof(*obj), GFP_KERNEL);
if (!obj)
return NULL;
obj->funcs = &drm_gpuvm_object_funcs;
drm_gem_private_object_init(drm, obj, 0);
return obj;
}
EXPORT_SYMBOL_GPL(drm_gpuvm_resv_object_alloc);
/**
* drm_gpuvm_init() - initialize a &drm_gpuvm
* @gpuvm: pointer to the &drm_gpuvm to initialize
* @name: the name of the GPU VA space
* @flags: the &drm_gpuvm_flags for this GPUVM
* @drm: the &drm_device this VM resides in
* @r_obj: the resv &drm_gem_object providing the GPUVM's common &dma_resv
* @start_offset: the start offset of the GPU VA space
* @range: the size of the GPU VA space
* @reserve_offset: the start of the kernel reserved GPU VA area
* @reserve_range: the size of the kernel reserved GPU VA area
* @ops: &drm_gpuvm_ops called on &drm_gpuvm_sm_map / &drm_gpuvm_sm_unmap
*
* The &drm_gpuvm must be initialized with this function before use.
*
* Note that @gpuvm must be cleared to 0 before calling this function. The given
* &name is expected to be managed by the surrounding driver structures.
*/
void
drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
enum drm_gpuvm_flags flags,
struct drm_device *drm,
struct drm_gem_object *r_obj,
u64 start_offset, u64 range,
u64 reserve_offset, u64 reserve_range,
const struct drm_gpuvm_ops *ops)
{
gpuvm->rb.tree = RB_ROOT_CACHED;
INIT_LIST_HEAD(&gpuvm->rb.list);
INIT_LIST_HEAD(&gpuvm->extobj.list);
spin_lock_init(&gpuvm->extobj.lock);
INIT_LIST_HEAD(&gpuvm->evict.list);
spin_lock_init(&gpuvm->evict.lock);
kref_init(&gpuvm->kref);
gpuvm->name = name ? name : "unknown";
gpuvm->flags = flags;
gpuvm->ops = ops;
gpuvm->drm = drm;
gpuvm->r_obj = r_obj;
drm_gem_object_get(r_obj);
drm_gpuvm_warn_check_overflow(gpuvm, start_offset, range);
gpuvm->mm_start = start_offset;
gpuvm->mm_range = range;
memset(&gpuvm->kernel_alloc_node, 0, sizeof(struct drm_gpuva));
if (reserve_range) {
gpuvm->kernel_alloc_node.va.addr = reserve_offset;
gpuvm->kernel_alloc_node.va.range = reserve_range;
if (likely(!drm_gpuvm_warn_check_overflow(gpuvm, reserve_offset,
reserve_range)))
__drm_gpuva_insert(gpuvm, &gpuvm->kernel_alloc_node);
}
}
EXPORT_SYMBOL_GPL(drm_gpuvm_init);
static void
drm_gpuvm_fini(struct drm_gpuvm *gpuvm)
{
gpuvm->name = NULL;
if (gpuvm->kernel_alloc_node.va.range)
__drm_gpuva_remove(&gpuvm->kernel_alloc_node);
drm_WARN(gpuvm->drm, !RB_EMPTY_ROOT(&gpuvm->rb.tree.rb_root),
"GPUVA tree is not empty, potentially leaking memory.\n");
drm_WARN(gpuvm->drm, !list_empty(&gpuvm->extobj.list),
"Extobj list should be empty.\n");
drm_WARN(gpuvm->drm, !list_empty(&gpuvm->evict.list),
"Evict list should be empty.\n");
drm_gem_object_put(gpuvm->r_obj);
}
static void
drm_gpuvm_free(struct kref *kref)
{
struct drm_gpuvm *gpuvm = container_of(kref, struct drm_gpuvm, kref);
drm_gpuvm_fini(gpuvm);
if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free))
return;
gpuvm->ops->vm_free(gpuvm);
}
/**
* drm_gpuvm_put() - drop a struct drm_gpuvm reference
* @gpuvm: the &drm_gpuvm to release the reference of
*
* This releases a reference to @gpuvm.
*
* This function may be called from atomic context.
*/
void
drm_gpuvm_put(struct drm_gpuvm *gpuvm)
{
if (gpuvm)
kref_put(&gpuvm->kref, drm_gpuvm_free);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_put);
static int
exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
unsigned int num_fences)
{
return num_fences ? drm_exec_prepare_obj(exec, obj, num_fences) :
drm_exec_lock_obj(exec, obj);
}
/**
* drm_gpuvm_prepare_vm() - prepare the GPUVMs common dma-resv
* @gpuvm: the &drm_gpuvm
* @exec: the &drm_exec context
* @num_fences: the amount of &dma_fences to reserve
*
* Calls drm_exec_prepare_obj() for the GPUVMs dummy &drm_gem_object; if
* @num_fences is zero drm_exec_lock_obj() is called instead.
*
* Using this function directly, it is the drivers responsibility to call
* drm_exec_init() and drm_exec_fini() accordingly.
*
* Returns: 0 on success, negative error code on failure.
*/
int
drm_gpuvm_prepare_vm(struct drm_gpuvm *gpuvm,
struct drm_exec *exec,
unsigned int num_fences)
{
return exec_prepare_obj(exec, gpuvm->r_obj, num_fences);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_vm);
static int
__drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
struct drm_exec *exec,
unsigned int num_fences)
{
struct drm_gpuvm_bo *vm_bo;
LIST_HEAD(extobjs);
int ret = 0;
for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
ret = exec_prepare_obj(exec, vm_bo->obj, num_fences);
if (ret)
break;
}
/* Drop ref in case we break out of the loop. */
drm_gpuvm_bo_put(vm_bo);
restore_vm_bo_list(gpuvm, extobj);
return ret;
}
static int
drm_gpuvm_prepare_objects_locked(struct drm_gpuvm *gpuvm,
struct drm_exec *exec,
unsigned int num_fences)
{
struct drm_gpuvm_bo *vm_bo;
int ret = 0;
drm_gpuvm_resv_assert_held(gpuvm);
list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
ret = exec_prepare_obj(exec, vm_bo->obj, num_fences);
if (ret)
break;
if (vm_bo->evicted)
drm_gpuvm_bo_list_add(vm_bo, evict, false);
}
return ret;
}
/**
* drm_gpuvm_prepare_objects() - prepare all assoiciated BOs
* @gpuvm: the &drm_gpuvm
* @exec: the &drm_exec locking context
* @num_fences: the amount of &dma_fences to reserve
*
* Calls drm_exec_prepare_obj() for all &drm_gem_objects the given
* &drm_gpuvm contains mappings of; if @num_fences is zero drm_exec_lock_obj()
* is called instead.
*
* Using this function directly, it is the drivers responsibility to call
* drm_exec_init() and drm_exec_fini() accordingly.
*
* Note: This function is safe against concurrent insertion and removal of
* external objects, however it is not safe against concurrent usage itself.
*
* Drivers need to make sure to protect this case with either an outer VM lock
* or by calling drm_gpuvm_prepare_vm() before this function within the
* drm_exec_until_all_locked() loop, such that the GPUVM's dma-resv lock ensures
* mutual exclusion.
*
* Returns: 0 on success, negative error code on failure.
*/
int
drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
struct drm_exec *exec,
unsigned int num_fences)
{
if (drm_gpuvm_resv_protected(gpuvm))
return drm_gpuvm_prepare_objects_locked(gpuvm, exec,
num_fences);
else
return __drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_objects);
/**
* drm_gpuvm_prepare_range() - prepare all BOs mapped within a given range
* @gpuvm: the &drm_gpuvm
* @exec: the &drm_exec locking context
* @addr: the start address within the VA space
* @range: the range to iterate within the VA space
* @num_fences: the amount of &dma_fences to reserve
*
* Calls drm_exec_prepare_obj() for all &drm_gem_objects mapped between @addr
* and @addr + @range; if @num_fences is zero drm_exec_lock_obj() is called
* instead.
*
* Returns: 0 on success, negative error code on failure.
*/
int
drm_gpuvm_prepare_range(struct drm_gpuvm *gpuvm, struct drm_exec *exec,
u64 addr, u64 range, unsigned int num_fences)
{
struct drm_gpuva *va;
u64 end = addr + range;
int ret;
drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
struct drm_gem_object *obj = va->gem.obj;
ret = exec_prepare_obj(exec, obj, num_fences);
if (ret)
return ret;
}
return 0;
}
EXPORT_SYMBOL_GPL(drm_gpuvm_prepare_range);
/**
* drm_gpuvm_exec_lock() - lock all dma-resv of all assoiciated BOs
* @vm_exec: the &drm_gpuvm_exec wrapper
*
* Acquires all dma-resv locks of all &drm_gem_objects the given
* &drm_gpuvm contains mappings of.
*
* Addionally, when calling this function with struct drm_gpuvm_exec::extra
* being set the driver receives the given @fn callback to lock additional
* dma-resv in the context of the &drm_gpuvm_exec instance. Typically, drivers
* would call drm_exec_prepare_obj() from within this callback.
*
* Returns: 0 on success, negative error code on failure.
*/
int
drm_gpuvm_exec_lock(struct drm_gpuvm_exec *vm_exec)
{
struct drm_gpuvm *gpuvm = vm_exec->vm;
struct drm_exec *exec = &vm_exec->exec;
unsigned int num_fences = vm_exec->num_fences;
int ret;
drm_exec_init(exec, vm_exec->flags, 0);
drm_exec_until_all_locked(exec) {
ret = drm_gpuvm_prepare_vm(gpuvm, exec, num_fences);
drm_exec_retry_on_contention(exec);
if (ret)
goto err;
ret = drm_gpuvm_prepare_objects(gpuvm, exec, num_fences);
drm_exec_retry_on_contention(exec);
if (ret)
goto err;
if (vm_exec->extra.fn) {
ret = vm_exec->extra.fn(vm_exec);
drm_exec_retry_on_contention(exec);
if (ret)
goto err;
}
}
return 0;
err:
drm_exec_fini(exec);
return ret;
}
EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock);
static int
fn_lock_array(struct drm_gpuvm_exec *vm_exec)
{
struct {
struct drm_gem_object **objs;
unsigned int num_objs;
} *args = vm_exec->extra.priv;
return drm_exec_prepare_array(&vm_exec->exec, args->objs,
args->num_objs, vm_exec->num_fences);
}
/**
* drm_gpuvm_exec_lock_array() - lock all dma-resv of all assoiciated BOs
* @vm_exec: the &drm_gpuvm_exec wrapper
* @objs: additional &drm_gem_objects to lock
* @num_objs: the number of additional &drm_gem_objects to lock
*
* Acquires all dma-resv locks of all &drm_gem_objects the given &drm_gpuvm
* contains mappings of, plus the ones given through @objs.
*
* Returns: 0 on success, negative error code on failure.
*/
int
drm_gpuvm_exec_lock_array(struct drm_gpuvm_exec *vm_exec,
struct drm_gem_object **objs,
unsigned int num_objs)
{
struct {
struct drm_gem_object **objs;
unsigned int num_objs;
} args;
args.objs = objs;
args.num_objs = num_objs;
vm_exec->extra.fn = fn_lock_array;
vm_exec->extra.priv = &args;
return drm_gpuvm_exec_lock(vm_exec);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_array);
/**
* drm_gpuvm_exec_lock_range() - prepare all BOs mapped within a given range
* @vm_exec: the &drm_gpuvm_exec wrapper
* @addr: the start address within the VA space
* @range: the range to iterate within the VA space
*
* Acquires all dma-resv locks of all &drm_gem_objects mapped between @addr and
* @addr + @range.
*
* Returns: 0 on success, negative error code on failure.
*/
int
drm_gpuvm_exec_lock_range(struct drm_gpuvm_exec *vm_exec,
u64 addr, u64 range)
{
struct drm_gpuvm *gpuvm = vm_exec->vm;
struct drm_exec *exec = &vm_exec->exec;
int ret;
drm_exec_init(exec, vm_exec->flags, 0);
drm_exec_until_all_locked(exec) {
ret = drm_gpuvm_prepare_range(gpuvm, exec, addr, range,
vm_exec->num_fences);
drm_exec_retry_on_contention(exec);
if (ret)
goto err;
}
return ret;
err:
drm_exec_fini(exec);
return ret;
}
EXPORT_SYMBOL_GPL(drm_gpuvm_exec_lock_range);
static int
__drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
{
const struct drm_gpuvm_ops *ops = gpuvm->ops;
struct drm_gpuvm_bo *vm_bo;
LIST_HEAD(evict);
int ret = 0;
for_each_vm_bo_in_list(gpuvm, evict, &evict, vm_bo) {
ret = ops->vm_bo_validate(vm_bo, exec);
if (ret)
break;
}
/* Drop ref in case we break out of the loop. */
drm_gpuvm_bo_put(vm_bo);
restore_vm_bo_list(gpuvm, evict);
return ret;
}
static int
drm_gpuvm_validate_locked(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
{
const struct drm_gpuvm_ops *ops = gpuvm->ops;
struct drm_gpuvm_bo *vm_bo, *next;
int ret = 0;
drm_gpuvm_resv_assert_held(gpuvm);
list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
list.entry.evict) {
ret = ops->vm_bo_validate(vm_bo, exec);
if (ret)
break;
dma_resv_assert_held(vm_bo->obj->resv);
if (!vm_bo->evicted)
drm_gpuvm_bo_list_del_init(vm_bo, evict, false);
}
return ret;
}
/**
* drm_gpuvm_validate() - validate all BOs marked as evicted
* @gpuvm: the &drm_gpuvm to validate evicted BOs
* @exec: the &drm_exec instance used for locking the GPUVM
*
* Calls the &drm_gpuvm_ops::vm_bo_validate callback for all evicted buffer
* objects being mapped in the given &drm_gpuvm.
*
* Returns: 0 on success, negative error code on failure.
*/
int
drm_gpuvm_validate(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
{
const struct drm_gpuvm_ops *ops = gpuvm->ops;
if (unlikely(!ops || !ops->vm_bo_validate))
return -EOPNOTSUPP;
if (drm_gpuvm_resv_protected(gpuvm))
return drm_gpuvm_validate_locked(gpuvm, exec);
else
return __drm_gpuvm_validate(gpuvm, exec);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_validate);
/**
* drm_gpuvm_resv_add_fence - add fence to private and all extobj
* dma-resv
* @gpuvm: the &drm_gpuvm to add a fence to
* @exec: the &drm_exec locking context
* @fence: fence to add
* @private_usage: private dma-resv usage
* @extobj_usage: extobj dma-resv usage
*/
void
drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm,
struct drm_exec *exec,
struct dma_fence *fence,
enum dma_resv_usage private_usage,
enum dma_resv_usage extobj_usage)
{
struct drm_gem_object *obj;
unsigned long index;
drm_exec_for_each_locked_object(exec, index, obj) {
dma_resv_assert_held(obj->resv);
dma_resv_add_fence(obj->resv, fence,
drm_gpuvm_is_extobj(gpuvm, obj) ?
extobj_usage : private_usage);
}
}
EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence);
/**
* drm_gpuvm_bo_create() - create a new instance of struct drm_gpuvm_bo
* @gpuvm: The &drm_gpuvm the @obj is mapped in.
* @obj: The &drm_gem_object being mapped in the @gpuvm.
*
* If provided by the driver, this function uses the &drm_gpuvm_ops
* vm_bo_alloc() callback to allocate.
*
* Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
*/
struct drm_gpuvm_bo *
drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
struct drm_gem_object *obj)
{
const struct drm_gpuvm_ops *ops = gpuvm->ops;
struct drm_gpuvm_bo *vm_bo;
if (ops && ops->vm_bo_alloc)
vm_bo = ops->vm_bo_alloc();
else
vm_bo = kzalloc(sizeof(*vm_bo), GFP_KERNEL);
if (unlikely(!vm_bo))
return NULL;
vm_bo->vm = drm_gpuvm_get(gpuvm);
vm_bo->obj = obj;
drm_gem_object_get(obj);
kref_init(&vm_bo->kref);
INIT_LIST_HEAD(&vm_bo->list.gpuva);
INIT_LIST_HEAD(&vm_bo->list.entry.gem);
INIT_LIST_HEAD(&vm_bo->list.entry.extobj);
INIT_LIST_HEAD(&vm_bo->list.entry.evict);
return vm_bo;
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_create);
static void
drm_gpuvm_bo_destroy(struct kref *kref)
{
struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
kref);
struct drm_gpuvm *gpuvm = vm_bo->vm;
const struct drm_gpuvm_ops *ops = gpuvm->ops;
struct drm_gem_object *obj = vm_bo->obj;
bool lock = !drm_gpuvm_resv_protected(gpuvm);
if (!lock)
drm_gpuvm_resv_assert_held(gpuvm);
drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
drm_gpuvm_bo_list_del(vm_bo, evict, lock);
drm_gem_gpuva_assert_lock_held(obj);
list_del(&vm_bo->list.entry.gem);
if (ops && ops->vm_bo_free)
ops->vm_bo_free(vm_bo);
else
kfree(vm_bo);
drm_gpuvm_put(gpuvm);
drm_gem_object_put(obj);
}
/**
* drm_gpuvm_bo_put() - drop a struct drm_gpuvm_bo reference
* @vm_bo: the &drm_gpuvm_bo to release the reference of
*
* This releases a reference to @vm_bo.
*
* If the reference count drops to zero, the &gpuvm_bo is destroyed, which
* includes removing it from the GEMs gpuva list. Hence, if a call to this
* function can potentially let the reference count drop to zero the caller must
* hold the dma-resv or driver specific GEM gpuva lock.
*
* This function may only be called from non-atomic context.
*
* Returns: true if vm_bo was destroyed, false otherwise.
*/
bool
drm_gpuvm_bo_put(struct drm_gpuvm_bo *vm_bo)
{
might_sleep();
if (vm_bo)
return !!kref_put(&vm_bo->kref, drm_gpuvm_bo_destroy);
return false;
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_put);
static struct drm_gpuvm_bo *
__drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
struct drm_gem_object *obj)
{
struct drm_gpuvm_bo *vm_bo;
drm_gem_gpuva_assert_lock_held(obj);
drm_gem_for_each_gpuvm_bo(vm_bo, obj)
if (vm_bo->vm == gpuvm)
return vm_bo;
return NULL;
}
/**
* drm_gpuvm_bo_find() - find the &drm_gpuvm_bo for the given
* &drm_gpuvm and &drm_gem_object
* @gpuvm: The &drm_gpuvm the @obj is mapped in.
* @obj: The &drm_gem_object being mapped in the @gpuvm.
*
* Find the &drm_gpuvm_bo representing the combination of the given
* &drm_gpuvm and &drm_gem_object. If found, increases the reference
* count of the &drm_gpuvm_bo accordingly.
*
* Returns: a pointer to the &drm_gpuvm_bo on success, NULL on failure
*/
struct drm_gpuvm_bo *
drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
struct drm_gem_object *obj)
{
struct drm_gpuvm_bo *vm_bo = __drm_gpuvm_bo_find(gpuvm, obj);
return vm_bo ? drm_gpuvm_bo_get(vm_bo) : NULL;
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_find);
/**
* drm_gpuvm_bo_obtain() - obtains and instance of the &drm_gpuvm_bo for the
* given &drm_gpuvm and &drm_gem_object
* @gpuvm: The &drm_gpuvm the @obj is mapped in.
* @obj: The &drm_gem_object being mapped in the @gpuvm.
*
* Find the &drm_gpuvm_bo representing the combination of the given
* &drm_gpuvm and &drm_gem_object. If found, increases the reference
* count of the &drm_gpuvm_bo accordingly. If not found, allocates a new
* &drm_gpuvm_bo.
*
* A new &drm_gpuvm_bo is added to the GEMs gpuva list.
*
* Returns: a pointer to the &drm_gpuvm_bo on success, an ERR_PTR on failure
*/
struct drm_gpuvm_bo *
drm_gpuvm_bo_obtain(struct drm_gpuvm *gpuvm,
struct drm_gem_object *obj)
{
struct drm_gpuvm_bo *vm_bo;
vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
if (vm_bo)
return vm_bo;
vm_bo = drm_gpuvm_bo_create(gpuvm, obj);
if (!vm_bo)
return ERR_PTR(-ENOMEM);
drm_gem_gpuva_assert_lock_held(obj);
list_add_tail(&vm_bo->list.entry.gem, &obj->gpuva.list);
return vm_bo;
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain);
/**
* drm_gpuvm_bo_obtain_prealloc() - obtains and instance of the &drm_gpuvm_bo
* for the given &drm_gpuvm and &drm_gem_object
* @__vm_bo: A pre-allocated struct drm_gpuvm_bo.
*
* Find the &drm_gpuvm_bo representing the combination of the given
* &drm_gpuvm and &drm_gem_object. If found, increases the reference
* count of the found &drm_gpuvm_bo accordingly, while the @__vm_bo reference
* count is decreased. If not found @__vm_bo is returned without further
* increase of the reference count.
*
* A new &drm_gpuvm_bo is added to the GEMs gpuva list.
*
* Returns: a pointer to the found &drm_gpuvm_bo or @__vm_bo if no existing
* &drm_gpuvm_bo was found
*/
struct drm_gpuvm_bo *
drm_gpuvm_bo_obtain_prealloc(struct drm_gpuvm_bo *__vm_bo)
{
struct drm_gpuvm *gpuvm = __vm_bo->vm;
struct drm_gem_object *obj = __vm_bo->obj;
struct drm_gpuvm_bo *vm_bo;
vm_bo = drm_gpuvm_bo_find(gpuvm, obj);
if (vm_bo) {
drm_gpuvm_bo_put(__vm_bo);
return vm_bo;
}
drm_gem_gpuva_assert_lock_held(obj);
list_add_tail(&__vm_bo->list.entry.gem, &obj->gpuva.list);
return __vm_bo;
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_obtain_prealloc);
/**
* drm_gpuvm_bo_extobj_add() - adds the &drm_gpuvm_bo to its &drm_gpuvm's
* extobj list
* @vm_bo: The &drm_gpuvm_bo to add to its &drm_gpuvm's the extobj list.
*
* Adds the given @vm_bo to its &drm_gpuvm's extobj list if not on the list
* already and if the corresponding &drm_gem_object is an external object,
* actually.
*/
void
drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
{
struct drm_gpuvm *gpuvm = vm_bo->vm;
bool lock = !drm_gpuvm_resv_protected(gpuvm);
if (!lock)
drm_gpuvm_resv_assert_held(gpuvm);
if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
drm_gpuvm_bo_list_add(vm_bo, extobj, lock);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
/**
* drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / from the &drm_gpuvms
* evicted list
* @vm_bo: the &drm_gpuvm_bo to add or remove
* @evict: indicates whether the object is evicted
*
* Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms evicted list.
*/
void
drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
{
struct drm_gpuvm *gpuvm = vm_bo->vm;
struct drm_gem_object *obj = vm_bo->obj;
bool lock = !drm_gpuvm_resv_protected(gpuvm);
dma_resv_assert_held(obj->resv);
vm_bo->evicted = evict;
/* Can't add external objects to the evicted list directly if not using
* internal spinlocks, since in this case the evicted list is protected
* with the VM's common dma-resv lock.
*/
if (drm_gpuvm_is_extobj(gpuvm, obj) && !lock)
return;
if (evict)
drm_gpuvm_bo_list_add(vm_bo, evict, lock);
else
drm_gpuvm_bo_list_del_init(vm_bo, evict, lock);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
static int
__drm_gpuva_insert(struct drm_gpuvm *gpuvm,
struct drm_gpuva *va)
{
struct rb_node *node;
struct list_head *head;
if (drm_gpuva_it_iter_first(&gpuvm->rb.tree,
GPUVA_START(va),
GPUVA_LAST(va)))
return -EEXIST;
va->vm = gpuvm;
drm_gpuva_it_insert(va, &gpuvm->rb.tree);
node = rb_prev(&va->rb.node);
if (node)
head = &(to_drm_gpuva(node))->rb.entry;
else
head = &gpuvm->rb.list;
list_add(&va->rb.entry, head);
return 0;
}
/**
* drm_gpuva_insert() - insert a &drm_gpuva
* @gpuvm: the &drm_gpuvm to insert the &drm_gpuva in
* @va: the &drm_gpuva to insert
*
* Insert a &drm_gpuva with a given address and range into a
* &drm_gpuvm.
*
* It is safe to use this function using the safe versions of iterating the GPU
* VA space, such as drm_gpuvm_for_each_va_safe() and
* drm_gpuvm_for_each_va_range_safe().
*
* Returns: 0 on success, negative error code on failure.
*/
int
drm_gpuva_insert(struct drm_gpuvm *gpuvm,
struct drm_gpuva *va)
{
u64 addr = va->va.addr;
u64 range = va->va.range;
int ret;
if (unlikely(!drm_gpuvm_range_valid(gpuvm, addr, range)))
return -EINVAL;
ret = __drm_gpuva_insert(gpuvm, va);
if (likely(!ret))
/* Take a reference of the GPUVM for the successfully inserted
* drm_gpuva. We can't take the reference in
* __drm_gpuva_insert() itself, since we don't want to increse
* the reference count for the GPUVM's kernel_alloc_node.
*/
drm_gpuvm_get(gpuvm);
return ret;
}
EXPORT_SYMBOL_GPL(drm_gpuva_insert);
static void
__drm_gpuva_remove(struct drm_gpuva *va)
{
drm_gpuva_it_remove(va, &va->vm->rb.tree);
list_del_init(&va->rb.entry);
}
/**
* drm_gpuva_remove() - remove a &drm_gpuva
* @va: the &drm_gpuva to remove
*
* This removes the given &va from the underlaying tree.
*
* It is safe to use this function using the safe versions of iterating the GPU
* VA space, such as drm_gpuvm_for_each_va_safe() and
* drm_gpuvm_for_each_va_range_safe().
*/
void
drm_gpuva_remove(struct drm_gpuva *va)
{
struct drm_gpuvm *gpuvm = va->vm;
if (unlikely(va == &gpuvm->kernel_alloc_node)) {
drm_WARN(gpuvm->drm, 1,
"Can't destroy kernel reserved node.\n");
return;
}
__drm_gpuva_remove(va);
drm_gpuvm_put(va->vm);
}
EXPORT_SYMBOL_GPL(drm_gpuva_remove);
/**
* drm_gpuva_link() - link a &drm_gpuva
* @va: the &drm_gpuva to link
* @vm_bo: the &drm_gpuvm_bo to add the &drm_gpuva to
*
* This adds the given &va to the GPU VA list of the &drm_gpuvm_bo and the
* &drm_gpuvm_bo to the &drm_gem_object it is associated with.
*
* For every &drm_gpuva entry added to the &drm_gpuvm_bo an additional
* reference of the latter is taken.
*
* This function expects the caller to protect the GEM's GPUVA list against
* concurrent access using either the GEMs dma_resv lock or a driver specific
* lock set through drm_gem_gpuva_set_lock().
*/
void
drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuvm_bo *vm_bo)
{
struct drm_gem_object *obj = va->gem.obj;
struct drm_gpuvm *gpuvm = va->vm;
if (unlikely(!obj))
return;
drm_WARN_ON(gpuvm->drm, obj != vm_bo->obj);
va->vm_bo = drm_gpuvm_bo_get(vm_bo);
drm_gem_gpuva_assert_lock_held(obj);
list_add_tail(&va->gem.entry, &vm_bo->list.gpuva);
}
EXPORT_SYMBOL_GPL(drm_gpuva_link);
/**
* drm_gpuva_unlink() - unlink a &drm_gpuva
* @va: the &drm_gpuva to unlink
*
* This removes the given &va from the GPU VA list of the &drm_gem_object it is
* associated with.
*
* This removes the given &va from the GPU VA list of the &drm_gpuvm_bo and
* the &drm_gpuvm_bo from the &drm_gem_object it is associated with in case
* this call unlinks the last &drm_gpuva from the &drm_gpuvm_bo.
*
* For every &drm_gpuva entry removed from the &drm_gpuvm_bo a reference of
* the latter is dropped.
*
* This function expects the caller to protect the GEM's GPUVA list against
* concurrent access using either the GEMs dma_resv lock or a driver specific
* lock set through drm_gem_gpuva_set_lock().
*/
void
drm_gpuva_unlink(struct drm_gpuva *va)
{
struct drm_gem_object *obj = va->gem.obj;
struct drm_gpuvm_bo *vm_bo = va->vm_bo;
if (unlikely(!obj))
return;
drm_gem_gpuva_assert_lock_held(obj);
list_del_init(&va->gem.entry);
va->vm_bo = NULL;
drm_gpuvm_bo_put(vm_bo);
}
EXPORT_SYMBOL_GPL(drm_gpuva_unlink);
/**
* drm_gpuva_find_first() - find the first &drm_gpuva in the given range
* @gpuvm: the &drm_gpuvm to search in
* @addr: the &drm_gpuvas address
* @range: the &drm_gpuvas range
*
* Returns: the first &drm_gpuva within the given range
*/
struct drm_gpuva *
drm_gpuva_find_first(struct drm_gpuvm *gpuvm,
u64 addr, u64 range)
{
u64 last = addr + range - 1;
return drm_gpuva_it_iter_first(&gpuvm->rb.tree, addr, last);
}
EXPORT_SYMBOL_GPL(drm_gpuva_find_first);
/**
* drm_gpuva_find() - find a &drm_gpuva
* @gpuvm: the &drm_gpuvm to search in
* @addr: the &drm_gpuvas address
* @range: the &drm_gpuvas range
*
* Returns: the &drm_gpuva at a given &addr and with a given &range
*/
struct drm_gpuva *
drm_gpuva_find(struct drm_gpuvm *gpuvm,
u64 addr, u64 range)
{
struct drm_gpuva *va;
va = drm_gpuva_find_first(gpuvm, addr, range);
if (!va)
goto out;
if (va->va.addr != addr ||
va->va.range != range)
goto out;
return va;
out:
return NULL;
}
EXPORT_SYMBOL_GPL(drm_gpuva_find);
/**
* drm_gpuva_find_prev() - find the &drm_gpuva before the given address
* @gpuvm: the &drm_gpuvm to search in
* @start: the given GPU VA's start address
*
* Find the adjacent &drm_gpuva before the GPU VA with given &start address.
*
* Note that if there is any free space between the GPU VA mappings no mapping
* is returned.
*
* Returns: a pointer to the found &drm_gpuva or NULL if none was found
*/
struct drm_gpuva *
drm_gpuva_find_prev(struct drm_gpuvm *gpuvm, u64 start)
{
if (!drm_gpuvm_range_valid(gpuvm, start - 1, 1))
return NULL;
return drm_gpuva_it_iter_first(&gpuvm->rb.tree, start - 1, start);
}
EXPORT_SYMBOL_GPL(drm_gpuva_find_prev);
/**
* drm_gpuva_find_next() - find the &drm_gpuva after the given address
* @gpuvm: the &drm_gpuvm to search in
* @end: the given GPU VA's end address
*
* Find the adjacent &drm_gpuva after the GPU VA with given &end address.
*
* Note that if there is any free space between the GPU VA mappings no mapping
* is returned.
*
* Returns: a pointer to the found &drm_gpuva or NULL if none was found
*/
struct drm_gpuva *
drm_gpuva_find_next(struct drm_gpuvm *gpuvm, u64 end)
{
if (!drm_gpuvm_range_valid(gpuvm, end, 1))
return NULL;
return drm_gpuva_it_iter_first(&gpuvm->rb.tree, end, end + 1);
}
EXPORT_SYMBOL_GPL(drm_gpuva_find_next);
/**
* drm_gpuvm_interval_empty() - indicate whether a given interval of the VA space
* is empty
* @gpuvm: the &drm_gpuvm to check the range for
* @addr: the start address of the range
* @range: the range of the interval
*
* Returns: true if the interval is empty, false otherwise
*/
bool
drm_gpuvm_interval_empty(struct drm_gpuvm *gpuvm, u64 addr, u64 range)
{
return !drm_gpuva_find_first(gpuvm, addr, range);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_interval_empty);
/**
* drm_gpuva_map() - helper to insert a &drm_gpuva according to a
* &drm_gpuva_op_map
* @gpuvm: the &drm_gpuvm
* @va: the &drm_gpuva to insert
* @op: the &drm_gpuva_op_map to initialize @va with
*
* Initializes the @va from the @op and inserts it into the given @gpuvm.
*/
void
drm_gpuva_map(struct drm_gpuvm *gpuvm,
struct drm_gpuva *va,
struct drm_gpuva_op_map *op)
{
drm_gpuva_init_from_op(va, op);
drm_gpuva_insert(gpuvm, va);
}
EXPORT_SYMBOL_GPL(drm_gpuva_map);
/**
* drm_gpuva_remap() - helper to remap a &drm_gpuva according to a
* &drm_gpuva_op_remap
* @prev: the &drm_gpuva to remap when keeping the start of a mapping
* @next: the &drm_gpuva to remap when keeping the end of a mapping
* @op: the &drm_gpuva_op_remap to initialize @prev and @next with
*
* Removes the currently mapped &drm_gpuva and remaps it using @prev and/or
* @next.
*/
void
drm_gpuva_remap(struct drm_gpuva *prev,
struct drm_gpuva *next,
struct drm_gpuva_op_remap *op)
{
struct drm_gpuva *va = op->unmap->va;
struct drm_gpuvm *gpuvm = va->vm;
drm_gpuva_remove(va);
if (op->prev) {
drm_gpuva_init_from_op(prev, op->prev);
drm_gpuva_insert(gpuvm, prev);
}
if (op->next) {
drm_gpuva_init_from_op(next, op->next);
drm_gpuva_insert(gpuvm, next);
}
}
EXPORT_SYMBOL_GPL(drm_gpuva_remap);
/**
* drm_gpuva_unmap() - helper to remove a &drm_gpuva according to a
* &drm_gpuva_op_unmap
* @op: the &drm_gpuva_op_unmap specifying the &drm_gpuva to remove
*
* Removes the &drm_gpuva associated with the &drm_gpuva_op_unmap.
*/
void
drm_gpuva_unmap(struct drm_gpuva_op_unmap *op)
{
drm_gpuva_remove(op->va);
}
EXPORT_SYMBOL_GPL(drm_gpuva_unmap);
static int
op_map_cb(const struct drm_gpuvm_ops *fn, void *priv,
u64 addr, u64 range,
struct drm_gem_object *obj, u64 offset)
{
struct drm_gpuva_op op = {};
op.op = DRM_GPUVA_OP_MAP;
op.map.va.addr = addr;
op.map.va.range = range;
op.map.gem.obj = obj;
op.map.gem.offset = offset;
return fn->sm_step_map(&op, priv);
}
static int
op_remap_cb(const struct drm_gpuvm_ops *fn, void *priv,
struct drm_gpuva_op_map *prev,
struct drm_gpuva_op_map *next,
struct drm_gpuva_op_unmap *unmap)
{
struct drm_gpuva_op op = {};
struct drm_gpuva_op_remap *r;
op.op = DRM_GPUVA_OP_REMAP;
r = &op.remap;
r->prev = prev;
r->next = next;
r->unmap = unmap;
return fn->sm_step_remap(&op, priv);
}
static int
op_unmap_cb(const struct drm_gpuvm_ops *fn, void *priv,
struct drm_gpuva *va, bool merge)
{
struct drm_gpuva_op op = {};
op.op = DRM_GPUVA_OP_UNMAP;
op.unmap.va = va;
op.unmap.keep = merge;
return fn->sm_step_unmap(&op, priv);
}
static int
__drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm,
const struct drm_gpuvm_ops *ops, void *priv,
u64 req_addr, u64 req_range,
struct drm_gem_object *req_obj, u64 req_offset)
{
struct drm_gpuva *va, *next;
u64 req_end = req_addr + req_range;
int ret;
if (unlikely(!drm_gpuvm_range_valid(gpuvm, req_addr, req_range)))
return -EINVAL;
drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req_addr, req_end) {
struct drm_gem_object *obj = va->gem.obj;
u64 offset = va->gem.offset;
u64 addr = va->va.addr;
u64 range = va->va.range;
u64 end = addr + range;
bool merge = !!va->gem.obj;
if (addr == req_addr) {
merge &= obj == req_obj &&
offset == req_offset;
if (end == req_end) {
ret = op_unmap_cb(ops, priv, va, merge);
if (ret)
return ret;
break;
}
if (end < req_end) {
ret = op_unmap_cb(ops, priv, va, merge);
if (ret)
return ret;
continue;
}
if (end > req_end) {
struct drm_gpuva_op_map n = {
.va.addr = req_end,
.va.range = range - req_range,
.gem.obj = obj,
.gem.offset = offset + req_range,
};
struct drm_gpuva_op_unmap u = {
.va = va,
.keep = merge,
};
ret = op_remap_cb(ops, priv, NULL, &n, &u);
if (ret)
return ret;
break;
}
} else if (addr < req_addr) {
u64 ls_range = req_addr - addr;
struct drm_gpuva_op_map p = {
.va.addr = addr,
.va.range = ls_range,
.gem.obj = obj,
.gem.offset = offset,
};
struct drm_gpuva_op_unmap u = { .va = va };
merge &= obj == req_obj &&
offset + ls_range == req_offset;
u.keep = merge;
if (end == req_end) {
ret = op_remap_cb(ops, priv, &p, NULL, &u);
if (ret)
return ret;
break;
}
if (end < req_end) {
ret = op_remap_cb(ops, priv, &p, NULL, &u);
if (ret)
return ret;
continue;
}
if (end > req_end) {
struct drm_gpuva_op_map n = {
.va.addr = req_end,
.va.range = end - req_end,
.gem.obj = obj,
.gem.offset = offset + ls_range +
req_range,
};
ret = op_remap_cb(ops, priv, &p, &n, &u);
if (ret)
return ret;
break;
}
} else if (addr > req_addr) {
merge &= obj == req_obj &&
offset == req_offset +
(addr - req_addr);
if (end == req_end) {
ret = op_unmap_cb(ops, priv, va, merge);
if (ret)
return ret;
break;
}
if (end < req_end) {
ret = op_unmap_cb(ops, priv, va, merge);
if (ret)
return ret;
continue;
}
if (end > req_end) {
struct drm_gpuva_op_map n = {
.va.addr = req_end,
.va.range = end - req_end,
.gem.obj = obj,
.gem.offset = offset + req_end - addr,
};
struct drm_gpuva_op_unmap u = {
.va = va,
.keep = merge,
};
ret = op_remap_cb(ops, priv, NULL, &n, &u);
if (ret)
return ret;
break;
}
}
}
return op_map_cb(ops, priv,
req_addr, req_range,
req_obj, req_offset);
}
static int
__drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm,
const struct drm_gpuvm_ops *ops, void *priv,
u64 req_addr, u64 req_range)
{
struct drm_gpuva *va, *next;
u64 req_end = req_addr + req_range;
int ret;
if (unlikely(!drm_gpuvm_range_valid(gpuvm, req_addr, req_range)))
return -EINVAL;
drm_gpuvm_for_each_va_range_safe(va, next, gpuvm, req_addr, req_end) {
struct drm_gpuva_op_map prev = {}, next = {};
bool prev_split = false, next_split = false;
struct drm_gem_object *obj = va->gem.obj;
u64 offset = va->gem.offset;
u64 addr = va->va.addr;
u64 range = va->va.range;
u64 end = addr + range;
if (addr < req_addr) {
prev.va.addr = addr;
prev.va.range = req_addr - addr;
prev.gem.obj = obj;
prev.gem.offset = offset;
prev_split = true;
}
if (end > req_end) {
next.va.addr = req_end;
next.va.range = end - req_end;
next.gem.obj = obj;
next.gem.offset = offset + (req_end - addr);
next_split = true;
}
if (prev_split || next_split) {
struct drm_gpuva_op_unmap unmap = { .va = va };
ret = op_remap_cb(ops, priv,
prev_split ? &prev : NULL,
next_split ? &next : NULL,
&unmap);
if (ret)
return ret;
} else {
ret = op_unmap_cb(ops, priv, va, false);
if (ret)
return ret;
}
}
return 0;
}
/**
* drm_gpuvm_sm_map() - creates the &drm_gpuva_op split/merge steps
* @gpuvm: the &drm_gpuvm representing the GPU VA space
* @req_addr: the start address of the new mapping
* @req_range: the range of the new mapping
* @req_obj: the &drm_gem_object to map
* @req_offset: the offset within the &drm_gem_object
* @priv: pointer to a driver private data structure
*
* This function iterates the given range of the GPU VA space. It utilizes the
* &drm_gpuvm_ops to call back into the driver providing the split and merge
* steps.
*
* Drivers may use these callbacks to update the GPU VA space right away within
* the callback. In case the driver decides to copy and store the operations for
* later processing neither this function nor &drm_gpuvm_sm_unmap is allowed to
* be called before the &drm_gpuvm's view of the GPU VA space was
* updated with the previous set of operations. To update the
* &drm_gpuvm's view of the GPU VA space drm_gpuva_insert(),
* drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
* used.
*
* A sequence of callbacks can contain map, unmap and remap operations, but
* the sequence of callbacks might also be empty if no operation is required,
* e.g. if the requested mapping already exists in the exact same way.
*
* There can be an arbitrary amount of unmap operations, a maximum of two remap
* operations and a single map operation. The latter one represents the original
* map operation requested by the caller.
*
* Returns: 0 on success or a negative error code
*/
int
drm_gpuvm_sm_map(struct drm_gpuvm *gpuvm, void *priv,
u64 req_addr, u64 req_range,
struct drm_gem_object *req_obj, u64 req_offset)
{
const struct drm_gpuvm_ops *ops = gpuvm->ops;
if (unlikely(!(ops && ops->sm_step_map &&
ops->sm_step_remap &&
ops->sm_step_unmap)))
return -EINVAL;
return __drm_gpuvm_sm_map(gpuvm, ops, priv,
req_addr, req_range,
req_obj, req_offset);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_sm_map);
/**
* drm_gpuvm_sm_unmap() - creates the &drm_gpuva_ops to split on unmap
* @gpuvm: the &drm_gpuvm representing the GPU VA space
* @priv: pointer to a driver private data structure
* @req_addr: the start address of the range to unmap
* @req_range: the range of the mappings to unmap
*
* This function iterates the given range of the GPU VA space. It utilizes the
* &drm_gpuvm_ops to call back into the driver providing the operations to
* unmap and, if required, split existent mappings.
*
* Drivers may use these callbacks to update the GPU VA space right away within
* the callback. In case the driver decides to copy and store the operations for
* later processing neither this function nor &drm_gpuvm_sm_map is allowed to be
* called before the &drm_gpuvm's view of the GPU VA space was updated
* with the previous set of operations. To update the &drm_gpuvm's view
* of the GPU VA space drm_gpuva_insert(), drm_gpuva_destroy_locked() and/or
* drm_gpuva_destroy_unlocked() should be used.
*
* A sequence of callbacks can contain unmap and remap operations, depending on
* whether there are actual overlapping mappings to split.
*
* There can be an arbitrary amount of unmap operations and a maximum of two
* remap operations.
*
* Returns: 0 on success or a negative error code
*/
int
drm_gpuvm_sm_unmap(struct drm_gpuvm *gpuvm, void *priv,
u64 req_addr, u64 req_range)
{
const struct drm_gpuvm_ops *ops = gpuvm->ops;
if (unlikely(!(ops && ops->sm_step_remap &&
ops->sm_step_unmap)))
return -EINVAL;
return __drm_gpuvm_sm_unmap(gpuvm, ops, priv,
req_addr, req_range);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_sm_unmap);
static struct drm_gpuva_op *
gpuva_op_alloc(struct drm_gpuvm *gpuvm)
{
const struct drm_gpuvm_ops *fn = gpuvm->ops;
struct drm_gpuva_op *op;
if (fn && fn->op_alloc)
op = fn->op_alloc();
else
op = kzalloc(sizeof(*op), GFP_KERNEL);
if (unlikely(!op))
return NULL;
return op;
}
static void
gpuva_op_free(struct drm_gpuvm *gpuvm,
struct drm_gpuva_op *op)
{
const struct drm_gpuvm_ops *fn = gpuvm->ops;
if (fn && fn->op_free)
fn->op_free(op);
else
kfree(op);
}
static int
drm_gpuva_sm_step(struct drm_gpuva_op *__op,
void *priv)
{
struct {
struct drm_gpuvm *vm;
struct drm_gpuva_ops *ops;
} *args = priv;
struct drm_gpuvm *gpuvm = args->vm;
struct drm_gpuva_ops *ops = args->ops;
struct drm_gpuva_op *op;
op = gpuva_op_alloc(gpuvm);
if (unlikely(!op))
goto err;
memcpy(op, __op, sizeof(*op));
if (op->op == DRM_GPUVA_OP_REMAP) {
struct drm_gpuva_op_remap *__r = &__op->remap;
struct drm_gpuva_op_remap *r = &op->remap;
r->unmap = kmemdup(__r->unmap, sizeof(*r->unmap),
GFP_KERNEL);
if (unlikely(!r->unmap))
goto err_free_op;
if (__r->prev) {
r->prev = kmemdup(__r->prev, sizeof(*r->prev),
GFP_KERNEL);
if (unlikely(!r->prev))
goto err_free_unmap;
}
if (__r->next) {
r->next = kmemdup(__r->next, sizeof(*r->next),
GFP_KERNEL);
if (unlikely(!r->next))
goto err_free_prev;
}
}
list_add_tail(&op->entry, &ops->list);
return 0;
err_free_unmap:
kfree(op->remap.unmap);
err_free_prev:
kfree(op->remap.prev);
err_free_op:
gpuva_op_free(gpuvm, op);
err:
return -ENOMEM;
}
static const struct drm_gpuvm_ops gpuvm_list_ops = {
.sm_step_map = drm_gpuva_sm_step,
.sm_step_remap = drm_gpuva_sm_step,
.sm_step_unmap = drm_gpuva_sm_step,
};
/**
* drm_gpuvm_sm_map_ops_create() - creates the &drm_gpuva_ops to split and merge
* @gpuvm: the &drm_gpuvm representing the GPU VA space
* @req_addr: the start address of the new mapping
* @req_range: the range of the new mapping
* @req_obj: the &drm_gem_object to map
* @req_offset: the offset within the &drm_gem_object
*
* This function creates a list of operations to perform splitting and merging
* of existent mapping(s) with the newly requested one.
*
* The list can be iterated with &drm_gpuva_for_each_op and must be processed
* in the given order. It can contain map, unmap and remap operations, but it
* also can be empty if no operation is required, e.g. if the requested mapping
* already exists is the exact same way.
*
* There can be an arbitrary amount of unmap operations, a maximum of two remap
* operations and a single map operation. The latter one represents the original
* map operation requested by the caller.
*
* Note that before calling this function again with another mapping request it
* is necessary to update the &drm_gpuvm's view of the GPU VA space. The
* previously obtained operations must be either processed or abandoned. To
* update the &drm_gpuvm's view of the GPU VA space drm_gpuva_insert(),
* drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
* used.
*
* After the caller finished processing the returned &drm_gpuva_ops, they must
* be freed with &drm_gpuva_ops_free.
*
* Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
*/
struct drm_gpuva_ops *
drm_gpuvm_sm_map_ops_create(struct drm_gpuvm *gpuvm,
u64 req_addr, u64 req_range,
struct drm_gem_object *req_obj, u64 req_offset)
{
struct drm_gpuva_ops *ops;
struct {
struct drm_gpuvm *vm;
struct drm_gpuva_ops *ops;
} args;
int ret;
ops = kzalloc(sizeof(*ops), GFP_KERNEL);
if (unlikely(!ops))
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&ops->list);
args.vm = gpuvm;
args.ops = ops;
ret = __drm_gpuvm_sm_map(gpuvm, &gpuvm_list_ops, &args,
req_addr, req_range,
req_obj, req_offset);
if (ret)
goto err_free_ops;
return ops;
err_free_ops:
drm_gpuva_ops_free(gpuvm, ops);
return ERR_PTR(ret);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_sm_map_ops_create);
/**
* drm_gpuvm_sm_unmap_ops_create() - creates the &drm_gpuva_ops to split on
* unmap
* @gpuvm: the &drm_gpuvm representing the GPU VA space
* @req_addr: the start address of the range to unmap
* @req_range: the range of the mappings to unmap
*
* This function creates a list of operations to perform unmapping and, if
* required, splitting of the mappings overlapping the unmap range.
*
* The list can be iterated with &drm_gpuva_for_each_op and must be processed
* in the given order. It can contain unmap and remap operations, depending on
* whether there are actual overlapping mappings to split.
*
* There can be an arbitrary amount of unmap operations and a maximum of two
* remap operations.
*
* Note that before calling this function again with another range to unmap it
* is necessary to update the &drm_gpuvm's view of the GPU VA space. The
* previously obtained operations must be processed or abandoned. To update the
* &drm_gpuvm's view of the GPU VA space drm_gpuva_insert(),
* drm_gpuva_destroy_locked() and/or drm_gpuva_destroy_unlocked() should be
* used.
*
* After the caller finished processing the returned &drm_gpuva_ops, they must
* be freed with &drm_gpuva_ops_free.
*
* Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
*/
struct drm_gpuva_ops *
drm_gpuvm_sm_unmap_ops_create(struct drm_gpuvm *gpuvm,
u64 req_addr, u64 req_range)
{
struct drm_gpuva_ops *ops;
struct {
struct drm_gpuvm *vm;
struct drm_gpuva_ops *ops;
} args;
int ret;
ops = kzalloc(sizeof(*ops), GFP_KERNEL);
if (unlikely(!ops))
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&ops->list);
args.vm = gpuvm;
args.ops = ops;
ret = __drm_gpuvm_sm_unmap(gpuvm, &gpuvm_list_ops, &args,
req_addr, req_range);
if (ret)
goto err_free_ops;
return ops;
err_free_ops:
drm_gpuva_ops_free(gpuvm, ops);
return ERR_PTR(ret);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_sm_unmap_ops_create);
/**
* drm_gpuvm_prefetch_ops_create() - creates the &drm_gpuva_ops to prefetch
* @gpuvm: the &drm_gpuvm representing the GPU VA space
* @addr: the start address of the range to prefetch
* @range: the range of the mappings to prefetch
*
* This function creates a list of operations to perform prefetching.
*
* The list can be iterated with &drm_gpuva_for_each_op and must be processed
* in the given order. It can contain prefetch operations.
*
* There can be an arbitrary amount of prefetch operations.
*
* After the caller finished processing the returned &drm_gpuva_ops, they must
* be freed with &drm_gpuva_ops_free.
*
* Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
*/
struct drm_gpuva_ops *
drm_gpuvm_prefetch_ops_create(struct drm_gpuvm *gpuvm,
u64 addr, u64 range)
{
struct drm_gpuva_ops *ops;
struct drm_gpuva_op *op;
struct drm_gpuva *va;
u64 end = addr + range;
int ret;
ops = kzalloc(sizeof(*ops), GFP_KERNEL);
if (!ops)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&ops->list);
drm_gpuvm_for_each_va_range(va, gpuvm, addr, end) {
op = gpuva_op_alloc(gpuvm);
if (!op) {
ret = -ENOMEM;
goto err_free_ops;
}
op->op = DRM_GPUVA_OP_PREFETCH;
op->prefetch.va = va;
list_add_tail(&op->entry, &ops->list);
}
return ops;
err_free_ops:
drm_gpuva_ops_free(gpuvm, ops);
return ERR_PTR(ret);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_prefetch_ops_create);
/**
* drm_gpuvm_bo_unmap_ops_create() - creates the &drm_gpuva_ops to unmap a GEM
* @vm_bo: the &drm_gpuvm_bo abstraction
*
* This function creates a list of operations to perform unmapping for every
* GPUVA attached to a GEM.
*
* The list can be iterated with &drm_gpuva_for_each_op and consists out of an
* arbitrary amount of unmap operations.
*
* After the caller finished processing the returned &drm_gpuva_ops, they must
* be freed with &drm_gpuva_ops_free.
*
* It is the callers responsibility to protect the GEMs GPUVA list against
* concurrent access using the GEMs dma_resv lock.
*
* Returns: a pointer to the &drm_gpuva_ops on success, an ERR_PTR on failure
*/
struct drm_gpuva_ops *
drm_gpuvm_bo_unmap_ops_create(struct drm_gpuvm_bo *vm_bo)
{
struct drm_gpuva_ops *ops;
struct drm_gpuva_op *op;
struct drm_gpuva *va;
int ret;
drm_gem_gpuva_assert_lock_held(vm_bo->obj);
ops = kzalloc(sizeof(*ops), GFP_KERNEL);
if (!ops)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&ops->list);
drm_gpuvm_bo_for_each_va(va, vm_bo) {
op = gpuva_op_alloc(vm_bo->vm);
if (!op) {
ret = -ENOMEM;
goto err_free_ops;
}
op->op = DRM_GPUVA_OP_UNMAP;
op->unmap.va = va;
list_add_tail(&op->entry, &ops->list);
}
return ops;
err_free_ops:
drm_gpuva_ops_free(vm_bo->vm, ops);
return ERR_PTR(ret);
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_unmap_ops_create);
/**
* drm_gpuva_ops_free() - free the given &drm_gpuva_ops
* @gpuvm: the &drm_gpuvm the ops were created for
* @ops: the &drm_gpuva_ops to free
*
* Frees the given &drm_gpuva_ops structure including all the ops associated
* with it.
*/
void
drm_gpuva_ops_free(struct drm_gpuvm *gpuvm,
struct drm_gpuva_ops *ops)
{
struct drm_gpuva_op *op, *next;
drm_gpuva_for_each_op_safe(op, next, ops) {
list_del(&op->entry);
if (op->op == DRM_GPUVA_OP_REMAP) {
kfree(op->remap.prev);
kfree(op->remap.next);
kfree(op->remap.unmap);
}
gpuva_op_free(gpuvm, op);
}
kfree(ops);
}
EXPORT_SYMBOL_GPL(drm_gpuva_ops_free);
MODULE_DESCRIPTION("DRM GPUVM");
MODULE_LICENSE("GPL");