Commit Graph

283 Commits

Author SHA1 Message Date
Thierry Reding
c3dbfb9c49 gpu: host1x: Plug potential memory leak
The memory allocated for a DMA fence could be leaked if the code failed
to allocate the waiter object. Make sure to release the fence allocation
on failure.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-09-16 18:06:52 +02:00
Dmitry Osipenko
a81cf839a0 gpu/host1x: fence: Make spinlock static
The DEFINE_SPINLOCK macro creates a global spinlock symbol that is visible
to the whole kernel. This is unintended in the code, fix it.

Fixes: 687db2207b ("gpu: host1x: Add DMA fence implementation")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-09-16 18:06:51 +02:00
Thierry Reding
fed0289394 gpu: host1x: debug: Dump DMASTART and DMAEND register
Show the values of the DMASTART and DMAEND registers when dumping status
to help with failure analysis.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-13 18:23:32 +02:00
Thierry Reding
afa770fe57 gpu: host1x: debug: Dump only relevant parts of CDMA push buffer
Dumping the full CDMA push buffer takes a long time and isn't very
useful since most of the contents are not relevant. Instead only show
the CDMA push buffer entries associated with current jobs.

While at it, tweak the indentation a bit to make the output more
readable.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-13 18:23:32 +02:00
Thierry Reding
ff41dd2748 gpu: host1x: debug: Use dma_addr_t more consistently
The host1x debug code uses a mix of phys_addr_t, dma_addr_t and u32 to
represent addresses. However, these addresses are always DMA addresses
so use the appropriate type.

This fixes some issues with how these addresses are displayed, because
they could be truncated in some cases and not show the full address.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-13 18:23:32 +02:00
Mikko Perttunen
0fddaa85d6 gpu: host1x: Add option to skip firewall for a job
The new UAPI will have its own firewall, and we don't want to run
the firewall in the Host1x driver for those jobs. As such, add a
parameter to host1x_job_alloc to specify if we want to skip the
firewall in the Host1x driver.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:42:49 +02:00
Mikko Perttunen
e902585fc8 gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in HOST1X class, while gather submitted
by the application execute in engine class.

Support is added by converting the gather list of job into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
on the CDMA pushbuffer.

Also supported are waits relative to the start of the job,
which are useful for jobs doing multiple things with an engine
that doesn't natively support pipelining.

While at it, use 32-bit waits on chips that support them.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:41:19 +02:00
Mikko Perttunen
17a298e9ac gpu: host1x: Add job release callback
Add a callback field to the job structure, to be called just before
the job is to be freed. This allows the job's submitter to clean
up any of its own state, like decrement runtime PM refcounts.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:41:02 +02:00
Mikko Perttunen
c78f837ae3 gpu: host1x: Add no-recovery mode
Add a new property for jobs to enable or disable recovery i.e.
CPU increments of syncpoints to max value on job timeout. This
allows for a more solid model for hanged jobs, where userspace
doesn't need to guess if a syncpoint increment happened because
the job completed, or because job timeout was triggered.

On job timeout, we stop the channel, NOP all future jobs on the
channel using the same syncpoint, mark the syncpoint as locked
and resume the channel from the next job, if any.

The future jobs are NOPed, since because we don't do the CPU
increments, the value of the syncpoint is no longer synchronized,
and any waiters would become confused if a future job incremented
the syncpoint. The syncpoint is marked locked to ensure that any
future jobs cannot increment the syncpoint either, until the
application has recognized the situation and reallocated the
syncpoint.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:40:23 +02:00
Mikko Perttunen
687db2207b gpu: host1x: Add DMA fence implementation
Add an implementation of dma_fences based on syncpoints. Syncpoint
interrupts are used to signal fences. Additionally, after
software signaling has been enabled, a 30 second timeout is started.
If the syncpoint threshold is not reached within this period,
the fence is signalled with an -ETIMEDOUT error code. This is to
allow fences that would never reach their syncpoint threshold to
be cleaned up. The timeout can potentially be removed in the future
after job tracking code has been refactored.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:39:50 +02:00
Thierry Reding
0cfe5a6e75 gpu: host1x: Split up client initalization and registration
In some cases we may need to initialize the host1x client first before
registering it. This commit adds a new helper that will do nothing but
the initialization of the data structure.

At the same time, the initialization is removed from the registration
function. Note, however, that for simplicity we explicitly initialize
the client when the host1x_client_register() function is called, as
opposed to the low-level __host1x_client_register() function. This
allows existing callers to remain unchanged.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-05-17 12:31:05 +02:00
Thierry Reding
933deb8c7b gpu: host1x: Add early init and late exit callbacks
These callbacks can be used by client drivers to run code during early
init and during late exit. Early init callbacks are run prior to the
regular init callbacks while late exit callbacks run after the regular
exit callbacks.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-31 17:42:14 +02:00
Jon Hunter
d3555eb7f8 gpu: host1x: Fix Tegra194 syncpt interrupt threshold
Syncpoint interrupts are not working as expected on Tegra194. The
problem is that the syncpoint interrupt threshold being used is the
global interrupt threshold and not the virtual interrupt threshold.
Fix this by using the virtual interrupt threshold which aligns with
downstream.

Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-31 17:42:14 +02:00
Mikko Perttunen
5a8d95d20c gpu: host1x: Assign intr waiter inside lock
Move the assignment of the ref out-pointer in host1x_intr_add_action
to happen within the spinlock. With the current arrangement,
it is possible for the waiter to complete before the assignment
has happened, which breaks horribly if the waiter completion
callback tries to use the reference.

In practice, there is currently no situation where this issue can
manifest -- it was first noticed with the upcoming DMA fence
implementation patches. As such this doesn't need to be backported.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-31 17:42:14 +02:00
Mikko Perttunen
f5ba33fb96 gpu: host1x: Reserve VBLANK syncpoints at initialization
On T20-T148 chips, the bootloader can set up a boot splash
screen with DC configured to increment syncpoint 26/27
at VBLANK. Because of this we shouldn't allow these syncpoints
to be allocated until DC has been reset and will no longer
increment them in the background.

As such, on these chips, reserve those two syncpoints at
initialization, and only mark them free once the DC
driver has indicated it's safe to do so.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-31 17:42:13 +02:00
Mikko Perttunen
aded42ada6 gpu: host1x: Reset max value when freeing a syncpoint
With job recovery becoming optional, syncpoints may have a mismatch
between their value and max value when freed. As such, when freeing,
set the max value to the current value of the syncpoint so that it
is in a sane state for the next user.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-31 17:42:13 +02:00
Mikko Perttunen
2aed4f5ab0 gpu: host1x: Cleanup and refcounting for syncpoints
Add reference counting for allocated syncpoints to allow keeping
them allocated while jobs are referencing them. Additionally,
clean up various places using syncpoint IDs to use host1x_syncpt
pointers instead.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-31 17:42:13 +02:00
Mikko Perttunen
f63b42cbc8 gpu: host1x: Use HW-equivalent syncpoint expiration check
Make syncpoint expiration checks always use the same logic used by
the hardware. This ensures that there are no race conditions that
could occur because of the hardware triggering a syncpoint interrupt
and then the driver disagreeing.

One situation where this could occur is if a job incremented a
syncpoint too many times -- then the hardware would trigger an
interrupt, but the driver would assume that a syncpoint value
greater than the syncpoint's max value is in the future, and not
clean up the job.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30 19:53:24 +02:00
Mikko Perttunen
ecfb888ade gpu: host1x: Remove cancelled waiters immediately
Before this patch, cancelled waiters would only be cleaned up
once their threshold value was reached. Make host1x_intr_put_ref
process the cancellation immediately to fix this.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30 19:53:24 +02:00
Mikko Perttunen
49a5fb1679 gpu: host1x: Show number of pending waiters in debugfs
Show the number of pending waiters in the debugfs status file.
This is useful for testing to verify that waiters do not leak
or accumulate incorrectly.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30 19:53:24 +02:00
Mikko Perttunen
86cec7ece3 gpu: host1x: Allow syncpoints without associated client
Syncpoints don't need to be associated with any client,
so remove the property, and expose host1x_syncpt_alloc.
This will allow allocating syncpoints without prior knowledge
of the engine that it will be used with.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30 19:53:24 +02:00
Mikko Perttunen
a24f98176d gpu: host1x: Use different lock classes for each client
To avoid false lockdep warnings, give each client lock a different
lock class, passed from the initialization site by macro.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30 19:37:20 +02:00
Lee Jones
d2a58fd1f0 gpu/host1x: bus: Add missing description for 'driver'
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/host1x/bus.c:40: warning: Function parameter or member 'driver' not described in 'host1x_subdev_add'

Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-tegra@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20201105144517.1826692-2-lee.jones@linaro.org
2020-11-05 22:12:55 +01:00
Linus Torvalds
93b694d096 Merge tag 'drm-next-2020-10-15' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
 "Not a major amount of change, the i915 trees got split into display
  and gt trees to better facilitate higher level review, and there's a
  major refactoring of i915 GEM locking to use more core kernel concepts
  (like ww-mutexes). msm gets per-process pagetables, older AMD SI cards
  get DC support, nouveau got a bump in displayport support with common
  code extraction from i915.

  Outside of drm this contains a couple of patches for hexint
  moduleparams which you've acked, and a virtio common code tree that
  you should also get via it's regular path.

  New driver:
   - Cadence MHDP8546 DisplayPort bridge driver

  core:
   - cross-driver scatterlist cleanups
   - devm_drm conversions
   - remove drm_dev_init
   - devm_drm_dev_alloc conversion

  ttm:
   - lots of refactoring and cleanups

  bridges:
   - chained bridge support in more drivers

  panel:
   - misc new panels

  scheduler:
   - cleanup priority levels

  displayport:
   - refactor i915 code into helpers for nouveau

  i915:
   - split into display and GT trees
   - WW locking refactoring in GEM
   - execbuf2 extension mechanism
   - syncobj timeline support
   - GEN 12 HOBL display powersaving
   - Rocket Lake display additions
   - Disable FBC on Tigerlake
   - Tigerlake Type-C + DP improvements
   - Hotplug interrupt refactoring

  amdgpu:
   - Sienna Cichlid updates
   - Navy Flounder updates
   - DCE6 (SI) support for DC
   - Plane rotation enabled
   - TMZ state info ioctl
   - PCIe DPC recovery support
   - DC interrupt handling refactor
   - OLED panel fixes

  amdkfd:
   - add SMI events for thermal throttling
   - SMI interface events ioctl update
   - process eviction counters

  radeon:
   - move to dma_ for allocations
   - expose sclk via sysfs

  msm:
   - DSI support for sm8150/sm8250
   - per-process GPU pagetable support
   - Displayport support

  mediatek:
   - move HDMI phy driver to PHY
   - convert mtk-dpi to bridge API
   - disable mt2701 tmds

  tegra:
   - bridge support

  exynos:
   - misc cleanups

  vc4:
   - dual display cleanups

  ast:
   - cleanups

  gma500:
   - conversion to GPIOd API

  hisilicon:
   - misc reworks

  ingenic:
   - clock handling and format improvements

  mcde:
   - DSI support

  mgag200:
   - desktop g200 support

  mxsfb:
   - i.MX7 + i.MX8M
   - alpha plane support

  panfrost:
   - devfreq support
   - amlogic SoC support

  ps8640:
   - EDID from eDP retrieval

  tidss:
   - AM65xx YUV workaround

  virtio:
   - virtio-gpu exported resources

  rcar-du:
   - R8A7742, R8A774E1 and R8A77961 support
   - YUV planar format fixes
   - non-visible plane handling
   - VSP device reference count fix
   - Kconfig fix to avoid displaying disabled options in .config"

* tag 'drm-next-2020-10-15' of git://anongit.freedesktop.org/drm/drm: (1494 commits)
  drm/ingenic: Fix bad revert
  drm/amdgpu: Fix invalid number of character '{' in amdgpu_acpi_init
  drm/amdgpu: Remove warning for virtual_display
  drm/amdgpu: kfd_initialized can be static
  drm/amd/pm: setup APU dpm clock table in SMU HW initialization
  drm/amdgpu: prevent spurious warning
  drm/amdgpu/swsmu: fix ARC build errors
  drm/amd/display: Fix OPTC_DATA_FORMAT programming
  drm/amd/display: Don't allow pstate if no support in blank
  drm/panfrost: increase readl_relaxed_poll_timeout values
  MAINTAINERS: Update entry for st7703 driver after the rename
  Revert "gpu/drm: ingenic: Add option to mmap GEM buffers cached"
  drm/amd/display: HDMI remote sink need mode validation for Linux
  drm/amd/display: Change to correct unit on audio rate
  drm/amd/display: Avoid set zero in the requested clk
  drm/amdgpu: align frag_end to covered address space
  drm/amdgpu: fix NULL pointer dereference for Renoir
  drm/vmwgfx: fix regression in thp code due to ttm init refactor.
  drm/amdgpu/swsmu: add interrupt work handler for smu11 parts
  drm/amdgpu/swsmu: add interrupt work function
  ...
2020-10-15 10:46:16 -07:00
Marek Szyprowski
67ed9f9d95 drm: host1x: fix common struct sg_table related issues
The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
returns the number of the created entries in the DMA address space.
However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg must be called with the original number of the entries
passed to the dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, lets use a common dma-mapping wrappers operating
directly on the struct sg_table objects and use scatterlist page
iterators where possible. This, almost always, hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-10 08:18:35 +02:00