Commit Graph

59 Commits

Author SHA1 Message Date
Thierry Reding
3e9c458433 gpu: host1x: Do not use mapping cache for job submissions
Buffer mappings used in job submissions are usually small and not
rapidly reused as opposed to framebuffers (which are usually large and
rapidly reused, for example when page-flipping between double-buffered
framebuffers). Avoid going through the mapping cache for these buffers
since the cache would also lead to leaks if nobody is ever releasing
the cache's last reference. For DRM/KMS these last references are
dropped when the framebuffers are removed and therefore no longer
needed.

While at it, also add a note about the need to explicitly remove the
final reference to the mapping in the cache.

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2022-04-06 15:12:36 +02:00
Randy Dunlap
fe696ccb27 gpu: host1x: Fix a kernel-doc warning
Add @cache description to eliminate a kernel-doc warning.

include/linux/host1x.h:104: warning: Function parameter or member 'cache' not described in 'host1x_client'

Fixes: 1f39b1dfa5 ("drm/tegra: Implement buffer object cache")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Thierry Reding <treding@nvidia.com>
Cc: linux-tegra@vger.kernel.org
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thierry Reding <treding@nvidia.com>
2022-04-06 15:08:17 +02:00
Dmitry Osipenko
9ca790f446 gpu: host1x: Add host1x_channel_stop()
Add host1x_channel_stop() which waits till channel becomes idle and then
stops the channel hardware. This is needed for supporting suspend/resume
by host1x drivers since the hardware state is lost after power-gating,
thus the channel needs to be stopped before client enters into suspend.

Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-12-16 14:07:07 +01:00
Mikko Perttunen
46f226c93d drm/tegra: Add NVDEC driver
Add support for booting and using NVDEC on Tegra210, Tegra186
and Tegra194 to the Host1x and TegraDRM drivers. Booting in
secure mode is not currently supported.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-12-16 14:07:06 +01:00
Thierry Reding
1f39b1dfa5 drm/tegra: Implement buffer object cache
This cache is used to avoid mapping and unmapping buffer objects
unnecessarily. Mappings are cached per client and stay hot until
the buffer object is destroyed.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-12-16 14:07:06 +01:00
Thierry Reding
c6aeaf56f4 drm/tegra: Implement correct DMA-BUF semantics
DMA-BUF requires that each device that accesses a DMA-BUF attaches to it
separately. To do so the host1x_bo_pin() and host1x_bo_unpin() functions
need to be reimplemented so that they can return a mapping, which either
represents an attachment or a map of the driver's own GEM object.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-12-16 14:07:06 +01:00
Mikko Perttunen
0fddaa85d6 gpu: host1x: Add option to skip firewall for a job
The new UAPI will have its own firewall, and we don't want to run
the firewall in the Host1x driver for those jobs. As such, add a
parameter to host1x_job_alloc to specify if we want to skip the
firewall in the Host1x driver.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:42:49 +02:00
Mikko Perttunen
e902585fc8 gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in HOST1X class, while gather submitted
by the application execute in engine class.

Support is added by converting the gather list of job into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
on the CDMA pushbuffer.

Also supported are waits relative to the start of the job,
which are useful for jobs doing multiple things with an engine
that doesn't natively support pipelining.

While at it, use 32-bit waits on chips that support them.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:41:19 +02:00
Mikko Perttunen
17a298e9ac gpu: host1x: Add job release callback
Add a callback field to the job structure, to be called just before
the job is to be freed. This allows the job's submitter to clean
up any of its own state, like decrement runtime PM refcounts.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:41:02 +02:00
Mikko Perttunen
c78f837ae3 gpu: host1x: Add no-recovery mode
Add a new property for jobs to enable or disable recovery i.e.
CPU increments of syncpoints to max value on job timeout. This
allows for a more solid model for hanged jobs, where userspace
doesn't need to guess if a syncpoint increment happened because
the job completed, or because job timeout was triggered.

On job timeout, we stop the channel, NOP all future jobs on the
channel using the same syncpoint, mark the syncpoint as locked
and resume the channel from the next job, if any.

The future jobs are NOPed, since because we don't do the CPU
increments, the value of the syncpoint is no longer synchronized,
and any waiters would become confused if a future job incremented
the syncpoint. The syncpoint is marked locked to ensure that any
future jobs cannot increment the syncpoint either, until the
application has recognized the situation and reallocated the
syncpoint.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:40:23 +02:00
Mikko Perttunen
687db2207b gpu: host1x: Add DMA fence implementation
Add an implementation of dma_fences based on syncpoints. Syncpoint
interrupts are used to signal fences. Additionally, after
software signaling has been enabled, a 30 second timeout is started.
If the syncpoint threshold is not reached within this period,
the fence is signalled with an -ETIMEDOUT error code. This is to
allow fences that would never reach their syncpoint threshold to
be cleaned up. The timeout can potentially be removed in the future
after job tracking code has been refactored.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-08-10 14:39:50 +02:00
Thierry Reding
0cfe5a6e75 gpu: host1x: Split up client initalization and registration
In some cases we may need to initialize the host1x client first before
registering it. This commit adds a new helper that will do nothing but
the initialization of the data structure.

At the same time, the initialization is removed from the registration
function. Note, however, that for simplicity we explicitly initialize
the client when the host1x_client_register() function is called, as
opposed to the low-level __host1x_client_register() function. This
allows existing callers to remain unchanged.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-05-17 12:31:05 +02:00
Thierry Reding
933deb8c7b gpu: host1x: Add early init and late exit callbacks
These callbacks can be used by client drivers to run code during early
init and during late exit. Early init callbacks are run prior to the
regular init callbacks while late exit callbacks run after the regular
exit callbacks.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-31 17:42:14 +02:00
Mikko Perttunen
f5ba33fb96 gpu: host1x: Reserve VBLANK syncpoints at initialization
On T20-T148 chips, the bootloader can set up a boot splash
screen with DC configured to increment syncpoint 26/27
at VBLANK. Because of this we shouldn't allow these syncpoints
to be allocated until DC has been reset and will no longer
increment them in the background.

As such, on these chips, reserve those two syncpoints at
initialization, and only mark them free once the DC
driver has indicated it's safe to do so.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-31 17:42:13 +02:00
Mikko Perttunen
2aed4f5ab0 gpu: host1x: Cleanup and refcounting for syncpoints
Add reference counting for allocated syncpoints to allow keeping
them allocated while jobs are referencing them. Additionally,
clean up various places using syncpoint IDs to use host1x_syncpt
pointers instead.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-31 17:42:13 +02:00
Mikko Perttunen
86cec7ece3 gpu: host1x: Allow syncpoints without associated client
Syncpoints don't need to be associated with any client,
so remove the property, and expose host1x_syncpt_alloc.
This will allow allocating syncpoints without prior knowledge
of the engine that it will be used with.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30 19:53:24 +02:00
Mikko Perttunen
a24f98176d gpu: host1x: Use different lock classes for each client
To avoid false lockdep warnings, give each client lock a different
lock class, passed from the initialization site by macro.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-03-30 19:37:20 +02:00
Sowjanya Komatineni
cf5153e433 media: gpu: host1x: mipi: Keep MIPI clock enabled and mutex locked till calibration done
With the split of MIPI calibration into tegra_mipi_calibrate() and
tegra_mipi_wait(), MIPI clock is not kept enabled and mutex is not locked
till the calibration is done.

So, this patch keeps MIPI clock enabled and mutex locked after triggering
start of calibration till its done.

To let calibration process go through its finite sequence codes before
calibration logic waiting for pads idle state added wait time of 75usec
to make sure it sees idle state to apply the results.

This patch renames tegra_mipi_calibrate() as tegra_mipi_start_calibration()
and tegra_mipi_wait() as tegra_mipi_finish_calibration() to be inline
with their usage.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-08-28 15:12:38 +02:00
Sowjanya Komatineni
b3f1b76071 gpu: host1x: mipi: Split tegra_mipi_calibrate() and tegra_mipi_wait()
SW can trigger MIPI pads calibration any time after power on
but calibration results will be latched and applied to the pads
by MIPI CAL unit only when the link is in LP-11 state and then
status register will be updated.

For CSI, trigger of pads calibration happen during CSI stream
enable where CSI receiver is kept ready prior to sensor or CSI
transmitter stream start.

So, pads may not be in LP-11 at this time and waiting for the
calibration to be done immediate after calibration start will
result in timeout.

This patch splits tegra_mipi_calibrate() and tegra_mipi_wait()
so triggering for calibration and waiting for it to complete can
happen at different stages.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-07-17 16:06:14 +02:00
Sowjanya Komatineni
767598d447 gpu: host1x: mipi: Update tegra_mipi_request() to be node based
Tegra CSI driver need a separate MIPI device for each channel as
calibration of corresponding MIPI pads for each channel should
happen independently.

So, this patch updates tegra_mipi_request() API to add a device_node
pointer argument to allow creating mipi device for specific device
node rather than a device.

Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-07-17 16:06:13 +02:00
Colton Lewis
2fd2bc7f49 gpu: host1x: Correct trivial kernel-doc inconsistencies
Silence documentation build warnings by adding kernel-doc fields.

./include/linux/host1x.h:69: warning: Function parameter or member 'parent' not described in 'host1x_client'
./include/linux/host1x.h:69: warning: Function parameter or member 'usecount' not described in 'host1x_client'
./include/linux/host1x.h:69: warning: Function parameter or member 'lock' not described in 'host1x_client'

Signed-off-by: Colton Lewis <colton.w.lewis@protonmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-06-16 18:59:45 +02:00
Thierry Reding
501be6c1c7 drm/tegra: Fix SMMU support on Tegra124 and Tegra210
When testing whether or not to enable the use of the SMMU, consult the
supported DMA mask rather than the actually configured DMA mask, since
the latter might already have been restricted.

Fixes: 2d9384ff91 ("drm/tegra: Relax IOMMU usage criteria on old Tegra")
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-04-28 11:44:07 +02:00
Dave Airlie
fd7226fbb2 Merge tag 'drm/tegra/for-5.6-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
drm/tegra: Changes for v5.6-rc1

This contains a small set of mostly fixes and some minor improvements.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200111004835.2412858-1-thierry.reding@gmail.com
2020-01-15 16:21:28 +10:00
Thierry Reding
fd67e9c6ed drm/tegra: Do not implement runtime PM
The Tegra DRM driver heavily relies on the implementations for runtime
suspend/resume to be called at specific times. Unfortunately, there are
some cases where that doesn't work. One example is if the user disables
runtime PM for a given subdevice. Another example is that the PM core
acquires a reference to runtime PM during system sleep, effectively
preventing devices from going into low power modes. This is intentional
to avoid nasty race conditions, but it also causes system sleep to not
function properly on all Tegra systems.

Fix this by not implementing runtime PM at all. Instead, a minimal,
reference-counted suspend/resume infrastructure is added to the host1x
bus. This has the benefit that it can be used regardless of the system
power state (or any transitions we might be in), or whether or not the
user allows runtime PM.

Atomic modesetting guarantees that these functions will end up being
called at the right point in time, so the pitfalls for the more generic
runtime PM do not apply here.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-01-10 16:37:43 +01:00
Thierry Reding
608f43ad27 gpu: host1x: Rename "parent" to "host"
Rename the host1x clients' parent to "host" because that more closely
describes what it is. The parent can be confused with the parent device
in terms of the device hierarchy. Subsequent patches will add a new
member that refers to the parent in that hierarchy.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2020-01-10 16:37:38 +01:00