mirror of
https://github.com/armbian/linux-cix.git
synced 2026-01-06 12:30:45 -08:00
accel/qaic: Add documentation for AIC100 accelerator driver
The Qualcomm Cloud AI 100 (AIC100) device is an Artificial Intelligence accelerator PCIe card. It contains a number of components both in the SoC and on the card which facilitate running workloads: QSM: management processor NSPs: workload compute units DMA Bridge: dedicated data mover for the workloads MHI: multiplexed communication channels DDR: workload storage and memory The Linux kernel driver for AIC100 is called "QAIC" and is located in the accel subsystem. Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Acked-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1679932497-30277-2-git-send-email-quic_jhugo@quicinc.com
This commit is contained in:
committed by
Jacek Lawrynowicz
parent
f2c7ca8901
commit
830f3f27a3
@@ -8,6 +8,7 @@ Compute Accelerators
|
||||
:maxdepth: 1
|
||||
|
||||
introduction
|
||||
qaic/index
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
|
||||
510
Documentation/accel/qaic/aic100.rst
Normal file
510
Documentation/accel/qaic/aic100.rst
Normal file
File diff suppressed because it is too large
Load Diff
13
Documentation/accel/qaic/index.rst
Normal file
13
Documentation/accel/qaic/index.rst
Normal file
@@ -0,0 +1,13 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
|
||||
=====================================
|
||||
accel/qaic Qualcomm Cloud AI driver
|
||||
=====================================
|
||||
|
||||
The accel/qaic driver supports the Qualcomm Cloud AI machine learning
|
||||
accelerator cards.
|
||||
|
||||
.. toctree::
|
||||
|
||||
qaic
|
||||
aic100
|
||||
170
Documentation/accel/qaic/qaic.rst
Normal file
170
Documentation/accel/qaic/qaic.rst
Normal file
@@ -0,0 +1,170 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
|
||||
=============
|
||||
QAIC driver
|
||||
=============
|
||||
|
||||
The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
|
||||
accelerator products.
|
||||
|
||||
Interrupts
|
||||
==========
|
||||
|
||||
While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
|
||||
mechanism, it is still possible for an IRQ storm to occur. A storm can happen
|
||||
if the workload is particularly quick, and the host is responsive. If the host
|
||||
can drain the response FIFO as quickly as the device can insert elements into
|
||||
it, then the device will frequently transition the response FIFO from empty to
|
||||
non-empty and generate MSIs at a rate equivalent to the speed of the
|
||||
workload's ability to process inputs. The lprnet (license plate reader network)
|
||||
workload is known to trigger this condition, and can generate in excess of 100k
|
||||
MSIs per second. It has been observed that most systems cannot tolerate this
|
||||
for long, and will crash due to some form of watchdog due to the overhead of
|
||||
the interrupt controller interrupting the host CPU.
|
||||
|
||||
To mitigate this issue, the QAIC driver implements specific IRQ handling. When
|
||||
QAIC receives an IRQ, it disables that line. This prevents the interrupt
|
||||
controller from interrupting the CPU. Then AIC drains the FIFO. Once the FIFO
|
||||
is drained, QAIC implements a "last chance" polling algorithm where QAIC will
|
||||
sleep for a time to see if the workload will generate more activity. The IRQ
|
||||
line remains disabled during this time. If no activity is detected, QAIC exits
|
||||
polling mode and reenables the IRQ line.
|
||||
|
||||
This mitigation in QAIC is very effective. The same lprnet usecase that
|
||||
generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
|
||||
IRQs over 5 minutes while keeping the host system stable, and having the same
|
||||
workload throughput performance (within run to run noise variation).
|
||||
|
||||
|
||||
Neural Network Control (NNC) Protocol
|
||||
=====================================
|
||||
|
||||
The implementation of NNC is split between the KMD (QAIC) and UMD. In general
|
||||
QAIC understands how to encode/decode NNC wire protocol, and elements of the
|
||||
protocol which require kernel space knowledge to process (for example, mapping
|
||||
host memory to device IOVAs). QAIC understands the structure of a message, and
|
||||
all of the transactions. QAIC does not understand commands (the payload of a
|
||||
passthrough transaction).
|
||||
|
||||
QAIC handles and enforces the required little endianness and 64-bit alignment,
|
||||
to the degree that it can. Since QAIC does not know the contents of a
|
||||
passthrough transaction, it relies on the UMD to satisfy the requirements.
|
||||
|
||||
The terminate transaction is of particular use to QAIC. QAIC is not aware of
|
||||
the resources that are loaded onto a device since the majority of that activity
|
||||
occurs within NNC commands. As a result, QAIC does not have the means to
|
||||
roll back userspace activity. To ensure that a userspace client's resources
|
||||
are fully released in the case of a process crash, or a bug, QAIC uses the
|
||||
terminate command to let QSM know when a user has gone away, and the resources
|
||||
can be released.
|
||||
|
||||
QSM can report a version number of the NNC protocol it supports. This is in the
|
||||
form of a Major number and a Minor number.
|
||||
|
||||
Major number updates indicate changes to the NNC protocol which impact the
|
||||
message format, or transactions (impacts QAIC).
|
||||
|
||||
Minor number updates indicate changes to the NNC protocol which impact the
|
||||
commands (does not impact QAIC).
|
||||
|
||||
uAPI
|
||||
====
|
||||
|
||||
QAIC defines a number of driver specific IOCTLs as part of the userspace API.
|
||||
This section describes those APIs.
|
||||
|
||||
DRM_IOCTL_QAIC_MANAGE
|
||||
This IOCTL allows userspace to send a NNC request to the QSM. The call will
|
||||
block until a response is received, or the request has timed out.
|
||||
|
||||
DRM_IOCTL_QAIC_CREATE_BO
|
||||
This IOCTL allows userspace to allocate a buffer object (BO) which can send
|
||||
or receive data from a workload. The call will return a GEM handle that
|
||||
represents the allocated buffer. The BO is not usable until it has been
|
||||
sliced (see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
|
||||
|
||||
DRM_IOCTL_QAIC_MMAP_BO
|
||||
This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
|
||||
userspace process.
|
||||
|
||||
DRM_IOCTL_QAIC_ATTACH_SLICE_BO
|
||||
This IOCTL allows userspace to slice a BO in preparation for sending the BO
|
||||
to the device. Slicing is the operation of describing what portions of a BO
|
||||
get sent where to a workload. This requires a set of DMA transfers for the
|
||||
DMA Bridge, and as such, locks the BO to a specific DBC.
|
||||
|
||||
DRM_IOCTL_QAIC_EXECUTE_BO
|
||||
This IOCTL allows userspace to submit a set of sliced BOs to the device. The
|
||||
call is non-blocking. Success only indicates that the BOs have been queued
|
||||
to the device, but does not guarantee they have been executed.
|
||||
|
||||
DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO
|
||||
This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace
|
||||
to shrink the BOs sent to the device for this specific call. If a BO
|
||||
typically has N inputs, but only a subset of those is available, this IOCTL
|
||||
allows userspace to indicate that only the first M bytes of the BO should be
|
||||
sent to the device to minimize data transfer overhead. This IOCTL dynamically
|
||||
recomputes the slicing, and therefore has some processing overhead before the
|
||||
BOs can be queued to the device.
|
||||
|
||||
DRM_IOCTL_QAIC_WAIT_BO
|
||||
This IOCTL allows userspace to determine when a particular BO has been
|
||||
processed by the device. The call will block until either the BO has been
|
||||
processed and can be re-queued to the device, or a timeout occurs.
|
||||
|
||||
DRM_IOCTL_QAIC_PERF_STATS_BO
|
||||
This IOCTL allows userspace to collect performance statistics on the most
|
||||
recent execution of a BO. This allows userspace to construct an end to end
|
||||
timeline of the BO processing for a performance analysis.
|
||||
|
||||
DRM_IOCTL_QAIC_PART_DEV
|
||||
This IOCTL allows userspace to request a duplicate "shadow device". This extra
|
||||
accelN device is associated with a specific partition of resources on the
|
||||
AIC100 device and can be used for limiting a process to some subset of
|
||||
resources.
|
||||
|
||||
Userspace Client Isolation
|
||||
==========================
|
||||
|
||||
AIC100 supports multiple clients. Multiple DBCs can be consumed by a single
|
||||
client, and multiple clients can each consume one or more DBCs. Workloads
|
||||
may contain sensitive information therefore only the client that owns the
|
||||
workload should be allowed to interface with the DBC.
|
||||
|
||||
Clients are identified by the instance associated with their open(). A client
|
||||
may only use memory they allocate, and DBCs that are assigned to their
|
||||
workloads. Attempts to access resources assigned to other clients will be
|
||||
rejected.
|
||||
|
||||
Module parameters
|
||||
=================
|
||||
|
||||
QAIC supports the following module parameters:
|
||||
|
||||
**datapath_polling (bool)**
|
||||
|
||||
Configures QAIC to use a polling thread for datapath events instead of relying
|
||||
on the device interrupts. Useful for platforms with broken multiMSI. Must be
|
||||
set at QAIC driver initialization. Default is 0 (off).
|
||||
|
||||
**mhi_timeout_ms (unsigned int)**
|
||||
|
||||
Sets the timeout value for MHI operations in milliseconds (ms). Must be set
|
||||
at the time the driver detects a device. Default is 2000 (2 seconds).
|
||||
|
||||
**control_resp_timeout_s (unsigned int)**
|
||||
|
||||
Sets the timeout value for QSM responses to NNC messages in seconds (s). Must
|
||||
be set at the time the driver is sending a request to QSM. Default is 60 (one
|
||||
minute).
|
||||
|
||||
**wait_exec_default_timeout_ms (unsigned int)**
|
||||
|
||||
Sets the default timeout for the wait_exec ioctl in milliseconds (ms). Must be
|
||||
set prior to the waic_exec ioctl call. A value specified in the ioctl call
|
||||
overrides this for that call. Default is 5000 (5 seconds).
|
||||
|
||||
**datapath_poll_interval_us (unsigned int)**
|
||||
|
||||
Sets the polling interval in microseconds (us) when datapath polling is active.
|
||||
Takes effect at the next polling interval. Default is 100 (100 us).
|
||||
Reference in New Issue
Block a user