This merge commit includes some misc shared code updates from mlx5-next branch needed
for net-next.
1) From Aya: Enable general events on all physical link types and
restrict general event handling of subtype DELAY_DROP_TIMEOUT in mlx5 rdma
driver to ethernet links only as it was intended.
2) From Eli: Introduce low level bits for prio tag mode
3) From Maor: Low level steering updates to support RDMA RX flow
steering and enables RoCE loopback traffic when switchdev is enabled.
4) From Vu and Parav: Two small mlx5 core cleanups
5) From Yevgeny add HW definitions of geneve offloads
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce specification for Geneve decap flow with encapsulation options
and allow creation of rules that are matching on Geneve TLV options.
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add new flow steering namespace - MLX5_FLOW_NAMESPACE_RDMA_RX.
Flow steering rules in this namespace are used to filter
RDMA traffic.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Handle event of power state change in the PCIE slot. When the event
occurs, check if query power state and PCI power fields is supported. If
so, read these fields from MPEIN (management PCIE info) register and
issue a corresponding message.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In Embedded CPU (EC) configurations, the EC driver needs to know when
the number of virtual functions change on the corresponding PF at the
host side. This is required so the EC driver can create or destroy
representor net devices that represent the VFs ports.
Whenever a change in the number of VFs occurs, firmware will generate an
event towards the EC which will trigger a work to complete the rest of
the handling. The specifics of the handling will be introduced in a
downstream patch.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mellanox's SmartNIC combines embedded CPU(e.g, ARM) processing power
with advanced network offloads to accelerate a multitude of security,
networking and storage applications.
With the introduction of the SmartNIC, there is a new PCI function
called Embedded CPU Physical Function(ECPF). And it's possible for a
PF to get its ICM pages from the ECPF PCI function. Driver shall
identify if it is running on such a function by reading a bit in
the initialization segment.
When firmware asks for pages, it would issue a page request event
specifying how many pages it requests and for which function. That
driver responds with a manage_pages command providing the requested
pages along with an indication for which function it is providing these
pages.
The encoding before this patch was as follows:
function_id == 0: pages are requested for the function receiving
the EQE.
function_id != 0: pages are requested for VF identified by the
function_id value
A new one bit field in the EQE identifies that pages are requested for
the ECPF.
The notion of page_supplier can be introduced here and to support that,
manage pages and query pages were modified so firmware can distinguish
the following cases:
1. Function provides pages for itself
2. PF provides pages for its VF
3. ECPF provides pages to itself
4. ECPF provides pages for another function
This distinction is possible through the introduction of the bit
"embedded_cpu_function" in query_pages, manage_pages and page request
EQE.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To avoid compatibility issue with older kernels the firmware doesn't
allow SRQ to work with ODP unless kernel asks for it.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Add support for the HW feature of multi-packet WQE in XDP
xmit flow.
The conventional TX descriptor (WQE, Work Queue Element) serves
a single packet. Our HW has support for multi-packet WQE (MPWQE)
in which a single descriptor serves multiple TX packets.
This reduces both the PCI overhead and the CPU cycles wasted on
writing them.
In this patch we add support for the HW feature, which is supported
starting from ConnectX-5.
Performance:
Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_TX:
We see a huge gain on single port ConnectX-5, and reach the 100 Mpps
milestone.
* Single-port HCA:
Before: 70 Mpps
After: 100 Mpps (+42.8%)
* Dual-port HCA:
Before: 51.7 Mpps
After: 57.3 Mpps (+10.8%)
* In both cases we tested traffic on one port and for now On Dual-port HCAs
we see only small gain, we are working to overcome this bottleneck, but
for the moment only with experimental firmware on dual port HCAs we can
reach the wanted numbers as seen on Single-port HCAs.
XDP_REDIRECT:
Redirect from (A) ConnectX-5 to (B) ConnectX-5.
Due to a setup limitation, (A) and (B) are on different NUMA nodes,
so absolute performance numbers are not optimal.
Note:
Below is the transmit rate of (B), not the redirect rate of (A)
which is in some cases higher.
* (B) is single-port:
Before: 77 Mpps
After: 90 Mpps (+16.8%)
* (B) is dual-port:
Before: 61 Mpps
After: 72 Mpps (+18%)
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Will be used in downstream patch to monitor counter changes
by the HCA and report it to the driver by an event.
The driver will update its counters cached data accordingly.
Signed-off-by: Eyal Davidovich <eyald@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce and use a helper that extracts the opcode
from a CQE (completion queue entry) structure.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use atomic_notifier_chain to fire firmware events at internal mlx5 core
components such as eswitch/fpga/clock/FW tracer/etc.., this is to
avoid explicit calls from low level mlx5_core to upper components and to
simplify the mlx5_core API for future developments.
Simply provide register/unregister notifiers API and call the notifier
chain on firmware async events.
Example: to subscribe to a FW event:
struct mlx5_nb port_event;
MLX5_NB_INIT(&port_event, port_event_handler, PORT_CHANGE);
mlx5_eq_notifier_register(mdev, &port_event);
where:
- port_event_handler is the notifier block callback.
- PORT_EVENT is the suffix of MLX5_EVENT_TYPE_PORT_CHANGE.
The above will guarantee that port_event_handler will receive all FW
events of the type MLX5_EVENT_TYPE_PORT_CHANGE.
To receive all FW/HW events one can subscribe to
MLX5_EVENT_TYPE_NOTIFY_ANY.
The next few patches will start moving all mlx5 core components to use
this new API and cleanup mlx5_eq_async_int misx handler from component
explicit calls and specific logic.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5 updates for both net-next and rdma-next
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (21 commits)
net/mlx5: Expose DC scatter to CQE capability bit
net/mlx5: Update mlx5_ifc with DEVX UID bits
net/mlx5: Set uid as part of DCT commands
net/mlx5: Set uid as part of SRQ commands
net/mlx5: Set uid as part of SQ commands
net/mlx5: Set uid as part of RQ commands
net/mlx5: Set uid as part of QP commands
net/mlx5: Set uid as part of CQ commands
net/mlx5: Rename incorrect naming in IFC file
net/mlx5: Export packet reformat alloc/dealloc functions
net/mlx5: Pass a namespace for packet reformat ID allocation
net/mlx5: Expose new packet reformat capabilities
{net, RDMA}/mlx5: Rename encap to reformat packet
net/mlx5: Move header encap type to IFC header file
net/mlx5: Break encap/decap into two separated flow table creation flags
net/mlx5: Add support for more namespaces when allocating modify header
net/mlx5: Export modify header alloc/dealloc functions
net/mlx5: Add proper NIC TX steering flow tables support
net/mlx5: Cleanup flow namespace getter switch logic
net/mlx5: Add memic command opcode to command checker
...
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Today mlx5 devices support two teardown modes:
1- Regular teardown
2- Force teardown
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the pages.
Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACK to be sent without been scatter
to memory
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Extend the ability to add steering rules to NIC TX flow tables.
For now, we are only adding TX bypass (egress) which is used by the RDMA
side. This will allow to shape outgoing traffic and tweak it if needed, for
example performing encapsulation or rewriting headers.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Modify and query vport state commands share the same admin_state and
op_mod values, rename the enums to fit them both.
In addition, remove the esw prefix from the admin state enum as this
also applied for vnic.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tracer has one event, event 0x26, with two subtypes:
- Subtype 0: Ownership change
- Subtype 1: Traces available
An ownership change occurs in the following cases:
1- Owner releases his ownership, in this case, an event will be
sent to inform others to reattempt acquire ownership.
2- Ownership was taken by a higher priority tool, in this case
the owner should understand that it lost ownership, and go through
tear down flow.
The second subtype indicates that there are traces in the trace buffer,
in this case, the driver polls the tracer buffer for new traces, parse
them and prepares the messages for printing.
The HW starts tracing from the first address in the tracer buffer.
Driver receives an event notifying that new trace block exists.
HW posts a timestamp event at the last 8B of every 256B block.
Comparing the timestamp to the last handled timestamp would indicate
that this is a new trace block. Once the new timestamp is detected,
the entire block is considered valid.
Block validation and parsing, should be done after copying the current
block to a different location, in order to avoid block overwritten
during processing.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch updates the mlx5_ifc structures and
command interface to support DEVX.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Pull rdma updates from Jason Gunthorpe:
"This has been a quiet cycle for RDMA, the big bulk is the usual
smallish driver updates and bug fixes. About four new uAPI related
things. Not as much Szykaller patches this time, the bugs it finds are
getting harder to fix.
Summary:
- More work cleaning up the RDMA CM code
- Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns,
i40iw, iw_cxgb4, mlx5, rxe
- Driver specific resource tracking and reporting via netlink
- Continued work for name space support from Parav
- MPLS support for the verbs flow steering uAPI
- A few tricky IPoIB fixes improving robustness
- HFI1 driver support for the '16B' management packet format
- Some auditing to not print kernel pointers via %llx or similar
- Mark the entire 'UCM' user-space interface as BROKEN with the
intent to remove it entirely. The user space side of this was long
ago replaced with RDMA-CM and syzkaller is finding bugs in the
residual UCM interface nobody wishes to fix because nobody uses it.
- Purge more bogus BUG_ON's from Leon
- 'flow counters' verbs uAPI
- T10 fixups for iser/isert, these are Acked by Martin but going
through the RDMA tree due to dependencies"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (138 commits)
RDMA/mlx5: Update SPDX tags to show proper license
RDMA/restrack: Change SPDX tag to properly reflect license
IB/hfi1: Fix comment on default hdr entry size
IB/hfi1: Rename exp_lock to exp_mutex
IB/hfi1: Add bypass register defines and replace blind constants
IB/hfi1: Remove unused variable
IB/hfi1: Ensure VL index is within bounds
IB/hfi1: Fix user context tail allocation for DMA_RTAIL
IB/hns: Use zeroing memory allocator instead of allocator/memset
infiniband: fix a possible use-after-free bug
iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency
IB/isert: use T10-PI check mask definitions from core layer
IB/iser: use T10-PI check mask definitions from core layer
RDMA/core: introduce check masks for T10-PI offload
IB/isert: fix T10-pi check mask setting
IB/mlx5: Add counters read support
IB/mlx5: Add flow counters read support
IB/mlx5: Add flow counters binding support
IB/mlx5: Add counters create and destroy support
IB/uverbs: Add support for flow counters
...
The FPGA queue pair (QP) event fires whenever a QP on the FPGA
transitions to the error state.
At this stage, this event is unrecoverable, it may become recoverable
in the future.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add pbmc and pptb in the port_access_reg_cap_mask. These two
bits determine if device supports receive buffer configuration.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch reports the device's capbilities to offload
encapsulated MPLS tunnel protocols to user-space:
- Capability to offload MPLS over GRE.
- Capability to offload MPLS over UDP.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>