Currently, FPGA capability is located in (mdev)->caps.hca_cur,
change the location to be (mdev)->caps.fpga,
since hca_cur is reserved for HCA device capabilities.
Fixes: e29341fb3a ("net/mlx5: FPGA, Add basic support for Innova")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Pull networking updates from David Miller:
1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.
2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.
3) Allow generic XDP to work on virtual devices, from John Fastabend.
4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.
5) Remove UFO offloads from the tree, gave us little other than bugs.
6) Remove the IPSEC flow cache, from Florian Westphal.
7) Support ipv6 route offload in mlxsw driver.
8) Support VF representors in bnxt_en, from Sathya Perla.
9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.
10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.
11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.
12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.
13) Add new jump instructions to eBPF, from Daniel Borkmann.
14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.
15) Support XDP in tap driver, from Jason Wang.
16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.
17) Add Huawei hinic ethernet driver.
18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...
Pull rdma updates from Doug Ledford:
"This is a big pull request.
Of note is that I'm sending you the new ioctl API for the rdma
subsystem. We put it up on linux-api@, but didn't get much response.
The API is complex, but it solves two different problems in one go:
1) The bi-directional nature of the RDMA file write calls, which
created the security hole we had to handle (and for which the fix
is now causing problems for systems in production, we were a bit
over zealous in the fix and the ability to open a device, then
fork, then create new queue pairs on the device and use them is
broken).
2) The bloat caused by different vendors implementing extensions to
the base verbs API. Each vendor's hardware is slightly different,
and the hardware might be suitable for one extension but not
another.
By the time we add generic extensions for all the different ways
that the different hardware can offload things, the API becomes
bloated. Things like our completion structs have started to exceed
a cache line in size because of all the elements needed to support
this. That in turn shows up heavily in the performance graphs with
a noticable drop in performance on 100Gigabit links as our
completion structs go from occupying one cache line to 1+.
This API makes things like the completion structs modular in a
very similar way to netlink so that your structs can only include
the items needed for the offloads/features you are actually using
on a given queue pair. In that way we support everything, but only
use what we need, and our structs stay smaller.
The ioctl API is better explained by the posting on linux-api@ than I
can explain it here, so I'll just leave it at that.
The rest of the pull request is typical stuff.
Updates for 4.14 kernel merge window
- Lots of hfi1 driver updates (mixed with a few qib and core updates
as well)
- rxe updates
- various mlx updates
- Set default roce type to RoCEv2
- Several larger fixes for bnxt_re that were too big for -rc
- Several larger fixes for qedr that, likewise, were too big for -rc
- Misc core changes
- Make the hns_roce driver compilable on arches other than aarch64 so
we can more easily debug build issues related to it
- Add rdma-netlink infrastructure updates
- Add automatic IRQ affinity infrastructure
- Add 32bit lid support
- Lots of misc fixes across the subsystem from random people
- Autoloading of RDMA netlink modules
- PCI pool cleanups from Romain Perier
- mlx5 driver feature additions and fixes
- Hardware tag matchine feature
- Fix sleeping in atomic when resolving roce ah
- Add experimental ioctl interface as posted to linux-api@"
* tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits)
IB/core: Expose ioctl interface through experimental Kconfig
IB/core: Assign root to all drivers
IB/core: Add completion queue (cq) object actions
IB/core: Add legacy driver's user-data
IB/core: Export ioctl enum types to user-space
IB/core: Explicitly destroy an object while keeping uobject
IB/core: Add macros for declaring methods and attributes
IB/core: Add uverbs merge trees functionality
IB/core: Add DEVICE object and root tree structure
IB/core: Declare an object instead of declaring only type attributes
IB/core: Add new ioctl interface
RDMA/vmw_pvrdma: Fix a signedness
RDMA/vmw_pvrdma: Report network header type in WC
IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()
IB/cm: Fix sleeping in atomic when RoCE is used
IB/core: Add support to finalize objects in one transaction
IB/core: Add a generic way to execute an operation on a uobject
Documentation: Hardware tag matching
IB/mlx5: Support IB_SRQT_TM
net/mlx5: Add XRQ support
...
There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.
The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.
Fixes: 5fc7197d3a ("net/mlx5: Add pci shutdown callback")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support to new XRQ(eXtended shared Receive Queue)
hardware object. It supports SRQ semantics with addition
of extended receive buffers topologies and offloads.
Currently supports tag matching topology and rendezvouz offload.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch is the first step in decoupling UMR usage and
allocation from the MR cache. The only functional change
in this patch is to enables UMR for MRs created with
reg_create.
This change fixes a bug where ODP memory regions that
were not allocated from the MR cache did not have UMR
enabled.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
mlx5_core_wq is no longer being used and should be removed
from the code.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Both add new code
include/rdma/ib_verbs.h - Both add new code
Signed-off-by: Doug Ledford <dledford@redhat.com>
generic api takes care of spreading affinity similar to
what mlx5 open coded (and even handles better asymmetric
configurations). Ask the generic API to spread affinity
for us, and feed him pre_vectors that do not participate
in affinity settings (which is an improvement to what we
had before).
The affinity assignments should match what mlx5 tried to
do earlier but now we do not set affinity to async, cmd
and pages dedicated vectors.
Also, remove mlx5e_get_cpu and introduce mlx5e_get_node
(used for allocation purposes) and mlx5_get_vector_affinity
(for indirection table construction) as they provide the needed
information. Luckily, we have generic helpers to get cpumask
and node given a irq vector. mlx5_get_vector_affinity will
be used by mlx5_ib in a subsequent patch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Now that we have a generic code to allocate an array
of irq vectors and even correctly spread their affinity,
correctly handle cpu hotplug events and more, were much
better off using it.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When mlx5_ib registers itself to mlx5_core as an interface, it will
call mlx5_add_device which will call mlx5_ib interface add callback,
in case the latter successfully returns, only then mlx5_core will add
it to the interface list and async events will be forwarded to mlx5_ib.
Between mlx5_ib interface add callback and mlx5_core adding the mlx5_ib
interface to its devices list, arriving mlx5_core events can be missed
by the new mlx5_ib registering interface.
In other words:
thread 1: mlx5_ib: mlx5_register_interface(dev)
thread 1: mlx5_core: mlx5_add_device(dev)
thread 1: mlx5_core: ctx = dev->add => (mlx5_ib)->mlx5_ib_add
thread 2: mlx5_core_event: **new event arrives, forward to dev_list
thread 1: mlx5_core: add_ctx_to_dev_list(ctx)
/* previous event was missed by the new interface.*/
It is ok to miss events before dev->add (mlx5_ib)->mlx5_ib_add_device
but not after.
We fix this race by accumulating the events that come between the
ib_register_device (inside mlx5_add_device->(dev->add)) till the adding
to the list completes and fire them to the new registering interface
after that.
Fixes: f1ee87fe55 ("net/mlx5: Organize device list API in one place")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Multi-Physical Function Switch (MPFs) is required for when multi-PF
configuration is enabled to allow passing user configured unicast MAC
addresses to the requesting PF.
Before this patch eswitch.c used to manage the HW MPFS l2 table,
E-Switch always (regardless of sriov) enabled vport(0) (NIC PF) vport's
contexts update on unicast mac address list changes, to populate the PF's
MPFS L2 table accordingly.
In downstream patch we would like to allow compiling the driver without
E-Switch functionalities, for that we move MPFS l2 table logic out
of eswitch.c into its own file, and provide Kconfig flag (MLX5_MPFS) to
allow compiling out MPFS for those who don't want Multi-PF support.
NIC PF netdevice will now directly update MPFS l2 table via the new MPFS
API. VF netdevice has no access to MPFS L2 table, so E-Switch will remain
responsible of updating its MPFS l2 table on behalf of its VFs.
Due to this change we also don't require enabling vport(0) (PF vport)
unicast mac changes events anymore, for when SRIOV is not enabled.
Which means E-Switch is now activated only on SRIOV activation, and not
required otherwise.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jes Sorensen <jsorensen@fb.com>
Cc: kernel-team@fb.com
When delay drop timeout is expired, the firmware raises
general notification event of DELAY_DROP_TIMEOUT subtype.
In addition the feature is disable so the driver have to
reactivate the timeout.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When a user sets port_guid, node_guid or policy of an IB virtual
function, save this information in "struct mlx5_vf_context".
This information will be restored later when pci_resume is called.
To make sure this works, one can use aer-inject to generate PCI
errors on mlx5 devices and verify if relevant fields are restored
after PCI resume.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Some overlapping changes in the mlx5 driver.
A merge conflict resolution posted by Stephen Rothwell was used as a
guide.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add interface to initialize and interact with Innova FPGA SBU
connections.
A client driver may use these functions to set up a high-speed DMA
connection with its SBU hardware logic, and send/receive messages
over this connection.
A later patch in this patchset will make use of these functions for
Innova IPSec offload in mlx5 Ethernet driver.
Add commands to retrieve Innova FPGA SBU capabilities, and to
read/write Innova FPGA configuration space registers and memory,
over internal I2C.
At high level, the FPGA configuration space is divided such:
0x00000000 - 0x007fffff is reserved for the SBU
0x00800000 - 0xffffffff is reserved for the Shell
0x400000000 - ... is DDR memory
A later patchset will add support for accessing FPGA CrSpace and memory
over a high-speed connection. This is the reason for the ACCESS_TYPE
enumeration, which currently only supports I2C.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Previously, only mlx5_ib enabled RoCE on the port, but FPGA needs it as
well.
Add support for counting number of enables, so that FPGA and IB can work
in parallel and independently.
Program the HW to enable RoCE on the first enable call, and program to
disable RoCE on the last disable call.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reserved GIDs are entries in the GID table in use by the mlx5_core
and its submodules (e.g. FPGA, SRIOV, E-Swtich, netdev).
The entries are reserved at the high indexes of the GID table.
A mlx5 submodule may reserve a certain amount of GIDs for its own use
during the load sequence by calling mlx5_core_reserve_gids, and must
also take care to un-reserve these GIDs when it closes.
Reservation is only allowed during the load sequence and before any
interfaces (e.g. mlx5_ib or mlx5_en) are up.
After reservation, a submodule may call mlx5_core_reserved_gid_alloc/
free to allocate entries from the reserved GIDs pool.
Reserve a GID table entry for every supported FPGA QP.
A later patch in the patchset will remove them from being reported to
IB core.
Another such patch will make use of these for FPGA QPs in Innova NIC.
Added lib/mlx5.h to serve as a library for mlx5 submodlues, and to
expose only public mlx5 API, more mlx5 library files will be added in
future submissions.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Draining the health workqueue will ignore future health works including
the one that report hardware failure and thus we can't enter error state
Instead cancel the recovery flow and make sure only recovery flow won't
be scheduled.
Fixes: 5e44fca504 ('net/mlx5: Only cancel recovery work when cleaning up device')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
MCC (Management Component Control) allows to control a firmware
component update.
MCDA (Management Component Data Access) allows to read and write
a firmware component.
MCQI (Management Component Query Information) allows to query
information about firmware components.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a new interface for commands execution that allows the
caller to wait for the command's completion in a busy-wait
loop (polling mode).
This is useful if we want to execute a command in a polling mode
while the driver is working in events mode for the rest of
the commands.
This interface will be used in the downstream patches.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Overlapping changes in drivers/net/phy/marvell.c, bug fix in 'net'
restricting a HW workaround alongside cleanups in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>