kernel

mirror of https://github.com/ukui/kernel.git synced 2026-03-09 10:07:04 -07:00

Author	SHA1	Message	Date
Linus Torvalds	5dedb9f3bd	Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - Updates to the qib low-level driver - First chunk of changes for SR-IOV support for mlx4 IB - RDMA CM support for IPv6-only binding - Other misc cleanups and fixes Fix up some add-add conflicts in include/linux/mlx4/device.h and drivers/net/ethernet/mellanox/mlx4/main.c * tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits) IB/qib: checkpatch fixes IB/qib: Add congestion control agent implementation IB/qib: Reduce sdma_lock contention IB/qib: Fix an incorrect log message IB/qib: Fix QP RCU sparse warnings mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them mlx4_core: Allow guests to have IB ports mlx4_core: Implement mechanism for reserved Q_Keys net/mlx4_core: Free ICM table in case of error IB/cm: Destroy idr as part of the module init error flow mlx4_core: Remove double function declarations IB/mlx4: Fill the masked_atomic_cap attribute in query device IB/mthca: Fill in sq_sig_type in query QP IB/mthca: Warning about event for non-existent QPs should show event type IB/qib: Fix sparse RCU warnings in qib_keys.c net/mlx4_core: Initialize IB port capabilities for all slaves mlx4: Use port management change event instead of smp_snoop IB/qib: RCU locking for MR validation IB/qib: Avoid returning EBUSY from MR deregister IB/qib: Fix UC MR refs for immediate operations ...	2012-07-24 13:56:26 -07:00
Amir Vadai	d9236c3f10	{NET,IB}/mlx4: Add rmap support to mlx4_assign_eq Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap. If supplied, the assigned IRQ is tracked using rmap infrastructure. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 08:34:37 -07:00
Amir Vadai	af22d9de45	net/mlx4: Move MAC_MASK to a common place Define this macro is one common place instead of duplicating it over the code Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 08:34:37 -07:00
Jack Morgenstein	6634961c14	mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them To allow easy paravirtualization of P_Key and GID table sizes, keep paravirtualized sizes in mlx4_dev->caps, but save the actual physical sizes from FW in struct: mlx4_dev->phys_cap. In addition, in SR-IOV mode, do the following: 1. Reduce reported P_Key table size by 1. This is done to reserve the highest P_Key index for internal use, for declaring an invalid P_Key in P_Key paravirtualization. We require a P_Key index which always contain an invalid P_Key value for this purpose (i.e., one which cannot be modified by the subnet manager). The way to do this is to reduce the P_Key table size reported to the subnet manager by 1, so that it will not attempt to access the P_Key at index #127. 2. Paravirtualize the GID table size to 1. Thus, each guest sees only a single GID (at its paravirtualized index 0). In addition, since we are paravirtualizing the GID table size to 1, we add paravirtualization of the master GID event here (i.e., we do not do ib_dispatch_event() for the GUID change event on the master, since its (only) GUID never changes). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 11:52:23 -07:00
Jack Morgenstein	396f2feb05	mlx4_core: Implement mechanism for reserved Q_Keys The SR-IOV special QP tunneling mechanism uses proxy special QPs (instead of the real special QPs) for MADs on guests. These proxy QPs send their packets to a "tunnel" QP owned by the master. The master then forwards the MAD (after any required paravirtualization) to the real special QP, which sends out the MAD. For security reasons (i.e., to prevent guests from sending MADs to tunnel QPs belonging to other guests), each proxy-tunnel QP pair is assigned a unique, reserved, Q_Key. These Q_Keys are available only for proxy and tunnel QPs -- if the guest tries to use these Q_Keys with other QPs, it will fail. This patch introduces a mechanism for reserving a block of 64K Q_Keys for proxy/tunneling use. The patch introduces also two new fields into mlx4_dev: base_sqpn and base_tunnel_sqpn. In SR-IOV mode, the QP numbers for the "real," proxy, and tunnel sqps are added to the reserved QPN area (so that they will not change). There are 8 special QPs per port in the HCA, and each of them is assigned both a proxy and a tunnel QP, for each VF and for the PF as well in SR-IOV mode. The QPNs for these QPs are arranged as follows: 1. The real SQP numbers (8) 2. The proxy SQPs (8 * (max number of VFs + max number of PFs) 3. The tunnel SQPs (8 * (max number of VFs + max number of PFs) To support these QPs, two new fields are added to struct mlx4_dev: base_sqp: this is the QP number of the first of the real SQPs base_tunnel_sqp: this is the qp number of the first qp in the tunnel sqp region. (On guests, this is the first tunnel sqp of the 8 which are assigned to that guest). In addition, in SR-IOV mode, sqp_start is the number of the first proxy SQP in the proxy SQP region. (In guests, this is the first proxy SQP of the 8 which are assigned to that guest) Note that in non-SR-IOV mode, there are no proxies and no tunnels. In this case, sqp_start is set to sqp_base -- which minimizes code changes. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 11:35:55 -07:00
Jack Morgenstein	2aca1172c2	net/mlx4_core: Initialize IB port capabilities for all slaves With IB SR-IOV, each slave has its own separate copy of the port capabilities flags. For example, the master can run a subnet manager (which causes the IsSM bit to be set in the master's port capabilities) without affecting the port capabilities seen by the slaves (the IsSM bit will be seen as cleared in the slaves). Also add a static inline mlx4_master_func_num() to enhance readability of the code. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-10 09:57:06 -07:00
Jack Morgenstein	00f5ce99dc	mlx4: Use port management change event instead of smp_snoop The port management change event can replace smp_snoop. If the capability bit for this event is set in dev-caps, the event is used (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event mask in the MAP_EQ fw command). In this case, when the driver passes incoming SMP PORT_INFO SET mads to the FW, the FW generates port management change events to signal any changes to the driver. If the FW generates these events, smp_snoop shouldn't be invoked in ib_process_mad(), or duplicate events will occur (once from the FW-generated event, and once from smp_snoop). In the case where the FW does not generate port management change events smp_snoop needs to be invoked to create these events. The flow in smp_snoop has been modified to make use of the same procedures as in the fw-generated-event event case to generate the port management events (LID change, Client-rereg, Pkey change, and/or GID change). Port management change event handling required changing the mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument (last argument) had to be changed to unsigned long in order to accomodate passing the EQE pointer. We also needed to move the definition of struct mlx4_eqe from net/mlx4.h to file device.h -- to make it available to the IB driver, to handle port management change events. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-10 09:47:10 -07:00
Jack Morgenstein	752a50cab6	mlx4_core: Pass an invalid PCI id number to VFs Currently, VFs have 0 in their dev->caps.function field. This is a valid pci id (usually of the PF). Instead, pass an invalid PCI id to the VF via QUERY_FW, so that if the value gets accessed in the VF driver, we'll catch the problem. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:05:05 -07:00
Hadar Hen Zion	592e49dda8	net/mlx4: Implement promiscuous mode with device managed flow-steering The device managed flow steering API has three promiscuous modes: 1. Uplink - captures all the packets that arrive to the port. 2. Allmulti - captures all multicast packets arriving to the port. 3. Function port - for future use, this mode is not implemented yet. Use these modes with the flow_attach and flow_detach firmware commands according to the promiscuous state of the netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-07 16:23:06 -07:00
Hadar Hen Zion	0ff1fb654b	{NET, IB}/mlx4: Add device managed flow steering firmware API The driver is modified to support three operation modes. If supported by firmware use the device managed flow steering API, that which we call device managed steering mode. Else, if the firmware supports the B0 steering mode use it, and finally, if none of the above, use the A0 steering mode. When the steering mode is device managed, the code is modified such that L2 based rules set by the mlx4_en driver for Ethernet unicast and multicast, and the IB stack multicast attach calls done through the mlx4_ib driver are all routed to use the device managed API. When attaching rule using device managed flow steering API, the firmware returns a 64 bit registration id, which is to be provided during detach. Currently the firmware is always programmed during HCA initialization to use standard L2 hashing. Future work should be done to allow configuring the flow-steering hash function with common, non proprietary means. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-07 16:23:05 -07:00
Hadar Hen Zion	8fcfb4db74	net/mlx4_core: Add firmware commands to support device managed flow steering Add support for firmware commands to attach/detach a new device managed steering mode. Such network steering rules allow the user to provide an L2/L3/L4 flow specification to the firmware and have the device to steer traffic that matches that specification to the provided QP. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-07 16:23:05 -07:00
Hadar Hen Zion	c96d97f4d1	net/mlx4: Set steering mode according to device capabilities Instead of checking the firmware supported steering mode in various places in the code, add a dedicated field in the mlx4 device capabilities structure which is written once during the initialization flow and read across the code. This also set the grounds for add new steering modes. Currently two modes are supported, and are named after the ConnectX HW versions A0 and B0. A0 steering uses mac_index, vlan_index and priority to steer traffic into pre-defined range of QPs. B0 steering uses Ethernet L2 hashing rules and is enabled only if the firmware supports both unicast and multicast B0 steering, The current steering modes are relevant for Ethernet traffic only, such that Infiniband steering remains untouched. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-07 16:23:05 -07:00
Marcel Apfelbaum	3fc929e2d6	net/mlx4_core: Fix number of EQs used in ICM initialisation In SRIOV mode, the number of EQs used when computing the total ICM size was incorrect. To fix this, we do the following: 1. We add a new structure to mlx4_dev, mlx4_phys_caps, to contain physical HCA capabilities. The PPF uses the phys capabilities when it computes things like ICM size. The dev_caps structure will then contain the paravirtualized values, making bookkeeping much easier in SRIOV mode. We add a structure rather than a single parameter because there will be other fields in the phys_caps. The first field we add to the mlx4_phys_caps structure is num_phys_eqs. 2. In INIT_HCA, when running in SRIOV mode, the "log_num_eqs" parameter passed to the FW is the number of EQs per VF/PF; each function (PF or VF) has this number of EQs available. However, the total number of EQs which must be allowed for in the ICM is (1 << log_num_eqs) * (#VFs + #PFs). Rather than compute this quantity, we allocate ICM space for 1024 EQs (which is the device maximum number of EQs, and which is the value we place in the mlx4_phys_caps structure). For INIT_HCA, however, we use the per-function number of EQs as described above. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-31 18:18:16 -04:00
Linus Torvalds	c23ddf7857	Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters - Add generic and mlx4 support for "raw" QPs: allow suitably privileged applications to send and receive arbitrary packets directly to/from the hardware - Add "doorbell drop" handling to the cxgb4 driver - A fairly large batch of qib hardware driver changes - A few fixes for lockdep-detected issues - A few other miscellaneous fixes and cleanups Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h. * tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits) RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree IB/mlx4: Fix mlx4_ib_add() error flow IB/core: Fix IB_SA_COMP_MASK macro IB/iser: Fix error flow in iser ep connection establishment IB/mlx4: Increase the number of vectors (EQs) available for ULPs RDMA/cxgb4: Add query_qp support RDMA/cxgb4: Remove kfifo usage RDMA/cxgb4: Use vmalloc() for debugfs QP dump RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch() RDMA/cxgb4: Add DB Overflow Avoidance RDMA/cxgb4: Add debugfs RDMA memory stats cxgb4: DB Drop Recovery for RDMA and LLD queues cxgb4: Common platform specific changes for DB Drop Recovery cxgb4: Detect DB FULL events and notify RDMA ULD RDMA/cxgb4: Drop peer_abort when no endpoint found RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr() mlx4_core: Change bitmap allocator to work in round-robin fashion RDMA/nes: Don't call event handler if pointer is NULL RDMA/nes: Fix for the ORD value of the connecting peer ...	2012-05-21 17:54:55 -07:00
Shlomo Pongratz	b3416f4476	mlx4_core: Add second capabilities flags field This patch adds a 64-bit flags2 features member to struct mlx4_dev to export further features of the hardware. The original flags field tracks features whose support bits are advertised by the firmware in offsets 0x40 and 0x44 of the query device capabilities command. flags2 will track features whose support bits are scattered at various offsets. RSS support is the first feature to be exported through flags2. RSS capabilities are located at offset 0x2e. The size of the RSS indirection table is also given in this offset. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-08 11:54:32 -07:00
Oren Duer	c0c1d3d761	IB/mlx4: Put priority bits in WQE of IBoE MLX QP Otherwise CM packets going over MLX QP1 get fixed scheduling priority 0. We want CM packets to get the same scheduling priority, and therefore map to the same SQ (Schedule Queue) and eventually TC (Traffic Class), as the application requested for the actual QP used for the connection. Signed-off-by: Oren Duer <oren@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-08 11:48:09 -07:00
Amir Vadai	e5395e92a4	net/mlx4_core: set port QoS attributes Adding QoS firmware commands: - mlx4_en_SET_PORT_PRIO2TC - set UP <=> TC - mlx4_en_SET_PORT_SCHEDULER - set promised BW, max BW and PG number Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-05 05:08:03 -04:00
Amir Vadai	0e98b523c4	net/mlx4_en: Force user priority by QP attribute Instead of relying on HW to change schedule queue by UP, schedule queue is fixed for a tx_ring, and UP in WQE is ignored in this aspect. This resolves two issues with untagged traffic: 1. untagged traffic has no UP in packet which is needed for QoS. The change above allows setting the schedule queue (and by that the UP) of such a stream. 2. BlueFlame uses the same field used by vlan tag. So forcing UP from QPC allows using BF for untagged but prioritized traffic. In old firmware that force UP is not supported, untagged traffic will not subject to QoS. Because UP is set by QP, need to always have a tx ring per UP, even if pfcrx module paramter is false. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-05 05:08:03 -04:00
Linus Torvalds	250f6715a4	Merge tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/device.h> avoidance patches from Paul Gortmaker: "Nearly every subsystem has some kind of header with a proto like: void foo(struct device dev); and yet there is no reason for most of these guys to care about the sub fields within the device struct. This allows us to significantly reduce the scope of headers including headers. For this instance, a reduction of about 40% is achieved by replacing the include with the simple fact that the device is some kind of a struct. Unlike the much larger module.h cleanup, this one is simply two commits. One to fix the implicit <linux/device.h> users, and then one to delete the device.h includes from the linux/include/ dir wherever possible." tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: device.h: audit and cleanup users in main include dir device.h: cleanup users outside of linux/include (C files)	2012-03-24 10:41:37 -07:00
Linus Torvalds	0c2fe82a9b	Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier: "Nothing big really stands out; by patch count lots of fixes to the mlx4 driver plus some cleanups and fixes to the core and other drivers." * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits) mlx4_core: Scale size of MTT table with system RAM mlx4_core: Allow dynamic MTU configuration for IB ports IB/mlx4: Fix info returned when querying IBoE ports IB/mlx4: Fix possible missed completion event mlx4_core: Report thermal error events mlx4_core: Fix one more static exported function IB: Change CQE "csum_ok" field to a bit flag RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state RDMA/cxgb3: Don't pass irq flags to flush_qp() mlx4_core: Get rid of redundant ext_port_cap flags RDMA/ucma: Fix AB-BA deadlock IB/ehca: Fix ilog2() compile failure IB: Use central enum for speed instead of hard-coded values IB/iser: Post initial receive buffers before sending the final login request IB/iser: Free IB connection resources in the proper place IB/srp: Consolidate repetitive sysfs code IB/srp: Use pr_fmt() and pr_err()/pr_warn() IB/core: Fix SDR rates in sysfs mlx4: Enforce device max FMR maps in FMR alloc IB/mlx4: Set bad_wr for invalid send opcode ...	2012-03-21 10:33:42 -07:00
Paul Gortmaker	313162d0b8	device.h: audit and cleanup users in main include dir The <linux/device.h> header includes a lot of stuff, and it in turn gets a lot of use just for the basic "struct device" which appears so often. Clean up the users as follows: 1) For those headers only needing "struct device" as a pointer in fcn args, replace the include with exactly that. 2) For headers not really using anything from device.h, simply delete the include altogether. 3) For headers relying on getting device.h implicitly before being included themselves, now explicitly include device.h 4) For files in which doing #1 or #2 uncovers an implicit dependency on some other header, fix by explicitly adding the required header(s). Any C files that were implicitly relying on device.h to be present have already been dealt with in advance. Total removals from #1 and #2: 51. Total additions coming from #3: 9. Total other implicit dependencies from #4: 7. As of 3.3-rc1, there were 110, so a net removal of 42 gives about a 38% reduction in device.h presence in include/* Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-03-16 10:38:24 -04:00
Roland Dreier	42872c7a5e	Merge branches 'misc' and 'mlx4' into for-next Conflicts: drivers/infiniband/hw/mlx4/main.c drivers/net/ethernet/mellanox/mlx4/main.c include/linux/mlx4/device.h	2012-03-12 16:25:28 -07:00
Or Gerlitz	096335b3f9	mlx4_core: Allow dynamic MTU configuration for IB ports Set the MTU for IB ports in the driver instead of using the firmware default of 2KB (the driver defaults to 4KB). Allow for dynamic mtu configuration through a new, per-port sysfs entry. Since there's a dependency between the port MTU and the max number of HW VLs the port can support, apply a mim/max approach, using a loop that goes down from the highest possible number of VLs to the lowest, using the firmware return status to know whether the requested number of VLs is possible with a given MTU. For now, as with the dynamic link type change / VPI support, the sysfs entry to change the mtu is exposed only when NOT running in SR-IOV mode. To allow changing the MTU for the master in SR-IOV mode, primary-function-initiated FLR (Function Level Reset) needs to be implemented. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-03-12 16:24:59 -07:00
Jack Morgenstein	5984be9004	mlx4_core: Report thermal error events Print an error message when a thermal error async event is reported by the HW. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Dotan Barak <dotanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-03-12 16:24:59 -07:00
Or Gerlitz	8154c07fe1	mlx4_core: Get rid of redundant ext_port_cap flags While doing the work for commit `a6f7feae6d` ("IB/mlx4: pass SMP vendor-specific attribute MADs to firmware") we realized that the firmware would respond on all sorts of vendor-specific MADs. Therefore commit `97285b7817` ("mlx4_core: Add extended port capabilities support") adds redundant code into the driver, since there's no real reaon to maintain the extended capabilities of the port, as they can be queried on demand (e.g the FDR10 capability). This patch reverts commit `97285b7817` and removes the check for extended caps from the mlx4_ib driver port query flow. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-03-06 17:25:18 -08:00

1 2 3 4 5

112 Commits