Commit Graph

624 Commits

Author SHA1 Message Date
Linus Torvalds f470f8d4e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
  mlx4_core: Deprecate log_num_vlan module param
  IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
  IB/mlx4: Enable 4K mtu for IBoE
  RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
  RDMA/cxgb4: Serialize calls to CQ's comp_handler
  RDMA/cxgb3: Serialize calls to CQ's comp_handler
  IB/qib: Fix issue with link states and QSFP cables
  IB/mlx4: Configure extended active speeds
  mlx4_core: Add extended port capabilities support
  IB/qib: Hold links until tuning data is available
  IB/qib: Clean up checkpatch issue
  IB/qib: Remove s_lock around header validation
  IB/qib: Precompute timeout jiffies to optimize latency
  IB/qib: Use RCU for qpn lookup
  IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
  IB/qib: Decode path MTU optimization
  IB/qib: Optimize RC/UC code by IB operation
  IPoIB: Use the right function to do DMA unmap pages
  RDMA/cxgb4: Use correct QID in insert_recv_cqe()
  RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
  ...
2011-11-01 10:51:38 -07:00
Roland Dreier 504255f8d0 Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next 2011-11-01 09:37:08 -07:00
Christoph Lameter bc3e53f682 mm: distinguish between mlocked and pinned pages
Some kernel components pin user space memory (infiniband and perf) (by
increasing the page count) and account that memory as "mlocked".

The difference between mlocking and pinning is:

A. mlocked pages are marked with PG_mlocked and are exempt from
   swapping. Page migration may move them around though.
   They are kept on a special LRU list.

B. Pinned pages cannot be moved because something needs to
   directly access physical memory. They may not be on any
   LRU list.

I recently saw an mlockalled process where mm->locked_vm became
bigger than the virtual size of the process (!) because some
memory was accounted for twice:

Once when the page was mlocked and once when the Infiniband
layer increased the refcount because it needt to pin the RDMA
memory.

This patch introduces a separate counter for pinned pages and
accounts them seperately.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Mike Marciniszyn <infinipath@qlogic.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:46 -07:00
Sean Hefty 42849b2697 RDMA/uverbs: Export ib_open_qp() capability to user space
Allow processes that share the same XRC domain to open an existing
shareable QP.  This permits those processes to receive events on the
shared QP and transfer ownership, so that any process may modify the
QP.  The latter allows the creating process to exit, while a remaining
process can still transition it for path migration purposes.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:50:56 -07:00
Sean Hefty 0e0ec7e063 RDMA/core: Export ib_open_qp() to share XRC TGT QPs
XRC TGT QPs are shared resources among multiple processes.  Since the
creating process may exit, allow other processes which share the same
XRC domain to open an existing QP.  This allows us to transfer
ownership of an XRC TGT QP to another process.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:49:51 -07:00
Sean Hefty 2622e18ef4 IB/cm: Do not automatically disconnect XRC TGT QPs
Because an XRC TGT QP can end up being shared among multiple
processes, don't have the ib_cm automatically send a DREQ when the
userspace process that owns the ib_cm_id exits.  Disconnect can be
initiated by the user directly; otherwise, the owner of the XRC INI QP
controls the connection.

Note that as a result of the process exiting, the ib_cm will stop
tracking the XRC connection on the target side.  For the purposes of
disconnecting, this isn't a big deal.  The ib_cm will respond to the
DREQ appropriately.  For other messages, mainly LAP, the CM will
reject the request, since there's no one available to route the
request to.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:42:06 -07:00
Sean Hefty 18c441a6c3 RDMA/cma: Support XRC QPs
Allow users to connect XRC QPs through the rdma_cm.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:41:36 -07:00
Sean Hefty 638ef7a6c6 RDMA/ucm: Allow user to specify QP type when creating id
Allow the user to indicate the QP type separately from the port space
when allocating an rdma_cm_id.  With RDMA_PS_IB, there is no longer a
1:1 relationship between the QP type and port space, so we need to
switch on the QP type to select between UD and connected QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:40:36 -07:00
Sean Hefty 2d2e941529 RDMA/cm: Define new RDMA port space specific to IB
Add RDMA_PS_IB.  XRC QP types will use the IB port space when operating
over the RDMA CM.  For the 'IP protocol' field value, we select 0x3F,
which is listed as being for 'any local network'.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:39:52 -07:00
Sean Hefty ef70044647 IB/cm: Update XRC support based on XRC annex errata
The XRC annex was updated to have XRC behave more like RD. Specifically,
the XRC TGT QPN moves from the local QPN to local EECN field.  Lookup of
SRQN is done using the REQ/REP protocol.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:38:35 -07:00
Sean Hefty d26a360b77 IB/cm: Update protocol to support XRC
Update the REQ and REP messages to support XRC connection setup
according to the XRC Annex.  Several existing fields must be set to 0 or
1 when connecting XRC QPs, and a reserved field is changed to an
extended transport type.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:37:55 -07:00
Sean Hefty b93f3c1872 RDMA/uverbs: Export XRC TGT QPs to user space
Allow user space to operate on XRC TGT QPs the same way as other types
of QPs, with one notable exception: since XRC TGT QPs may be shared
among multiple processes, the XRC TGT QP is allowed to exist beyond the
lifetime of the creating process.

The process that creates the QP is allowed to destroy it, but if the
process exits without destroying the QP, then the QP will be left bound
to the lifetime of the XRCD.

TGT QPs are not associated with CQs or a PD.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:37:07 -07:00
Sean Hefty 9977f4f64b RDMA/uverbs: Export XRC INI QPs to userspace
XRC INI QPs are similar to send only RC QPs.  Allow user space to create
INI QPs.  Note that INI QPs do not require receive CQs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:32:27 -07:00
Sean Hefty 8541f8de05 RDMA/uverbs: Export XRC SRQs to user space
We require additional information to create XRC SRQs than we can
exchange using the existing create SRQ ABI.  Provide an enhanced create
ABI for extended SRQ types.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
and Roland Dreier <roland@purestorage.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:29:18 -07:00
Sean Hefty 53d0bd1e7f RDMA/uverbs: Export XRC domains to user space
Allow user space to create XRC domains.  Because XRCDs are expected to
be shared among multiple processes, we use inodes to identify an XRCD.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:21:24 -07:00
Sean Hefty d3d72d909e RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD
XRC TGT QPs are intended to be shared among multiple users and
processes.  Allow the destruction of an XRC TGT QP to be done explicitly
through ib_destroy_qp() or when the XRCD is destroyed.

To support destroying an XRC TGT QP, we need to track TGT QPs with the
XRCD.  When the XRCD is destroyed, all tracked XRC TGT QPs are also
cleaned up.

To avoid stale reference issues, if a user is holding a reference on a
TGT QP, we increment a reference count on the QP.  The user releases the
reference by calling ib_release_qp.  This releases any access to the QP
from a user above verbs, but allows the QP to continue to exist until
destroyed by the XRCD.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:20:27 -07:00
Sean Hefty b42b63cf0d RDMA/core: Add XRC QPs
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC communication is between an initiator (INI) QP and a target (TGT)
QP.  Target QPs are associated with SRQs through an XRCD.  An XRC TGT QP
behaves like a receive-only RD QP.  XRC INI QPs behave similarly to RC
QPs, except that work requests posted to an XRC INI QP must specify the
remote SRQ that is the target of the work request.

We define two new QP types for XRC, to distinguish between INI and TGT
QPs, and update the core layer to support XRC QPs.

This patch is derived from work by Jack Morgenstein
<jackm@dev.mellanox.co.il>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:16:19 -07:00
Sean Hefty 418d51307d RDMA/core: Add XRC SRQ type
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC defines SRQs that are specifically used by XRC connections.  Expand
the SRQ code to support XRC SRQs.  An XRC SRQ is currently restricted to
only XRC use according to the IB XRC Annex.

Portions of this patch were derived from work by
Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:14:31 -07:00
Sean Hefty 96104eda01 RDMA/core: Add SRQ type field
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second.  Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:13:26 -07:00
Sean Hefty 59991f94eb RDMA/core: Add XRC domain support
XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

A few new concepts are introduced to support this.  This patch adds:

 - A new device capability flag, IB_DEVICE_XRC, which low-level
   drivers set to indicate that a device supports XRC.
 - A new object type, XRC domains (struct ib_xrcd), and new verbs
   ib_alloc_xrcd()/ib_dealloc_xrcd().  XRCDs are used to limit which
   XRC SRQs an incoming message can target.

This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-12 10:32:26 -07:00
Marcel Apfelbaum 71eeba161d IB: Add new InfiniBand link speeds
Introduce support for the following extended speeds:

FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with
        64b/66b encoding rather than 8b/10b encoding.
FDR:    IBA extended speed 14.0625 Gbps.
EDR:    IBA extended speed 25.78125 Gbps.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-11 11:53:47 -07:00
Kumar Sanghvi 3ebeebc38b RDMA/iwcm: Propagate ird/ord values upwards
Update struct iw_cm_event to support propagating the ird/ord values
upwards to the application.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:37:52 -07:00
Hefty, Sean caf6e3f221 RDMA/ucm: Removed checks for unsigned value < 0
cmd is unsigned, no need to check for < 0.  Found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:05 -07:00
Hefty, Sean b7ab0b19a8 IB/mad: Verify mgmt class in received MADs
If a received MAD contains an invalid or reserved mgmt class, we will
attempt to access method_table outside of its range.  Add a check to
ensure that mgmt class falls within the handled range.

Found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:05 -07:00
Hefty, Sean f45ee80eb0 RDMA/cma: Check for NULL conn_param in rdma_accept
Check that conn_param is not null before dereferencing it when
processing rdma_accept().  This is necessary to prevent a possible
system crash, which can be caused by user space.

Problem found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-06 09:33:04 -07:00