Commit Graph

534985 Commits

Author SHA1 Message Date
Haggai Eran 7c1eb45a22 IB/core: lock client data with lists_rwsem
An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.

Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.

Since ib_unregister_client() now marks the client data context, no need for
remove() to search the context again, so pass the client data directly to
remove() callbacks.

Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:21 -04:00
Haggai Eran 5aa44bb90f IB/core: Add rwsem to allow reading device list or client list
Currently the RDMA subsystem's device list and client list are protected by
a single mutex. This prevents adding user-facing APIs that iterate these
lists, since using them may cause a deadlock. The patch attempts to solve
this problem by adding a read-write semaphore to protect the lists. Readers
now don't need the mutex, and are safe just by read-locking the semaphore.

The ib_register_device, ib_register_client, ib_unregister_device, and
ib_unregister_client functions are modified to lock the semaphore for write
during their respective list modification. Also, in order to make sure
client callbacks are called only between add() and remove() calls, the code
is changed to only add items to the lists after the add() calls and remove
from the lists before the remove() calls.

This patch attempts to solve a similar need [1] that was seen in the RoCE
v2 patch series.

[1] http://www.spinics.net/lists/linux-rdma/msg24733.html

Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 15:48:15 -04:00
Steve Wise 3403051ebb RDMA/Core: remove rdma_cap_read_multi_sge() helper
This functionality already exists via the max_sge_rd
device capability.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 23:02:11 -04:00
Steve Wise bc3fe2e376 svcrdma: Use max_sge_rd for destination read depths
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 23:02:11 -04:00
Steve Wise aaae91f4f0 ipath,qib: Expose max_sge_rd correctly
Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 23:02:11 -04:00
Sagi Grimberg 18ebd40773 mlx4, mlx5, mthca: Expose max_sge_rd correctly
Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.

Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 23:02:10 -04:00
Jeff Becker a724648e8a staging/hfi1: replace indent spaces with tabs
Running checkpatch.pl on mad.c produces several
"ERROR: code indent should use tabs where possible" messages.
This patch fixes these.

Signed-off-by: Jeff Becker <Jeffrey.C.Becker@nasa.gov>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 23:02:10 -04:00
Mike Marciniszyn 7724105686 IB/hfi1: add driver files
Signed-off-by: Andrew Friedley <andrew.friedley@intel.com>
Signed-off-by: Arthur Kepner <arthur.kepner@intel.com>
Signed-off-by: Brendan Cunningham <brendan.cunningham@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Signed-off-by: John Gregor <john.a.gregor@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Kevin Pine <kevin.pine@intel.com>
Signed-off-by: Kyle Liddell <kyle.liddell@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ravi Krishnaswamy <ravi.krishnaswamy@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Sanath Kumar <sanath.s.kumar@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Vlad Danushevsky <vladimir.danusevsky@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:59:36 -04:00
Dennis Dalessandro d4ab347005 IB/core: Add core header changes needed for OPA
This patch adds the value of the CNP opcode to the existing list of enumerated
opcodes in ib_pack.h

Add common OPA header definitions for driver
build:
- opa_port_info.h
- opa_smi.h
- hfi1_user.h

Additionally, ib_mad.h, has additional definitions
that are common to ib_drivers including:
- trap support
- cca support

The qib driver has the duplication removed in favor
those in ib_mad.h

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: John, Jubin <jubin.john@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:50 -04:00
Steve Wise 072bf1f7e4 RDMA/amso1100: Deprecate the amso1100 driver and move to staging
The HW hasn't been sold since 2005, and the SW has definite bit rot.
Its time to remove it.  So move it to staging for a few releases and
then remove it after that.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:49 -04:00
Dennis Dalessandro 6f9b38903c IB/ipath: Deprecate ipath driver and move to staging.
It is now time for the ipath driver to begin to be phased out of the kernel.
This patch moves the ipath driver from the Infiniband sub tree to the staging
area where it will remain until the code is removed from the kernel in a few
releases.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:49 -04:00
Doug Ledford 2dfcad3ade Staging: Add staging/rdma directory and update MAINTAINERS
Create the rdma directory in the staging area for use as we deprecate
some older drivers and as we bring in some new drivers that are in
need of work.  Update the MAINTAINERS file so that updates to these
files go to linux-rdma@vger.kernel.org.  Expected lifespan of this
directory is three releases for any deprecated drivers moved here
and an unknown, but theoretically bounded amount of time for the new
drivers as a new core RDMA transfer library needs to be written and
the drivers modified to use it in order for them to move out of this
directory.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:49 -04:00
Hariprasad S 84cc6ac62d iw_cxgb4: Add support for clip
Add support for ipv6 address handling clip api provided by lld

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:48 -04:00
Spencer Baugh 6c26a77124 RDMA/cma: fix IPv6 address resolution
Resolving a link-local IPv6 address with an unspecified source address
was broken by commit 5462eddd7a, which prevented the IPv6 stack from
learning the scope id of the link-local IPv6 address, causing random
failures as the IP stack chose a random link to resolve the address on.

This commit 5462eddd7a made us bail out of cma_check_linklocal early if
the address passed in was not an IPv6 link-local address. On the address
resolution path, the address passed in is the source address; if the
source address is the unspecified address, which is not link-local, we
will bail out early.

This is mostly correct, but if the destination address is a link-local
address, then we will be following a link-local route, and we'll need to
tell the IPv6 stack what the scope id of the destination address is.
This used to be done by last line of cma_check_linklocal, which is
skipped when bailing out early:

	dev_addr->bound_dev_if = sin6->sin6_scope_id;

(In cma_bind_addr, the sin6_scope_id of the source address is set to the
sin6_scope_id of the destination address, so this is correct)
This line is required in turn for the following line, L279 of
addr6_resolve, to actually inform the IPv6 stack of the scope id:

      fl6.flowi6_oif = addr->bound_dev_if;

Since we can only know we are in this failure case when we have access
to both the source IPv6 address and destination IPv6 address, we have to
deal with this further up the stack. So detect this failure case in
cma_bind_addr, and set bound_dev_if to the destination address scope id
to correct it.

Signed-off-by: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:48 -04:00
Jason Gunthorpe 7e967fd0b8 IB/ucma: Fix theoretical user triggered use-after-free
Something like this:

CPU A                         CPU B
Acked-by: Sean Hefty <sean.hefty@intel.com>

========================      ================================
ucma_destroy_id()
 wait_for_completion()
                              .. anything
                                ucma_put_ctx()
                                  complete()
 .. continues ...
                              ucma_leave_multicast()
                               mutex_lock(mut)
                                 atomic_inc(ctx->ref)
                               mutex_unlock(mut)
 ucma_free_ctx()
  ucma_cleanup_multicast()
   mutex_lock(mut)
     kfree(mc)
                               rdma_leave_multicast(mc->ctx->cm_id,..

Fix it by latching the ref at 0. Once it goes to 0 mc and ctx cannot
leave the mutex(mut) protection.

The other atomic_inc in ucma_get_ctx is OK because mutex(mut) protects
it from racing with ucma_destroy_id.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:48 -04:00
Hariprasad S b8ac311246 iw_cxgb4: set the default MPA version to 2
This enables ORD/IRD negotiation and its about time to enable it by
default

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:47 -04:00
Steve Wise 7854550ae6 RDMA/iser: Limit sgs to the device fastreg depth
Currently the sg tablesize, which dictates fast register page list
depth to use, does not take into account the limits of the rdma device.
So adjust it once we discover the device fastreg max depth limit.  Also
adjust the max_sectors based on the resulting sg tablesize.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:47 -04:00
Roland Dreier d6c7276be1 IB/mlx5: Remove dead code from alloc_cached_mr()
The only place that assigns mr inside the loop already does a break.
So "if (mr)" will never be true here since the function initializes mr
to NULL at the top.  We can just drop the extra if and break here.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Eli Cohen  <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:47 -04:00
Mike Marciniszyn d6f1c17e16 IB/qib: Change lkey table allocation to support more MRs
The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.

The underlying kernel code cannot allocate that many contiguous pages.

There is no reason the underlying memory needs to be physically
contiguous.

This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
  o this matches the module parameter description

Cc: stable@vger.kernel.org
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:46 -04:00
Sagi Grimberg e0238a6a36 mlx5: Expose correct page_size_cap in device attributes
Should be all the page sizes that are supported by the
device.

Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:46 -04:00
Sagi Grimberg a3c874200c mlx5: Fix missing device local_dma_lkey
The mlx5 driver exposes device capability IB_DEVICE_LOCAL_DMA_LKEY
but does not set the the device local_dma_lkey. This breaks
rpcrdma drivers.

Query and set this lkey when creating the device resources.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28 22:54:46 -04:00
Linus Torvalds c13dcf9f2d Linux 4.2-rc8 2015-08-23 20:52:59 -07:00
Linus Torvalds d683477020 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "A couple of major (hang and deadlock) fixes with fortunately fairly
  rare triggering conditions.  The PM oops is only really triggered by
  people using enclosure services (rare) and the fnic driver is mostly
  used in enterprise environments"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  SCSI: Fix NULL pointer dereference in runtime PM
  fnic: Use the local variable instead of I/O flag to acquire io_req_lock in fnic_queuecommand() to avoid deadloack
2015-08-23 20:46:22 -07:00
Linus Torvalds eb63b34bdf Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS bug fixes from Ralf Baechle:
 "Two more fixes for 4.2.

  One fixes a build issue with the LLVM assembler - LLVM assembler macro
  names are case sensitive, GNU as macro names are insensitive; the
  other corrects a license string (GPL v2, not GPLv2) such that the
  module loader will recognice the license correctly"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  FIRMWARE: bcm47xx_nvram: Fix module license.
  MIPS: Fix LLVM build issue.
2015-08-23 07:23:09 -07:00
Linus Torvalds c4c53bad40 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 9p regression fix from Al Viro:
 "Fix for breakage introduced when switching p9_client_{read,write}() to
  struct iov_iter * (went into 4.1)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  9p: ensure err is initialized to 0 in p9_client_read/write
2015-08-22 20:22:11 -07:00