Commit Graph

1701 Commits

Author SHA1 Message Date
John Gregor
ab69b3cf12 IB/ipath: Fix SDMA error recovery in absence of link status change
What's fixed:

    in ipath_cancel_sends()

        We need to unconditionally set ABORTING.  So, swap the tests
        so the set_bit() isn't shadowed by the &&.

        If we've disarmed the piobufs, then we need to unconditionally
        set DISARMED.  So, move it out from the overly protective if
        at the bottom.

    in sdma_abort_task()

        Abort_task was written knowing that the SDMA engine would always
        be reset (and restarted) on error.  A recent change broke that
        fundamental assumption by taking the restart portion and making
        it conditional on a link status change.  But, SDMA can go boom
        without a link status change in some conditions.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 11:01:10 -07:00
Dave Olson
e2ab41cae4 IB/ipath: Need to always request and handle PIO avail interrupts
Now that we always use PIO for vl15 on 7220, we could get stuck forever
if we happened to run out of PIO buffers from the verbs code, because
the setup code wouldn't run; the interrupt was also ignored if SDMA was
supported.  We also have to reduce the pio update threshold if we have
fewer kernel buffers than the existing threshold.

Clean up the initialization a bit to get ordering safer and more
sensible, and use the existing ipath_chg_kernavail call to do init,
rather than doing it separately.

Drop unnecessary clearing of pio buffer on pio parity error.

Drop incorrect updating of pioavailshadow when exitting freeze mode
(software state may not match chip state if buffer has been allocated
and not yet written).

If we couldn't get a kernel buffer for a while, make sure we are
in sync with hardware, mainly to handle the exitting freeze case.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 11:00:15 -07:00
Michael Albaugh
2889d1ef12 IB/ipath: Fix count of packets received by kernel
The loop in ipath_kreceive() that processes packets increments the
loop-index 'i' once too often, because the exit condition does not
depend on it, and is checked after the increment. By adding a check for
!last to the iterator in the for loop, we correct that in a way that is
not so likely to be re-broken by changes in the loop body.

Signed-off-by: Michael Albaugh <micheal.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:59:23 -07:00
Ralph Campbell
2bfc8e9edf IB/ipath: Return the correct opcode for RDMA WRITE with immediate
This patch fixes a bug in the RC responder which generates a completion
entry with the wrong opcode when an RDMA WRITE with immediate is received.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:58:50 -07:00
Dave Olson
b4d390d8d2 IB/ipath: Fix bug that can leave sends disabled after freeze recovery
The semantics of cancel_sends changed, but the code using it was missed.
Don't leave sends and pioavail updates disabled, and add a comment as to
why the force update is needed.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:57:48 -07:00
Ralph Campbell
6e87d15007 IB/ipath: Only increment SSN if WQE is put on send queue
If a send work request has immediate errors and is not put on the
send queue, we shouldn't update any of the QP state.

The increment of the SSN wasn't obeying this.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:57:14 -07:00
Michael Albaugh
5f51efc195 IB/ipath: Only warn about prototype chip during init
We warn about prototype chips, but the function that checks for
support is also called as a result of a get_portinfo request, which
can clutter the logs.

Restrict warning to only appear during initialization.

Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-07 10:56:47 -07:00
Roland Dreier
273748cc90 RDMA/cxgb3: Fix severe limit on userspace memory registration size
Currently, iw_cxgb3 is severely limited on the amount of userspace
memory that can be registered in in a single memory region, which
causes big problems for applications that expect to be able to
register 100s of MB.

The problem is that the driver uses a single kmalloc()ed buffer to
hold the physical buffer list (PBL) for the entire memory region
during registration, which means that 8 bytes of contiguous memory are
required for each page of memory being registered.  For example, a 64
MB registration will require 128 KB of contiguous memory with 4 KB
pages, and it unlikely that such an allocation will succeed on a busy
system.

This is purely a driver problem: the temporary page list buffer is not
needed by the hardware, so we can fix this by writing the PBL to the
hardware in page-sized chunks rather than all at once.  We do this by
splitting the memory registration operation up into several steps:

 - Allocate PBL space in adapter memory for the full registration
 - Copy PBL to adapter memory in chunks
 - Allocate STag and enable memory region

This also allows several other cleanups to the __cxio_tpt_op()
interface and related parts of the driver.

This change leaves the reregister memory region and memory window
operations broken, but they already didn't work due to other
longstanding bugs, so fixing them will be left to a later patch.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-06 15:56:22 -07:00
Roland Dreier
0e9913362a RDMA/cxgb3: Don't add PBL memory to gen_pool in chunks
Current iw_cxgb3 code adds PBL memory to the driver's gen_pool in 2 MB
chunks.  This limits the largest single allocation that can be done to
the same size, which means that with 4 KB pages, each of which takes 8
bytes of PBL memory, the largest memory region that can be allocated
is 1 GB (256K PBL entries * 4 KB/entry).

Remove this limit by adding all the PBL memory in a single gen_pool
chunk, if possible.  Add code that falls back to smaller chunks if
gen_pool_add() fails, which can happen if there is not sufficient
contiguous lowmem for the internal gen_pool bitmap.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-06 15:03:38 -07:00
Stefan Roscher
cf04690885 IB/ehca: Fix function return types
Also remove duplicate assignment of local_ca_ack_delay and change
min_t check for local_ca_ack_delay to u8 instead of int.

Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-05 15:51:49 -07:00
Steve Wise
77a8d5741f RDMA/cxgb3: Bump up the MPA connection setup timeout.
Testing on large clusters shows its way too short at 10 secs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-02 10:57:09 -07:00
Steve Wise
c4d49776e8 RDMA/cxgb3: Silently ignore close reply after abort.
Remove bad BUG_ON() that can trigger in correct operation from
close_con_rpl().  It is possible to get a close_rpl message on a dead
connection.  The sequence is:

	- host refs ep for close exchange
	- host posts close_req
	- hw posts PEER_ABORT from incoming RST
	- host marks ep DEAD
	- host posts ABORT_RPL and releases ep resources
	- hw posts CLOSE_RPL
	- host derefs ep and ep freed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-02 10:57:09 -07:00
Steve Wise
c8286944b8 RDMA/cxgb3: QP flush fixes
- Flush the QP only after the HW disables the connection.  Currently
  we flush the QP when transitioning to CLOSING.  This exposes a race
  condition where the HW can complete a RECV WR, for instance, -and-
  the SW can flush that same WR.

- Only call CQ event handlers on flush IFF we actually flushed something.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-02 10:56:57 -07:00
Eli Cohen
57ce41d1d1 IB/ipoib: Fix transmit queue stalling forever
Commit f56bcd80 ("IPoIB: Use separate CQ for UD send completions")
introduced a bug where the transmit queue could get stopped and never
woken up.  The problem is that send completions are only polled at the
end of the xmit function, so if the send queue fills up and the xmit
path stops the queue, then there is no way for send completions to
ever get polled, and so the transmit queue stays stopped forever.

Fix this by arming the send CQ just before posting the last send
request that fills the send queue.  Then, when the completion event
handler is called, drain the send CQ.  Since it is possible that not
enough send completions are in the CQ, verify that the the net queue
has been woken up after draining the send CQ, and if not arm a timer
and drain again at the timer function.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-30 20:02:45 -07:00
Roland Dreier
3ae15e1623 IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf()
When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I
changed things around so that mlx4_ib_alloc_cq_buf() and
mlx4_ib_free_cq_buf() were used everywhere they could be.  However, I
screwed up the number of entries passed into mlx4_ib_alloc_cq_buf()
in a couple places -- the function bumps the number of entries
internally, so the caller shouldn't add 1 as well.

Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf()
can cause the cleanup to go off the end of an array and corrupt
allocator state in interesting ways.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-30 19:52:55 -07:00
Glenn Streiff
7495ab6837 RDMA/nes: Formatting cleanup
Various cleanups:
	- Change // to /* .. */
	- Place whitespace around binary operators.
	- Trim down a few long lines.
	- Some minor alignment formatting for better readability.
	- Remove some silly tabs.

Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:54 -07:00
Eric Schneider
0e1de5d62e RDMA/nes: Add support for SFP+ PHY
This patch enables the iw_nes module for NetEffect RNICs to support
additional PHYs including SFP+ (referred to as ARGUS in the code).

Signed-off-by: Eric Schneider <eric.schneider@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:54 -07:00
Faisal Latif
37dab4112d RDMA/nes: Use LRO
Signed-off-by: Faisal Latif <flatif@neteffect.com.
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:54 -07:00
Eli Cohen
b4132efa1a IPoIB: Copy child MTU from parent
When creating a child interface, copy the MTU information from the
parent.  Otherwise when the child's multicast join completes, the MTU
will not be updated since the code does

	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);

and priv->admin_mtu will be set to 0.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Roland Dreier
baaad380c0 IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute
Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the
mthca userspace ABI to provide a way for userspace to indicate which
memory regions need the DMA write barrier attribute.  However, it is
possible to handle this without breaking existing userspace, by having
the mthca kernel driver recognize whether it is talking to old or new
userspace, depending on the size of the register MR structure passed in.

The only potential drawback of this is that is allows old userspace
(which has a bug with DMA ordering on large SGI Altix systems) to
continue to run on new kernels, but the advantage of allowing old
userspace to continue to work on unaffected systems seems to outweigh
this, and we can print a warning to push people to upgrade their
userspace.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Olaf Kirch
0bfe151cc4 IB/mthca: Avoid recycling old FMR R_Keys too soon
When a FMR is unmapped, mthca resets the map count to 0, and clears
the upper part of the R_Key which is used as the sequence counter.

This poses a problem for RDS, which uses ib_fmr_unmap as a fence
operation.  RDS assumes that after issuing an unmap, the old R_Keys
will be invalid for a "reasonable" period of time. For instance,
Oracle processes uses shared memory buffers allocated from a pool of
buffers.  When a process dies, we want to reclaim these buffers -- but
we must make sure there are no pending RDMA operations to/from those
buffers.  The only way to achieve that is by using unmap and sync the
TPT.

However, when the sequence count is reset on unmap, there is a high
likelihood that a new mapping will be given the same R_Key that was
issued a few milliseconds ago.

To prevent this, don't reset the sequence count when unmapping a FMR.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Stefan Roscher
d227fa7288 IB/ehca: Allocate event queue size depending on max number of CQs and QPs
If a lot of QPs fall into Error state at once and the EQ of the
respective HCA is too small, it might overrun, causing the eHCA driver
to stop processing completion events and calling the application's
completion handlers, effectively causing traffic to stop.

Fix this by limiting available QPs and CQs to a customizable max
count, and determining EQ size based on these counts and a worst-case
assumption.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Eli Cohen
f56bcd8013 IPoIB: Use separate CQ for UD send completions
Use a dedicated CQ for UD send completions. Also, do not arm the UD
send CQ, which reduces the number of interrupts generated.  This patch
farther reduces overhead by not calling poll CQ for every posted send
WR -- it does polls only when there 16 or more outstanding work requests.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:53 -07:00
Eli Dorfman
87528227df IB/iser: Count FMR alignment violations per session
Count FMR alignment violations per session as part of the iscsi
statistics.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00
Eli Dorfman
6f735e36ba IB/iser: Move high-volume debug output to higher debug level
Add another level for debug.

Signed-off-by: Eli Dorfman <elid@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29 13:46:52 -07:00