These functions were to cope with differently ordered
sg entries depending on RDS 3.0 or 3.1+. Now that
we've dropped 3.0 compatibility we no longer need them.
Also, modify usage sites for these to refer to sge[0] or [1]
directly. Reorder code to initialize header sgs first.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
RDS 3.0 connections (in OFED 1.3 and earlier) put the
header at the end. 3.1 connections put it at the head.
The code has significant added complexity in order to
handle both configurations. In OFED 1.6 we can
drop this and simplify the code by only supporting
"header-first" configuration.
This patch checks the protocol version, and if prior
to 3.1, does not complete the connection.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
both atomics and rdmas need to convert ib-specific completion codes
into RDS status codes. Rename rds_ib_rdma_send_complete to
rds_ib_send_complete, and have it take a pointer to the function to
call with the new error code.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Instead of using a constant for initiator_depth and
responder_resources, read the per-QP values when the
device is enumerated, and then use these values when creating
the connection.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Implement a CMSG-based interface to do FADD and CSWP ops.
Alter send routines to handle atomic ops.
Add atomic counters to stats.
Add xmit_atomic() to struct rds_transport
Inline rds_ib_send_unmap_rdma into unmap_rm
Signed-off-by: Andy Grover <andy.grover@oracle.com>
The previous code was correct, but made the assumption that
if r_notifier was non-NULL then either r_recverr or r_notify
was true. Valid, but fragile. Changed to explicitly check
r_recverr (shows up in greps for recverr now, too.)
Signed-off-by: Andy Grover <andy.grover@oracle.com>
rds_message_alloc_sgs() now returns correctly-initialized
sg lists, so calleds need not do this themselves.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
This eliminates a separate memory alloc, although
it is now necessary to add an "r_active" flag, since
it is no longer to use the m_rdma_op pointer as an
indicator of if an rdma op is present.
rdma SGs allocated from rm sg pool.
rds_rm_size also gets bigger. It's a little inefficient to
run through CMSGs twice, but it makes later steps a lot smoother.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
r_m_copy_from_user used to allocate the rm as well as kernel
buffers for the data, and then copy the data in. Now, sendmsg()
allocates the rm, although the data buffer alloc still happens
in r_m_copy_from_user.
SGs are still allocated with rm, but now r_m_alloc_sgs() is
used to reserve them. This allows multiple SG lists to be
allocated from the one rm -- this is important once we also
want to alloc our rdma sgl from this pool.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
First, it looks to me like the atomic_inc is wrong.
We should be decrementing refcount only once here, no? It's
already being done by the mr_put() at the end.
Second, simplify the logic a bit by bailing early (with a warning)
if !mr.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Clearly separate rdma-related variables in rm from data-related ones.
This is in anticipation of adding atomic support.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
This function has been the source of numerous bugs; it's just
too complicated. Simplified to nest spinlocks cleanly within
the second loop body, and kick out early if there are no
rms to drop.
This will be a little slower because conn lock is grabbed for
each entry instead of "caching" the lock across rms, but this
should be entirely irrelevant to fastpath performance.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
On second look at this bug (OFED #2002), it seems that the
collision is not with the retransmission queue (packet acked
by the peer), but with the local send completion. A theoretical
sequence of events (from time t0 to t3) is thought to be as
follows,
Thread #1
t0:
sock_release
rds_release
rds_send_drop_to /* wait on send completion */
t2:
rds_rdma_drop_keys() /* destroy & free all mrs */
Thread #2
t1:
rds_ib_send_cq_comp_handler
rds_ib_send_unmap_rm
rds_message_unmapped /* wake up #1 @ t0 */
t3:
rds_message_put
rds_message_purge
rds_mr_put /* memory corruption detected */
The problem with the rds_rdma_drop_keys() is it could
remove a mr's refcount more than its due (i.e. repeatedly
as long as it still remains in the tree (mr->r_refcount > 0)).
Theoretically it should remove only one reference - reference
by the tree.
/* Release any MRs associated with this socket */
while ((node = rb_first(&rs->rs_rdma_keys))) {
mr = container_of(node, struct rds_mr, r_rb_node);
if (mr->r_trans == rs->rs_transport)
mr->r_invalidate = 0;
rds_mr_put(mr);
}
I think the correct way of doing it is to remove the mr from
the tree and rds_destroy_mr it first, then a rds_mr_put()
to decrement its reference count by one. Whichever thread
holds the last reference will free the mr via rds_mr_put().
Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
in_interrupt() is true in softirqs. The BUG_ONs are supposed
to check for if irqs are disabled, so we should use
BUG_ON(irqs_disabled()) instead, duh.
Signed-off-by: Andy Grover <andy.grover@oracle.com>