Because an XRC TGT QP can end up being shared among multiple
processes, don't have the ib_cm automatically send a DREQ when the
userspace process that owns the ib_cm_id exits. Disconnect can be
initiated by the user directly; otherwise, the owner of the XRC INI QP
controls the connection.
Note that as a result of the process exiting, the ib_cm will stop
tracking the XRC connection on the target side. For the purposes of
disconnecting, this isn't a big deal. The ib_cm will respond to the
DREQ appropriately. For other messages, mainly LAP, the CM will
reject the request, since there's no one available to route the
request to.
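The exit-time rule above can be modelled in a few lines. This is a hypothetical userspace sketch: the enum values, `struct conn_model`, and both helpers are illustrative names, not ib_cm code. It shows that only non-XRC-TGT connections get an automatic DREQ, that tracking stops either way, and that an untracked connection can still answer a DREQ (with a DREP) but rejects other requests such as LAP.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: these types and helpers are illustrative and are
 * not the ib_cm data structures. */
enum qp_kind { QP_RC, QP_XRC_INI, QP_XRC_TGT };
enum cm_reply { REPLY_DREP, REPLY_REJ };

struct conn_model {
	enum qp_kind kind;
	bool tracked;   /* does the CM still track this connection? */
	int dreqs_sent; /* DREQs the CM sent on the owner's behalf */
};

/* Called when the userspace process that owns the cm_id exits. */
static void owner_exited(struct conn_model *c)
{
	/* A shared XRC TGT QP stays connected; the user or the owner of
	 * the XRC INI QP decides when to disconnect. */
	if (c->kind != QP_XRC_TGT)
		c->dreqs_sent++;
	/* Either way, the CM stops tracking the connection locally. */
	c->tracked = false;
}

/* Reply to a message that arrives for an untracked XRC TGT connection. */
static enum cm_reply reply_untracked(bool is_dreq)
{
	/* A DREQ can still be answered; a LAP has no owner to be routed
	 * to, so it is rejected. */
	return is_dreq ? REPLY_DREP : REPLY_REJ;
}
```

For example, an XRC TGT connection whose owner exits ends with `dreqs_sent == 0`, while an RC connection sends one DREQ.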
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The XRC annex was updated to have XRC behave more like RD. Specifically,
the XRC TGT QPN moves from the local QPN field to the local EECN field.
Lookup of the SRQN is done using the REQ/REP protocol.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Update the REQ and REP messages to support XRC connection setup
according to the XRC Annex. Several existing fields must be set to 0 or
1 when connecting XRC QPs, and a reserved field is changed to an
extended transport type.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commits 71c29bd5c2 ("IB/uverbs: Add devnode method to set path/mode")
and c3af0980ce ("IB: Add devnode methods to cm_class and umad_class")
added devnode methods that set the mode.
However, these methods don't check for a NULL mode, and so we get a
crash when unloading modules because devtmpfs_delete_node() calls
device_get_devnode() with mode == NULL.
Add the missing checks.
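The missing check amounts to guarding the write through `mode`. Below is a userspace sketch of such a devnode callback, assuming a simplified signature (a name string and an `unsigned int *` instead of the kernel's `struct device *` and mode pointer); `ucm_devnode_model` is a hypothetical name. It returns the "infiniband/<name>" path described for these classes and sets mode 0666 only when the caller supplied somewhere to store it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a class devnode callback. */
static char *ucm_devnode_model(const char *name, unsigned int *mode)
{
	char *path;

	/* Callers such as the devtmpfs delete path pass mode == NULL,
	 * so only write through the pointer when it is valid. */
	if (mode)
		*mode = 0666;

	path = malloc(64);
	if (path)
		snprintf(path, 64, "infiniband/%s", name);
	return path; /* caller frees */
}
```

Calling it with `NULL` for `mode` still yields the path instead of crashing.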
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
[ Also fix cm.c. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want the ucmX, umadX and issmX device nodes to show up under
/dev/infiniband, and additionally ucmX should have mode 0666. Add
appropriate devnode methods to their class structs for this.
Signed-off-by: Roland Dreier <roland@purestorage.com>
This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
Vadai <amirv@mellanox.com>:
When destroying a cm_id from work queue context, if the
lap_state of this cm_id is IB_CM_LAP_SENT, we need to
release the reference on this id that was taken when the
LAP message was sent. Otherwise, if the expected APR message
gets lost, the reference is released only after a long time,
and during that time the work handler thread is not available
to process other work.
It turns out that we need to cancel any pending LAP messages whenever
we transition out of the IB_CM_ESTABLISHED state. This occurs when
disconnecting, by either sending or receiving a DREQ. It can also
happen in a corner case where we receive a REJ message after sending
an RTU, followed by a LAP. Add checks and cancel any outstanding LAP
messages in these three cases.
Canceling the LAP when sending a DREQ fixes the destroy problem
reported by Moni. When a cm_id is destroyed in the IB_CM_ESTABLISHED
state, it sends a DREQ to the remote side to notify the peer that the
connection is going away.
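A simplified model of the rule "cancel any pending LAP when leaving IB_CM_ESTABLISHED" is sketched below. The struct, enums, and helpers are hypothetical, and, as a simplification, the reference held for the LAP send is dropped directly inside `cancel_pending_lap()` rather than through a send-completion callback. Both disconnect paths (sending and receiving a DREQ) go through the same cancel step; the REJ-after-RTU case would call it too.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state model; names echo the ib_cm states above. */
enum cm_state { CM_ESTABLISHED, CM_DREQ_SENT, CM_DREQ_RCVD };
enum lap_state { LAP_UNINIT, LAP_IDLE, LAP_SENT };

struct cm_id_model {
	enum cm_state state;
	enum lap_state lap_state;
	int refcount;
	bool lap_msg_pending;
};

/* Cancel an outstanding LAP and drop the reference taken when it was
 * sent (simplified: done here instead of in a completion handler). */
static void cancel_pending_lap(struct cm_id_model *id)
{
	if (id->lap_state == LAP_SENT) {
		id->lap_msg_pending = false; /* stands in for cancelling the MAD */
		id->lap_state = LAP_UNINIT;
		id->refcount--;
	}
}

static void send_dreq(struct cm_id_model *id)
{
	cancel_pending_lap(id); /* leaving ESTABLISHED */
	id->state = CM_DREQ_SENT;
}

static void recv_dreq(struct cm_id_model *id)
{
	cancel_pending_lap(id); /* leaving ESTABLISHED */
	id->state = CM_DREQ_RCVD;
}
```

With this ordering, destroying an established cm_id (which sends a DREQ) no longer waits on an APR that may never arrive.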
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When processing a SIDR REQ, the ib_cm allocates a new cm_id. The
refcount of the cm_id is initialized to 1. However, cm_process_work
will decrement the refcount after invoking all callbacks. The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.
If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id. This can lead to a crash when the cm callback thread tries
to access the cm_id.
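The refcount arithmetic can be made concrete with a toy model. Everything here is hypothetical (`struct cm_id_model`, `process_work`, the two handlers). It assumes, as the text states, that a new cm_id starts at 1 and that work processing drops one reference after the callbacks. Taking an extra reference before processing is one way to keep the user's reference alive; the sketch does not claim this is the exact upstream change.

```c
#include <assert.h>

/* Hypothetical model of the SIDR REQ reference counting. */
struct cm_id_model { int refcount; };

/* Mirrors the described behaviour: drop one reference after the
 * callbacks have run. */
static void process_work(struct cm_id_model *id)
{
	/* ... user callbacks would run here ... */
	id->refcount--;
}

/* Buggy flow: the new id starts at 1 and ends at 0 while the user
 * still believes it holds a reference. */
static int sidr_req_buggy(void)
{
	struct cm_id_model id = { 1 };
	process_work(&id);
	return id.refcount;
}

/* Fixed flow: take a reference on behalf of the work path first. */
static int sidr_req_fixed(void)
{
	struct cm_id_model id = { 1 };
	id.refcount++;
	process_work(&id);
	return id.refcount;
}
```

The buggy handler returns 0 and the fixed one returns 1, the reference the user later drops when destroying the cm_id.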
This problem was noticed as part of a larger investigation into kernel
crashes in the rdma_cm when running on a real-time OS.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
NULL pointer dereferences in ib_cm_init_qp_attr() were seen by some
users. From a crash dump, I determined that we died in
cm_init_qp_rts_attr() (it's inlined, so it doesn't show up in the
traceback) on the line labeled below:
	static int cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv,
				       struct ib_qp_attr *qp_attr,
				       int *qp_attr_mask)
	{
		...
		if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT) {
			...
		} else {
			*qp_attr_mask = IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE;
			qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num; <-die
The problem is that the rdma_cm can call ib_send_cm_mra() after a
connection has been established. The ib_cm incorrectly assumes that
the MRA is in response to a LAP (load alternate path) message, even
though no LAP message has been received. The ib_cm needs to check the
lap_state before sending an MRA if the cm_id state is established.
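The required check can be sketched as a predicate over the connection state and the LAP state. The enums and `mra_allowed()` are hypothetical and cover only a subset of the real states. The point they illustrate is that in the established state an MRA is only valid as a reply to a LAP that has actually been received, so a bare MRA from the rdma_cm is refused instead of leaving lap_state pointing at an alternate path that was never set up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced state sets for illustration only. */
enum cm_state { CM_REQ_RCVD, CM_REP_RCVD, CM_ESTABLISHED };
enum lap_state { LAP_UNINIT, LAP_IDLE, LAP_RCVD, MRA_LAP_SENT };

/* Returns true if sending an MRA is valid in this state. */
static bool mra_allowed(enum cm_state state, enum lap_state lap)
{
	if (state == CM_ESTABLISHED)
		/* Only as a reply to a LAP that has actually arrived. */
		return lap == LAP_RCVD;
	/* Before establishment, an MRA answers a REQ or a REP. */
	return state == CM_REQ_RCVD || state == CM_REP_RCVD;
}
```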
Reported-by: Arthur Kepner <akepner@sgi.com>
Reported-by: Josh England <jjengla@gmail.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mlx4: Check correct variable for allocation failure
RDMA/nes: Correct cap.max_inline_data assignment in nes_query_qp()
RDMA/cm: Set num_paths when manually assigning path records
IB/cm: Fix device_create() return value check
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script was
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there, i.e. if only gfp is used,
gfp.h; if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
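The first rule above (slab usage means slab.h, gfp-only usage means gfp.h) can be illustrated with a rough C model. The real tool is the Python script linked above; the symbol lists here are small illustrative assumptions, not the script's actual lists, and plain substring matching is far cruder than real usage detection. Since slab.h already includes gfp.h, slab usage takes priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative symbol lists (assumed, not taken from the script). */
static const char *const slab_syms[] = { "kmalloc", "kzalloc", "kfree", "kmem_cache_" };
static const char *const gfp_syms[] = { "GFP_KERNEL", "GFP_ATOMIC", "alloc_pages" };

#define NSYMS(a) (sizeof(a) / sizeof((a)[0]))

static bool uses_any(const char *src, const char *const *syms, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strstr(src, syms[i]))
			return true;
	return false;
}

/* Header a file needs, or NULL if it uses neither facility. */
static const char *needed_header(const char *src)
{
	if (uses_any(src, slab_syms, NSYMS(slab_syms)))
		return "linux/slab.h"; /* slab.h pulls in gfp.h */
	if (uses_any(src, gfp_syms, NSYMS(gfp_syms)))
		return "linux/gfp.h";
	return NULL;
}
```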
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition, and for others adding it to the
implementation .h or embedding .c file was more appropriate. This step
added inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed,
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs, requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given that I had only a couple of failures from the tests in step
7, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be in one of the arch
headers and should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
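A userspace sketch of what constifying an ops table looks like follows. `struct attr_ops`, `port_show`, and `call_show` are hypothetical stand-ins, not the kernel's `struct sysfs_ops` definitions. Because `port_ops` is `const`, the table can be placed in read-only data, and an assignment such as `port_ops.show = NULL;` is rejected at compile time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an ops table such as struct sysfs_ops. */
struct attr_ops {
	int (*show)(char *buf, size_t len);
};

static int port_show(char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", 42);
}

/* const: shared by every user, never modified, eligible for .rodata. */
static const struct attr_ops port_ops = { .show = port_show };

/* Consumers take a pointer to const, so they cannot modify it either. */
static int call_show(const struct attr_ops *ops, char *buf, size_t len)
{
	return ops->show(buf, len);
}
```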
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The base versions handle constant folding just fine, so use them
directly. The replacements are OK in the include/ files as they are
not exported to userspace, so we don't need the __ prefixed versions.
This patch does not affect code generation at all.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Tejun's commit 7b595756ec made sysfs
attribute->owner unnecessary. But the field was left in the structure to
ease the merge. It's been over a year since that change and it is now
time to start killing attribute->owner along with its users - one arch at
a time!
This patch is attempt #1 to get rid of attribute->owner only for
CONFIG_X86_64 or CONFIG_X86_32. We will deal with other arches later on
as and when possible - avr32 will be the next since that is something I
can test. Compile (make allyesconfig / make allmodconfig / custom config)
and boot tested.
akpm: the idea is that we put the declaration of attribute.owner inside
`#ifndef CONFIG_X86'. But that proved to be too ambitious for now because
new usages kept on turning up in subsystem trees.
[akpm: remove the ifdef for now]
Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 110cf374 ("infiniband: make cm_device use a struct device and
not a kobject.") introduced a memory leak, since it deleted
cm_release_dev_obj(), which was where cm_dev was freed. Fix this by
freeing the leaked structure after calling device_unregister().
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This object really should be a struct device, or at least contain a
pointer to a struct device, as it is trying to create a separate device
tree outside of the main device tree. This patch fixes this problem.
It is needed for the class core rework that is being done in the driver
core.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This pointer really is a struct ib_device, not a struct device, so name
it properly to help prevent confusion.
This makes the follow-on patch in this series much smaller and easier to
understand as well.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mostly update the RB tree comparisons to force __be types to normal
integers, but the change to cm_format_sidr_req() is a real fix:
param->path->pkey is already __be16.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
When a CM MAD is received, it is queued to a CM workqueue for
processing. The queued work item references the port and device on
which the MAD was received. If that device is removed from the system
before the work item can execute, the work item will reference freed
memory.
To fix this, flush the workqueue after unregistering to receive MADs,
and before the device is freed.
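The ordering of the fix (stop receiving, drain queued work, then free) can be shown with a single-threaded model. The queue, port, and helper names are hypothetical, and the queue is a fixed-size array rather than a real workqueue. Queued work dereferences the port, so it must run before `free()`, and once receiving is unregistered no new work can be queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical single-threaded model of the CM workqueue. */
struct port_model { int mads_seen; };
struct work_model { struct port_model *port; };

struct wq_model {
	struct work_model items[8];
	int count;
	bool accepting; /* false once MAD reception is unregistered */
};

static bool queue_mad(struct wq_model *wq, struct port_model *port)
{
	if (!wq->accepting || wq->count == 8)
		return false;
	wq->items[wq->count++].port = port;
	return true;
}

/* Run every queued item; each one touches its port. */
static void flush_wq(struct wq_model *wq)
{
	for (int i = 0; i < wq->count; i++)
		wq->items[i].port->mads_seen++;
	wq->count = 0;
}

/* Removal order: unregister, flush, and only then free. */
static int remove_device(struct wq_model *wq, struct port_model *port)
{
	int seen;

	wq->accepting = false; /* no new MADs can be queued */
	flush_wq(wq);          /* pending work runs while port is valid */
	seen = port->mads_seen;
	free(port);
	return seen;
}
```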
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 9af57b7a ("IB/cm: Add basic performance counters") introduced a
bug in how the reference count for cm_class.subsys.kobj was handled:
the path that released a device did a kobject_put() on that kobject, but
there was no kobject_get() in the path that handles adding a device. So
the reference count ended up too low, which leads to bad things. Fix up
and simplify the reference counting to avoid this.
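The invariant the fix restores is that every put in the remove path is matched by a get in the add path. Here is a minimal model with hypothetical names (this is not the kobject API). The `assert` in `kobj_put_model()` catches the underflow that an unmatched put would eventually cause.

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted kobject. */
struct kobj_model { int refcount; };

static void kobj_get_model(struct kobj_model *k) { k->refcount++; }

static void kobj_put_model(struct kobj_model *k)
{
	assert(k->refcount > 0); /* an unmatched put would underflow */
	k->refcount--;
}

/* Balanced pair: each added device holds one reference. */
static void add_device(struct kobj_model *subsys) { kobj_get_model(subsys); }
static void remove_device(struct kobj_model *subsys) { kobj_put_model(subsys); }
```

Adding and then removing any number of devices leaves the count where it started.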
(Actually, I introduced the bug when fixing the patch up to match some
of Greg's kobject changes, but who's counting)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Paths with hop_limit > 1 indicate that the connection will be routed
between IB subnets. Update the subnet local field in the CM REQ based
on the hop_limit value. In addition, if the path is routed, then set
the LIDs in the REQ to the permissive LIDs. This is used to indicate
to the passive side that it should use the LIDs in the received local
route header (LRH) associated with the REQ when programming the QP.
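The REQ formatting rule can be sketched as follows, assuming simplified path and REQ structures (`struct path_model`, `struct req_model`, and `format_req_path` are hypothetical) with host-order LIDs. A hop_limit above 1 marks the path as routed, so the subnet local bit is cleared and both LIDs are set to the permissive LID, 0xFFFF. Otherwise the path's own LIDs are copied.

```c
#include <assert.h>
#include <stdint.h>

#define PERMISSIVE_LID 0xFFFF /* IB permissive LID */

/* Hypothetical, host-order simplifications of a path record and REQ. */
struct path_model { uint16_t slid, dlid; uint8_t hop_limit; };
struct req_model {
	uint16_t primary_local_lid, primary_remote_lid;
	int subnet_local;
};

static void format_req_path(struct req_model *req,
			    const struct path_model *path)
{
	/* hop_limit > 1 means the connection crosses an IB router. */
	int routed = path->hop_limit > 1;

	req->subnet_local = !routed;
	if (routed) {
		/* Tell the passive side to use the LIDs from the LRH. */
		req->primary_local_lid = PERMISSIVE_LID;
		req->primary_remote_lid = PERMISSIVE_LID;
	} else {
		req->primary_local_lid = path->slid;
		req->primary_remote_lid = path->dlid;
	}
}
```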
This is a temporary work-around to the IB CM to support IB router
development until the IB router specification is completed. It is not
anticipated that this work-around will cause any interoperability
issues with existing stacks or future stacks that will properly
support IB routers when defined.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>