Commit Graph

140 Commits

Author SHA1 Message Date
Aleksey Senin
a2ebf07ae5 IB: Rename RAW_ETY to RAW_ETHERTYPE
Change abbreviated IB_QPT_RAW_ETY to IB_QPT_RAW_ETHERTYPE to make
the special QP type easier to understand.

cf http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04530.html

Signed-off-by: Aleksey Senin <alekseys@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-08-04 10:44:19 -07:00
Ralph Campbell
9a6edb60ec IB/core: Allow device-specific per-port sysfs files
Add a new parameter to ib_register_device() so that low-level device
drivers can pass in a pointer to a callback function that will be
called for each port that is registered in sysfs.  This allows
low-level device drivers to create files in

    /sys/class/infiniband/<hca>/ports/<N>/

without having to poke through the internals of the RDMA sysfs handling.

There is no need for an unregister function since the kobject
reference will go to zero when ib_unregister_device() is called.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-05-21 10:34:44 -07:00
Vladimir Sokolovsky
5e80ba8ff0 IB/core: Add support for masked atomic operations
- Add new IB_WR_MASKED_ATOMIC_CMP_AND_SWP and IB_WR_MASKED_ATOMIC_FETCH_AND_ADD
   send opcodes that can be used to post "masked atomic compare and
   swap" and "masked atomic fetch and add" work request respectively.
 - Add masked_atomic_cap capability.
 - Add mask fields to atomic struct of ib_send_wr
 - Add new opcodes to ib_wc_opcode

The new operations are described more precisely below:

* Masked Compare and Swap (MskCmpSwap)

The MskCmpSwap atomic operation is an extension to the CmpSwap
operation defined in the IB spec.  MskCmpSwap allows the user to
select a portion of the 64 bit target data for the “compare” check as
well as to restrict the swap to a (possibly different) portion.  The
pseudo code below describes the operation:

| atomic_response = *va
| if (!((compare_add ^ *va) & compare_add_mask)) then
|     *va = (*va & ~(swap_mask)) | (swap & swap_mask)
|
| return atomic_response

The additional operands are carried in the Extended Transport Header.
Atomic response generation and packet format for MskCmpSwap is as for
standard IB Atomic operations.

* Masked Fetch and Add (MFetchAdd)

The MFetchAdd Atomic operation extends the functionality of the
standard IB FetchAdd by allowing the user to split the target into
multiple fields of selectable length. The atomic add is done
independently on each one of this fields. A bit set in the
field_boundary parameter specifies the field boundaries. The pseudo
code below describes the operation:

| bit_adder(ci, b1, b2, *co)
| {
|	value = ci + b1 + b2
|	*co = !!(value & 2)
|
|	return value & 1
| }
|
| #define MASK_IS_SET(mask, attr)      (!!((mask)&(attr)))
| bit_position = 1
| carry = 0
| atomic_response = 0
|
| for i = 0 to 63
| {
|         if ( i != 0 )
|                 bit_position =  bit_position << 1
|
|         bit_add_res = bit_adder(carry, MASK_IS_SET(*va, bit_position),
|                                 MASK_IS_SET(compare_add, bit_position), &new_carry)
|         if (bit_add_res)
|                 atomic_response |= bit_position
|
|         carry = ((new_carry) && (!MASK_IS_SET(compare_add_mask, bit_position)))
| }
|
| return atomic_response

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-04-21 16:37:48 -07:00
Roland Dreier
fe8875e5a4 Merge branch 'misc' into for-next
Conflicts:
	drivers/infiniband/core/uverbs_main.c
2010-03-01 23:52:31 -08:00
Roland Dreier
e8094e667a Merge branch 'cma' into for-next 2010-03-01 23:51:54 -08:00
Eli Cohen
920d706c89 IB/core: Fix and clean up ib_ud_header_init()
ib_ud_header_init() first clears header and then fills up the various
fields.  Later on, it tests header->immediate_present, which it has
already cleared, so the condition is always false.  Fix this by adding
an immediate_present parameter and setting header->immediate_present
as is done with grh_present.  Also remove unused calculation of
header_len.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 14:54:10 -08:00
Alexander Chiang
17a55f79fd IB/core: Pack struct ib_device a little tighter
A small change to reduce the size of ib_device to 1112 bytes
(from 1128).

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-24 10:23:49 -08:00
Sean Hefty
cf4f7e8c47 RDMA/cm: Remove unused definition of RDMA_PS_SCTP
The defined SCTP number is incorrect (0x83, rather than 0x84), and
since it is not used anywhere, simply remove the definition.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-02-11 15:40:25 -08:00
Roland Dreier
14f369d1d6 Merge branches 'amso1100', 'cma', 'cxgb3', 'ehca', 'ipath', 'ipoib', 'iser', 'misc', 'mlx4' and 'nes' into for-next 2009-12-15 23:39:25 -08:00
Bart Van Assche
55464d461b IB: Clarify the documentation of ib_post_send()
Clarify the behavior of ib_post_send() when a list of work requests is
passed in and an immediate error is returned.

Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09 14:20:04 -08:00
Sean Hefty
6f8372b69c RDMA/cm: fix loopback address support
The RDMA CM is intended to support the use of a loopback address
when establishing a connection; however, the behavior of the CM
when loopback addresses are used is confusing and does not always
work, depending on whether loopback was specified by the server,
the client, or both.

The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address.  (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdam_bind_addr,
no device is associated with the rdma_cm_id.  Fix this.

If a loopback address is specified by the client as the destination
address for a connection, it will fail to establish a connection.
This is true even if the server is listing across all addresses or
on the loopback address itself.  The issue is that the server tries
to translate the IP address carried in the REQ message to a local
net_device address, which fails.  The translation is not needed in
this case, since the REQ carries the actual HW address that should
be used.

Finally, cleanup loopback support to be more transport neutral.
Replace separate calls to get/set the sgid and dgid from the
device address to a single call that behaves correctly depending
on the format of the device address.  And support both IPv4 and
IPv6 address formats.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Fixed RDS build by s/ib_addr_get/rdma_addr_get/  - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 13:26:06 -08:00
Sean Hefty
c4315d85f9 IB/addr: Store net_device type instead of translating to RDMA transport
The struct rdma_dev_addr stores net_device address information:
the source device address, destination hardware address, and
broadcast address.  For consistency, store the net_device type
rather than converting it to the rdma_node_type.

The type indicates the format of the various hardware addresses,
which is what we're concerned with, and not the RDMA node type
that the address may map to.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:57:18 -08:00
Sean Hefty
6266ed6e41 RDMA/cma: Replace net_device pointer with index
Provide the device interface when resolving route information to
ensure that the correct outbound device is used.  This will also
simplify processing of sin6_scope_id for IPv6 support.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthrope@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-19 12:55:22 -08:00
Sean Hefty
a7ca1f00ed RDMA/ucma: Add option to manually set IB path
Export rdma_set_ib_paths to user space to allow applications to
manually set the IB path used for connections.  This allows
alternative ways for a user space application or library to obtain
path record information, including retrieving path information
from cached data, avoiding direct interaction with the IB SA.
The IB SA is a single, centralized entity that can limit scaling
on large clusters running MPI applications.

Future changes to the rdma cm can expand on this framework to
support the full range of features allowed by the IB CM, such as
separate forward and reverse paths and APM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-11-16 09:30:33 -08:00
Anand Gadiyar
fd589a8f0a trivial: fix typo "to to" in multiple files
Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-21 15:14:55 +02:00
Linus Torvalds
13220a94d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits)
  ixgbe: Allow Priority Flow Control settings to survive a device reset
  net: core: remove unneeded include in net/core/utils.c.
  e1000e: update version number
  e1000e: fix close interrupt race
  e1000e: fix loss of multicast packets
  e1000e: commonize tx cleanup routine to match e1000 & igb
  netfilter: fix nf_logger name in ebt_ulog.
  netfilter: fix warning in ebt_ulog init function.
  netfilter: fix warning about invalid const usage
  e1000: fix close race with interrupt
  e1000: cleanup clean_tx_irq routine so that it completely cleans ring
  e1000: fix tx hang detect logic and address dma mapping issues
  bridge: bad error handling when adding invalid ether address
  bonding: select current active slave when enslaving device for mode tlb and alb
  gianfar: reallocate skb when headroom is not enough for fcb
  Bump release date to 25Mar2009 and version to 0.22
  r6040: Fix second PHY address
  qeth: fix wait_event_timeout handling
  qeth: check for completion of a running recovery
  qeth: unregister MAC addresses during recovery.
  ...

Manually fixed up conflicts in:
	drivers/infiniband/hw/cxgb3/cxio_hal.h
	drivers/infiniband/hw/nes/nes_nic.c
2009-03-26 15:54:36 -07:00
Roland Dreier
09f98bafea Merge branches 'cxgb3', 'endian', 'ipath', 'ipoib', 'iser', 'mad', 'misc', 'mlx4', 'mthca', 'nes' and 'sysfs' into for-next 2009-03-24 20:44:41 -07:00
Ramachandra K
7020cb0fe2 IB/mad: Fix RMPP header RRespTime manipulation
Fix ib_set_rmpp_flags() to use the correct bit mask for RRespTime.  In
the 8-bit field of the RMPP header, the first 5 bits are RRespTime and
next 3 bits are RMPPFlags. Hence to retain the first 5 bits, the mask
should be 0xF8 instead of 0xF1.

ack_recv()-->format_ack() calls ib_set_rmpp_flags() and due to the
incorrect ANDing with 0xF1, RRespTime got changed incorrectly and RMPP
Acks sent back always had a RRespTime of 0x1E (30) which caused the
other end to consider the time outs to be approximately 4297 seconds
(i.e. in the order of 4*2^30) instead of the usual ~4 seconds (order
of 4*2^20).

Signed-off-by: Ramachandra K <ramachandra.kuchimanchi@qlogic.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-02-27 10:33:12 -08:00
Harvey Harrison
f3a7c66b5c net: replace __constant_{endian} uses in net headers
Base versions handle constant folding now.  For headers exposed to
userspace, we must only expose the __ prefixed versions.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-14 22:58:35 -08:00
Harvey Harrison
9c3da09917 IB: Remove __constant_{endian} uses
The base versions handle constant folding just fine, use them
directly.  The replacements are OK in the include/ files as they are
not exported to userspace so we don't need the __ prefixed versions.

This patch does not affect code generation at all.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-17 17:11:57 -08:00
Roland Dreier
3f44675439 RDMA/cma: Remove padding arrays by using struct sockaddr_storage
There are a few places where the RDMA CM code handles IPv6 by doing

	struct sockaddr		addr;
	u8			pad[sizeof(struct sockaddr_in6) -
				    sizeof(struct sockaddr)];

This is fragile and ugly; handle this in a better way with just

	struct sockaddr_storage	addr;

[ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
  switch to struct sockaddr_storage and get rid of padding arrays in
  struct rdma_addr. ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-04 11:02:14 -07:00
FUJITA Tomonori
8d8bb39b9e dma-mapping: add the device argument to dma_mapping_error()
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread).  So I
CC'ed this to KVM camp.  Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging).  It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
device.  Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error.  The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations.  So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:03 -07:00
Amir Vadai
38ca83a588 RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
Consumers that want to re-use their QPs in new connections need to
know when the QP has exited the timewait state.  Report the timewait
event through the rdma_cm.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:14:23 -07:00
Or Gerlitz
dd5bdff83b RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
Add an RDMA_CM_EVENT_ADDR_CHANGE event can be used by rdma-cm
consumers that wish to have their RDMA sessions always use the same
links (eg <hca/port>) as the IP stack does.  In the current code, this
does not happen when bonding is used and fail-over happened but the IB
link used by an already existing session is operating fine.

Use the netevent notification for sensing that a change has happened
in the IP stack, then scan the rdma-cm ID list to see if there is an
ID that is "misaligned" with respect to the IP stack, and deliver
RDMA_CM_EVENT_ADDR_CHANGE for this ID.  The consumer can act on the
event or just ignore it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22 14:14:22 -07:00
Or Gerlitz
64c5e613b9 RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr
Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr().  Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.

In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:53 -07:00