Pull SCSI target updates from Nicholas Bellinger:
"Here are the outstanding target pending updates for v4.7-rc1.
The highlights this round include:
- Allow external PR/ALUA metadata path be defined at runtime via top
level configfs attribute (Lee)
- Fix target session shutdown bug for ib_srpt multi-channel (hch)
- Make TFO close_session() and shutdown_session() optional (hch)
- Drop se_sess->sess_kref + convert tcm_qla2xxx to internal kref
(hch)
- Add tcm_qla2xxx endpoint attribute for basic FC jammer (Laurence)
- Refactor iscsi-target RX/TX PDU encode/decode into common code
(Varun)
- Extend iscsit_transport with xmit_pdu, release_cmd, get_rx_pdu,
validate_parameters, and get_r2t_ttt for generic ISO offload
(Varun)
- Initial merge of cxgb iscsi-segment offload target driver (Varun)
The bulk of the changes are Chelsio's new driver, along with a number
of iscsi-target common code improvements made by Varun + Co along the
way"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (29 commits)
iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
cxgbit: Use type ISCSI_CXGBIT + cxgbit tpg_np attribute
iscsi-target: Convert transport drivers to signal rdma_shutdown
iscsi-target: Make iscsi_tpg_np driver show/store use generic code
tcm_qla2xxx Add SCSI command jammer/discard capability
iscsi-target: graceful disconnect on invalid mapping to iovec
target: need_to_release is always false, remove redundant check and kfree
target: remove sess_kref and ->shutdown_session
iscsi-target: remove usage of ->shutdown_session
tcm_qla2xxx: introduce a private sess_kref
target: make close_session optional
target: make ->shutdown_session optional
target: remove acl_stop
target: consolidate and fix session shutdown
cxgbit: add files for cxgbit.ko
iscsi-target: export symbols
iscsi-target: call complete on conn_logout_comp
iscsi-target: clear tx_thread_active
iscsi-target: add new offload transport type
iscsi-target: use conn_transport->transport_type in text rsp
...
Pull more rdma updates from Doug Ledford:
"This is the second group of code for the 4.7 merge window. It looks
large, but only in one sense. I'll get to that in a minute. The list
of changes here breaks down as follows:
- Dynamic counter infrastructure in the IB drivers
This is a sysfs based code to allow free form access to the
hardware counters RDMA devices might support so drivers don't need
to code this up repeatedly themselves
- SendOnlyFullMember multicast support
- IB router support
- A couple misc fixes
- The big item on the list: hfi1 driver updates, plus moving the hfi1
driver out of staging
There was a group of 15 patches in the hfi1 list that I thought I had
in the first pull request but they weren't. So that added to the
length of the hfi1 section here.
As far as these go, everything but the hfi1 is pretty straight
forward.
The hfi1 is, if you recall, the driver that Al had complaints about
how it used the write/writev interfaces in an overloaded fashion. The
write portion of their interface behaved like the write handler in the
IB stack proper and did bi-directional communications. The writev
interface, on the other hand, only accepts SDMA request structures.
The completions for those structures are sent back via an entirely
different event mechanism.
With the security patch, we put security checks on the write
interface, however, we also knew they would be going away soon. Now,
we've converted the write handler in the hfi1 driver to use ioctls
from the IB reserved magic area for its bidirectional communications.
With that change, Intel has addressed all of the items originally on
their TODO when they went into staging (as well as many items added to
the list later).
As such, I moved them out, and since they were the last item in the
staging/rdma directory, and I don't have immediate plans to use the
staging area again, I removed the staging/rdma area.
Because of the move out of staging, as well as a series of 5 patches
in the hfi1 driver that removed code people thought should be done in
a different way and was optional to begin with (a snoop debug
interface, an eeprom driver for an eeprom connected directory to their
hfi1 chip and not via an i2c bus, and a few other things like that),
the line count, especially the removal count, is high"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (56 commits)
staging/rdma: Remove the entire rdma subdirectory of staging
IB/core: Make device counter infrastructure dynamic
IB/hfi1: Fix pio map initialization
IB/hfi1: Correct 8051 link parameter settings
IB/hfi1: Update pkey table properly after link down or FM start
IB/rdamvt: Fix rdmavt s_ack_queue sizing
IB/rdmavt: Max atomic value should be a u8
IB/hfi1: Fix hard lockup due to not using save/restore spin lock
IB/hfi1: Add tracing support for send with invalidate opcode
IB/hfi1, qib: Add ieth to the packet header definitions
IB/hfi1: Move driver out of staging
IB/hfi1: Do not free hfi1 cdev parent structure early
IB/hfi1: Add trace message in user IOCTL handling
IB/hfi1: Remove write(), use ioctl() for user cmds
IB/hfi1: Add ioctl() interface for user commands
IB/hfi1: Remove unused user command
IB/hfi1: Remove snoop/diag interface
IB/hfi1: Remove EPROM functionality from data device
IB/hfi1: Remove UI char device
IB/hfi1: Remove multiple device cdev
...
In practice, each RDMA device has a unique set of counters that the
hardware implements. Having a central set of counters that they must
all adhere to is limiting and causes many useful counters to not be
available.
Therefore we create a dynamic counter registration infrastructure.
The driver must implement a stats structure allocation routine, in
which the driver must place the directory name it wants, a list of
names for all of the counters, an array of u64 counters themselves,
plus a few generic configuration options.
We then implement a core routine to create a sysfs file for each
of the named stats elements, and a core routine to retrieve the
stats when any of the sysfs attribute files are read.
To avoid excessive beating on the stats generation routine in the
drivers, the core code also caches the stats for a short period of
time so that someone attempting to read all of the stats in a
given device's directory will not result in a stats generation
call per file read.
Future work will attempt to standardize just the shared stats
elements, and possibly add a method to get the stats via netlink
in addition to sysfs.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[ Add caching, make structure names more informative, add i40iw support,
other significant rewrites from the original patch ]
The pio map initialization function is off by 1 causing the last
kernel send context that is allocated to not get mapped into the
pio map which leads to the last kernel send context not being used
by any of the qps.
The send context reserved for VL15 is taken care of by setting the
scontext variable that is used as the index into the kernel send
context array to 1 and does not need to be accounted for in the
kernel send context counting loop as it is currently done.
Fix the kernel send context counting loop to account for all the
allocated send contexts and map all of them to the different VLs.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Two 8051 link settings, external device config and tuning method,
were written in the wrong location and the previous settings were
not cleared. For both, clear the old value and write the new
value.
Fixes: 8ebd4cf185 ("staging/rdma/hfi1: Add active and optical cable support")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When FM is disabled, and the HFI port on the switch is
changed from MgmtAllowed=YES to MgmtAllowed=NO and the
link is bounced, FULL_MGMT_P_KEY doesn't get cleared
from the pkey table. This also occurs when the QSFP
cable is moved from a switch port with MgmtAllowed=YES
to a MgmtAllowed=NO port. Clear pkey entry properly.
Also, when the driver is loaded and the switch port is
set to MgmtAllowed=NO, FULL_MGMT_P_KEY shouldn't be added
to pkey table after FM is started. Only set FULL_MGMT_P_KEY
in the pkey table if switch port is configured to
MgmtAllowed=YES.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
rdmavt allows the driver to specify the size of the ack queue, but
only uses it for the modify QP limit testing for setting the atomic
limit value.
The driver dependent size is now used to size the s_ack_queue ring
dynamicially.
Since the driver knows its size, the driver will use its define
for any ring size dependent code.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Commit b9b06cb6fe
("IB/hfi1: Fix missing lock/unlock in verbs drain callback")
added a spin lock.
Unfortunately, the new lock code can be called from a base
level interrupt state, and an interrupt that can get stacked
will attempt to get the same lock.
Fix by using the flag save/restore spin lock variation.
Cc: stable@vger.kernel.org # 4.6+
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A new union member "ieth" (Invalidate Extended Transport Header) is
added to the packet header definition in preparation of supporting
the send with invalidate opcode.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The TODO list for the hfi1 driver was completed during 4.6. In addition
other objections raised (which are far beyond what was in the TODO list)
have been addressed as well. It is now time to remove the driver from
staging and into the drivers/infiniband sub-tree.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Building the qib driver with gcc version 6.1.0 raises the following
build warning:
drivers/infiniband/hw/qib/qib_iba7322.c:1311:39: warning:
'qib_7322_intr_msgs' defined but not used [-Wunused-const-variable=]
static const struct qib_hwerror_msgs qib_7322_intr_msgs[] = {
^~~~~~~~~~~~~~~~~~
Remove the unused qib_7322_intr_msgs[]
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use kzalloc_node instead of kzalloc for rdmavt memory region segment
allocation to optimize for performance on NUMA platforms.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In IB networks, and specifically in IPoIB/rdmacm traffic, the device
address of an IPoIB interface is used as a means to exchange information
between nodes needed for communication.
Currently an IPoIB interface will always be created with a device
address based on its node GUID without a way to change that.
This change adds the ability to set the device address of an IPoIB
interface by value. We use the set mac address ndo to do that.
The flow should be broken down to two:
1) The GID value is already in the GID table,
in this case the interface will be able to set carrier up.
2) The GID value is not yet in the GID table,
in this case the interface won't try to join the multicast group
and will wait (listen on GID_CHANGE event) until the GID is inserted.
In order to track those changes, we add a new flag:
* IPOIB_FLAG_DEV_ADDR_SET.
When set, it means the dev_addr is a based on a value in the gid
table. this bit will be cleared upon a dev_addr change triggered
by the user and set after validation.
Per IB spec the port GUID can't change if the module is loaded.
port GUID is the basis for GID at index 0 which is the basis for
the default device address of a ipoib interface.
The issue is that there are devices that don't follow the spec,
they change the port GUID while HCA is powered on, so in order
not to break userspace applications. We need to check if the
user wanted to control the device address and we assume that
if he sets the device address back to be based on GID index 0,
he no longer wishs to control it.
In order to track this, we add an additional flag:
* IPOIB_FLAG_DEV_ADDR_CTRL
When setting the device address, there is no validation of the upper
twelve bytes of the device address (flags, qpn, subnet prefix) as those
bytes are not under the control of the user.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Check (via an SA query) if the SM supports the new option for SendOnly
multicast joins.
If the SM supports that option it will use the new join state to create
such multicast group.
If SendOnlyFullMember is supported, we wouldn't use faked FullMember state
join for SendOnly MCG, use the correct state if supported.
This check is performed at every invocation of mcast_restart task, to be
sure that the driver stays in sync with the current state of the SM.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There are four types for MCG, FullMember, NonMember, SendOnlyNonMember,
and the new added type: SendOnlyFullMember.
Add support for the new SendOnlyFullMember join state.
The new type allows host to send join request as sendonly, it will cause the
group to be created but without getting packets from this multicast back to the
host.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
New SA query function to return the ClassPortInfo struct from the SA.
If the SM supports FullMemberSendOnly mode for MCG's, it sets a
capability bit in the capability_mask2 field of the response.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Change struct ib_class_port_info to conform to IB Spec 1.3
That in order to get specific capability mask from ClassPortInfo mad.
>From the IB Spec, ClassPortInfo section:
"CapabilityMask2 Bits 0-26: Additional class-specific capabilities...
RespTimeValue the rest 5 bits"
The new struct now has one field for capabilitymask2 (previously was the
reserved field) and the resp_time field.
And it fixes up qib and srpt, use of the field repurposed to be used as
capabilitymask2:
IB/qib: Change pma_get_classportinfo
IB/srpt: Adjust the use of ib_class_port_info
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is an assumption that rdmacm is used only between nodes
in the same IB subnet, this why ARP resolution can be used to turn
IP to GID in rdmacm.
When dealing with IB communication between subnets this assumption
is no longer valid. ARP resolution will get us the next hop device
address and not the peer node's device address.
To solve this issue, we will check user space if it can provide the
GID of the peer node, and fail if not.
We add a sequence number to identify each request and fill in the GID
upon answer from userspace.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Move SA ibnl client registration to ib_core module init.
This will allow us to register a single client to handle
all RDMA_NL_LS operations and make it SA independent.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Consolidate ib_sa into ib_core, this commit eliminates
ib_sa.ko and makes it part of ib_core.ko
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Consolidate ib_mad into ib_core, this commit eliminates
ib_mad.ko and makes it part of ib_core.ko
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>