SCSI updates from James Bottomley:
"The update includes the usual assortment of driver updates (lpfc,
qla2xxx, qla4xxx, bfa, bnx2fc, bnx2i, isci, fcoe, hpsa) plus a huge
amount of infrastructure work in the SAS library and transport class
as well as an iSCSI update. There's also a new SCSI based virtio
driver."
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (177 commits)
[SCSI] qla4xxx: Update driver version to 5.02.00-k15
[SCSI] qla4xxx: trivial cleanup
[SCSI] qla4xxx: Fix sparse warning
[SCSI] qla4xxx: Add support for multiple session per host.
[SCSI] qla4xxx: Export CHAP index as sysfs attribute
[SCSI] scsi_transport: Export CHAP index as sysfs attribute
[SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry
[SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry
[SCSI] pm8001: fix endian issue with code optimization.
[SCSI] pm8001: Fix possible racing condition.
[SCSI] pm8001: Fix bogus interrupt state flag issue.
[SCSI] ipr: update PCI ID definitions for new adapters
[SCSI] qla2xxx: handle default case in qla2x00_request_firmware()
[SCSI] isci: improvements in driver unloading routine
[SCSI] isci: improve phy event warnings
[SCSI] isci: debug, provide state-enum-to-string conversions
[SCSI] scsi_transport_sas: 'enable' phys on reset
[SCSI] libsas: don't recover end devices attached to disabled phys
[SCSI] libsas: fixup target_port_protocols for expanders that don't report sata
[SCSI] libsas: set attached device type and target protocols for local phys
...
The error reported up the stack for a discard failure did not clearly
indicate that the command was processed and subsequently failed by the
target device.
Return -EREMOTEIO so multipathing does not classify this condition as a
path failure.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This patch fixes the host byte settings DID_TARGET_FAILURE and
DID_NEXUS_FAILURE. The function __scsi_error_from_host_byte, tries to reset
the host byte to DID_OK. But that does not happen because of the OR operation.
Here is the flow.
scsi_softirq_done-> scsi_decide_disposition -> __scsi_error_from_host_byte
Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition,
result will be set as DID_NEXUS_FAILURE (=0x11). Then in
__scsi_error_from_host_byte, when we do OR with DID_OK. Purpose is to reset
it back to DID_OK. But that does not happen. This patch fixes this issue.
Signed-off-by: Babu Moger <babu.moger@netapp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The sdev is deleted from starved list and then try to dispatch from this
device. It's quite possible the sdev can't eventually dispatch a request,
then the sdev will be in starved list tail. This isn't fair.
There are two cases here:
1. unplug path. scsi_request_fn() calls to scsi_target_queue_ready(), then
the dev is removed from starved list, but quite possible host queue isn't
ready, the dev is moved to starved list without dispatching any request.
2. scsi_run_queue path. It deletes the dev from starved list first (both
global and local starved lists), then handles the dev. Then we could have
the same process like case 1.
This patch fixes the first case. Case 2 isn't fixed, because there is a
rare case scsi_run_queue finds host isn't busy but scsi_request_fn finds
host is busy (other CPU is faster to get host queue depth). Not deleting
the dev from starved list in scsi_run_queue will keep scsi_run_queue
looping (though this is very rare case, because host will become busy).
Fortunately fixing case 1 already gives big improvement for starvation in
my test. In a 12 disk JBOD setup, running file creation under EXT4, this
gives 12% more throughput.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
When we tear down a device we try to flush all outstanding
commands in scsi_free_queue(). However the check in
scsi_request_fn() is imperfect as it only signals that
we _might start_ aborting commands, not that we've actually
aborted some.
So move the printk inside the scsi_kill_request function,
this will also give us a hint about which commands are aborted.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
For the basic SCSI infrastructure files that are exporting symbols
but not modules themselves, add in the basic export.h header file
to allow the exports.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
During cable pull tests on our 16G FC adapter, we are seeing errors,
typically reads to close targets, which fail due to CRC or framing
errors caused by the cable being pull (return status DID_ERROR).
The adapter detects the error on one of the first frames received,
marks the FC exchange as dead (further frames go to bit bucket) and
signals the host of the error. This action is so quick, and coupled
with fast host CPUs, creates a scenario in which the midlayer sees
the failure and retries the io almost immediately. We've seen link
traces with the retry on the link while the original i/o is still
being processed by the target. We're also seeing the time window
for the "link to pull-apart" and the physical interface to report
disconnected to be in the few millisecond range. Which means, we're
encountering scenarios where the full retry count is exhausted
(all with error) by the midlayer before the link disconnect state
is detected.
We looked at 8G FC behavior and occasionally see the same behavior,
but as the link was slower, it rarely could exhaust all retries
before the link reported disconnect.
What is needed is a slight delay between io retries due to DID_ERROR
to cover this error. It is inappropriate to put this delay in the
driver, as the error is indistinguishable from other link-related errors,
nor does the driver track whether the io is a retry or not. This is also
easier than tracking between-io-error bursts that are seen in this
scenario.
The patch below updates the retry path so that it inserts a delay as
if the target was busy. The busy delay is on the order of 6ms. This
delay is sufficient to ensure the link down condition is reported
before the retry count is exhausted (at most 1 retry is seen).
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
USB surprise removal of sr is triggering an oops in
scsi_dispatch_command(). What seems to be happening is that USB is
hanging on to a queue reference until the last close of the upper
device, so the crash is caused by surprise remove of a mounted CD
followed by attempted unmount.
The problem is that USB doesn't issue its final commands as part of
the SCSI teardown path, but on last close when the block queue is long
gone. The long term fix is probably to make sr do the teardown in the
same way as sd (so remove all the lower bits on ejection, but keep the
upper disk alive until last close of user space). However, the
current oops can be simply fixed by not allowing any commands to be
sent to a dead queue.
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: don't delay blk_run_queue_async
scsi: remove performance regression due to async queue run
blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
block: rescan partitions on invalidated devices on -ENOMEDIA too
cdrom: always check_disk_change() on open
block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers
Commit c21e6beb removed our queue request_fn re-enter
protection, and defaulted to always running the queues from
kblockd to be safe. This was a known potential slow down,
but should be safe.
Unfortunately this is causing big performance regressions for
some, so we need to improve this logic. Looking into the details
of the re-enter, the real issue is on requeue of requests.
Requeue of requests upon seeing a BUSY condition from the device
ends up re-running the queue, causing traces like this:
scsi_request_fn()
scsi_dispatch_cmd()
scsi_queue_insert()
__scsi_queue_insert()
scsi_run_queue()
scsi_request_fn()
...
potentially causing the issue we want to avoid. So special
case the requeue re-run of the queue, but improve it to offload
the entire run of local queue and starved queue from a single
workqueue callback. This is a lot better than potentially
kicking off a workqueue run for each device seen.
This also fixes the issue of the local device going into recursion,
since the above mentioned commit never moved that queue run out
of line.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The recent commit closing the race window in device teardown:
commit 86cbfb5607
Author: James Bottomley <James.Bottomley@suse.de>
Date: Fri Apr 22 10:39:59 2011 -0500
[SCSI] put stricter guards on queue dead checks
is causing a potential NULL deref in scsi_run_queue() because the
q->queuedata may already be NULL by the time this function is called.
Since we shouldn't be running a queue that is being torn down, simply
add a NULL check in scsi_run_queue() to forestall this.
Tested-by: Jim Schutt <jaschut@sandia.gov>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
We are currently using this flag to check whether it's safe
to call into ->request_fn(). If it is set, we punt to kblockd.
But we get a lot of false positives and excessive punts to
kblockd, which hurts performance.
The only real abuser of this infrastructure is SCSI. So export
the async queue run and convert SCSI over to use that. There's
room for improvement in that SCSI need not always use the async
call, but this fixes our performance issue and they can fix that
up in due time.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly. I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...
Fix up conflicts in fs/{aio.c,super.c}
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits)
[SCSI] scsi_dh_rdac: Add MD36xxf into device list
[SCSI] scsi_debug: add consecutive medium errors
[SCSI] libsas: fix ata list corruption issue
[SCSI] hpsa: export resettable host attribute
[SCSI] hpsa: move device attributes to avoid forward declarations
[SCSI] scsi_debug: Logical Block Provisioning (SBC3r26)
[SCSI] sd: Logical Block Provisioning update
[SCSI] Include protection operation in SCSI command trace
[SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try)
[SCSI] target: Fix volume size misreporting for volumes > 2TB
[SCSI] bnx2fc: Broadcom FCoE offload driver
[SCSI] fcoe: fix broken fcoe interface reset
[SCSI] fcoe: precedence bug in fcoe_filter_frames()
[SCSI] libfcoe: Remove stale fcoe-netdev entries
[SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h
[SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument
[SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs
[SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out"
[SCSI] libfc: Fixing a memory leak when destroying an interface
[SCSI] megaraid_sas: Version and Changelog update
...
Fix up trivial conflicts due to whitespace differences in
drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}
SBC3r26 contains many changes to the Logical Block Provisioning
interfaces (formerly known as Thin Provisioning ditto). This patch
implements support for both the old and new schemes using the same
heuristic as before (whether the LBP VPD page is present).
The new code also allows the provisioning mode (i.e. choice of command)
to be overridden on a per-device basis via sysfs. Two additional modes
are supported in this version:
- WRITE SAME(10) with the UNMAP bit set
- WRITE SAME(10) without the UNMAP bit set. This allows us to support
devices that predate the TP/LBP enhancements in SBC3 and which work
by way zero-detection
Switching between modes has been consolidated in a helper function that
also updates the block layer topology according to the limitations of
the chosen command.
I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10)
if WRITE SAME(16) fails, etc. but found several devices that got
cranky. So for now we'll disable discard if one of the commands
fail. The user still has the option of selecting a different mode in
sysfs.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When debugging DIF/DIX it is very helpful to be able to see which DIX
operation is associated with the scsi_cmnd. Include the protection op in
the SCSI command trace.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
It was always abuse to reuse the plugging infrastructure for this,
convert it to the (new) real API for delaying queueing a bit. A
default delay of 3 msec is defined, to match the previous
behaviour.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
__blk_run_queue() automatically either calls q->request_fn() directly
or schedules kblockd depending on whether the function is recursed.
blk-flush implementation needs to be able to explicitly choose
kblockd. Add @force_kblockd.
All the current users are converted to specify %false for the
parameter and this patch doesn't introduce any behavior change.
stable: This is prerequisite for fixing ide oops caused by the new
blk-flush implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Instead of just passing 'EIO' for any I/O error we should be
notifying the upper layers with more details about the cause
of this error.
Update the possible I/O errors to:
- ENOLINK: Link failure between host and target
- EIO: Retryable I/O error
- EREMOTEIO: Non-retryable I/O error
- EBADE: I/O error restricted to the I_T_L nexus
'Retryable' in this context means that an I/O error _might_ be
restricted to the I_T_L nexus (vulgo: path), so retrying on another
nexus / path might succeed.
'Non-retryable' in general refers to a target failure, so this
error will always be generated regardless of the I_T_L nexus
it was send on.
I/O errors restricted to the I_T_L nexus might be retried
on another nexus / path, but they should _not_ be queued
if no paths are available.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
block: ensure that completion error gets properly traced
blktrace: add missing probe argument to block_bio_complete
block cfq: don't use atomic_t for cfq_group
block cfq: don't use atomic_t for cfq_queue
block: trace event block fix unassigned field
block: add internal hd part table references
block: fix accounting bug on cross partition merges
kref: add kref_test_and_get
bio-integrity: mark kintegrityd_wq highpri and CPU intensive
block: make kblockd_workqueue smarter
Revert "sd: implement sd_check_events()"
block: Clean up exit_io_context() source code.
Fix compile warnings due to missing removal of a 'ret' variable
fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
cfq-iosched: don't check cfqg in choose_service_tree()
fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
cdrom: export cdrom_check_events()
sd: implement sd_check_events()
sr: implement sr_check_events()
...