Commit Graph

2822 Commits

Author SHA1 Message Date
Linus Torvalds
f66c83d059 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull first round of SCSI updates from James Bottomley:
 "This patch set is a set of driver updates (ufs, zfcp, lpfc, mpt2/3sas,
  qla4xxx, qla2xxx [adding support for ISP8044 + other things]).

  We also have a new driver: esas2r which has a number of static checker
  problems, but which I expect to resolve over the -rc course of 3.12
  under the new driver exception.

  We also have the error return that were discussed at LSF"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (118 commits)
  [SCSI] sg: push file descriptor list locking down to per-device locking
  [SCSI] sg: checking sdp->detached isn't protected when open
  [SCSI] sg: no need sg_open_exclusive_lock
  [SCSI] sg: use rwsem to solve race during exclusive open
  [SCSI] scsi_debug: fix logical block provisioning support when unmap_alignment != 0
  [SCSI] scsi_debug: fix endianness bug in sdebug_build_parts()
  [SCSI] qla2xxx: Update the driver version to 8.06.00.08-k.
  [SCSI] qla2xxx: print MAC via %pMR.
  [SCSI] qla2xxx: Correction to message ids.
  [SCSI] qla2xxx: Correctly print out/in mailbox registers.
  [SCSI] qla2xxx: Add a new interface to update versions.
  [SCSI] qla2xxx: Move queue depth ramp down message to i/o debug level.
  [SCSI] qla2xxx: Select link initialization option bits from current operating mode.
  [SCSI] qla2xxx: Add loopback IDC-TIME-EXTEND aen handling support.
  [SCSI] qla2xxx: Set default critical temperature value in cases when ISPFX00 firmware doesn't provide it
  [SCSI] qla2xxx: QLAFX00 make over temperature AEN handling informational, add log for normal temperature AEN
  [SCSI] qla2xxx: Correct Interrupt Register offset for ISPFX00
  [SCSI] qla2xxx: Remove handling of Shutdown Requested AEN from qlafx00_process_aen().
  [SCSI] qla2xxx: Send all AENs for ISPFx00 to above layers.
  [SCSI] qla2xxx: Add changes in initialization for ISPFX00 cards with BIOS
  ...
2013-09-03 15:48:06 -07:00
Hannes Reinecke
7e782af576 [SCSI] Return ENODATA on medium error
When a medium error is detected the SCSI stack should return
ENODATA to the upper layers.

[jejb: fix whitespace error]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-23 12:54:53 -04:00
Geert Uytterhoeven
b936bf8b78 dm cache: avoid conflicting remove_mapping() in mq policy
On sparc32, which includes <linux/swap.h> from <asm/pgtable_32.h>:

drivers/md/dm-cache-policy-mq.c:962:13: error: conflicting types for 'remove_mapping'
include/linux/swap.h:285:12: note: previous declaration of 'remove_mapping' was here

As mq_remove_mapping() already exists, and the local remove_mapping() is
used only once, inline it manually to avoid the conflict.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair Kergon <agk@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
2013-08-16 15:56:51 -04:00
Linus Torvalds
c271f5bc9a Merge tag 'md/3.11-fixes' of git://neil.brown.name/md
Pull md fixes from Neil Brown:
 "Two more bugfixes for md in 3.11

  Both marked for -stable, both since 3.3.  I guess I should spend more
  time testing..."

* tag 'md/3.11-fixes' of git://neil.brown.name/md:
  md/raid5: fix interaction of 'replace' and 'recovery'.
  md/raid10: remove use-after-free bug.
2013-07-26 11:20:10 -07:00
NeilBrown
f94c0b6658 md/raid5: fix interaction of 'replace' and 'recovery'.
If a device in a RAID4/5/6 is being replaced while another is being
recovered, then the writes to the replacement device currently don't
happen, resulting in corruption when the replacement completes and the
new drive takes over.

This is because the replacement writes are only triggered when
's.replacing' is set and not when the similar 's.sync' is set (which
is the case during resync and recovery - it means all devices need to
be read).

So schedule those writes when s.replacing is set as well.

In this case we cannot use "STRIPE_INSYNC" to record that the
replacement has happened as that is needed for recording that any
parity calculation is complete.  So introduce STRIPE_REPLACED to
record if the replacement has happened.

For safety we should also check that STRIPE_COMPUTE_RUN is not set.
This has a similar effect to the "s.locked == 0" test.  The latter
ensure that now IO has been flagged but not started.  The former
checks if any parity calculation has been flagged by not started.
We must wait for both of these to complete before triggering the
'replace'.

Add a similar test to the subsequent check for "are we finished yet".
This possibly isn't needed (is subsumed in the STRIPE_INSYNC test),
but it makes it more obvious that the REPLACE will happen before we
think we are finished.

Finally if a NeedReplace device is not UPTODATE then that is an
error.  We really must trigger a warning.

This bug was introduced in commit 9a3e1101b8
(md/raid5:  detect and handle replacements during recovery.)
which introduced replacement for raid5.
That was in 3.3-rc3, so any stable kernel since then would benefit
from this fix.

Cc: stable@vger.kernel.org (3.3+)
Reported-by: qindehua <13691222965@163.com>
Tested-by: qindehua <qindehua@163.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-25 16:46:57 +10:00
NeilBrown
0eb25bb027 md/raid10: remove use-after-free bug.
We always need to be careful when calling generic_make_request, as it
can start a chain of events which might free something that we are
using.

Here is one place I wasn't careful enough.  If the wbio2 is not in
use, then it might get freed at the first generic_make_request call.
So perform all necessary tests first.

This bug was introduced in 3.3-rc3 (24afd80d99) and can cause an
oops, so fix is suitable for any -stable since then.

Cc: stable@vger.kernel.org (3.3+)
Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-25 16:46:53 +10:00
Linus Torvalds
d4c90b1b9f Merge branch 'for-3.11/drivers' of git://git.kernel.dk/linux-block
Pull block IO driver bits from Jens Axboe:
 "As I mentioned in the core block pull request, due to real life
  circumstances the driver pull request would be late.  Now it looks
  like -rc2 late...  On the plus side, apart form the rsxx update, these
  are all things that I could argue could go in later in the cycle as
  they are fixes and not features.  So even though things are late, it's
  not ALL bad.

  The pull request contains:

   - Updates to bcache, all bug fixes, from Kent.

   - A pile of drbd bug fixes (no big features this time!).

   - xen blk front/back fixes.

   - rsxx driver updates, some of them deferred form 3.10.  So should be
     well cooked by now"

* 'for-3.11/drivers' of git://git.kernel.dk/linux-block: (63 commits)
  bcache: Allocation kthread fixes
  bcache: Fix GC_SECTORS_USED() calculation
  bcache: Journal replay fix
  bcache: Shutdown fix
  bcache: Fix a sysfs splat on shutdown
  bcache: Advertise that flushes are supported
  bcache: check for allocation failures
  bcache: Fix a dumb race
  bcache: Use standard utility code
  bcache: Update email address
  bcache: Delete fuzz tester
  bcache: Document shrinker reserve better
  bcache: FUA fixes
  drbd: Allow online change of al-stripes and al-stripe-size
  drbd: Constants should be UPPERCASE
  drbd: Ignore the exit code of a fence-peer handler if it returns too late
  drbd: Fix rcu_read_lock balance on error path
  drbd: fix error return code in drbd_init()
  drbd: Do not sleep inside rcu
  bcache: Refresh usage docs
  ...
2013-07-22 19:02:52 -07:00
NeilBrown
30bc9b5387 md/raid1: fix bio handling problems in process_checks()
Recent change to use bio_copy_data() in raid1 when repairing
an array is faulty.

The underlying may have changed the bio in various ways using
bio_advance and these need to be undone not just for the 'sbio' which
is being copied to, but also the 'pbio' (primary) which is being
copied from.

So perform the reset on all bios that were read from and do it early.

This also ensure that the sbio->bi_io_vec[j].bv_len passed to
memcmp is correct.

This fixes a crash during a 'check' of a RAID1 array.  The crash was
introduced in 3.10 so this is suitable for 3.10-stable.

Cc: stable@vger.kernel.org (3.10)
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-18 14:18:04 +10:00
NeilBrown
5024c29831 md: Remove recent change which allows devices to skip recovery.
commit 7ceb17e87b
    md: Allow devices to be re-added to a read-only array.

allowed a bit more than just that.  It also allows devices to be added
to a read-write array and to end up skipping recovery.

This patch removes the offending piece of code pending a rewrite for a
subsequent release.

More specifically:
 If the array has a bitmap, then the device will still need a bitmap
 based resync ('saved_raid_disk' is set under different conditions
 is a bitmap is present).
 If the array doesn't have a bitmap, then this is correct as long as
 nothing has been written to the array since the metadata was checked
 by ->validate_super.  However there is no locking to ensure that there
 was no write.

Bug was introduced in 3.10 and causes data corruption so
patch is suitable for 3.10-stable.

Cc: stable@vger.kernel.org (3.10)
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-18 14:18:03 +10:00
NeilBrown
7bb23c4934 md/raid10: fix two problems with RAID10 resync.
1/ When an different between blocks is found, data is copied from
   one bio to the other.  However bv_len is used as the length to
   copy and this could be zero.  So use r10_bio->sectors to calculate
   length instead.
   Using bv_len was probably always a bit dubious, but the introduction
   of bio_advance made it much more likely to be a problem.

2/ When preparing some blocks for sync, we don't set BIO_UPTODATE
   except on bios that we schedule for a read.  This ensures that
   missing/failed devices don't confuse the loop at the top of
   sync_request write.
   Commit 8be185f2c9 "raid10: Use bio_reset()"
   removed a loop which set BIO_UPTDATE on all appropriate bios.
   So we need to re-add that flag.

These bugs were introduced in 3.10, so this patch is suitable for
3.10-stable, and can remove a potential for data corruption.

Cc: stable@vger.kernel.org (3.10)
Reported-by: Brassow Jonathan <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-18 14:18:01 +10:00
Kent Overstreet
79826c35eb bcache: Allocation kthread fixes
The alloc kthread should've been using try_to_freeze() - and also there
was the potential for the alloc kthread to get woken up after it had
shut down, which would have been bad.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
2013-07-12 00:22:49 -07:00
Kent Overstreet
29ebf465b9 bcache: Fix GC_SECTORS_USED() calculation
Part of the job of garbage collection is to add up however many sectors
of live data it finds in each bucket, but that doesn't work very well if
it doesn't reset GC_SECTORS_USED() when it starts. Whoops.

This wouldn't have broken anything horribly, but allocation tries to
preferentially reclaim buckets that are mostly empty and that's not
gonna work with an incorrect GC_SECTORS_USED() value.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12 00:22:48 -07:00
Kent Overstreet
faa5673617 bcache: Journal replay fix
The journal replay code starts by finding something that looks like a
valid journal entry, then it does a binary search over the unchecked
region of the journal for the journal entries with the highest sequence
numbers.

Trouble is, the logic was wrong - journal_read_bucket() returns true if
it found journal entries we need, but if the range of journal entries
we're looking for loops around the end of the journal - in that case
journal_read_bucket() could return true when it hadn't found the highest
sequence number we'd seen yet, and in that case the binary search did
the wrong thing. Whoops.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12 00:22:48 -07:00
Kent Overstreet
5caa52afc5 bcache: Shutdown fix
Stopping a cache set is supposed to make it stop attached backing
devices, but somewhere along the way that code got lost. Fixing this
mainly has the effect of fixing our reboot notifier.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12 00:22:47 -07:00
Kent Overstreet
c9502ea442 bcache: Fix a sysfs splat on shutdown
If we stopped a bcache device when we were already detaching (or
something like that), bcache_device_unlink() would try to remove a
symlink from sysfs that was already gone because the bcache dev kobject
had already been removed from sysfs.

So keep track of whether we've removed stuff from sysfs.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12 00:22:47 -07:00
Kent Overstreet
54d12f2b4f bcache: Advertise that flushes are supported
Whoops - bcache's flush/FUA was mostly correct, but flushes get filtered
out unless we say we support them...

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12 00:22:46 -07:00
Dan Carpenter
d2a65ce2ac bcache: check for allocation failures
There is a missing NULL check after the kzalloc().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2013-07-12 00:22:46 -07:00
Kent Overstreet
6aa8f1a6ca bcache: Fix a dumb race
In the far-too-complicated closure code - closures can have destructors,
for probably dubious reasons; they get run after the closure is no
longer waiting on anything but before dropping the parent ref, intended
just for freeing whatever memory the closure is embedded in.

Trouble is, when remaining goes to 0 and we've got nothing more to run -
we also have to unlock the closure, setting remaining to -1. If there's
a destructor, that unlock isn't doing anything - nobody could be trying
to lock it if we're about to free it - but if the unlock _is needed...
that check for a destructor was racy. Argh.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
2013-07-12 00:22:33 -07:00
Linus Torvalds
9903883f1d Merge tag 'dm-3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull device-mapper changes from Alasdair G Kergon:
 "Add a device-mapper target called dm-switch to provide a multipath
  framework for storage arrays that dynamically reconfigure their
  preferred paths for different device regions.

  Fix a bug in the verity target that prevented its use with some
  specific sizes of devices.

  Improve some locking mechanisms in the device-mapper core and bufio.

  Add Mike Snitzer as a device-mapper maintainer.

  A few more clean-ups and fixes"

* tag 'dm-3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm: add switch target
  dm: update maintainers
  dm: optimize reorder structure
  dm: optimize use SRCU and RCU
  dm bufio: submit writes outside lock
  dm cache: fix arm link errors with inline
  dm verity: use __ffs and __fls
  dm flakey: correct ctr alloc failure mesg
  dm verity: remove pointless comparison
  dm: use __GFP_HIGHMEM in __vmalloc
  dm verity: fix inability to use a few specific devices sizes
  dm ioctl: set noio flag to avoid __vmalloc deadlock
  dm mpath: fix ioctl deadlock when no paths
2013-07-11 13:05:40 -07:00
Jim Ramsay
9d0eb0ab43 dm: add switch target
dm-switch is a new target that maps IO to underlying block devices
efficiently when there is a large number of fixed-sized address regions
but there is no simple pattern to allow for a compact mapping
representation such as dm-stripe.

Though we have developed this target for a specific storage device, Dell
EqualLogic, we have made an effort to keep it as general purpose as
possible in the hope that others may benefit.

Originally developed by Jim Ramsay. Simplified by Mikulas Patocka.

Signed-off-by: Jim Ramsay <jim_ramsay@dell.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-07-10 23:41:19 +01:00
Mikulas Patocka
2a7faeb176 dm: optimize reorder structure
This reorder actually improves performance by 20% (from 39.1s to 32.8s)
on x86-64 quad core Opteron.

I have no explanation for this, possibly it makes some other entries are
better cache-aligned.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-07-10 23:41:18 +01:00
Mikulas Patocka
83d5e5b0af dm: optimize use SRCU and RCU
This patch removes "io_lock" and "map_lock" in struct mapped_device and
"holders" in struct dm_table and replaces these mechanisms with
sleepable-rcu.

Previously, the code would call "dm_get_live_table" and "dm_table_put" to
get and release table. Now, the code is changed to call "dm_get_live_table"
and "dm_put_live_table". dm_get_live_table locks sleepable-rcu and
dm_put_live_table unlocks it.

dm_get_live_table_fast/dm_put_live_table_fast can be used instead of
dm_get_live_table/dm_put_live_table. These *_fast functions use
non-sleepable RCU, so the caller must not block between them.

If the code changes active or inactive dm table, it must call
dm_sync_table before destroying the old table.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-07-10 23:41:18 +01:00
Mikulas Patocka
2480945cd4 dm bufio: submit writes outside lock
This patch changes dm-bufio so that it submits write I/Os outside of the
lock. If the number of submitted buffers is greater than the number of
requests on the target queue, submit_bio blocks. We want to block outside
of the lock to improve latency of other threads that may need the lock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-07-10 23:41:18 +01:00
Mikulas Patocka
43aeaa2957 dm cache: fix arm link errors with inline
Use __always_inline to avoid a link failure with gcc 4.6 on ARM.
gcc 4.7 is OK.

It creates a function block_div.part.8, it references __udivdi3 and
__umoddi3 and it is never called. The references to __udivdi3 and
__umoddi3 cause a link failure.

Reported-by: Rob Herring <robherring2@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-07-10 23:41:17 +01:00
Mikulas Patocka
553d8fe029 dm verity: use __ffs and __fls
This patch changes ffs() to __ffs() and fls() to __fls() which don't add
one to the result.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-07-10 23:41:17 +01:00