release_stripe still has big lock contention. We just add the stripe to a llist
without taking device_lock. We let the raid5d thread to do the real stripe
release, which must hold device_lock anyway. In this way, release_stripe
doesn't hold any locks.
The side effect is the released stripes order is changed. But sounds not a big
deal, stripes are never handled in order. And I thought block layer can already
do nice request merge, which means order isn't that important.
I kept the unplug release batch, which is unnecessary with this patch from lock
contention avoid point of view, and actually if we delete it, the stripe_head
release_list and lru can share storage. But the unplug release batch is also
helpful for request merge. We probably can delay wakeup raid5d till unplug, but
I'm still afraid of the case which raid5d is running.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When the last process closes /dev/mdX sync_blockdev will be called so
that all buffers get flushed.
So if it is then opened for the STOP_ARRAY ioctl to be sent there will
be nothing to flush.
However if we open /dev/mdX in order to send the STOP_ARRAY ioctl just
moments before some other process which was writing closes their file
descriptor, then there won't be a 'last close' and the buffers might
not get flushed.
So do_md_stop() calls sync_blockdev(). However at this point it is
holding ->reconfig_mutex. So if the array is currently 'clean' then
the writes from sync_blockdev() will not complete until the array
can be marked dirty and that won't happen until some other thread
can get ->reconfig_mutex. So we deadlock.
We need to move the sync_blockdev() call to before we take
->reconfig_mutex.
However then some other thread could open /dev/mdX and write to it
after we call sync_blockdev() and before we actually stop the array.
This can leave dirty data in the page cache which is awkward.
So introduce new flag MD_STILL_CLOSED. Set it before calling
sync_blockdev(), clear it if anyone does open the file, and abort the
STOP_ARRAY attempt if it gets set before we lock against further
opens.
It is still possible to get problems if you open /dev/mdX, write to
it, then issue the STOP_ARRAY ioctl. Just don't do that.
Signed-off-by: NeilBrown <neilb@suse.de>
mddev->flags is mostly used to record if an update of the
metadata is needed. Sometimes the whole field is tested
instead of just the important bits. This makes it difficult
to introduce more state bits.
So replace all bare tests of mddev->flags with tests for the bits
that actually need testing.
Signed-off-by: NeilBrown <neilb@suse.de>
Setting a variable to itself probably wasn't the intention here.
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: NeilBrown <neilb@suse.de>
This change adds TILE-Gx SIMD instructions to the software raid
(md), modeling the Altivec implementation. This is only for Syndrome
generation; there is more that could be done to improve recovery,
as in the recent Intel SSE3 recovery implementation.
The code unrolls 8 times; this turns out to be the best on tilegx
hardware among the set 1, 2, 4, 8 or 16. The code reads one
cache-line of data from each disk, stores P and Q then goes to the
next cache-line.
The test code in sys/linux/lib/raid6/test reports 2008 MB/s data
read rate for syndrome generation using 18 disks (16 data and 2
parity). It was 1512 MB/s before this SIMD optimizations. This is
running on 1 core with all the data in cache.
This is based on the paper The Mathematics of RAID-6.
(http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf).
Signed-off-by: Ken Steele <ken@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Whe we set the safe_mode_timeout to a smaller value we trigger a timeout
immediately - otherwise the small value might not be honoured.
However if the previous timeout was 0 meaning "no timeout", we didn't.
This would mean that no timeout happens until the next write completes,
which could be a long time.
Signed-off-by: NeilBrown <neilb@suse.de>
There is no really need as GFP_NOIO is very likely sufficient,
and failure is not catastrophic.
Calling md_allow_write here will convert a read-auto array to
read/write which could be confusing when you are just performing
a read operation.
Signed-off-by: NeilBrown <neilb@suse.de>
Pull staging fixes from Greg KH:
"Here are two tiny staging tree fixes (well, one is for an iio driver,
but those updates come through the staging tree due to dependancies)
One fixes a problem with an IIO driver, and the other fixes a bug in
the comedi driver core"
* tag 'staging-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: comedi: bug-fix NULL pointer dereference on failed attach
iio: adjd_s311: Fix non-scan mode data read
Pull USB fixes from Greg KH:
"Here are two USB fixes for 3.11-rc7
One fixes a reported regression in the OHCI driver, and the other
fixes a reported build breakage in the USB phy drivers"
* tag 'usb-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: phy: fix build breakage
USB: OHCI: add missing PCI PM callbacks to ohci-pci.c
Pull ARM fixes from Russell King:
"This round of fixes is smaller than previous: a couple more updates
for the security fixes, and a one-liner kexec fix"
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7816/1: CONFIG_KUSER_HELPERS: fix help text
ARM: 7815/1: kexec: offline non panic CPUs on Kdump panic
ARM: 7819/1: fiq: Cast the first argument of flush_icache_range()
Pull vfs fixes from Al Viro:
"Assorted fixes from the last week or so"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: collect_mounts() should return an ERR_PTR
bfs: iget_locked() doesn't return an ERR_PTR
efs: iget_locked() doesn't return an ERR_PTR()
proc: kill the extra proc_readfd_common()->dir_emit_dots()
cope with potentially long ->d_dname() output for shmem/hugetlb
Pull ACPI fix from Rafael Wysocki:
"I really hoped that it wouldn't be necessary to change anything in
ACPI at this point, but it turns out that we need to revert one more
ACPI video commit causing trouble.
This reverts a change in the ACPI video driver that caused the ACPI
backlight initialization to be carried out even if acpi_backlight=vendor
is passed in the kernel command line which turns out to break things
at least on one system"
* tag 'acpi-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI / video: Always call acpi_video_init_brightness() on init"
Pull SCSI fixes from James Bottomley:
"This is a set of small bug fixes for lpfc and zfcp and a fix for a
fairly nasty bug in sg where a process which cancels I/O completes in
a kernel thread which would then try to write back to the now gone
userspace and end up writing to a random kernel address instead"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] zfcp: remove access control tables interface (keep sysfs files)
[SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops
[SCSI] zfcp: fix lock imbalance by reworking request queue locking
[SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
[SCSI] lpfc: Don't force CONFIG_GENERIC_CSUM on
This should actually be returning an ERR_PTR on error instead of NULL.
That was how it was designed and all the callers expect it.
[AV: actually, that's what "VFS: Make clone_mnt()/copy_tree()/collect_mounts()
return errors" missed - originally collect_mounts() was expected to return
NULL on failure]
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
proc_readfd_common() does dir_emit_dots() twice in a row,
we need to do this only once.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
dynamic_dname() is both too much and too little for those - the
output may be well in excess of 64 bytes dynamic_dname() assumes
to be enough (thanks to ashmem feeding really long names to
shmem_file_setup()) and vsnprintf() is an overkill for those
guys.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull libata fixes from Tejun Heo:
"This contains three commits all of which are updates for specific
devices which aren't too widespread. Pretty limited scope and nothing
too interesting or dangerous"
* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
sata_fsl: save irqs while coalescing
libata: apply behavioral quirks to sil3826 PMP
sata, highbank: fix ordering of SGPIO signals
Pull cgroup fix from Tejun Heo:
"A late fix for cgroup.
This fixes a behavior regression visible to userland which was created
by a commit merged during -rc1. While the behavior change isn't too
likely to be noticeable, the fix is relatively low risk and we'll need
to backport it through -stable anyway if the bug gets released"
* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: fix a regression in validating config change
Pull drm fixes from Dave Airlie:
"Ben was on holidays for a week so a few nouveau regression fixes
backed up, but they all seem necessary.
Otherwise one i915 and one gma500 fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
gma500: Fix SDVO turning off randomly
drm/nv04/disp: fix framebuffer pin refcounting
drm/nouveau/mc: fix race condition between constructor and request_irq()
drm/nouveau: fix reclocking on nv40
drm/nouveau/ltcg: fix allocating memory as free
drm/nouveau/ltcg: fix ltcg memory initialization after suspend
drm/nouveau/fb: fix null derefs in nv49 and nv4e init
drm/i915: Invalidate TLBs for the rings after a reset
Commit 94ae9843 (usb: phy: rename all phy drivers to phy-$name-usb.c)
renamed drivers/usb/phy/otg_fsm.h to drivers/usb/phy/phy-fsm-usb.h
but changed drivers/usb/phy/phy-fsm-usb.c to include not existing
"phy-otg-fsm.h" instead of new "phy-fsm-usb.h". This breaks building:
...
drivers/usb/phy/phy-fsm-usb.c:32:25: fatal error: phy-otg-fsm.h: No such file or directory
compilation terminated.
make[3]: *** [drivers/usb/phy/phy-fsm-usb.o] Error 1
This commit also missed to modify drivers/usb/phy/phy-fsl-usb.h
to include new "phy-fsm-usb.h" instead of "otg_fsm.h" resulting
in another build breakage:
...
In file included from drivers/usb/phy/phy-fsl-usb.c:46:0:
drivers/usb/phy/phy-fsl-usb.h:18:21: fatal error: otg_fsm.h: No such file or directory
compilation terminated.
make[3]: *** [drivers/usb/phy/phy-fsl-usb.o] Error 1
Fix both issues.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: stable <stable@vger.kernel.org> # 3.10+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>