All the interesting information printed by this ioctl
is provided in /proc/mdstat and/or sysfs.
So it isn't needed and isn't used and would be best if it didn't
exist.
Signed-off-by: NeilBrown <neilb@suse.de>
Most of the places that call this are doing so pointlessly.
A couple of the others a best replaced with WARN_ON().
Signed-off-by: NeilBrown <neilb@suse.de>
unknown ioctls no longer get this deep into md_ioctl since
md_ioctl_valid() was introduced in 3.14.
So remove the test and the misleading comment.
Signed-off-by: NeilBrown <neilb@suse.de>
If an array is active, devices can be marked 'faulty', but simply
removing the 'sync' flag is wrong. That only makes sense
for an array which is not active (and is probably only useful
for testing anyway).
Signed-off-by: NeilBrown <neilb@suse.de>
The main 'md' thread is needed for processing writes, so if it blocks
write requests could be delayed.
Starting a new thread requires some GFP_KERNEL allocations and so can
wait for writes to complete. This can deadlock.
So instead, ask a workqueue to start the sync thread.
There is no particular rush for this to happen, so any work queue
will do.
MD_RECOVERY_RUNNING is used to ensure only one thread is started.
Reported-by: BillStuff <billstuff2001@sbcglobal.net>
Signed-off-by: NeilBrown <neilb@suse.de>
We don't really need the full mddev_lock here, and having to
drop it is messy.
RCU is enough to protect these lists.
Signed-off-by: NeilBrown <neilb@suse.de>
printk may cause long time lapse if value of printk_delay in sysctl is
configured large by user. If register_md_personality takes long time to print in
spinlock pers_lock, we may encounter high CPU usage rate when there are other
pers_lock competitors who may be blocked to spin.
We can avoid this condition by moving printk out of coverage of pers_lock
spinlock.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: NeilBrown <neilb@suse.de>
In general we don't allow an array to be stopped if it is in use.
However if the array hasn't really been started yet, then any
apparent use is an anomily, probably due to 'udev' or similar
having a look to see what is there.
This means that if something goes wrong while assembling an array
it cannot reliably be un-assembled - STOP_ARRAY could fail.
There is no value here, so change do_md_stop() to succeed
despite concurrent opens if the array has not yet been
activated. i.e. if ->pers is NULL.
Reported-by: "Baldysiak, Pawel" <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
An array can only accept a bitmap if it will call bitmap_daemon_work
periodically, which means it needs a thread running.
If there is no thread, don't allow a bitmap to be added.
Signed-off-by: NeilBrown <neilb@suse.de>
When we calculate the speed of recovery, the numerator that contains
the recovery done sectors. It's need to subtract the sectors which
don't finish recovery.
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
The way md devices are traditionally created in the kernel
is simply to open the device with the desired major/minor number.
This can be problematic as some support tools, notably udev and
programs run by udev, can open a device just to see what is there, and
find that it has created something. It is easy for a race to cause
udev to open an md device just after it was destroy, causing it to
suddenly re-appear.
For some time we have had an alternate way to create md devices
echo md_somename > /sys/modules/md_mod/paramaters/new_array
This will always use a minor number of 512 or higher, which mdadm
normally avoids.
Using this makes the creation-by-opening unnecessary, but does
not disable it, so it is still there to cause problems.
This patch disable probing for devices with a major of 9 (MD_MAJOR)
and a minor of 512 and up. This devices created by writing to
new_array cannot be re-created by opening the node in /dev.
Signed-off-by: NeilBrown <neilb@suse.de>
When we write to a degraded array which has a bitmap, we
make sure the relevant bit in the bitmap remains set when
the write completes (so a 're-add' can quickly rebuilt a
temporarily-missing device).
If, immediately after such a write starts, we incorporate a spare,
commence recovery, and skip over the region where the write is
happening (because the 'needs recovery' flag isn't set yet),
then that write will not get to the new device.
Once the recovery finishes the new device will be trusted, but will
have incorrect data, leading to possible corruption.
We cannot set the 'needs recovery' flag when we start the write as we
do not know easily if the write will be "degraded" or not. That
depends on details of the particular raid level and particular write
request.
This patch fixes a corruption issue of long standing and so it
suitable for any -stable kernel. It applied correctly to 3.0 at
least and will minor editing to earlier kernels.
Reported-by: Bill <billstuff2001@sbcglobal.net>
Tested-by: Bill <billstuff2001@sbcglobal.net>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net
Signed-off-by: NeilBrown <neilb@suse.de>
Julia Lawall and coccinelle report that md_clear_badblocks always
returns 0, despite appearing to have an error path.
The error path really should return an error code. ENOSPC is
reasonably appropriate.
Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NeilBrown <neilb@suse.de>
read-only arrays should not be changed. This includes changing
the level, layout, size, or number of devices.
So reject those changes for readonly arrays.
Signed-off-by: NeilBrown <neilb@suse.de>
Commit 8313b8e57f
md: fix problem when adding device to read-only array with bitmap.
added a called to md_reap_sync_thread() which cause a reshape thread
to be interrupted (in particular, it could cause md_thread() to never even
call md_do_sync()).
However it didn't set MD_RECOVERY_INTR so ->finish_reshape() would not
know that the reshape didn't complete.
This only happens when mddev->ro is set and normally reshape threads
don't run in that situation. But raid5 and raid10 can start a reshape
thread during "run" is the array is in the middle of a reshape.
They do this even if ->ro is set.
So it is best to set MD_RECOVERY_INTR before abortingg the
sync thread, just in case.
Though it rare for this to trigger a problem it can cause data corruption
because the reshape isn't finished properly.
So it is suitable for any stable which the offending commit was applied to.
(3.2 or later)
Fixes: 8313b8e57f
Cc: stable@vger.kernel.org (3.2+)
Signed-off-by: NeilBrown <neilb@suse.de>
If mddev->ro is set, md_to_sync will (correctly) abort.
However in that case MD_RECOVERY_INTR isn't set.
If a RESHAPE had been requested, then ->finish_reshape() will be
called and it will think the reshape was successful even though
nothing happened.
Normally a resync will not be requested if ->ro is set, but if an
array is stopped while a reshape is on-going, then when the array is
started, the reshape will be restarted. If the array is also set
read-only at this point, the reshape will instantly appear to success,
resulting in data corruption.
Consequently, this patch is suitable for any -stable kernel.
Cc: stable@vger.kernel.org (any)
Signed-off-by: NeilBrown <neilb@suse.de>
If an md array with externally managed metadata (e.g. DDF or IMSM)
is in use, then we should not set safemode==2 at shutdown because:
1/ this is ineffective: user-space need to be involved in any 'safemode' handling,
2/ The safemode management code doesn't cope with safemode==2 on external metadata
and md_check_recover enters an infinite loop.
Even at shutdown, an infinite-looping process can be problematic, so this
could cause shutdown to hang.
Cc: stable@vger.kernel.org (any kernel)
Signed-off-by: NeilBrown <neilb@suse.de>
If md-mod is unloaded while some process is in poll() or select(),
then that process maintains a pointer to md_event_waiters, and when
the try to unlink from that list, they will oops.
The procfs infrastructure ensures that ->poll won't be called after
remove_proc_entry, but doesn't provide a wait_queue_head for us to
use, and the waitqueue code doesn't provide a way to remove all
listeners from a waitqueue.
So we need to:
1/ make sure no further references to md_event_waiters are taken (by
setting md_unloading)
2/ wake up all processes currently waiting, and
3/ wait until all those processes have disconnected from our
wait_queue_head.
Reported-by: "majianpeng" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>