From: NeilBrown <neilb@suse.de>
If an error is reported by a drive in a RAID array (which is done via
bi_end_io - in interrupt context), we call md_error and md_new_event which
calls sysfs_notify. However sysfs_notify grabs a mutex and so cannot be
called in interrupt context.
This patch just creates a variant of md_new_event which avoids the sysfs
call, and uses that. A better fix for later is to arrange for the event to
be called from user-context.
Note: avoiding the sysfs call isn't a problem as an error will not, by
itself, modify the sync_action attribute. (We do still need to
wake_up(&md_event_waiters) as an error by itself will modify /proc/mdstat).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
otherwise we get nasty messages about locks not being released.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This loop that sets up the hash_table has problems.
Careful examination will show that the last time through, everything but
the first line is pointless. This is because all it does is change 'cur'
and 'size' and neither of these are used after the loop. This should ring
warning bells... That last time through the loop,
size += conf->strip_zone[cur].size
can index off the end of the strip_zone array. Depending on what it finds
there, it might exit the loop cleanly, or it might spin going further and
further beyond the array until it hits an unmapped address.
This patch rearranges the code so that the last, pointless, iteration of
the loop never happens. i.e. the one statement of the last loop that is
needed is moved the the end of the previous loop - or to before the loop
starts - and the loop counter starts from 1 instead of 0.
Cc: "Don Dupuis" <dondster@gmail.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should be able to write 'repair' to /sys/block/mdX/md/sync_action,
however due to and inverted test, that always given EINVAL.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When retrying a failed BIO_RW_BARRIER request, we need to keep the reference
in ->nr_pending over the whole retry. Currently, we only hold the reference
if the failed request is the *last* one to finish - which is silly, because it
would normally be the first to finish.
So move the rdev_dec_pending call up into the didn't-fail branch. As the rdev
isn't used in the later code, calling rdev_dec_pending earlier doesn't hurt.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the test for 'do barrier work' down a bit so that if the first write to a
raid1 is a BIO_RW_BARRIER write, the checking done by superblock writes will
cause the right thing to happen.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Because that is what you get if a BIO_RW_BARRIER isn't supported!
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to hold a reference to rdevs while reading and writing to attempt to
correct read errors. This reference must be taken under an rcu lock.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We should add to the counter for the rdev *after* checking if the rdev is
NULL!!!
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- fix mddev_lock() usage bugs in md_attr_show() and md_attr_store().
[they did not anticipate the possibility of getting a signal]
- remove mddev_lock_uninterruptible() [unused]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It works like this:
Open the file
Read all the contents.
Call poll requesting POLLERR or POLLPRI (so select/exceptfds works)
When poll returns,
close the file and go to top of loop.
or lseek to start of file and go back to the 'read'.
Events are signaled by an object manager calling
sysfs_notify(kobj, dir, attr);
If the dir is non-NULL, it is used to find a subdirectory which
contains the attribute (presumably created by sysfs_create_group).
This has a cost of one int per attribute, one wait_queuehead per kobject,
one int per open file.
The name "sysfs_notify" may be confused with the inotify
functionality. Maybe it would be nice to support inotify for sysfs
attributes as well?
This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
reshape_position is a 64bit field that was not 64bit aligned. So swap with
new_level.
NOTE: this is a user-visible change. However:
- The bad code has not appeared in a released kernel
- This code is still marked 'experimental'
- This only affects version-1 superblock, which are not in wide use
- These field are only used (rather than simply reported) by user-space
tools in extemely rare circumstances : after a reshape crashes in the
first second of the reshape process.
So I believe that, at this stage, the change is safe. Especially if people
heed the 'help' message on use mdadm-2.4.1.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
And remove the comments that were put in inplace of a fix too....
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently a device failure during recovery leaves bits set in the bitmap.
This normally isn't a problem as the offending device will be rejected because
of errors. However if device re-adding is being used with non-persistent
bitmaps, this can be a problem.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
... being careful that mutex_trylock is inverted wrt down_trylock
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When retrying a write due to barrier failure, we don't reset 'remaining', so
it goes negative and never hits 0 again.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An md array can be asked to change the amount of each device that it is using,
and in particular can be asked to use the maximum available space. This
currently only works if the first device is not larger than the rest. As
'size' gets changed and so 'fit' becomes wrong. So check if a 'fit' is
required early and don't corrupt it.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
raid5 overloads bi_phys_segments to count the number of blocks that the
request was broken in to so that it knows when the bio is completely handled.
Accessing this must always be done under a spinlock. In one case we also call
bi_end_io under that spinlock, which probably isn't ideal as bi_end_io could
be expensive (even though it isn't allowed to sleep).
So we reducde the range of the spinlock to just accessing bi_phys_segments.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>