Commit Graph

121 Commits

Author SHA1 Message Date
NeilBrown
070dc6dd71 md: resolve confusion of MD_CHANGE_CLEAN
MD_CHANGE_CLEAN is used for two different purposes and this leads to
confusion.
One of the purposes is largely mirrored by MD_CHANGE_PENDING which is
not used for anything else, so have MD_CHANGE_PENDING take over that
purpose fully.

The two purposes are:
 1/ tell md_update_sb that an update is needed and that it is just a
   clean/dirty transition.
 2/ tell user-space that an transition from clean to dirty is pending
    (something wants to write), and tell te kernel (by clearin the
    flag) that the transition is OK.

The first purpose remains wit MD_CHANGE_CLEAN, the second is moved
fully to MD_CHANGE_PENDING.

This means that various places which conditionally set or cleared
MD_CHANGE_CLEAN no longer need to be conditional.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-30 18:06:21 +10:00
NeilBrown
69e51b449d md/bitmap: separate out loading a bitmap from initialising the structures.
dm makes this distinction between ->ctr and ->resume, so we need to
too.

Also get the new bitmap_load to clear out the bitmap first, as this is
most consistent with the dm suspend/resume approach

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 13:21:34 +10:00
NeilBrown
e384e58549 md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log.
This allows md/raid5 to fully work as a dm target.

Normally md uses a 'filemap' which contains a list of pages of bits
each of which may be written separately.
dm-log uses and all-or-nothing approach to writing the log, so
when using a dm-log, ->filemap is NULL and the flags normally stored
in filemap_attr are stored in ->logattrs instead.



Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 13:21:34 +10:00
NeilBrown
ef42567335 md/bitmap: optimise scanning of empty bitmaps.
A bitmap is stored as one page per 2048 bits.
If none of the bits are set, the page is not allocated.

When bitmap_get_counter finds that a page isn't allocate,
it just reports that one bit work of space isn't flagged,
rather than reporting that 2048 bits worth of space are
unflagged.
This can cause searches for flagged bits (e.g. bitmap_close_sync)
to do more work than is really necessary.

So change bitmap_get_counter (when creating) to report a number of
blocks that more accurately reports the range of the device for which
no counter currently exists.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 13:21:32 +10:00
NeilBrown
b63d7c2e29 md/bitmap: clean up plugging calls.
1/ use md_unplug in bitmap.c as we will soon be using bitmaps under
  arrays with no queue attached.

2/ Don't bother plugging the queue when we set a bit in the bitmap.
   The reason for this was to encourage as many bits as possible to
   get set before we unplug and write stuff out.
   However every personality already plugs the queue after
   bitmap_startwrite either directly (raid1/raid10) or be setting
   STRIPE_BIT_DELAY which causes the queue to be plugged later
   (raid5).

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 13:21:32 +10:00
NeilBrown
5ff5afffe6 md/bitmap: reduce dependence on sysfs.
For dm-raid45 we will want to use bitmaps in dm-targets which don't
have entries in sysfs, so cope with the mddev not living in sysfs.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 13:21:31 +10:00
NeilBrown
ac2f40be46 md/bitmap: white space clean up and similar.
Fixes some whitespace problems
Fixed some checkpatch.pl complaints.
Replaced kmalloc ... memset(0), with kzalloc
Fixed an unlikely memory leak on an error path.
Reformatted a number of 'if/else' sets, sometimes
replacing goto with an else clause.
Removed some old comments and commented-out code.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 13:07:22 +10:00
NeilBrown
676e42d896 md: be more careful setting MD_CHANGE_CLEAN
When MD_CHANGE_CLEAN is set we might block in md_write_start.
So we should only set it when fairly sure that something will clear
it.

There are two places where it is set so as to encourage a metadata
update to record the progress of resync/recovery.  This should only
be done if the internal metadata update mechanisms are in use, which
can be tested by by inspecting '->persistent'.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 12:52:27 +10:00
Linus Torvalds
e8bebe2f71 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits)
  fix handling of offsets in cris eeprom.c, get rid of fake on-stack files
  get rid of home-grown mutex in cris eeprom.c
  switch ecryptfs_write() to struct inode *, kill on-stack fake files
  switch ecryptfs_get_locked_page() to struct inode *
  simplify access to ecryptfs inodes in ->readpage() and friends
  AFS: Don't put struct file on the stack
  Ban ecryptfs over ecryptfs
  logfs: replace inode uid,gid,mode initialization with helper function
  ufs: replace inode uid,gid,mode initialization with helper function
  udf: replace inode uid,gid,mode init with helper
  ubifs: replace inode uid,gid,mode initialization with helper function
  sysv: replace inode uid,gid,mode initialization with helper function
  reiserfs: replace inode uid,gid,mode initialization with helper function
  ramfs: replace inode uid,gid,mode initialization with helper function
  omfs: replace inode uid,gid,mode initialization with helper function
  bfs: replace inode uid,gid,mode initialization with helper function
  ocfs2: replace inode uid,gid,mode initialization with helper function
  nilfs2: replace inode uid,gid,mode initialization with helper function
  minix: replace inode uid,gid,mode init with helper
  ext4: replace inode uid,gid,mode init with helper
  ...

Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)
2010-05-21 19:37:45 -07:00
NeilBrown
19fdb9eefb Merge commit '3ff195b011d7decf501a4d55aeed312731094796' into for-linus
Conflicts:
	drivers/md/md.c

- Resolved conflict in md_update_sb
- Added extra 'NULL' arg to new instance of sysfs_get_dirent.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-22 08:31:36 +10:00
Christoph Hellwig
8018ab0574 sanitize vfs_fsync calling conventions
Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsynmc_range.

The next step will be removig the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:21 -04:00
Eric W. Biederman
3ff195b011 sysfs: Implement sysfs tagged directory support.
The problem.  When implementing a network namespace I need to be able
to have multiple network devices with the same name.  Currently this
is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
potentially a few other directories of the form /sys/ ... /net/*.

What this patch does is to add an additional tag field to the
sysfs dirent structure.  For directories that should show different
contents depending on the context such as /sys/class/net/, and
/sys/devices/virtual/net/ this tag field is used to specify the
context in which those directories should be visible.  Effectively
this is the same as creating multiple distinct directories with
the same name but internally to sysfs the result is nicer.

I am calling the concept of a single directory that looks like multiple
directories all at the same path in the filesystem tagged directories.

For the networking namespace the set of directories whose contents I need
to filter with tags can depend on the presence or absence of hotplug
hardware or which modules are currently loaded.  Which means I need
a simple race free way to setup those directories as tagged.

To achieve a reace free design all tagged directories are created
and managed by sysfs itself.

Users of this interface:
- define a type in the sysfs_tag_type enumeration.
- call sysfs_register_ns_types with the type and it's operations
- sysfs_exit_ns when an individual tag is no longer valid

- Implement mount_ns() which returns the ns of the calling process
  so we can attach it to a sysfs superblock.
- Implement ktype.namespace() which returns the ns of a syfs kobject.

Everything else is left up to sysfs and the driver layer.

For the network namespace mount_ns and namespace() are essentially
one line functions, and look to remain that.

Tags are currently represented a const void * pointers as that is
both generic, prevides enough information for equality comparisons,
and is trivial to create for current users, as it is just the
existing namespace pointer.

The work needed in sysfs is more extensive.  At each directory
or symlink creating I need to check if the directory it is being
created in is a tagged directory and if so generate the appropriate
tag to place on the sysfs_dirent.  Likewise at each symlink or
directory removal I need to check if the sysfs directory it is
being removed from is a tagged directory and if so figure out
which tag goes along with the name I am deleting.

Currently only directories which hold kobjects, and
symlinks are supported.  There is not enough information
in the current file attribute interfaces to give us anything
to discriminate on which makes it useless, and there are
no potential users which makes it an uninteresting problem
to solve.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:31 -07:00
NeilBrown
e555190d82 md/raid1: delay reads that could overtake behind-writes.
When a raid1 array is configured to support write-behind
on some devices, it normally only reads from other devices.
If all devices are write-behind (because the rest have failed)
it is possible for a read request to be serviced before a
behind-write request, which would appear as data corruption.

So when forced to read from a WriteMostly device, wait for any
write-behind to complete, and don't start any more behind-writes.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:27:57 +10:00
H Hartley Sweeten
7b92813c3c drivers/md: Remove unnecessary casts of void *
void pointers do not need to be cast to other pointer types.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:27:46 +10:00
Paul Clements
696fcd535b md: expose max value of behind writes counter
Keep track of the maximum number of concurrent write-behind requests
for an md array and exposed this number in sysfs at
   md/bitmap/max_backlog_used

Writing any value to this file will clear it.

This allows userspace to be involved in tuning bitmap/backlog.

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-05-18 15:27:46 +10:00
NeilBrown
ffa23322b1 md/bitmap: update dirty flag when bitmap bits are explicitly set.
There is a sysfs file which allows bits in the write-intent
bitmap to be explicit set - indicating that the block is thought
to be 'dirty'.
When this happens we should really set recovery_cp backwards
to include the block to reflect this dirtiness.

In particular, a 'resync' process will refuse to start if
recovery_cp is beyond the end of the array, so this is needed
to allow a resync to be triggered.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown
ece5cff0da md: Support write-intent bitmaps with externally managed metadata.
In this case, the metadata needs to not be in the same
sector as the bitmap.
md will not read/write any bitmap metadata.  Config must be
done via sysfs and when a recovery makes the array non-degraded
again, writing 'true' to 'bitmap/can_clear' will allow bits in
the bitmap to be cleared again.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown
624ce4f565 md/bitmap: move setting of daemon_lastrun out of bitmap_read_sb
Setting daemon_lastrun really has nothing to do with reading
the bitmap superblock, it just happens to be needed at the same time.
bitmap_read_sb is about to become options, so move that code out
to after the call to bitmap_read_sb.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown
43a705076e md: support updating bitmap parameters via sysfs.
A new attribute directory 'bitmap' in 'md' is created which
contains files for configuring the bitmap.
'location' identifies where the bitmap is, either 'none',
or 'file' or 'sector offset from metadata'.
Writing 'location' can create or remove a bitmap.
Adding a 'file' bitmap this way is not yet supported.
'chunksize' and 'time_base' must be set before 'location'
can be set.

'chunksize' can be set before creating a bitmap, but is
currently always over-ridden by the bitmap superblock.

'time_base' and 'backlog' can be updated at any time.


Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
2009-12-14 12:51:41 +11:00
NeilBrown
f6af949c56 md: support bitmap offset appropriate for external-metadata arrays.
For md arrays were metadata is managed externally, the kernel does not
know about a superblock so the superblock offset is 0.
If we want to have a write-intent-bitmap near the end of the
devices of such an array, we should support sector_t sized offset.
We need offset be possibly negative for when the bitmap is before
the metadata, so use loff_t instead.

Also add sanity check that bitmap does not overlap with data.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown
9cd30fdc33 md: remove needless setting of thread->timeout in raid10_quiesce
As bitmap_create and bitmap_destroy already set thread->timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown
1b04be96f6 md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.
This removes a lot of multiplications by HZ.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown
42a04b5078 md: move offset, daemon_sleep and chunksize out of bitmap structure
... and into bitmap_info.  These are all configuration parameters
that need to be set before the bitmap is created.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown
c3d9714e88 md: collect bitmap-specific fields into one structure.
In preparation for making bitmap fields configurable via sysfs,
start tidying up by making a single structure to contain the
configuration fields.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown
aa5cbd1038 md/bitmap: protect against bitmap removal while being updated.
A write intent bitmap can be removed from an array while the
array is active.
When this happens, all IO is suspended and flushed before the
bitmap is removed.
However it is possible that bitmap_daemon_work is still running to
clear old bits from the bitmap.  If it is, it can dereference the
bitmap after it has been freed.

So introduce a new mutex to protect bitmap_daemon_work and get it
before destroying a bitmap.

This is suitable for any current -stable kernel.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
2009-12-14 12:49:46 +11:00