As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h. This converts ext2 non-atomic bit operations to
little-endian bit operations.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mddev->curr_resync has artificial values of '1' and '2' which are used
by the code which ensures only one resync is happening at a time on
any given device.
These values are internal and should never be exposed to user-space
(except when translated appropriately as in the 'pending' status in
/proc/mdstat).
Unfortunately they are as ->curr_resync is assigned to
->curr_resync_completed and that value is directly visible through
sysfs.
So change the assignments to ->curr_resync_completed to get the same
valued from elsewhere in a form that doesn't have the magic '1' or '2'
values.
Signed-off-by: NeilBrown <neilb@suse.de>
Allow the metadata to be on a separate device from the
data.
This doesn't mean the data and metadata will by on separate
physical devices - it simply gives device-mapper and userspace
tools more flexibility.
Signed-off-by: NeilBrown <neilb@suse.de>
Add new parameter to 'sync_page_io'.
The new parameter allows us to distinguish between metadata and data
operations. This becomes important later when we add the ability to
use separate devices for data and metadata.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
When writing to an 'external' bitmap we don't currently unplug the
device before waiting, so we can get a 3msec delay each time;
So use REQ_UNPLUG to force and unplug.
Signed-off-by: NeilBrown <neilb@suse.de>
Currently sync_page_io takes a 'bdev'.
Every caller passes 'rdev->bdev'.
We will soon want another field out of the rdev in sync_page_io,
So just pass the rdev instead of the bdev out of it.
Signed-off-by: NeilBrown <neilb@suse.de>
bitmap_get_counter returns the number of sectors covered
by the counter in a pass-by-reference variable.
In some cases this can be very large, so make it a sector_t
for safety.
Signed-off-by: NeilBrown <neilb@suse.de>
MD_CHANGE_CLEAN is used for two different purposes and this leads to
confusion.
One of the purposes is largely mirrored by MD_CHANGE_PENDING which is
not used for anything else, so have MD_CHANGE_PENDING take over that
purpose fully.
The two purposes are:
1/ tell md_update_sb that an update is needed and that it is just a
clean/dirty transition.
2/ tell user-space that an transition from clean to dirty is pending
(something wants to write), and tell te kernel (by clearin the
flag) that the transition is OK.
The first purpose remains wit MD_CHANGE_CLEAN, the second is moved
fully to MD_CHANGE_PENDING.
This means that various places which conditionally set or cleared
MD_CHANGE_CLEAN no longer need to be conditional.
Signed-off-by: NeilBrown <neilb@suse.de>
dm makes this distinction between ->ctr and ->resume, so we need to
too.
Also get the new bitmap_load to clear out the bitmap first, as this is
most consistent with the dm suspend/resume approach
Signed-off-by: NeilBrown <neilb@suse.de>
This allows md/raid5 to fully work as a dm target.
Normally md uses a 'filemap' which contains a list of pages of bits
each of which may be written separately.
dm-log uses and all-or-nothing approach to writing the log, so
when using a dm-log, ->filemap is NULL and the flags normally stored
in filemap_attr are stored in ->logattrs instead.
Signed-off-by: NeilBrown <neilb@suse.de>
A bitmap is stored as one page per 2048 bits.
If none of the bits are set, the page is not allocated.
When bitmap_get_counter finds that a page isn't allocate,
it just reports that one bit work of space isn't flagged,
rather than reporting that 2048 bits worth of space are
unflagged.
This can cause searches for flagged bits (e.g. bitmap_close_sync)
to do more work than is really necessary.
So change bitmap_get_counter (when creating) to report a number of
blocks that more accurately reports the range of the device for which
no counter currently exists.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ use md_unplug in bitmap.c as we will soon be using bitmaps under
arrays with no queue attached.
2/ Don't bother plugging the queue when we set a bit in the bitmap.
The reason for this was to encourage as many bits as possible to
get set before we unplug and write stuff out.
However every personality already plugs the queue after
bitmap_startwrite either directly (raid1/raid10) or be setting
STRIPE_BIT_DELAY which causes the queue to be plugged later
(raid5).
Signed-off-by: NeilBrown <neilb@suse.de>
For dm-raid45 we will want to use bitmaps in dm-targets which don't
have entries in sysfs, so cope with the mddev not living in sysfs.
Signed-off-by: NeilBrown <neilb@suse.de>
Fixes some whitespace problems
Fixed some checkpatch.pl complaints.
Replaced kmalloc ... memset(0), with kzalloc
Fixed an unlikely memory leak on an error path.
Reformatted a number of 'if/else' sets, sometimes
replacing goto with an else clause.
Removed some old comments and commented-out code.
Signed-off-by: NeilBrown <neilb@suse.de>
When MD_CHANGE_CLEAN is set we might block in md_write_start.
So we should only set it when fairly sure that something will clear
it.
There are two places where it is set so as to encourage a metadata
update to record the progress of resync/recovery. This should only
be done if the internal metadata update mechanisms are in use, which
can be tested by by inspecting '->persistent'.
Signed-off-by: NeilBrown <neilb@suse.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (69 commits)
fix handling of offsets in cris eeprom.c, get rid of fake on-stack files
get rid of home-grown mutex in cris eeprom.c
switch ecryptfs_write() to struct inode *, kill on-stack fake files
switch ecryptfs_get_locked_page() to struct inode *
simplify access to ecryptfs inodes in ->readpage() and friends
AFS: Don't put struct file on the stack
Ban ecryptfs over ecryptfs
logfs: replace inode uid,gid,mode initialization with helper function
ufs: replace inode uid,gid,mode initialization with helper function
udf: replace inode uid,gid,mode init with helper
ubifs: replace inode uid,gid,mode initialization with helper function
sysv: replace inode uid,gid,mode initialization with helper function
reiserfs: replace inode uid,gid,mode initialization with helper function
ramfs: replace inode uid,gid,mode initialization with helper function
omfs: replace inode uid,gid,mode initialization with helper function
bfs: replace inode uid,gid,mode initialization with helper function
ocfs2: replace inode uid,gid,mode initialization with helper function
nilfs2: replace inode uid,gid,mode initialization with helper function
minix: replace inode uid,gid,mode init with helper
ext4: replace inode uid,gid,mode init with helper
...
Trivial conflict in fs/fs-writeback.c (mark bitfields unsigned)
Conflicts:
drivers/md/md.c
- Resolved conflict in md_update_sb
- Added extra 'NULL' arg to new instance of sysfs_get_dirent.
Signed-off-by: NeilBrown <neilb@suse.de>
Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsynmc_range.
The next step will be removig the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The problem. When implementing a network namespace I need to be able
to have multiple network devices with the same name. Currently this
is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
potentially a few other directories of the form /sys/ ... /net/*.
What this patch does is to add an additional tag field to the
sysfs dirent structure. For directories that should show different
contents depending on the context such as /sys/class/net/, and
/sys/devices/virtual/net/ this tag field is used to specify the
context in which those directories should be visible. Effectively
this is the same as creating multiple distinct directories with
the same name but internally to sysfs the result is nicer.
I am calling the concept of a single directory that looks like multiple
directories all at the same path in the filesystem tagged directories.
For the networking namespace the set of directories whose contents I need
to filter with tags can depend on the presence or absence of hotplug
hardware or which modules are currently loaded. Which means I need
a simple race free way to setup those directories as tagged.
To achieve a reace free design all tagged directories are created
and managed by sysfs itself.
Users of this interface:
- define a type in the sysfs_tag_type enumeration.
- call sysfs_register_ns_types with the type and it's operations
- sysfs_exit_ns when an individual tag is no longer valid
- Implement mount_ns() which returns the ns of the calling process
so we can attach it to a sysfs superblock.
- Implement ktype.namespace() which returns the ns of a syfs kobject.
Everything else is left up to sysfs and the driver layer.
For the network namespace mount_ns and namespace() are essentially
one line functions, and look to remain that.
Tags are currently represented a const void * pointers as that is
both generic, prevides enough information for equality comparisons,
and is trivial to create for current users, as it is just the
existing namespace pointer.
The work needed in sysfs is more extensive. At each directory
or symlink creating I need to check if the directory it is being
created in is a tagged directory and if so generate the appropriate
tag to place on the sysfs_dirent. Likewise at each symlink or
directory removal I need to check if the sysfs directory it is
being removed from is a tagged directory and if so figure out
which tag goes along with the name I am deleting.
Currently only directories which hold kobjects, and
symlinks are supported. There is not enough information
in the current file attribute interfaces to give us anything
to discriminate on which makes it useless, and there are
no potential users which makes it an uninteresting problem
to solve.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When a raid1 array is configured to support write-behind
on some devices, it normally only reads from other devices.
If all devices are write-behind (because the rest have failed)
it is possible for a read request to be serviced before a
behind-write request, which would appear as data corruption.
So when forced to read from a WriteMostly device, wait for any
write-behind to complete, and don't start any more behind-writes.
Signed-off-by: NeilBrown <neilb@suse.de>
void pointers do not need to be cast to other pointer types.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Keep track of the maximum number of concurrent write-behind requests
for an md array and exposed this number in sysfs at
md/bitmap/max_backlog_used
Writing any value to this file will clear it.
This allows userspace to be involved in tuning bitmap/backlog.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: NeilBrown <neilb@suse.de>
There is a sysfs file which allows bits in the write-intent
bitmap to be explicit set - indicating that the block is thought
to be 'dirty'.
When this happens we should really set recovery_cp backwards
to include the block to reflect this dirtiness.
In particular, a 'resync' process will refuse to start if
recovery_cp is beyond the end of the array, so this is needed
to allow a resync to be triggered.
Signed-off-by: NeilBrown <neilb@suse.de>
In this case, the metadata needs to not be in the same
sector as the bitmap.
md will not read/write any bitmap metadata. Config must be
done via sysfs and when a recovery makes the array non-degraded
again, writing 'true' to 'bitmap/can_clear' will allow bits in
the bitmap to be cleared again.
Signed-off-by: NeilBrown <neilb@suse.de>