Commit Graph

67 Commits

Author SHA1 Message Date
Andre Noll ac5e7113e7 md: Push down data integrity code to personalities.
This patch replaces md_integrity_check() by two new public functions:
md_integrity_register() and md_integrity_add_rdev() which are both
personality-independent.

md_integrity_register() is called from the ->run and ->hot_remove
methods of all personalities that support data integrity.  The
function iterates over the component devices of the array and
determines if all active devices are integrity capable and if their
profiles match. If this is the case, the common profile is registered
for the mddev via blk_integrity_register().

The second new function, md_integrity_add_rdev() is called from the
->hot_add_disk methods, i.e. whenever a new device is being added
to a raid array. If the new device does not support data integrity,
or has a profile different from the one already registered, data
integrity for the mddev is disabled.

For raid0 and linear, only the call to md_integrity_register() from
the ->run method is necessary.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:47 +10:00
Martin K. Petersen 8f6c2e4b32 md: Use new topology calls to indicate alignment and I/O sizes
Switch MD over to the new disk_stack_limits() function which checks for
aligment and adjusts preferred I/O sizes when stacking.

Also indicate preferred I/O sizes where applicable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-01 11:13:45 +10:00
Andre Noll 0894cc3066 md: Move check for bitmap presence to personality code.
If the superblock of a component device indicates the presence of a
bitmap but the corresponding raid personality does not support bitmaps
(raid0, linear, multipath, faulty), then something is seriously wrong
and we'd better refuse to run such an array.

Currently, this check is performed while the superblocks are examined,
i.e. before entering personality code. Therefore the generic md layer
must know which raid levels support bitmaps and which do not.

This patch avoids this layer violation without adding identical code
to various personalities. This is accomplished by introducing a new
public function to md.c, md_check_no_bitmap(), which replaces the
hard-coded checks in the superblock loading functions.

A call to md_check_no_bitmap() is added to the ->run method of each
personality which does not support bitmaps and assembly is aborted
if at least one component device contains a bitmap.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:49:23 +10:00
NeilBrown 13f2682b72 md: raid0/linear: ensure device sizes are rounded to chunk size.
This is currently ensured by common code, but it is more reliable to
ensure it where it is needed in personality code.
All the other personalities that care already round the size to
the chunk_size.  raid0 and linear are the only hold-outs.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:48:55 +10:00
NeilBrown d6e412eaa5 md: raid0: chunk_sectors cleanups.
following the conversion to chunk_sectors, there is room
for cleaning up a little.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:47:00 +10:00
Andre Noll 9d8f036362 md: Make mddev->chunk_size sector-based.
This patch renames the chunk_size field to chunk_sectors with the
implied change of semantics.  Since

	is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9)
				  = is_power_of_2(chunk_sectors)

these bits don't need an adjustment for the shift.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:45:01 +10:00
raz ben yehuda fbb704efb7 md: raid0 :Enables chunk size other than powers of 2.
Maintain two flows, one for pow2 chunk sizes (which uses masks and
shift), and a flow for the general case (which uses sector_div).
This is for the sake of performance.

 - introduce map_sector and is_io_in_chunk_boundary to encapsulate
   those two flows better for raid0_make_request
 - fix blk_mergeable to support the two flows.

Signed-off-by: raziebe@gmail.com
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 17:02:05 +10:00
raz ben yehuda 92e59b6ba2 md: raid0: chunk size check in raid0_run
have raid0 check chunk size in run method instead of in md.
This is part of a series moving the checks from common code to
the personalities where they belong.

hardsect is short and chunksize is an int, so it is safe to use %.

Signed-off-by: raziebe@gmail.com
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 17:00:57 +10:00
raz ben yehuda 46994191ae md: have raid0 report its formation
Report to the user what are the raid zones

Signed-off-by: raziebe@gmail.com
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 17:00:54 +10:00
raz ben yehuda 1b9614291e md: have raid0 compile with MD_DEBUG on
Because of the removal of the device list from
the strips raid0 did not compile with MD_DEBUG flag on

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:57:40 +10:00
NeilBrown 070ec55d07 md: remove mddev_to_conf "helper" macro
Having a macro just to cast a void* isn't really helpful.
I would must rather see that we are simply de-referencing ->private,
than have to know what the macro does.

So open code the macro everywhere and remove the pointless cast.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:54:21 +10:00
NeilBrown a6b3deafe0 md: raid0: remove setting of segment boundary.
This setting doesn't seem to make sense (half the chunk size??) and
shouldn't be needed.
The segment boundary exported by raid0 should simply be the minimum
of the segment boundary of all component devices.  And we already
get that right.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:54:07 +10:00
NeilBrown b414579f45 md: raid0: remove ->dev pointer from strip_zone structure
If we treat conf->devlist more like a 2 dimensional array,
we can get the devlist for a particular zone simply by indexing
that array, so we don't need to store the pointers to subarrays
in strip_zone.  This makes strip_zone smaller and so (hopefully)
searches faster.

Signed-of-by: NeilBrown <neilb@suse.de>
2009-06-16 16:50:52 +10:00
NeilBrown 49f357a22b md: raid0: remove ->sectors from the strip_zone structure.
storing ->sectors is redundant as is can be computed from the
difference  z->zone_end - (z-1)->zone_end

The one place where it is used, it is just as efficient to use
a zone_end value instead.

And removing it makes strip_zone smaller, so they array of these that
is searched on every request has a better chance to say in cache.

So discard the field and get the value from elsewhere.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:50:35 +10:00
Andre Noll fb5ab4b5d6 md: raid0: Fix a memory leak when stopping a raid0 array.
raid0_stop() removes all references to the raid0 configuration but
misses to free the ->devlist buffer.

This patch closes this leak, removes a pointless initialization and
fixes a coding style issue in raid0_stop().

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:48:19 +10:00
Andre Noll ed7b00380d md: raid0: Allocate all buffers for the raid0 configuration in one function.
Currently the raid0 configuration is allocated in raid0_run() while
the buffers for the strip_zone and the dev_list arrays are allocated
in create_strip_zones(). On errors, all three buffers are freed
in raid0_run().

It's easier and more readable to do the allocation and cleanup within
a single function. So move that code into create_strip_zones().

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:47:36 +10:00
Andre Noll 5568a6035d md: raid0: Make raid0_run() return a proper error code.
Currently raid0_run() always returns -ENOMEM on errors. This is
incorrect as running the array might fail for other reasons, for
example because not all component devices were available.

This patch changes create_strip_zones() so that it returns a proper
error code (either -ENOMEM or -EINVAL) rather than 1 on errors and
makes raid0_run(), its single caller, return that value instead
of -ENOMEM.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:47:21 +10:00
Andre Noll 8f79cfcdb6 md: raid0: Remove hash spacing and sector shift.
The "sector_shift" and "spacing" fields of struct raid0_private_data
were only used for the hash table lookups. So the removal of the
hash table allows get rid of these fields as well which simplifies
create_strip_zones() and raid0_run() quite a bit.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:47:10 +10:00
Andre Noll 09770e0b6e md: raid0: Remove hash table.
The raid0 hash table has become unused due to the changes in the
previous patch. This patch removes the hash table allocation and
setup code and kills the hash_table field of struct raid0_private_data.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:46:48 +10:00
NeilBrown d27a43abd7 md/raid0: two cleanups in create_stripe_zones.
1/ remove current_start.  The same value is available in
     zone->dev_start and storing it separately doesn't gain anything.
2/ rename curr_zone_start to curr_zone_end as we are now more
     focused on the 'end' of each zone.  We end up storing the
     same number though - the old name was a little confusing
     (and what does 'current' mean in this context anyway).

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:46:46 +10:00
Andre Noll dc58266385 md: raid0: Replace hash table lookup by looping over all strip_zones.
The number of strip_zones of a raid0 array is bounded by the number of
drives in the array and is in fact much smaller for typical setups. For
example, any raid0 array containing identical disks will have only
a single strip_zone.

Therefore, the hash tables which are used for quickly finding the
strip_zone that holds a particular sector are of questionable value
and add quite a bit of unnecessary complexity.

This patch replaces the hash table lookup by equivalent code which
simply loops over all strip zones to find the zone that holds the
given sector.

In order to make this loop as fast as possible, the zone->start field
of struct strip_zone has been renamed to zone_end, and it now stores
the beginning of the next zone in sectors. This allows to save one
addition in the loop.

Subsequent cleanup patches will remove the hash table structure.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:18:43 +10:00
Martin K. Petersen ae03bf639a block: Use accessor functions for queue limits
Convert all external users of queue limits to using wrapper functions
instead of poking the request queue variables directly.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-22 23:22:54 +02:00
Dan Williams b522adcde9 md: 'array_size' sysfs attribute
Allow userspace to set the size of the array according to the following
semantics:

1/ size must be <= to the size returned by mddev->pers->size(mddev, 0, 0)
   a) If size is set before the array is running, do_md_run will fail
      if size is greater than the default size
   b) A reshape attempt that reduces the default size to less than the set
      array size should be blocked
2/ once userspace sets the size the kernel will not change it
3/ writing 'default' to this attribute returns control of the size to the
   kernel and reverts to the size reported by the personality

Also, convert locations that need to know the default size from directly
reading ->array_sectors to <pers>_size.  Resync/reshape operations
always follow the default size.

Finally, fixup other locations that read a number of 1k-blocks from
userspace to use strict_blocks_to_sectors() which checks for unsigned
long long to sector_t overflow and blocks to sectors overflow.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-03-31 15:00:31 +11:00
Dan Williams 1f403624bd md: centralize ->array_sectors modifications
Get personalities out of the business of directly modifying
->array_sectors.  Lays groundwork to introduce policy on when
->array_sectors can be modified.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-03-31 14:59:03 +11:00
Dan Williams 80c3a6ce4b md: add 'size' as a personality method
In preparation for giving userspace control over ->array_sectors we need
to be able to retrieve the 'default' size, and the 'anticipated' size
when a reshape is requested.  For personalities that do not reshape emit
a warning if anything but the default size is requested.

In the raid5 case we need to update ->previous_raid_disks to make the
new 'default' size available.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-03-31 14:57:49 +11:00