Commit Graph

848 Commits

Author SHA1 Message Date
Alasdair G Kergon
76df1c651b [PATCH] device-mapper snapshot: fix invalidation
When a snapshot becomes invalid, s->valid is set to 0.  In this state, a
snapshot can no longer be accessed.

When s->lock is acquired, before doing anything else, s->valid must be checked
to ensure the snapshot remains valid.

This patch eliminates some races (that may cause panics) by adding some
missing checks.  At the same time, some unnecessary levels of indentation are
removed and snapshot invalidation is moved into a single function that always
generates a device-mapper event.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:58 -08:00
Alasdair G Kergon
b4b610f684 [PATCH] device-mapper snapshot: replace sibling list
The siblings "list" is used unsafely at the moment.

Firstly, only the element on the list being changed gets locked (via the
snapshot lock), not the next and previous elements which have pointers that
are also being changed.

Secondly, if you have two or more snapshots and write to the same chunk a
second time before every snapshot has finished making its private copy of the
data, if you're unlucky, _origin_write() could attempt its list_merge() and
dereference a 'last' pointer to a pending_exception structure that has just
been freed.

Analysis reveals that the list is actually only there for reference counting.
If 5 pending_exceptions are needed in origin_write, then the 5 are joined
together into a 5-element list - without a separate list head because there's
nowhere suitable to store it.  As the pending_exceptions complete, they are
removed from the list one-by-one and any contents of origin_bios get moved
across to one of the remaining pending_exceptions on the list.  Whichever one
is last is detected because list_empty() is then true and the origin_bios get
submitted.

The fix proposed here uses an alternative reference counting mechanism by
choosing one of the pending_exceptions as primary and maintaining an atomic
counter there.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:58 -08:00
Alasdair G Kergon
eccf081799 [PATCH] device-mapper snapshot: fix origin_write pending_exception submission
Say you have several snapshots of the same origin and then you issue a write
to some place in the origin for the first time.

Before the device-mapper snapshot target lets the write go through to the
underlying device, it needs to make a copy of the data that is about to be
overwritten.  Each snapshot is independent, so it makes one copy for each
snapshot.

__origin_write() loops through each snapshot and checks to see whether a copy
is needed for that snapshot.  (A copy is only needed the first time that data
changes.)

If a copy is needed, the code allocates a 'pending_exception' structure
holding the details.  It links these together for all the snapshots, then
works its way through this list and submits the copying requests to the kcopyd
thread by calling start_copy().  When each request is completed, the original
pending_exception structure gets freed in pending_complete().

If you're very unlucky, this structure can get freed *before* the submission
process has finished walking the list.

This patch:

  1) Creates a new temporary list pe_queue to hold the pending exception
     structures;

  2) Does all the bookkeeping up-front, then walks through the new list
     safely and calls start_copy() for each pending_exception that needed it;

  3) Avoids attempting to add pe->siblings to the list if it's already
     connected.

[NB This does not fix all the races in this code.  More patches will follow.]

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:58 -08:00
Linus Torvalds
9ae21d1bb3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment
  Kconfig help: MTD_JEDECPROBE already supports Intel
  Remove ugly debugging stuff
  do_mounts.c: Minor ROOT_DEV comment cleanup
  BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c
  BUG_ON() Conversion in mm/mempool.c
  BUG_ON() Conversion in mm/memory.c
  BUG_ON() Conversion in kernel/fork.c
  BUG_ON() Conversion in ipc/sem.c
  BUG_ON() Conversion in fs/ext2/
  BUG_ON() Conversion in fs/hfs/
  BUG_ON() Conversion in fs/dcache.c
  BUG_ON() Conversion in fs/buffer.c
  BUG_ON() Conversion in input/serio/hp_sdc_mlc.c
  BUG_ON() Conversion in md/dm-table.c
  BUG_ON() Conversion in md/dm-path-selector.c
  BUG_ON() Conversion in drivers/isdn
  BUG_ON() Conversion in drivers/char
  BUG_ON() Conversion in drivers/mtd/
2006-03-26 09:41:18 -08:00
Matthew Dobson
93d2341c75 [PATCH] mempool: use mempool_create_slab_pool()
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Matthew Dobson
26b6e051bc [PATCH] mempool: use common mempool kzalloc allocator
This patch changes a mempool user, which is basically just a wrapper around
kzalloc(), to use the common mempool_kmalloc/kfree, rather than its own
wrapper function, removing duplicated code.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:59 -08:00
Matthew Dobson
0eaae62aba [PATCH] mempool: use common mempool kmalloc allocator
This patch changes several mempool users, all of which are basically just
wrappers around kmalloc(), to use the common mempool_kmalloc/kfree, rather
than their own wrapper function, removing a bunch of duplicated code.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:59 -08:00
Matthew Dobson
a19b27ce38 [PATCH] mempool: use common mempool page allocator
Convert two mempool users that currently use their own mempool-backed page
allocators to use the generic mempool page allocator.

Also included are 2 trivial whitespace fixes.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:59 -08:00
Ingo Molnar
14cc3e2b63 [PATCH] sem2mutex: misc static one-file mutexes
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jens Axboe <axboe@suse.de>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
Eric Sesterhenn
547bc92649 BUG_ON() Conversion in md/dm-table.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26 18:22:50 +02:00
Eric Sesterhenn
c163c7293e BUG_ON() Conversion in md/dm-path-selector.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26 18:21:58 +02:00
Linus Torvalds
1e8c573933 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits)
  BUG_ON() Conversion in drivers/video/
  BUG_ON() Conversion in drivers/parisc/
  BUG_ON() Conversion in drivers/block/
  BUG_ON() Conversion in sound/sparc/cs4231.c
  BUG_ON() Conversion in drivers/s390/block/dasd.c
  BUG_ON() Conversion in lib/swiotlb.c
  BUG_ON() Conversion in kernel/cpu.c
  BUG_ON() Conversion in ipc/msg.c
  BUG_ON() Conversion in block/elevator.c
  BUG_ON() Conversion in fs/coda/
  BUG_ON() Conversion in fs/binfmt_elf_fdpic.c
  BUG_ON() Conversion in input/serio/hil_mlc.c
  BUG_ON() Conversion in md/dm-hw-handler.c
  BUG_ON() Conversion in md/bitmap.c
  The comment describing how MS_ASYNC works in msync.c is confusing
  rcu: undeclared variable used in documentation
  fix typos "wich" -> "which"
  typo patch for fs/ufs/super.c
  Fix simple typos
  tabify drivers/char/Makefile
  ...
2006-03-25 08:41:09 -08:00
Adrian Bunk
7e31765550 [PATCH] md/bitmap.c:bitmap_mask_state(): fix inconsequent NULL checking
We dereference bitmap both one line above and one line below this check
rendering this check quite useless.

Spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:57 -08:00
Eric Sesterhenn
4401d13899 BUG_ON() Conversion in md/dm-hw-handler.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:36:27 +01:00
Eric Sesterhenn
5daf2cf19a BUG_ON() Conversion in md/bitmap.c
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:35:26 +01:00
Jens Axboe
2056a782f8 [PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-23 20:00:26 +01:00
Alasdair G Kergon
d2044a94e8 [PATCH] dm: bio split bvec fix
The code that handles bios that span table target boundaries by breaking
them up into smaller bios will not split an individual struct bio_vec into
more than two pieces.  Sometimes more than that are required.

This patch adds a loop to break the second piece up into as many pieces as
are necessary.

Cc: "Abhishek Gupta" <abhishekgupt@gmail.com>
Cc: Dan Smith <danms@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:55 -08:00
Al Viro
1312f40e11 [PATCH] regularize blk_cleanup_queue() use
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:34:20 -05:00
Kevin Corry
8ba32fde2c [PATCH] dm stripe: Fix bounds
The dm-stripe target currently does not enforce that the size of a stripe
device be a multiple of the chunk-size.  Under certain conditions, this can
lead to I/O requests going off the end of an underlying device.  This
test-case shows one example.

echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0
echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1
echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \
   dmsetup create stripe0
dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k

This will produce the output:
dd: writing '/dev/mapper/stripe0': Input/output error
97+0 records in
96+0 records out

And in the kernel log will be:
attempt to access beyond end of device
dm-0: rw=0, want=104, limit=100

The patch will check that the table size is a multiple of the stripe
chunk-size when the table is created, which will prevent the above striped
device from being created.

This should not affect tools like LVM or EVMS, since in all the cases I can
think of, striped devices are always created with the sizes being a
multiple of the chunk-size.

The size of a stripe device must be a multiple of its chunk-size.

(akpm: that typecast is quite gratuitous)

Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-17 07:51:25 -08:00
NeilBrown
04b857f74c [PATCH] md: Fix several raid1 bugs which cause a memory leak
- wrong test for 'is this a BARRIER bio'
- not freeing on all possible paths.
- using r1_bio after freeing it.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09 19:47:37 -08:00
Jun'ichi Nomura
63d94e482d [PATCH] dm: free minor after unlink gendisk
Minor number should be freed after del_gendisk().  Otherwise, there could
be a window where 2 registered gendisk has same minor number.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-24 14:31:39 -08:00
Jun'ichi Nomura
d9dde59ba0 [PATCH] dm: missing bdput/thaw_bdev at removal
Need to unfreeze and release bdev otherwise the bdev inode with
inconsistent state is reused later and cause problem.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-24 14:31:39 -08:00
NeilBrown
8ed75463b9 [PATCH] md: Make sure rdev->size gets set for version-1 superblocks
Sometimes it doesn't so make the code more like the version-0 code which
works.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03 08:32:00 -08:00
NeilBrown
29fc7e3e70 [PATCH] md: Assorted little md fixes
- version-1 superblock
  + The default_bitmap_offset is in sectors, not bytes.
  + the 'size' field in the superblock is in sectors, not KB
- raid0_run should return a negative number on error, not '1'
- raid10_read_balance should not return a valid 'disk' number if
     ->rdev turned out to be NULL
- kmem_cache_destroy doesn't like being passed a NULL.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03 08:32:00 -08:00
NeilBrown
284ae7cab0 [PATCH] md: Handle overflow of mdu_array_info_t->size better
mdu_array_info_t->size is 'int', which isn't big enough for the size (in KB of
each component in) some arrays.

So rather than a random overflow, set size to -1 when it cannot be set
correctly.

To update aspect on an array, userspace will sometimes:
  get_array_info
  change one field
  set_array_info

in this case, we don't want the '-1' in 'size' to change to size, or look like
a size change at all.  So test for that in update_array_info.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03 08:31:59 -08:00