Commit Graph

175897 Commits

Author SHA1 Message Date
NeilBrown 43a705076e md: support updating bitmap parameters via sysfs.
A new attribute directory 'bitmap' in 'md' is created which
contains files for configuring the bitmap.
'location' identifies where the bitmap is: 'none', 'file',
or a sector offset from the metadata.
Writing 'location' can create or remove a bitmap.
Adding a 'file' bitmap this way is not yet supported.
'chunksize' and 'time_base' must be set before 'location'
can be set.

'chunksize' can be set before creating a bitmap, but is
currently always overridden by the bitmap superblock.

'time_base' and 'backlog' can be updated at any time.


Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
2009-12-14 12:51:41 +11:00
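Writing 'location' selects one of three forms. A minimal userspace sketch of parsing such a value is below. It assumes the sector offset is written with an explicit sign, such as "+8" or "-64", and that a single trailing newline (as left by echo) is tolerated; both are assumptions, not taken from the commit. The names enum bitmap_loc_kind, struct bitmap_loc and parse_bitmap_location are hypothetical and are not the md code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of a parsed 'location' value. */
enum bitmap_loc_kind { LOC_NONE, LOC_FILE, LOC_OFFSET };

struct bitmap_loc {
    enum bitmap_loc_kind kind;
    long long offset; /* sectors relative to the metadata; only for LOC_OFFSET */
};

/* Parse "none", "file" or a signed sector offset such as "+8" or "-64".
 * Returns 0 on success, -EINVAL on malformed input. */
int parse_bitmap_location(const char *buf, struct bitmap_loc *out)
{
    size_t len = strlen(buf);
    char *end;
    long long v;

    if (len > 0 && buf[len - 1] == '\n')
        len--; /* ignore one trailing newline */

    if (len == 4 && strncmp(buf, "none", 4) == 0) {
        out->kind = LOC_NONE;
        return 0;
    }
    if (len == 4 && strncmp(buf, "file", 4) == 0) {
        out->kind = LOC_FILE;
        return 0;
    }
    /* Assumed syntax: offsets always carry an explicit '+' or '-'. */
    if (len == 0 || (buf[0] != '+' && buf[0] != '-'))
        return -EINVAL;

    errno = 0;
    v = strtoll(buf, &end, 10);
    /* Reject overflow and anything other than digits after the sign. */
    if (errno != 0 || end != buf + len)
        return -EINVAL;
    out->kind = LOC_OFFSET;
    out->offset = v;
    return 0;
}
```

Keeping the sign mandatory makes "+8" and "8" distinguishable, so a bare number can be rejected rather than silently interpreted.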
NeilBrown 72e02075a3 md: factor out parsing of fixed-point numbers
safe_delay_store can parse fixed-point numbers (for fractions
of a second).  We will want to do that for another sysfs
file soon, so factor out the code.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
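The factored-out parsing can be pictured as a helper that turns a decimal string into an integer scaled by a power of ten. The sketch below is hypothetical: the name parse_fixed_point, the scale argument (number of decimal places kept, e.g. 3 for milliseconds when the input is in seconds) and the choice to truncate surplus digits are assumptions for illustration, not the md helper itself.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical helper: parse a non-negative decimal such as "1.25" into an
 * integer scaled by 10^scale ("1.25" with scale 3 yields 1250).
 * Digits beyond `scale` decimal places are truncated. A single trailing
 * newline is accepted. Returns 0 on success, -EINVAL on bad input/overflow. */
int parse_fixed_point(const char *cp, unsigned long *res, int scale)
{
    unsigned long result = 0;
    int decimals = -1; /* -1 until the decimal point has been seen */
    int ndigits = 0;

    while (isdigit((unsigned char)*cp) || (*cp == '.' && decimals < 0)) {
        if (*cp == '.') {
            decimals = 0;
        } else {
            ndigits++;
            if (decimals < scale) {
                unsigned int d = (unsigned int)(*cp - '0');
                if (result > (ULONG_MAX - d) / 10)
                    return -EINVAL; /* result * 10 + d would overflow */
                result = result * 10 + d;
                if (decimals >= 0)
                    decimals++;
            }
        }
        cp++;
    }
    if (*cp == '\n')
        cp++;
    if (*cp != '\0' || ndigits == 0)
        return -EINVAL;

    if (decimals < 0)
        decimals = 0;
    /* Pad with zeros until exactly `scale` decimal places are represented. */
    for (; decimals < scale; decimals++) {
        if (result > ULONG_MAX / 10)
            return -EINVAL;
        result *= 10;
    }
    *res = result;
    return 0;
}
```

For example, parse_fixed_point("0.5", &v, 3) stores 500, which a caller could then treat as milliseconds.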
NeilBrown f6af949c56 md: support bitmap offset appropriate for external-metadata arrays.
For md arrays where metadata is managed externally, the kernel does not
know about a superblock, so the superblock offset is 0.
If we want to have a write-intent bitmap near the end of the
devices of such an array, we need to support sector_t-sized offsets.
The offset may also need to be negative, for when the bitmap is
placed before the metadata, so use loff_t instead.

Also add a sanity check that the bitmap does not overlap the data.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
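The overlap check can be sketched as a comparison of two half-open sector ranges, with the bitmap offset kept signed so it may point before the metadata. This is a hypothetical userspace illustration: bitmap_layout_ok and its parameters are invented names, int64_t stands in for loff_t, and all values are assumed to be in 512-byte sectors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout check, all values in sectors.
 * The bitmap lives at sb_start + bitmap_offset; the offset is signed so the
 * bitmap may sit before the metadata (negative) or after it (positive). */
bool bitmap_layout_ok(int64_t sb_start, int64_t bitmap_offset,
                      int64_t bitmap_sectors,
                      int64_t data_start, int64_t data_sectors)
{
    int64_t bm_start = sb_start + bitmap_offset;
    int64_t bm_end = bm_start + bitmap_sectors;   /* exclusive */
    int64_t data_end = data_start + data_sectors; /* exclusive */

    if (bm_start < 0 || bitmap_sectors <= 0)
        return false; /* bitmap would start before the device, or is empty */
    /* Half-open ranges [a, b) and [c, d) overlap iff a < d && c < b. */
    return !(bm_start < data_end && data_start < bm_end);
}
```

With external metadata the superblock offset is 0, so a bitmap just past the data would be described by a positive offset such as +8, while one placed in front of an in-band superblock uses a negative offset.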
NeilBrown 9cd30fdc33 md: remove needless setting of thread->timeout in raid10_quiesce
As bitmap_create and bitmap_destroy already set thread->timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown 1b04be96f6 md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.
This removes a lot of multiplications by HZ.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown 42a04b5078 md: move offset, daemon_sleep and chunksize out of bitmap structure
... and into bitmap_info.  These are all configuration parameters
that need to be set before the bitmap is created.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown c3d9714e88 md: collect bitmap-specific fields into one structure.
In preparation for making bitmap fields configurable via sysfs,
start tidying up by making a single structure to contain the
configuration fields.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
NeilBrown 709ae4879a md/raid1: add takeover support for raid5->raid1
A 2-device raid5 array can now be converted to raid1.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
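One way to see why the 2-device case is the one that converts: each stripe of a 2-device RAID5 holds one data chunk and one parity chunk, and the XOR parity of a single chunk is that chunk itself. Both devices therefore carry the same bytes at every stripe, which is the layout of a two-way mirror. The sketch below only demonstrates the XOR property; xor_parity is a hypothetical helper, not the raid5 code.

```c
#include <stddef.h>
#include <string.h>

/* RAID5-style parity: XOR of all data chunks in one stripe. */
void xor_parity(unsigned char *parity, const unsigned char *const *data,
                size_t ndata, size_t chunk_len)
{
    memset(parity, 0, chunk_len);
    for (size_t i = 0; i < ndata; i++)
        for (size_t j = 0; j < chunk_len; j++)
            parity[j] ^= data[i][j];
}
```

Calling xor_parity with ndata == 1 (the 2-device case) yields a parity chunk identical to the data chunk; with two or more data chunks it generally does not, which is why larger RAID5 arrays are not mirrors.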
NeilBrown 6eef4b21ff md: add honouring of suspend_{lo,hi} to raid1.
This will allow us to stop writeout to portions of the array
while they are resynced by someone else - e.g. another node in
a cluster.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:40 +11:00
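Honouring suspend_lo and suspend_hi amounts to holding back any write that touches the suspended window. A minimal sketch, assuming the window is the half-open range [suspend_lo, suspend_hi) and that an empty window suspends nothing; write_must_wait is a hypothetical name and sector_t is redefined locally for userspace.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t; /* local stand-in for the kernel type */

/* Return true if a write covering [start, start + nr_sectors) touches the
 * suspended window [suspend_lo, suspend_hi) and therefore has to wait. */
bool write_must_wait(sector_t start, sector_t nr_sectors,
                     sector_t suspend_lo, sector_t suspend_hi)
{
    sector_t end = start + nr_sectors; /* exclusive */

    if (suspend_lo >= suspend_hi || nr_sectors == 0)
        return false; /* empty window or empty write: nothing to block */
    return start < suspend_hi && suspend_lo < end;
}
```

A write that ends exactly at suspend_lo, or starts exactly at suspend_hi, proceeds; only writes that share at least one sector with the window are held.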
NeilBrown 729a18663a md/raid5: don't complete make_request on barrier until writes are scheduled
The post-barrier-flush is sent by md as soon as make_request on the
barrier write completes.  For raid5, the data might not be in the
per-device queues yet.  So for barrier requests, wait for any
pre-reading to be done so that the request will be in the per-device
queues.

We use the 'preread_active' count to check that nothing is still in
the preread phase, and delay the decrement of this count until after
write requests have been submitted to the underlying devices.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:40 +11:00
NeilBrown a2826aa92e md: support barrier requests on all personalities.
Previously barriers were only supported on RAID1.  This is because
other levels require synchronisation across all devices and so needed
a different approach.
Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active
device.  When that completes - and if the original request was not
empty - we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero-length barriers.

The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is
allowed to fail.  If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail.  That would be way too hard to deal with.
So if the first run of zero-length barriers succeeds, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.

RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet.  So we flush the stripe cache before
proceeding with the barrier.

Note that the second set of zero-length barriers is submitted
immediately after the original request is submitted.  Thus when
a personality finds mddev->barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.

That will be done in later patches.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Andre Noll <maan@systemlinux.org>
2009-12-14 12:49:49 +11:00
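The sequence described above can be modelled synchronously in userspace. The sketch is hypothetical throughout: struct member, struct array_model, handle_barrier and the fail_from knob are invented to make the control flow testable, asynchrony and the blocking of later requests are not modelled, and the data write stands in for the personality's make_request with the barrier flag cleared.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical member device: zero-length barriers fail from the
 * fail_from-th one onwards (0 means they never fail). */
struct member {
    int fail_from;
    int barriers_seen; /* zero-length barriers received so far */
};

struct array_model {
    struct member *devs;
    size_t ndevs;
    int data_result; /* what the data write (flag cleared) returns */
    int data_writes; /* number of data writes issued */
};

/* Send a zero-length barrier to every member; return the first error. */
static int barrier_all(struct array_model *a)
{
    int err = 0;
    for (size_t i = 0; i < a->ndevs; i++) {
        struct member *m = &a->devs[i];
        m->barriers_seen++;
        if (m->fail_from && m->barriers_seen >= m->fail_from && !err)
            err = -EIO;
    }
    return err;
}

/* 1. zero-length barriers to all members; any failure fails the request,
 * 2. for a non-empty request, submit the data with the barrier flag cleared,
 * 3. a second run of zero-length barriers whose errors are ignored. */
int handle_barrier(struct array_model *a, size_t len)
{
    int err = barrier_all(a);
    if (err || len == 0)
        return err;
    a->data_writes++;
    err = a->data_result;
    (void)barrier_all(a); /* second run: errors deliberately ignored */
    return err;
}
```

Clearing the flag in step 2 means a striped write that partly fails reports an ordinary I/O error instead of a half-failed barrier, which matches the reasoning given in the commit.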
NeilBrown efa593390e md: don't reset curr_resync_completed after an interrupted resync
If a resync/recovery/check/repair is interrupted for some reason, it
can be useful to know exactly where it got up to.
So in that case, do not clear curr_resync_completed.
Initialise it when starting a resync/recovery/... instead.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:49:49 +11:00
NeilBrown c07b70ad32 md: adjust resync_min usefully when resync aborts.
When a 'check' or 'repair' finishes we should clear resync_min
so that a future check/repair will cover the whole array (by default).
However if it is interrupted, we should update resync_min to
where we got up to, so that when the check/repair continues it
just does the remainder of the array.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:49:48 +11:00
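The rule reduces to a two-way choice when a check or repair stops. A minimal sketch, assuming a hypothetical struct resync_state whose completed field records how far the pass got; the names are illustrative, not md's.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t; /* local stand-in for the kernel type */

/* Hypothetical subset of per-array resync state. */
struct resync_state {
    sector_t resync_min; /* where the next check/repair starts */
    sector_t completed;  /* last sector known to be done */
};

/* A finished pass resets the start point so the next pass covers the whole
 * array; an interrupted pass records its progress so the next pass only
 * does the remainder. */
void resync_finished(struct resync_state *s, bool interrupted)
{
    if (interrupted)
        s->resync_min = s->completed;
    else
        s->resync_min = 0;
}
```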
NeilBrown 7820f9e1dd md: remove sparse warning: symbol XXX was not declared.
Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:49:47 +11:00
NeilBrown 8553fe7ec7 md/raid5: remove some sparse warnings.
qd_idx was already declared and given exactly the same value!

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:49:47 +11:00
NeilBrown aa5cbd1038 md/bitmap: protect against bitmap removal while being updated.
A write intent bitmap can be removed from an array while the
array is active.
When this happens, all IO is suspended and flushed before the
bitmap is removed.
However it is possible that bitmap_daemon_work is still running to
clear old bits from the bitmap.  If it is, it can dereference the
bitmap after it has been freed.

So introduce a new mutex to protect bitmap_daemon_work and take it
before destroying a bitmap.

This is suitable for any current -stable kernel.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
2009-12-14 12:49:46 +11:00
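The locking pattern can be illustrated with POSIX threads: the periodic worker and the removal path take the same mutex, so the bitmap is never freed while the worker is using it. This is a hypothetical userspace sketch; struct array, struct bitmap, daemon_work and destroy_bitmap stand in for the kernel objects and are not the actual md code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the array and its bitmap. */
struct bitmap { unsigned long pending_bits; };

struct array {
    pthread_mutex_t bitmap_mutex; /* serialises daemon work vs. removal */
    struct bitmap *bitmap;        /* NULL once removed */
};

/* Periodic worker: clears old bits while holding the mutex, so the bitmap
 * cannot be freed underneath it. Returns the number of bits cleared. */
unsigned long daemon_work(struct array *a)
{
    unsigned long cleared = 0;

    pthread_mutex_lock(&a->bitmap_mutex);
    if (a->bitmap) { /* the bitmap may already have been removed */
        cleared = a->bitmap->pending_bits;
        a->bitmap->pending_bits = 0;
    }
    pthread_mutex_unlock(&a->bitmap_mutex);
    return cleared;
}

/* Removal: detach the bitmap under the same mutex, so any running
 * daemon_work() has finished and later calls see NULL, then free it. */
void destroy_bitmap(struct array *a)
{
    pthread_mutex_lock(&a->bitmap_mutex);
    struct bitmap *b = a->bitmap;
    a->bitmap = NULL;
    pthread_mutex_unlock(&a->bitmap_mutex);
    free(b);
}
```

Freeing after the unlock is safe here because the pointer is detached while the mutex is held and the worker only dereferences it under the same mutex.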
Linus Torvalds f40542532e Merge branch 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6
* 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6:
  IXP4xx: GTWX5715 platform only has two PCI IRQ lines, not four.
  IXP4xx: Introduce IXP4XX_GPIO_IRQ(n) macro and convert IXP4xx platform files.
  IXP4xx: move Gemtek GTWX5715 platform macros to the platform code.
  IXP4xx: Remove unused Motorola PrPMC1100 platform macros.
  IXP4xx: move FSG platform macros to the platform code.
  IXP4xx: move DSM G600 platform macros to the platform code.
  IXP4xx: move NAS100D platform macros to the platform code.
  IXP4xx: move NSLU2 platform macros to the platform code.
  IXP4xx: move Coyote platform macros to the platform code.
  IXP4xx: move AVILA platform macros to the platform code.
  IXP4xx: move IXDP425 platform macros to the platform code.
  IXP4xx: Extend PCI MMIO indirect address space to 1 GB.
  IXP4xx: Fix compilation failure with CONFIG_IXP4XX_INDIRECT_PCI.
  IXP4xx: Drop "__ixp4xx_" prefix from in/out/ioread/iowrite functions for clarity.
  IXP4xx: Rename indirect MMIO primitives from __ixp4xx_* to __indirect_*.
  IXP4xx: Ensure index is positive in irq_to_gpio() and npe_request().
  ARM: fix insl() and outsl() endianness on IXP4xx architecture.
  IXP4xx: Fix normally-disabled debugging text in drivers/net/arm/ixp4xx_eth.c.
  IXP4xx: change the timer base frequency to 66.666000 MHz.
2009-12-12 15:22:22 -08:00
Linus Torvalds f01eb36403 [BKL] add 'might_sleep()' to the outermost lock taker
As shown by the previous patch (6698e3472: "tty: Fix BKL taken under a
spinlock bug introduced in the BKL split") the BKL removal is prone to
some subtle issues, where removing the BKL in one place may in fact make
a previously nested BKL call the new outer call, which is then prone to
nasty deadlocks with other spinlocks.

In general, we should never take the BKL while we're holding a spinlock,
so let's just add a "might_sleep()" to it (even though the BKL doesn't
technically sleep - at least not yet), and we'll get nice warnings the
next time this kind of problem happens during BKL removal.

Acked-and-Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-12 14:46:33 -08:00
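Why the check belongs in the outermost lock taker can be shown with a single-threaded model. The BKL is recursive, so only the acquisition that takes the depth from 0 to 1 really grabs the lock; checking there catches the case where a formerly nested call becomes the outer one while a spinlock is held. Everything below is a hypothetical model (spinlocks_held, bkl_depth and the *_model functions are invented names), not the kernel implementation.

```c
#include <stdio.h>

int spinlocks_held; /* spinlocks currently held by this (only) thread */
int bkl_depth;      /* nesting depth of the recursive big kernel lock */
int warnings;       /* number of might_sleep-style warnings issued */

void spin_lock_model(void)   { spinlocks_held++; }
void spin_unlock_model(void) { spinlocks_held--; }

/* Model of a might_sleep()-style debug check: warn if the caller holds
 * a spinlock, i.e. is in a context where sleeping is not allowed. */
void might_sleep_model(const char *caller)
{
    if (spinlocks_held > 0) {
        warnings++;
        fprintf(stderr, "BUG: %s called with %d spinlock(s) held\n",
                caller, spinlocks_held);
    }
}

/* Only the outermost acquisition performs the check. */
void lock_kernel_model(void)
{
    if (bkl_depth == 0)
        might_sleep_model("lock_kernel");
    bkl_depth++;
}

void unlock_kernel_model(void) { bkl_depth--; }
```

A nested acquisition under a spinlock stays silent, but once the outer acquisition is removed the same call becomes the outermost one and the warning fires.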
Alan Cox 6698e34720 tty: Fix BKL taken under a spinlock bug introduced in the BKL split
The fasync path takes the BKL (it probably doesn't need to in fact)
while holding the file_list spinlock.  You can't do that with the kernel
lock: it causes lock inversions and deadlocks.

Leave the BKL over that bit for the moment.

Identified by AKPM.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-and-Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-12 14:46:21 -08:00
Linus Torvalds 09cea96caa Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits)
  powerpc: Fix usage of 64-bit instruction in 32-bit altivec code
  MAINTAINERS: Add PowerPC patterns
  powerpc/pseries: Track previous CPPR values to correctly EOI interrupts
  powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP
  powerpc: Make "intspec" pointers in irq_host->xlate() const
  powerpc/8xx: DTLB Miss cleanup
  powerpc/8xx: Remove DIRTY pte handling in DTLB Error.
  powerpc/8xx: Start using dcbX instructions in various copy routines
  powerpc/8xx: Restore _PAGE_WRITETHRU
  powerpc/8xx: Add missing Guarded setting in DTLB Error.
  powerpc/8xx: Fixup DAR from buggy dcbX instructions.
  powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions.
  powerpc/8xx: Update TLB asm so it behaves as linux mm expects.
  powerpc/8xx: Invalidate non present TLBs
  powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
  pseries/pseries: Add code to online/offline CPUs of a DLPAR node
  powerpc: stop_this_cpu: remove the cpu from the online map.
  powerpc/pseries: Add kernel based CPU DLPAR handling
  sysfs/cpu: Add probe/release files
  powerpc/pseries: Kernel DLPAR Infrastructure
  ...
2009-12-12 14:27:24 -08:00
Linus Torvalds 6eb7365db6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Overwrite pin config on intel DG45ID board.
  intelhdmi - dont power off HDA link
  ALSA: hrtimer - Fix lock-up
  ALSA: intelhdmi - add channel mapping for typical configurations
  ALSA: intelhdmi - channel mapping applies to Pin
  ALSA: intelhdmi - accept DisplayPort pin
  ALSA: hda - show HBR(High Bit Rate) pin cap in procfs
  ALSA: hda - Fix LED GPIO setup for HP laptops with IDT codecs
  ASoC: Fix build of OMAP sound drivers
  ALSA: opti93x: fix irq releasing if the irq cannot be allocated
2009-12-12 11:40:50 -08:00
Linus Torvalds 9c3936cb69 Merge branch 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (75 commits)
  omap3: Fix OMAP35XX_REV macros
  omap: serial: fix non-empty uart fifo read abort
  omap3: Zoom2/3: Update hsmmc board config params
  omap3 : Enable TWL4030 Keypad for Zoom2 and Zoom3 boards
  omap3: id code detection 3525 vs 3515
  omap3: rx51: Use wl1251 in SPI mode 3
  omap3: zoom2/3: make MMC slot work again
  omap1: htcherald: Update defconfig to include mux support
  omap1: LCD_DMA: Use some define rather than a hexadecimal
  omap: header: remove unused data-type
  omap: arch/arm/plat-omap/devices.c - sort alphabetically
  omap: Correcting GPMC_CONFIG1_DEVICETYPE_NAND
  OMAP3: serial - allow platforms specify which UARTs to initialize
  omap3: cm-t35: add mux initialization
  OMAP4: Sync up omap4430 defconfig
  OMAP4: Remove the secondary wait loop
  OMAP4: AuxCoreBoot registers only accessible in secure mode
  OMAP4: Fix SRAM base and size
  OMAP4: Fix cpu detection
  omap3: pandora: board file updates for .33
  ...
2009-12-12 11:40:13 -08:00
Linus Torvalds 5de76b18d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  be2net: fix error in rx completion processing.
  igbvf: avoid reset storms due to mailbox issues
  igb: fix handling of mailbox collisions between PF/VF
  usb: remove rare pm primitive for conversion to new API
2009-12-12 11:39:09 -08:00
Linus Torvalds 8d0e7fb9d1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slab, kmemleak: pass the correct pointer to kmemleak_erase()
  slab, kmemleak: stop calling kmemleak_erase() unconditionally
  SLAB: Fix unlikely() annotation in __cache_alloc_node()
  SLAB: Fix lockdep annotations for CPU hotplug
  SLUB: Fix __GFP_ZERO unlikely() annotation
  slub: allow stats to be cleared
2009-12-12 11:37:39 -08:00
Linus Torvalds 702a7c7609 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
  sched: Remove forced2_migrations stats
  sched: Fix memory leak in two error corner cases
  sched: Fix build warning in get_update_sysctl_factor()
  sched: Update normalized values on user updates via proc
  sched: Make tunable scaling style configurable
  sched: Fix missing sched tunable recalculation on cpu add/remove
  sched: Fix task priority bug
  sched: cgroup: Implement different treatment for idle shares
  sched: Remove unnecessary RCU exclusion
  sched: Discard some old bits
  sched: Clean up check_preempt_wakeup()
  sched: Move update_curr() in check_preempt_wakeup() to avoid redundant call
  sched: Sanitize fork() handling
  sched: Clean up ttwu() rq locking
  sched: Remove rq->clock coupling from set_task_cpu()
  sched: Consolidate select_task_rq() callers
  sched: Remove sysctl.sched_features
  sched: Protect sched_rr_get_param() access to task->sched_class
  sched: Protect task->cpus_allowed access in sched_getaffinity()
  sched: Fix balance vs hotplug race
  ...

Fixed up conflicts in kernel/sysctl.c (due to sysctl cleanup)
2009-12-12 11:34:10 -08:00