Commit Graph

22961 Commits

Author SHA1 Message Date
Linus Torvalds
a7f75d3bed Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: re-tune NUMA topologies
  sched: stop wake_affine from causing serious imbalance
  sched: fix sched_clock_cpu()
  revert ("sched: fair-group: SMP-nice for group scheduling")
  sched: cleanup
  show_schedstat(): fix memleak
  sched: unite unlikely pairs in rt_policy() and schedule_debug()
  revert ("sched: fair: weight calculations")
2008-05-29 09:26:17 -07:00
Ingo Molnar
6715930654 Merge commit 'linus/master' into sched-fixes-for-linus 2008-05-29 16:05:05 +02:00
Ingo Molnar
ea3f01f8af sched: re-tune NUMA topologies
improve the sysbench ramp-up phase and its peak throughput on
a 16way NUMA box, by turning on WAKE_AFFINE:

             tip/sched   tip/sched+wake-affine
-------------------------------------------------
    1:             700              830    +15.65%
    2:            1465             1391    -5.28%
    4:            3017             3105    +2.81%
    8:            5100             6021    +15.30%
   16:           10725            10745    +0.19%
   32:           10135            10150    +0.16%
   64:            9338             9240    -1.06%
  128:            8599             8252    -4.21%
  256:            8475             8144    -4.07%
-------------------------------------------------
  SUM:           57558            57882    +0.56%

this change also improves lat_ctx from 6.69 usecs to 1.11 usec:

  $ ./lat_ctx -s 0 2
  "size=0k ovr=1.19
  2 1.11

  $ ./lat_ctx -s 0 2
  "size=0k ovr=1.22
  2 6.69

in sysbench it's an overall win with some weakness at the lots-of-clients
side. That happens because we now under-balance this workload
a bit. To counter that effect, turn on NEWIDLE:

              wake-idle          wake-idle+newidle
 -------------------------------------------------
     1:             830              834    +0.43%
     2:            1391             1401    +0.65%
     4:            3105             3091    -0.43%
     8:            6021             6046    +0.42%
    16:           10745            10736    -0.08%
    32:           10150            10206    +0.55%
    64:            9240             9533    +3.08%
   128:            8252             8355    +1.24%
   256:            8144             8384    +2.87%
 -------------------------------------------------
   SUM:           57882            58591    +1.21%

as a bonus this not only improves the many-clients case but
also improves the (more important) rampup phase.

sysbench is a workload that quickly breaks down if the
scheduler over-balances, so since it showed an improvement
under NEWIDLE this change is definitely good.
2008-05-29 14:46:30 +02:00
Ingo Molnar
6363ca57c7 revert ("sched: fair-group: SMP-nice for group scheduling")
Yanmin Zhang reported:

Comparing with 2.6.25, volanoMark has big regression with kernel 2.6.26-rc1.
It's about 50% on my 8-core stoakley, 16-core tigerton, and Itanium Montecito.

With bisect, I located the following patch:

| 18d95a2832 is first bad commit
| commit 18d95a2832
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date:   Sat Apr 19 19:45:00 2008 +0200
|
|     sched: fair-group: SMP-nice for group scheduling

Revert it so that we get v2.6.25 behavior.

Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-29 11:28:57 +02:00
Linus Torvalds
3897b82c35 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Workaround for RSE issue
2008-05-28 12:58:12 -07:00
David Howells
0a2ce2ffc3 Fix FRV minimum slab/kmalloc alignment
> +#define	ARCH_KMALLOC_MINALIGN		(sizeof(long) * 2)
> +#define	ARCH_SLAB_MINALIGN		(sizeof(long) * 2)

This doesn't work if SLAB is selected and slab debugging is enabled as
these are passed to the preprocessor, and the preprocessor doesn't
understand sizeof.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-28 09:05:28 -07:00
Linus Torvalds
b4412323cc Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cfq-iosched: fix RCU problem in cfq_cic_lookup()
  block: make blktrace use per-cpu buffers for message notes
  Added in elevator switch message to blktrace stream
  Added in MESSAGE notes for blktraces
  block: reorder cfq_queue to save space on 64bit builds
  block: Move the second call to get_request to the end of the loop
  splice: handle try_to_release_page() failure
  splice: fix sendfile() issue with relay
2008-05-28 08:00:51 -07:00
David Howells
dc1d60a014 FRV: Specify the minimum slab/kmalloc alignment
Specify the minimum slab/kmalloc alignment to be 8 bytes.  This fixes a
crash when SLOB is selected as the memory allocator.  The FRV arch needs
this so that it can use the load- and store-double instructions without
faulting.  By default SLOB sets the minimum to be 4 bytes.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-28 07:59:06 -07:00
Vegard Nossum
5e55843bb8 MN10300: Fix typo in header guard
Fix a typo in the header guard of asm/ipc.h.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-28 07:59:06 -07:00
Jens Axboe
64565911cd block: make blktrace use per-cpu buffers for message notes
Currently it uses a single static char array, but that risks
being corrupted when multiple users issue message notes at the
same time. Make the buffers dynamically allocated when the trace
is setup and make them per-cpu instead.

The default max message size of 1k is also very large, the
interface is mainly for small text notes. So shrink it to 128 bytes.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-05-28 14:49:27 +02:00
Alan D. Brunelle
9d5f09a424 Added in MESSAGE notes for blktraces
Allows messages to be inserted into blktrace streams.

Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-05-28 14:49:27 +02:00
Tony Luck
4dcc29e157 [IA64] Workaround for RSE issue
Problem: An application violating the architectural rules regarding
operation dependencies and having specific Register Stack Engine (RSE)
state at the time of the violation, may result in an illegal operation
fault and invalid RSE state.  Such faults may initiate a cascade of
repeated illegal operation faults within OS interruption handlers.
The specific behavior is OS dependent.

Implication: An application causing an illegal operation fault with
specific RSE state may result in a series of illegal operation faults
and an eventual OS stack overflow condition.

Workaround: OS interruption handlers that switch to kernel backing
store implement a check for invalid RSE state to avoid the series
of illegal operation faults.

The core of the workaround is the RSE_WORKAROUND code sequence
inserted into each invocation of the SAVE_MIN_WITH_COVER and
SAVE_MIN_WITH_COVER_R19 macros.  This sequence includes hard-coded
constants that depend on the number of stacked physical registers
being 96.  The rest of this patch consists of code to disable this
workaround should this not be the case (with the presumption that
if a future Itanium processor increases the number of registers, it
would also remove the need for this patch).

Move the start of the RBS up to a mod32 boundary to avoid some
corner cases.

The dispatch_illegal_op_fault code outgrew the spot it was
squatting in when built with this patch and CONFIG_VIRT_CPU_ACCOUNTING=y
Move it out to the end of the ivt.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-05-27 13:24:39 -07:00
Oleg Nesterov
cbaffba12c posix timers: discard SI_TIMER signals on exec
Based on Roland's patch. This approach was suggested by Austin Clements
from the very beginning, and then by Linus.

As Austin pointed out, the execing task can be killed by SI_TIMER signal
because exec flushes the signal handlers, but doesn't discard the pending
signals generated by posix timers. Perhaps not a bug, but people find this
surprising. See http://bugzilla.kernel.org/show_bug.cgi?id=10460

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Austin Clements <amdragon+kernelbugzilla@mit.edu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-26 10:37:07 -07:00
Linus Torvalds
84a881657d Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  i2c: Align i2c_device_id
  tuner: Do not alter i2c_client.name
2008-05-26 10:24:06 -07:00
Linus Torvalds
0dfdf77ab8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: global_reg_snapshot is not for userspace
2008-05-26 10:14:37 -07:00
Linus Torvalds
c5e6fd28e5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (52 commits)
  vlan: Use bitmask of feature flags instead of seperate feature bits
  fmvj18x_cs: add NextCom NC5310 rev B support
  xirc2ps_cs: re-initialize the multicast address in do_reset
  3C509: rx_bytes should not be increased when alloc_skb failed
  NETFRONT: Use __skb_queue_purge()
  VIRTIO: Use __skb_queue_purge()
  phylib: do EXPORT_SYMBOL on get_phy_id
  netlink: Fix nla_parse_nested_compat() to call nla_parse() directly
  WAN: protect HDLC proto list while insmod/rmmod
  drivers/net/fs_enet: remove null pointer dereference
  S2io: Version update for napi and MSI-X patches
  S2io: Added napi support when MSIX is enabled.
  S2io: Move all the transmit completions to a single msi-x (alarm) vector
  drivers/net/ehea - remove unnecessary memset after kzalloc
  au1000_eth: remove useless check
  Blackfin EMAC Driver: Removed duplicated include <linux/ethtool.h>
  cpmac bugfixes and enhancements
  e1000e: use resource_size_t, not unsigned long, for phys addrs
  net/usb: add support for Apple USB Ethernet Adapter
  uli526x: add support for netpoll
  ...
2008-05-26 10:14:02 -07:00
Jiri Slaby
2548baa07d i2c: Align i2c_device_id
Align i2c_device_id.driver_data to 8 bytes to not fail on crossbuilds.

(Added in d2653e92732bd3911feff6bee5e23dbf959381db.)

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-05-26 16:08:40 +02:00
Adrian Bunk
551dec47bb sparc64: global_reg_snapshot is not for userspace
global_reg_snapshot shouldn't be visible in our userspace headers.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-25 22:50:16 -07:00
Linus Torvalds
eb90d81d03 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86: prevent PGE flush from interruption/preemption
  x86: use explicit copy in vdso_gettimeofday()
  namespacecheck: automated fixes
  x86/xen: fix arbitrary_virt_to_machine()
  x86: don't read maxlvt before checking if APIC is mapped
  x86: disable TSC for sched_clock() when calibration failed
  x86: distangle user disabled TSC from unstable
  x86: fix setup of cyc2ns in tsc_64.c
2008-05-24 10:20:00 -07:00
Linus Torvalds
d3c5f8b93f Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] integrator: fix build warnings and errors
  [ARM] fix OMAP include loops
  Revert "[ARM] pxa: spitz wants PXA27x UDC definitions"
  [ARM] 5053/1: define before use of processor_id
  [ARM] 5052/1: export clock functions for the at91x40
  [ARM] 5051/1: define pgtable_t for the !CONFIG_MMU case too
  [ARM] omap: fix omap clk support build errors
  [ARM] 5039/1: S3C244X: Rename SDI device if running on S3C244X.
  [ARM] 5043/1: pxafb: remove unused mode variable in pxafb_init_fbinfo
  [ARM] 5041/1: VR1000: Fix DM9000 IRQ flags initialisation
  [ARM] 5040/1: BAST: Fix DM9000 IRQ flags initialisation
  [ARM] 5038/1: ARM: OMAP: Remove tsc2102 references from board-palmte.c
  [ARM] 5025/2: fix collie cpu initialisation
2008-05-24 10:13:16 -07:00
Fernando Luis Vazquez Cao
12d15f0d51 for_each_online_pgdat(): kerneldoc fix
for_each_pgdat() was renamed to for_each_online_pgdat() and kerneldoc
comments should be updated accordingly.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:13 -07:00
David Brownell
6ea0205b56 gpio: build fixes
This fixes various gpio-related build errors (mostly potential)
reported in part by Russell King and Uwe Kleine-König.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:13 -07:00
Ben Dooks
cdc83ae245 SM501: reverse FPEN/VBIASEN flags behaviour
To keep backwards compatibility, reverse the meanings of these flags so
that when they are not set, the driver uses the original behvaiour.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:12 -07:00
NeilBrown
dfc7064500 md: restart recovery cleanly after device failure.
When we get any IO error during a recovery (rebuilding a spare), we abort
the recovery and restart it.

For RAID6 (and multi-drive RAID1) it may not be best to restart at the
beginning: when multiple failures can be tolerated, the recovery may be
able to continue and re-doing all that has already been done doesn't make
sense.

We already have the infrastructure to record where a recovery is up to
and restart from there, but it is not being used properly.
This is because:
  - We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
    which causes the recovery not be be checkpointed.
  - We remove spares and then re-added them which loses important state
    information.

The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
needed.  If there is an error, the relevant drive will be marked as
Faulty, and that is enough to ensure correct handling of the error.  So we
first remove MD_RECOVERY_ERR, changing some of the uses of it to
MD_RECOVERY_INTR.

Then we cause the attempt to remove a non-faulty device from an array to
fail (unless recovery is impossible as the array is too degraded).  Then
when remove_and_add_spares attempts to remove the devices on which
recovery can continue, it will fail, they will remain in place, and
recovery will continue on them as desired.

Issue:  If we are halfway through rebuilding a spare and another drive
fails, and a new spare is immediately available,  do we want to:
 1/ complete the current rebuild, then go back and rebuild the new spare or
 2/ restart the rebuild from the start and rebuild both devices in
    parallel.

Both options can be argued for.  The code currently takes option 2 as
  a/ this requires least code change
  b/ this results in a minimally-degraded array in minimal time.

Cc: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00
Bernd Schubert
90b08710e4 md: allow parallel resync of md-devices.
In some configurations, a raid6 resync can be limited by CPU speed
(Calculating P and Q and moving data) rather than by device speed.  In
these cases there is nothing to be gained byt serialising resync of arrays
that share a device, and doing the resync in parallel can provide benefit.
 So add a sysfs tunable to flag an array as being allowed to resync in
parallel with other arrays that use (a different part of) the same device.

Signed-off-by: Bernd Schubert <bs@q-leap.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00