Patches c22402a2f ("sched/fair: Let minimally loaded cpu balance the
group") and 0ce90475 ("sched/fair: Add some serialization to the
sched_domain load-balance walk") are horribly broken so revert them.
The problem is that while it sounds good to have the minimally loaded
cpu do the pulling of more load, the way we walk the domains there is
absolutely no guarantee this cpu will actually get to the domain. In
fact it's very likely it won't. Therefore the higher up the tree we get,
the less likely it is we'll balance at all.
The first cpu of the mask always walks up. While sucky in that it
accumulates load on the first cpu and needs extra passes to spread it
out, it at least guarantees a cpu gets up that far and load-balancing
happens at all.
Since it's now always the first, and idle cpus should always be able to
balance so they get a task as fast as possible, we can also do away
with the added serialization.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-rpuhs5s56aiv1aw7khv9zkw6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit ad7687dde ("x86/numa: Check for nonsensical topologies on real
hw as well") is broken in that the condition can trigger for valid
setups but only changes the end result for invalid setups, with no real
means of discerning between the two.
Rewrite set_cpu_sibling_map() to make the code clearer and make sure
to only warn when the check changes the end result.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-klcwahu3gx467uhfiqjyhdcs@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently we let the leftmost (or first idle) cpu ascend the
sched_domain tree and perform load-balancing. The result is that the
busiest cpu in the group might be performing this function and pull
more load to itself. The next load balance pass will then try to
equalize this again.
Change this to pick the least loaded cpu to perform higher domain
balancing.
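The difference between the two selection policies can be sketched in a few lines. This is only an illustration: first_cpu(), least_loaded_cpu(), the bitmask representation and the load[] array are hypothetical stand-ins, not the scheduler's actual data structures.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical model: a group is a bitmask of CPU numbers and load[]
 * holds one load value per CPU. None of this is kernel code.
 */

/* Lowest-numbered CPU set in mask, or -1 if the mask is empty. */
static int first_cpu(unsigned long mask, size_t ncpus)
{
	assert(ncpus <= sizeof(mask) * CHAR_BIT);

	for (size_t cpu = 0; cpu < ncpus; cpu++)
		if (mask & (1UL << cpu))
			return (int)cpu;
	return -1;
}

/* CPU in mask with the smallest load; ties go to the lowest CPU number. */
static int least_loaded_cpu(unsigned long mask, const unsigned long *load,
			    size_t ncpus)
{
	int best = -1;

	assert(ncpus <= sizeof(mask) * CHAR_BIT);

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (!(mask & (1UL << cpu)))
			continue;
		if (best < 0 || load[cpu] < load[best])
			best = (int)cpu;
	}
	return best;
}
```

With loads {5, 1, 3, 0} and a group of cpus 0-2, the first policy picks cpu 0 (the busiest one) while the second picks cpu 1.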
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-v8zlrmgmkne3bkcy9dej1fvm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since there's a PID space limit of 30 bits (see
futex.h:FUTEX_TID_MASK) and allocating that many tasks (assuming a
lower bound of 2 pages per task) would still take 8T of memory it
seems reasonable to say that unsigned int is sufficient for
rq->nr_running.
When we do get anywhere near that number of tasks I suspect other
things would go funny; load-balancer load computations would really
need to be hoisted to 128 bits, etc.
So save a few bytes and convert rq->nr_running and friends to
unsigned int.
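The arithmetic behind the 8T figure can be checked with a small sketch. It assumes 4 KiB pages, which the text above does not state; the macro and function names are made up for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, 2 pages per task. */
#define PID_BITS        30
#define PAGE_SIZE_BYTES 4096ULL
#define PAGES_PER_TASK  2ULL

/* Largest number of tasks a 30-bit PID space can name. */
static uint64_t max_tasks(void)
{
	return 1ULL << PID_BITS;
}

/* Lower bound on the memory needed to back that many tasks, in bytes. */
static uint64_t min_memory_bytes(void)
{
	return max_tasks() * PAGES_PER_TASK * PAGE_SIZE_BYTES;
}

/* Nonzero if unsigned int can hold every possible task count. */
static int fits_in_unsigned_int(void)
{
	assert(sizeof(unsigned int) * CHAR_BIT >= 32);
	return max_tasks() <= UINT_MAX;
}
```

2^30 tasks times 2 pages times 2^12 bytes is 2^43 bytes, i.e. 8 TiB, and 2^30 fits comfortably in a 32-bit unsigned int.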
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-y3tvyszjdmbibade5bw8zl81@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Allows emulating more interesting NUMA configurations like a quad
socket AMD Magny-Cours:
"numa=fake=8:10,16,16,22,16,22,16,22,
16,10,22,16,22,16,22,16,
16,22,10,16,16,22,16,22,
22,16,16,10,22,16,22,16,
16,22,16,22,10,16,16,22,
22,16,22,16,16,10,22,16,
16,22,16,22,16,22,10,16,
22,16,22,16,22,16,16,10"
Which has a non-fully-connected topology.
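A quick way to see the shape of this table is to check it for symmetry and count the node pairs that are farther apart than the nearest remote distance. The helper names below are hypothetical and the code is a userspace sketch, not the kernel's parser for "numa=fake=".

```c
#include <assert.h>

#define NODES 8

/* The distance table from the example above, one row per node. */
static const int dist[NODES][NODES] = {
	{ 10, 16, 16, 22, 16, 22, 16, 22 },
	{ 16, 10, 22, 16, 22, 16, 22, 16 },
	{ 16, 22, 10, 16, 16, 22, 16, 22 },
	{ 22, 16, 16, 10, 22, 16, 22, 16 },
	{ 16, 22, 16, 22, 10, 16, 16, 22 },
	{ 22, 16, 22, 16, 16, 10, 22, 16 },
	{ 16, 22, 16, 22, 16, 22, 10, 16 },
	{ 22, 16, 22, 16, 22, 16, 16, 10 },
};

/* Nonzero if dist[i][j] == dist[j][i] for every pair of nodes. */
static int is_symmetric(void)
{
	for (int i = 0; i < NODES; i++)
		for (int j = i + 1; j < NODES; j++)
			if (dist[i][j] != dist[j][i])
				return 0;
	return 1;
}

/* Smallest distance between two different nodes. */
static int nearest_remote(void)
{
	int min;

	assert(NODES > 1);
	min = dist[0][1];
	for (int i = 0; i < NODES; i++)
		for (int j = 0; j < NODES; j++)
			if (i != j && dist[i][j] < min)
				min = dist[i][j];
	return min;
}

/* Number of unordered node pairs farther apart than nearest_remote(). */
static int distant_pairs(void)
{
	int near = nearest_remote();
	int count = 0;

	for (int i = 0; i < NODES; i++)
		for (int j = i + 1; j < NODES; j++)
			if (dist[i][j] > near)
				count++;
	return count;
}
```

A fully connected topology would have no distant pairs; here every node has several peers at distance 22 rather than 16.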
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/n/tip-e1136ef7kdffj7yf9tjhydln@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The checks that exist in mwait_usable() for "idle=" kernel
parameters are insufficient. As a result, mwait_usable() can
return 1 even if "idle=nomwait" or "idle=poll" or "idle=halt"
parameters are passed.
Of these cases, incorrect handling of idle=nomwait is a
universal problem since mwait can get used for usual CPU idling.
However the rest of the cases are problematic only during CPU
Hotplug (offline) because, in the CPU offline path, the function
mwait_play_dead() is called, which might result in mwait being
used in the offline CPUs, if mwait_usable() happens to return 1.
Fix these issues by checking for the boot time "idle=" kernel
parameter properly in mwait_usable().
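The intended behaviour can be sketched as below. The enum values and the function name are hypothetical stand-ins for this illustration; they are not the kernel's actual definitions, and the real check may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum mirroring the "idle=" values named above. */
enum idle_override {
	IDLE_DEFAULT,	/* no "idle=" parameter given */
	IDLE_NOMWAIT,
	IDLE_POLL,
	IDLE_HALT,
};

/*
 * Sketch of the fixed check: any of the three overrides named above
 * rules out mwait, regardless of what the CPU supports.
 */
static bool mwait_usable_sketch(enum idle_override idle, bool cpu_has_mwait)
{
	assert(idle >= IDLE_DEFAULT && idle <= IDLE_HALT);

	if (idle != IDLE_DEFAULT)
		return false;

	return cpu_has_mwait;
}
```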
The first issue (usual cpu idling) is demonstrated below:
Before applying the patch (dmesg snippet):
[ 0.000000] Command line: [...] idle=nomwait
[ 0.000000] Kernel command line: [...] idle=nomwait
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.140606] using mwait in idle threads. <======= mwait being used
[ 4.303986] cpuidle: using governor ladder
[ 4.308232] cpuidle: using governor menu
After applying the patch:
[ 0.000000] Command line: [...] idle=nomwait
[ 0.000000] Kernel command line: [...] idle=nomwait
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 4.264100] cpuidle: using governor ladder
[ 4.268342] cpuidle: using governor menu
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: venki@google.com
Cc: suresh.b.siddha@intel.com
Cc: Borislav Petkov <bp@amd64.org>
Cc: lenb@kernel.org
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Link: http://lkml.kernel.org/r/4F9E37B8.30001@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit ce7e5d2d19 ("x86: fix broken TASK_SIZE for ia32_aout") breaks
kernel builds when "CONFIG_IA32_AOUT=m" with
ERROR: "set_personality_ia32" [arch/x86/ia32/ia32_aout.ko] undefined!
make[1]: *** [__modpost] Error 1
The entry point needs to be exported.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Al Viro <viro@zeniv.linux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 fixes from Peter Anvin
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_mid_powerbtn: mark irq as IRQF_NO_SUSPEND
arch/x86/platform/geode/net5501.c: change active_low to 0 for LED driver
x86, relocs: Remove an unused variable
asm-generic: Use __BITS_PER_LONG in statfs.h
x86/amd: Re-enable CPU topology extensions in case BIOS has disabled it
Pull btrfs fixes from Chris Mason:
"The big ones here are a memory leak we introduced in rc1, and a
scheduling while atomic if the transid on disk doesn't match the
transid we expected. This happens for corrupt blocks, or out of date
disks.
It also fixes up the ioctl definition for our ioctl to resolve logical
inode numbers. The __u32 was a merging error and doesn't match what
we ship in the progs."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: avoid sleeping in verify_parent_transid while atomic
Btrfs: fix crash in scrub repair code when device is missing
btrfs: Fix mismatching struct members in ioctl.h
Btrfs: fix page leak when allocing extent buffers
Btrfs: Add properly locking around add_root_to_dirty_list
Setting TIF_IA32 in load_aout_binary() used to be enough; these days
TASK_SIZE is controlled by TIF_ADDR32 and that one doesn't get set
there. Switch to use of set_personality_ia32()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
verify_parent_transid needs to lock the extent range to make
sure no IO is underway, and so it can safely clear the
uptodate bits if our checks fail.
But, a few callers are using it with spinlocks held. Most
of the time, the generation numbers are going to match, and
we don't want to switch to a blocking lock just for the error
case. This adds an atomic flag to verify_parent_transid,
and changes it to return EAGAIN if it needs to block to
properly verify things.
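The calling convention described above can be sketched roughly as follows. struct fake_buffer and verify_transid_sketch() are hypothetical simplifications, the negative errno return follows the usual kernel convention and is an assumption here, and the real function does more work (including the extent range locking) than this model shows.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified buffer: only the fields the sketch needs. */
struct fake_buffer {
	uint64_t generation;	/* generation recorded in the block */
	int uptodate;		/* nonzero while the contents are trusted */
};

/*
 * Returns 0 when the generations match (the common case, no lock needed),
 * -EAGAIN when they differ and the caller cannot block, and 1 after
 * taking the slow path, which stands in for the blocking lock and
 * clears the uptodate state.
 */
static int verify_transid_sketch(struct fake_buffer *eb,
				 uint64_t parent_transid, int atomic)
{
	assert(eb);

	if (eb->generation == parent_transid)
		return 0;

	if (atomic)
		return -EAGAIN;

	/* Blocking path: safe to invalidate the buffer here. */
	eb->uptodate = 0;
	return 1;
}
```

A caller holding a spinlock would pass atomic = 1 and, on -EAGAIN, drop the lock and retry with atomic = 0.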
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Pull alpha fixes from Matt Turner:
"My alpha tree is back up (after taking quite some time to get my GPG
key signed). It contains just some simple fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha: silence 'const' warning in sys_marvel.c
alpha: include module.h to fix modpost on Tsunami
alpha: properly define get/set_rtc_time on Marvel/SMP
alpha: VGA_HOSE depends on VGA_CONSOLE
The test in pdc_console_tty_close '!tty->count' was always wrong
because tty->count is decremented after tty->ops->close is called and
thus can never be zero. Hence the 'then' branch was never executed and
the timer never deleted.
This did not matter until commit 5dd5bc40f3 ("TTY: pdc_cons, use
tty_port"). There we needed to set TTY in tty_port to NULL, but this
never happened due to the bug above.
So change the test to really trigger at the last close by changing the
condition to 'tty->count == 1'.
Well, the driver should not touch tty->count at all. It should use
tty_port->count and track the open count there itself.
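The ordering problem can be reproduced with a toy model. struct fake_tty, driver_close() and core_release() are hypothetical names for this sketch only; the point is just that the close hook runs before the count is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tty; only the open count matters here. */
struct fake_tty {
	int count;		/* number of open handles */
	bool timer_active;	/* stands in for the driver's timer */
};

/* Driver close hook: runs while count still includes the closing handle. */
static void driver_close(struct fake_tty *tty, bool use_fixed_test)
{
	bool last_close = use_fixed_test ? (tty->count == 1) : !tty->count;

	if (last_close)
		tty->timer_active = false;	/* "delete" the timer */
}

/* Core release path: call the driver hook first, then drop the count. */
static void core_release(struct fake_tty *tty, bool use_fixed_test)
{
	assert(tty->count > 0);
	driver_close(tty, use_fixed_test);
	tty->count--;
}
```

With the old test the timer stays active even after the last close; with 'tty->count == 1' it is stopped on the last close and left alone on earlier ones.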
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-and-tested-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull sound fixes from Takashi Iwai:
"As good as nothing exciting here; just a few trivial fixes for various
ASoC stuff."
* tag 'sound-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: omap-pcm: Free dma buffers in case of error.
ASoC: s3c2412-i2s: Fix dai registration
ASoC: wm8350: Don't use locally allocated codec struct
ASoC: tlv312aic23: unbreak resume
ASoC: bf5xx-ssm2602: Set DAI format
ASoC: core: check of_property_count_strings failure
ASoC: dt: sgtl5000.txt: Add description for 'reg' field
ASoC: wm_hubs: Make sure we don't disable differential line outputs
Pull an ACPI patch from Len Brown:
"It fixes a D3 issue new in 3.4-rc1."
By Lin Ming via Len Brown:
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
ACPI: Fix D3hot v D3cold confusion