Commit Graph

2934 Commits

Author SHA1 Message Date
David S. Miller
f3baa4827a [COMPAT]: Fix build on COMPAT platforms when CONFIG_NET is disabled.
Add some missing cond_syscall() entries for this case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 21:29:56 -07:00
Rafael J. Wysocki
cc5f916e90 Freezer: do not allow freezing processes to clear TIF_SIGPENDING
Do not allow processes to clear their TIF_SIGPENDING if TIF_FREEZE is set,
so that they will not race with the freezer (like mysqld does, for example).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@suspend2.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-30 08:06:55 -07:00
Balbir Singh
9301899be7 sched: fix /proc/<PID>/stat stime/utime monotonicity, part 2
Extend Peter's patch to fix accounting issues, by keeping stime
monotonic too.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Frans Pop <elendil@planet.nl>
2007-10-30 00:26:32 +01:00
Ingo Molnar
38605cae99 sched: fix style in kernel/sched.c
fallout of recent commits: small coding style fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-29 21:18:11 +01:00
Ingo Molnar
8eb172d941 sched: fix style of swap() macro in kernel/sched_fair.c
fix style of swap() macro in kernel/sched_fair.c.

( this macro should eventually move to a general header, as ext3 uses
  a similar construct too. )

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-29 21:18:11 +01:00
Paul Menage
fe5c7cc228 sched: report CPU usage in CFS cgroup directories
Adds a cpu.usage file to the CFS cgroup that reports CPU usage in
milliseconds for that cgroup's tasks

[ mingo@elte.hu: style cleanups. ]

Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-29 21:18:11 +01:00
Srivatsa Vaddagiri
ae8393e508 sched: move rcu_head to task_group struct
Peter Zijlstra noticed that the rcu_head object need not be present
in every cfs_rq of a group. Move it to the task_group structure
instead.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-29 21:18:11 +01:00
James Bottomley
7bae49d498 sched: fix incorrect assumption that cpu 0 exists
This patch:

commit 9b5b77512d
Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Date:   Mon Oct 15 17:00:09 2007 +0200

    sched: clean up code under CONFIG_FAIR_GROUP_SCHED

Introduced an assumption of the existence of CPU0 via this line

cfs_rq = tg->cfs_rq[0];

If you have no CPU0, that will be NULL.  The fix seems to be just to
take whatever cfs_rq queue comes out of the for_each_possible_cpu()
loop, since they're all equally good for the destruction operation.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-29 21:18:11 +01:00
Peter Zijlstra
73a2bcb0ed sched: keep utime/stime monotonic
keep utime/stime monotonic.

cpustats use utime/stime as a ratio against sum_exec_runtime, as a
consequence it can happen - when the ratio changes faster than time
accumulates - that either can be appear to go backwards.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-29 21:18:11 +01:00
Adrian Bunk
f7402e0361 sched: make kernel/sched.c:account_guest_time() static
account_guest_time() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-29 21:18:10 +01:00
Linus Torvalds
93400708db Merge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt
* git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
  Quieten hrtimer printk: "Switched to high resolution mode .."
  timer_list: Fix printk format strings
  clockevents: unexport tick_nohz_get_sleep_length
2007-10-29 07:47:05 -07:00
Al Viro
ca5cd877ae x86 merge fallout: uml
Don't undef __i386__/__x86_64__ in uml anymore, make sure that (few) places
that required adjusting the ifdefs got those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-29 07:41:32 -07:00
Michael Ellerman
edfed66e17 Quieten hrtimer printk: "Switched to high resolution mode .."
Change the hrtimer printk "Switched to high resolution mode .." to
be KERN_DEBUG, rather than KERN_INFO. If users need to see this they
can pass "loglevel" or "debug" on the command line, or check dmesg.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

 kernel/hrtimer.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
2007-10-29 09:39:38 +01:00
Vegard Nossum
129f1d2c53 timer_list: Fix printk format strings
This makes sure printk format strings contain no more than a single
line.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-29 09:39:38 +01:00
Adrian Bunk
64e38eb082 clockevents: unexport tick_nohz_get_sleep_length
This patch removes the unused 
EXPORT_SYMBOL_GPL(tick_nohz_get_sleep_length).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-29 09:39:38 +01:00
Peter Williams
681f3e6854 sched: isolate SMP balancing code a bit more
At the moment, a lot of load balancing code that is irrelevant to non
SMP systems gets included during non SMP builds.

This patch addresses this issue and reduces the binary size on non
SMP systems:

   text    data     bss     dec     hex filename
  10983      28    1192   12203    2fab sched.o.before
  10739      28    1192   11959    2eb7 sched.o.after

Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:51 +02:00
Peter Williams
e1d1484f72 sched: reduce balance-tasks overhead
At the moment, balance_tasks() provides low level functionality for both
  move_tasks() and move_one_task() (indirectly) via the load_balance()
function (in the sched_class interface) which also provides dual
functionality.  This dual functionality complicates the interfaces and
internal mechanisms and makes the run time overhead of operations that
are called with two run queue locks held.

This patch addresses this issue and reduces the overhead of these
operations.

Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:51 +02:00
Adrian Bunk
a0f846aa76 sched: make cpu_shares_{show,store}() static
cpu_shares_{show,store}() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:50 +02:00
Paul Menage
2b01dfe372 sched: clean up some control group code
- replace "cont" with "cgrp" in a few places in the CFS cgroup code, 
- use write_uint rather than write for cpu.shares write function

Signed-off-by: Paul Menage <menage@google.com>
Acked-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:50 +02:00
Mel Gorman
b3da2a73ff sched: document profile=sleep requiring CONFIG_SCHEDSTATS
profile=sleep only works if CONFIG_SCHEDSTATS is set. This patch notes
the limitation in Documentation/kernel-parameters.txt and prints a
warning at boot-time if profile=sleep is used without CONFIG_SCHEDSTAT.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:50 +02:00
Satyam Sharma
838225b48e sched: use show_regs() to improve __schedule_bug() output
A full register dump along with stack backtrace would make the
"scheduling while atomic" message more helpful. Use show_regs() instead
of dump_stack() for this. We already know we're atomic in here (that is
why this function was called) so show_regs()'s atomicity expectations
are guaranteed.

Also, modify the output of the "BUG: scheduling while atomic:" header a
bit to keep task->comm and task->pid together and preempt_count() after
them.

Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:50 +02:00
Ingo Molnar
4dcf6aff02 sched: clean up sched_domain_debug()
clean up sched_domain_debug().

this also shrinks the code a bit:

   text    data     bss     dec     hex filename
  50474    4306     480   55260    d7dc sched.o.before
  50404    4306     480   55190    d796 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:48 +02:00
Ingo Molnar
b15136e949 sched: fix fastcall mismatch in completion APIs
Jeff Dike noticed that wait_for_completion_interruptible()'s prototype
had a mismatched fastcall.

Fix this by removing the fastcall attributes from all the completion APIs.

Found-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:48 +02:00
Milton Miller
7378547f2c sched: fix sched_domain sysctl registration again
commit  029190c515 (cpuset
sched_load_balance flag) was not tested SCHED_DEBUG enabled as
committed as it dereferences NULL when used and it reordered
the sysctl registration to cause it to never show any domains
or their tunables.

Fixes:

1) restore arch_init_sched_domains ordering
	we can't walk the domains before we build them

	presently we register cpus with empty directories (no domain
	directories or files).

2) make unregister_sched_domain_sysctl do nothing when already unregistered
	detach_destroy_domains is now called one set of cpus at a time
	unregister_syctl dereferences NULL if called with a null.

	While the the function would always dereference null if called
	twice, in the previous code it was always called once and then
	was followed a register.  So only the hidden bug of the
	sysctl_root_table not being allocated followed by an attempt to
	free it would have shown the error.

3) always call unregister and register in partition_sched_domains
	The code is "smart" about unregistering only needed domains.
	Since we aren't guaranteed any calls to unregister, always 
	unregister.   Without calling register on the way out we
	will not have a table or any sysctl tree.

4) warn if register is called without unregistering
	The previous table memory is lost, leaving pointers to the
	later freed memory in sysctl and leaking the memory of the
	tables.

Before this patch on a 2-core 4-thread box compiled for SMT and NUMA,
the domains appear empty (there are actually 3 levels per cpu).  And as
soon as two domains a null pointer is dereferenced (unreliable in this
case is stack garbage):

bu19a:~# ls -R /proc/sys/kernel/sched_domain/
/proc/sys/kernel/sched_domain/:
cpu0  cpu1  cpu2  cpu3

/proc/sys/kernel/sched_domain/cpu0:

/proc/sys/kernel/sched_domain/cpu1:

/proc/sys/kernel/sched_domain/cpu2:

/proc/sys/kernel/sched_domain/cpu3:

bu19a:~# mkdir /dev/cpuset
bu19a:~# mount -tcpuset cpuset /dev/cpuset/
bu19a:~# cd /dev/cpuset/
bu19a:/dev/cpuset# echo 0 > sched_load_balance 
bu19a:/dev/cpuset# mkdir one
bu19a:/dev/cpuset# echo 1 > one/cpus               
bu19a:/dev/cpuset# echo 0 > one/sched_load_balance 
Unable to handle kernel paging request for data at address 0x00000018
Faulting instruction address: 0xc00000000006b608
NIP: c00000000006b608 LR: c00000000006b604 CTR: 0000000000000000
REGS: c000000018d973f0 TRAP: 0300   Not tainted  (2.6.23-bml)
MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28242442  XER: 00000000
DAR: 0000000000000018, DSISR: 0000000040000000
TASK = c00000001912e340[1987] 'bash' THREAD: c000000018d94000 CPU: 2
..
NIP [c00000000006b608] .unregister_sysctl_table+0x38/0x110
LR [c00000000006b604] .unregister_sysctl_table+0x34/0x110
Call Trace:
[c000000018d97670] [c000000007017270] 0xc000000007017270 (unreliable)
[c000000018d97720] [c000000000058710] .detach_destroy_domains+0x30/0xb0
[c000000018d977b0] [c00000000005cf1c] .partition_sched_domains+0x1bc/0x230
[c000000018d97870] [c00000000009fdc4] .rebuild_sched_domains+0xb4/0x4c0
[c000000018d97970] [c0000000000a02e8] .update_flag+0x118/0x170
[c000000018d97a80] [c0000000000a1768] .cpuset_common_file_write+0x568/0x820
[c000000018d97c00] [c00000000009d95c] .cgroup_file_write+0x7c/0x180
[c000000018d97cf0] [c0000000000e76b8] .vfs_write+0xe8/0x1b0
[c000000018d97d90] [c0000000000e810c] .sys_write+0x4c/0x90
[c000000018d97e30] [c00000000000852c] syscall_exit+0x0/0x40

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24 18:23:48 +02:00
Jeff Garzik
3bdf590eac cgroup: kill unused variable
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2007-10-23 21:28:39 -04:00