Various SMP balancing algorithms require that the bandwidth period
run in sync.
Possible improvements are moving the rt_bandwidth thing into root_domain
and keeping a span per rt_bandwidth which marks throttled cpus.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Balbir Singh reported:
> 1:mon> t
> [c0000000e7677da0] c000000000067de0 .sys_sched_yield+0x6c/0xbc
> [c0000000e7677e30] c000000000008748 syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 00000400001d09e4
> SP (4000664cb10) is in userspace
> 1:mon> r
> cpu 0x1: Vector: 300 (Data Access) at [c0000000e7677aa0]
> pc: c000000000068e50: .yield_task_fair+0x94/0xc4
> lr: c000000000067de0: .sys_sched_yield+0x6c/0xbc
the check that should have avoided that is:
/*
* Are we the only task in the tree?
*/
if (unlikely(rq->load.weight == curr->se.load.weight))
return;
But I guess that overlooks rt tasks, they also increase the load.
So I guess something like this ought to fix it..
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no need to loop any longer when 'same == 0'.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
turn off sync wakeups by default. They are not needed anymore - the
buddy logic should be smart enough to keep the system from
overscheduling.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The wakeup buddy logic didn't use the same wakeup granularity logic as the
wakeup preemption did, this might cause the ->next buddy to be selected past
the point where we would have preempted had the task been a single running
instance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When using CONFIG_NO_HZ, rq->tick_timestamp is not updated every TICK_NSEC.
We check that the number of skipped ticks matches the clock jump seen in
__update_rq_clock().
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel/sched.c:506: erreur: implicit declaration of function tick_get_tick_sched
kernel/sched.c:506: erreur: invalid type argument of ->
kernel/sched.c:506: erreur: NOHZ_MODE_INACTIVE undeclared (first use in this function)
kernel/sched.c:506: erreur: (Each undeclared identifier is reported only once
kernel/sched.c:506: erreur: for each function it appears in.)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Alexey Zaytsev reported (and bisected) that the introduction of
cpu_clock() in printk made the timestamps jump back and forth.
Make cpu_clock() more reliable while still keeping it fast when it's
called frequently.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename the se_str and se_rule audit fields elements to
lsm_str and lsm_rule to avoid confusion.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Setup the new Audit LSM hooks for SELinux.
Remove the now redundant exported SELinux Audit interface.
Audit: Export 'audit_krule' and 'audit_field' to the public
since their internals are needed by the implementation of the
new LSM hook 'audit_rule_known'.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Convert Audit to use the new LSM Audit hooks instead of
the exported SELinux interface.
Basically, use:
security_audit_rule_init
secuirty_audit_rule_free
security_audit_rule_known
security_audit_rule_match
instad of (respectively) :
selinux_audit_rule_init
selinux_audit_rule_free
audit_rule_has_selinux
selinux_audit_rule_match
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Stop using the following exported SELinux interfaces:
selinux_get_inode_sid(inode, sid)
selinux_get_ipc_sid(ipcp, sid)
selinux_get_task_sid(tsk, sid)
selinux_sid_to_string(sid, ctx, len)
kfree(ctx)
and use following generic LSM equivalents respectively:
security_inode_getsecid(inode, secid)
security_ipc_getsecid*(ipcp, secid)
security_task_getsecid(tsk, secid)
security_sid_to_secctx(sid, ctx, len)
security_release_secctx(ctx, len)
Call security_release_secctx only if security_secid_to_secctx
succeeded.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Reviewed-by: Paul Moore <paul.moore@hp.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
clocksource: make clocksource watchdog cycle through online CPUs
Documentation: move timer related documentation to a single place
clockevents: optimise tick_nohz_stop_sched_tick() a bit
locking: remove unused double_spin_lock()
hrtimers: simplify lockdep handling
timers: simplify lockdep handling
posix-timers: fix shadowed variables
timer_list: add annotations to workqueue.c
hrtimer: use nanosleep specific restart_block fields
hrtimer: add nanosleep specific restart_block member
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
Remove DEBUG_SEMAPHORE from Kconfig
Improve semaphore documentation
Simplify semaphore implementation
Add down_timeout and change ACPI to use it
Introduce down_killable()
Generic semaphore implementation
Add semaphore.h to kernel_lock.c
Fix quota.h includes
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (36 commits)
[S390] Remove code duplication from monreader / dcssblk.
[S390] kernel: show last breaking-event-address on oops
[S390] lowcore: Change type of lowcores softirq_pending to __u32.
[S390] zcrypt: Comments and kernel-doc cleanup
[S390] uaccess: Always access the correct address space.
[S390] Fix a lot of sparse warnings.
[S390] Convert s390 to GENERIC_CLOCKEVENTS.
[S390] genirq/clockevents: move irq affinity prototypes/inlines to interrupt.h
[S390] Convert monitor calls to function calls.
[S390] qdio (new feature): enhancing info-retrieval from QDIO-adapters
[S390] replace remaining __FUNCTION__ occurrences
[S390] remove redundant display of free swap space in show_mem()
[S390] qdio: remove outdated developerworks link.
[S390] Add debug_register_mode() function to debug feature API
[S390] crypto: use more descriptive function names for init/exit routines.
[S390] switch sched_clock to store-clock-extended.
[S390] zcrypt: add support for large random numbers
[S390] hw_random: allow rng_dev_read() to return hardware errors.
[S390] Vertical cpu management.
[S390] cpu topology support for s390.
...
This breaks out the ptrace handling from get_signal_to_deliver into a
new subroutine. The actual code there doesn't change, and it gets
inlined into nearly identical compiled code. This makes the function
substantially shorter and thus easier to read, and it nicely isolates
the ptrace magic.
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When I ran a test program to fork mass processes and at the same time
'cat /cgroup/tasks', I got the following oops:
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:72!
invalid opcode: 0000 [#1] SMP
Pid: 4178, comm: a.out Not tainted (2.6.25-rc9 #72)
...
Call Trace:
[<c044a5f9>] ? cgroup_exit+0x55/0x94
[<c0427acf>] ? do_exit+0x217/0x5ba
[<c0427ed7>] ? do_group_exit+0.65/0x7c
[<c0427efd>] ? sys_exit_group+0xf/0x11
[<c0404842>] ? syscall_call+0x7/0xb
[<c05e0000>] ? init_cyrix+0x2fa/0x479
...
EIP: [<c04df671>] list_del+0x35/0x53 SS:ESP 0068:ebc7df4
---[ end trace caffb7332252612b ]---
Fixing recursive fault but reboot is needed!
After digging into the code and debugging, I finlly found out a race
situation:
do_exit()
->cgroup_exit()
->if (!list_empty(&tsk->cg_list))
list_del(&tsk->cg_list);
cgroup_iter_start()
->cgroup_enable_task_cg_list()
->list_add(&tsk->cg_list, ..);
In this case the list won't be deleted though the process has exited.
We got two bug reports in the past, which seem to be the same bug as
this one:
http://lkml.org/lkml/2008/3/5/332http://lkml.org/lkml/2007/10/17/224
Actually sometimes I got oops on list_del, sometimes oops on list_add.
And I can change my test program a bit to trigger other oops.
The patch has been tested both on x86_32 and x86_64.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On the ppc 4xx architecture the instruction cache must be flushed as
well as the data cache. This patch just makes it generic for all
architectures where CACHE_FLUSH_IS_SAFE is set to 1.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the problem of protecting the kgdb handle_exception exit
which had an NMI race condition, while trying to restore
normal system operation.
There was a small window after the master processor sets cpu_in_debug
to zero but before it has set kgdb_active to zero where a
non-master processor in an SMP system could receive an NMI and
re-enter the kgdb_wait() loop.
As long as the master processor sets the cpu_in_debug before sending
the cpu roundup the cpu_in_debug variable can also be used to guard
against the race condition.
The kgdb_wait() function no longer needs to check
kgdb_active because it is done in the arch specific code
and handled along with the nmi traps at the low level.
This also allows kgdb_wait() to exit correctly if it was
entered for some unknown reason due to a spurious NMI that
could not be handled by the arch specific code.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>