Remove unused variable, and make code less evil that way. Fix whitespace
around for-loop-like macro.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This cleans spaces between * and pointer up, and adds "int" in "unsigned
int".
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace smp_processor_id() with any_online_cpu(cpu_online_map) in order to
avoid lots of "BUG: using smp_processor_id() in preemptible [00000001]
code:..." messages in case taking a cpu online fails.
All the traces start at the last notifier_call_chain(...) in kernel/cpu.c.
Since we hold the cpu_control semaphore it shouldn't be any problem to access
cpu_online_map.
The reason why cpu_up failed is simply that the cpu that was supposed to be
taken online wasn't even there. That is because on s390 we never know when a
new cpu comes and therefore cpu_possible_map consists of only ones and doesn't
reflect reality.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
!unlikely(expr) hurts my brain. likely(!expr) is more straightforward.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do not transfer remaining time slice to another cpu on process exit.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I recently picked up my older work to remove unnecessary #includes of
sched.h, starting from a patch by Dave Jones to not include sched.h
from module.h. This reduces the number of indirect includes of sched.h
by ~300. Another ~400 pointless direct includes can be removed after
this disentangling (patch to follow later).
However, quite a few indirect includes need to be fixed up for this.
In order to feed the patches through -mm with as little disturbance as
possible, I've split out the fixes I accumulated up to now (complete for
i386 and x86_64, more archs to follow later) and post them before the real
patch. This way this large part of the patch is kept simple with only
adding #includes, and all hunks are independent of each other. So if any
hunk rejects or gets in the way of other patches, just drop it. My scripts
will pick it up again in the next round.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Combine a bit of redundant code between force_sig_info() and
force_sig_specific().
Signed-off-by: paulmck@us.ibm.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes checks for ->si_code == SI_TIMER from send_signal,
specific_send_sig_info, __group_send_sig_info.
I think posix-timers.c used these functions some time ago, now it sends
signals via send_{,group_}sigqueue, so these hooks are unneeded.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch simplifies some checks for magic siginfo values. It should not
change the behaviour in any way.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch replaces hardcoded SEND_SIG_xxx constants with
their symbolic names.
No changes in affected .o files.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Simplify the UP (1 CPU) implementatin of set_cpus_allowed.
The one CPU is hardcoded to be cpu 0 - so just test for that bit, and avoid
having to pick up the cpu_online_map.
Also, unexport cpu_online_map: it was only needed for set_cpus_allowed().
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This change corrects an omission in posix_cpu_timer_schedule, so that it
correctly propagates the overrun calculation to where it will get reported
to the user.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is a rewrite of the one submitted on October 1st, using modules
(http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2).
This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an
intense torture test of the RCU infratructure. This is needed due to the
continued changes to the RCU infrastructure to accommodate dynamic ticks,
CPU hotplug, realtime, and so on. Most of the code is in a separate file
that is compiled only if the CONFIG variable is set. Documentation on how
to run the test and interpret the output is also included.
This code has been tested on i386 and ppc64, and an earlier version of the
code has received extensive testing on a number of architectures as part of
the PREEMPT_RT patchset.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Define jiffies_64 in kernel/timer.c rather than having 24 duplicated
defines in each architecture.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Back about a year ago when I last fiddled heavily with the do_wait code, I
was thinking too hard about the wrong thing and I now think I introduced a
bug whose inverse thought I was fixing.
Apparently noone was looking too hard over much shoulder, so as to cite my
bogus reasoning at the time. In the race condition when PTRACE_ATTACH is
about to steal a child and then the child hits a tracing event (what
my_ptrace_child checks for), the real parent does need to set its flag
noting it has some eligible live children. Otherwise a spurious ECHILD
error is possible, since the child in question is not yet on the
ptrace_children list.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The PF_DEAD setting doesn't belong to exit_notify(), move it to a proper
place.
Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Removes some trailing whitespace
- Breaks long lines and make other small changes to conform to CodingStyle
- Add explicit printk loglevels in two places.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch gets rid of a "statement without effect" warning when
CONFIG_KEYS is disabled by making use of the return value of key_get().
The compiler will optimise all of this away when keys are disabled.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I could seldom reproduce a deadlock with a task not killable in T state
(TASK_STOPPED, not TASK_TRACED) by attaching a NPTL threaded program to
gdb, by segfaulting the task and triggering a core dump while some other
task is executing exit_group and while one task is in ptrace_attached
TASK_STOPPED state (not TASK_TRACED yet). This originated from a gdb
bugreport (the fact gdb was segfaulting the task wasn't a kernel bug), but
I just incidentally noticed the gdb bug triggered a real kernel bug as
well.
Most threads hangs in exit_mm because the core_dumping is still going, the
core dumping hangs because the stopped task doesn't exit, the stopped task
can't wakeup because it has SIGNAL_GROUP_EXIT set, hence the deadlock.
To me it seems that the problem is that the force_sig_specific(SIGKILL) in
zap_threads is a noop if the task has PF_PTRACED set (like in this case
because gdb is attached). The __ptrace_unlink does nothing because the
signal->flags is set to SIGNAL_GROUP_EXIT|SIGNAL_STOP_DEQUEUED (verified).
The above info also shows that the stopped task hit a race and got the stop
signal (presumably by the ptrace_attach, only the attach, state is still
TASK_STOPPED and gdb hangs waiting the core before it can set it to
TASK_TRACED) after one of the thread invoked the core dump (it's the core
dump that sets signal->flags to SIGNAL_GROUP_EXIT).
So beside the fact nobody would wakeup the task in __ptrace_unlink (the
state is _not_ TASK_TRACED), there's a secondary problem in the signal
handling code, where a task should ignore the ptrace-sigstops as long as
SIGNAL_GROUP_EXIT is set (or the wakeup in __ptrace_unlink path wouldn't be
enough).
So I attempted to make this patch that seems to fix the problem. There
were various ways to fix it, perhaps you prefer a different one, I just
opted to the one that looked safer to me.
I also removed the clearing of the stopped bits from the zap_other_threads
(zap_other_threads was safe unlike zap_threads). I don't like useless
code, this whole NPTL signal/ptrace thing is already unreadable enough and
full of corner cases without confusing useless code into it to make it even
less readable. And if this code is really needed, then you may want to
explain why it's not being done in the other paths that sets
SIGNAL_GROUP_EXIT at least.
Even after this patch I still wonder who serializes the read of
p->ptrace in zap_threads.
Patch is called ptrace-core_dump-exit_group-deadlock-1.
This was the trace I've got:
test T ffff81003e8118c0 0 14305 1 14311 14309 (NOTLB)
ffff810058ccdde8 0000000000000082 000001f4000037e1 ffff810000000013
00000000000000f8 ffff81003e811b00 ffff81003e8118c0 ffff810011362100
0000000000000012 ffff810017ca4180
Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff80141677>{finish_stop+87}
<ffffffff8014367f>{get_signal_to_deliver+1359} <ffffffff8010d3ad>{do_signal+157}
<ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80111575>{sys_ptrace+2293}
<ffffffff80131810>{default_wake_function+0} <ffffffff80196399>{sys_ioctl+73}
<ffffffff8010dd27>{sysret_signal+28} <ffffffff8010e00f>{ptregscall_common+103}
test D ffff810011362100 0 14309 1 14305 14312 (NOTLB)
ffff810053c81cf8 0000000000000082 0000000000000286 0000000000000001
0000000000000195 ffff810011362340 ffff810011362100 ffff81002e338040
ffff810001e0ca80 0000000000000001
Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff8044677d>{wait_for_completion+173}
<ffffffff80131810>{default_wake_function+0} <ffffffff80137435>{exit_mm+149}
<ffffffff801381af>{do_exit+479} <ffffffff80138d0c>{do_group_exit+252}
<ffffffff801436db>{get_signal_to_deliver+1451} <ffffffff8010d3ad>{do_signal+157}
<ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80140850>{specific_send_sig_info+2
<ffffffff8014208a>{force_sig_info+186} <ffffffff804479a0>{do_int3+112}
<ffffffff8010e308>{retint_signal+61}
test D ffff81002e338040 0 14311 1 14716 14305 (NOTLB)
ffff81005ca8dcf8 0000000000000082 0000000000000286 0000000000000001
0000000000000120 ffff81002e338280 ffff81002e338040 ffff8100481cb740
ffff810001e0ca80 0000000000000001
Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff8044677d>{wait_for_completion+173}
<ffffffff80131810>{default_wake_function+0} <ffffffff80137435>{exit_mm+149}
<ffffffff801381af>{do_exit+479} <ffffffff80142d0e>{__dequeue_signal+558}
<ffffffff80138d0c>{do_group_exit+252} <ffffffff801436db>{get_signal_to_deliver+1451}
<ffffffff8010d3ad>{do_signal+157} <ffffffff8013deee>{ptrace_check_attach+222}
<ffffffff80140850>{specific_send_sig_info+208} <ffffffff8014208a>{force_sig_info+186}
<ffffffff804479a0>{do_int3+112} <ffffffff8010e308>{retint_signal+61}
test D ffff810017ca4180 0 14312 1 14309 13882 (NOTLB)
ffff81005d15fcb8 0000000000000082 ffff81005d15fc58 ffffffff80130816
0000000000000897 ffff810017ca43c0 ffff810017ca4180 ffff81003e8118c0
0000000000000082 ffffffff801317ed
Call Trace:<ffffffff80130816>{activate_task+150} <ffffffff801317ed>{try_to_wake_up+893}
<ffffffff8044677d>{wait_for_completion+173} <ffffffff80131810>{default_wake_function+0}
<ffffffff8018cdc3>{do_coredump+819} <ffffffff80445f52>{thread_return+82}
<ffffffff801436d4>{get_signal_to_deliver+1444} <ffffffff8010d3ad>{do_signal+157}
<ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80140850>{specific_send_sig_info+2
<ffffffff804472e5>{_spin_unlock_irqrestore+5} <ffffffff8014208a>{force_sig_info+186}
<ffffffff804476ff>{do_general_protection+159} <ffffffff8010e308>{retint_signal+61}
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch automatically updates a tasks NUMA mempolicy when its cpuset
memory placement changes. It does so within the context of the task,
without any need to support low level external mempolicy manipulation.
If a system is not using cpusets, or if running on a system with just the
root (all-encompassing) cpuset, then this remap is a no-op. Only when a
task is moved between cpusets, or a cpusets memory placement is changed
does the following apply. Otherwise, the main routine below,
rebind_policy() is not even called.
When mixing cpusets, scheduler affinity, and NUMA mempolicies, the
essential role of cpusets is to place jobs (several related tasks) on a set
of CPUs and Memory Nodes, the essential role of sched_setaffinity is to
manage a jobs processor placement within its allowed cpuset, and the
essential role of NUMA mempolicy (mbind, set_mempolicy) is to manage a jobs
memory placement within its allowed cpuset.
However, CPU affinity and NUMA memory placement are managed within the
kernel using absolute system wide numbering, not cpuset relative numbering.
This is ok until a job is migrated to a different cpuset, or what's the
same, a jobs cpuset is moved to different CPUs and Memory Nodes.
Then the CPU affinity and NUMA memory placement of the tasks in the job
need to be updated, to preserve their cpuset-relative position. This can
be done for CPU affinity using sched_setaffinity() from user code, as one
task can modify anothers CPU affinity. This cannot be done from an
external task for NUMA memory placement, as that can only be modified in
the context of the task using it.
However, it easy enough to remap a tasks NUMA mempolicy automatically when
a task is migrated, using the existing cpuset mechanism to trigger a
refresh of a tasks memory placement after its cpuset has changed. All that
is needed is the old and new nodemask, and notice to the task that it needs
to rebind its mempolicy. The tasks mems_allowed has the old mask, the
tasks cpuset has the new mask, and the existing
cpuset_update_current_mems_allowed() mechanism provides the notice. The
bitmap/cpumask/nodemask remap operators provide the cpuset relative
calculations.
This patch leaves open a couple of issues:
1) Updating vma and shmfs/tmpfs/hugetlbfs memory policies:
These mempolicies may reference nodes outside of those allowed to
the current task by its cpuset. Tasks are migrated as part of jobs,
which reside on what might be several cpusets in a subtree. When such
a job is migrated, all NUMA memory policy references to nodes within
that cpuset subtree should be translated, and references to any nodes
outside that subtree should be left untouched. A future patch will
provide the cpuset mechanism needed to mark such subtrees. With that
patch, we will be able to correctly migrate these other memory policies
across a job migration.
2) Updating cpuset, affinity and memory policies in user space:
This is harder. Any placement state stored in user space using
system-wide numbering will be invalidated across a migration. More
work will be required to provide user code with a migration-safe means
to manage its cpuset relative placement, while preserving the current
API's that pass system wide numbers, not cpuset relative numbers across
the kernel-user boundary.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for renaming cpusets. Only allow simple rename of cpuset
directories in place. Don't allow moving cpusets elsewhere in hierarchy or
renaming the special cpuset files in each cpuset directory.
The usefulness of this simple rename became apparent when developing task
migration facilities. It allows building a second cpuset hierarchy using
new names and containing new CPUs and Memory Nodes, moving tasks from the
old to the new cpusets, removing the old cpusets, and then renaming the new
cpusets to be just like the old names, so that any knowledge that the tasks
had of their cpuset names will still be valid.
Leaf node cpusets can be migrated to other CPUs or Memory Nodes by just
updating their 'cpus' and 'mems' files, but because no cpuset can contain
CPUs or Nodes not in its parent cpuset, one cannot do this in a cpuset
hierarchy without first expanding all the non-leaf cpusets to contain the
union of both the old and new CPUs and Nodes, which would obfuscate the
one-to-one migration of a task from one cpuset to another required to
correctly migrate the physical page frames currently allocated to that
task.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Overhaul cpuset locking. Replace single semaphore with two semaphores.
The suggestion to use two locks was made by Roman Zippel.
Both locks are global. Code that wants to modify cpusets must first
acquire the exclusive manage_sem, which allows them read-only access to
cpusets, and holds off other would-be modifiers. Before making actual
changes, the second semaphore, callback_sem must be acquired as well. Code
that needs only to query cpusets must acquire callback_sem, which is also a
global exclusive lock.
The earlier problems with double tripping are avoided, because it is
allowed for holders of manage_sem to nest the second callback_sem lock, and
only callback_sem is needed by code called from within __alloc_pages(),
where the double tripping had been possible.
This is not quite the same as a normal read/write semaphore, because
obtaining read-only access with intent to change must hold off other such
attempts, while allowing read-only access w/o such intention. Changing
cpusets involves several related checks and changes, which must be done
while allowing read-only queries (to avoid the double trip), but while
ensuring nothing changes (holding off other would be modifiers.)
This overhaul of cpuset locking also makes careful use of task_lock() to
guard access to the task->cpuset pointer, closing a couple of race
conditions noticed while reading this code (thanks, Roman). I've never
seen these races fail in any use or test.
See further the comments in the code.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove a rather hackish depth counter on cpuset locking. The depth counter
was avoiding a possible double trip on the global cpuset_sem semaphore. It
worked, but now an improved version of cpuset locking is available, to come
in the next patch, using two global semaphores.
This patch reverses "cpuset semaphore depth check deadlock fix"
The kernel still works, even after this patch, except for some rare and
difficult to reproduce race conditions when agressively creating and
destroying cpusets marked with the notify_on_release option, on very large
systems.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove one more useless line from cpuset_common_file_read().
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>