On one path, cond_resched_lock() fails to return true if it dropped the lock.
We think this might be causing the crashes in JBD's log_do_checkpoint().
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
flush_icache_range() is used in two different situation - in binfmt_elf.c &
co for user space mappings and module.c for kernel modules. On m68k
flush_icache_range() doesn't know which data to flush, as it has separate
address spaces and the pointer argument can be valid in either address
space.
First I considered splitting flush_icache_range(), but this patch is
simpler. Setting the correct context gives flush_icache_range() enough
information to flush the correct data.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The "unhandled interrupts" catcher, note_interrupt(), increments a global
desc->irq_count and grossly damages scaling of very large systems, e.g.,
>192p ia64 Altix, because of this highly contented cacheline, especially
for timer interrupts. 384p is severely crippled, and 512p is unuseable.
All calls to note_interrupt() can be disabled by booting with "noirqdebug",
but this disables the useful interrupt checking for all interrupts.
I propose eliminating note_interrupt() for all per-CPU interrupts. This
was the behavior of linux-2.6.10 and earlier, but in 2.6.11 a code
restructuring added a call to note_interrupt() for per-CPU interrupts.
Besides, note_interrupt() is a bit racy for concurrent CPU calls anyway, as
the desc->irq_count++ increment isn't atomic (which, if done, would make
scaling even worse).
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a race in the kernel cpuset code, between the code
to handle notify_on_release, and the code to remove a cpuset.
The notify_on_release code can end up trying to access a
cpuset that has been removed. In the most common case, this
causes a NULL pointer dereference from the routine cpuset_path.
However all manner of bad things are possible, in theory at least.
The existing code decrements the cpuset use count, and if the
count goes to zero, processes the notify_on_release request,
if appropriate. However, once the count goes to zero, unless we
are holding the global cpuset_sem semaphore, there is nothing to
stop another task from immediately removing the cpuset entirely,
and recycling its memory.
The obvious fix would be to always hold the cpuset_sem
semaphore while decrementing the use count and dealing with
notify_on_release. However we don't want to force a global
semaphore into the mainline task exit path, as that might create
a scaling problem.
The actual fix is almost as easy - since this is only an issue
for cpusets using notify_on_release, which the top level big
cpusets don't normally need to use, only take the cpuset_sem
for cpusets using notify_on_release.
This code has been run for hours without a hiccup, while running
a cpuset create/destroy stress test that could crash the existing
kernel in seconds. This patch applies to the current -linus
git kernel.
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Simon Derr <simon.derr@bull.net>
Acked-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While they were all just simple blobs it made sense to just free them
as we walked through and logged them. Now that there are pointers to
other objects which need refcounting, we might as well revert to
_only_ logging them in audit_log_exit(), and put the code to free them
properly in only one place -- in audit_free_aux().
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
----------------------------------------------------------
If SIGKILL does not have priority, we cannot instantly kill task before it
makes some unexpected job. It can be critical, but we were unable to
reproduce this easily until Heiko Carstens <Heiko.Carstens@de.ibm.com>
reported this problem on LKML.
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-Off-By: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These changes make processing of audit logs easier. Based on a patch
from Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Move audit_serial() into audit.c and use it to generate serial numbers
on messages even when there is no audit context from syscall auditing.
This allows us to disambiguate audit records when more than one is
generated in the same millisecond.
Based on a patch by Steve Grubb after he observed the problem.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In _spin_unlock_bh(lock):
do { \
_raw_spin_unlock(lock); \
preempt_enable(); \
local_bh_enable(); \
__release(lock); \
} while (0)
there is no reason for using preempt_enable() instead of a simple
preempt_enable_no_resched()
Since we know bottom halves are disabled, preempt_schedule() will always
return at once (preempt_count!=0), and hence preempt_check_resched() is
useless here...
This fixes it by using "preempt_enable_no_resched()" instead of the
"preempt_enable()", and thus avoids the useless preempt_check_resched()
just before re-enabling bottom halves.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch changes all occurrences of loginuid to auid. It also
changes everything to %u that is an unsigned type.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The original AVC_USER message wasn't consolidated with the new range of
user messages. The attached patch fixes the kernel so the old messages
work again.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch changes the SELinux AVC to defer logging of paths to the audit
framework upon syscall exit, by saving a reference to the (dentry,vfsmount)
pair in an auxiliary audit item on the current audit context for processing
by audit_log_exit.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch removes the entwining of cpusets and hotplug code in the "No
more Mr. Nice Guy" case of sched.c move_task_off_dead_cpu().
Since the hotplug code is holding a spinlock at this point, we cannot take
the cpuset semaphore, cpuset_sem, as would seem to be required either to
update the tasks cpuset, or to scan up the nested cpuset chain, looking for
the nearest cpuset ancestor that still has some CPUs that are online. So
we just punt and blast the tasks cpus_allowed with all bits allowed.
This reverts these lines of code to what they were before the cpuset patch.
And it updates the cpuset Doc file, to match.
The one known alternative to this that seems to work came from Dinakar
Guniguntala, and required the hotplug code to take the cpuset_sem semaphore
much earlier in its processing. So far as we know, the increased locking
entanglement between cpusets and hot plug of this alternative approach is
not worth doing in this case.
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The limit on the number of outstanding audit messages was inadvertently
removed with the switch to queuing skbs directly for sending by a kernel
thread. Put it back again.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
netlink_unicast() will attempt to reallocate and will free messages if
the socket's rcvbuf limit is reached unless we give it an infinite
timeout. So do that, from a kernel thread which is dedicated to spewing
stuff up the netlink socket.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
* If vsnprintf returns -1, it will mess up the sk buffer space accounting.
This is fixed by not calling skb_put with bogus len values.
* audit_log_hex was a loop that called audit_log_vformat with %02X for each
character. This is very inefficient since conversion from unsigned character
to Ascii representation is essentially masking, shifting, and byte lookups.
Also, the length of the converted string is well known - it's twice the
original. Fixed by rewriting the function.
* audit_log_untrustedstring had no comments. This makes it hard for
someone to understand what the string format will be.
* audit_log_d_path was never fixed to use untrustedstring. This could mess
up user space parsers. This was fixed to make a temp buffer, call d_path,
and log temp buffer using untrustedstring.
From: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It's silly to have to add explicit entries for new userspace messages
as we invent them. Just treat all messages in the user range the same.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch includes various tweaks in the messaging that appears during
system pm state transitions:
* Warn about certain illegal calls in the device tree, like resuming
child before parent or suspending parent before child. This could
happen easily enough through sysfs, or in some cases when drivers
use device_pm_set_parent().
* Be more consistent about dev_dbg() tracing ... do it for resume() and
shutdown() too, and never if the driver doesn't have that method.
* Say which type of system sleep state is being entered.
Except for the warnings, these only affect debug messaging.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>