John Hawkes explained the problem best:
A large number of processes that are pinned to a single CPU results
in every other CPU's load_balance() seeing this overloaded CPU as
"busiest", yet move_tasks() never finds a task to pull-migrate. This
condition occurs during module unload, but can also occur as a
denial-of-service using sys_sched_setaffinity(). Several hundred
CPUs performing this fruitless load_balance() will livelock on the
busiest CPU's runqueue lock. A smaller number of CPUs will livelock
if the pinned task count gets high.
Expanding slightly on John's patch, this one attempts to work out whether the
balancing failure has been due to too many tasks pinned on the runqueue. This
allows it to be basically invisible to the regular blancing paths (ie. when
there are no pinned tasks). We can use this extra knowledge to shut down the
balancing faster, and ensure the migration threads don't start running which
is another problem observed in the wild.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
New sched-domains code means we don't get spans with offline CPUs in
them.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Show swsuspend only on .config where it can compile. I got this on PPC32 &&
SMP:
kernel/power/smp.c:24: error: storage size of `ctxt' isn't known
Also mark swsusp as no longer experimental.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the cpu hotplug case, per-cpu data possibly isn't initialized even the
system state is 'running'. As the comments say in the original code, some
console drivers assume per-cpu resources have been allocated. radeon fb is
one such driver, which uses kmalloc. After a CPU is down, the per-cpu data
of slab is freed, so the system crashed when printing some info.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch moves the recalculation of nr_copy_pages so that the right
number is used in the calculation of the size of memory and swap needed.
It prevents swsusp from attempting to suspend if there is not enough memory
and/or swap (which is unlikely anyway).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch cleans up whitespace in swsusp.c (a bit):
- removes any trailing whitespace
- adds spaces after if, for, for_each_pbe, for_each_zone etc., wherever
necessary.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch removes the unnecessary function does_collide_order().
This function is no longer necessary, as currently there are only 0-order
allocations in swsusp, and the use of it is confusing.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Without this patch, Linux provokes emergency disk shutdowns and
similar nastiness. It was in SuSE kernels for some time, IIRC.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
disable/enable_nonboot_cpus API. The S4 part is based on Pavel's original S4
SMP patch.
Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
(The i386 CPU hotplug patch provides infrastructure for some work which Pavel
is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua
<shaohua.li@intel.com> is doing)
The following provides i386 architecture support for safely unregistering and
registering processors during runtime, updated for the current -mm tree. In
order to avoid dumping cpu hotplug code into kernel/irq/* i dropped the
cpu_online check in do_IRQ() by modifying fixup_irqs(). The difference being
that on cpu offline, fixup_irqs() is called before we clear the cpu from
cpu_online_map and a long delay in order to ensure that we never have any
queued external interrupts on the APICs. There are additional changes to s390
and ppc64 to account for this change.
1) Add CONFIG_HOTPLUG_CPU
2) disable local APIC timer on dead cpus.
3) Disable preempt around irq balancing to prevent CPUs going down.
4) Print irq stats for all possible cpus.
5) Debugging check for interrupts on offline cpus.
6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
7) play_dead() for offline cpus to spin inside.
8) Handle offline cpus set in flush_tlb_others().
9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
10) Implement __cpu_disable() and __cpu_die().
11) Enable local interrupts in cpu_enable() after fixup_irqs()
12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline.
Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Don't look up the task by its pid and then use the syscall filtering
helper. Just implement our own filter helper which operates solely on
the information in the netlink_skb_parms.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
When the task refcounting was added to audit_filter_rules() it became
more of a problem that this function was violating the 'only one
return from each function' rule. In fixing it to use a variable to store
'ret' I stupidly neglected to actually change the 'return 1;' at the
end. This makes it not work very well.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Another rollup of patches which give various symbols static scope
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds version and srcversion files to
/sys/module/${modulename} containing the version and srcversion fields
of the module's modinfo section (if present).
/sys/module/e1000
|-- srcversion
`-- version
This patch differs slightly from the version posted in January, as it
now uses the new kstrdup() call in -mm.
Why put this in sysfs?
a) Tools like DKMS, which deal with changing out individual kernel
modules without replacing the whole kernel, can behave smarter if they
can tell the version of a given module. The autoinstaller feature, for
example, which determines if your system has a "good" version of a
driver (i.e. if the one provided by DKMS has a newer verson than that
provided by the kernel package installed), and to automatically compile
and install a newer version if DKMS has it but your kernel doesn't yet
have that version.
b) Because sysadmins manually, or with tools like DKMS, can switch out
modules on the file system, you can't count on 'modinfo foo.ko', which
looks at /lib/modules/${kernelver}/... actually matching what is loaded
into the kernel already. Hence asking sysfs for this.
c) as the unbind-driver-from-device work takes shape, it will be
possible to rebind a driver that's built-in (no .ko to modinfo for the
version) to a newly loaded module. sysfs will have the
currently-built-in version info, for comparison.
d) tech support scripts can then easily grab the version info for what's
running presently - a question I get often.
There has been renewed interest in this patch on linux-scsi by driver
authors.
As the idea originated from GregKH, I leave his Signed-off-by: intact,
though the implementation is nearly completely new. Compiled and run on
x86 and x86_64.
From: Matthew Dobson <colpatch@us.ibm.com>
build fix
From: Thierry Vignaud <tvignaud@mandriva.com>
build fix
From: Matthew Dobson <colpatch@us.ibm.com>
warning fix
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes the following changes:
(1) There's a new special key type called ".request_key_auth".
This is an authorisation key for when one process requests a key and
another process is started to construct it. This type of key cannot be
created by the user; nor can it be requested by kernel services.
Authorisation keys hold two references:
(a) Each refers to a key being constructed. When the key being
constructed is instantiated the authorisation key is revoked,
rendering it of no further use.
(b) The "authorising process". This is either:
(i) the process that called request_key(), or:
(ii) if the process that called request_key() itself had an
authorisation key in its session keyring, then the authorising
process referred to by that authorisation key will also be
referred to by the new authorisation key.
This means that the process that initiated a chain of key requests
will authorise the lot of them, and will, by default, wind up with
the keys obtained from them in its keyrings.
(2) request_key() creates an authorisation key which is then passed to
/sbin/request-key in as part of a new session keyring.
(3) When request_key() is searching for a key to hand back to the caller, if
it comes across an authorisation key in the session keyring of the
calling process, it will also search the keyrings of the process
specified therein and it will use the specified process's credentials
(fsuid, fsgid, groups) to do that rather than the calling process's
credentials.
This allows a process started by /sbin/request-key to find keys belonging
to the authorising process.
(4) A key can be read, even if the process executing KEYCTL_READ doesn't have
direct read or search permission if that key is contained within the
keyrings of a process specified by an authorisation key found within the
calling process's session keyring, and is searchable using the
credentials of the authorising process.
This allows a process started by /sbin/request-key to read keys belonging
to the authorising process.
(5) The magic KEY_SPEC_*_KEYRING key IDs when passed to KEYCTL_INSTANTIATE or
KEYCTL_NEGATE will specify a keyring of the authorising process, rather
than the process doing the instantiation.
(6) One of the process keyrings can be nominated as the default to which
request_key() should attach new keys if not otherwise specified. This is
done with KEYCTL_SET_REQKEY_KEYRING and one of the KEY_REQKEY_DEFL_*
constants. The current setting can also be read using this call.
(7) request_key() is partially interruptible. If it is waiting for another
process to finish constructing a key, it can be interrupted. This permits
a request-key cycle to be broken without recourse to rebooting.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-Off-By: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes it possible to pass a session keyring through to the
process spawned by call_usermodehelper(). This allows patch 3/3 to pass an
authorisation key through to /sbin/request-key, thus permitting better access
controls when doing just-in-time key creation.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the upcoming aio_down patch, it is useful to store a private data
pointer in the kiocb's wait_queue. Since we provide our own wake up
function and do not require the task_struct pointer, it makes sense to
convert the task pointer into a generic private pointer.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Avoid taking the tasklist_lock in sys_times if the process is single
threaded. In a NUMA system taking the tasklist_lock may cause a bouncing
cacheline if multiple independent processes continually call sys_times to
measure their performance.
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes recalc_sigpending() to work correctly with tasks which are
being freezed.
The problem is that freeze_processes() sets PF_FREEZE and TIF_SIGPENDING
flags on tasks, but recalc_sigpending() called from e.g.
sys_rt_sigtimedwait or any other kernel place will clear TIF_SIGPENDING due
to no pending signals queued and the tasks won't be freezed until it
recieves a real signal or freezed_processes() fail due to timeout.
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-Off-By: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new `suid_dumpable' sysctl:
This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are
0 - (default) - traditional behaviour. Any process which has changed
privilege levels or is execute only will not be dumped
1 - (debug) - all processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is intended
for system debugging situations only. Ptrace is unchecked.
2 - (suidsafe) - any binary which normally would not be dumped is dumped
readable by root only. This allows the end user to remove such a dump but
not access it directly. For security reasons core dumps in this mode will
not overwrite one another or other files. This mode is appropriate when
adminstrators are attempting to debug problems in a normal environment.
(akpm:
> > +EXPORT_SYMBOL(suid_dumpable);
>
> EXPORT_SYMBOL_GPL?
No problem to me.
> > if (current->euid == current->uid && current->egid == current->gid)
> > current->mm->dumpable = 1;
>
> Should this be SUID_DUMP_USER?
Actually the feedback I had from last time was that the SUID_ defines
should go because its clearer to follow the numbers. They can go
everywhere (and there are lots of places where dumpable is tested/used
as a bool in untouched code)
> Maybe this should be renamed to `dump_policy' or something. Doing that
> would help us catch any code which isn't using the #defines, too.
Fair comment. The patch was designed to be easy to maintain for Red Hat
rather than for merging. Changing that field would create a gigantic
diff because it is used all over the place.
)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Presently either multiple kprobes or only one jprobe could be inserted.
This patch removes the above limitation and allows one jprobe and multiple
kprobes to coexist at the same address. However multiple jprobes cannot
coexist with multiple kprobes. Currently I am working on the prototype to
allow multiple jprobes coexist with multiple kprobes.
Signed-off-by: Ananth N Mavinakayanhalli <amavin@redhat.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In situations where a kprobes handler calls a routine which has a probe on it,
then kprobes_handler() disarms the new probe forever. This patch removes the
above limitation by temporarily disarming the new probe. When the another
probe hits while handling the old probe, the kprobes_handler() saves previous
kprobes state and handles the new probe without calling the new kprobes
registered handlers. kprobe_post_handler() restores back the previous kprobes
state and the normal execution continues.
However on x86_64 architecture, re-rentrancy is provided only through
pre_handler(). If a routine having probe is referenced through
post_handler(), then the probes on that routine are disarmed forever, since
the exception stack is gets changed after the processor single steps the
instruction of the new probe.
This patch includes generic changes to support temporary disarming on
reentrancy of probes.
Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch moves the lock/unlock of the arch specific kprobe_flush_task()
to the non-arch specific kprobe_flusk_task().
Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>