Attempts to read() from the non-existent dmesg buffer will return zero and
userspace tends to get stuck in a busyloop.
So just remove /dev/kmsg altogether if CONFIG_PRINTK=n.
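For illustration, the busyloop comes from readers that retry when read() returns zero. Below is a minimal userspace sketch of a reader that treats a zero return as end of data instead. The helper name drain_fd() is hypothetical and is not part of this patch.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: read from fd until it
 * reports end of data.  A return value of 0 from read() ends the loop,
 * so an empty source cannot make the caller spin.
 * Returns the total number of bytes read, or -1 on error.
 */
static long drain_fd(int fd)
{
	char buf[256];
	long total = 0;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)		/* nothing more to read: stop */
			return total;
		if (errno == EINTR)	/* interrupted by a signal: retry */
			continue;
		return -1;		/* real error */
	}
}
```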
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
	quilt add $file
	sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
	mv /tmp/$$ $file
	quilt refresh
done
The script was run like this:
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently a user process cannot raise its own oom_adj value (i.e.
unprotect itself from the OOM killer). As this value is stored in the
task structure it gets inherited, and unprivileged children will be unable
to raise it.
The EPERM case is handled by the generic procfs layer, as only processes
with the proper caps or the owner of the process will be able to write to
the file. So we allow only processes with CAP_SYS_RESOURCE to lower
the value; any other process gets an EACCES, which seems more appropriate
than EPERM.
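The rule can be modelled in a few lines. This is an illustrative sketch, not the code in the patch: the function name and the boolean capability flag are assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Illustrative model only: raising oom_adj is always allowed, lowering
 * it requires the (modelled) CAP_SYS_RESOURCE capability.
 * Returns 0 if the write is allowed, -EACCES otherwise.
 */
static int oom_adj_may_write(int old_adj, int new_adj,
			     bool has_cap_sys_resource)
{
	if (new_adj < old_adj && !has_cap_sys_resource)
		return -EACCES;
	return 0;
}
```

With this model an unprivileged process may go from 0 to 5, but going back from 5 to 0 fails with -EACCES.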
Signed-off-by: Guillem Jover <guillem.jover@nokia.com>
Acked-by: Andrea Arcangeli <andrea@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Although mm.h is not an exported header, it does contain one thing
that is part of the userspace ABI: the value that disables the OOM killer
for a given process. So,
a) create and export include/linux/oom.h
b) move OOM_DISABLE define there.
c) turn bounding values of /proc/$PID/oom_adj into defines and export
them too.
Note: mass __KERNEL__ removal will be done later.
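A rough sketch of what such a header provides and how a range check could use it. The numeric values are recalled from the historical include/linux/oom.h and are not stated in this message, so treat them as assumptions. The helper oom_adj_valid() is hypothetical.

```c
#include <stdbool.h>

/* Assumed values, see the note above. */
#define OOM_DISABLE	(-17)	/* oom_adj value that disables the OOM killer */
#define OOM_ADJUST_MIN	(-16)	/* lowest ordinary oom_adj value */
#define OOM_ADJUST_MAX	15	/* highest ordinary oom_adj value */

/* Hypothetical helper: accept the ordinary range plus OOM_DISABLE. */
static bool oom_adj_valid(int adj)
{
	if (adj == OOM_DISABLE)
		return true;
	return adj >= OOM_ADJUST_MIN && adj <= OOM_ADJUST_MAX;
}
```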
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Actually, the decimal representation of a 32-bit signed number can take 12
bytes, including the \0.
And then some code adds a \n as well, so let's give it 13 bytes.
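The arithmetic can be checked with a small userspace sketch. The helper name decimal_buf_size() is made up for this example. The widest value is INT32_MIN, "-2147483648", which is 11 characters before the newline and the terminator.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Bytes needed to hold the decimal form of v plus a trailing '\n'
 * and the terminating '\0'.
 */
static int decimal_buf_size(int32_t v)
{
	/* snprintf(NULL, 0, ...) returns the length without the '\0' */
	int chars = snprintf(NULL, 0, "%ld", (long)v);

	return chars + 1 /* '\n' */ + 1 /* '\0' */;
}
```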
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq functions results in a 20% speed up of the IRQ exit path
(i.e. from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've built this code with allyesconfig for x86_64 and i386. I've run-tested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done a partial conversion for powerpc and MIPS; these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
	struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
	set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
	-	update_process_times(user_mode(regs));
	-	profile_tick(CPU_PROFILING, regs);
	+	update_process_times(user_mode(get_irq_regs()));
	+	profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
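The save/restore pattern can be modelled in userspace as follows. This is a sketch under stated assumptions, not the generic implementation: a single global pointer stands in for the per-CPU variable, the pt_regs field is hypothetical, and timer_handler() is an invented example handler.

```c
#include <stddef.h>

/* Stand-in for the architecture's register frame; the field is hypothetical. */
struct pt_regs {
	unsigned long pc;
};

/* A single global stands in for the per-CPU variable in this model. */
static struct pt_regs *irq_regs_ptr;

static struct pt_regs *get_irq_regs(void)
{
	return irq_regs_ptr;
}

/* Install new_regs and hand back the previous pointer so it can be restored. */
static struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
	struct pt_regs *old_regs = irq_regs_ptr;

	irq_regs_ptr = new_regs;
	return old_regs;
}

/* A handler no longer takes regs; it asks for them when it needs them. */
static unsigned long handler_seen_pc;

static void timer_handler(void)
{
	handler_seen_pc = get_irq_regs()->pc;
}

/* Shape of an arch entry point following the recipe above. */
static void do_IRQ(struct pt_regs *regs)
{
	struct pt_regs *old_regs = set_irq_regs(regs);

	timer_handler();

	set_irq_regs(old_regs);
}
```

Saving and restoring the old pointer is what keeps nested interrupts correct: the outer handler sees its own regs again once the inner do_IRQ() returns.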
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-off-by: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
proc_pid_make_inode:
ei->pid = get_pid(task_pid(task));
I think this is not safe. get_pid() can be preempted after checking "pid
!= NULL". Then the task exits, does detach_pid(), and RCU frees the pid.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It was pointed out that since I am taking ARRAY_SIZE anyway the trailing empty
entry is silly and just wastes space.
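A small sketch of the idea, with a made-up table and helper name (demo_entries and find_entry() are not the proc code): when the loop bound comes from ARRAY_SIZE, no terminating entry is required.

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table entry, for illustration only. */
struct demo_entry {
	const char *name;
	unsigned int mode;
};

static const struct demo_entry demo_entries[] = {
	{ "status",  0444 },
	{ "cmdline", 0444 },
	{ "mem",     0600 },
	/* no { NULL, 0 } terminator: the array length bounds the loop */
};

static const struct demo_entry *find_entry(const char *name)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(demo_entries); i++)
		if (strcmp(demo_entries[i].name, name) == 0)
			return &demo_entries[i];
	return NULL;
}
```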
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The value doesn't change but this ensures I will have the proper value when
other files are added to proc_base_stuff.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
task_state() needs tasklist_lock to protect ->parent/->real_parent. However
task->parent points to nowhere only when the actions below happen in order:
1) release_task(task)
2) release_task(task->parent)
3) a grace period has passed
But 3) implies that the memory ops from 1) should be finished, so pid_alive()
can't be true in such a case.
Otherwise, we don't care if ->parent/->real_parent changes under us.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Drop tasklist_lock. ->siglock protects almost all interesting data
(including sub-threads traversal) except:
	->signal->tty
		protected by tty_mutex
	->real_parent
		the task can't be unhashed while we are holding
		->siglock, so ->real_parent can change from under us
		but we can safely dereference it under rcu_read_lock()
	->pgrp/->session
		we can get inconsistent numbers if the task does
		sys_setsid/daemonize at the same time. I hope this
		is acceptable.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The implementation is exactly the same and there is currently nothing to
distinguish proc_tid_attr and proc_tgid_attr. So it is pointless to have two
separate implementations.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The hard coded inode numbers in proc currently limit its maintainability,
its flexibility, and what can be done with the rest of the system. /proc limits
pid-max to 32768 on 32-bit systems and fd-max to 32768 on all systems,
and placing the pid in the inode number really gets in the way of implementing
subdirectories of per-process information.
Ever since people started adding to the middle of the file type enumeration we
haven't been maintaining the historical inode numbers; all we have really
succeeded in doing is keeping the pid in the proc inode number. The pid is
already available in the directory name so no information is lost removing it
from the inode number.
So if something in user space cares if we remove the pid from the
/proc inode number it is almost certainly broken.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
To remove the hard coded proc inode numbers it is necessary to be able to
create the proc inodes during readdir. The instantiate methods are the subset
of lookup that is needed to accomplish that.
This first step just splits the lookup methods into 2 functions.
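The shape of the split can be sketched in userspace. Everything here is a toy model with hypothetical names (fake_inode, toy_entries, toy_lookup() and so on), not the proc code: lookup finds an entry by name and then instantiates it, while readdir already has the entry in hand and only needs the instantiate step.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for the real proc structures. */
struct toy_entry {
	const char *name;
	unsigned int mode;
};

struct fake_inode {
	const char *name;
	unsigned int mode;
	int valid;
};

static const struct toy_entry toy_entries[] = {
	{ "stat",   0444 },
	{ "status", 0444 },
};

/* Step shared by lookup and readdir: build the inode for a known entry. */
static void toy_instantiate(struct fake_inode *inode, const struct toy_entry *p)
{
	inode->name = p->name;
	inode->mode = p->mode;
	inode->valid = 1;
}

/* lookup = find the entry by name, then instantiate it. */
static int toy_lookup(struct fake_inode *inode, const char *name)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(toy_entries); i++) {
		if (strcmp(toy_entries[i].name, name) == 0) {
			toy_instantiate(inode, &toy_entries[i]);
			return 0;
		}
	}
	return -ENOENT;
}

/* readdir walks the table, so it can call instantiate directly. */
static size_t toy_readdir(struct fake_inode *out, size_t max)
{
	size_t i, n = ARRAY_SIZE(toy_entries);

	if (n > max)
		n = max;
	for (i = 0; i < n; i++)
		toy_instantiate(&out[i], &toy_entries[i]);
	return n;
}
```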
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch generalizes the concept of files in /proc that are related to
processes but live in the root directory of /proc.
Ideally this would reuse infrastructure from the rest of the process specific
parts of proc but unfortunately security_task_to_inode must not be called on
files that are not strictly per process. security_task_to_inode really needs
to be reexamined as the security label can change in important places that we
are not currently catching, but I'm not certain that simplifies this problem.
By at least matching the structure of the rest of proc we get more idiom reuse
and it becomes easier to spot problems in the way things are put together.
Later things like /proc/mounts are likely to be moved into proc_base as well.
If union mounts are ever supported we may be able to make /proc a union mount,
and properly split it into 2 filesystems.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This moves the mount namespace into the nsproxy. The mount namespace count
now refers to the number of nsproxies pointing to it, rather than the number of
tasks. As a result, the unshare_namespace() function in kernel/fork.c no
longer checks whether it is being shared.
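A toy model of the new counting rule, with hypothetical struct and function names rather than the kernel's: tasks take references on an nsproxy, and only the nsproxy takes a reference on the mount namespace.

```c
/* Hypothetical, simplified structures for illustration only. */
struct toy_mnt_ns {
	int count;		/* number of nsproxies pointing here */
};

struct toy_nsproxy {
	int count;		/* number of tasks pointing here */
	struct toy_mnt_ns *mnt_ns;
};

struct toy_task {
	struct toy_nsproxy *nsproxy;
};

/* A new nsproxy takes one reference on the mount namespace. */
static void toy_nsproxy_init(struct toy_nsproxy *ns, struct toy_mnt_ns *mnt)
{
	ns->count = 0;
	ns->mnt_ns = mnt;
	mnt->count++;
}

/* A task attaching to an nsproxy only bumps the nsproxy count. */
static void toy_task_attach(struct toy_task *t, struct toy_nsproxy *ns)
{
	t->nsproxy = ns;
	ns->count++;
}
```

In this model three tasks spread over two nsproxies leave the mount namespace count at 2, not 3.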
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Helper functions in base.c like proc_pident_readdir and proc_pident_lookup
assume the directories have an associated task, and cannot currently be used
on the /proc root directory because it does not have such a task.
This small change allows base.c to be simplified, and later, when multiple
pid spaces are introduced, it makes getting the needed context information
trivial.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>