Currently lookup for /proc/$PID first goes through spinlock and whole list
of misc /proc entries only to confirm that, yes, /proc/42 can not possibly
match random proc entry.
List is is several dozens entries long (52 entries on my setup).
None of this is necessary.
Try to convert dentry name to integer first.
If it works, it must be /proc/$PID.
If it doesn't, it must be random proc entry.
Based on patch from Al Viro.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* remove proc_create(NULL, ...) check, let it oops
* warn about proc_create("", ...) and proc_create("very very long name", ...)
proc code keeps length as u8, no 256+ name length possible
* warn about proc_create("123", ...)
/proc/$PID and /proc/misc namespaces are separate things,
but dumb module might create funky a-la $PID entry.
* remove post mortem strchr('/') check
Triggering it implies either strchr() is buggy or memory corruption.
It should be VFS check anyway.
In reality, none of these checks will ever trigger,
it is preparation for the next patch.
Based on patch from Al Viro.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_uid_seq_operations, proc_gid_seq_operations and
proc_projid_seq_operations are only called in proc_id_map_open with
seq_open as const struct seq_operations so we can constify the 3
structures and update proc_id_map_open prototype.
text data bss dec hex filename
6817 404 1984 9205 23f5 kernel/user_namespace.o-before
6913 308 1984 9205 23f5 kernel/user_namespace.o-after
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull audit updates from Eric Paris.
* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...
/proc/self/make-it-fail is a boolean, but accepts any number, including
negative ones. Change variable to unsigned, and cap upper bound at 1.
[akpm@linux-foundation.org: don't make make_it_fail unsigned]
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The /proc/*/pagemap contain sensitive information and currently its mode
is 0444. Change this to 0400, so the VFS will prevent unprivileged
processes from getting file descriptors on arbitrary privileged
/proc/*/pagemap files.
This reduces the scope of address space leaking and bypasses by protecting
already running processes.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These procfs files contain sensitive information and currently their
mode is 0444. Change this to 0400, so the VFS will be able to block
unprivileged processes from getting file descriptors on arbitrary
privileged /proc/*/{stack,syscall,personality} files.
This reduces the scope of ASLR leaking and bypasses by protecting already
running processes.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The expected logic of proc_map_files_get_link() is either to return 0
and initialize 'path' or return an error and leave 'path' uninitialized.
By the time dname_to_vma_addr() returns 0 the corresponding vma may have
already be gone. In this case the path is not initialized but the
return value is still 0. This results in 'general protection fault'
inside d_path().
Steps to reproduce:
CONFIG_CHECKPOINT_RESTORE=y
fd = open(...);
while (1) {
mmap(fd, ...);
munmap(fd, ...);
}
ls -la /proc/$PID/map_files
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991
Signed-off-by: Artem Fetishev <artem_fetishev@epam.com>
Signed-off-by: Aleksandr Terekhov <aleksandr_terekhov@epam.com>
Reported-by: <wiebittewas@gmail.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1. proc_task_readdir()->first_tid() path truncates f_pos to int, this
is wrong even on 64bit.
We could check that f_pos < PID_MAX or even INT_MAX in
proc_task_readdir(), but this patch simply checks the potential
overflow in first_tid(), this check is nop on 64bit. We do not care if
it was negative and the new unsigned value is huge, all we need to
ensure is that we never wrongly return !NULL.
2. Remove the 2nd "nr != 0" check before get_nr_threads(),
nr_threads == 0 is not distinguishable from !pid_task() above.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Sameer Nanda <snanda@chromium.org>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_task_readdir() does not really need "leader", first_tid() has to
revalidate it anyway. Just pass proc_pid(inode) to first_tid() instead,
it can do pid_task(PIDTYPE_PID) itself and read ->group_leader only if
necessary.
The patch also extracts the "inode is dead" code from
pid_delete_dentry(dentry) into the new trivial helper,
proc_inode_is_dead(inode), proc_task_readdir() uses it to return -ENOENT
if this dir was removed.
This is a bit racy, but the race is very inlikely and the getdents() after
openndir() can see the empty "." + ".." dir only once.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Sameer Nanda <snanda@chromium.org>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_task_readdir() verifies that the result of get_proc_task() is
pid_alive() and thus its ->group_leader is fine too. However this is not
necessarily true after rcu_read_unlock(), we need to recheck this again
after first_tid() does rcu_read_lock(). Otherwise
leader->thread_group.next (used by next_thread()) can be invalid if the
rcu grace period expires in between.
The race is subtle and unlikely, but still it is possible afaics. To
simplify lets ignore the "likely" case when tid != 0, f_version can be
cleared by proc_task_operations->llseek().
Suppose we have a main thread M and its subthread T. Suppose that f_pos
== 3, iow first_tid() should return T. Now suppose that the following
happens between rcu_read_unlock() and rcu_read_lock():
1. T execs and becomes the new leader. This removes M from
->thread_group but next_thread(M) is still T.
2. T creates another thread X which does exec as well, T
goes away.
3. X creates another subthread, this increments nr_threads.
4. first_tid() does next_thread(M) and returns the already
dead T.
Note also that we need 2. and 3. only because of get_nr_threads() check,
and this check was supposed to be optimization only.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Sameer Nanda <snanda@chromium.org>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a task has CAP_AUDIT_CONTROL allow that task to unset their loginuid.
This would allow a child of that task to set their loginuid without
CAP_AUDIT_CONTROL. Thus when launching a new login daemon, a
priviledged helper would be able to unset the loginuid and then the
daemon, which may be malicious user facing, do not need priv to function
correctly.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>