This fixes CVE-2013-1792.
There is a race in install_user_keyrings() that can cause a NULL pointer
dereference when called concurrently for the same user if the uid and
uid-session keyrings are not yet created. It might be possible for an
unprivileged user to trigger this by calling keyctl() from userspace in
parallel immediately after logging in.
Assume that we have two threads both executing lookup_user_key(), both
looking for KEY_SPEC_USER_SESSION_KEYRING.
THREAD A THREAD B
=============================== ===============================
==>call install_user_keyrings();
if (!cred->user->session_keyring)
==>call install_user_keyrings()
...
user->uid_keyring = uid_keyring;
if (user->uid_keyring)
return 0;
<==
key = cred->user->session_keyring [== NULL]
user->session_keyring = session_keyring;
atomic_inc(&key->usage); [oops]
At the point thread A dereferences cred->user->session_keyring, thread B
hasn't updated user->session_keyring yet, but thread A assumes it is
populated because install_user_keyrings() returned ok.
The race window is really small but can be exploited if, for example,
thread B is interrupted or preempted after initializing uid_keyring, but
before doing setting session_keyring.
This couldn't be reproduced on a stock kernel. However, after placing
systemtap probe on 'user->session_keyring = session_keyring;' that
introduced some delay, the kernel could be crashed reliably.
Fix this by checking both pointers before deciding whether to return.
Alternatively, the test could be done away with entirely as it is checked
inside the mutex - but since the mutex is global, that may not be the best
way.
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
A patch to fix some unreachable code in search_my_process_keyrings() got
applied twice by two different routes upstream as commits e67eab39be
and b010520ab3 (both "fix unreachable code").
Unfortunately, the second application removed something it shouldn't
have and this wasn't detected by GIT. This is due to the patch not
having sufficient lines of context to distinguish the two places of
application.
The effect of this is relatively minor: inside the kernel, the keyring
search routines may search multiple keyrings and then prioritise the
errors if no keys or negative keys are found in any of them. With the
extra deletion, the presence of a negative key in the thread keyring
(causing ENOKEY) is incorrectly overridden by an error searching the
process keyring.
So revert the second application of the patch.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull security subsystem updates from James Morris:
"A quiet cycle for the security subsystem with just a few maintenance
updates."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
Smack: create a sysfs mount point for smackfs
Smack: use select not depends in Kconfig
Yama: remove locking from delete path
Yama: add RCU to drop read locking
drivers/char/tpm: remove tasklet and cleanup
KEYS: Use keyring_alloc() to create special keyrings
KEYS: Reduce initial permissions on keys
KEYS: Make the session and process keyrings per-thread
seccomp: Make syscall skipping and nr changes more consistent
key: Fix resource leak
keys: Fix unreachable code
KEYS: Add payload preparsing opportunity prior to key instantiate or update
Reduce the initial permissions on new keys to grant the possessor everything,
view permission only to the user (so the keys can be seen in /proc/keys) and
nothing else.
This gives the creator a chance to adjust the permissions mask before other
processes can access the new key or create a link to it.
To aid with this, keyring_alloc() now takes a permission argument rather than
setting the permissions itself.
The following permissions are now set:
(1) The user and user-session keyrings grant the user that owns them full
permissions and grant a possessor everything bar SETATTR.
(2) The process and thread keyrings grant the possessor full permissions but
only grant the user VIEW. This permits the user to see them in
/proc/keys, but not to do anything with them.
(3) Anonymous session keyrings grant the possessor full permissions, but only
grant the user VIEW and READ. This means that the user can see them in
/proc/keys and can list them, but nothing else. Possibly READ shouldn't
be provided either.
(4) Named session keyrings grant everything an anonymous session keyring does,
plus they grant the user LINK permission. The whole point of named
session keyrings is that others can also subscribe to them. Possibly this
should be a separate permission to LINK.
(5) The temporary session keyring created by call_sbin_request_key() gets the
same permissions as an anonymous session keyring.
(6) Keys created by add_key() get VIEW, SEARCH, LINK and SETATTR for the
possessor, plus READ and/or WRITE if the key type supports them. The used
only gets VIEW now.
(7) Keys created by request_key() now get the same as those created by
add_key().
Reported-by: Lennart Poettering <lennart@poettering.net>
Reported-by: Stef Walter <stefw@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Make the session keyring per-thread rather than per-process, but still
inherited from the parent thread to solve a problem with PAM and gdm.
The problem is that join_session_keyring() will reject attempts to change the
session keyring of a multithreaded program but gdm is now multithreaded before
it gets to the point of starting PAM and running pam_keyinit to create the
session keyring. See:
https://bugs.freedesktop.org/show_bug.cgi?id=49211
The reason that join_session_keyring() will only change the session keyring
under a single-threaded environment is that it's hard to alter the other
thread's credentials to effect the change in a multi-threaded program. The
problems are such as:
(1) How to prevent two threads both running join_session_keyring() from
racing.
(2) Another thread's credentials may not be modified directly by this process.
(3) The number of threads is uncertain whilst we're not holding the
appropriate spinlock, making preallocation slightly tricky.
(4) We could use TIF_NOTIFY_RESUME and key_replace_session_keyring() to get
another thread to replace its keyring, but that means preallocating for
each thread.
A reasonable way around this is to make the session keyring per-thread rather
than per-process and just document that if you want a common session keyring,
you must get it before you spawn any threads - which is the current situation
anyway.
Whilst we're at it, we can the process keyring behave in the same way. This
means we can clean up some of the ickyness in the creds code.
Basically, after this patch, the session, process and thread keyrings are about
inheritance rules only and not about sharing changes of keyring.
Reported-by: Mantas M. <grawity@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ray Strode <rstrode@redhat.com>
- Replace key_user ->user_ns equality checks with kuid_has_mapping checks.
- Use from_kuid to generate key descriptions
- Use kuid_t and kgid_t and the associated helpers instead of uid_t and gid_t
- Avoid potential problems with file descriptor passing by displaying
keys in the user namespace of the opener of key status proc files.
Cc: linux-security-module@vger.kernel.org
Cc: keyrings@linux-nfs.org
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
task_work and rcu_head are identical now; merge them (calling the result
struct callback_head, rcu_head #define'd to it), kill separate allocation
in security/keys since we can just use cred->rcu now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
get rid of the only user of ->data; this is _not_ the final variant - in the
end we'll have task_work and rcu_head identical and just use cred->rcu,
at which point the separate allocation will be gone completely.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change keyctl_session_to_parent() to use task_work_add() and move
key_replace_session_keyring() logic into task_work->func().
Note that we do task_work_cancel() before task_work_add() to ensure that
only one work can be pending at any time. This is important, we must not
allow user-space to abuse the parent's ->task_works list.
The callback, replace_session_keyring(), checks PF_EXITING. I guess this
is not really needed but looks better.
As a side effect, this fixes the (unlikely) race. The callers of
key_replace_session_keyring() and keyctl_session_to_parent() lack the
necessary barriers, the parent can miss the request.
Now we can remove task_struct->replacement_session_keyring and related
code.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.
Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.
- Use of the new kuid_t type ensures the if you somehow get past the
config guards the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.
- All uids from child user namespaces are mapped into the initial
user namespace before they are processed. Removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.
- With the user namespaces compiled out the performance is as good or
better than it is today.
- For most operations absolutely nothing changes performance or
operationally with the user namespace enabled.
- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).
- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these value specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.
- If setuid is called with a uid that can not be mapped setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.
- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.
- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.
My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."
Fix up trivial conflicts due to nearby independent changes in fs/stat.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...
Do an LRU discard in keyrings that are full rather than returning ENFILE. To
perform this, a time_t is added to the key struct and updated by the creation
of a link to a key and by a key being found as the result of a search. At the
completion of a successful search, the keyrings in the path between the root of
the search and the first found link to it also have their last-used times
updated.
Note that discarding a link to a key from a keyring does not necessarily
destroy the key as there may be references held by other places.
An alternate discard method that might suffice is to perform FIFO discard from
the keyring, using the spare 2-byte hole in the keylist header as the index of
the next link to be discarded.
This is useful when using a keyring as a cache for DNS results or foreign
filesystem IDs.
This can be tested by the following. As root do:
echo 1000 >/proc/sys/kernel/keys/root_maxkeys
kr=`keyctl newring foo @s`
for ((i=0; i<2000; i++)); do keyctl add user a$i a $kr; done
Without this patch ENFILE should be reported when the keyring fills up. With
this patch, the keyring discards keys in an LRU fashion. Note that the stored
LRU time has a granularity of 1s.
After doing this, /proc/key-users can be observed and should show that most of
the 2000 keys have been discarded:
[root@andromeda ~]# cat /proc/key-users
0: 517 516/516 513/1000 5249/20000
The "513/1000" here is the number of quota-accounted keys present for this user
out of the maximum permitted.
In /proc/keys, the keyring shows the number of keys it has and the number of
slots it has allocated:
[root@andromeda ~]# grep foo /proc/keys
200c64c4 I--Q-- 1 perm 3b3f0000 0 0 keyring foo: 509/509
The maximum is (PAGE_SIZE - header) / key pointer size. That's typically 509
on a 64-bit system and 1020 on a 32-bit system.
Signed-off-by: David Howells <dhowells@redhat.com>
struct user_struct will shortly loose it's user_ns reference
so make the cred user_ns reference a proper reference complete
with reference counting.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Optimize performance and prepare for the removal of the user_ns reference
from user_struct. Remove the slow long walk through cred->user->user_ns and
instead go straight to cred->user_ns.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The test for "if (cred->request_key_auth->flags & KEY_FLAG_REVOKED) {"
should actually testing that the (1 << KEY_FLAG_REVOKED) bit is set.
The current code actually checks for KEY_FLAG_DEAD.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
The keyctl call:
keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 1)
should create a session keyring if the process doesn't have one of its own
because the create flag argument is set - rather than subscribing to and
returning the user-session keyring as:
keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0)
will do.
This can be tested by commenting out pam_keyinit in the /etc/pam.d files and
running the following program a couple of times in a row:
#include <stdio.h>
#include <stdlib.h>
#include <keyutils.h>
int main(int argc, char *argv[])
{
key_serial_t uk, usk, sk, nsk;
uk = keyctl_get_keyring_ID(KEY_SPEC_USER_KEYRING, 0);
usk = keyctl_get_keyring_ID(KEY_SPEC_USER_SESSION_KEYRING, 0);
sk = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0);
nsk = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 1);
printf("keys: %08x %08x %08x %08x\n", uk, usk, sk, nsk);
return 0;
}
Without this patch, I see:
keys: 3975ddc7 119c0c66 119c0c66 119c0c66
keys: 3975ddc7 119c0c66 119c0c66 119c0c66
With this patch, I see:
keys: 2cb4997b 34112878 34112878 17db2ce3
keys: 2cb4997b 34112878 34112878 39f3c73e
As can be seen, the session keyring starts off the same as the user-session
keyring each time, but with the patch a new session keyring is created when
the create flag is set.
Reported-by: Greg Wettstein <greg@enjellic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Greg Wettstein <greg@enjellic.com>
Signed-off-by: James Morris <jmorris@namei.org>
If install_session_keyring() is given a keyring, it should install it rather
than just creating a new one anyway. This was accidentally broken in:
commit d84f4f992c
Author: David Howells <dhowells@redhat.com>
Date: Fri Nov 14 10:39:23 2008 +1100
Subject: CRED: Inaugurate COW credentials
The impact of that commit is that pam_keyinit no longer works correctly if
'force' isn't specified against a login process. This is because:
keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0)
now always creates a new session keyring and thus the check whether the session
keyring and the user-session keyring are the same is always false. This leads
pam_keyinit to conclude that a session keyring is installed and it shouldn't be
revoked by pam_keyinit here if 'revoke' is specified.
Any system that specifies 'force' against pam_keyinit in the PAM configuration
files for login methods (login, ssh, su -l, kdm, etc.) is not affected since
that bypasses the broken check and forces the creation of a new session keyring
anyway (for which the revoke flag is not cleared) - and any subsequent call to
pam_keyinit really does have a session keyring already installed, and so the
check works correctly there.
Reverting to the previous behaviour will cause the kernel to subscribe the
process to the user-session keyring as its session keyring if it doesn't have a
session keyring of its own. pam_keyinit will detect this and install a new
session keyring anyway (and won't clear the revert flag).
This can be tested by commenting out pam_keyinit in the /etc/pam.d files and
running the following program a couple of times in a row:
#include <stdio.h>
#include <stdlib.h>
#include <keyutils.h>
int main(int argc, char *argv[])
{
key_serial_t uk, usk, sk;
uk = keyctl_get_keyring_ID(KEY_SPEC_USER_KEYRING, 0);
usk = keyctl_get_keyring_ID(KEY_SPEC_USER_SESSION_KEYRING, 0);
sk = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0);
printf("keys: %08x %08x %08x\n", uk, usk, sk);
return 0;
}
Without the patch, I see:
keys: 3884e281 24c4dfcf 22825f8e
keys: 3884e281 24c4dfcf 068772be
With the patch, I see:
keys: 26be9c83 0e755ce0 0e755ce0
keys: 26be9c83 0e755ce0 0e755ce0
As can be seen, with the patch, the session keyring is the same as the
user-session keyring each time; without the patch a new session keyring is
generated each time.
Reported-by: Greg Wettstein <greg@enjellic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Greg Wettstein <greg@enjellic.com>
Signed-off-by: James Morris <jmorris@namei.org>
Since this cred was not created with copy_creds(), it needs to get
initialized. Otherwise use of syscall(__NR_keyctl, KEYCTL_SESSION_TO_PARENT);
can lead to a NULL deref. Thanks to Robert for finding this.
But introduced by commit 47a150edc2 ("Cache user_ns in struct cred").
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Reported-by: Robert Święcki <robert@swiecki.net>
Cc: David Howells <dhowells@redhat.com>
Cc: stable@kernel.org (2.6.39)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Improve /proc/keys by:
(1) Don't attempt to summarise the payload of a negated key. It won't have
one. To this end, a helper function - key_is_instantiated() has been
added that allows the caller to find out whether the key is positively
instantiated (as opposed to being uninstantiated or negatively
instantiated).
(2) Do show keys that are negative, expired or revoked rather than hiding
them. This requires an override flag (no_state_check) to be passed to
search_my_process_keyrings() and keyring_search_aux() to suppress this
check.
Without this, keys that are possessed by the caller, but only grant
permissions to the caller if possessed are skipped as the possession check
fails.
Keys that are visible due to user, group or other checks are visible with
or without this patch.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Do a bit of a style clean up in the key management code. No functional
changes.
Done using:
perl -p -i -e 's!^/[*]*/\n!!' security/keys/*.c
perl -p -i -e 's!} /[*] end [a-z0-9_]*[(][)] [*]/\n!}\n!' security/keys/*.c
sed -i -s -e ": next" -e N -e 's/^\n[}]$/}/' -e t -e P -e 's/^.*\n//' -e "b next" security/keys/*.c
To remove /*****/ lines, remove comments on the closing brace of a
function to name the function and remove blank lines before the closing
brace of a function.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>