Add a light version of override/revert_creds(), this should only be
used when the credentials in question will outlive the critical
section and the critical section doesn't change the ->usage of the
credentials.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Commit 0a31bd5f2b ("KMEM_CACHE(): simplify slab cache creation")
introduces a new macro. Use the new KMEM_CACHE() macro instead of
direct kmem_cache_create() to simplify the creation of SLAB caches.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
[PM: alignment fixes in both code and description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
This code is rarely (never?) enabled by distros, and it hasn't caught
anything in decades. Let's kill off this legacy debug code.
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are multiple ways to grab references to credentials, and the only
protection we have against overflowing it is the memory required to do
so.
With memory sizes only moving in one direction, let's bump the reference
count to 64-bit and move it outside the realm of feasibly overflowing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull LSM updates from Paul Moore:
- Add new credential functions, get_cred_many() and put_cred_many() to
save some atomic_t operations for a few operations.
While not strictly LSM related, this patchset had been rotting on the
mailing lists for some time and since the LSMs do care a lot about
credentials I thought it reasonable to give this patch a home.
- Five patches to constify different LSM hook parameters.
- Fix a spelling mistake.
* tag 'lsm-pr-20231030' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: fix a spelling mistake
cred: add get_cred_many and put_cred_many
lsm: constify 'sb' parameter in security_sb_kern_mount()
lsm: constify 'bprm' parameter in security_bprm_committed_creds()
lsm: constify 'bprm' parameter in security_bprm_committing_creds()
lsm: constify 'file' parameter in security_bprm_creds_from_file()
lsm: constify 'sb' parameter in security_quotactl()
atomic_t variables are currently used to implement reference counters
with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows and
underflows. This is important since overflows and underflows can lead
to use-after-free situation and be exploitable.
The variable group_info.usage is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
**Important note for maintainers:
Some functions from refcount_t API defined in refcount.h have different
memory ordering guarantees than their atomic counterparts. Please check
Documentation/core-api/refcount-vs-atomic.rst for more information.
Normally the differences should not matter since refcount_t provides
enough guarantees to satisfy the refcounting use cases, but in some
rare cases it might matter. Please double check that you don't have
some undocumented memory guarantees for this variable usage.
For the group_info.usage it might make a difference in following places:
- put_group_info(): decrement in refcount_dec_and_test() only
provides RELEASE ordering and ACQUIRE ordering on success vs. fully
ordered atomic counterpart
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Link: https://lore.kernel.org/r/20230818041456.gonna.009-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Some of the frequent consumers of get_cred and put_cred operate on 2
references on the same creds back-to-back.
Switch them to doing the work in one go instead.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
[PM: removed changelog from commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Michal Koutný <mkoutny@suse.com> wrote:
> Tasks are associated to multiple users at once. Historically and as per
> setrlimit(2) RLIMIT_NPROC is enforce based on real user ID.
>
> The commit 21d1c5e386 ("Reimplement RLIMIT_NPROC on top of ucounts")
> made the accounting structure "indexed" by euid and hence potentially
> account tasks differently.
>
> The effective user ID may be different e.g. for setuid programs but
> those are exec'd into already existing task (i.e. below limit), so
> different accounting is moot.
>
> Some special setresuid(2) users may notice the difference, justifying
> this fix.
I looked at cred->ucount and it is only used for rlimit operations
that were previously stored in cred->user. Making the fact
cred->ucount can refer to a different user from cred->user a bug,
affecting all uses of cred->ulimit not just RLIMIT_NPROC.
Fix set_cred_ucounts to always use the real uid not the effective uid.
Further simplify set_cred_ucounts by noticing that set_cred_ucounts
somehow retained a draft version of the check to see if alloc_ucounts
was needed that checks the new->user and new->user_ns against the
current_real_cred(). Remove that draft version of the check.
All that matters for setting the cred->ucounts are the user_ns and uid
fields in the cred.
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220207121800.5079-4-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220216155832.680775-3-ebiederm@xmission.com
Reported-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Fixes: 21d1c5e386 ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Any cred that is destined for use by commit_creds must have a non-NULL
cred->ucounts field. Only curing credential construction is a NULL
cred->ucounts valid. Only abort_creds, put_cred, and put_cred_rcu
needs to deal with a cred with a NULL ucount. As set_cred_ucounts is
non of those case don't confuse people by handling something that can
not happen.
Link: https://lkml.kernel.org/r/871r4irzds.fsf_-_@disp2133
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Setting cred->ucounts in cred_alloc_blank does not make sense. The
uid and user_ns are deliberately not set in cred_alloc_blank but
instead the setting is delayed until key_change_session_keyring.
So move dealing with ucounts into key_change_session_keyring as well.
Unfortunately that movement of get_ucounts adds a new failure mode to
key_change_session_keyring. I do not see anything stopping the parent
process from calling setuid and changing the relevant part of it's
cred while keyctl_session_to_parent is running making it fundamentally
necessary to call get_ucounts in key_change_session_keyring. Which
means that the new failure mode cannot be avoided.
A failure of key_change_session_keyring results in a single threaded
parent keeping it's existing credentials. Which results in the parent
process not being able to access the session keyring and whichever
keys are in the new keyring.
Further get_ucounts is only expected to fail if the number of bits in
the refernece count for the structure is too few.
Since the code has no other way to report the failure of get_ucounts
and because such failures are not expected to be common add a WARN_ONCE
to report this problem to userspace.
Between the WARN_ONCE and the parent process not having access to
the keys in the new session keyring I expect any failure of get_ucounts
will be noticed and reported and we can find another way to handle this
condition. (Possibly by just making ucounts->count an atomic_long_t).
Cc: stable@vger.kernel.org
Fixes: 905ae01c4a ("Add a reference to ucounts for each cred")
Link: https://lkml.kernel.org/r/7k0ias0uf.fsf_-_@disp2133
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull user namespace rlimit handling update from Eric Biederman:
"This is the work mainly by Alexey Gladkov to limit rlimits to the
rlimits of the user that created a user namespace, and to allow users
to have stricter limits on the resources created within a user
namespace."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
cred: add missing return error code when set_cred_ucounts() failed
ucounts: Silence warning in dec_rlimit_ucounts
ucounts: Set ucount_max to the largest positive value the type can hold
kselftests: Add test to check for rlimit changes in different user namespaces
Reimplement RLIMIT_MEMLOCK on top of ucounts
Reimplement RLIMIT_SIGPENDING on top of ucounts
Reimplement RLIMIT_MSGQUEUE on top of ucounts
Reimplement RLIMIT_NPROC on top of ucounts
Use atomic_t for ucounts reference counting
Add a reference to ucounts for each cred
Increase size of ucounts to atomic_long_t
The rlimit counter is tied to uid in the user_namespace. This allows
rlimit values to be specified in userns even if they are already
globally exceeded by the user. However, the value of the previous
user_namespaces cannot be exceeded.
To illustrate the impact of rlimits, let's say there is a program that
does not fork. Some service-A wants to run this program as user X in
multiple containers. Since the program never fork the service wants to
set RLIMIT_NPROC=1.
service-A
\- program (uid=1000, container1, rlimit_nproc=1)
\- program (uid=1000, container2, rlimit_nproc=1)
The service-A sets RLIMIT_NPROC=1 and runs the program in container1.
When the service-A tries to run a program with RLIMIT_NPROC=1 in
container2 it fails since user X already has one running process.
We cannot use existing inc_ucounts / dec_ucounts because they do not
allow us to exceed the maximum for the counter. Some rlimits can be
overlimited by root or if the user has the appropriate capability.
Changelog
v11:
* Change inc_rlimit_ucounts() which now returns top value of ucounts.
* Drop inc_rlimit_ucounts_and_test() because the return code of
inc_rlimit_ucounts() can be checked.
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://lkml.kernel.org/r/c5286a8aa16d2d698c222f7532f3d735c82bc6bc.1619094428.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
For RLIMIT_NPROC and some other rlimits the user_struct that holds the
global limit is kept alive for the lifetime of a process by keeping it
in struct cred. Adding a pointer to ucounts in the struct cred will
allow to track RLIMIT_NPROC not only for user in the system, but for
user in the user_namespace.
Updating ucounts may require memory allocation which may fail. So, we
cannot change cred.ucounts in the commit_creds() because this function
cannot fail and it should always return 0. For this reason, we modify
cred.ucounts before calling the commit_creds().
Changelog
v6:
* Fix null-ptr-deref in is_ucounts_overlimit() detected by trinity. This
error was caused by the fact that cred_alloc_blank() left the ucounts
pointer empty.
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://lkml.kernel.org/r/b37aaef28d8b9b0d757e07ba6dd27281bbe39259.1619094428.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
It is almost possible to use the result of prepare_exec_creds with no
modifications during exec. Update prepare_exec_creds to initialize
the suid and the fsuid to the euid, and the sgid and the fsgid to the
egid. This is all that is needed to handle the common case of exec
when nothing special like a setuid exec is happening.
That this preserves the existing behavior of exec can be verified
by examing bprm_fill_uid and cap_bprm_set_creds.
This change makes it clear that the later parts of exec that
update bprm->cred are just need to handle special cases such
as setuid exec and change of domains.
Link: https://lkml.kernel.org/r/871rng22dm.fsf_-_@x220.int.ebiederm.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This removes an outdated comment in prepare_kernel_cred.
There is no "cred_replace_mutex" any more, so the comment must
go away.
Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Merge misc fixes from David Howells.
Two afs fixes and a key refcounting fix.
* dhowells:
afs: Fix afs_lookup() to not clobber the version on a new dentry
afs: Fix use-after-loss-of-ref
keys: Fix request_key() cache