Commit Graph

143 Commits

Author SHA1 Message Date
James Morris
e42f6f9be4 Merge tag 'v4.19-rc2' into next-general
Sync to Linux 4.19-rc2 for downstream developers.
2018-09-04 11:35:54 -07:00
Christian Brauner
4408e300a6 security/capabilities: remove check for -EINVAL
bprm_caps_from_vfs_caps() never returned -EINVAL so remove the
rc == -EINVAL check.

Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
2018-08-29 09:05:28 -07:00
Eddie.Horng
355139a8db cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()
The code in cap_inode_getsecurity(), introduced by commit 8db6c34f1d
("Introduce v3 namespaced file capabilities"), should use
d_find_any_alias() instead of d_find_alias() do handle unhashed dentry
correctly. This is needed, for example, if execveat() is called with an
open but unlinked overlayfs file, because overlayfs unhashes dentry on
unlink.
This is a regression of real life application, first reported at
https://www.spinics.net/lists/linux-unionfs/msg05363.html

Below reproducer and setup can reproduce the case.
  const char* exec="echo";
  const char *newargv[] = { "echo", "hello", NULL};
  const char *newenviron[] = { NULL };
  int fd, err;

  fd = open(exec, O_PATH);
  unlink(exec);
  err = syscall(322/*SYS_execveat*/, fd, "", newargv, newenviron,
AT_EMPTY_PATH);
  if(err<0)
    fprintf(stderr, "execveat: %s\n", strerror(errno));

gcc compile into ~/test/a.out
mount -t overlay -orw,lowerdir=/mnt/l,upperdir=/mnt/u,workdir=/mnt/w
none /mnt/m
cd /mnt/m
cp /bin/echo .
~/test/a.out

Expected result:
hello
Actually result:
execveat: Invalid argument
dmesg:
Invalid argument reading file caps for /dev/fd/3

The 2nd reproducer and setup emulates similar case but for
regular filesystem:
  const char* exec="echo";
  int fd, err;
  char buf[256];

  fd = open(exec, O_RDONLY);
  unlink(exec);
  err = fgetxattr(fd, "security.capability", buf, 256);
  if(err<0)
    fprintf(stderr, "fgetxattr: %s\n", strerror(errno));

gcc compile into ~/test_fgetxattr

cd /tmp
cp /bin/echo .
~/test_fgetxattr

Result:
fgetxattr: Invalid argument

On regular filesystem, for example, ext4 read xattr from
disk and return to execveat(), will not trigger this issue, however,
the overlay attr handler pass real dentry to vfs_getxattr() will.
This reproducer calls fgetxattr() with an unlinked fd, involkes
vfs_getxattr() then reproduced the case that d_find_alias() in
cap_inode_getsecurity() can't find the unlinked dentry.

Suggested-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Fixes: 8db6c34f1d ("Introduce v3 namespaced file capabilities")
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Eddie Horng <eddie.horng@mediatek.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-08-11 02:05:53 -05:00
Eric W. Biederman
b1d749c5c3 capabilities: Allow privileged user in s_user_ns to set security.* xattrs
A privileged user in s_user_ns will generally have the ability to
manipulate the backing store and insert security.* xattrs into
the filesystem directly. Therefore the kernel must be prepared to
handle these xattrs from unprivileged mounts, and it makes little
sense for commoncap to prevent writing these xattrs to the
filesystem. The capability and LSM code have already been updated
to appropriately handle xattrs from unprivileged mounts, so it
is safe to loosen this restriction on setting xattrs.

The exception to this logic is that writing xattrs to a mounted
filesystem may also cause the LSM inode_post_setxattr or
inode_setsecurity callbacks to be invoked. SELinux will deny the
xattr update by virtue of applying mountpoint labeling to
unprivileged userns mounts, and Smack will deny the writes for
any user without global CAP_MAC_ADMIN, so loosening the
capability check in commoncap is safe in this respect as well.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-05-24 12:03:31 -05:00
Tetsuo Handa
1f5781725d commoncap: Handle memory allocation failure.
syzbot is reporting NULL pointer dereference at xattr_getsecurity() [1],
for cap_inode_getsecurity() is returning sizeof(struct vfs_cap_data) when
memory allocation failed. Return -ENOMEM if memory allocation failed.

[1] https://syzkaller.appspot.com/bug?id=a55ba438506fe68649a5f50d2d82d56b365e0107

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 8db6c34f1d ("Introduce v3 namespaced file capabilities")
Reported-by: syzbot <syzbot+9369930ca44f29e60e2d@syzkaller.appspotmail.com>
Cc: stable <stable@vger.kernel.org> # 4.14+
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.morris@microsoft.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-04-10 19:17:41 -05:00
Eric Biggers
dc32b5c3e6 capabilities: fix buffer overread on very short xattr
If userspace attempted to set a "security.capability" xattr shorter than
4 bytes (e.g. 'setfattr -n security.capability -v x file'), then
cap_convert_nscap() read past the end of the buffer containing the xattr
value because it accessed the ->magic_etc field without verifying that
the xattr value is long enough to contain that field.

Fix it by validating the xattr value size first.

This bug was found using syzkaller with KASAN.  The KASAN report was as
follows (cleaned up slightly):

    BUG: KASAN: slab-out-of-bounds in cap_convert_nscap+0x514/0x630 security/commoncap.c:498
    Read of size 4 at addr ffff88002d8741c0 by task syz-executor1/2852

    CPU: 0 PID: 2852 Comm: syz-executor1 Not tainted 4.15.0-rc6-00200-gcc0aac99d977 #253
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
    Call Trace:
     __dump_stack lib/dump_stack.c:17 [inline]
     dump_stack+0xe3/0x195 lib/dump_stack.c:53
     print_address_description+0x73/0x260 mm/kasan/report.c:252
     kasan_report_error mm/kasan/report.c:351 [inline]
     kasan_report+0x235/0x350 mm/kasan/report.c:409
     cap_convert_nscap+0x514/0x630 security/commoncap.c:498
     setxattr+0x2bd/0x350 fs/xattr.c:446
     path_setxattr+0x168/0x1b0 fs/xattr.c:472
     SYSC_setxattr fs/xattr.c:487 [inline]
     SyS_setxattr+0x36/0x50 fs/xattr.c:483
     entry_SYSCALL_64_fastpath+0x18/0x85

Fixes: 8db6c34f1d ("Introduce v3 namespaced file capabilities")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2018-01-02 20:49:13 +11:00
Linus Torvalds
55b3a0cb5a Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull general security subsystem updates from James Morris:
 "TPM (from Jarkko):
   - essential clean up for tpm_crb so that ARM64 and x86 versions do
     not distract each other as much as before

   - /dev/tpm0 rejects now too short writes (shorter buffer than
     specified in the command header

   - use DMA-safe buffer in tpm_tis_spi

   - otherwise mostly minor fixes.

  Smack:
   - base support for overlafs

  Capabilities:
   - BPRM_FCAPS fixes, from Richard Guy Briggs:

     The audit subsystem is adding a BPRM_FCAPS record when auditing
     setuid application execution (SYSCALL execve). This is not expected
     as it was supposed to be limited to when the file system actually
     had capabilities in an extended attribute. It lists all
     capabilities making the event really ugly to parse what is
     happening. The PATH record correctly records the setuid bit and
     owner. Suppress the BPRM_FCAPS record on set*id.

  TOMOYO:
   - Y2038 timestamping fixes"

* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (28 commits)
  MAINTAINERS: update the IMA, EVM, trusted-keys, encrypted-keys entries
  Smack: Base support for overlayfs
  MAINTAINERS: remove David Safford as maintainer for encrypted+trusted keys
  tomoyo: fix timestamping for y2038
  capabilities: audit log other surprising conditions
  capabilities: fix logic for effective root or real root
  capabilities: invert logic for clarity
  capabilities: remove a layer of conditional logic
  capabilities: move audit log decision to function
  capabilities: use intuitive names for id changes
  capabilities: use root_priveleged inline to clarify logic
  capabilities: rename has_cap to has_fcap
  capabilities: intuitive names for cap gain status
  capabilities: factor out cap_bprm_set_creds privileged root
  tpm, tpm_tis: use ARRAY_SIZE() to define TPM_HID_USR_IDX
  tpm: fix duplicate inline declaration specifier
  tpm: fix type of a local variables in tpm_tis_spi.c
  tpm: fix type of a local variable in tpm2_map_command()
  tpm: fix type of a local variable in tpm2_get_cc_attrs_tbl()
  tpm-dev-common: Reject too short writes
  ...
2017-11-13 10:30:44 -08:00
Richard Guy Briggs
dbbbe1105e capabilities: audit log other surprising conditions
The existing condition tested for process effective capabilities set by
file attributes but intended to ignore the change if the result was
unsurprisingly an effective full set in the case root is special with a
setuid root executable file and we are root.

Stated again:
- When you execute a setuid root application, it is no surprise and
  expected that it got all capabilities, so we do not want capabilities
  recorded.
        if (pE_grew && !(pE_fullset && (eff_root || real_root) && root_priveleged) )

Now make sure we cover other cases:
- If something prevented a setuid root app getting all capabilities and
  it wound up with one capability only, then it is a surprise and should
  be logged.  When it is a setuid root file, we only want capabilities
  when the process does not get full capabilities..
        root_priveleged && setuid_root && !pE_fullset

- Similarly if a non-setuid program does pick up capabilities due to
  file system based capabilities, then we want to know what capabilities
  were picked up.  When it has file system based capabilities we want
  the capabilities.
        !is_setuid && (has_fcap && pP_gained)

- If it is a non-setuid file and it gets ambient capabilities, we want
  the capabilities.
        !is_setuid && pA_gained

- These last two are combined into one due to the common first parameter.

Related: https://github.com/linux-audit/audit-kernel/issues/16

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:46 +11:00
Richard Guy Briggs
588fb2c7e2 capabilities: fix logic for effective root or real root
Now that the logic is inverted, it is much easier to see that both real
root and effective root conditions had to be met to avoid printing the
BPRM_FCAPS record with audit syscalls.  This meant that any setuid root
applications would print a full BPRM_FCAPS record when it wasn't
necessary, cluttering the event output, since the SYSCALL and PATH
records indicated the presence of the setuid bit and effective root user
id.

Require only one of effective root or real root to avoid printing the
unnecessary record.

Ref: commit 3fc689e96c ("Add audit_log_bprm_fcaps/AUDIT_BPRM_FCAPS")
See: https://github.com/linux-audit/audit-kernel/issues/16

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:45 +11:00
Richard Guy Briggs
c0d1adefe0 capabilities: invert logic for clarity
The way the logic was presented, it was awkward to read and verify.
Invert the logic using DeMorgan's Law to be more easily able to read and
understand.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Okay-ished-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:45 +11:00
Richard Guy Briggs
02ebbaf48c capabilities: remove a layer of conditional logic
Remove a layer of conditional logic to make the use of conditions
easier to read and analyse.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Okay-ished-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:45 +11:00
Richard Guy Briggs
9fbc2c7964 capabilities: move audit log decision to function
Move the audit log decision logic to its own function to isolate the
complexity in one place.

Suggested-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Okay-ished-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:44 +11:00
Richard Guy Briggs
81a6a01299 capabilities: use intuitive names for id changes
Introduce a number of inlines to make the use of the negation of
uid_eq() easier to read and analyse.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Okay-ished-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:44 +11:00
Richard Guy Briggs
9304b46c91 capabilities: use root_priveleged inline to clarify logic
Introduce inline root_privileged() to make use of SECURE_NONROOT
easier to read.

Suggested-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Okay-ished-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:44 +11:00
Richard Guy Briggs
fc7eadf768 capabilities: rename has_cap to has_fcap
Rename has_cap to has_fcap to clarify it applies to file capabilities
since the entire source file is about capabilities.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Okay-ished-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:44 +11:00
Richard Guy Briggs
4c7e715fc8 capabilities: intuitive names for cap gain status
Introduce macros cap_gained, cap_grew, cap_full to make the use of the
negation of is_subset() easier to read and analyse.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Okay-ished-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:43 +11:00
Richard Guy Briggs
db1a8922cf capabilities: factor out cap_bprm_set_creds privileged root
Factor out the case of privileged root from the function
cap_bprm_set_creds() to make the latter easier to read and analyse.

Suggested-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Okay-ished-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-20 15:22:43 +11:00
Colin Ian King
76ba89c76f commoncap: move assignment of fs_ns to avoid null pointer dereference
The pointer fs_ns is assigned from inode->i_ib->s_user_ns before
a null pointer check on inode, hence if inode is actually null we
will get a null pointer dereference on this assignment. Fix this
by only dereferencing inode after the null pointer check on
inode.

Detected by CoverityScan CID#1455328 ("Dereference before null check")

Fixes: 8db6c34f1d ("Introduce v3 namespaced file capabilities")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-10-19 13:09:33 +11:00
Linus Torvalds
a302824782 Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull misc security layer update from James Morris:
 "This is the remaining 'general' change in the security tree for v4.14,
  following the direct merging of SELinux (+ TOMOYO), AppArmor, and
  seccomp.

  That's everything now for the security tree except IMA, which will
  follow shortly (I've been traveling for the past week with patchy
  internet)"

* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  security: fix description of values returned by cap_inode_need_killpriv
2017-09-24 11:40:41 -07:00
Stefan Berger
ab5348c9c2 security: fix description of values returned by cap_inode_need_killpriv
cap_inode_need_killpriv returns 1 if security.capability exists and
has a value and inode_killpriv() is required, 0 otherwise. Fix the
description of the return value to reflect this.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-09-23 21:15:41 -07:00
Linus Torvalds
dd198ce714 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
 "Life has been busy and I have not gotten half as much done this round
  as I would have liked. I delayed it so that a minor conflict
  resolution with the mips tree could spend a little time in linux-next
  before I sent this pull request.

  This includes two long delayed user namespace changes from Kirill
  Tkhai. It also includes a very useful change from Serge Hallyn that
  allows the security capability attribute to be used inside of user
  namespaces. The practical effect of this is people can now untar
  tarballs and install rpms in user namespaces. It had been suggested to
  generalize this and encode some of the namespace information
  information in the xattr name. Upon close inspection that makes the
  things that should be hard easy and the things that should be easy
  more expensive.

  Then there is my bugfix/cleanup for signal injection that removes the
  magic encoding of the siginfo union member from the kernel internal
  si_code. The mips folks reported the case where I had used FPE_FIXME
  me is impossible so I have remove FPE_FIXME from mips, while at the
  same time including a return statement in that case to keep gcc from
  complaining about unitialized variables.

  I almost finished the work to get make copy_siginfo_to_user a trivial
  copy to user. The code is available at:

     git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

  But I did not have time/energy to get the code posted and reviewed
  before the merge window opened.

  I was able to see that the security excuse for just copying fields
  that we know are initialized doesn't work in practice there are buggy
  initializations that don't initialize the proper fields in siginfo. So
  we still sometimes copy unitialized data to userspace"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  Introduce v3 namespaced file capabilities
  mips/signal: In force_fcr31_sig return in the impossible case
  signal: Remove kernel interal si_code magic
  fcntl: Don't use ambiguous SIG_POLL si_codes
  prctl: Allow local CAP_SYS_ADMIN changing exe_file
  security: Use user_namespace::level to avoid redundant iterations in cap_capable()
  userns,pidns: Verify the userns for new pid namespaces
  signal/testing: Don't look for __SI_FAULT in userspace
  signal/mips: Document a conflict with SI_USER with SIGFPE
  signal/sparc: Document a conflict with SI_USER with SIGFPE
  signal/ia64: Document a conflict with SI_USER with SIGFPE
  signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-11 18:34:47 -07:00
Serge E. Hallyn
8db6c34f1d Introduce v3 namespaced file capabilities
Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

This patch introduces v3 of the security.capability xattr.  It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.

When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change.  Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid.  This is done during the
execution of setxattr() to catch user-space-initiated capability writes.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.

Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.

If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace.  The kernel will
translate that rootid to an absolute uid, and write that to disk.  After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.

Only a single security.capability xattr may exist at a time for a given
file.  A task may overwrite an existing xattr so long as it is
privileged over the inode.  Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr.  This check can be re-added if deemed useful.

This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.

Example using tar:

 $ cp /bin/sleep sleepx
 $ mkdir b1 b2
 $ lxc-usernsexec -m b:0:100000:1 -m b:1:$(id -u):1 -- chown 0:0 b1
 $ lxc-usernsexec -m b:0:100001:1 -m b:1:$(id -u):1 -- chown 0:0 b2
 $ lxc-usernsexec -m b:0:100000:1000 -- tar --xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx
 $ lxc-usernsexec -m b:0:100001:1000 -- tar --xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar
 $ lxc-usernsexec -m b:0:100001:1000 -- getcap b2/sleepx
   b2/sleepx = cap_sys_admin+ep
 # /opt/ltp/testcases/bin/getv3xattr b2/sleepx
   v3 xattr, rootid is 100001

A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp

Changelog:
   Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
   Nov 07 2016: convert rootid from and to fs user_ns
   (From ebiederm: mar 28 2017)
     commoncap.c: fix typos - s/v4/v3
     get_vfs_caps_from_disk: clarify the fs_ns root access check
     nsfscaps: change the code split for cap_inode_setxattr()
   Apr 09 2017:
       don't return v3 cap for caps owned by current root.
      return a v2 cap for a true v2 cap in non-init ns
   Apr 18 2017:
      . Change the flow of fscap writing to support s_user_ns writing.
      . Remove refuse_fcap_overwrite().  The value of the previous
        xattr doesn't matter.
   Apr 24 2017:
      . incorporate Eric's incremental diff
      . move cap_convert_nscap to setxattr and simplify its usage
   May 8, 2017:
      . fix leaking dentry refcount in cap_inode_getsecurity

Signed-off-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-09-01 14:57:15 -05:00
Kees Cook
ee67ae7ef6 commoncap: Move cap_elevated calculation into bprm_set_creds
Instead of a separate function, open-code the cap_elevated test, which
lets us entirely remove bprm->cap_effective (to use the local "effective"
variable instead), and more accurately examine euid/egid changes via the
existing local "is_setid".

The following LTP tests were run to validate the changes:

	# ./runltp -f syscalls -s cap
	# ./runltp -f securebits
	# ./runltp -f cap_bounds
	# ./runltp -f filecaps

All kernel selftests for capabilities and exec continue to pass as well.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
2017-08-01 12:03:09 -07:00
Kees Cook
46d98eb4e1 commoncap: Refactor to remove bprm_secureexec hook
The commoncap implementation of the bprm_secureexec hook is the only LSM
that depends on the final call to its bprm_set_creds hook (since it may
be called for multiple files, it ignores bprm->called_set_creds). As a
result, it cannot safely _clear_ bprm->secureexec since other LSMs may
have set it.  Instead, remove the bprm_secureexec hook by introducing a
new flag to bprm specific to commoncap: cap_elevated. This is similar to
cap_effective, but that is used for a specific subset of elevated
privileges, and exists solely to track state from bprm_set_creds to
bprm_secureexec. As such, it will be removed in the next patch.

Here, set the new bprm->cap_elevated flag when setuid/setgid has happened
from bprm_fill_uid() or fscapabilities have been prepared. This temporarily
moves the bprm_secureexec hook to a static inline. The helper will be
removed in the next patch; this makes the step easier to review and bisect,
since this does not introduce any changes to inputs nor outputs to the
"elevated privileges" calculation.

The new flag is merged with the bprm->secureexec flag in setup_new_exec()
since this marks the end of any further prepare_binprm() calls.

Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01 12:03:08 -07:00
Kirill Tkhai
64db4c7f4c security: Use user_namespace::level to avoid redundant iterations in cap_capable()
When ns->level is not larger then cred->user_ns->level,
then ns can't be cred->user_ns's descendant, and
there is no a sense to search in parents.

So, break the cycle earlier and skip needless iterations.

v2: Change comment on suggested by Andy Lutomirski.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-07-20 07:46:06 -05:00