* cmd/libsnap-confine-private: do not deny all devices when reusing the device cgroup
With device cgroup v1, when reusing the cgroup (i.e. opening with the
SC_DEVICE_CGROUP_FROM_EXISTING flag), we should not deny all devices, as this
would negatively affect the processes already in the group.
This code path is executed by snap-device-helper, so it is possible that, while
processing real events from device changes, the group could have become
broken.
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
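The distinction can be sketched as follows. This is a minimal illustration of the v1 logic, assuming a hypothetical setup_device_cgroup_v1() helper that takes the path of the cgroup's devices.deny file; snapd's real implementation lives in device-cgroup-support.c and uses different names:

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sketch: the "deny all" rule is only written when the cgroup
 * is freshly created, never when reusing an existing one
 * (SC_DEVICE_CGROUP_FROM_EXISTING), since writing "a *:* rwm" to
 * devices.deny would strip access from every process already in the group. */
static int setup_device_cgroup_v1(const char *devices_deny_path, bool from_existing) {
    if (from_existing) {
        /* reuse: leave the existing allow/deny state untouched */
        return 0;
    }
    FILE *f = fopen(devices_deny_path, "w");
    if (f == NULL) {
        return -1;
    }
    /* deny everything first; specific devices are allowed afterwards */
    fputs("a *:* rwm\n", f);
    fclose(f);
    return 0;
}
```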
* cmd/libsnap-confine-private/device-cgroup-support.c: add comment
Co-authored-by: Ian Johnson <person.uwsome@gmail.com>
* tests/main/security-device-cgroups-strict-enforced: verify that udev changes do not break device group settings
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
* tests/main/security-device-cgroups-strict-enforced: skip triggering events on 14.04
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Co-authored-by: Alberto Mardegan <mardy@users.sourceforge.net>
Co-authored-by: Ian Johnson <person.uwsome@gmail.com>
Historically, a snap process ended up in a device cgroup (with device filtering)
only when there were devices assigned to it. On systems where CURRENT_TAGS is
supported and set by systemd/udev, snap-confine needs to do two passes over the
list of assigned devices. It may happen that, despite the snap tag being present
in the TAGS list, it is not present in CURRENT_TAGS, in which case we may end
up in a scenario where no devices are actually assigned to the snap. The current
code would incorrectly handle such a situation and move the process into the
device cgroup.
The branch introduces lazy initialization of the device cgroup and moves the
process into the group (or sets up device filtering on v2) only when there are
devices assigned to the snap.
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
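The lazy-initialization pattern can be sketched like this. All names here are illustrative stand-ins, not snapd's real API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: the cgroup is created and the process moved into it
 * only once the first device is actually assigned, so a snap whose device
 * list turns out to be empty never ends up in a filtering cgroup. */
struct device_cgroup {
    bool initialized;
    size_t assigned;
};

static void cgroup_do_init(struct device_cgroup *cg) {
    /* stand-in for creating the cgroup and moving the process into it
     * (or setting up device filtering on v2) */
    cg->initialized = true;
}

static void cgroup_allow_device(struct device_cgroup *cg, int major, int minor) {
    if (!cg->initialized) {
        /* first assigned device triggers the actual setup */
        cgroup_do_init(cg);
    }
    (void)major;
    (void)minor;
    cg->assigned++;
}

/* returns true iff the cgroup becomes initialized only after a device is added */
static bool lazy_init_demo(void) {
    struct device_cgroup cg = {false, 0};
    bool before = cg.initialized;
    cgroup_allow_device(&cg, 1, 3);
    return !before && cg.initialized && cg.assigned == 1;
}
```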
It is possible that the snap process has not been moved to the snap-specific
tracking cgroup. In this case, setting up device filtering on the group can
negatively affect whatever group the process is part of. Try to catch test
scenarios where this happens, so that we may reach a reasonable solution.
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Observed on an Impish s390x machine:
root@test-i:~# ls -lah /usr/lib/ld64.so.1
lrwxrwxrwx 1 root root 25 Sep 2 21:26 /usr/lib/ld64.so.1 -> s390x-linux-gnu/ld64.so.1
This comes from the new libc6 and is different from what we expected on
previous releases; see for example Hirsute:
root@testing-h:~# ls -lah /usr/lib/ld64.so.1
lrwxrwxrwx 1 root root 26 Mar 31 2021 /usr/lib/ld64.so.1 -> s390x-linux-gnu/ld-2.33.so
The latter matches the existing rule, while the former does not. The new rule
allows both.
Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
* snap-bootstrap: wait in `mountNonDataPartitionMatchingKernelDisk`
The current snap-bootstrap has a race when mounting the seed
partition in `mountNonDataPartitionMatchingKernelDisk` on EFI
systems.
The code determines the partUUID of the disk that booted the
kernel by reading the EFI LoaderDevicePartUUID variable. However,
there is no guarantee that this partition is available when
snap-bootstrap runs; the kernel may still be enumerating the hardware.
This can be observed on a fast NUC when booting from a USB
stick.
Note that `the-tool.service` already has
"After=systemd-udev-settle.service" set, but that is still
racy, because any `udev settle` (or `udev trigger --settle`)
is racy; the only option is to poll for the partition UUID to
appear.
This is a minimal commit to avoid too much code churn.
Thanks to Sertac for reporting this bug.
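snap-bootstrap itself is written in Go, but the polling idea can be sketched (here in C, matching the other examples in this changelog) roughly as follows. The helper name, timeout, and poll interval are illustrative; the real code waits for the /dev/disk/by-partuuid/ symlink of the boot disk:

```c
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Rough sketch of the polling approach: instead of relying on the racy
 * "udev settle", poll for the device node until it appears or a timeout
 * expires. poll_ms must be below 1000 so it fits in tv_nsec. */
static bool wait_file(const char *path, unsigned timeout_ms, unsigned poll_ms) {
    unsigned waited_ms = 0;
    while (access(path, F_OK) != 0) {
        if (waited_ms >= timeout_ms) {
            return false; /* device never showed up */
        }
        struct timespec ts = {0, (long)poll_ms * 1000000L};
        nanosleep(&ts, NULL);
        waited_ms += poll_ms;
    }
    return true;
}
```

A caller would invoke this with the partUUID path before mounting, e.g. wait_file("/dev/disk/by-partuuid/<uuid>", 5000, 50).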
* snap-bootstrap: rework waitPartSrc to improve testing (thanks to Alberto and Ian)
* snap-bootstrap: show a log message if waitPartSrc needs to wait
If waitPartSrc needs to wait for the device, this commit makes it
show a logger.Noticef() message. The message is only shown once:
waiting for the device is usually very quick, and if it is not,
that most likely indicates an error anyway, so spamming the
terminal would not help.
* snap-bootstrap: add test that ensures that if no waiting is needed for partSrc no log message is displayed
* snap-bootstrap: rename waitPartSrc -> waitFile
* snap-bootstrap: fix time.Duration() casting on 32bit systems
On some architectures, such as aarch64, we've seen the ld.so interpreter named
ld-linux-aarch64.so.1. However, snap-confine's AppArmor profile
prevents that path from being loaded, thus breaking execution of snaps. The
following denial was observed on affected systems:
audit: type=1400 audit(1632477434.031:8902): apparmor="DENIED" operation="file_mmap"
namespace="root//lxd-happy-impish_<var-snap-lxd-common-lxd>"
profile="/snap/snapd/12886/usr/lib/snapd/snap-confine"
name="/usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1"
pid=1101743 comm="snap-confine" requested_mask="m" denied_mask="m"
fsuid=1000000 ouid=1000000
Fixes: https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1944004
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Clang 13.0 generates this warning:
libsnap-confine-private/cgroup-support-test.c:282:15: error: variable 'p' set but not used [-Werror,-Wunused-but-set-variable]
char *p SC_CLEANUP(sc_cleanup_string) = NULL;
^
1 error generated.
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Old apparmor_parser versions complain about that; on the other hand, things worked without it.
On 16.04:
+ apparmor_parser -r squashfs-root/etc/apparmor.d/usr.lib.snapd.snap-confine.real
AppArmor parser error for squashfs-root/etc/apparmor.d/usr.lib.snapd.snap-confine.real in squashfs-root/etc/apparmor.d/usr.lib.snapd.snap-confine.real at line 81: Invalid capability bpf.
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Pre-5.11 kernels used the memlock rlimit to account for memory used by BPF
objects such as maps and programs. This was fixed in 5.11 with
https://lkml.org/lkml/2020/7/24/1318, which improved the accounting. However,
some systems such as Debian Sid still use older kernels. That, combined with a
very low memlock rlimit (64k), results in snap-confine erroring out due
to -EPERM on bpf(BPF_MAP_CREATE, ..). Further investigation showed that the
memlock size of the BPF map we create (9+1 bytes * 500 entries) is 11 pages
(45k) on those systems, while on systems sporting a 5.11 kernel, the memlock as
reported by bpftool is just 2 pages (8k).
The patch bumps the memlock limit to 512k when necessary, which is enough to fit
the map and a beefy program.
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
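The bump can be sketched as below. This is an illustrative helper, not snap-confine's actual function: it raises the soft RLIMIT_MEMLOCK to the requested value, capped at the hard limit, since an unprivileged process cannot exceed the latter:

```c
#include <sys/resource.h>

/* Hypothetical sketch (Linux): raise the soft memlock limit so that
 * bpf(BPF_MAP_CREATE, ..) does not fail with -EPERM on pre-5.11 kernels
 * that still charge BPF memory against RLIMIT_MEMLOCK. */
static int bump_memlock_limit(rlim_t want) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur >= want) {
        return 0; /* already high enough, nothing to do */
    }
    if (want > rl.rlim_max) {
        want = rl.rlim_max; /* cannot exceed the hard limit unprivileged */
    }
    rl.rlim_cur = want;
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```

Called as bump_memlock_limit(512 * 1024) before creating the BPF map.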