This is a follow-up for d5ee050ffc, and
reintroduces a requirement dep from systemd-journal-flush.service onto
systemd-journald.service, but a weaker one than originally: a Wants= one
instead of a Requires= one.
Why? Simply because the service issues an IPC call to the journald,
hence it should pull it in. (Note that socket activation doesn't happen
for the Varlink socket it uses, hence we should pull in the service
itself.)
The systemd-oomd.service unit contains
[Install]
WantedBy=multi-user.target
Alias=dbus-org.freedesktop.oom1.service
which means the symlink is supposed to be created dynamically when the
service is enabled.
In the olden days systemd-resolved used dbus and it didn't make sense to start
it before dbus which is started fairly late. But we have mostly ported resolved
over to varlink. The queries from nss-resolve are done using varlink, so name
resolution can work without dbus. resolvectl still uses dbus, so e.g. 'resolvectl
query' will not work, but by starting systemd-resolved earlier we're not making this
any worse.
If systemd-resolved is started after dbus, it registers the name and everything
is fine. If it is started before dbus, it'll watch for the dbus socket and
connect later. So it should be fine to start systemd-resolved earlier. (If dbus
is stopped and restarted, unfortunately systemd-resolved does not reconnect.
This seems to be a small bug: since our daemons know how to watch for
dbus.socket, they could restart the watch if they ever lose the connection. But
this scenario shouldn't happen in normal boot, and restarting dbus is not
supported anyway.)
Moving the start earlier the following advantages:
- name resolution becomes availabe earlier, in particular for synthesized
hostnames even before the network is up.
- basic.target is part of initrd.target, so systemd-resolved will get started
in the initrd if installed. This is required for nfs-root when the server is
specified using a name (https://bugzilla.redhat.com/show_bug.cgi?id=2037311).
Otherwise, systemd-homed-active.service will fail to deactivate all
homes because homectl can no longer talk to homed if dbus stops first.
As a result, /home cannot be umounted.
Doing this on systemd-homed-active.service instead works as well, but
systemd-homed will exit 1 if dbus is already shut down.
It is used by udevd and networkd. Since udevd is enabled statically, let's also
change the preset to "on". networkd is opt-in, so let's pull in the generator
when enabling networkd too.
Fixes#21626. (The bug report talks about /run, but the issue is actually with
/tmp.) People use /tmp for various things that fit in memory, e.g. unpacking
packages, and 400k is not much. Let's raise is a bit.
Programs run by udev triggers may need to execute the bpf() syscall. Even more
so, since on a cgroup v2 system, the only way to set up device access filtering
is to install a BPF program on the cgroup in question and one way of passing
data to such program is through BPF maps, which can only be access using the
bpf() syscall. One such use case was identified in RHBZ#2025264 related to
snap-device-helper, and led to RHBZ#2027627 being filed.
Unfortunately there is no finer grained control over what gets passed in the
syscall, so just enable bpf() and leave fine grained mediation to other
security layers (eg. SELinux).
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2027627
Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com>
Due to the fact that systemd-journal-flush.service has
"Requires=systemd-journald.service", this service is stopped too when journald
is requested to do so.
However stopping systemd-journal-flush.service implies that journald
relinquishes /var hence implicitly switching back to the volatile storage
mode and removing /run/systemd/journal/flushed.
If journald is started afterwards, it will run in volatile storage mode
regardless of the value of 'Storage=' as it believes now that /var is not yet
ready (because the flushed flag is missing).
Because this flag is mainly an indication for journald that the initialization
of /var/log/journal (during the boot process) has been done,
systemd-journal-flush.service shouldn't be tied to the state of journald itself
but to the state of /var/log/journal, hence to the state of the system.
Parsing objects is risky as data could be malformed or malicious,
so avoid doing that from the main systemd-coredump process and
instead fork another process, and set it to avoid generating
core files itself.
Users may use rules that refer to binaries e.g. in /opt or /usr/local,
and those directories may be separate mount points. We don't need the
binfmt rules in early boot, so let's delay the service so that we can
rely on the full local filesystem being visible.
Fixes#21178.
When using "capture : true" in custom_target()s the mode of the source
file is not preserved when the generated file is not installed and so
needs to be tweaked manually. Switch from output capture to creating the
target file and copy the permissions from the input file.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
If the tty arg is set to "-", agetty uses the stdin fd as the tty.
Let's pass the tty this way so that we keep an fd open to the tty
at all times. If all fd's to a tty are closed, the kernel might
reset the tty which we want to avoid.
This adds support for dm integrity targets and an associated
/etc/integritytab file which is required as the dm integrity device
super block doesn't include all of the required metadata to bring up
the device correctly. See integritytab man page for details.
Let's make it slightly more likely that a per-user service manager is
killed than any system service. We use a conservative 100 (from a range
that goes all the way to 1000).
Replaces: #17426
Together with the previous commit this means: system manager and system
services are placed at OOM score adjustment 0 (specifically: they
inherit kernel default of 0). User service manager (both for root and
non-root) are placed at 100. User services for non-root are placed at
200, those for root inherit 100.
Note that processes forked off the user *sessions* (i.e. not forked off
the per-user service manager) remain at 0 (e.g. the shell process
created by a tty or ssh login). This probably should be
addressed too one day (maybe in pam_systemd?), but is not covered here.