Otherwise, if a service unit that requests LogNamespace= stopped before
systemd-journald@.service is started, logs generated by the service will be
lost, as systemd-journald@.socket is stopped and
systemd-journald@.service will never started.
To prevent the issue, let's introduce another implicit dependency to
a oneshot service that explicitly synchronizes a namespaced journal file
when the log namespace is not needed anymore.
Fixes#32604.
It's ordered with networkd, but just in case. Lintian complains:
W: systemd: systemd-service-file-shutdown-problems [usr/lib/systemd/system/systemd-networkd-persistent-storage.service]
Follow-up for 91676b6458
Right now systemd-tpm2-setup-early and systemd-pcrphase-initrd.service
are not ordered against each other. However, they require the same slow
resource to operate: the TPM2. If we allow them to access the device
simultaneously, the kernel resource manager like has to save/restore TPM
state while they operate, slowing things down further.
hence, let's avoid all this mess, and just order them against each other
so that the shared resource is first used in full by one and then by the
other.
I opted to order systemd-pcrphase-initrd before
systemd-tpm2-setup-early, since there's value in having the former as
early as possible in userspace, to be a good marker for the transition
from kernel to first userspace. I can see no benefit in the opposite
order however.
This mimics what we do for systemd-cryptsetup@.service (see
src/shared/generator.c), and makes sense since repart might lock up the
root volume against a TPM, which ideally has its SRK already set up by
then.
More importantly though, this ensures that we ordered correctly after
tpm2.target (which systemd-tpm2-setup-early.service has a dependency
on), for systems where the TPM drivers are not compiled into the kernel.
See: https://lists.freedesktop.org/archives/systemd-devel/2024-April/050201.html
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.
The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.
There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.
Since the UID assignments are supposed to be transitive, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistancy cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowelisted.
BPF LSM program with contributions from Alexei Starovoitov.
stale HibernateLocation EFI variable
Currently, if the HibernateLocation EFI variable exists,
but we failed to resume from it, the boot carries on
without clearing the stale variable. Therefore, the subsequent
boots would still be waiting for the device timeout,
unless the variable is purged manually.
There's no point to keep trying to resume after a successful
switch-root, because the hibernation image state
would have been invalidated by then. OTOH, we don't
want to clear the variable prematurely either,
i.e. in initrd, since if the resume device is the same
as root one, the boot won't succeed and the user might
be able to try resuming again. So, let's introduce a
unit that only runs after switch-root and clears the var.
Fixes#32021
Drop connections and caches and reload config from files, to allow
for low-interruptions updates, and hook up to the usual SIGHUP and
ExecReload=. Mark servers and services configured directly via D-Bus
so that they can be kept around, and only the configuration file
settings are dropped and reloaded.
Fixes https://github.com/systemd/systemd/issues/17503
Fixes https://github.com/systemd/systemd/issues/20604
c0aeff4b99 added this in one unit file, but the
same problem occurs here. (There are no other files where this would apply.)
I think we should solve this systematically somehow, but it's not clear how to
do that, so until we have that better solution, let's apply the manual solution
so that our units work as expected.
Now, networkd accesses the state directory through the file descriptor
passed from systemd-networkd-persistent-storage.service.
Hence, the networkd itself does not need to access the state directory
through its path, and we can use more stronger mode for ProtectSystem=.
Currently the associated units fail if full tpm support is not available
on the system. Similar to systemd-pcrextend, let's add a --graceful option
that exits gracefully if no full TPM support is detected and use it in both
units.
This new passive target is supposed to be pulled in by SSH
implementations and should be reached when remote SSH access is
possible. The idea is that this target can be used as indicator for
other components to determine if and when SSH access is possible.
One specific usecase for this is the new sd_notify() logic in PID 1 that
sends its own supervisor notifications whenever target units are
reached. This can be used to precisely schedule SSH connections from
host to VM/container, or just to identify systems where SSH is even
available.
"Starting Boot Control…" would be a fairly confusing message in the boot logs.
Use "… Service" to mirror what we have in other services like
systemd-{hostnamed,timedated,portabled,machined,…}.service.