Currently for portable services we automatically add a bind mount
os-release -> /run/host/os-release. This becomes problematic for the
soft-reboot case, as it's likely that portable services will be configured
to survive it, and thus would forever keep a reference to the old host's
os-release. That is a problem because the file becomes outdated, and the
reference also stops the old rootfs from being garbage collected.
Create a copy when the manager starts under /run/systemd/propagate instead,
and bind mount that for all services using RootDirectory=/RootImage=, so
that on soft-reboot the content gets updated (without creating a new file,
so the existing bind mounts will see the new content too).
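The "without creating a new file" part matters because a bind mount pins the inode, not the path. A minimal sketch of that idea, not the actual systemd code: the helper name `update_in_place` and the temporary path are hypothetical, and the point is only that truncating and rewriting keeps the same inode, whereas a write-then-rename would create a new one.

```python
import os
import tempfile


def update_in_place(path: str, content: str) -> None:
    """Overwrite a file while keeping its inode (hypothetical helper)."""
    # O_TRUNC empties the existing file instead of replacing it, so anything
    # bound to the old inode (such as a bind mount) sees the new content.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(content)


def demo() -> bool:
    """Return True if the inode survives an update and the content changed."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "os-release")
        update_in_place(path, "VERSION_ID=1\n")
        inode_before = os.stat(path).st_ino
        update_in_place(path, "VERSION_ID=2\n")
        inode_after = os.stat(path).st_ino
        with open(path) as f:
            return inode_before == inode_after and f.read() == "VERSION_ID=2\n"
```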
This expands the /run/host/os-release protocol to more services, but I
think that's a nice thing to have too.
Closes https://github.com/systemd/systemd/issues/28023
This reverts commit e019ea738d.
In the new approach, a lock on /dev/console will be used. This lock will solve
the issue for services which run in early boot. Services which run later are
ordered after sysinit.target and thus start much later anyway, so this
automatic dependency is not useful. Let's remove it again to make the code
simpler.
This adds support for the new XDG_STATE_HOME env var that was added to
the xdg basedir spec. Previously, because the basedir spec didn't know
the concept we'd alias the backing dir for StateDirectory= to the one
for ConfigurationDirectory= when running in --user mode. With this change
we'll make them separate. This brings us various benefits, such as proper
"systemctl clean" support, where we can clear service state separately
from service configuration, now in user mode too.
This does not come without complications: retaining compatibility with
older setups is difficult, because we cannot possibly identify which
files in existing populated config dirs are actually "state" and which
ones are true "configuration".
Hence let's deal with this pragmatically: if we detect that a service
that has both dirs configured only has the configuration dir existing,
then symlink the state dir to the configuration dir to retain
compatibility.
This is not great, but it's the only somewhat reasonable way out I can
see.
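The compatibility fallback described above could be sketched roughly as follows. This is an illustration only, not the code from the commit: the function name `prepare_state_dir` and its return values are hypothetical, and the "create a fresh directory" branch is an assumption about what happens when neither directory exists.

```python
import os
import tempfile


def prepare_state_dir(config_dir: str, state_dir: str) -> str:
    """Set up state_dir; return 'existing', 'symlinked' or 'created'."""
    # Anything already at the state path (directory or symlink) is left alone.
    if os.path.lexists(state_dir):
        return "existing"
    os.makedirs(os.path.dirname(state_dir), exist_ok=True)
    # Only the configuration dir exists: alias the state dir to it so that
    # files written by older versions are still found.
    if os.path.isdir(config_dir):
        os.symlink(config_dir, state_dir)
        return "symlinked"
    # Assumption: with no legacy data around, a separate state dir is created.
    os.makedirs(state_dir)
    return "created"
```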
Fixes: #25739
Let's restrict how we apply credential globbing in ImportCredential=, so
that we have some flexibility in automatically extending the glob
expression with per-instance data eventually without getting into
conflict with the globbing parts.
In our current uses we only allow globbing at the end of the expression,
and this is a new, unreleased feature hence let's be restrictive on this
initially. We can still relax this later if we feel the need to after
all.
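As an illustration of "globbing only at the end of the expression", a check along these lines would reject patterns with wildcards anywhere but the last position. The function name and the exact set of rejected characters are assumptions made for this sketch; the rule systemd actually enforces may differ in detail.

```python
GLOB_CHARS = "*?["


def credential_glob_valid(pattern: str) -> bool:
    """Accept a literal name, or a name ending in a single '*'."""
    if not pattern:
        return False
    # Strip one trailing '*', then require the remainder to be glob-free.
    stem = pattern[:-1] if pattern.endswith("*") else pattern
    return not any(c in GLOB_CHARS for c in stem)
```

With this rule `mycred.*` and `mycred` pass, while `*.mycred` and `my*cred.*` are refused, which leaves room to append per-instance data to the literal prefix later.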
Fixes: #28022
This setting allows services to run in an ephemeral copy of the root
directory or root image. To make sure the ephemeral copies are always
cleaned up, we add a tmpfiles snippet to unconditionally clean up
/var/lib/systemd/ephemeral. To prevent in-use ephemeral copies from
being cleaned up by tmpfiles, we use the newly added COPY_LOCK_BSD
and BTRFS_SNAPSHOT_LOCK_BSD flags to take a BSD lock on the ephemeral
copies, which instructs tmpfiles not to touch them as long as the BSD
lock is held.
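The locking protocol can be illustrated with plain flock(2) calls. This is a sketch of the general mechanism, not of the COPY_LOCK_BSD implementation: the helper name `try_lock` is hypothetical, and a temporary directory stands in for an ephemeral copy.

```python
import fcntl
import os
import tempfile


def try_lock(path: str):
    """Return an fd holding an exclusive BSD lock on path, or None if busy."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # LOCK_NB makes the call fail immediately instead of blocking,
        # which is what a cleaner wants: skip anything that is in use.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A cleaner following this protocol would only remove a directory after `try_lock()` succeeds; the lock is dropped automatically when the holder closes its file descriptor or exits.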
ImportCredential= takes a credential name and searches for a matching
credential in all the credential stores we know about. It supports
globs which are expanded so that all matching credentials are loaded.
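The lookup can be pictured as a glob match over a list of store directories. The sketch below is illustrative only: the function name is hypothetical, the store paths are supplied by the caller, and the "first store wins for a given name" precedence is an assumption rather than something stated here.

```python
import fnmatch
import os
import tempfile


def import_credentials(pattern: str, stores: list[str]) -> dict[str, str]:
    """Map each matching credential name to the path it was found at."""
    found: dict[str, str] = {}
    for store in stores:
        if not os.path.isdir(store):
            continue  # a missing store is simply skipped
        for name in sorted(os.listdir(store)):
            # Assumption: earlier stores take precedence for duplicate names.
            if fnmatch.fnmatchcase(name, pattern) and name not in found:
                found[name] = os.path.join(store, name)
    return found
```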
This adds support for KSM (kernel samepage merging). It adds a new
boolean parameter called MemoryKSM to enable the feature. The feature
can only be enabled with newer kernels.
The goal of this change is to delay getty services until after
systemd-vconsole-setup has finished. systemd-vconsole-setup starts loadkeys,
and it seems that when loadkeys is interrupted by the TTY hangup call we do
when starting tty services [1], it starts getting EIO from the
ioctl("/dev/tty1", KDSKBENT) syscall it does.
Fixes #26908.
[1] https://github.com/legionus/kbd/issues/92#issuecomment-1554451788
Initially I wanted to add ordering dependencies to individual units, but
TTYVHangup= can be added to other various external units too. The solution with
an implicit dependency should cover those cases too.
Fixes #26413: the docs said that the filter prevents writes, but it is just a
filter at the system call level, and some of those calls are used for both
writing and reading. This is confusing, especially when a higher-level library
call like ntp_gettime() is denied.
I don't think it's realistic that we'll make the filter smarter in the near
future, so let's change the docs to describe the implementation.
Also, split out the advice part into a separate paragraph.
This ensures messages from PID1 regarding a unit also contain those
fields. For example, portable services have PORTABLE=<image> as
extra fields, which is useful to identify which version of a portable
image produced a log message like an error or an oomd kill.
On some ARM platforms, the dynamic linker could use the PROT_BTI memory protection
flag with `mprotect(..., PROT_BTI | PROT_EXEC)` to enable additional memory
protection for executable pages. But `MemoryDenyWriteExecute=yes` blocks this
with a seccomp filter denying all `mprotect(..., x | PROT_EXEC)`.
The newly preferred method is to use prctl(PR_SET_MDWE) on supported kernels. Then
the in-kernel implementation can allow PROT_BTI as necessary, without weakening
MDWE. The in-kernel version may also be extended to more sophisticated protections
in the future.
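The difference between the two approaches can be modelled in a few lines. This is a deliberately simplified model and not kernel or systemd code: both function names are hypothetical, the PROT_BTI value is the arm64 one, and the second rule is an assumption about how an in-kernel check with knowledge of the mapping's current protection behaves.

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
PROT_BTI = 0x10  # arm64-specific flag value


def seccomp_style_allows(new_prot: int) -> bool:
    """A syscall filter sees only the arguments: any PROT_EXEC is refused."""
    return not (new_prot & PROT_EXEC)


def mdwe_style_allows(old_prot: int, new_prot: int) -> bool:
    """An in-kernel check also knows the mapping's current protection."""
    if new_prot & PROT_EXEC and new_prot & PROT_WRITE:
        return False  # writable and executable at once is never allowed
    if new_prot & PROT_EXEC and not old_prot & PROT_EXEC:
        return False  # a non-executable mapping may not gain PROT_EXEC
    return True
```

In this model, re-protecting an already executable page with `PROT_READ | PROT_EXEC | PROT_BTI` is refused by the argument-only filter but accepted by the stateful check, while turning a writable page executable is refused by both.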