Before this commit, $USER, $HOME, $LOGNAME and $SHELL are only
set when User= is set for the unit. For system service, this
results in different behaviors depending on whether User=root is set.
$USER always makes sense on its own, so let's set it unconditionally.
Ideally $HOME should be set too, but it causes trouble when e.g. getty
passes '-p' to login(1), which then doesn't override $HOME. $LOGNAME and
$SHELL are more like "login environments", and are generally not
suitable for system services. Therefore, a new option SetLoginEnvironment=
is also added to control the latter three variables.
Fixes#23438
Replaces #8227
When we're starting early boot services such as systemd-userdbd.service,
/tmp might not yet be mounted, so let's use a directory in /run instead
which is guaranteed to be available.
This adds a new structure UnitDefaults which embedds the various default
settings for units we maintain. We so far maintained two sets of
variables for this, one in main.c as static variables and one in the
Manager structure. This moves them into a common structure.
This is most just search/replace, i.e. very dumb refactoring.
The fact that we now use a common structure for this allows us further
refactorings later.
Inspired by the discussions on #27890
This reverts commits
- 9ae3624889
"test-execute: add tests for credentials directory with mount namespace"↲
- 94fe4cf255
"core: do not leak mount for credentials directory if mount namespace is enabled",
- 7241b9cd72
"core/credential: make setup_credentials() return path to credentials directory",
- fbaf3b23ae
"core: set $CREDENTIALS_DIRECTORY only when we set up credentials"
Before the commits, credentials directory set up on ExecStart= was kept
on e.g. ExecStop=. But, with the changes, if a service requests a
private mount namespace, the credentials directory is discarded after
ExecStart= is finished.
Let's revert the change, and find better way later.
Addresses the post-merge comment
https://github.com/systemd/systemd/pull/28787#issuecomment-1690614202.
Since kernel v5.2, open_tree() and move_mount() are added. If a service
loads or sets credentials, then let's try to clone the mount that contains
credentials with open_tree(), then mount it after a (private) mount
namespace is initialized for the service. Then, we can setup a mount for
credentials directory without leaking it to the main shared mount
namespace.
With this change, the credentials for services that request their own
private mount namespace become much much safer. And, the number of mount
events triggered by setting up credential directories can be decreased.
Unfortunately, this does not 'fix' the original issue #25527, as the
reported service does not requests private mount namespace, but the
situation should be better now.
seccomp-util.h doesn't need ifdeffing, hence don't. It has worked since
quite a while with HAVE_SECCOMP is off, hence use it everywhere.
Also drop explicit seccomp.h inclusion everywhere (which needs
HAVE_SECCOMP ifdeffery everywhere). seccomp-util.h includes it anyway,
automatically, which we can just rely on, and it deals with HAVE_SECCOMP
at one central place.
If someone reads /run/host/os-release at the exact same time it is being updated, and it
is large enough, they might read a half-written file. This is very unlikely as
os-release is typically small and very rarely changes, but it is not
impossible.
Bind mount a staging directory instead of the file, and symlink the file
into into, so that we can do atomic file updates and close this gap.
Atomic replacement creates a new inode, so existing bind mounts would
continue to see the old file, and only new services would see the new file.
The indirection via the directory allows to work around this, as the
directory is fixed and never changes so the bind mount is always valid,
and its content is shared with all existing services.
Fixes https://github.com/systemd/systemd/issues/28794
Follow-up for 3f37a82545
exec_child() is supposed to set *exit_status when returning failure.
Unfortunately, we didn't do that in two cases. The result would be:
- a bogus error message "Failed at step SUCCESS spawning foo: …",
- a bogus success exit status.
Bugs introduced in 390902012c and
ad21e542b2.
The code is reworked to add some asserts and not set exit_status in the caller
so that it's clearer (also to the compiler) that it needs to be set.
For a userns root user to be able to access the credentials, both
the uid and gid of the credentials directory have to be mapped into
the userns. Currently, the credentials directory group is root, which
we obviously do not want to map in to a userns, so let's make sure
that the credentials directory and files are owned by the service
group instead, which can generally be safely mapped into the userns.
Since we use permissions mode 0600, this shouldn't cause any change
in who is able to access the credentials.
Fixes#28747
Given that ERRNO_IS_PRIVILEGE() also matches positive values,
make sure this macro is not called with arguments that do not have
errno semantics.
In this case the arguments passed to ERRNO_IS_PRIVILEGE() are the values
returned by set_oom_score_adjust() and set_coredump_filter() which are
not expected to return any positive values, but let's be consistent
anyway and move the ERRNO_IS_PRIVILEGE() invocations to the branches
where the return values are known to be negative.