Also: rename Handover → Handoff. I think it makes it clearer that this
is not really about handing over any resources, but that the executor is
out off the game from that point on.
Enable the exec_fd logic for Type=notify* services too, and change it
to send a timestamp instead of a '1' byte. Record the timestamp in a
new ExecMainHandoverTimestamp property so that users can track accurately
when control is handed over from systemd to the service payload, so
that latency and startup performance can be trivially and accurately
tracked and attributed.
Today listen file descriptors created by socket unit don't get passed to
commands in Exec{Start,Stop}{Pre,Post}= socket options.
This prevents ExecXYZ= commands from accessing the created socket FDs to do
any kind of system setup which involves the socket but is not covered by
existing socket unit options.
One concrete example is to insert a socket FD into a BPF map capable of
holding socket references, such as BPF sockmap/sockhash [1] or
reuseport_sockarray [2]. Or, similarly, send the file descriptor with
SCM_RIGHTS to another process, which has access to a BPF map for storing
sockets.
To unblock this use case, pass ListenXYZ= file descriptors to ExecXYZ=
commands as listen FDs [4]. As an exception, ExecStartPre= command does not
inherit any file descriptors because it gets invoked before the listen FDs
are created.
This new behavior can potentially break existing configurations. Commands
invoked from ExecXYZ= might not expect to inherit file descriptors through
sd_listen_fds protocol.
To prevent breakage, add a new socket unit parameter,
PassFileDescriptorsToExec=, to control whether ExecXYZ= programs inherit
listen FDs.
[1] https://docs.kernel.org/bpf/map_sockmap.html
[2] https://lore.kernel.org/r/20180808075917.3009181-1-kafai@fb.com
[3] https://man.archlinux.org/man/socket.7#SO_INCOMING_CPU
[4] https://www.freedesktop.org/software/systemd/man/latest/sd_listen_fds.html
Since signals can take arguments, let's suffix them with () as we
already do with functions. To make sure we remain consistent, make the
`update-dbus-docs.py` script check & fix any occurrences where this is
not the case.
Resolves: #31002
This commit introduces new D-Bus API, StartAuxiliaryScope(). It may be
used by services as part of the restart procedure. Service sends an
array of PID file descriptors corresponding to processes that are part
of the service and must continue running also after service restarts,
i.e. they haven't finished the job why they were spawned in the first
place (e.g. long running video transcoding job). Systemd creates new
scope unit for these processes and migrates them into it. Cgroup
properties of scope are copied from the service so it retains same
cgroup settings and limits as service had.
Users become perplexed when they run their workload in a unit with no
explicit limits configured (moreover, listing the limit property would
even show it's infinity) but they experience unexpected resource
limitation.
The memory and pid limits come as the most visible, therefore add new
unit read-only properties:
- EffectiveMemoryMax=,
- EffectiveMemoryHigh=,
- EffectiveTasksMax=.
These properties represent the most stringent limit systemd is aware of
for the given unit -- and that is typically(*) the effective value.
Implement the properties by simply traversing all parents in the
leaf-slice tree and picking the minimum value. Note that effective
limits are thus defined even for units that don't enable explicit
accounting (because of the hierarchy).
(*) The evasive case is when systemd runs in a cgroupns and cannot
reason about outer setup. Complete solution would need kernel support.
This is the equivalent of RequiresMountsFor=, but adds Wants= instead
of Requires=. It will be useful for example for the autogenerated
systemd-cryptsetup units.
Fixes https://github.com/systemd/systemd/issues/11646
In systemctl-show we only show current swap if ever swapped or non-zero. This
reduces the noise on swapless systems, that would otherwise always show a swap
value that never has the chance to become non-zero. It further reduces the
noise for services that never swapped.