Confexts should not contain code, so mount confexts with noexec.
We cannot mount invidial extensions as noexec, as the overlay ignores
it and bypasses it, we need to use the flag on the whole overlay for
it to be effective.
But given there are legacy scripts still shipped in /etc, allow to
override it with --noexec=false.
Repart considers the start and end of the usable space to the first multiple
of grainsz (at least 4096 bytes). However the first usable LBA of a GPT
partition is at sector 34 (512 bytes sectors) which is not a multiple of 4096.
The backup GPT label at the end also takes up 33 sectors, meaning the last
usable LBA is at 34 sectors from the end, unlikely to be a 4096 multiple as
well.
This meant that the very first and last sectors were never discarded. However
more problematically if an existing partition started before the first
usable grainsz multiple its start didn't get taken into account as a valid
starting point and got its data discarded.
Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
Just to match service_release_stdio_fd() and service_release_fd_store()
in the name, since they do similar things.
This follows the concept that we "release" resources, and this is all
generically wrapped in "service_release_resources()".
We already clear the various fds we keep from the release_resources()
handler, let's also destroy the runtime dir from there if this
preservation mode is selected.
This makes a minor semantic change: previously we'd keep a runtime
directory around if RuntimeDirectoryPreserve=restart is selected and at
least one JOB_START job was around. With this logic we'll keep it around
a tiny bit longer: as long as any job for the unit is around.
The file descriptors we keep in the fdstore might be basically anything,
let's clean it up with our asynchronous closing feature, to not
deadlock on close().
(Let's also do the same for stdin/stdout/stderr fds, since they might
point to network services these days.)
Now that we have a potentially pinned fdstore let's add a concept for
cleaning it explicitly on user requested. Let's expose this via
"systemctl clean", i.e. the same way as user directories are cleaned.
Oftentimes it is useful to allow the per-service fd store to survive
longer than for a restart. This is useful in various scenarios:
1. An fd to some security relevant object needs to be stashed somewhere,
that should not be cleaned automatically, because the security
enforcement would be dropped then.
2. A user namespace fd should be allocated on first invocation and be
kept around until the user logs out (i.e. systemd --user ends), á la
#16328 (This does not implement what #16318 asks for, but should
solve the use-case discussed there.)
3. There's interest in allow a concept of "userspace reboots" where the
kernel stays running, and userspace is swapped out (i.e. all services
exit, and the rootfs transitioned into a new version of it) while
keeping some select resources pinned, very similar to how we
implement a switch root. Thus it is useful to allow services to exit,
while leaving their fds around till the very end.
This is exposed through a new FileDescriptorStorePreserve= setting that
is closely modelled after RuntimeDirectoryPreserve= (in fact it reused
the same internal type), since we want similar behaviour in the end, and
quite often they probably want to be used together.