Re-watching PIDs on cgroup v1 (needed because cgroup empty notifications
are unreliable in containers) is handled below, at the end of
service_sigchld_event(), and depends on the value of the main_pid_known
flag.

In a CentOS Stream 8 container on cgroup v1, the stop action would get
stuck indefinitely on a unit like this:

$ cat /run/systemd/system/foo.service
[Service]
ExecStart=/bin/bash -c 'trap "nohup sleep 1 & exit 0" TERM; sleep infinity'
ExecStop=/bin/bash -c 'kill -s TERM $MAINPID'
TimeoutSec=0

However, upstream works "fine" because the upstream version of systemd
never actually waits for processes killed in containers; it proceeds
immediately to sending SIGKILL, so re-watching the PIDs in the cgroup is
not necessary. Still, for the sake of correctness, the patch should be
merged upstream as well.

We ship with an empty /var, so /var/log/journal does not exist, which
means journald does not log persistently (under Storage=auto, journald
writes to disk only if /var/log/journal already exists). Let's fix that
by setting the configuration to explicitly enable persistent logging.
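
As an illustration only (not necessarily the exact change made by this
commit), persistent storage can be requested through journald's
Storage= option in a journald.conf drop-in. The helper name
write_journald_persistent_dropin and the drop-in file name
persistent.conf are hypothetical; the target root is a parameter so the
sketch can write into an image tree rather than the running system.

```shell
#!/bin/bash
# Hypothetical helper: write a journald.conf drop-in requesting persistent
# storage under the tree rooted at $1 (e.g. an image build directory).
# With Storage=persistent, journald creates /var/log/journal when needed
# instead of requiring the directory to exist already.
write_journald_persistent_dropin() {
    local root="${1:?usage: write_journald_persistent_dropin ROOT}"
    # Strip a trailing slash so "/" yields "/etc/..." rather than "//etc/...".
    local dir="${root%/}/etc/systemd/journald.conf.d"

    mkdir -p "$dir" || return 1
    # "persistent.conf" is an arbitrary drop-in name; any *.conf file works.
    cat > "$dir/persistent.conf" <<'EOF'
[Journal]
Storage=persistent
EOF
}
```

For example, write_journald_persistent_dropin "$BUILDDIR" would produce
$BUILDDIR/etc/systemd/journald.conf.d/persistent.conf. A drop-in keeps
the shipped journald.conf untouched, so the override stays obvious.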

Since bash has no namespaces, let's do the next best thing and prefix
all "internal" names with an underscore, to minimize the chance of name
conflicts in the future.
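
A minimal sketch of the convention, using hypothetical names: the public
entry point keeps a plain name, while helpers and globals that callers
should not rely on get a leading underscore, making them less likely to
collide with functions or variables defined by other scripts sourced
into the same shell.

```shell
#!/bin/bash
# Hypothetical example of the underscore convention for "internal" names.
# All functions live in one global namespace, shared by every sourced
# script, so the leading underscore marks implementation details.

# Internal state: not meant to be touched by other scripts.
_greet_default_name="world"

# Internal helper: builds the message text, falling back to the default.
_greet_format() {
    printf 'Hello, %s!\n' "${1:-$_greet_default_name}"
}

# Public entry point: the only name callers are expected to use.
greet() {
    _greet_format "$@"
}
```

Here greet prints "Hello, world!" and greet systemd prints
"Hello, systemd!". The prefix is only a convention; nothing stops a
caller from invoking _greet_format directly.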

This reverts commit 1db6dbb1dc.

The original patch was reverted because of issue #25369. That issue was
filed on the mistaken assumption that sd_journal_seek_tail() seeks to
the 'current' tail. In fact, the read pointer refers to the tail as it
exists at that moment only once a subsequent sd_journal_previous() is
called. The 'tail' in sd_journal_seek_tail() is only a logical position,
and an sd_journal_previous() call is needed to move onto an actual
entry. Indeed, in the journalctl code, sd_journal_seek_tail() is
followed by sd_journal_previous(). By contrast, calling
sd_journal_next() right after seeking to the logical tail does not make
much sense. So the original patch is correct, and projects that call
sd_journal_next() right after sd_journal_seek_tail() should be fixed as
in https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/2823#note_1637715.