This is useful for development where overwriting files out side
the configured prefix will affect the host as well as stateless
systems such as NixOS that don't let packages install to /etc but handle
configuration on their own.
Alternative to https://github.com/systemd/systemd/pull/17501
tested with:
$ mkdir inst build && cd build
$ meson \
-Dcreate-log-dirs=false \
-Dsysvrcnd-path=$(realpath ../inst)/etc/rc.d \
-Dsysvinit-path=$(realpath ../inst)/etc/init.d \
-Drootprefix=$(realpath ../inst) \
-Dinstall-sysconfdir=false \
--prefix=$(realpath ../inst) ..
$ ninja install
We need to make sure that our coredump pattern handler manages to read
process metadata from /proc/$PID/ before the kernel reaps the crashed
process. By default the kernel will reap the process as soon as it can.
By setting kernel.core_pipe_limit to a non-zero the kernel will wait for
userspace to finish before reaping.
We'll set the value to 16, which allows 16 crashes to be
processed in parallel. This matches the MaxConnections= setting in
systemd-coredump.socket.
See: #17301
(This doesn't close 17301, since we probably should also gracefully
handle if /proc/$PID/ vanished already while our coredump handler runs,
just in case people loclly set the sysctl back to zero. i.e. we should
collect what we can and rather issue an incomplete log record than
none.)
Right now the kernel will not dump anything that went through setuid or
setgid. But it is routine for daemons to do that, and it makes things hard to
debug.
systemd-coredump saves the coredump readable by the users the process was
running as. This should be enough to avoid information leakage. So let's also
tell the kernel to do the coredump.
For https://bugzilla.redhat.com/show_bug.cgi?id=1790972.
Both patterns are stored in the same file, so they are enabled or disabled
together. (Though suid_dumpable=2 is supposed to be safe even when writing to
plain files.)
Fixes#6282.
This solution is a bit busy, but we close the race without setting *.all.*, so
it is still possible to set a different setting for particular interfaces.
Setting just "default" is not very useful because any interfaces present before
systemd-sysctl is invoked are not affected. Setting "all" is too harsh, because
the kernel takes the stronger of the device-specific setting and the "all" value,
so effectively having a weaker setting for specific interfaces is not possible.
This makes ping(8) work without CAP_NET_ADMIN and CAP_NET_RAW because
those aren't effective inside rootless Podman containers.
It's quite useful when using OSTree based operating systems like Fedora
Silverblue, where development environments are often set up using
rootless Podman containers with helpers like Toolbox [1]. Not having
a basic network utility like ping(8) work inside the development
environment can be inconvenient.
See:
https://lwn.net/Articles/422330/http://man7.org/linux/man-pages/man7/icmp.7.htmlhttps://github.com/containers/libpod/issues/1550
The upper limit of the range of group identifiers is set to 2147483647,
which is 2^31-1. Values greater than that get rejected by the kernel
because of this definition in linux/include/net/ping.h:
#define GID_T_MAX (((gid_t)~0U) >> 1)
That's not so bad because values between 2^31 and 2^32-1 are reserved
on systemd-based systems anyway [2].
[1] https://github.com/debarshiray/toolbox
[2] https://systemd.io/UIDS-GIDS.html#summary
I couldn't see any reason why the kernel could provide COMM to the coredump
handler via the core_pattern command line but could not make it available in
/proc. So let's assume that this info is always available in /proc.
For "backtrace" mode (when --backtrace option is passed), I assumed that the
crashing process still exists at the time systemd-coredump is called.
Also changing the core_pattern line is an API breakage for any users of the
backtrace mode but given that systemd-coredump is installed in
/usr/lib/systemd, it's a private tool which has no internal users. At least no
one complained when the hostname was added to the core_pattern line
(f45b801551)...
Indeed it's much easier to get it from /proc since the kernel substitutes '%e'
specifier with multiple strings if the process name contains spaces (!).
This should PID collisions a tiny bit less likely, and thus improve
security and robustness.
2^22 isn't particularly a lot either, but it's the current kernel
limitation.
Bumping this limit was suggested by Linus himself:
https://lwn.net/ml/linux-kernel/CAHk-=wiZ40LVjnXSi9iHLE_-ZBsWFGCgdmNiYZUXn1-V5YBg2g@mail.gmail.com/
Let's experiment with this in systemd upstream first. Downstreams and
users can after all still comment this easily.
Besides compat concern the most often heard issue with such high PIDs is
usability, since they are potentially hard to type. I am not entirely sure though
whether 4194304 (as largest new PID) is that much worse to type or to
copy than 65563.
This should also simplify management of per system tasks limits as by
this move the sysctl /proc/sys/kernel/threads-max becomes the primary
knob to control how many processes to have in parallel.
These sysctls were added in Linux 4.19 (torvalds/linux@30aba6656f), and
we should enable them just like we enable the older hardlink/symlink
protection since v199. Implements #11414.
This switches the RFC3704 Reverse Path filtering from Strict mode to Loose
mode. The Strict mode breaks some pretty common and reasonable use cases,
such as keeping connections via one default route alive after another one
appears (e.g. plugging an Ethernet cable when connected via Wi-Fi).
The strict filter also makes it impossible for NetworkManager to do
connectivity check on a newly arriving default route (it starts with a
higher metric and is bumped lower if there's connectivity).
Kernel's default is 0 (no filter), but a Loose filter is good enough. The
few use cases where a Strict mode could make sense can easily override
this.
The distributions that don't care about the client use cases and prefer a
strict filter could just ship a custom configuration in
/usr/lib/sysctl.d/ to override this.
Turning on ECN still causes slow or broken network on linux. Our tcp
is not yet ready for wide spread use of ECN.
This reverts commit 919472741dba6ad0a3f6c2b76d390a02d0e2fdc3.
To further avoid bufferbloat Explicit Congestion Notification (ECN)
should be enabled for both in and outgoing connections. The kernel
default is to enable it when requested for incoming connections, but
not to request it on outgoing connections. This patch enables it for
both.
A long time ago enabling these was causing problems, but these issues
have since been dealt with.
Fixes#9087.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
This commint adds a new command line parameter to sytemd-coredump. The
parameter should be mappend to core_pattern's placeholder %h - hostname.
The field _HOSTNAME holds the name from the kernel's namespaces which might be
different then the one comming from process' namespaces.
It is true that the real hostname is usually available in the field
COREDUMP_ENVIRON (environment variables) but I believe it is more reliable to
use the value passed by kernel.
----
The length of iovec is no longer static and hence I corrected the declarations
of the functions set_iovec_field and set_iovec_field_free.
Thank you @yuwata and @poettering!
It is redundant because in these cases the values in
`net.ipv4.conf.all.*` take precedence. Also, setting the `default` does
nothing for devices that already exist.
So far I avoided adding license headers to meson files, but they are pretty
big and important and should carry license headers like everything else.
I added my own copyright, even though other people modified those files too.
But this is mostly symbolic, so I hope that's OK.
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2