diff --git a/man/resolved.conf.xml b/man/resolved.conf.xml
index 3c56b76748..2b9f482971 100644
--- a/man/resolved.conf.xml
+++ b/man/resolved.conf.xml
@@ -75,7 +75,7 @@
Domains=
- A space-separated list of domains optionally prefixed with ~,
+ A space-separated list of domains, optionally prefixed with ~,
used for two distinct purposes described below. Defaults to the empty list.
Any domains not prefixed with ~ are used as search
@@ -86,17 +86,23 @@
/etc/resolv.conf with the search keyword are used instead, if
that file exists and any domains are configured in it.
- The domains prefixed with ~ are called "routing domains". All domains listed
- here (both search domains and routing domains after removing the ~ prefix) define
- a search path that preferably directs DNS queries to this interface. This search path has an effect
- only when suitable per-link DNS servers are known. Such servers may be defined through the
- DNS= setting (see above) and dynamically at run time, for example from DHCP
- leases. If no per-link DNS servers are known, routing domains have no effect.
+ The domains prefixed with ~ are called "route-only domains". All domains
+ listed here (both search domains and route-only domains after removing the
+ ~ prefix) define a search path that preferably directs DNS queries to this
+ interface. This search path has an effect only when suitable per-link DNS servers are known. Such
+ servers may be defined through the DNS= setting (see above) and dynamically at run
+ time, for example from DHCP leases. If no per-link DNS servers are known, route-only domains have no
+ effect.
Use the construct ~. (which is composed from ~ to
- indicate a routing domain and . to indicate the DNS root domain that is the
+ indicate a route-only domain and . to indicate the DNS root domain that is the
implied suffix of all DNS domains) to use the DNS servers defined for this link preferably for all
- domains.
+ domains.
+
+ See "Protocols and Routing" in
+ systemd-resolved.service(8)
+ for details of how search and route-only domains are used.
+
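The routing behaviour of search and route-only domains can be approximated with a small model. This is a minimal sketch, not systemd-resolved's implementation. It assumes a longest-matching-suffix rule: a query goes to the link or links whose domain matches the most trailing labels of the name. It treats `~.` as the root domain, which matches every name with the lowest priority, and it skips links without per-link DNS servers. Global domains, default-route handling and single-label names are ignored. The `Link` class, `route()` and the interface names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    """Hypothetical per-link state: interface name, DNS servers, Domains= entries."""
    name: str
    dns_servers: List[str]
    domains: List[str] = field(default_factory=list)


def _labels(name: str) -> List[str]:
    # "host.corp.example." -> ["host", "corp", "example"]; the root "." -> []
    return [label for label in name.lower().strip(".").split(".") if label]


def _match_length(query: str, domain: str) -> int:
    """Number of labels of `domain` that suffix-match `query`, or -1 for no match."""
    q, d = _labels(query), _labels(domain)
    if len(d) > len(q) or q[len(q) - len(d):] != d:
        return -1
    return len(d)


def route(query: str, links: List[Link]) -> List[str]:
    """Return the names of the links whose domains match `query` most specifically."""
    best, chosen = -1, []
    for link in links:
        if not link.dns_servers:
            continue  # without per-link DNS servers the domains have no effect
        for domain in link.domains:
            # Search and route-only domains both route once "~" is removed;
            # "~." becomes the root domain, which matches every name (length 0).
            n = _match_length(query, domain[1:] if domain.startswith("~") else domain)
            if n > best:
                best, chosen = n, [link.name]
            elif n == best and n >= 0 and link.name not in chosen:
                chosen.append(link.name)
    return chosen


links = [
    Link("wg0", ["10.0.0.1"], ["~corp.example"]),
    Link("eth0", ["192.168.1.1"], ["~."]),
    Link("wlan0", [], ["~corp.example"]),  # no DNS servers: ignored
]
print(route("git.corp.example", links))  # ['wg0']
print(route("example.org", links))       # ['eth0']
```

In this model, `~.` on `eth0` acts as a catch-all. It loses to any more specific domain on another link, which matches the intent of using a link's DNS servers "preferably for all domains".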
diff --git a/man/systemd-system.conf.xml b/man/systemd-system.conf.xml
index 71d403db8d..829b4be0ed 100644
--- a/man/systemd-system.conf.xml
+++ b/man/systemd-system.conf.xml
@@ -444,11 +444,11 @@
IPAccounting=. See
systemd.resource-control(5)
for details on the per-unit settings. DefaultTasksAccounting= defaults to yes,
- DefaultMemoryAccounting= to
- &MEMORY_ACCOUNTING_DEFAULT;. DefaultCPUAccounting= defaults to yes if enabling CPU
- accounting doesn't require the CPU controller to be enabled (Linux 4.15+ using the unified hierarchy
- for resource control), otherwise it defaults to no. The other three settings default to
- no.
+ DefaultMemoryAccounting= to &MEMORY_ACCOUNTING_DEFAULT;.
+ DefaultCPUAccounting= defaults to yes if enabling CPU accounting doesn't require the CPU
+ controller to be enabled (Linux 4.15+ using the unified hierarchy for resource control), in which
+ case the setting has no real effect; otherwise it defaults to no. The other three settings
+ default to no.
diff --git a/man/systemd.resource-control.xml b/man/systemd.resource-control.xml
index f057433973..d18fd9a94c 100644
--- a/man/systemd.resource-control.xml
+++ b/man/systemd.resource-control.xml
@@ -59,10 +59,76 @@
systemd.exec(5).
Those options complement options listed here.
- See the New
- Control Group Interfaces for an introduction on how to make
- use of resource control APIs from programs.
+
+ Enabling and disabling controllers
+
+ cgroup controllers are hierarchical: resource control is realized by distributing resource
+ assignments between siblings in each branch of the cgroup tree. There is no
+ need to explicitly enable a cgroup controller for a unit.
+ systemd will instruct the kernel to enable a controller for a given unit when this
+ unit has configuration for that controller. For example, when CPUWeight= is set,
+ the controller will be enabled, and when TasksMax= is set, the
+ controller will be enabled. In addition, various controllers may also be
+ enabled explicitly via the
+ MemoryAccounting=/TasksAccounting=/IOAccounting=
+ settings. Because of how the cgroup hierarchy works, controllers will be automatically enabled for all
+ parent units and for any sibling units starting with the lowest level at which a controller is enabled.
+ Units for which a controller is enabled may be subject to resource control even if they don't have any
+ explicit configuration.
+
+ Setting Delegate= enables any delegated controllers for that unit (see below).
+ The delegatee may then enable controllers for its children as appropriate. In particular, if the
+ delegatee is systemd (in the user@.service unit), it will
+ repeat the same logic as the system instance and enable controllers for user units which have resource
+ limits configured, and their siblings and parents and parents' siblings.
+
+ Controllers may be disabled for parts of the cgroup hierarchy with
+ DisableControllers= (see below).
+
+
+ Enabling and disabling controllers
+
+
+                          -.slice
+                         /       \
+                  /-----/         \--------------\
+                 /                                \
+        system.slice                          user.slice
+          /       \                            /       \
+         /         \                          /         \
+        /           \          user@0.service       user@1000.service
+       /             \          Delegate=yes          Delegate=yes
+a.service          b.slice                               /        \
+CPUWeight=20   DisableControllers=cpu                   /          \
+                    /   \                      app.slice       session.slice
+                   /     \                   CPUWeight=100     CPUWeight=100
+                  /       \
+           b1.service   b2.service
+                        CPUWeight=1000
+
+
+ In this hierarchy, the controller is enabled for all units shown except
+ b1.service and b2.service. Because there is no explicit
+ configuration for system.slice and user.slice, CPU
+ resources will be split equally between them. Similarly, resources are allocated equally between
+ children of user.slice and between the child slices beneath
+ user@1000.service. Assuming that there is no further configuration of resources
+ or delegation below slices app.slice or session.slice, the
+ controller would not be enabled for units in those slices and CPU resources
+ would be further allocated using other mechanisms, e.g. based on nice levels.
+
+ In the slice system.slice, CPU resources are split 1:5 between service
+ a.service and slice b.slice (a.service gets 1/6 and b.slice gets 5/6 of
+ the CPU time), because a.service has CPUWeight=20 while slice
+ b.slice gets the default value of 100 for cpu.weight when CPUWeight= is not set.
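The 1:5 split follows from dividing each sibling's weight by the sum of the sibling weights. The sketch below reproduces that arithmetic under two assumptions. First, every sibling is CPU-bound: weights are work-conserving, so an idle sibling's share goes to the others. Second, an unset CPUWeight= means the kernel default of 100. The function name `cpu_shares` is hypothetical.

```python
from fractions import Fraction
from typing import Dict, Optional

KERNEL_DEFAULT_WEIGHT = 100  # cpu.weight in effect when CPUWeight= is unset


def cpu_shares(siblings: Dict[str, Optional[int]]) -> Dict[str, Fraction]:
    """Fraction of the parent's CPU time each sibling gets when all are busy."""
    weights = {name: KERNEL_DEFAULT_WEIGHT if w is None else w
               for name, w in siblings.items()}
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


# Children of system.slice in the example: a.service has CPUWeight=20,
# b.slice has no CPUWeight= and therefore uses the kernel default of 100.
print(cpu_shares({"a.service": 20, "b.slice": None}))
# {'a.service': Fraction(1, 6), 'b.slice': Fraction(5, 6)}
```

The same function gives an equal 1/2 split between system.slice and user.slice, since neither has explicit configuration.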
+
+ The CPUWeight= setting in service b2.service is neutralized
+ by DisableControllers= in slice b.slice, so the
+ controller would not be enabled for services b1.service and
+ b2.service, and CPU resources would be further allocated using other mechanisms,
+ e.g. based on nice levels.
+
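The propagation rules above can be checked against the example hierarchy with a toy model for a single controller. This is a sketch under stated assumptions, not systemd's algorithm:

- A unit "requests" the controller if it has CPUWeight= or Delegate=yes. app.slice and session.slice are simply treated as requesting it, standing in for the user manager repeating the same logic.
- Enabling it marks the unit, every parent, and all siblings at each of those levels.
- DisableControllers= on an ancestor blocks the request.

The names `PARENT`, `WANTS_CPU`, `DISABLES_CPU` and `cpu_enabled_units()` are hypothetical.

```python
from typing import Dict, List, Optional, Set

# The example hierarchy: unit -> parent (None for the root slice).
PARENT: Dict[str, Optional[str]] = {
    "-.slice": None,
    "system.slice": "-.slice",
    "user.slice": "-.slice",
    "a.service": "system.slice",
    "b.slice": "system.slice",
    "b1.service": "b.slice",
    "b2.service": "b.slice",
    "user@0.service": "user.slice",
    "user@1000.service": "user.slice",
    "app.slice": "user@1000.service",
    "session.slice": "user@1000.service",
}

# Units that request the controller (CPUWeight= set, or Delegate=yes).
WANTS_CPU = {"a.service", "b2.service", "user@0.service", "user@1000.service",
             "app.slice", "session.slice"}

# Units with DisableControllers=cpu: their descendants may not enable it.
DISABLES_CPU = {"b.slice"}


def ancestors(unit: str) -> List[str]:
    chain = []
    parent = PARENT[unit]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def siblings_and_self(unit: str) -> List[str]:
    parent = PARENT[unit]
    return [u for u, p in PARENT.items() if p == parent]


def cpu_enabled_units() -> Set[str]:
    enabled: Set[str] = set()
    for unit in WANTS_CPU:
        if any(a in DISABLES_CPU for a in ancestors(unit)):
            continue  # neutralized by DisableControllers= higher up
        # The unit, each of its parents, and the siblings at every level.
        for u in [unit] + ancestors(unit):
            enabled.update(siblings_and_self(u))
    return enabled


print(sorted(set(PARENT) - cpu_enabled_units()))  # ['b1.service', 'b2.service']
```

The result matches the text: every unit except b1.service and b2.service ends up with the controller enabled. b.slice still has it, as a sibling of a.service.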
+ Setting resource controls for a group of related units
@@ -82,6 +148,11 @@
/etc/systemd/system/user-.slice.d/*.conf. This last directory
applies to all user slices.
+
+ See the New
+ Control Group Interfaces for an introduction on how to make
+ use of resource control APIs from programs.
@@ -118,6 +189,9 @@
setting may be controlled with
DefaultCPUAccounting= in
systemd-system.conf(5).
+
+ Under the unified cgroup hierarchy, CPU accounting is available for all units and this
+ setting has no effect.
@@ -126,17 +200,20 @@
StartupCPUWeight=weight
+ These settings control the controller in the unified hierarchy.
+
These options accept an integer value or the special string "idle":
- If set to an integer value, assign the specified CPU time weight to the processes executed,
- if the unified control group hierarchy is used on the system. These options control the
- cpu.weight control group attribute. The allowed range is 1 to 10000. Defaults to
- 100. For details about this control group attribute, see Control Groups v2
- and CFS
- Scheduler. The available CPU time is split up among all units within one slice relative to
- their CPU time weight. A higher weight means more CPU time, a lower weight means less.
+ If set to an integer value, assign the specified CPU time weight to the processes
+ executed, if the unified control group hierarchy is used on the system. These options control
+ the cpu.weight control group attribute. The allowed range is 1 to 10000.
+ Defaults to unset, but the kernel default is 100. For details about this control group
+ attribute, see Control Groups
+ v2 and CFS
+ Scheduler. The available CPU time is split up among all units within one slice
+ relative to their CPU time weight. A higher weight means more CPU time, a lower weight means
+ less.
If set to the special string "idle", mark the cgroup for "idle scheduling", which means
@@ -151,6 +228,13 @@
CPUWeight= applies to normal runtime of the system, and if the former is not set also to
the startup and shutdown phases. Using StartupCPUWeight= allows prioritizing specific services at
boot-up and shutdown differently than during normal runtime.
+
+ In addition to the resource allocation performed by the controller, the
+ kernel may automatically divide resources based on session-id grouping; see "The autogroup feature"
+ in sched(7).
+ The effect of this feature is similar to the controller with no explicit
+ configuration, so users should be careful not to mistake one for the other.
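The interplay of CPUWeight= and StartupCPUWeight= ("if the former is not set") can be summarized as a small selection function. This is a minimal sketch: the phase names and `effective_cpu_weight()` are hypothetical. The special value "idle" is not modeled, and the fallback of 100 reflects the kernel default mentioned above.

```python
from typing import Optional

KERNEL_DEFAULT_WEIGHT = 100  # value assumed when no weight setting applies


def effective_cpu_weight(phase: str, cpu_weight: Optional[int],
                         startup_cpu_weight: Optional[int]) -> int:
    """Weight in force during `phase`: "startup", "runtime" or "shutdown"."""
    if phase in ("startup", "shutdown") and startup_cpu_weight is not None:
        return startup_cpu_weight
    if cpu_weight is not None:
        return cpu_weight  # also covers startup/shutdown when StartupCPUWeight= is unset
    return KERNEL_DEFAULT_WEIGHT


print(effective_cpu_weight("startup", 100, 1000))  # 1000
print(effective_cpu_weight("runtime", 100, 1000))  # 100
print(effective_cpu_weight("shutdown", 50, None))  # 50
```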
@@ -158,6 +242,8 @@
CPUQuota=
+ This setting controls the controller in the unified hierarchy.
+
Assign the specified CPU time quota to the processes executed. Takes a percentage value, suffixed with
"%". The percentage specifies how much CPU time the unit shall get at maximum, relative to the total CPU time
available on one CPU. Use values > 100% for allotting CPU time on more than one CPU. This controls the
@@ -177,6 +263,8 @@
CPUQuotaPeriodSec=
+ This setting controls the controller in the unified hierarchy.
+
Assign the duration over which the CPU time quota specified by CPUQuota= is measured.
Takes a time duration value in seconds, with an optional suffix such as "ms" for milliseconds (or "s" for seconds.)
The default setting is 100ms. The period is clamped to the range supported by the kernel, which is [1ms, 1000ms].
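CPUQuota= and CPUQuotaPeriodSec= combine into the cgroup v2 `cpu.max` attribute. The kernel's cgroup v2 documentation specifies its format as "<max> <period>" in microseconds. The sketch below assumes the quota is the given percentage of one period, with the period clamped to the [1ms, 1000ms] range quoted above. Any further rounding or adjustment systemd may apply is not modeled, and `cpu_max_line()` is a hypothetical name.

```python
MIN_PERIOD_USEC = 1_000        # 1ms, lower bound quoted above
MAX_PERIOD_USEC = 1_000_000    # 1000ms, upper bound quoted above


def cpu_max_line(quota_percent: float, period_usec: int = 100_000) -> str:
    """Render a cgroup v2 cpu.max value, "<quota> <period>" in microseconds."""
    period = min(max(period_usec, MIN_PERIOD_USEC), MAX_PERIOD_USEC)
    quota = round(period * quota_percent / 100)  # percent of one CPU per period
    return f"{quota} {period}"


print(cpu_max_line(20))             # 20000 100000
print(cpu_max_line(150))            # 150000 100000  (1.5 CPUs)
print(cpu_max_line(50, 5_000_000))  # 500000 1000000 (period clamped)
```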
@@ -197,6 +285,8 @@
StartupAllowedCPUs=
+ This setting controls the controller in the unified hierarchy.
+
Restrict processes to be executed on specific CPUs. Takes a list of CPU indices or ranges separated by either
whitespace or commas. CPU ranges are specified by the lower and upper CPU indices separated by a dash.
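The list syntax (indices or dash-separated ranges, separated by whitespace or commas) can be parsed as sketched below. The same syntax is described for AllowedMemoryNodes=. This is an illustrative parser, not systemd's. `parse_cpu_list()` is hypothetical, and it rejects reversed ranges such as "5-2" as an assumption.

```python
import re
from typing import Set


def parse_cpu_list(spec: str) -> Set[int]:
    """Parse e.g. "0-3 6,8" into {0, 1, 2, 3, 6, 8}."""
    cpus: Set[int] = set()
    for item in re.split(r"[\s,]+", spec.strip()):
        if not item:
            continue  # empty specification
        if "-" in item:
            lo, hi = (int(x) for x in item.split("-", 1))
            if lo > hi:
                raise ValueError(f"invalid range: {item}")
            cpus.update(range(lo, hi + 1))  # both ends are inclusive
        else:
            cpus.add(int(item))
    return cpus


print(sorted(parse_cpu_list("0-3 6,8")))  # [0, 1, 2, 3, 6, 8]
```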
@@ -218,6 +308,8 @@
StartupAllowedMemoryNodes=
+ These settings control the controller in the unified hierarchy.
+
Restrict processes to be executed on specific memory NUMA nodes. Takes a list of memory NUMA nodes indices
or ranges separated by either whitespace or commas. Memory NUMA nodes ranges are specified by the lower and upper
NUMA nodes indices separated by a dash.
@@ -239,6 +331,8 @@
MemoryAccounting=
+ This setting controls the controller in the unified hierarchy.
+
Turn on process and kernel memory accounting for this
unit. Takes a boolean argument. Note that turning on memory
accounting for one unit will also implicitly turn it on for
@@ -255,6 +349,8 @@
StartupMemoryLow=bytes, DefaultStartupMemoryLow=bytes
+ These settings control the controller in the unified hierarchy.
+
Specify the memory usage protection of the executed processes in this unit.
When reclaiming memory, the unit is treated as if it was using less memory resulting in memory
to be preferentially reclaimed from unprotected units.
@@ -299,6 +395,8 @@
StartupMemoryHigh=bytes
+ These settings control the controller in the unified hierarchy.
+
Specify the throttling limit on memory usage of the executed processes in this unit. Memory usage may go
above the limit if unavoidable, but the processes are heavily slowed down and memory is taken away
aggressively in such cases. This is the main mechanism to control memory usage of a unit.
@@ -323,6 +421,8 @@
StartupMemoryMax=bytes
+ These settings control the controller in the unified hierarchy.
+
Specify the absolute limit on memory usage of the executed processes in this unit. If memory usage
cannot be contained under the limit, out-of-memory killer is invoked inside the unit. It is recommended to
use MemoryHigh= as the main control mechanism and use MemoryMax= as the
@@ -347,6 +447,8 @@
StartupMemorySwapMax=bytes
+ These settings control the controller in the unified hierarchy.
+
Specify the absolute limit on swap usage of the executed processes in this unit.
Takes a swap size in bytes. If the value is suffixed with K, M, G or T, the specified swap size is
@@ -367,6 +469,8 @@
StartupMemoryZSwapMax=bytes
+ These settings control the controller in the unified hierarchy.
+
Specify the absolute limit on zswap usage of the processes in this unit. Zswap is a lightweight compressed
cache for swap pages. It takes pages that are in the process of being swapped out and attempts to compress them into a
dynamically allocated RAM-based memory pool. If the limit specified is hit, no entries from this unit will be
@@ -390,17 +494,14 @@
TasksAccounting=
- Turn on task accounting for this unit. Takes a
- boolean argument. If enabled, the system manager will keep
- track of the number of tasks in the unit. The number of
- tasks accounted this way includes both kernel threads and
- userspace processes, with each thread counting
- individually. Note that turning on tasks accounting for one
- unit will also implicitly turn it on for all units contained
- in the same slice and for all its parent slices and the
- units contained therein. The system default for this setting
- may be controlled with
- DefaultTasksAccounting= in
+ This setting controls the controller in the unified hierarchy.
+
+ Turn on task accounting for this unit. Takes a boolean argument. If enabled, the kernel will
+ keep track of the total number of tasks in the unit and its children. This number includes both
+ kernel threads and userspace processes, with each thread counted individually. Note that turning on
+ tasks accounting for one unit will also implicitly turn it on for all units contained in the same
+ slice and for all its parent slices and the units contained therein. The system default for this
+ setting may be controlled with DefaultTasksAccounting= in
systemd-system.conf(5).
@@ -409,6 +510,8 @@
TasksMax=N
+ This setting controls the controller in the unified hierarchy.
+
Specify the maximum number of tasks that may be created in the unit. This ensures that the
number of tasks accounted for the unit (see above) stays below a specific limit. This either takes
an absolute number of tasks or a percentage value that is taken relative to the configured maximum
@@ -428,6 +531,8 @@
IOAccounting=
+ This setting controls the controller in the unified hierarchy.
+
Turn on Block I/O accounting for this unit, if the unified control group hierarchy is used on the
system. Takes a boolean argument. Note that turning on block I/O accounting for one unit will also implicitly
turn it on for all units contained in the same slice and for all its parent slices and the units contained
@@ -442,6 +547,8 @@
StartupIOWeight=weight
+ These settings control the controller in the unified hierarchy.
+
Set the default overall block I/O weight for the executed processes, if the unified control
group hierarchy is used on the system. Takes a single weight value (between 1 and 10000) to set the
default block I/O weight. This controls the io.weight control group attribute,
@@ -464,6 +571,8 @@
IODeviceWeight=deviceweight
+ This setting controls the controller in the unified hierarchy.
+
Set the per-device overall block I/O weight for the executed processes, if the unified control group
hierarchy is used on the system. Takes a space-separated pair of a file path and a weight value to specify
the device specific weight value, between 1 and 10000. (Example: /dev/sda 1000). The file
@@ -488,6 +597,8 @@
IOWriteBandwidthMax=devicebytes
+ These settings control the controller in the unified hierarchy.
+
Set the per-device overall block I/O bandwidth maximum limit for the executed processes, if the unified
control group hierarchy is used on the system. This limit is not work-conserving and the executed processes
are not allowed to use more even if the device has idle capacity. Takes a space-separated pair of a file
@@ -510,6 +621,8 @@
IOWriteIOPSMax=deviceIOPS
+ These settings control the controller in the unified hierarchy.
+
Set the per-device overall block I/O IOs-Per-Second maximum limit for the executed processes, if the
unified control group hierarchy is used on the system. This limit is not work-conserving and the executed
processes are not allowed to use more even if the device has idle capacity. Takes a space-separated pair of
@@ -531,6 +644,8 @@
IODeviceLatencyTargetSec=devicetarget
+ This setting controls the controller in the unified hierarchy.
+
Set the per-device average target I/O latency for the executed processes, if the unified control group
hierarchy is used on the system. Takes a file path and a timespan separated by a space to specify
the device specific latency target. (Example: "/dev/sda 25ms"). The file path may be specified
@@ -1034,29 +1149,37 @@ DeviceAllow=/dev/loop-control
Turns on delegation of further resource control partitioning to processes of the unit. Units where this
is enabled may create and manage their own private subhierarchy of control groups below the control group of
the unit itself. For unprivileged services (i.e. those using the User= setting) the unit's
- control group will be made accessible to the relevant user. When enabled the service manager will refrain
- from manipulating control groups or moving processes below the unit's control group, so that a clear concept
- of ownership is established: the control group tree above the unit's control group (i.e. towards the root
- control group) is owned and managed by the service manager of the host, while the control group tree below
- the unit's control group is owned and managed by the unit itself. Takes either a boolean argument or a list
- of control group controller names. If true, delegation is turned on, and all supported controllers are
- enabled for the unit, making them available to the unit's processes for management. If false, delegation is
- turned off entirely (and no additional controllers are enabled). If set to a list of controllers, delegation
- is turned on, and the specified controllers are enabled for the unit. Note that additional controllers than
- the ones specified might be made available as well, depending on configuration of the containing slice unit
- or other units contained in it. Note that assigning the empty string will enable delegation, but reset the
- list of controllers, all assignments prior to this will have no effect. Defaults to false.
+ control group will be made accessible to the relevant user.
- Note that controller delegation to less privileged code is only safe on the unified control group
- hierarchy. Accordingly, access to the specified controllers will not be granted to unprivileged services on
- the legacy hierarchy, even when requested.
+ When enabled, the service manager will refrain from manipulating control groups or moving
+ processes below the unit's control group, so that a clear concept of ownership is established: the
+ control group tree above the unit's control group (i.e. towards the root control group) is owned
+ and managed by the service manager of the host, while the control group tree below the unit's
+ control group is owned and managed by the unit itself.
+
+ Takes either a boolean argument or a list of control group controller names. If true,
+ delegation is turned on, and all supported controllers are enabled for the unit, making them
+ available to the unit's processes for management. If false, delegation is turned off entirely (and
+ no additional controllers are enabled). If set to a list of controllers, delegation is turned on,
+ and the specified controllers are enabled for the unit. Note that additional controllers other than
+ the ones specified might be made available as well, depending on configuration of the containing
+ slice unit or other units contained in it. Note that assigning the empty string will enable
+ delegation, but reset the list of controllers, and all assignments prior to this will have no
+ effect. Defaults to false.
+
+ Note that controller delegation to less privileged code is only safe on the unified control
+ group hierarchy. Accordingly, access to the specified controllers will not be granted to
+ unprivileged services on the legacy hierarchy, even when requested.
- Not all of these controllers are available on all kernels however, and some are
- specific to the unified hierarchy while others are specific to the legacy hierarchy. Also note that the
- kernel might support further controllers, which aren't covered here yet as delegation is either not supported
- at all for them or not defined cleanly.
+ Not all of these controllers are available on all kernels, however, and some are specific to
+ the unified hierarchy while others are specific to the legacy hierarchy. Also note that the kernel
+ might support further controllers, which aren't covered here yet as delegation is either not
+ supported at all for them or not defined cleanly.
+
+ Note that because the cgroup tree is hierarchical, any controllers that are delegated will
+ also be enabled for the parent and sibling units of the unit with delegation.
For further details on the delegation model consult Control Group APIs and Delegation.
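How successive Delegate= assignments combine can be modeled as a fold over the assigned values:

- true selects all supported controllers;
- false turns delegation off entirely;
- a list turns delegation on and adds the named controllers;
- the empty string turns delegation on but resets the list.

This is a sketch, not systemd's parser. `SUPPORTED` is a hypothetical controller set, and only a few boolean spellings are accepted. The model assumes that successive lists accumulate. Extra controllers made available through the containing slice are not modeled.

```python
from typing import Iterable, Set, Tuple

# Hypothetical set of controllers supported by the running kernel.
SUPPORTED = frozenset({"cpu", "cpuset", "io", "memory", "pids"})
TRUE_WORDS = {"1", "yes", "true", "on"}
FALSE_WORDS = {"0", "no", "false", "off"}


def parse_delegate(assignments: Iterable[str]) -> Tuple[bool, Set[str]]:
    """Fold successive Delegate= assignments into (enabled, controllers)."""
    enabled, controllers = False, set()  # the default is false
    for raw in assignments:
        value = raw.strip()
        if value.lower() in TRUE_WORDS:
            enabled, controllers = True, set(SUPPORTED)  # all supported controllers
        elif value.lower() in FALSE_WORDS:
            enabled, controllers = False, set()          # delegation off entirely
        elif value == "":
            enabled, controllers = True, set()           # on, but the list is reset
        else:
            enabled = True
            controllers |= set(value.split())            # assumed to accumulate
    return enabled, controllers


print(parse_delegate(["cpu", ""]))  # (True, set())
```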
@@ -1067,19 +1190,20 @@ DeviceAllow=/dev/loop-control
DisableControllers=
- Disables controllers from being enabled for a unit's children. If a controller listed is already in use
- in its subtree, the controller will be removed from the subtree. This can be used to avoid child units being
- able to implicitly or explicitly enable a controller. Defaults to not disabling any controllers.
-
- It may not be possible to successfully disable a controller if the unit or any child of the unit in
- question delegates controllers to its children, as any delegated subtree of the cgroup hierarchy is unmanaged
- by systemd.
+ Disables controllers from being enabled for a unit's children. If a controller listed is
+ already in use in its subtree, the controller will be removed from the subtree. This can be used to
+ prevent configuration in child units from implicitly or explicitly enabling a controller.
+ Defaults to empty.
Multiple controllers may be specified, separated by spaces. You may also pass
DisableControllers= multiple times, in which case each new instance adds another controller
to disable. Passing DisableControllers= by itself with no controller name present resets
the disabled controller list.
+ It may not be possible to disable a controller after units have been started, if the unit or
+ any child of the unit in question delegates controllers to its children, as any delegated subtree
+ of the cgroup hierarchy is unmanaged by systemd.
+
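The list semantics of repeated DisableControllers= assignments can be sketched as follows. Each instance adds controllers, and a bare assignment resets the list. This illustrates the rule only: `parse_disable_controllers()` is hypothetical, and the model drops duplicates as an assumption.

```python
from typing import Iterable, List


def parse_disable_controllers(assignments: Iterable[str]) -> List[str]:
    """Fold successive DisableControllers= lines into the final list."""
    disabled: List[str] = []
    for value in assignments:
        names = value.split()
        if not names:
            disabled.clear()  # a bare "DisableControllers=" resets the list
            continue
        for name in names:
            if name not in disabled:  # each instance adds to the list
                disabled.append(name)
    return disabled


print(parse_disable_controllers(["cpu", "io memory"]))  # ['cpu', 'io', 'memory']
print(parse_disable_controllers(["cpu", "", "io"]))     # ['io']
```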
diff --git a/src/core/system.conf.in b/src/core/system.conf.in
index 0c27586c46..9572b57f17 100644
--- a/src/core/system.conf.in
+++ b/src/core/system.conf.in
@@ -51,7 +51,7 @@
#DefaultStartLimitIntervalSec=10s
#DefaultStartLimitBurst=5
#DefaultEnvironment=
-#DefaultCPUAccounting=no
+#DefaultCPUAccounting=yes
#DefaultIOAccounting=no
#DefaultIPAccounting=no
#DefaultMemoryAccounting={{ 'yes' if MEMORY_ACCOUNTING_DEFAULT else 'no' }}