The file name: accounting/delay-accounting.rst
should be, instead: Documentation/accounting/delay-accounting.rst.
Also, there's no need to use doc:`foo`, as automarkup.py will
automatically handle plain text mentions to Documentation/
files.
So, update its cross-reference accordingly.
Fixes: fcb5017045 ("delayacct: Document task_delayacct sysctl")
Fixes: c3123552aa ("docs: accounting: convert to ReST")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The proactive compaction[1] gets triggered for every 500msec and run
compaction on the node for COMPACTION_HPAGE_ORDER (usually order-9) pages
based on the value set to sysctl.compaction_proactiveness. Triggering the
compaction for every 500msec in search of COMPACTION_HPAGE_ORDER pages is
not needed for all applications, especially on the embedded system
usecases which may have few MB's of RAM. Enabling the proactive
compaction in its state will endup in running almost always on such
systems.
Other side, proactive compaction can still be very much useful for getting
a set of higher order pages in some controllable manner(controlled by
using the sysctl.compaction_proactiveness). So, on systems where enabling
the proactive compaction always may proove not required, can trigger the
same from user space on write to its sysctl interface. As an example, say
app launcher decide to launch the memory heavy application which can be
launched fast if it gets more higher order pages thus launcher can prepare
the system in advance by triggering the proactive compaction from
userspace.
This triggering of proactive compaction is done on a write to
sysctl.compaction_proactiveness by user.
[1]https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=facdaa917c4d5a376d09d25865f5a863f906234a
[akpm@linux-foundation.org: tweak vm.rst, per Mike]
Link: https://lkml.kernel.org/r/1627653207-12317-1-git-send-email-charante@codeaurora.org
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge misc updates from Andrew Morton:
"191 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
pagealloc, and memory-failure)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits)
mm,hwpoison: make get_hwpoison_page() call get_any_page()
mm,hwpoison: send SIGBUS with error virutal address
mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
docs: remove description of DISCONTIGMEM
arch, mm: remove stale mentions of DISCONIGMEM
mm: remove CONFIG_DISCONTIGMEM
m68k: remove support for DISCONTIGMEM
arc: remove support for DISCONTIGMEM
arc: update comment about HIGHMEM implementation
alpha: remove DISCONTIGMEM and NUMA
mm/page_alloc: move free_the_page
mm/page_alloc: fix counting of managed_pages
mm/page_alloc: improve memmap_pages dbg msg
mm: drop SECTION_SHIFT in code comments
mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
mm/page_alloc: scale the number of pages that are batch freed
...
This introduces a new sysctl vm.percpu_pagelist_high_fraction. It is
similar to the old vm.percpu_pagelist_fraction. The old sysctl increased
both pcp->batch and pcp->high with the higher pcp->high potentially
reducing zone->lock contention. However, the higher pcp->batch value also
potentially increased allocation latency while the PCP was refilled. This
sysctl only adjusts pcp->high so that zone->lock contention is potentially
reduced but allocation latency during a PCP refill remains the same.
# grep -E "high:|batch" /proc/zoneinfo | tail -2
high: 649
batch: 63
# sysctl vm.percpu_pagelist_high_fraction=8
# grep -E "high:|batch" /proc/zoneinfo | tail -2
high: 35071
batch: 63
# sysctl vm.percpu_pagelist_high_fraction=64
high: 4383
batch: 63
# sysctl vm.percpu_pagelist_high_fraction=0
high: 649
batch: 63
[mgorman@techsingularity.net: fix documentation]
Link: https://lkml.kernel.org/r/20210528151010.GQ30378@techsingularity.net
Link: https://lkml.kernel.org/r/20210525080119.5455-7-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Calculate pcp->high based on zone sizes and active CPUs", v2.
The per-cpu page allocator (PCP) is meant to reduce contention on the zone
lock but the sizing of batch and high is archaic and neither takes the
zone size into account or the number of CPUs local to a zone. With larger
zones and more CPUs per node, the contention is getting worse.
Furthermore, the fact that vm.percpu_pagelist_fraction adjusts both batch
and high values means that the sysctl can reduce zone lock contention but
also increase allocation latencies.
This series disassociates pcp->high from pcp->batch and then scales
pcp->high based on the size of the local zone with limited impact to
reclaim and accounting for active CPUs but leaves pcp->batch static. It
also adapts the number of pages that can be on the pcp list based on
recent freeing patterns.
The motivation is partially to adjust to larger memory sizes but is also
driven by the fact that large batches of page freeing via release_pages()
often shows zone contention as a major part of the problem. Another is a
bug report based on an older kernel where a multi-terabyte process can
takes several minutes to exit. A workaround was to use
vm.percpu_pagelist_fraction to increase the pcp->high value but testing
indicated that a production workload could not use the same values because
of an increase in allocation latencies. Unfortunately, I cannot reproduce
this test case myself as the multi-terabyte machines are in active use but
it should alleviate the problem.
The series aims to address both and partially acts as a pre-requisite.
pcp only works with order-0 which is useless for SLUB (when using high
orders) and THP (unconditionally). To store high-order pages on PCP, the
pcp->high values need to be increased first.
This patch (of 6):
The vm.percpu_pagelist_fraction is used to increase the batch and high
limits for the per-cpu page allocator (PCP). The intent behind the sysctl
is to reduce zone lock acquisition when allocating/freeing pages but it
has a problem. While it can decrease contention, it can also increase
latency on the allocation side due to unreasonably large batch sizes.
This leads to games where an administrator adjusts
percpu_pagelist_fraction on the fly to work around contention and
allocation latency problems.
This series aims to alleviate the problems with zone lock contention while
avoiding the allocation-side latency problems. For the purposes of
review, it's easier to remove this sysctl now and reintroduce a similar
sysctl later in the series that deals only with pcp->high.
Link: https://lkml.kernel.org/r/20210525080119.5455-1-mgorman@techsingularity.net
Link: https://lkml.kernel.org/r/20210525080119.5455-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull documentation updates from Jonathan Corbet:
"This was a reasonably active cycle for documentation; this includes:
- Some kernel-doc cleanups. That script is still regex onslaught from
hell, but it has gotten a little better.
- Improvements to the checkpatch docs, which are also used by the
tool itself.
- A major update to the pathname lookup documentation.
- Elimination of :doc: markup, since our automarkup magic can create
references from filenames without all the extra noise.
- The flurry of Chinese translation activity continues.
Plus, of course, the usual collection of updates, typo fixes, and
warning fixes"
* tag 'docs-5.14' of git://git.lwn.net/linux: (115 commits)
docs: path-lookup: use bare function() rather than literals
docs: path-lookup: update symlink description
docs: path-lookup: update get_link() ->follow_link description
docs: path-lookup: update WALK_GET, WALK_PUT desc
docs: path-lookup: no get_link()
docs: path-lookup: update i_op->put_link and cookie description
docs: path-lookup: i_op->follow_link replaced with i_op->get_link
docs: path-lookup: Add macro name to symlink limit description
docs: path-lookup: remove filename_mountpoint
docs: path-lookup: update do_last() part
docs: path-lookup: update path_mountpoint() part
docs: path-lookup: update path_to_nameidata() part
docs: path-lookup: update follow_managed() part
docs: Makefile: Use CONFIG_SHELL not SHELL
docs: Take a little noise out of the build process
docs: x86: avoid using ReST :doc:`foo` markup
docs: virt: kvm: s390-pv-boot.rst: avoid using ReST :doc:`foo` markup
docs: userspace-api: landlock.rst: avoid using ReST :doc:`foo` markup
docs: trace: ftrace.rst: avoid using ReST :doc:`foo` markup
docs: trace: coresight: coresight.rst: avoid using ReST :doc:`foo` markup
...
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.13-rc4, including fixes from bpf, netfilter,
can and wireless trees. Notably including fixes for the recently
announced "FragAttacks" WiFi vulnerabilities. Rather large batch,
touching some core parts of the stack, too, but nothing hair-raising.
Current release - regressions:
- tipc: make node link identity publish thread safe
- dsa: felix: re-enable TAS guard band mode
- stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()
- stmmac: fix system hang if change mac address after interface
ifdown
Current release - new code bugs:
- mptcp: avoid OOB access in setsockopt()
- bpf: Fix nested bpf_bprintf_prepare with more per-cpu buffers
- ethtool: stats: fix a copy-paste error - init correct array size
Previous releases - regressions:
- sched: fix packet stuck problem for lockless qdisc
- net: really orphan skbs tied to closing sk
- mlx4: fix EEPROM dump support
- bpf: fix alu32 const subreg bound tracking on bitwise operations
- bpf: fix mask direction swap upon off reg sign change
- bpf, offload: reorder offload callback 'prepare' in verifier
- stmmac: Fix MAC WoL not working if PHY does not support WoL
- packetmmap: fix only tx timestamp on request
- tipc: skb_linearize the head skb when reassembling msgs
Previous releases - always broken:
- mac80211: address recent "FragAttacks" vulnerabilities
- mac80211: do not accept/forward invalid EAPOL frames
- mptcp: avoid potential error message floods
- bpf, ringbuf: deny reserve of buffers larger than ringbuf to
prevent out of buffer writes
- bpf: forbid trampoline attach for functions with variable arguments
- bpf: add deny list of functions to prevent inf recursion of tracing
programs
- tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT
- can: isotp: prevent race between isotp_bind() and
isotp_setsockopt()
- netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check,
fallback to non-AVX2 version
Misc:
- bpf: add kconfig knob for disabling unpriv bpf by default"
* tag 'net-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (172 commits)
net: phy: Document phydev::dev_flags bits allocation
mptcp: validate 'id' when stopping the ADD_ADDR retransmit timer
mptcp: avoid error message on infinite mapping
mptcp: drop unconditional pr_warn on bad opt
mptcp: avoid OOB access in setsockopt()
nfp: update maintainer and mailing list addresses
net: mvpp2: add buffer header handling in RX
bnx2x: Fix missing error code in bnx2x_iov_init_one()
net: zero-initialize tc skb extension on allocation
net: hns: Fix kernel-doc
sctp: fix the proc_handler for sysctl encap_port
sctp: add the missing setting for asoc encap_port
bpf, selftests: Adjust few selftest result_unpriv outcomes
bpf: No need to simulate speculative domain for immediates
bpf: Fix mask direction swap upon off reg sign change
bpf: Wrap aux data inside bpf_sanitize_info container
bpf: Fix BPF_LSM kconfig symbol dependency
selftests/bpf: Add test for l3 use of bpf_redirect_peer
bpftool: Add sock_release help info for cgroup attach/prog load command
net: dsa: microchip: enable phy errata workaround on 9567
...
It's been a few releases since this defaulted to /sbin/hotplug. Update
the text, and include pointers to the two CONFIG_UEVENT_HELPER{,_PATH}
config knobs whose help text could provide more info, but also hint
that the user probably doesn't need to care at all.
Fixes: 7934779a69 ("Driver-Core: disable /sbin/hotplug by default")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20210420120638.1104016-1-linux@rasmusvillemoes.dk
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Daniel Borkmann says:
====================
pull-request: bpf 2021-05-11
The following pull-request contains BPF updates for your *net* tree.
We've added 13 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 817 insertions(+), 382 deletions(-).
The main changes are:
1) Fix multiple ringbuf bugs in particular to prevent writable mmap of
read-only pages, from Andrii Nakryiko & Thadeu Lima de Souza Cascardo.
2) Fix verifier alu32 known-const subregister bound tracking for bitwise
operations and/or/xor, from Daniel Borkmann.
3) Reject trampoline attachment for functions with variable arguments,
and also add a deny list of other forbidden functions, from Jiri Olsa.
4) Fix nested bpf_bprintf_prepare() calls used by various helpers by
switching to per-CPU buffers, from Florent Revest.
5) Fix kernel compilation with BTF debug info on ppc64 due to pahole
missing TCP-CC functions like cubictcp_init, from Martin KaFai Lau.
6) Add a kconfig entry to provide an option to disallow unprivileged
BPF by default, from Daniel Borkmann.
7) Fix libbpf compilation for older libelf when GELF_ST_VISIBILITY()
macro is not available, from Arnaldo Carvalho de Melo.
8) Migrate test_tc_redirect to test_progs framework as prep work
for upcoming skb_change_head() fix & selftest, from Jussi Maki.
9) Fix a libbpf segfault in add_dummy_ksym_var() if BTF is not
present, from Ian Rogers.
10) Fix tx_only micro-benchmark in xdpsock BPF sample with proper frame
size, from Magnus Karlsson.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a kconfig knob which allows for unprivileged bpf to be disabled by default.
If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.
This still allows a transition of 2 -> {0,1} through an admin. Similarly,
this also still keeps 1 -> {1} behavior intact, so that once set to permanently
disabled, it cannot be undone aside from a reboot.
We've also added extra2 with max of 2 for the procfs handler, so that an admin
still has a chance to toggle between 0 <-> 2.
Either way, as an additional alternative, applications can make use of CAP_BPF
that we added a while ago.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net
Pull powerpc updates from Michael Ellerman:
- Enable KFENCE for 32-bit.
- Implement EBPF for 32-bit.
- Convert 32-bit to do interrupt entry/exit in C.
- Convert 64-bit BookE to do interrupt entry/exit in C.
- Changes to our signal handling code to use user_access_begin/end()
more extensively.
- Add support for time namespaces (CONFIG_TIME_NS)
- A series of fixes that allow us to reenable STRICT_KERNEL_RWX.
- Other smaller features, fixes & cleanups.
Thanks to Alexey Kardashevskiy, Andreas Schwab, Andrew Donnellan, Aneesh
Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Bixuan Cui, Cédric Le
Goater, Chen Huang, Chris Packham, Christophe Leroy, Christopher M.
Riedl, Colin Ian King, Dan Carpenter, Daniel Axtens, Daniel Henrique
Barboza, David Gibson, Davidlohr Bueso, Denis Efremov, dingsenjie,
Dmitry Safonov, Dominic DeMarco, Fabiano Rosas, Ganesh Goudar, Geert
Uytterhoeven, Geetika Moolchandani, Greg Kurz, Guenter Roeck, Haren
Myneni, He Ying, Jiapeng Chong, Jordan Niethe, Laurent Dufour, Lee
Jones, Leonardo Bras, Li Huafei, Madhavan Srinivasan, Mahesh Salgaonkar,
Masahiro Yamada, Nathan Chancellor, Nathan Lynch, Nicholas Piggin,
Oliver O'Halloran, Paul Menzel, Pu Lehui, Randy Dunlap, Ravi Bangoria,
Rosen Penev, Russell Currey, Santosh Sivaraj, Sebastian Andrzej Siewior,
Segher Boessenkool, Shivaprasad G Bhat, Srikar Dronamraju, Stephen
Rothwell, Thadeu Lima de Souza Cascardo, Thomas Gleixner, Tony Ambardar,
Tyrel Datwyler, Vaibhav Jain, Vincenzo Frascino, Xiongwei Song, Yang Li,
Yu Kuai, and Zhang Yunkai.
* tag 'powerpc-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (302 commits)
powerpc/signal32: Fix erroneous SIGSEGV on RT signal return
powerpc: Avoid clang uninitialized warning in __get_user_size_allowed
powerpc/papr_scm: Mark nvdimm as unarmed if needed during probe
powerpc/kvm: Fix build error when PPC_MEM_KEYS/PPC_PSERIES=n
powerpc/kasan: Fix shadow start address with modules
powerpc/kernel/iommu: Use largepool as a last resort when !largealloc
powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs
powerpc/44x: fix spelling mistake in Kconfig "varients" -> "variants"
powerpc/iommu: Annotate nested lock for lockdep
powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
powerpc/iommu: Allocate it_map by vmalloc
selftests/powerpc: remove unneeded semicolon
powerpc/64s: remove unneeded semicolon
powerpc/eeh: remove unneeded semicolon
powerpc/selftests: Add selftest to test concurrent perf/ptrace events
powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
powerpc/selftests/perf-hwbreak: Coalesce event creation code
powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
powerpc/configs: Add IBMVNIC to some 64-bit configs
selftests/powerpc: Add uaccess flush test
...
Implement Extended Berkeley Packet Filter on Powerpc 32
Test result with test_bpf module:
test_bpf: Summary: 378 PASSED, 0 FAILED, [354/366 JIT'ed]
Registers mapping:
[BPF_REG_0] = r11-r12
/* function arguments */
[BPF_REG_1] = r3-r4
[BPF_REG_2] = r5-r6
[BPF_REG_3] = r7-r8
[BPF_REG_4] = r9-r10
[BPF_REG_5] = r21-r22 (Args 9 and 10 come in via the stack)
/* non volatile registers */
[BPF_REG_6] = r23-r24
[BPF_REG_7] = r25-r26
[BPF_REG_8] = r27-r28
[BPF_REG_9] = r29-r30
/* frame pointer aka BPF_REG_10 */
[BPF_REG_FP] = r17-r18
/* eBPF jit internal registers */
[BPF_REG_AX] = r19-r20
[TMP_REG] = r31
As PPC32 doesn't have a redzone in the stack, a stack frame must always
be set in order to host at least the tail count counter.
The stack frame remains for tail calls, it is set by the first callee
and freed by the last callee.
r0 is used as temporary register as much as possible. It is referenced
directly in the code in order to avoid misusing it, because some
instructions interpret it as value 0 instead of register r0
(ex: addi, addis, stw, lwz, ...)
The following operations are not implemented:
case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */
case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */
case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */
The following operations are only implemented for power of two constants:
case BPF_ALU64 | BPF_MOD | BPF_K: /* dst %= imm */
case BPF_ALU64 | BPF_DIV | BPF_K: /* dst /= imm */
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/61d8b149176ddf99e7d5cef0b6dc1598583ca202.1616430991.git.christophe.leroy@csgroup.eu
netdev_unregister_timeout_secs=0 can lead to printing the
"waiting for dev to become free" message every jiffy.
This is too frequent and unnecessary.
Set the min value to 1 second.
Also fix the merge issue introduced by
"net: make unregister netdev warning timeout configurable":
it changed "refcnt != 1" to "refcnt".
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: 5aa3afe107 ("net: make unregister netdev warning timeout configurable")
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev_wait_allrefs() issues a warning if refcount does not drop to 0
after 10 seconds. While 10 second wait generally should not happen
under normal workload in normal environment, it seems to fire falsely
very often during fuzzing and/or in qemu emulation (~10x slower).
At least it's not possible to understand if it's really a false
positive or not. Automated testing generally bumps all timeouts
to very high values to avoid flake failures.
Add net.core.netdev_unregister_timeout_secs sysctl to make
the timeout configurable for automated testing systems.
Lowering the timeout may also be useful for e.g. manual bisection.
The default value matches the current behavior.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
I went to go add a new RECLAIM_* mode for the zone_reclaim_mode sysctl.
Like a good kernel developer, I also went to go update the
documentation. I noticed that the bits in the documentation didn't
match the bits in the #defines.
The VM never explicitly checks the RECLAIM_ZONE bit. The bit is,
however implicitly checked when checking 'node_reclaim_mode==0'. The
RECLAIM_ZONE #define was removed in a cleanup. That, by itself is fine.
But, when the bit was removed (bit 0) the _other_ bit locations also got
changed. That's not OK because the bit values are documented to mean
one specific thing. Users surely do not expect the meaning to change
from kernel to kernel.
The end result is that if someone had a script that did:
sysctl vm.zone_reclaim_mode=1
it would have gone from enabling node reclaim for clean unmapped pages
to writing out pages during node reclaim after the commit in question.
That's not great.
Put the bits back the way they were and add a comment so something like
this is a bit harder to do again. Update the documentation to make it
clear that the first bit is ignored.
Link: https://lkml.kernel.org/r/20210219172555.FF0CDF23@viggo.jf.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: 648b5cf368 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: "Tobin C. Harding" <tobin@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull documentation fixes from Jonathan Corbet:
"A small set of late-arriving, small documentation fixes"
* tag 'docs-5.11-2' of git://git.lwn.net/linux:
docs: admin-guide: Fix default value of max_map_count in sysctl/vm.rst
Documentation/submitting-patches: Document the SoB chain
Documentation: process: Correct numbering
docs: submitting-patches: Trivial - fix grammatical error