Commit Graph

1105818 Commits

Author SHA1 Message Date
Tetsuo Handa
bd27acaac2 lib/smp_processor_id: fix imbalanced instrumentation_end() call
Currently instrumentation_end() won't be called if printk_ratelimit()
returned false.

Link: https://lkml.kernel.org/r/a636d8e0-ad32-5888-acac-671f7f553bb3@I-love.SAKURA.ne.jp
Fixes: 126f21f0e8 ("lib/smp_processor_id: Move it into noinstr section")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:41 -07:00
Sander Vanheule
953257a925 cpumask: update cpumask_next_wrap() signature
The extern specifier is not needed for this declaration, so drop it.  The
function also depends only on the input parameters, and has no side
effects, so it can be marked __pure like other functions in cpumask.h.

Link: https://lkml.kernel.org/r/72ab755695b74bb5fbaa756ae4c0edd708d172f1.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:41 -07:00
Sander Vanheule
c41e8866c2 lib/test: introduce cpumask KUnit test suite
Add a basic suite of tests for cpumask, providing some tests for empty and
completely filled cpumasks.

Link: https://lkml.kernel.org/r/c96980ec35c3bd23f17c3374bf42c22971545e85.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Suggested-by: Yury Norov <yury.norov@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:41 -07:00
Sander Vanheule
b81dce77ce cpumask: Fix invalid uniprocessor mask assumption
On uniprocessor builds, any CPU mask is assumed to contain exactly one CPU
(cpu0).  This assumption ignores the existence of empty masks, resulting
in incorrect behaviour.

cpumask_first_zero(), cpumask_next_zero(), and for_each_cpu_not() don't
provide behaviour matching the assumption that a UP mask is always "1",
and instead provide behaviour matching the empty mask.

Drop the incorrectly optimised code and use the generic implementations in
all cases.

Link: https://lkml.kernel.org/r/86bf3f005abba2d92120ddd0809235cab4f759a6.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Suggested-by: Yury Norov <yury.norov@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:41 -07:00
Sander Vanheule
4f09903078 cpumask: add UP optimised for_each_*_cpu versions
On uniprocessor builds, the following loops will always run over a mask
that contains one enabled CPU (cpu0):

    - for_each_possible_cpu
    - for_each_online_cpu
    - for_each_present_cpu

Provide uniprocessor-specific macros for these loops, that always run
exactly once.

Link: https://lkml.kernel.org/r/3a92869b902a075b97be5d1452c9c6badbbff0df.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Acked-by: Yury Norov <yury.norov@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:41 -07:00
Sander Vanheule
adbcaef840 x86/cacheinfo: move shared cache map definitions
Patch series "cpumask: Fix invalid uniprocessor assumptions", v4.

On uniprocessor builds, it is currently assumed that any cpumask will
contain the single CPU: cpu0.  This assumption is used to provide
optimised implementations.

The current assumption also appears to be wrong, by ignoring the fact that
users can provide empty cpumasks.  This can result in bugs as explained in
[1] - for_each_cpu() will run one iteration of the loop even when passed
an empty cpumask.

This series introduces some basic tests, and updates the optimisations for
uniprocessor builds.

The x86 patch was written after the kernel test robot [2] ran into a
failed build.  I have tried to list the files potentially affected by the
changes to cpumask.h, in an attempt to find any other cases that fail on
!SMP.  I've gone through some of the files manually, and ran a few cross
builds, but nothing else popped up.  I (build) checked about half of the
potientally affected files, but I do not have the resources to do them
all.  I hope we can fix other issues if/when they pop up later.

[1] https://lore.kernel.org/all/20220530082552.46113-1-sander@svanheule.net/
[2] https://lore.kernel.org/all/202206060858.wA0FOzRy-lkp@intel.com/


This patch (of 5):

The maps to keep track of shared caches between CPUs on SMP systems are
declared in asm/smp.h, among them specifically cpu_llc_shared_map.  These
maps are externally defined in cpu/smpboot.c.  The latter is only compiled
on CONFIG_SMP=y, which means the declared extern symbols from asm/smp.h do
not have a corresponding definition on uniprocessor builds.

The inline cpu_llc_shared_mask() function from asm/smp.h refers to the map
declaration mentioned above.  This function is referenced in cacheinfo.c
inside for_each_cpu() loop macros, to provide cpumask for the loop.  On
uniprocessor builds, the symbol for the cpu_llc_shared_map does not exist.
However, the current implementation of for_each_cpu() also (wrongly)
ignores the provided mask.

By sheer luck, the compiler thus optimises out this unused reference to
cpu_llc_shared_map, and the linker therefore does not require the
cpu_llc_shared_mask to actually exist on uniprocessor builds.  Only on SMP
bulids does smpboot.o exist to provide the required symbols.

To no longer rely on compiler optimisations for successful uniprocessor
builds, move the definitions of cpu_llc_shared_map and cpu_l2c_shared_map
from smpboot.c to cacheinfo.c.

Link: https://lkml.kernel.org/r/cover.1656777646.git.sander@svanheule.net
Link: https://lkml.kernel.org/r/e8167ddb570f56744a3dc12c2149a660a324d969.1656777646.git.sander@svanheule.net
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Marco Elver <elver@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:40 -07:00
Nikolay Borisov
8b5db66798 scripts/bloat-o-meter: add -p argument
When doing cross platform development on a machine sometimes it might be
useful to invoke bloat-o-meter for files which haven't been build with the
native toolchain.  In cases when the host nm doesn't support the target
one then a toolchain-specific nm could be used.  Add this ability by
adding the -p allowing invocations as:

./scripts/bloat-o-meter -p riscv64-unknown-linux-gnu- file1.o file2.o

Link: https://lkml.kernel.org/r/20220701113513.1938008-2-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:40 -07:00
Nikolay Borisov
b62eb2731e scripts/bloat-o-meter: switch argument parsing to using argparse
This will facilitate further extension to the arguments the script takes. 
As an added benefit it also produces saner usage output, where mutual
exclusivity of the c|d|t parameters is clearly visible:

./scripts/bloat-o-meter  -h
usage: bloat-o-meter [-h] [-c | -d | -t] file1 file2

Simple script used to compare the symbol sizes of 2 object files

positional arguments:
  file1       First file to compare
  file2       Second file to compare

optional arguments:
  -h, --help  show this help message and exit
  -c          categorize output based on symbol type
  -d          Show delta of Data Section
  -t          Show delta of text Section

Link: https://lkml.kernel.org/r/20220701113513.1938008-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:40 -07:00
Benjamin Segall
a16ceb1396 epoll: autoremove wakers even more aggressively
If a process is killed or otherwise exits while having active network
connections and many threads waiting on epoll_wait, the threads will all
be woken immediately, but not removed from ep->wq.  Then when network
traffic scans ep->wq in wake_up, every wakeup attempt will fail, and will
not remove the entries from the list.

This means that the cost of the wakeup attempt is far higher than usual,
does not decrease, and this also competes with the dying threads trying to
actually make progress and remove themselves from the wq.

Handle this by removing visited epoll wq entries unconditionally, rather
than only when the wakeup succeeds - the structure of ep_poll means that
the only potential loss is the timed_out->eavail heuristic, which now can
race and result in a redundant ep_send_events attempt.  (But only when
incoming data and a timeout actually race, not on every timeout)

Shakeel added:

: We are seeing this issue in production with real workloads and it has
: caused hard lockups.  Particularly network heavy workloads with a lot
: of threads in epoll_wait() can easily trigger this issue if they get
: killed (oom-killed in our case).

Link: https://lkml.kernel.org/r/xm26fsjotqda.fsf@google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Heiher <r@hev.cc>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:40 -07:00
Yu Zhe
2c795fb03f ipc/mqueue: remove unnecessary (void*) conversion
Remove unnecessary void* type casting.

Link: https://lkml.kernel.org/r/20220628021251.17197-1-yuzhe@nfschina.com
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:40 -07:00
Tao Liu
46d36b1be1 kdump: round up the total memory size to 128M for crashkernel reservation
The total memory size we get in kernel is usually slightly less than the
actual memory size because BIOS/firmware will reserve some memory region. 
So it won't export all memory as usable.

E.g, on my x86_64 kvm guest with 1G memory, the total_mem value shows:
UEFI boot with ovmf: 0x3faef000 Legacy boot kvm guest: 0x3ff7ec00

When specifying crashkernel=1G-2G:128M, if we have a 1G memory machine, we
get total size 1023M from firmware.  Then it will not fall into 1G-2G,
thus no memory reserved.  User will never know this, it is hard to let
user know the exact total value in kernel.

One way is to use dmi/smbios to get physical memory size, but it's not
reliable as well.  According to Prarit hardware vendors sometimes screw
this up.  Thus round up total size to 128M to work around this problem.

This patch is a resend of [1] and rebased onto v5.19-rc2, and the
original credit goes to Dave Young.

[1]: http://lists.infradead.org/pipermail/kexec/2018-April/020568.html

Link: https://lkml.kernel.org/r/20220627074440.187222-1-ltao@redhat.com
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:40 -07:00
Alexey Dobriyan
376b0c2661 proc: delete unused <linux/uaccess.h> includes
Those aren't necessary after seq files won.

Link: https://lkml.kernel.org/r/YqnA3mS7KBt8Z4If@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:39 -07:00
Stephen Brennan
5fd8fea935 vmcoreinfo: include kallsyms symbols
The internal kallsyms tables contain information which could be quite
useful to a debugging tool in the absence of other debuginfo.  If kallsyms
is enabled, then a debugging tool could parse it and use it as a fallback
symbol table.  Combined with BTF data, live & post-mortem debuggers can
support basic operations without needing a large DWARF debuginfo file
available.  As many as five symbols are necessary to properly parse
kallsyms names and addresses.  Add these to the vmcoreinfo note.

CONFIG_KALLSYMS_ABSOLUTE_PERCPU does impact the computation of symbol
addresses.  However, a debugger can infer this configuration value by
comparing the address of _stext in the vmcoreinfo with the address
computed via kallsyms.  So there's no need to include information about
this config value in the vmcoreinfo note.

To verify that we're still well below the maximum of 4096 bytes, I created
a script[1] to compute a rough upper bound on the possible size of
vmcoreinfo.  On v5.18-rc7, the script reports 3106 bytes, and with this
patch, the maximum become 3370 bytes.

[1]: https://github.com/brenns10/kernel_stuff/blob/master/vmcoreinfosize/

Link: https://lkml.kernel.org/r/20220517000508.777145-3-stephen.s.brennan@oracle.com
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Bixuan Cui <cuibixuan@huawei.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Vernet <void@manifault.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:39 -07:00
Stephen Brennan
71f8c15565 kallsyms: move declarations to internal header
Patch series "Expose kallsyms data in vmcoreinfo note".

The kernel can be configured to contain a lot of introspection or
debugging information built-in, such as ORC for unwinding stack traces,
BTF for type information, and of course kallsyms.  Debuggers could use
this information to navigate a core dump or live system, but they need to
be able to find it.

This patch series adds the necessary symbols into vmcoreinfo, which would
allow a debugger to find and interpret the kallsyms table.  Using the
kallsyms data, the debugger can then lookup any symbol, allowing it to
find ORC, BTF, or any other useful data.

This would allow a live kernel, or core dump, to be debugged without any
DWARF debuginfo.  This is useful for many cases: the debuginfo may not
have been generated, or you may not want to deploy the large files
everywhere you need them.

I've demonstrated a proof of concept for this at LSF/MM+BPF during a
lighting talk.  Using a work-in-progress branch of the drgn debugger, and
an extended set of BTF generated by a patched version of dwarves, I've
been able to open a core dump without any DWARF info and do basic tasks
such as enumerating slab caches, block devices, tasks, and doing
backtraces.  I hope this series can be a first step toward a new
possibility of "DWARFless debugging".

Related discussion around the BTF side of this:
https://lore.kernel.org/bpf/586a6288-704a-f7a7-b256-e18a675927df@oracle.com/T/#u

Some work-in-progress branches using this feature:
https://github.com/brenns10/dwarves/tree/remove_percpu_restriction_1
https://github.com/brenns10/drgn/tree/kallsyms_plus_btf


This patch (of 2):

To include kallsyms data in the vmcoreinfo note, we must make the symbol
declarations visible outside of kallsyms.c.  Move these to a new internal
header file.

Link: https://lkml.kernel.org/r/20220517000508.777145-1-stephen.s.brennan@oracle.com
Link: https://lkml.kernel.org/r/20220517000508.777145-2-stephen.s.brennan@oracle.com
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Bixuan Cui <cuibixuan@huawei.com>
Cc: David Vernet <void@manifault.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:39 -07:00
Colin Ian King
4a70ce5f93 lib/ts_bm.c: remove redundant store to variable consumed after addition
There is no need to store the result of the addition back to variable
consumed after the addition.  The store is redundant, replace += with just
+

Cleans up clang scan build warning: lib/ts_bm.c:83:11: warning: Although
the value stored to 'consumed' is used in the enclosing expression, the
value is never actually read from 'consumed' [deadcode.DeadStores]

Link: https://lkml.kernel.org/r/20220704215325.600993-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:39 -07:00
wuchi
6d529ea80b lib/scatterlist: use matched parameter type when calling __sg_free_table()
commit 4635873c56 ("scsi: lib/sg_pool.c: improve APIs for allocating sg
pool") changeed @(bool)skip_first_chunk of __sg_free_table() to @(unsigned
int)nents_first_chunk, so use unsigend int type instead of bool type
(false -> 0) when calling the function in sg_free_append_table() and
sg_free_table().

Link: https://lkml.kernel.org/r/20220629030241.84559-1-wuchi.zero@gmail.com
Signed-off-by: wuchi <wuchi.zero@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Maor Gottlieb <maorg@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:39 -07:00
Tiezhu Yang
2d8867f3e0 lib: make LZ4_decompress_safe_forceExtDict() static
LZ4_decompress_safe_forceExtDict() is only used in
lib/lz4/lz4_decompress.c, make it static to fix the build warning about
"no previous prototype" [1].

[1] https://lore.kernel.org/lkml/202206260948.akgsho1q-lkp@intel.com/

Link: https://lkml.kernel.org/r/1656298965-8698-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:39 -07:00
wuchi
cda83bb8a6 lib/radix-tree: remove unused argument of insert_entries
insert_entries() doesn't use the 'bool replace' argument, and the function
is only used locally, remove the argument.

The historical context of the unused argument is as follow:

2: commit <3a08cd52c37c79> (radix tree: Remove multiorder support)
  Remove the code related to macro CONFIG_RADIX_TREE_MULTIORDER
to convert to the xArray.
  Without the macro, there is no need to retain the argument.

1: commit <175542f575723e> (radix-tree: add radix_tree_join)
  Add insert_entries(..., bool replace) function, depending on the
macro CONFIG_RADIX_TREE_MULTIORDER definition, the implementation
is different. Notice that the implementation without the macro doesn't
use the argument.

[Matthew Wilcox: add historical context for argument]

Link: https://lkml.kernel.org/r/20220625135324.72574-1-wuchi.zero@gmail.com
Signed-off-by: wuchi <wuchi.zero@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:38 -07:00
Dan Carpenter
045ed31e23 kfifo: fix kfifo_to_user() return type
The kfifo_to_user() macro is supposed to return zero for success or
negative error codes.  Unfortunately, there is a signedness bug so it
returns unsigned int.  This only affects callers which try to save the
result in ssize_t and as far as I can see the only place which does that
is line6_hwdep_read().

TL;DR: s/_uint/_int/.

Link: https://lkml.kernel.org/r/YrVL3OJVLlNhIMFs@kili
Fixes: 144ecf310e ("kfifo: fix kfifo_alloc() to return a signed int value")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:38 -07:00
Uros Bizjak
43c249ea0b compiler-gcc.h: remove ancient workaround for gcc PR 58670
The workaround for 'asm goto' miscompilation introduces a compiler barrier
quirk that inhibits many useful compiler optimizations.  For example,
__try_cmpxchg_user compiles to:

   11375:	41 8b 4d 00          	mov    0x0(%r13),%ecx
   11379:	41 8b 02             	mov    (%r10),%eax
   1137c:	f0 0f b1 0a          	lock cmpxchg %ecx,(%rdx)
   11380:	0f 94 c2             	sete   %dl
   11383:	84 d2                	test   %dl,%dl
   11385:	75 c4                	jne    1134b <...>
   11387:	41 89 02             	mov    %eax,(%r10)

where the barrier inhibits flags propagation from asm when compiled with
gcc-12.

When the mentioned quirk is removed, the following code is generated:

   11553:	41 8b 4d 00          	mov    0x0(%r13),%ecx
   11557:	41 8b 02             	mov    (%r10),%eax
   1155a:	f0 0f b1 0a          	lock cmpxchg %ecx,(%rdx)
   1155e:	74 c9                	je     11529 <...>
   11560:	41 89 02             	mov    %eax,(%r10)

The refered compiler bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

was fixed for gcc-4.8.2.

Current minimum required version of GCC is version 5.1 which has the above
'asm goto' miscompilation fixed, so remove the workaround.

Link: https://lkml.kernel.org/r/20220624141412.72274-1-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:38 -07:00
wuchi
86e5908ec2 lib/error-inject: traverse list with mutex
Traversing list without mutex in get_injectable_error_type will
race with the following code:
    list_del_init(&ent->list)
    kfree(ent)
in module_unload_ei_list. So fix that.

Link: https://lkml.kernel.org/r/20220620100244.82896-1-wuchi.zero@gmail.com
Signed-off-by: wuchi <wuchi.zero@gmail.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:38 -07:00
Vlastimil Babka
f9987921cb lib/stackdepot: replace CONFIG_STACK_HASH_ORDER with automatic sizing
As Linus explained [1], setting the stackdepot hash table size as a config
option is suboptimal, especially as stackdepot becomes a dependency of
less "expert" subsystems than initially (e.g.  DRM, networking,
SLUB_DEBUG):

: (a) it introduces a new compile-time question that isn't sane to ask
: a regular user, but is now exposed to regular users.

: (b) this by default uses 1MB of memory for a feature that didn't in
: the past, so now if you have small machines you need to make sure you
: make a special kernel config for them.

Ideally we would employ rhashtable for fully automatic resizing, which
should be feasible for many of the new users, but problematic for the
original users with restricted context that call __stack_depot_save() with
can_alloc == false, i.e.  KASAN.

However we can easily remove the config option and scale the hash table
automatically with system memory.  The STACK_HASH_MASK constant becomes
stack_hash_mask variable and is used only in one mask operation, so the
overhead should be negligible to none.  For early allocation we can employ
the existing alloc_large_system_hash() function and perform similar
scaling for the late allocation.

The existing limits of the config option (between 4k and 1M buckets) are
preserved, and scaling factor is set to one bucket per 16kB memory so on
64bit the max 1M buckets (8MB memory) is achieved with 16GB system, while
a 1GB system will use 512kB.

Because KASAN is reported to need the maximum number of buckets even with
smaller amounts of memory [2], set it as such when kasan_enabled().

If needed, the automatic scaling could be complemented with a boot-time
kernel parameter, but it feels pointless to add it without a specific use
case.

[1] https://lore.kernel.org/all/CAHk-=wjC5nS+fnf6EzRD9yQRJApAhxx7gRB87ZV+pAWo9oVrTg@mail.gmail.com/
[2] https://lore.kernel.org/all/CACT4Y+Y4GZfXOru2z5tFPzFdaSUd+GFc6KVL=bsa0+1m197cQQ@mail.gmail.com/

Link: https://lkml.kernel.org/r/20220620150249.16814-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:38 -07:00
wuchi
62df90b53e net, lib/once: remove {net_}get_random_once_wait macro
DO_ONCE(func, ...) will call func with spinlock which acquired by
spin_lock_irqsave in __do_once_start.  But the get_random_once_wait will
sleep in get_random_bytes_wait -> wait_for_random_bytes.

Fortunately, there is no place to use {net_}get_random_once_wait, so we
could remove them simply.

Link: https://lkml.kernel.org/r/20220619074641.40916-1-wuchi.zero@gmail.com
Signed-off-by: wuchi <wuchi.zero@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:37 -07:00
wuchi
5a66fce95b lib/lru_cache: fix error free handing in lc_create
When kmem_cache_alloc in function lc_create returns null, we will
free the memory already allocated. The loop of kmem_cache_free
is wrong, especially:
  i = 0  ==> do wrong loop
  i > 0  ==> do not free element[0]

Link: https://lkml.kernel.org/r/20220618082521.7082-1-wuchi.zero@gmail.com
Signed-off-by: wuchi <wuchi.zero@gmail.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Christoph Bhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:37 -07:00
Dan Moulding
5a704629f2 init: add "hostname" kernel parameter
The gethostname system call returns the hostname for the current machine. 
However, the kernel has no mechanism to initially set the current
machine's name in such a way as to guarantee that the first userspace
process to call gethostname will receive a meaningful result.  It relies
on some unspecified userspace process to first call sethostname before
gethostname can produce a meaningful name.

Traditionally the machine's hostname is set from userspace by the init
system.  The init system, in turn, often relies on a configuration file
(say, /etc/hostname) to provide the value that it will supply in the call
to sethostname.  Consequently, the file system containing /etc/hostname
usually must be available before the hostname will be set.  There may,
however, be earlier userspace processes that could call gethostname before
the file system containing /etc/hostname is mounted.  Such a process will
get some other, likely meaningless, name from gethostname (such as
"(none)", "localhost", or "darkstar").

A real-world example where this can happen, and lead to undesirable
results, is with mdadm.  When assembling arrays, mdadm distinguishes
between "local" arrays and "foreign" arrays.  A local array is one that
properly belongs to the current machine, and a foreign array is one that
is (possibly temporarily) attached to the current machine, but properly
belongs to some other machine.  To determine if an array is local or
foreign, mdadm may compare the "homehost" recorded on the array with the
current hostname.  If mdadm is run before the root file system is mounted,
perhaps because the root file system itself resides on an md-raid array,
then /etc/hostname isn't yet available and the init system will not yet
have called sethostname, causing mdadm to incorrectly conclude that all of
the local arrays are foreign.

Solving this problem *could* be delegated to the init system.  It could be
left up to the init system (including any init system that starts within
an initramfs, if one is in use) to ensure that sethostname is called
before any other userspace process could possibly call gethostname. 
However, it may not always be obvious which processes could call
gethostname (for example, udev itself might not call gethostname, but it
could via udev rules invoke processes that do).  Additionally, the init
system has to ensure that the hostname configuration value is stored in
some place where it will be readily accessible during early boot. 
Unfortunately, every init system will attempt to (or has already attempted
to) solve this problem in a different, possibly incorrect, way.  This
makes getting consistently working configurations harder for users.

I believe it is better for the kernel to provide the means by which the
hostname may be set early, rather than making this a problem for the init
system to solve.  The option to set the hostname during early startup, via
a kernel parameter, provides a simple, reliable way to solve this problem.
It also could make system configuration easier for some embedded systems.

[dmoulding@me.com: v2]
  Link: https://lkml.kernel.org/r/20220506060310.7495-2-dmoulding@me.com
Link: https://lkml.kernel.org/r/20220505180651.22849-2-dmoulding@me.com
Signed-off-by: Dan Moulding <dmoulding@me.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:37 -07:00