linux

mirror of https://github.com/Dasharo/linux.git synced 2026-03-06 15:25:10 -08:00

Author	SHA1	Message	Date
Linus Torvalds	9a0f53e0cf	Merge tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull CSD lock update from Paul McKenney: "This adds a kernel boot parameter that causes the kernel to panic if one of the call_smp_function() APIs is stalled for more than the specified duration. This is useful in deployments in which a clean panic is preferable to an indefinite stall" * tag 'csd-lock.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: smp,csd: Throw an error if a CSD lock is stuck for too long	2023-10-30 17:56:53 -10:00
Thomas Gleixner	a940daa521	Merge branch 'linus' into smp/core Pull in upstream to get the fixes so depending changes can be applied.	2023-10-17 21:40:46 +02:00
Rik van Riel	94b3f0b5af	smp,csd: Throw an error if a CSD lock is stuck for too long The CSD lock seems to get stuck in 2 "modes". When it gets stuck temporarily, it usually gets released in a few seconds, and sometimes up to one or two minutes. If the CSD lock stays stuck for more than several minutes, it never seems to get unstuck, and gradually more and more things in the system end up also getting stuck. In the latter case, we should just give up, so the system can dump out a little more information about what went wrong, and, with panic_on_oops and a kdump kernel loaded, dump a whole bunch more information about what might have gone wrong. In addition, there is an smp.panic_on_ipistall kernel boot parameter that by default retains the old behavior, but when set enables the panic after the CSD lock has been stuck for more than the specified number of milliseconds, as in 300,000 for five minutes. [ paulmck: Apply Imran Khan feedback. ] [ paulmck: Apply Leonardo Bras feedback. ] Link: https://lore.kernel.org/lkml/bc7cc8b0-f587-4451-8bcd-0daae627bcc7@paulmck-laptop/ Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Leonardo Bras <leobras@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org>	2023-10-16 16:06:37 -07:00
Leonardo Bras	d090ec0df8	smp: Change function signatures to use call_single_data_t call_single_data_t is a size-aligned typedef of struct __call_single_data. This alignment is desirable in order to have smp_call_function() avoid bouncing an extra cacheline in case of an unaligned csd, given this would hurt performance. Since the removal of struct request->csd in commit `660e802c76` ("blk-mq: use percpu csd to remote complete instead of per-rq csd") there are no current users of smp_call_function() with unaligned csd. Change every 'struct __call_single_data' function parameter to 'call_single_data_t', so we have warnings if any new code tries to introduce an smp_call_function*() call with unaligned csd. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230831063129.335425-1-leobras@redhat.com	2023-09-13 14:59:24 +02:00
Imran Khan	0d3a00b370	smp: Reduce NMI traffic from CSD waiters to CSD destination On systems with hundreds of CPUs, if most of the CPUs detect a CSD hang, then all of these waiting CPUs send an NMI to the destination CPU in order to dump its backtrace. Given enough NMIs, the destination CPU will spent much of its time producing backtraces, thus further delaying that CPU's response to the original CSD IPI. In the worst case, by the time destination CPU is done producing all of these backtrace NMIs, the CSD wait timeout will have elapsed so that the waiters resend their backtrace NMIs again, further delaying forward progress. Therefore, to avoid these delays, issue the backtrace NMI only from the first waiter. The destination CPU's other waiters can make use of backtrace obtained from the first waiter's NMI. Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-07-10 14:19:04 -07:00
Imran Khan	5bd00f6db0	smp: Reduce logging due to dump_stack of CSD waiters If a waiter is waiting for CSD lock, its call stack will not change between first and subsequent hang detection for the same CSD lock. Therefore, do dump_stack only for first-time detection for a given waiter. This avoids excessive logging on systems with hundreds of CPUs where repetitive dump_stack from hundreds of CPUs would otherwise flood the console. Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2023-07-10 14:19:04 -07:00
Leonardo Bras	bf5a8c26ad	trace,smp: Add tracepoints for scheduling remotelly called functions Add a tracepoint for when a CSD is queued to a remote CPU's call_single_queue. This allows finding exactly which CPU queued a given CSD when looking at a csd_function_{entry,exit} event, and also enables us to accurately measure IPI delivery time with e.g. a synthetic event: $ echo 'hist:keys=cpu,csd.hex:ts=common_timestamp.usecs' >\ /sys/kernel/tracing/events/smp/csd_queue_cpu/trigger $ echo 'csd_latency unsigned int dst_cpu; unsigned long csd; u64 time' >\ /sys/kernel/tracing/synthetic_events $ echo \ 'hist:keys=common_cpu,csd.hex:'\ 'time=common_timestamp.usecs-$ts:'\ 'onmatch(smp.csd_queue_cpu).trace(csd_latency,common_cpu,csd,$time)' >\ /sys/kernel/tracing/events/smp/csd_function_entry/trigger $ trace-cmd record -e 'synthetic:csd_latency' hackbench $ trace-cmd report <...>-467 [001] 21.824263: csd_queue_cpu: cpu=0 callsite=try_to_wake_up+0x2ea func=sched_ttwu_pending csd=0xffff8880076148b8 <...>-467 [001] 21.824280: ipi_send_cpu: cpu=0 callsite=try_to_wake_up+0x2ea callback=generic_smp_call_function_single_interrupt+0x0 <...>-489 [000] 21.824299: csd_function_entry: func=sched_ttwu_pending csd=0xffff8880076148b8 <...>-489 [000] 21.824320: csd_latency: dst_cpu=0, csd=18446612682193848504, time=36 Suggested-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230615065944.188876-7-leobras@redhat.com	2023-06-16 22:08:09 +02:00
Leonardo Bras	949fa3f11c	trace,smp: Add tracepoints around remotelly called functions The recently added ipi_send_{cpu,cpumask} tracepoints allow finding sources of IPIs targeting CPUs running latency-sensitive applications. For NOHZ_FULL CPUs, all IPIs are interference, and those tracepoints are sufficient to find them and work on getting rid of them. In some setups however, not all IPIs are to be suppressed, but long-running IPI callbacks can still be problematic. Add a pair of tracepoints to mark the start and end of processing a CSD IPI callback, similar to what exists for softirq, workqueue or timer callbacks. Signed-off-by: Leonardo Bras <leobras@redhat.com> Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230615065944.188876-5-leobras@redhat.com	2023-06-16 22:08:09 +02:00
Thomas Gleixner	ba831b7b1a	cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init No point in keeping them around. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205255.551974164@linutronix.de	2023-05-15 13:44:47 +02:00
Peter Zijlstra	5c3124975e	trace,smp: Trace all smp_function_call() invocations (Ab)use the trace_ipi_send_cpu() family to trace all smp_function_call*() invocations, not only those that result in an actual IPI. The queued entries log their callback function while the actual IPIs are traced on generic_smp_call_function_single_interrupt(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2023-03-24 11:01:30 +01:00
Peter Zijlstra	68e2d17c9e	trace: Add trace_ipi_send_cpu() Because copying cpumasks around when targeting a single CPU is a bit daft... Tested-and-reviewed-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230322103004.GA571242%40hirez.programming.kicks-ass.net	2023-03-24 11:01:29 +01:00
Valentin Schneider	68f4ff04db	sched, smp: Trace smp callback causing an IPI Context ======= The newly-introduced ipi_send_cpumask tracepoint has a "callback" parameter which so far has only been fed with NULL. While CSD_TYPE_SYNC/ASYNC and CSD_TYPE_IRQ_WORK share a similar backing struct layout (meaning their callback func can be accessed without caring about the actual CSD type), CSD_TYPE_TTWU doesn't even have a function attached to its struct. This means we need to check the type of a CSD before eventually dereferencing its associated callback. This isn't as trivial as it sounds: the CSD type is stored in __call_single_node.u_flags, which get cleared right before the callback is executed via csd_unlock(). This implies checking the CSD type before it is enqueued on the call_single_queue, as the target CPU's queue can be flushed before we get to sending an IPI. Furthermore, send_call_function_single_ipi() only has a CPU parameter, and would need to have an additional argument to trickle down the invoked function. This is somewhat silly, as the extra argument will always be pushed down to the function even when nothing is being traced, which is unnecessary overhead. Changes ======= send_call_function_single_ipi() is only used by smp.c, and is defined in sched/core.c as it contains scheduler-specific ops (set_nr_if_polling() of a CPU's idle task). Split it into two parts: the scheduler bits remain in sched/core.c, and the actual IPI emission is moved into smp.c. This lets us define an __always_inline helper function that can take the related callback as parameter without creating useless register pressure in the non-traced path which only gains a (disabled) static branch. Do the same thing for the multi IPI case. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230307143558.294354-8-vschneid@redhat.com	2023-03-24 11:01:29 +01:00
Valentin Schneider	253a0fb4c6	smp: reword smp call IPI comment Accessing the call_single_queue hasn't involved a spinlock since 2014: `6897fc22ea` ("kernel: use lockless list for smp_call_function_single") The llist operations (namely cmpxchg() and xchg()) provide similar ordering guarantees, update the comment to lessen confusion. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230307143558.294354-7-vschneid@redhat.com	2023-03-24 11:01:28 +01:00
Valentin Schneider	08407b5f61	smp: Trace IPIs sent via arch_send_call_function_ipi_mask() This simply wraps around the arch function and prepends it with a tracepoint, similar to send_call_function_single_ipi(). Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230307143558.294354-4-vschneid@redhat.com	2023-03-24 11:01:27 +01:00
Valentin Schneider	cc9cb0a717	sched, smp: Trace IPIs sent via send_call_function_single_ipi() send_call_function_single_ipi() is the thing that sends IPIs at the bottom of smp_call_function*() via either generic_exec_single() or smp_call_function_many_cond(). Give it an IPI-related tracepoint. Note that this ends up tracing any IPI sent via __smp_call_single_queue(), which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free". Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230307143558.294354-3-vschneid@redhat.com	2023-03-24 11:01:27 +01:00
Paul E. McKenney	203e435844	kernel/smp: Make csdlock_debug= resettable It is currently possible to set the csdlock_debug_enabled static branch, but not to reset it. This is an issue when several different entities supply kernel boot parameters and also for kernels built with CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT=y. Therefore, make the csdlock_debug=0 kernel boot parameter turn off debugging. Last one wins! Reported-by: Jes Sorensen <Jes.Sorensen@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230321005516.50558-4-paulmck@kernel.org	2023-03-24 11:01:26 +01:00
Paul E. McKenney	6366d062e7	locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging The diagnostics added by this commit were extremely useful in one instance: `a5aabace5f` ("locking/csd_lock: Add more data to CSD lock debugging") However, they have not seen much action since, and there have been some concerns expressed that the complexity is not worth the benefit. Therefore, manually revert the following commit preparatory commit: `de7b09ef65` ("locking/csd_lock: Prepare more CSD lock debugging") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230321005516.50558-3-paulmck@kernel.org	2023-03-24 11:01:26 +01:00
Paul E. McKenney	1771257cb4	locking/csd_lock: Remove added data from CSD lock debugging The diagnostics added by this commit were extremely useful in one instance: `a5aabace5f` ("locking/csd_lock: Add more data to CSD lock debugging") However, they have not seen much action since, and there have been some concerns expressed that the complexity is not worth the benefit. Therefore, manually revert this commit, but leave a comment telling people where to find these diagnostics. [ paulmck: Apply Juergen Gross feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230321005516.50558-2-paulmck@kernel.org	2023-03-24 11:01:25 +01:00
Paul E. McKenney	c521986016	locking/csd_lock: Add Kconfig option for csd_debug default The csd_debug kernel parameter works well, but is inconvenient in cases where it is more closely associated with boot loaders or automation than with a particular kernel version or release. Thererfore, provide a new CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option that defaults csd_debug to 1 when selected and 0 otherwise, with this latter being the default. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230321005516.50558-1-paulmck@kernel.org	2023-03-24 11:01:25 +01:00
Linus Torvalds	d4013bc4d4	Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux Pull bitmap updates from Yury Norov: - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld) - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me) This series cleans that mess and adds new config FORCE_NR_CPUS that allows to optimize cpumask subsystem if the number of CPUs is known at compile-time. - optimize find_bit() functions (me) Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros. - add find_nth_bit() (me) Adds find_nth_bit(), which is ~70 times faster than bitcounting with for_each() loop: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; Also adds bitmap_weight_and() to let people replace this pattern: tmp = bitmap_alloc(nbits); bitmap_and(tmp, map1, map2, nbits); weight = bitmap_weight(tmp, nbits); bitmap_free(tmp); with a single bitmap_weight_and() call. - repair cpumask_check() (me) After switching cpumask to use nr_cpu_ids, cpumask_check() started generating many false-positive warnings. This series fixes it. - Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin Schneider) Extends the API with one more function and applies it in sched/core. * tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits) sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot() lib/test_cpumask: Add for_each_cpu_and(not) tests cpumask: Introduce for_each_cpu_andnot() lib/find_bit: Introduce find_next_andnot_bit() cpumask: fix checking valid cpu range lib/bitmap: add tests for for_each() loops lib/find: optimize for_each() macros lib/bitmap: introduce for_each_set_bit_wrap() macro lib/find_bit: add find_next{,_and}_bit_wrap cpumask: switch for_each_cpu{,_not} to use for_each_bit() net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and} cpumask: add cpumask_nth_{,and,andnot} lib/bitmap: remove bitmap_ord_to_pos lib/bitmap: add tests for find_nth_bit() lib: add find_nth{,_and,_andnot}_bit() lib/bitmap: add bitmap_weight_and() lib/bitmap: don't call __bitmap_weight() in kernel code tools: sync find_bit() implementation lib/find_bit: optimize find_next_bit() functions lib/find_bit: create find_first_zero_bit_le() ...	2022-10-10 12:49:34 -07:00
Yury Norov	6f9c07be9d	lib/cpumask: add FORCE_NR_CPUS config option The size of cpumasks is hard-limited by compile-time parameter NR_CPUS, but defined at boot-time when kernel parses ACPI/DT tables, and stored in nr_cpu_ids. In many practical cases, number of CPUs for a target is known at compile time, and can be provided with NR_CPUS. In that case, compiler may be instructed to rely on NR_CPUS as on actual number of CPUs, not an upper limit. It allows to optimize many cpumask routines and significantly shrink size of the kernel image. This patch adds FORCE_NR_CPUS option to teach the compiler to rely on NR_CPUS and enable corresponding optimizations. If FORCE_NR_CPUS=y, kernel will not set nr_cpu_ids at boot, but only check that the actual number of possible CPUs is equal to NR_CPUS, and WARN if that doesn't hold. The new option is especially useful in embedded applications because kernel configurations are unique for each SoC, the number of CPUs is constant and known well, and memory limitations are typically harder. For my 4-CPU ARM64 build with NR_CPUS=4, FORCE_NR_CPUS=y saves 46KB: add/remove: 3/4 grow/shrink: 46/729 up/down: 652/-46952 (-46300) Signed-off-by: Yury Norov <yury.norov@gmail.com>	2022-09-20 16:11:44 -07:00
Yury Norov	38bef8e57f	smp: add set_nr_cpu_ids() In preparation to support compile-time nr_cpu_ids, add a setter for the variable. This is a no-op for all arches. Signed-off-by: Yury Norov <yury.norov@gmail.com>	2022-09-19 17:51:53 -07:00
Yury Norov	53fc190cc6	smp: don't declare nr_cpu_ids if NR_CPUS == 1 SMP and NR_CPUS are independent options, hence nr_cpu_ids may be declared even if NR_CPUS == 1, which is useless. Signed-off-by: Yury Norov <yury.norov@gmail.com>	2022-09-19 17:51:53 -07:00
Zhen Lei	e73dfe3093	sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task() The trigger_all_cpu_backtrace() function attempts to send an NMI to the target CPU, which usually provides much better stack traces than the dump_cpu_task() function's approach of dumping that stack from some other CPU. So much so that most calls to dump_cpu_task() only happen after a call to trigger_all_cpu_backtrace() has failed. And the exception to this rule really should attempt to use trigger_all_cpu_backtrace() first. Therefore, move the trigger_all_cpu_backtrace() invocation into dump_cpu_task(). Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com>	2022-08-31 05:03:14 -07:00
Chen Zhongjin	9c9b26b0df	locking/csd_lock: Change csdlock_debug from early_param to __setup The csdlock_debug kernel-boot parameter is parsed by the early_param() function csdlock_debug(). If set, csdlock_debug() invokes static_branch_enable() to enable csd_lock_wait feature, which triggers a panic on arm64 for kernels built with CONFIG_SPARSEMEM=y and CONFIG_SPARSEMEM_VMEMMAP=n. With CONFIG_SPARSEMEM_VMEMMAP=n, __nr_to_section is called in static_key_enable() and returns NULL, resulting in a NULL dereference because mem_section is initialized only later in sparse_init(). This is also a problem for powerpc because early_param() functions are invoked earlier than jump_label_init(), also resulting in static_key_enable() failures. These failures cause the warning "static key 'xxx' used before call to jump_label_init()". Thus, early_param is too early for csd_lock_wait to run static_branch_enable(), so changes it to __setup to fix these. Fixes: `8d0968cc6b` ("locking/csd_lock: Add boot parameter for controlling CSD lock debugging") Cc: stable@vger.kernel.org Reported-by: Chen jingwen <chenjingwen6@huawei.com> Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-07-19 11:40:00 -07:00

1 2 3 4 5 ...

183 Commits