Commit Graph

924 Commits

Author SHA1 Message Date
Baoquan He
b4722b8593 kernel/workqueue.c: fix DEFINE_PER_CPU_SHARED_ALIGNED expansion
Make tags always produces below annoying warnings:

ctags: Warning: kernel/workqueue.c:470: null expansion of name pattern "\1"
ctags: Warning: kernel/workqueue.c:474: null expansion of name pattern "\1"
ctags: Warning: kernel/workqueue.c:478: null expansion of name pattern "\1"

In commit 25528213fe ("tags: Fix DEFINE_PER_CPU expansions"), codes in
places have been adjusted including cpu_worker_pools definition. I noticed
in commit 4cb1ef6460 ("workqueue: Implement BH workqueues to eventually
replace tasklets"), cpu_worker_pools definition was unfolded back. Not
sure if it was intentionally done or ignored carelessly.

Makes change to mute them specifically.

Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-11 08:10:44 -10:00
Sergey Senozhatsky
84c425bef3 workqueue: fix null-ptr-deref on __alloc_workqueue() error
wq->lockdep_map is set only after __alloc_workqueue()
successfully returns. However, on its error path
__alloc_workqueue() may call destroy_workqueue() which
expects wq->lockdep_map to be already set, which results
in a null-ptr-deref in touch_wq_lockdep_map().

Add a simple NULL-check to touch_wq_lockdep_map().

Oops: general protection fault, probably for non-canonical address
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:__lock_acquire+0x81/0x7800
[..]
Call Trace:
 <TASK>
 ? __die_body+0x66/0xb0
 ? die_addr+0xb2/0xe0
 ? exc_general_protection+0x300/0x470
 ? asm_exc_general_protection+0x22/0x30
 ? __lock_acquire+0x81/0x7800
 ? mark_lock+0x94/0x330
 ? __lock_acquire+0x12fd/0x7800
 ? __lock_acquire+0x3439/0x7800
 lock_acquire+0x14c/0x3e0
 ? __flush_workqueue+0x167/0x13a0
 ? __init_swait_queue_head+0xaf/0x150
 ? __flush_workqueue+0x167/0x13a0
 __flush_workqueue+0x17d/0x13a0
 ? __flush_workqueue+0x167/0x13a0
 ? lock_release+0x50f/0x830
 ? drain_workqueue+0x94/0x300
 drain_workqueue+0xe3/0x300
 destroy_workqueue+0xac/0xc40
 ? workqueue_sysfs_register+0x159/0x2f0
 __alloc_workqueue+0x1506/0x1760
 alloc_workqueue+0x61/0x150
...

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-21 06:13:26 -10:00
Matthew Brost
9b59a85a84 workqueue: Don't call va_start / va_end twice
Calling va_start / va_end multiple times is undefined and causes
problems with certain compiler / platforms.

Change alloc_ordered_workqueue_lockdep_map to a macro and updated
__alloc_workqueue to take a va_list argument.

Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-20 09:38:39 -10:00
Matthew Brost
ec0a7d44b3 workqueue: Add interface for user-defined workqueue lockdep map
Add an interface for a user-defined workqueue lockdep map, which is
helpful when multiple workqueues are created for the same purpose. This
also helps avoid leaking lockdep maps on each workqueue creation.

v2:
 - Add alloc_workqueue_lockdep_map (Tejun)
v3:
 - Drop __WQ_USER_OWNED_LOCKDEP (Tejun)
 - static inline alloc_ordered_workqueue_lockdep_map (Tejun)

Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-13 09:05:51 -10:00
Matthew Brost
4f022f430e workqueue: Change workqueue lockdep map to pointer
Will help enable user-defined lockdep maps for workqueues.

Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-13 09:05:40 -10:00
Matthew Brost
b188c57af2 workqueue: Split alloc_workqueue into internal function and lockdep init
Will help enable user-defined lockdep maps for workqueues.

Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-13 09:05:28 -10:00
Sangmoon Kim
073107b39e workqueue: add cmdline parameter workqueue.panic_on_stall
When we want to debug the workqueue stall, we can immediately make
a panic to get the information we want.

In some systems, it may be necessary to quickly reboot the system to
escape from a workqueue lockup situation. In this case, we can control
the number of stall detections to generate panic.

workqueue.panic_on_stall sets the number times of the stall to trigger
panic. 0 disables the panic on stall.

Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-06 10:06:31 -10:00
Lai Jiangshan
aa8684755a workqueue: Remove unneeded lockdep_assert_cpus_held()
The commit 19af457573 ("workqueue: Remove cpus_read_lock() from
apply_wqattrs_lock()") removes the unneed cpus_read_lock() after the pwq
creations and installations have been reworked based on wq_online_cpumask
rather than cpu_online_mask making cpus_read_lock() is unneeded during
wqattrs changes.

But it desn't remove the lockdep_assert_cpus_held() checks during wqattrs
changes, which leads to complaints from lockdep reported by kernel test
robot:

[   15.726567][  T131] ------------[ cut here ]------------
[ 15.728117][ T131] WARNING: CPU: 1 PID: 131 at kernel/cpu.c:525 lockdep_assert_cpus_held (kernel/cpu.c:525)
[   15.731191][  T131] Modules linked in: floppy(+) parport_pc(+) parport qemu_fw_cfg rtc_cmos
[   15.733423][  T131] CPU: 1 PID: 131 Comm: systemd-udevd Tainted: G                T  6.10.0-rc2-00254-g19af45757383 #1 df6f039f42e8818bf9a534449362ebad1aad32e2
[   15.737011][  T131] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 15.739760][ T131] EIP: lockdep_assert_cpus_held (kernel/cpu.c:525)
[ 15.741326][ T131] Code: 97 c2 03 72 20 83 3d f4 73 97 c2 00 74 17 55 89 e5 b8 fc bd 4d c2 ba ff ff ff ff e8 e4 57 d1 00 85 c0 74 06 5d 31 c0 31 d2 c3 <0f> 0b eb f6 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 b8

Fix it by removing the unneeded lockdep_assert_cpus_held().
Also remove the unneed cpus_read_lock() from wq_affn_dfl_set().

tj: Dropped the removal of cpus_read_lock/unlock() in wq_affn_dfl_set() to
    keep this patch fix only.

Cc: kernel test robot <oliver.sang@intel.com>
Fixes: 19af45757383("workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202407141846.665c0446-lkp@intel.com
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-15 14:01:14 -10:00
Linus Torvalds
b02c520fee Merge tag 'wq-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:

 - Lai fixed a bug where CPU hotplug and workqueue attribute changes
   race leaving some workqueues not fully updated. This involved
   refactoring and changing how online CPUs are tracked. The resulting
   code is cleaner.

 - Workqueue watchdog touch operation was causing too much cacheline
   contention on very large machines. Nicholas improved scalabililty by
   avoiding unnecessary global updates.

 - Code cleanups and minor rescuer behavior improvement.

 - The last commit 58629d4871 ("workqueue: Always queue work items to
   the newest PWQ for order workqueues") is a cherry-picked straggler
   commit from for-6.10-fixes, a fix for a bug which may not actually
   trigger.

* tag 'wq-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (24 commits)
  workqueue: Always queue work items to the newest PWQ for order workqueues
  workqueue: Rename wq_update_pod() to unbound_wq_update_pwq()
  workqueue: Remove the arguments @hotplug_cpu and @online from wq_update_pod()
  workqueue: Remove the argument @cpu_going_down from wq_calc_pod_cpumask()
  workqueue: Remove the unneeded cpumask empty check in wq_calc_pod_cpumask()
  workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()
  workqueue: Simplify wq_calc_pod_cpumask() with wq_online_cpumask
  workqueue: Add wq_online_cpumask
  workqueue: Init rescuer's affinities as the wq's effective cpumask
  workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.
  workqueue: Move kthread_flush_worker() out of alloc_and_link_pwqs()
  workqueue: Make rescuer initialization as the last step of the creation of a new wq
  workqueue: Register sysfs after the whole creation of the new wq
  workqueue: Simplify goto statement
  workqueue: Update cpumasks after only applying it successfully
  workqueue: Improve scalability of workqueue watchdog touch
  workqueue: wq_watchdog_touch is always called with valid CPU
  workqueue: Remove useless pool->dying_workers
  workqueue: Detach workers directly in idle_cull_fn()
  workqueue: Don't bind the rescuer in the last working cpu
  ...
2024-07-15 16:51:22 -07:00
Lai Jiangshan
58629d4871 workqueue: Always queue work items to the newest PWQ for order workqueues
To ensure non-reentrancy, __queue_work() attempts to enqueue a work
item to the pool of the currently executing worker. This is not only
unnecessary for an ordered workqueue, where order inherently suggests
non-reentrancy, but it could also disrupt the sequence if the item is
not enqueued on the newest PWQ.

Just queue it to the newest PWQ and let order management guarantees
non-reentrancy.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Fixes: 4c065dbce1 ("workqueue: Enable unbound cpumask update on ordered workqueues")
Cc: stable@vger.kernel.org # v6.9+
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 74347be3edfd11277799242766edf844c43dd5d3)
2024-07-14 18:20:19 -10:00
Lai Jiangshan
b2b1f93384 workqueue: Rename wq_update_pod() to unbound_wq_update_pwq()
What wq_update_pod() does is just to update the pwq of the specific
cpu.  Rename it and update the comments.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11 12:50:35 -10:00
Lai Jiangshan
d160a58de5 workqueue: Remove the arguments @hotplug_cpu and @online from wq_update_pod()
The arguments @hotplug_cpu and @online are not used in wq_update_pod()
since the functions called by wq_update_pod() don't need them.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11 12:50:35 -10:00
Lai Jiangshan
88a41b185d workqueue: Remove the argument @cpu_going_down from wq_calc_pod_cpumask()
wq_calc_pod_cpumask() uses wq_online_cpumask, which excludes the cpu
going down, so the argument cpu_going_down is unused and can be removed.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11 12:50:34 -10:00
Lai Jiangshan
2cb61f76be workqueue: Remove the unneeded cpumask empty check in wq_calc_pod_cpumask()
The cpumask empty check in wq_calc_pod_cpumask() has long been useless.
It just works purely as documents which states that the cpumask is not
possible empty after the function returns.

Now the code above is even more explicit that the cpumask is not empty,
so the document-only empty check can be removed.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11 12:50:34 -10:00
Lai Jiangshan
19af457573 workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()
1726a17135 ("workqueue: Put PWQ allocation and WQ enlistment in the same
lock C.S.") led to the following possible deadlock:

  WARNING: possible recursive locking detected
  6.10.0-rc5-00004-g1d4c6111406c #1 Not tainted
   --------------------------------------------
   swapper/0/1 is trying to acquire lock:
   c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730) 
  
   but task is already holding lock:
   c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: padata_alloc (kernel/padata.c:1007) 
   ...  
   stack backtrace:
   ...
   cpus_read_lock (include/linux/percpu-rwsem.h:53 kernel/cpu.c:488) 
   alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730) 
   padata_alloc (kernel/padata.c:1007 (discriminator 1)) 
   pcrypt_init_padata (crypto/pcrypt.c:327 (discriminator 1)) 
   pcrypt_init (crypto/pcrypt.c:353) 
   do_one_initcall (init/main.c:1267) 
   do_initcalls (init/main.c:1328 (discriminator 1) init/main.c:1345 (discriminator 1)) 
   kernel_init_freeable (init/main.c:1364) 
   kernel_init (init/main.c:1469) 
   ret_from_fork (arch/x86/kernel/process.c:153) 
   ret_from_fork_asm (arch/x86/entry/entry_32.S:737) 
   entry_INT80_32 (arch/x86/entry/entry_32.S:944) 

This is caused by pcrypt allocating a workqueue while holding
cpus_read_lock(), so workqueue code can't do it again as that can lead to
deadlocks if down_write starts after the first down_read.

The pwq creations and installations have been reworked based on
wq_online_cpumask rather than cpu_online_mask making cpus_read_lock() is
unneeded during wqattrs changes. Fix the deadlock by removing
cpus_read_lock() from apply_wqattrs_lock().

tj: Updated changelog.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Fixes: 1726a17135 ("workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.")
Link: http://lkml.kernel.org/r/202407081521.83b627c1-lkp@intel.com
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11 12:50:34 -10:00
Lai Jiangshan
fbb3d4c15d workqueue: Simplify wq_calc_pod_cpumask() with wq_online_cpumask
Avoid relying on cpu_online_mask for wqattrs changes so that
cpus_read_lock() can be removed from apply_wqattrs_lock().

And with wq_online_cpumask, attrs->__pod_cpumask doesn't need to be
reused as a temporary storage to calculate if the pod have any online
CPUs @attrs wants since @cpu_going_down is not in the wq_online_cpumask.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11 12:50:34 -10:00
Lai Jiangshan
8d84baf760 workqueue: Add wq_online_cpumask
The new wq_online_mask mirrors the cpu_online_mask except during
hotplugging; specifically, it differs between the hotplugging stages
of workqueue_offline_cpu() and workqueue_online_cpu(), during which
the transitioning CPU is not represented in the mask.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11 12:50:34 -10:00
Lai Jiangshan
449b31ad29 workqueue: Init rescuer's affinities as the wq's effective cpumask
Make it consistent with apply_wqattrs_commit().

Link: https://lore.kernel.org/lkml/20240203154334.791910-5-longman@redhat.com/
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-05 09:14:40 -10:00
Lai Jiangshan
1726a17135 workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.
The PWQ allocation and WQ enlistment are not within the same lock-held
critical section; therefore, their states can become out of sync when
the user modifies the unbound mask or if CPU hotplug events occur in
the interim since those operations only update the WQs that are already
in the list.

Make the PWQ allocation and WQ enlistment atomic.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-05 09:14:40 -10:00
Lai Jiangshan
4e9a37389e workqueue: Move kthread_flush_worker() out of alloc_and_link_pwqs()
kthread_flush_worker() can't be called with wq_pool_mutex held.

Prepare for moving wq_pool_mutex and cpu hotplug lock out of
alloc_and_link_pwqs().

Cc: Zqiang <qiang.zhang1211@gmail.com>
Link: https://lore.kernel.org/lkml/20230920060704.24981-1-qiang.zhang1211@gmail.com/
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-05 09:14:40 -10:00
Lai Jiangshan
c5178e6ca6 workqueue: Make rescuer initialization as the last step of the creation of a new wq
For early wq allocation, rescuer initialization is the last step of the
creation of a new wq.  Make the behavior the same for all allocations.

Prepare for initializing rescuer's affinities with the default pwq's
affinities.

Prepare for moving the whole workqueue initializing procedure into
wq_pool_mutex and cpu hotplug locks.

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-05 09:14:40 -10:00
Lai Jiangshan
c3138f3881 workqueue: Register sysfs after the whole creation of the new wq
workqueue creation includes adding it to the workqueue list.

Prepare for moving the whole workqueue initializing procedure into
wq_pool_mutex and cpu hotplug locks.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-05 09:14:40 -10:00
Lai Jiangshan
b3d209164d workqueue: Simplify goto statement
Use a simple if-statement to replace the cumbersome goto-statement in
workqueue_set_unbound_cpumask().

Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-02 07:17:22 -10:00
Lai Jiangshan
8416588323 workqueue: Update cpumasks after only applying it successfully
Make workqueue_unbound_exclude_cpumask() and workqueue_set_unbound_cpumask()
only update wq_isolated_cpumask and wq_requested_unbound_cpumask when
workqueue_apply_unbound_cpumask() returns successfully.

Fixes: fe28f631fa94("workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask")
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-02 07:14:33 -10:00
Nicholas Piggin
98f887f820 workqueue: Improve scalability of workqueue watchdog touch
On a ~2000 CPU powerpc system, hard lockups have been observed in the
workqueue code when stop_machine runs (in this case due to CPU hotplug).
This is due to lots of CPUs spinning in multi_cpu_stop, calling
touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
and that can find itself in the same cacheline as other important
workqueue data, which slows down operations to the point of lockups.

In the case of the following abridged trace, worker_pool_idr was in
the hot line, causing the lockups to always appear at idr_find.

  watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
  Call Trace:
  get_work_pool
  __queue_work
  call_timer_fn
  run_timer_softirq
  __do_softirq
  do_softirq_own_stack
  irq_exit
  timer_interrupt
  decrementer_common_virt
  * interrupt: 900 (timer) at multi_cpu_stop
  multi_cpu_stop
  cpu_stopper_thread
  smpboot_thread_fn
  kthread

Fix this by having wq_watchdog_touch() only write to the line if the
last time a touch was recorded exceeds 1/4 of the watchdog threshold.

Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-25 06:55:44 -10:00