Commit Graph

774 Commits

Linus Torvalds
7ab044a4f4 Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:

 - Concurrency-managed per-cpu work items that hog CPUs and delay the
   execution of other work items are now automatically detected and
   excluded from concurrency management. Reporting on such work items
   can also be enabled through a config option.

 - Added tools/workqueue/wq_monitor.py which improves visibility into
   workqueue usages and behaviors.

 - Arnd's minimal fix for gcc-13 enum warning on 32bit compiles,
   superseded by commit afa4bb778e in mainline.

* tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
  workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
  workqueue: fix enum type for gcc-13
  workqueue: Track and monitor per-workqueue CPU time usage
  workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
  workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
  workqueue: Improve locking rule description for worker fields
  workqueue: Move worker_set/clr_flags() upwards
  workqueue: Re-order struct worker fields
  workqueue: Add pwq->stats[] and a monitoring script
  Further upgrade queue_work_on() comment
2023-06-27 16:32:52 -07:00
Linus Torvalds
afa4bb778e workqueue: clean up WORK_* constant types, clarify masking
Dave Airlie reports that gcc-13.1.1 has started complaining about some
of the workqueue code in 32-bit arm builds:

  kernel/workqueue.c: In function ‘get_work_pwq’:
  kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    713 |                 return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
        |                        ^
  [ ... a couple of other cases ... ]

and while it's not immediately clear exactly why gcc started complaining
about it now, I suspect the cause is some C23-induced enum type handling
fixup in gcc-13.

Whatever the reason for starting to complain, the code and data types
are indeed disgusting enough that the complaint is warranted.

The wq code ends up creating various "helper constants" (like that
WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of
confused.  The mask needs to be 'unsigned long', not some unspecified
enum type.

To make matters worse, the actual "mask and cast to a pointer" is
repeated a couple of times, and the cast isn't even always done to the
right pointer, but - as the error case above - to a 'void *' with then
the compiler finishing the job.

That's not how we roll in the kernel.

So create the masks using the proper types rather than some ambiguous
enumeration, and use a nice helper that actually does the type
conversion in one well-defined place.
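
Roughly, the resulting shape is the following (a sketch; the helper name
work_struct_pwq() is illustrative, not necessarily the exact final code):

  #define WORK_STRUCT_WQ_DATA_MASK  (~(unsigned long)WORK_STRUCT_FLAG_MASK)

  static inline struct pool_workqueue *work_struct_pwq(unsigned long data)
  {
          /* mask and cast in one well-typed place */
          return (struct pool_workqueue *)(data & WORK_STRUCT_WQ_DATA_MASK);
  }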

Incidentally, this magically makes clang generate better code.  That,
admittedly, is really just a sign of clang having been seriously
confused before, and cleaning up the typing unconfuses the compiler too.

Reported-by: Dave Airlie <airlied@gmail.com>
Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-23 12:08:14 -07:00
Zqiang
18c8ae8131 workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
If workqueue.cpu_intensive_thresh_us is set to 0, the detection mechanism
for CPU-hogging per-cpu work items keeps triggering spuriously:

  workqueue: process_srcu hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND
  workqueue: gc_worker hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND
  workqueue: gc_worker hogged CPU for >0us 8 times, consider switching to WQ_UNBOUND
  workqueue: wait_rcu_exp_gp hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND
  workqueue: kfree_rcu_monitor hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND
  workqueue: kfree_rcu_monitor hogged CPU for >0us 8 times, consider switching to WQ_UNBOUND
  workqueue: reg_todo hogged CPU for >0us 4 times, consider switching to WQ_UNBOUND

This commit therefore disables the CPU-hog detection mechanism when
workqueue.cpu_intensive_thresh_us is set to 0.
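
The shape of the fix is a separate early-exit check in wq_worker_tick()
(a sketch; exact placement may differ):

  /*
   * In wq_worker_tick(): a zero threshold means CPU hog detection is
   * disabled entirely.
   */
  if (!wq_cpu_intensive_thresh_us)
          return;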

tj: Patch description updated and the condition check on
    cpu_intensive_thresh_us separated into a separate if statement for
    readability.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-25 10:46:53 -10:00
Zqiang
c8f6219be2 workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
Currently, pool->nr_running can be modified from the timer tick, which means
the tick handler can run nested inside a non-irq-protected section that is
itself in the middle of modifying nr_running. Consider the following scenario:

CPU0
kworker/0:2 (events)
   worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
   ->pool->nr_running++;  (1)

   process_one_work()
   ->worker->current_func(work);
     ->schedule()
       ->wq_worker_sleeping()
         ->worker->sleeping = 1;
         ->pool->nr_running--;  (0)
           ....
       ->wq_worker_running()
               ....
               CPU0 by interrupt:
               wq_worker_tick()
               ->worker_set_flags(worker, WORKER_CPU_INTENSIVE);
                 ->pool->nr_running--;  (-1)
                  ->worker->flags |= WORKER_CPU_INTENSIVE;
               ....
         ->if (!(worker->flags & WORKER_NOT_RUNNING))
           ->pool->nr_running++;    (will not execute)
         ->worker->sleeping = 0;
         ....
    ->worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
      ->pool->nr_running++;  (0)
    ....
    worker_set_flags(worker, WORKER_PREP);
    ->pool->nr_running--;   (-1)
    ....
    worker_enter_idle()
    ->WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);

If nr_workers is equal to nr_idle, the nonzero nr_running will trigger the
WARN_ON_ONCE():

[    2.460602] WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0
[    2.462163] Modules linked in:
[    2.463401] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted 6.4.0-rc2-next-20230519 #1
[    2.463771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[    2.465127] Workqueue:  0x0 (events)
[    2.465678] RIP: 0010:worker_enter_idle+0xb2/0xc0
...
[    2.472614] Call Trace:
[    2.473152]  <TASK>
[    2.474182]  worker_thread+0x71/0x430
[    2.474992]  ? _raw_spin_unlock_irqrestore+0x28/0x50
[    2.475263]  kthread+0x103/0x120
[    2.475493]  ? __pfx_worker_thread+0x10/0x10
[    2.476355]  ? __pfx_kthread+0x10/0x10
[    2.476635]  ret_from_fork+0x2c/0x50
[    2.477051]  </TASK>

This commit therefore adds a check of worker->sleeping to wq_worker_tick():
if worker->sleeping is not zero, return directly.
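
A minimal sketch of the added check (hedged; the surrounding conditions in
wq_worker_tick() may differ):

  /*
   * In wq_worker_tick(), before touching pool->nr_running: the worker is
   * racing with wq_worker_sleeping()/wq_worker_running(), so leave
   * nr_running alone.
   */
  if (READ_ONCE(worker->sleeping))
          return;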

tj: Updated comment and description.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Closes: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/log
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-24 11:57:26 -10:00
Tejun Heo
8a1dd1e547 workqueue: Track and monitor per-workqueue CPU time usage
Now that wq_worker_tick() is there, we can easily track the rough CPU time
consumption of each workqueue by charging the whole tick whenever a tick
hits an active workqueue. While not super accurate, it provides reasonable
visibility into the workqueues that consume a lot of CPU cycles.
wq_monitor.py is updated to report the per-workqueue CPU times.
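
The charging itself is simple (a sketch, assuming a PWQ_STAT_CPU_TIME
counter in pwq->stats[]):

  /*
   * In wq_worker_tick(): attribute the whole tick to the workqueue whose
   * work item is currently running on this CPU.
   */
  if (worker->current_pwq)
          worker->current_pwq->stats[PWQ_STAT_CPU_TIME] += TICK_USEC;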

v2: wq_monitor.py was using "cputime" as the key when outputting in json
    format. Use "cpu_time" instead for consistency with other fields.

Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-17 17:02:09 -10:00
Tejun Heo
6363845005 workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
Workqueue now automatically marks per-cpu work items that hog CPU for too
long as CPU_INTENSIVE, which excludes them from concurrency management and
prevents stalling other concurrency-managed work items. If a work function
keeps running over the threshold, it likely needs to be switched to use an
unbound workqueue.

This patch adds a debug mechanism which tracks the work functions that
trigger the automatic CPU_INTENSIVE mechanism and reports them using
pr_warn() with exponential backoff.
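
A condensed sketch of the reporting (illustrative; the hash-table plumbing
is omitted and names may differ):

  struct wci_ent {
          work_func_t             func;
          atomic64_t              cnt;
          struct hlist_node       hash_node;
  };

  static void wq_cpu_intensive_report(work_func_t func)
  {
          struct wci_ent *ent = wci_find_ent(func);       /* per-func entry */
          u64 cnt = atomic64_inc_return_relaxed(&ent->cnt);

          /* report with exponential backoff: at the 4th, 8th, 16th, ... trigger */
          if (cnt >= 4 && is_power_of_2(cnt))
                  printk_deferred(KERN_WARNING "workqueue: %ps hogged CPU for >%luus %llu times, consider switching to WQ_UNBOUND\n",
                                  func, wq_cpu_intensive_thresh_us, cnt);
  }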

v3: Documentation update.

v2: Drop bouncing to kthread_worker for printing messages. It was to avoid
    introducing circular locking dependency through printk but not effective
    as it still had pool lock -> wci_lock -> printk -> pool lock loop. Let's
    just print directly using printk_deferred().

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
2023-05-17 17:02:08 -10:00
Tejun Heo
616db8779b workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
If a per-cpu work item hogs the CPU, it can prevent other work items from
starting through concurrency management. A per-cpu workqueue which intends
to host such CPU-hogging work items can choose to not participate in
concurrency management by setting %WQ_CPU_INTENSIVE; however, this can be
error-prone and difficult to debug when missed.

This patch adds an automatic CPU usage based detection. If a
concurrency-managed work item consumes more CPU time than the threshold
(10ms by default) continuously without intervening sleeps, wq_worker_tick()
which is called from scheduler_tick() will detect the condition and
automatically mark it CPU_INTENSIVE.
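
A condensed sketch of the detection (hedged; assuming the worker records the
task's cumulative runtime in worker->current_at when a work item starts):

  /*
   * In wq_worker_tick(): has the current work item hogged the CPU for
   * longer than the threshold without an intervening sleep?
   */
  if ((worker->flags & WORKER_NOT_RUNNING) ||
      worker->task->se.sum_exec_runtime - worker->current_at <
      wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
          return;

  raw_spin_lock(&pool->lock);
  worker_set_flags(worker, WORKER_CPU_INTENSIVE);
  wake_up_worker(pool);   /* let another worker take over the pool */
  raw_spin_unlock(&pool->lock);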

The mechanism isn't foolproof:

* Detection depends on tick hitting the work item. Getting preempted at the
  right timings may allow a violating work item to evade detection at least
  temporarily.

* nohz_full CPUs may not be running ticks and thus can fail detection.

* Even when detection is working, the 10ms detection delays can add up if
  many CPU-hogging work items are queued at the same time.

However, in the vast majority of cases, this should be able to detect violations
reliably and provide reasonable protection with a small increase in code
complexity.

If some work items trigger this condition repeatedly, the bigger problem
likely is the CPU being saturated with such per-cpu work items and the
solution would be making them UNBOUND. The next patch will add a debug
mechanism to help spot such cases.

v4: Documentation for workqueue.cpu_intensive_thresh_us added to
    kernel-parameters.txt.

v3: Switch to use wq_worker_tick() instead of hooking into preemptions as
    suggested by Peter.

v2: Lai pointed out that wq_worker_stopping() also needs to be called from
    preemption and rtlock paths and an earlier patch was updated
    accordingly. This patch adds a comment describing the risk of infinite
    recursions and how they're avoided.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-17 17:02:08 -10:00
Tejun Heo
bdf8b9bfc1 workqueue: Improve locking rule description for worker fields
* Some worker fields are modified only by the worker itself while holding
  pool->lock thus making them safe to read from self, IRQ context if the CPU
  is running the worker or while holding pool->lock. Add 'K' locking rule
  for them.

* worker->sleeping is currently marked "None" which isn't very descriptive.
  It's used only by the worker itself. Add 'S' locking rule for it.

A future patch will depend on the 'K' rule to access worker->current_* from
the scheduler ticks.
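
Paraphrased, the new rules in the locking-rule legend read roughly:

  /*
   * K: Only modified by worker while holding pool->lock.  Can be safely
   *    read by self, while holding pool->lock, or from IRQ context if
   *    %current is the kworker.
   *
   * S: Only modified by worker self.
   */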

Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-17 17:02:08 -10:00
Tejun Heo
c54d5046a0 workqueue: Move worker_set/clr_flags() upwards
They are going to be used in wq_worker_stopping(). Move them upwards.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-17 17:02:08 -10:00
Tejun Heo
725e8ec59c workqueue: Add pwq->stats[] and a monitoring script
Currently, the only way to peer into workqueue operations is through
tracing. While possible, it isn't easy or convenient to monitor
per-workqueue behaviors over time this way. Let's add pwq->stats[] that
track relevant events and a drgn monitoring script -
tools/workqueue/wq_monitor.py.
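
The counters are indexed by a small enum along these lines (an illustrative
subset; the exact initial fields may differ):

  enum pool_workqueue_stats {
          PWQ_STAT_STARTED,       /* work items started execution */
          PWQ_STAT_COMPLETED,     /* work items completed execution */
          PWQ_STAT_MAYDAY,        /* maydays to rescuer */
          PWQ_STAT_RESCUED,       /* linked work items executed by rescuer */

          PWQ_NR_STATS,
  };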

It's arguable whether this needs to be configurable. However, it currently
has only a handful of counters, and the runtime overhead shouldn't be
noticeable given that they live on pwqs, which are per-cpu on per-cpu
workqueues and per-NUMA-node on unbound ones. Let's keep it simple for the
time being.

v2: Patch reordered to earlier with fewer fields. Fields will be added back
    gradually. Help message improved.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-17 17:02:08 -10:00
Paul E. McKenney
854f5cc5b7 Further upgrade queue_work_on() comment
The current queue_work_on() docbook comment says that the caller must
ensure that the specified CPU can't go away, and further says that the
penalty for failing to nail down the specified CPU is that the workqueue
handler might find itself executing on some other CPU.  This is true
as far as it goes, but fails to note what happens if the specified CPU
never was online.  Therefore, further expand this comment to say that
specifying a CPU that was never online will result in a splat.
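
For illustration, a caller that must target a specific CPU can pin CPU
hotplug and check the CPU first (a hedged sketch; my_work stands in for
the caller's work item):

  cpus_read_lock();
  if (cpu_online(cpu))
          queue_work_on(cpu, system_wq, &my_work);
  cpus_read_unlock();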

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-05-09 05:37:42 -10:00
Linus Torvalds
cd546fa325 Merge tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Mostly changes from Petr to improve warning and error reporting.

  Workqueue now reports more of the relevant failures with better
  context which should help debugging"

* tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Introduce show_freezable_workqueues
  workqueue: Print backtraces from CPUs with hung CPU bound workqueues
  workqueue: Warn when a rescuer could not be created
  workqueue: Interrupted create_worker() is not a repeated event
  workqueue: Warn when a new worker could not be created
  workqueue: Fix hung time report of worker pools
  workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu()
  MAINTAINERS: Add workqueue_internal.h to the WORKQUEUE entry
2023-04-29 09:48:52 -07:00
Jungseung Lee
704bc669e1 workqueue: Introduce show_freezable_workqueues
Currently show_all_workqueues is called when freezing the workqueues fails,
which dumps the status of all workqueues and of all worker pools. In this
case we only need to dump the state of the workqueues that are freezable
and busy.

This patch defines show_freezable_workqueues, which uses
show_one_workqueue, a granular function that shows the state of an
individual workqueue, so that only the state of freezable workqueues is
dumped at that time.
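
A sketch of the new function's shape (hedged; locking details and the exact
message may differ):

  void show_freezable_workqueues(void)
  {
          struct workqueue_struct *wq;

          pr_info("Showing freezable workqueues that are still busy:\n");

          rcu_read_lock();
          list_for_each_entry_rcu(wq, &workqueues, list) {
                  if (!(wq->flags & WQ_FREEZABLE))
                          continue;
                  show_one_workqueue(wq);
          }
          rcu_read_unlock();
  }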

tj: Minor message adjustment.

Signed-off-by: Jungseung Lee <js07.lee@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-23 15:55:38 -10:00
Petr Mladek
cd2440d66f workqueue: Print backtraces from CPUs with hung CPU bound workqueues
The workqueue watchdog reports a lockup when there was not any progress
in the worker pool for a long time. Progress here means that a pending
work item starts being processed.

Worker pools for unbound workqueues always wake up an idle worker and
try to process the work immediately. The last idle worker has to create a
new worker first. A stall can happen only when a new worker could not be
created, in which case an error should get printed. Another possible
problem is excessive load; in this case, the workers are victims of a
global system problem.

Worker pools for CPU-bound workqueues are designed for lightweight
work items that do not need much CPU time. They are processed one by
one by a single worker. A new worker is used only when a work item
sleeps. This creates one additional scenario: a stall might happen when
a CPU-bound workqueue is used for CPU-intensive work.

More precisely, the stall is detected when a CPU-bound worker is in
the TASK_RUNNING state for too long. In this case, it might be useful
to see the backtrace from the problematic worker.

Information about how long a worker has been in the running state is not
available. But CPU-bound worker pools do not have many workers in the
running state by definition, and only a few pools are typically blocked.

It should be acceptable to print backtraces from all workers in
TASK_RUNNING state in the stalled worker pools. The number of false
positives should be very low.
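
A condensed sketch of the idea (hedged; the actual patch also has to defer
printk while holding pool->lock):

  /* dump every worker still in TASK_RUNNING in a stalled CPU-bound pool */
  static void show_cpu_pool_hog(struct worker_pool *pool)
  {
          struct worker *worker;
          unsigned long flags;
          int bkt;

          raw_spin_lock_irqsave(&pool->lock, flags);
          hash_for_each(pool->busy_hash, bkt, worker, hentry) {
                  if (task_is_running(worker->task))
                          sched_show_task(worker->task);
          }
          raw_spin_unlock_irqrestore(&pool->lock, flags);
  }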

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:47 -10:00
Petr Mladek
4c0736a76a workqueue: Warn when a rescuer could not be created
Rescuers are created when a workqueue with WQ_MEM_RECLAIM is allocated.
This typically happens during system boot.

systemd switches the root filesystem from initrd to the booted system
during boot. It kills processes that block the switch for too long.
One of these processes might be a modprobe that tries to create a workqueue.

These problems are hard to reproduce. Also alloc_workqueue() does not
pass the error code. Make the debugging easier by printing an error,
similar to create_worker().
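
A sketch of the added reporting in init_rescuer() (hedged; exact wording
may differ):

  if (IS_ERR(rescuer->task)) {
          long ret = PTR_ERR(rescuer->task);

          pr_err("workqueue: Failed to create a rescuer kthread for wq \"%s\": %pe",
                 wq->name, rescuer->task);
          kfree(rescuer);
          return ret;
  }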

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:46 -10:00
Petr Mladek
60f540389a workqueue: Interrupted create_worker() is not a repeated event
kthread_create_on_node() might get interrupted. It is rare but realistic.
For example, it can happen when an unbound workqueue is allocated in a
module_init() callback: the allocation runs in the context of the
"modprobe" process, and systemd might kill pending processes when
switching root from initrd to the booted system.

The interrupt is a one-off event and the race might be hard to reproduce.
It is always worth printing.
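
A sketch of how the one-off case can be told apart from repeated failures
(hedged; id_buf is the kworker name buffer in create_worker()):

  if (PTR_ERR(worker->task) == -EINTR)
          pr_err("workqueue: Interrupted when creating a worker thread \"kworker/%s\"\n",
                 id_buf);
  else
          pr_err_once("workqueue: Failed to create a worker thread: %pe",
                      worker->task);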

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:46 -10:00
Petr Mladek
3f0ea0b864 workqueue: Warn when a new worker could not be created
The workqueue watchdog reports a lockup when there was not any progress
in the worker pool for a long time. Progress here means that a pending
work item starts being processed.

The progress is guaranteed by using idle workers or creating new workers
for pending work items.

There are several reasons why a new worker could not be created:

   + there is not enough memory

   + there is no free pool ID (IDR API)

   + the system reached PID limit

   + the process creating the new worker was interrupted

   + the last idle worker (manager) has not been scheduled for a long
     time. It was not able to even start creating the kthread.

None of these failures is reported at the moment. The only clue is that
show_one_worker_pool() prints that there is a manager. It is the last
idle worker that is responsible for creating a new one. But it is not
clear if create_worker() is failing and why.

Make the debugging easier by printing errors in create_worker().

The error code is important, especially from kthread_create_on_node().
It helps to distinguish the various reasons. For example, reaching
memory limit (-ENOMEM), other system limits (-EAGAIN), or process
interrupted (-EINTR).

Use pr_err_once() to avoid repeating the same error every CREATE_COOLDOWN
for each stuck worker pool.
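
A sketch of the error path in create_worker() (hedged; message text may
differ):

  worker->task = kthread_create_on_node(worker_thread, worker, pool->node,
                                        "kworker/%s", id_buf);
  if (IS_ERR(worker->task)) {
          pr_err_once("workqueue: Failed to create a worker thread: %pe",
                      worker->task);
          goto fail;
  }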

Ratelimited printk() might be better. It would help to know whether the
problem persists, and it would make it clearer whether the create_worker()
errors and workqueue stalls are related. Also, old messages might get lost
when the internal log buffer is full. The problem is that printk() might
touch the watchdog.
For example, see touch_nmi_watchdog() in serial8250_console_write().
It would require synchronization of the begin and length of the ratelimit
interval with the workqueue watchdog. Otherwise, the error messages
might break the watchdog. This does not look worth the complexity.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:46 -10:00
Petr Mladek
335a42ebb0 workqueue: Fix hung time report of worker pools
The workqueue watchdog prints a warning when there is no progress in
a worker pool, where progress means that the pool started processing
a pending work item.

Note that it is perfectly fine for a work item to run much longer;
progress is guaranteed by waking up idle workers or creating new
ones.

show_one_worker_pool() prints the state of a non-idle worker pool. It shows
the delay since the last pool->watchdog_ts update.

The timestamp is updated when the first pending work item is queued in
__queue_work(). It is also updated when a work item is dequeued for
processing in worker_thread() and rescuer_thread().

The delay is misleading when there is no pending work item. In this
case it shows how long the last work item has been running. Show
zero instead: there is no stall if there is no pending work.
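
A sketch of the corrected report (hedged):

  /* in show_one_worker_pool(): report a delay only when work is pending */
  unsigned long hung = 0;

  if (!list_empty(&pool->worklist))
          hung = jiffies_to_msecs(jiffies - pool->watchdog_ts) / 1000;

  pr_cont(" hung=%lus", hung);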

Fixes: 82607adcf9 ("workqueue: implement lockup detector")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 12:03:46 -10:00
Ammar Faizi
a8ec5880bd workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu()
Use pr_warn_once() to achieve the same thing. It's simpler.
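
For illustration, the open-coded "warn only once" state collapses into a
single call (the message text here is illustrative):

  pr_warn_once("workqueue: round-robin CPU selection forced, expect performance impact\n");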

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-03-17 11:49:03 -10:00
Greg Kroah-Hartman
686f669780 workqueue: move to use bus_get_dev_root()
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.
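
The replacement pattern looks roughly like this (a hedged sketch;
bus_get_dev_root() returns a referenced device that must be dropped with
put_device()):

  struct device *dev_root;

  dev_root = bus_get_dev_root(&wq_subsys);
  if (dev_root) {
          /* register sysfs attributes under dev_root ... */
          put_device(dev_root);
  }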

Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230313182918.1312597-8-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 15:29:23 +01:00
Valentin Schneider
c63a2e52d5 workqueue: Fold rebind_worker() within rebind_workers()
!CONFIG_SMP builds complain about rebind_worker() being unused. Its only
user, rebind_workers(), is indeed only defined for CONFIG_SMP, so just fold
the two lines back up there.

Link: http://lore.kernel.org/r/20230113143102.2e94d74f@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-13 07:50:40 -10:00
Valentin Schneider
e02b931248 workqueue: Unbind kworkers before sending them to exit()
It has been reported that isolated CPUs can suffer from interference due to
per-CPU kworkers waking up just to die.

A surge of workqueue activity during initial setup of a latency-sensitive
application (refresh_vm_stats() being one of the culprits) can cause extra
per-CPU kworkers to be spawned. Then, said latency-sensitive task can be
running merrily on an isolated CPU only to be interrupted sometime later by
a kworker marked for death (cf. IDLE_WORKER_TIMEOUT, 5 minutes after last
kworker activity).

Prevent this by affining kworkers to the wq_unbound_cpumask (which doesn't
contain isolated CPUs, cf. HK_TYPE_WQ) before waking them up after marking
them with WORKER_DIE.

Changing the affinity does require a sleepable context, leverage the newly
introduced pool->idle_cull_work to get that.

Remove dying workers from pool->workers and keep track of them in a
separate list. This intentionally prevents for_each_pool_worker() from
iterating over workers that are marked for death.

Rename destroy_worker() to set_worker_dying() to better reflect its
effects and relationship with wake_dying_workers().
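
A condensed sketch of the wake-up path (hedged; unbind_worker() re-affines
the task to wq_unbound_cpumask):

  static void wake_dying_workers(struct list_head *cull_list)
  {
          struct worker *worker;

          list_for_each_entry(worker, cull_list, entry) {
                  unbind_worker(worker);
                  /*
                   * The kworker wakes up on a housekeeping CPU, sees
                   * WORKER_DIE and exits without disturbing isolated CPUs.
                   */
                  wake_up_process(worker->task);
          }
  }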

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-12 06:21:49 -10:00
Valentin Schneider
9ab03be42b workqueue: Don't hold any lock while rcuwait'ing for !POOL_MANAGER_ACTIVE
put_unbound_pool() currently passes wq_manager_inactive() as exit condition
to rcuwait_wait_event(), which grabs pool->lock to check for

  pool->flags & POOL_MANAGER_ACTIVE

A later patch will require destroy_worker() to be invoked with
wq_pool_attach_mutex held, which needs to be acquired before
pool->lock. A mutex cannot be acquired within rcuwait_wait_event(), as
it could clobber the task state set by rcuwait_wait_event().

Instead, restructure the waiting logic to acquire any necessary lock
outside of rcuwait_wait_event().
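
Restated as a sketch (hedged; close to the shape described above):

  /*
   * In put_unbound_pool(): claim the manager position without holding any
   * lock inside rcuwait_wait_event().
   */
  while (true) {
          rcuwait_wait_event(&manager_wait,
                             !(pool->flags & POOL_MANAGER_ACTIVE),
                             TASK_UNINTERRUPTIBLE);

          mutex_lock(&wq_pool_attach_mutex);
          raw_spin_lock_irq(&pool->lock);
          if (!(pool->flags & POOL_MANAGER_ACTIVE)) {
                  pool->flags |= POOL_MANAGER_ACTIVE;
                  break;
          }
          raw_spin_unlock_irq(&pool->lock);
          mutex_unlock(&wq_pool_attach_mutex);
  }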

Since further work cannot be inserted into unbound pwqs that have reached
->refcnt==0, this is bound to make forward progress as eventually the
worklist will be drained and need_more_worker(pool) will remain false,
preventing any worker from stealing the manager position from us.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-12 06:21:49 -10:00
Valentin Schneider
3f959aa3b3 workqueue: Convert the idle_timer to a timer + work_struct
A later patch will require a sleepable context in the idle worker timeout
function. Converting worker_pool.idle_timer to a delayed_work gives us just
that; however, this would imply turning all idle_timer expiries into
scheduler events (waking up a worker to handle the dwork).

Instead, implement a "custom dwork" where the timer callback does some
extra checks before queuing the associated work.
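
A sketch of the "custom dwork" (hedged; the real check for whether culling
is needed is more involved):

  static void idle_worker_timeout(struct timer_list *t)
  {
          struct worker_pool *pool = from_timer(pool, t, idle_timer);
          bool do_cull;

          raw_spin_lock_irq(&pool->lock);
          /*
           * Only turn the expiry into a scheduler event when there
           * actually is an idle worker to cull.
           */
          do_cull = too_many_workers(pool);
          raw_spin_unlock_irq(&pool->lock);

          if (do_cull)
                  queue_work(system_unbound_wq, &pool->idle_cull_work);
  }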

No change in functionality intended.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-12 06:21:49 -10:00
Valentin Schneider
793777bc19 workqueue: Factorize unbind/rebind_workers() logic
Later patches will reuse this code, so move it into reusable functions.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-12 06:21:49 -10:00