Commit Graph

687 Commits

Author SHA1 Message Date
Christoph Hellwig
fe557319aa maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault
Better describe what these functions do.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-17 10:57:41 -07:00
Lai Jiangshan
10cdb15759 workqueue: use BUILD_BUG_ON() for compile time test instead of WARN_ON()
Any runtime WARN_ON() has to be fixed, and BUILD_BUG_ON() can
help you nitice it earlier.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-06-01 11:02:42 -04:00
Lai Jiangshan
b8f06b0444 workqueue: remove useless unlock() and lock() in series
This is no point to unlock() and then lock() the same mutex
back to back.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-05-29 10:25:23 -04:00
Lai Jiangshan
4f3f4cf388 workqueue: void unneeded requeuing the pwq in rescuer thread
008847f66c ("workqueue: allow rescuer thread to do more work.") made
the rescuer worker requeue the pwq immediately if there may be more
work items which need rescuing instead of waiting for the next mayday
timer expiration.  Unfortunately, it checks only whether the pool needs
help from rescuers, but it doesn't check whether the pwq has work items
in the pool (the real reason that this rescuer can help for the pool).

The patch adds the check and void unneeded requeuing.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-05-29 10:22:10 -04:00
Sebastian Andrzej Siewior
a9b8a98529 workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t
The workqueue code has it's internal spinlocks (pool::lock), which
are acquired on most workqueue operations. These spinlocks are
converted to 'sleeping' spinlocks on a RT-kernel.

Workqueue functions can be invoked from contexts which are truly atomic
even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such
contexts is forbidden.

The pool::lock hold times are bound and the code sections are
relatively short, which allows to convert pool::lock and as a
consequence wq_mayday_lock to raw spinlocks which are truly spinning
locks even on a PREEMPT_RT kernel.

With the previous conversion of the manager waitqueue to a simple
waitqueue workqueues are now fully RT compliant.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-05-29 10:03:47 -04:00
Sebastian Andrzej Siewior
d8bb65ab70 workqueue: Use rcuwait for wq_manager_wait
The workqueue code has it's internal spinlock (pool::lock) and also
implicit spinlock usage in the wq_manager waitqueue. These spinlocks
are converted to 'sleeping' spinlocks on a RT-kernel.

Workqueue functions can be invoked from contexts which are truly atomic
even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such
contexts is forbidden.

pool::lock can be converted to a raw spinlock as the lock held times
are short. But the workqueue manager waitqueue is handled inside of
pool::lock held regions which again violates the lock nesting rules
of raw and regular spinlocks.

The manager waitqueue has no special requirements like custom wakeup
callbacks or mass wakeups. While it does not use exclusive wait mode
explicitly there is no strict requirement to queue the waiters in a
particular order as there is only one waiter at a time.

This allows to replace the waitqueue with rcuwait which solves the
locking problem because rcuwait relies on existing locking.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-05-29 10:00:35 -04:00
Zhang Qiang
342ed2400b workqueue: Remove unnecessary kfree() call in rcu_free_wq()
The data structure member "wq->rescuer" was reset to a null pointer
in one if branch. It was passed to a call of the function "kfree"
in the callback function "rcu_free_wq" (which was eventually executed).
The function "kfree" does not perform more meaningful data processing
for a passed null pointer (besides immediately returning from such a call).
Thus delete this function call which became unnecessary with the referenced
software update.

Fixes: def98c84b6 ("workqueue: Fix spurious sanity check failures in destroy_workqueue()")

Suggested-by: Markus Elfring <Markus.Elfring@web.de>
Signed-off-by: Zhang Qiang <qiang.zhang@windriver.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-05-27 09:52:41 -04:00
Dan Carpenter
b92b36eadf workqueue: Fix an use after free in init_rescuer()
We need to preserve error code before freeing "rescuer".

Fixes: f187b6974f ("workqueue: Use IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-05-11 10:25:42 -04:00
Sean Fu
f187b6974f workqueue: Use IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO.
Replace inline function PTR_ERR_OR_ZERO with IS_ERR and PTR_ERR to
remove redundant parameter definitions and checks.
Reduce code size.
Before:
   text	   data	    bss	    dec	    hex	filename
  47510	   5979	    840	  54329	   d439	kernel/workqueue.o
After:
   text	   data	    bss	    dec	    hex	filename
  47474	   5979	    840	  54293	   d415	kernel/workqueue.o

Signed-off-by: Sean Fu <fxinrong@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-05-05 11:56:07 -04:00
Sebastian Andrzej Siewior
62849a9612 workqueue: Remove the warning in wq_worker_sleeping()
The kernel test robot triggered a warning with the following race:
   task-ctx A                            interrupt-ctx B
 worker
  -> process_one_work()
    -> work_item()
      -> schedule();
         -> sched_submit_work()
           -> wq_worker_sleeping()
             -> ->sleeping = 1
               atomic_dec_and_test(nr_running)
         __schedule();                *interrupt*
                                       async_page_fault()
                                       -> local_irq_enable();
                                       -> schedule();
                                          -> sched_submit_work()
                                            -> wq_worker_sleeping()
                                               -> if (WARN_ON(->sleeping)) return
                                          -> __schedule()
                                            ->  sched_update_worker()
                                              -> wq_worker_running()
                                                 -> atomic_inc(nr_running);
                                                 -> ->sleeping = 0;

      ->  sched_update_worker()
        -> wq_worker_running()
          if (!->sleeping) return

In this context the warning is pointless everything is fine.
An interrupt before wq_worker_sleeping() will perform the ->sleeping
assignment (0 -> 1 > 0) twice.
An interrupt after wq_worker_sleeping() will trigger the warning and
nr_running will be decremented (by A) and incremented once (only by B, A
will skip it). This is the case until the ->sleeping is zeroed again in
wq_worker_running().

Remove the WARN statement because this condition may happen. Document
that preemption around wq_worker_sleeping() needs to be disabled to
protect ->sleeping and not just as an optimisation.

Fixes: 6d25be5782 ("sched/core, workqueues: Distangle worker accounting from rq lock")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian
2020-04-08 11:35:20 +02:00
Linus Torvalds
0adb8bc039 Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Nothing too interesting. Just two trivial patches"

* 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Mark up unlocked access to wq->first_flusher
  workqueue: Make workqueue_init*() return void
2020-04-03 12:27:36 -07:00
Chris Wilson
00d5d15b06 workqueue: Mark up unlocked access to wq->first_flusher
[ 7329.671518] BUG: KCSAN: data-race in flush_workqueue / flush_workqueue
[ 7329.671549]
[ 7329.671572] write to 0xffff8881f65fb250 of 8 bytes by task 37173 on cpu 2:
[ 7329.671607]  flush_workqueue+0x3bc/0x9b0 (kernel/workqueue.c:2844)
[ 7329.672527]
[ 7329.672540] read to 0xffff8881f65fb250 of 8 bytes by task 37175 on cpu 0:
[ 7329.672571]  flush_workqueue+0x28d/0x9b0 (kernel/workqueue.c:2835)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-12 14:26:50 -04:00
Hillf Danton
aa202f1f56 workqueue: don't use wq_select_unbound_cpu() for bound works
wq_select_unbound_cpu() is designed for unbound workqueues only, but
it's wrongly called when using a bound workqueue too.

Fixing this ensures work queued to a bound workqueue with
cpu=WORK_CPU_UNBOUND always runs on the local CPU.

Before, that would happen only if wq_unbound_cpumask happened to include
it (likely almost always the case), or was empty, or we got lucky with
forced round-robin placement.  So restricting
/sys/devices/virtual/workqueue/cpumask to a small subset of a machine's
CPUs would cause some bound work items to run unexpectedly there.

Fixes: ef55718044 ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Hillf Danton <hdanton@sina.com>
[dj: massage changelog]
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-10 10:30:51 -04:00
Yu Chen
2333e82995 workqueue: Make workqueue_init*() return void
The return values of workqueue_init() and workqueue_early_int() are
always 0, and there is no usage of their return value.  So just make
them return void.

Signed-off-by: Yu Chen <chen.yu@easystack.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-04 11:21:49 -05:00
Linus Torvalds
c677124e63 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "These were the main changes in this cycle:

   - More -rt motivated separation of CONFIG_PREEMPT and
     CONFIG_PREEMPTION.

   - Add more low level scheduling topology sanity checks and warnings
     to filter out nonsensical topologies that break scheduling.

   - Extend uclamp constraints to influence wakeup CPU placement

   - Make the RT scheduler more aware of asymmetric topologies and CPU
     capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y

   - Make idle CPU selection more consistent

   - Various fixes, smaller cleanups, updates and enhancements - please
     see the git log for details"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  sched/fair: Define sched_idle_cpu() only for SMP configurations
  sched/topology: Assert non-NUMA topology masks don't (partially) overlap
  idle: fix spelling mistake "iterrupts" -> "interrupts"
  sched/fair: Remove redundant call to cpufreq_update_util()
  sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
  sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
  sched/fair: calculate delta runnable load only when it's needed
  sched/cputime: move rq parameter in irqtime_account_process_tick
  stop_machine: Make stop_cpus() static
  sched/debug: Reset watchdog on all CPUs while processing sysrq-t
  sched/core: Fix size of rq::uclamp initialization
  sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
  sched/fair: Load balance aggressively for SCHED_IDLE CPUs
  sched/fair : Improve update_sd_pick_busiest for spare capacity case
  watchdog: Remove soft_lockup_hrtimer_cnt and related code
  sched/rt: Make RT capacity-aware
  sched/fair: Make EAS wakeup placement consider uclamp restrictions
  sched/fair: Make task_fits_capacity() consider uclamp restrictions
  sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
  sched/uclamp: Make uclamp util helpers use and return UL values
  ...
2020-01-28 10:07:09 -08:00
Daniel Jordan
1c5da0ec7f workqueue: add worker function to workqueue_execute_end tracepoint
It's surprising that workqueue_execute_end includes only the work when
its counterpart workqueue_execute_start has both the work and the worker
function.

You can't set a tracing filter or trigger based on the function, and
postprocessing scripts interested in specific functions are harder to
write since they have to remember the work from _start and match it up
with the same field in _end.

Add the function name, taking care to use the copy stashed in the
worker since the work is no longer safe to touch.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-01-15 08:02:47 -08:00
Ingo Molnar
1e5f8a3085 Merge tag 'v5.5-rc3' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:41:37 +01:00
Sebastian Andrzej Siewior
025f50f386 sched/rt, workqueue: Use PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Update the comment to use PREEMPTION because it is true for both
preemption models.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20191015191821.11479-35-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-08 14:37:37 +01:00
Kefeng Wang
1d9a6159bd workqueue: Use pr_warn instead of pr_warning
Use pr_warn() instead of the remaining pr_warning() calls.

Link: http://lkml.kernel.org/r/20191128004752.35268-2-wangkefeng.wang@huawei.com
To: joe@perches.com
To: linux-kernel@vger.kernel.org
Cc: gregkh@linuxfoundation.org
Cc: tj@kernel.org
Cc: arnd@arndb.de
Cc: sergey.senozhatsky@gmail.com
Cc: rostedt@goodmis.org
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-12-06 09:59:30 +01:00
Linus Torvalds
1ae78780ed Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Dynamic tick (nohz) updates, perhaps most notably changes to force
     the tick on when needed due to lengthy in-kernel execution on CPUs
     on which RCU is waiting.

   - Linux-kernel memory consistency model updates.

   - Replace rcu_swap_protected() with rcu_prepace_pointer().

   - Torture-test updates.

   - Documentation updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/core: Replace rcu_swap_protected() with rcu_replace_pointer()
  bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer()
  fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()
  drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer()
  drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()
  x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer()
  rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
  rcu: Suppress levelspread uninitialized messages
  rcu: Fix uninitialized variable in nocb_gp_wait()
  rcu: Update descriptions for rcu_future_grace_period tracepoint
  rcu: Update descriptions for rcu_nocb_wake tracepoint
  rcu: Remove obsolete descriptions for rcu_barrier tracepoint
  rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
  workqueue: Convert for_each_wq to use built-in list check
  rcu: Several rcu_segcblist functions can be static
  rcu: Remove unused function hlist_bl_del_init_rcu()
  Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch()
  ...
2019-11-26 15:42:43 -08:00
Sebastian Andrzej Siewior
49e9d1a9fa workqueue: Add RCU annotation for pwq list walk
An additional check has been recently added to ensure that a RCU related lock
is held while the RCU list is iterated.
The `pwqs' are sometimes iterated without a RCU lock but with the &wq->mutex
acquired leading to a warning.

Teach list_for_each_entry_rcu() that the RCU usage is okay if &wq->mutex
is acquired during the list traversal.

Fixes: 28875945ba ("rcu: Add support for consolidated-RCU reader checking")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2019-11-15 11:53:35 -08:00
Joel Fernandes (Google)
5a6446626d workqueue: Convert for_each_wq to use built-in list check
Because list_for_each_entry_rcu() can now check for holding a
lock as well as for being in an RCU read-side critical section,
this commit replaces the workqueue_sysfs_unregister() function's
use of assert_rcu_or_wq_mutex() and list_for_each_entry_rcu() with
list_for_each_entry_rcu() augmented with a lockdep_is_held() optional
argument.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-30 08:34:10 -07:00
Tejun Heo
e66b39af00 workqueue: Fix pwq ref leak in rescuer_thread()
008847f66c ("workqueue: allow rescuer thread to do more work.") made
the rescuer worker requeue the pwq immediately if there may be more
work items which need rescuing instead of waiting for the next mayday
timer expiration.  Unfortunately, it doesn't check whether the pwq is
already on the mayday list and unconditionally gets the ref and moves
it onto the list.  This doesn't corrupt the list but creates an
additional reference to the pwq.  It got queued twice but will only be
removed once.

This leak later can trigger pwq refcnt warning on workqueue
destruction and prevent freeing of the workqueue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Williams, Gerald S" <gerald.s.williams@intel.com>
Cc: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org # v3.19+
2019-10-04 10:23:11 -07:00
Tejun Heo
c29eb85386 workqueue: more destroy_workqueue() fixes
destroy_workqueue() warnings still, at a lower frequency, trigger
spuriously.  The problem seems to be in-flight operations which
haven't reached put_pwq() yet.

* Make sanity check grab all the related locks so that it's
  synchronized against operations which puts pwq at the end.

* Always print out the offending pwq.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Williams, Gerald S" <gerald.s.williams@intel.com>
2019-10-04 10:23:01 -07:00
Tejun Heo
30ae2fc0a7 workqueue: Minor follow-ups to the rescuer destruction change
* Now that wq->rescuer may be cleared while rescuer is still there,
  switch show_pwq() debug printout to test worker->rescue_wq to
  identify rescuers intead of testing wq->rescuer.

* Update comment on ->rescuer locking.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
2019-09-20 14:09:14 -07:00