To implement non-reentrant workqueue, the last gcwq a work was
executed on must be reliably obtainable as long as the work structure
is valid even if the previous workqueue has been destroyed.
To achieve this, work->data will be overloaded to carry the last cpu
number once execution starts so that the previous gcwq can be located
reliably. This means that cwq can't be obtained from work after
execution starts but only gcwq.
Implement set_work_{cwq|cpu}(), get_work_[g]cwq() and
clear_work_data() to set work data to the cpu number when starting
execution, access the overloaded work data and clear it after
cancellation.
queue_delayed_work_on() is updated to preserve the last cpu while
in-flight in timer and other callers which depended on getting cwq
from work after execution starts are converted to depend on gcwq
instead.
* Anton Blanchard fixed compile error on powerpc due to missing
linux/threads.h include.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Anton Blanchard <anton@samba.org>
Reimplement st (single thread) workqueue so that it's friendly to
shared worker pool. It was originally implemented by confining st
workqueues to use cwq of a fixed cpu and always having a worker for
the cpu. This implementation isn't very friendly to shared worker
pool and suboptimal in that it ends up crossing cpu boundaries often.
Reimplement st workqueue using dynamic single cpu binding and
cwq->limit. WQ_SINGLE_THREAD is replaced with WQ_SINGLE_CPU. In a
single cpu workqueue, at most single cwq is bound to the wq at any
given time. Arbitration is done using atomic accesses to
wq->single_cpu when queueing a work. Once bound, the binding stays
till the workqueue is drained.
Note that the binding is never broken while a workqueue is frozen.
This is because idle cwqs may have works waiting in delayed_works
queue while frozen. On thaw, the cwq is restarted if there are any
delayed works or unbound otherwise.
When combined with max_active limit of 1, single cpu workqueue has
exactly the same execution properties as the original single thread
workqueue while allowing sharing of per-cpu workers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently, workqueue freezing is implemented by marking the worker
freezeable and calling try_to_freeze() from dispatch loop.
Reimplement it using cwq->limit so that the workqueue is frozen
instead of the worker.
* workqueue_struct->saved_max_active is added which stores the
specified max_active on initialization.
* On freeze, all cwq->max_active's are quenched to zero. Freezing is
complete when nr_active on all cwqs reach zero.
* On thaw, all cwq->max_active's are restored to wq->saved_max_active
and the worklist is repopulated.
This new implementation allows having single shared pool of workers
per cpu.
Signed-off-by: Tejun Heo <tj@kernel.org>
Add cwq->nr_active, cwq->max_active and cwq->delayed_work. nr_active
counts the number of active works per cwq. A work is active if it's
flushable (colored) and is on cwq's worklist. If nr_active reaches
max_active, new works are queued on cwq->delayed_work and activated
later as works on the cwq complete and decrement nr_active.
cwq->max_active can be specified via the new @max_active parameter to
__create_workqueue() and is set to 1 for all workqueues for now. As
each cwq has only single worker now, this double queueing doesn't
cause any behavior difference visible to its users.
This will be used to reimplement freeze/thaw and implement shared
worker pool.
Signed-off-by: Tejun Heo <tj@kernel.org>
A work is linked to the next one by having WORK_STRUCT_LINKED bit set
and these links can be chained. When a linked work is dispatched to a
worker, all linked works are dispatched to the worker's newly added
->scheduled queue and processed back-to-back.
Currently, as there's only single worker per cwq, having linked works
doesn't make any visible behavior difference. This change is to
prepare for multiple shared workers per cpu.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reimplement workqueue flushing using color coded works. wq has the
current work color which is painted on the works being issued via
cwqs. Flushing a workqueue is achieved by advancing the current work
colors of cwqs and waiting for all the works which have any of the
previous colors to drain.
Currently there are 16 possible colors, one is reserved for no color
and 15 colors are useable allowing 14 concurrent flushes. When color
space gets full, flush attempts are batched up and processed together
when color frees up, so even with many concurrent flushers, the new
implementation won't build up huge queue of flushers which has to be
processed one after another.
Only works which are queued via __queue_work() are colored. Works
which are directly put on queue using insert_work() use NO_COLOR and
don't participate in workqueue flushing. Currently only works used
for work-specific flush fall in this category.
This new implementation leaves only cleanup_workqueue_thread() as the
user of flush_cpu_workqueue(). Just make its users use
flush_workqueue() and kthread_stop() directly and kill
cleanup_workqueue_thread(). As workqueue flushing doesn't use barrier
request anymore, the comment describing the complex synchronization
around it in cleanup_workqueue_thread() is removed together with the
function.
This new implementation is to allow having and sharing multiple
workers per cpu.
Please note that one more bit is reserved for a future work flag by
this patch. This is to avoid shifting bits and updating comments
later.
Signed-off-by: Tejun Heo <tj@kernel.org>
work->data field is used for two purposes. It points to cwq it's
queued on and the lower bits are used for flags. Currently, two bits
are reserved which is always safe as 4 byte alignment is guaranteed on
every architecture. However, future changes will need more flag bits.
On SMP, the percpu allocator is capable of honoring larger alignment
(there are other users which depend on it) and larger alignment works
just fine. On UP, percpu allocator is a thin wrapper around
kzalloc/kfree() and don't honor alignment request.
This patch introduces WORK_STRUCT_FLAG_BITS and implements
alloc/free_cwqs() which guarantees max(1 << WORK_STRUCT_FLAG_BITS,
__alignof__(unsigned long long) alignment both on SMP and UP. On SMP,
simply wrapping percpu allocator is enough. On UP, extra space is
allocated so that cwq can be aligned and the original pointer can be
stored after it which is used in the free path.
* Alignment problem on UP is reported by Michal Simek.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: Michal Simek <michal.simek@petalogix.com>
Work flags are about to see more traditional mask handling. Define
WORK_STRUCT_*_BIT as the bit position constant and redefine
WORK_STRUCT_* as bit masks. Also, make WORK_STRUCT_STATIC_* flags
conditional
While at it, re-define these constants as enums and use
WORK_STRUCT_STATIC instead of hard-coding 2 in
WORK_DATA_STATIC_INIT().
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently, __create_workqueue_key() takes @singlethread and
@freezeable paramters and store them separately in workqueue_struct.
Merge them into a single flags parameter and field and use
WQ_FREEZEABLE and WQ_SINGLE_THREAD.
Signed-off-by: Tejun Heo <tj@kernel.org>
Make the following updates in preparation of concurrency managed
workqueue. None of these changes causes any visible behavior
difference.
* Add comments and adjust indentations to data structures and several
functions.
* Rename wq_per_cpu() to get_cwq() and swap the position of two
parameters for consistency. Convert a direct per_cpu_ptr() access
to wq->cpu_wq to get_cwq().
* Add work_static() and Update set_wq_data() such that it sets the
flags part to WORK_STRUCT_PENDING | WORK_STRUCT_STATIC if static |
@extra_flags.
* Move santiy check on work->entry emptiness from queue_work_on() to
__queue_work() which all queueing paths share.
* Make __queue_work() take @cpu and @wq instead of @cwq.
* Restructure flush_work() and __create_workqueue_key() to make them
easier to modify.
Signed-off-by: Tejun Heo <tj@kernel.org>
With stop_machine() converted to use cpu_stop, RT workqueue doesn't
have any user left. Kill RT workqueue support.
Signed-off-by: Tejun Heo <tj@kernel.org>
Some recent uses of RCU make use of workqueues. In these uses, execution
within the context of a specific workqueue takes the place of the usual
RCU read-side primitives such as rcu_read_lock(), and flushing of workqueues
takes the place of the usual RCU grace-period primitives. Checking for
correct use of rcu_dereference() in such cases requires a test of whether
the code is executing in the context of a particular workqueue. This
commit adds an in_workqueue_context() function that provides this test.
This new function is only defined when lockdep is enabled, which allows
it to be used as the second argument of rcu_dereference_check().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Add debugobject support to track the life time of work_structs.
While at it, remove duplicate definition of
INIT_DELAYED_WORK_ON_STACK().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
cancel_delayed_work() has to use del_timer_sync() to guarantee the timer
function is not running after return. But most users doesn't actually
need this, and del_timer_sync() has problems: it is not useable from
interrupt, and it depends on every lock which could be taken from irq.
Introduce __cancel_delayed_work() which calls del_timer() instead.
The immediate reason for this patch is
http://bugzilla.kernel.org/show_bug.cgi?id=13757
but hopefully this helper makes sense anyway.
As for 13757 bug, actually we need requeue_delayed_work(), but its
semantics are not yet clear.
Merge this patch early to resolves cross-tree interdependencies between
input and infiniband.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is a fairly common operation to have a pointer to a work and to need a
pointer to the delayed work it is contained in. In particular, all
delayed works which want to rearm themselves will have to do that. So it
would seem fair to offer a helper function for this operation.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
debugobjects: add and use INIT_WORK_ON_STACK
rcu: remove duplicate CONFIG_RCU_CPU_STALL_DETECTOR
relay: fix lock imbalance in relay_late_setup_files
oprofile: fix uninitialized use of struct op_entry
rcu: move Kconfig menu
softlock: fix false panic which can occur if softlockup_thresh is reduced
rcu: add __cpuinit to rcu_init_percpu_data()
Impact: Fix debugobjects warning
debugobject enabled kernels spit out a warning in hpet code due to a
workqueue which is initialized on stack.
Add INIT_WORK_ON_STACK() which calls init_timer_on_stack() and use it
in hpet.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Alexander Beregalov reported that this warning is caused by the HPET code:
> hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> hpet0: 3 comparators, 64-bit 14.318180 MHz counter
> ODEBUG: object is on stack, but not annotated
> ------------[ cut here ]------------
> WARNING: at lib/debugobjects.c:251 __debug_object_init+0x2a4/0x352()
> Bisected down to 26afe5f2fb
> (x86: HPET_MSI Initialise per-cpu HPET timers)
The commit is fine - but the on-stack workqueue entry needs annotation.
Reported-and-bisected-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: introduce new APIs
We want to deprecate cpumasks on the stack, as we are headed for
gynormous numbers of CPUs. Eventually, we want to head towards an
undefined 'struct cpumask' so they can never be declared on stack.
1) New cpumask functions which take pointers instead of copies.
(cpus_* -> cpumask_*)
2) Several new helpers to reduce requirements for temporary cpumasks
(cpumask_first_and, cpumask_next_and, cpumask_any_and)
3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
(cpumask_var_t, alloc_cpumask_var and free_cpumask_var)
4) 'struct cpumask' for explicitness and to mark new-style code.
5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
not NR_CPUS for time efficiency and for smaller dynamic allocations
in future.
6) cpumask_copy() so we can allocate less than a full cpumask eventually
(for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
definition eventually.
7) work_on_cpu() helper for doing task on a CPU, rather than saving old
cpumask for current thread and manipulating it.
8) smp_call_function_many() which is smp_call_function_mask() except
taking a cpumask pointer.
Note that this patch simply introduces the new functions and leaves
the obsolescent ones in place. This is to simplify the transition
patches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
create_rt_workqueue will create a real time prioritized workqueue.
This is needed for the conversion of stop_machine to a workqueue based
implementation.
This patch adds yet another parameter to __create_workqueue_key to tell
it that we want an rt workqueue.
However it looks like we rather should have something like "int type"
instead of singlethread, freezable and rt.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Most of users of flush_workqueue() can be changed to use cancel_work_sync(),
but sometimes we really need to wait for the completion and cancelling is not
an option. schedule_on_each_cpu() is good example.
Add the new helper, flush_work(work), which waits for the completion of the
specific work_struct. More precisely, it "flushes" the result of of the last
queue_work() which is visible to the caller.
For example, this code
queue_work(wq, work);
/* WINDOW */
queue_work(wq, work);
flush_work(work);
doesn't necessary work "as expected". What can happen in the WINDOW above is
- wq starts the execution of work->func()
- the caller migrates to another CPU
now, after the 2nd queue_work() this work is active on the previous CPU, and
at the same time it is queued on another. In this case flush_work(work) may
return before the first work->func() completes.
It is trivial to add another helper
int flush_work_sync(struct work_struct *work)
{
return flush_work(work) || wait_on_work(work);
}
which works "more correctly", but it has to iterate over all CPUs and thus
it much slower than flush_work().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This interface allows adding a job on a specific cpu.
Although a work struct on a cpu will be scheduled to other cpu if the cpu
dies, there is a recursion if a work task tries to offline the cpu it's
running on. we need to schedule the task to a specific cpu in this case.
http://bugzilla.kernel.org/show_bug.cgi?id=10897
[oleg@tv-sign.ru: cleanups]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Rus <harbour@sfinx.od.ua>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FASTCALL() is always expanded to empty, remove it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>