Implement worker states. Once created, a worker is STARTED. While a
worker isn't processing a work, it's IDLE and chained on
gcwq->idle_list. While processing a work, a worker is BUSY and
chained on gcwq->busy_hash. Also, gcwq now counts both the total
number of workers and the number of idle ones.
worker_thread() is restructured to reflect state transitions.
cwq->more_work is removed and waking up a worker makes it check for
events. A worker is killed by setting the DIE flag while it's IDLE and
waking it up.
This gives gcwq better visibility of what's going on and allows it to
quickly find out whether a work is executing, which is necessary to
have multiple workers processing the same cwq.
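The state rules above can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions: struct gcwq_model,
struct worker and the WORKER_* values are hypothetical stand-ins, there
is no locking, and idle_list/busy_hash membership is reduced to flag
bits plus the two counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values modelled on the states described above. */
enum {
	WORKER_STARTED	= 1 << 0,	/* set once the worker is created */
	WORKER_BUSY	= 1 << 1,	/* processing a work (on busy_hash) */
	WORKER_IDLE	= 1 << 2,	/* waiting for work (on idle_list) */
	WORKER_DIE	= 1 << 3,	/* told to exit; only set while IDLE */
};

struct gcwq_model {
	int nr_workers;		/* all workers */
	int nr_idle;		/* idle workers */
};

struct worker {
	unsigned int flags;
	struct gcwq_model *gcwq;
};

static void worker_start(struct worker *w, struct gcwq_model *g)
{
	w->gcwq = g;
	w->flags = WORKER_STARTED;
	g->nr_workers++;
}

static void worker_enter_idle(struct worker *w)
{
	assert(!(w->flags & WORKER_IDLE));
	w->flags = (w->flags & ~WORKER_BUSY) | WORKER_IDLE;
	w->gcwq->nr_idle++;
}

static void worker_leave_idle(struct worker *w)
{
	assert(w->flags & WORKER_IDLE);
	w->flags &= ~WORKER_IDLE;
	w->gcwq->nr_idle--;
}

static void worker_begin_work(struct worker *w)
{
	if (w->flags & WORKER_IDLE)
		worker_leave_idle(w);
	w->flags |= WORKER_BUSY;
}

/* Killing is only allowed while IDLE; the real code then wakes the worker. */
static bool worker_kill(struct worker *w)
{
	if (!(w->flags & WORKER_IDLE))
		return false;
	w->flags |= WORKER_DIE;
	return true;
}

/* What a woken worker does once it notices DIE. */
static void worker_exit(struct worker *w)
{
	assert(w->flags & WORKER_DIE);
	worker_leave_idle(w);
	w->gcwq->nr_workers--;
	w->flags = 0;
}
```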
Signed-off-by: Tejun Heo <tj@kernel.org>
There is one gcwq (global cwq) per cpu and all cwqs on a cpu
point to it. A gcwq contains a lock to be used by all cwqs on the cpu
and an ida to give IDs to workers belonging to the cpu.
This patch introduces gcwq, moves worker_ida into gcwq and makes all
cwqs on the same cpu use the cpu's gcwq->lock instead of separate
locks. gcwq->ida is now protected by gcwq->lock too.
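A userspace sketch of the sharing relationship, assuming a fixed
NR_CPUS_MODEL and using a pthread mutex and a 64-bit bitmap as
stand-ins for the kernel spinlock and ida. All names here are
hypothetical; the point is only that cwqs on one cpu reach the same
lock and the same worker ID space through their gcwq pointer.

```c
#include <assert.h>
#include <pthread.h>

#define NR_CPUS_MODEL		4	/* hypothetical cpu count */
#define MAX_WORKERS_MODEL	64	/* hypothetical ID space */

/* One per cpu: the shared lock plus a tiny stand-in for the worker ida. */
struct gcwq_model {
	pthread_mutex_t lock;
	unsigned long long id_used;	/* bit n set => worker ID n taken */
};

/* Every cwq on a cpu points at that cpu's gcwq instead of owning a lock. */
struct cwq_model {
	struct gcwq_model *gcwq;
};

static struct gcwq_model gcwqs[NR_CPUS_MODEL];

static void init_gcwqs(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		pthread_mutex_init(&gcwqs[cpu].lock, NULL);
		gcwqs[cpu].id_used = 0;
	}
}

static void init_cwq(struct cwq_model *cwq, int cpu)
{
	cwq->gcwq = &gcwqs[cpu];
}

/* Allocate the lowest free worker ID; the ID space is guarded by gcwq->lock. */
static int alloc_worker_id(struct cwq_model *cwq)
{
	struct gcwq_model *gcwq = cwq->gcwq;
	int id = -1;

	pthread_mutex_lock(&gcwq->lock);
	for (int i = 0; i < MAX_WORKERS_MODEL; i++) {
		if (!(gcwq->id_used & (1ULL << i))) {
			gcwq->id_used |= 1ULL << i;
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&gcwq->lock);
	return id;
}
```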
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently, workqueue freezing is implemented by marking the worker
freezeable and calling try_to_freeze() from the dispatch loop.
Reimplement it using cwq->max_active so that the workqueue is frozen
instead of the worker.
* workqueue_struct->saved_max_active is added which stores the
specified max_active on initialization.
* On freeze, all cwq->max_active's are quenched to zero. Freezing is
complete when nr_active on all cwqs reaches zero.
* On thaw, all cwq->max_active's are restored to wq->saved_max_active
and the worklist is repopulated.
This new implementation allows having a single shared pool of workers
per cpu.
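The freeze/thaw scheme can be sketched with plain counters. This is a
simplified model, not the patch's code: struct wq_model and struct
cwq_model are hypothetical, works are counted rather than listed, and
nothing is locked.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CWQS 2	/* hypothetical: one cwq per cpu on a 2-cpu box */

struct cwq_model {
	int nr_active;		/* works on the worklist */
	int max_active;		/* current limit; 0 while frozen */
	int nr_delayed;		/* works parked on the delayed list */
};

struct wq_model {
	int saved_max_active;	/* limit given at creation time */
	struct cwq_model cwq[NR_CWQS];
};

static void wq_init(struct wq_model *wq, int max_active)
{
	wq->saved_max_active = max_active;
	for (int i = 0; i < NR_CWQS; i++)
		wq->cwq[i] = (struct cwq_model){ 0, max_active, 0 };
}

/* Queue a work: active if under the limit, otherwise delayed. */
static void queue_one(struct cwq_model *cwq)
{
	if (cwq->nr_active < cwq->max_active)
		cwq->nr_active++;
	else
		cwq->nr_delayed++;
}

/* Activate delayed works while there is room under max_active. */
static void activate_delayed(struct cwq_model *cwq)
{
	while (cwq->nr_delayed && cwq->nr_active < cwq->max_active) {
		cwq->nr_delayed--;
		cwq->nr_active++;
	}
}

static void work_done(struct cwq_model *cwq)
{
	cwq->nr_active--;
	activate_delayed(cwq);
}

static void wq_freeze(struct wq_model *wq)
{
	for (int i = 0; i < NR_CWQS; i++)
		wq->cwq[i].max_active = 0;	/* nothing new becomes active */
}

/* Freezing is complete once every cwq has drained its active works. */
static bool wq_frozen(const struct wq_model *wq)
{
	for (int i = 0; i < NR_CWQS; i++)
		if (wq->cwq[i].nr_active)
			return false;
	return true;
}

static void wq_thaw(struct wq_model *wq)
{
	for (int i = 0; i < NR_CWQS; i++) {
		wq->cwq[i].max_active = wq->saved_max_active;
		activate_delayed(&wq->cwq[i]);	/* repopulate the worklist */
	}
}
```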
Signed-off-by: Tejun Heo <tj@kernel.org>
Add cwq->nr_active, cwq->max_active and cwq->delayed_work. nr_active
counts the number of active works per cwq. A work is active if it's
flushable (colored) and is on cwq's worklist. If nr_active reaches
max_active, new works are queued on cwq->delayed_work and activated
later as works on the cwq complete and decrement nr_active.
cwq->max_active can be specified via the new @max_active parameter to
__create_workqueue() and is set to 1 for all workqueues for now. As
each cwq has only a single worker now, this double queueing doesn't
cause any behavior difference visible to its users.
This will be used to reimplement freeze/thaw and implement shared
worker pool.
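A minimal sketch of the nr_active/max_active throttling, assuming every
work is colored (so all of them count toward nr_active) and using a
fixed-size array FIFO in place of the kernel's list_head lists. struct
cwq_model, cwq_queue() and cwq_work_done() are hypothetical names.

```c
#include <assert.h>

#define QCAP 16		/* hypothetical fixed capacity for the sketch */

/* A tiny FIFO of work IDs standing in for a list_head list. */
struct fifo {
	int item[QCAP];
	int head, tail;		/* head == tail => empty */
};

static void fifo_push(struct fifo *q, int id)
{
	assert(q->tail < QCAP);
	q->item[q->tail++] = id;
}

static int fifo_pop(struct fifo *q)
{
	assert(q->head < q->tail);
	return q->item[q->head++];
}

static int fifo_empty(const struct fifo *q)
{
	return q->head == q->tail;
}

struct cwq_model {
	int nr_active, max_active;
	struct fifo worklist;		/* active works */
	struct fifo delayed_works;	/* waiting for nr_active to drop */
};

static void cwq_queue(struct cwq_model *cwq, int id)
{
	if (cwq->nr_active < cwq->max_active) {
		cwq->nr_active++;
		fifo_push(&cwq->worklist, id);
	} else {
		fifo_push(&cwq->delayed_works, id);
	}
}

/* A work finished: drop nr_active, then promote the oldest delayed work. */
static void cwq_work_done(struct cwq_model *cwq)
{
	cwq->nr_active--;
	if (!fifo_empty(&cwq->delayed_works) &&
	    cwq->nr_active < cwq->max_active) {
		cwq->nr_active++;
		fifo_push(&cwq->worklist, fifo_pop(&cwq->delayed_works));
	}
}
```

With max_active set to 1, works 10, 11 and 12 reach the worklist strictly
one at a time, in queueing order.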
Signed-off-by: Tejun Heo <tj@kernel.org>
A work is linked to the next one by having WORK_STRUCT_LINKED bit set
and these links can be chained. When a linked work is dispatched to a
worker, all linked works are dispatched to the worker's newly added
->scheduled queue and processed back-to-back.
Currently, as there's only a single worker per cwq, having linked works
doesn't make any visible behavior difference. This change is to
prepare for multiple shared workers per cpu.
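The dispatch rule for linked works can be sketched as below. The
array-backed list, WORK_LINKED and move_linked() are hypothetical
stand-ins for the kernel's list_head queues and WORK_STRUCT_LINKED; as
described above, the flag on a work means the work after it is chained
to it.

```c
#include <assert.h>
#include <stdbool.h>

#define WORK_LINKED	1u	/* stand-in for WORK_STRUCT_LINKED */
#define MAXW		16	/* hypothetical list capacity */

struct work_model {
	int id;
	unsigned int flags;	/* WORK_LINKED => next work is chained to this one */
};

struct list_model {
	struct work_model w[MAXW];
	int n;
};

static void push(struct list_model *l, struct work_model w)
{
	assert(l->n < MAXW);
	l->w[l->n++] = w;
}

/*
 * Remove the head of @worklist and every work chained after it,
 * appending them in order to @scheduled.  Returns how many moved.
 */
static int move_linked(struct list_model *worklist,
		       struct list_model *scheduled)
{
	int i = 0;
	bool more;

	if (worklist->n == 0)
		return 0;
	do {
		more = worklist->w[i].flags & WORK_LINKED;
		push(scheduled, worklist->w[i]);
		i++;
	} while (more && i < worklist->n);

	/* close the gap left at the front of the worklist */
	for (int j = i; j < worklist->n; j++)
		worklist->w[j - i] = worklist->w[j];
	worklist->n -= i;
	return i;
}
```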
Signed-off-by: Tejun Heo <tj@kernel.org>
Separate out worker thread related information from struct
cpu_workqueue_struct into struct worker and implement helper functions
to deal with the new struct worker. The only change visible outside is
that workqueue workers are now all named "kworker/CPUID:WORKERID",
where WORKERID is allocated from a per-cpu ida.
This is in preparation for concurrency managed workqueue, where
multiple shared workers would be available per cpu.
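For illustration, a worker's name under this scheme can be built with
snprintf(). worker_name() is a hypothetical helper; TASK_COMM_LEN (16 on
Linux) only shows that short names like these fit in a task's comm
field. Because IDs come from a per-cpu ida, "kworker/0:0" and
"kworker/1:0" can exist at the same time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16	/* size of a task's comm[] on Linux */

/* Build "kworker/CPUID:WORKERID"; asserts that nothing was truncated. */
static void worker_name(char *buf, size_t len, int cpu, int id)
{
	int n = snprintf(buf, len, "kworker/%d:%d", cpu, id);

	assert(n > 0 && (size_t)n < len);
}
```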
Signed-off-by: Tejun Heo <tj@kernel.org>
Reimplement workqueue flushing using color coded works. wq has the
current work color which is painted on the works being issued via
cwqs. Flushing a workqueue is achieved by advancing the current work
colors of cwqs and waiting for all the works which have any of the
previous colors to drain.
Currently there are 16 possible colors. One is reserved for no color
and the remaining 15 are usable, allowing 14 concurrent flushes (one
color is always the current work color). When the color space gets
full, flush attempts are batched up and processed together when a
color frees up, so even with many concurrent flushers, the new
implementation won't build up a huge queue of flushers which have to
be processed one after another.
Only works which are queued via __queue_work() are colored. Works
which are directly put on queue using insert_work() use NO_COLOR and
don't participate in workqueue flushing. Currently only works used
for work-specific flush fall in this category.
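The color bookkeeping can be sketched as follows, assuming 4 color bits
as described above. This is a simplified single-threaded model:
struct wq_colors and its helpers are hypothetical, flushers are not
actually put to sleep or batched, and start_flush() just reports -1
where the real code would have to wait for a color to free up.

```c
#include <assert.h>

#define COLOR_BITS	4
#define NR_COLORS	((1 << COLOR_BITS) - 1)	/* 15 usable colors */
#define NO_COLOR	NR_COLORS		/* 16th value: uncolored works */

struct wq_colors {
	int work_color;			/* painted on newly queued works */
	int flush_color;		/* oldest color still being flushed */
	int nr_in_flight[NR_COLORS];	/* queued, unfinished works per color */
};

static int next_color(int color)
{
	return (color + 1) % NR_COLORS;
}

/* Retire drained colors, oldest first; each retired color completes a flush. */
static void retire_drained(struct wq_colors *wq)
{
	while (wq->flush_color != wq->work_color &&
	       wq->nr_in_flight[wq->flush_color] == 0)
		wq->flush_color = next_color(wq->flush_color);
}

/* Paint a newly queued work with the current color. */
static int queue_colored(struct wq_colors *wq)
{
	wq->nr_in_flight[wq->work_color]++;
	return wq->work_color;
}

static void work_done(struct wq_colors *wq, int color)
{
	assert(color != NO_COLOR);	/* uncolored works are never counted */
	wq->nr_in_flight[color]--;
	retire_drained(wq);
}

/*
 * Begin a flush by advancing work_color.  Returns the color the flusher
 * waits on, or -1 when the color space is full.
 */
static int start_flush(struct wq_colors *wq)
{
	int color = wq->work_color;

	if (next_color(color) == wq->flush_color)
		return -1;
	wq->work_color = next_color(color);
	retire_drained(wq);
	return color;
}

/* Flushes still waiting for their color to drain. */
static int flushes_in_flight(const struct wq_colors *wq)
{
	return (wq->work_color - wq->flush_color + NR_COLORS) % NR_COLORS;
}
```

Starting from an empty wq with one queued work, fourteen start_flush()
calls succeed and the fifteenth returns -1, matching the 14 concurrent
flushes mentioned above; completing that one work retires every
pending color at once.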
This new implementation leaves only cleanup_workqueue_thread() as the
user of flush_cpu_workqueue(). Just make its users use
flush_workqueue() and kthread_stop() directly and kill
cleanup_workqueue_thread(). As workqueue flushing doesn't use barrier
request anymore, the comment describing the complex synchronization
around it in cleanup_workqueue_thread() is removed together with the
function.
This new implementation is to allow having and sharing multiple
workers per cpu.
Please note that one more bit is reserved for a future work flag by
this patch. This is to avoid shifting bits and updating comments
later.
Signed-off-by: Tejun Heo <tj@kernel.org>
work->data field is used for two purposes. It points to the cwq it's
queued on and the lower bits are used for flags. Currently, two bits
are reserved, which is always safe as 4 byte alignment is guaranteed on
every architecture. However, future changes will need more flag bits.
On SMP, the percpu allocator is capable of honoring larger alignment
(there are other users which depend on it) and larger alignment works
just fine. On UP, the percpu allocator is a thin wrapper around
kzalloc/kfree() and doesn't honor alignment requests.
This patch introduces WORK_STRUCT_FLAG_BITS and implements
alloc/free_cwqs(), which guarantee max(1 << WORK_STRUCT_FLAG_BITS,
__alignof__(unsigned long long)) alignment on both SMP and UP. On SMP,
simply wrapping the percpu allocator is enough. On UP, extra space is
allocated so that cwq can be aligned and the original pointer can be
stored after it, which is used in the free path.
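The UP path can be sketched in userspace with calloc() standing in for
kzalloc(). The value of WORK_STRUCT_FLAG_BITS below is an assumption,
and alloc_cwq_up()/free_cwq_up() are hypothetical names; the technique
is the one described above: over-allocate, round the pointer up, and
stash the original pointer just past the cwq so the free path can find
it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WORK_STRUCT_FLAG_BITS	8	/* assumed value for the sketch */
#define MAX_SZ(a, b)		((a) > (b) ? (a) : (b))
#define CWQ_ALIGN	MAX_SZ((size_t)1 << WORK_STRUCT_FLAG_BITS, \
			       _Alignof(unsigned long long))

struct cwq_model {
	long payload[4];	/* placeholder for the real cwq fields */
};

/* Offset just past the cwq, rounded up so the saved pointer is aligned. */
static size_t saved_ptr_offset(void)
{
	return (sizeof(struct cwq_model) + sizeof(void *) - 1) &
	       ~(sizeof(void *) - 1);
}

static struct cwq_model *alloc_cwq_up(void)
{
	size_t off = saved_ptr_offset();
	/* worst-case slack for rounding up, plus room for the saved pointer */
	void *ptr = calloc(1, off + sizeof(void *) + CWQ_ALIGN - 1);
	uintptr_t aligned;

	if (!ptr)
		return NULL;
	aligned = ((uintptr_t)ptr + CWQ_ALIGN - 1) &
		  ~(uintptr_t)(CWQ_ALIGN - 1);
	assert((aligned & (CWQ_ALIGN - 1)) == 0);
	/* remember what calloc() returned so free_cwq_up() can release it */
	memcpy((char *)aligned + off, &ptr, sizeof(ptr));
	return (struct cwq_model *)aligned;
}

static void free_cwq_up(struct cwq_model *cwq)
{
	void *ptr;

	if (!cwq)
		return;
	memcpy(&ptr, (char *)cwq + saved_ptr_offset(), sizeof(ptr));
	free(ptr);
}
```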
* Alignment problem on UP is reported by Michal Simek.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: Michal Simek <michal.simek@petalogix.com>
Worker management is about to be overhauled. Simplify things by
removing cpu_populated_map, creating workers for all possible cpus and
making single threaded workqueues behave more like multi threaded
ones.
After this patch, all cwqs are always initialized, all workqueues are
linked on the workqueues list and workers for all possible cpus
always exist. This also makes CPU hotplug support simpler - checking
->cpus_allowed before processing works in worker_thread() and flushing
cwqs on CPU_POST_DEAD are enough.
While at it, make get_cwq() always return the cwq for the specified
cpu, add target_cwq() for cases where single thread distinction is
necessary and drop all direct usage of per_cpu_ptr() on wq->cpu_wq.
Signed-off-by: Tejun Heo <tj@kernel.org>
Strip tracing code from workqueue and remove workqueue tracing. This
is a temporary measure until concurrency managed workqueue is complete.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Work flags are about to see more traditional mask handling. Define
WORK_STRUCT_*_BIT as the bit position constant and redefine
WORK_STRUCT_* as bit masks. Also, make WORK_STRUCT_STATIC_* flags
conditional on CONFIG_DEBUG_OBJECTS_WORK.
While at it, re-define these constants as enums and use
WORK_STRUCT_STATIC instead of hard-coding 2 in
WORK_DATA_STATIC_INIT().
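The position/mask pairing reads roughly like the sketch below. It is
modelled on the description above, not copied from workqueue.h; the
bit values and the simplified WORK_DATA_STATIC_INIT() (a plain unsigned
long rather than an atomic_long_t initializer) are assumptions, and
work_pending_model() is a hypothetical helper.

```c
#include <assert.h>

/* Define to model a debugobjects-enabled build. */
/* #define CONFIG_DEBUG_OBJECTS_WORK */

enum {
	/* bit positions ... */
	WORK_STRUCT_PENDING_BIT	= 0,
#ifdef CONFIG_DEBUG_OBJECTS_WORK
	WORK_STRUCT_STATIC_BIT	= 1,
#endif
	/* ... and the masks derived from them */
	WORK_STRUCT_PENDING	= 1 << WORK_STRUCT_PENDING_BIT,
#ifdef CONFIG_DEBUG_OBJECTS_WORK
	WORK_STRUCT_STATIC	= 1 << WORK_STRUCT_STATIC_BIT,
#else
	WORK_STRUCT_STATIC	= 0,	/* flag compiles away */
#endif
};

/* Uses the mask instead of a hard-coded 2. */
#define WORK_DATA_STATIC_INIT()	((unsigned long)WORK_STRUCT_STATIC)

static int work_pending_model(unsigned long data)
{
	return (data & WORK_STRUCT_PENDING) != 0;
}
```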
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently, __create_workqueue_key() takes @singlethread and
@freezeable parameters and stores them separately in workqueue_struct.
Merge them into a single flags parameter and field and use
WQ_FREEZEABLE and WQ_SINGLE_THREAD.
Signed-off-by: Tejun Heo <tj@kernel.org>
Make the following updates in preparation for concurrency managed
workqueue. None of these changes causes any visible behavior
difference.
* Add comments and adjust indentations to data structures and several
functions.
* Rename wq_per_cpu() to get_cwq() and swap the position of two
parameters for consistency. Convert a direct per_cpu_ptr() access
to wq->cpu_wq to get_cwq().
* Add work_static() and update set_wq_data() such that it sets the
flags part to WORK_STRUCT_PENDING | @extra_flags, plus
WORK_STRUCT_STATIC if the work is static.
* Move sanity check on work->entry emptiness from queue_work_on() to
__queue_work() which all queueing paths share.
* Make __queue_work() take @cpu and @wq instead of @cwq.
* Restructure flush_work() and __create_workqueue_key() to make them
easier to modify.
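A sketch of how set_wq_data() can pack the cwq pointer and the flag
bits into one word, assuming two low flag bits and cwqs aligned to at
least 4 bytes. struct work_model, its is_static field (standing in for
work_static()) and the *_model helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
	WORK_STRUCT_PENDING	= 1 << 0,
	WORK_STRUCT_STATIC	= 1 << 1,
	WORK_STRUCT_FLAG_MASK	= (1 << 2) - 1,	/* assumed: two flag bits */
};

struct cwq_model {
	_Alignas(4) int placeholder;	/* keeps the two low address bits free */
};

struct work_model {
	uintptr_t data;		/* cwq pointer in high bits, flags in low bits */
	int is_static;		/* stands in for work_static() */
};

static void set_wq_data_model(struct work_model *work, struct cwq_model *cwq,
			      unsigned long extra_flags)
{
	unsigned long flags = WORK_STRUCT_PENDING | extra_flags;

	if (work->is_static)
		flags |= WORK_STRUCT_STATIC;
	/* neither the pointer nor extra_flags may overlap the other part */
	assert(((uintptr_t)cwq & WORK_STRUCT_FLAG_MASK) == 0);
	assert((extra_flags & ~(unsigned long)WORK_STRUCT_FLAG_MASK) == 0);
	work->data = (uintptr_t)cwq | flags;
}

static struct cwq_model *get_wq_data_model(const struct work_model *work)
{
	return (struct cwq_model *)(work->data &
				    ~(uintptr_t)WORK_STRUCT_FLAG_MASK);
}
```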
Signed-off-by: Tejun Heo <tj@kernel.org>
With stop_machine() converted to use cpu_stop, RT workqueue doesn't
have any user left. Kill RT workqueue support.
Signed-off-by: Tejun Heo <tj@kernel.org>
Some recent uses of RCU make use of workqueues. In these uses, execution
within the context of a specific workqueue takes the place of the usual
RCU read-side primitives such as rcu_read_lock(), and flushing of workqueues
takes the place of the usual RCU grace-period primitives. Checking for
correct use of rcu_dereference() in such cases requires a test of whether
the code is executing in the context of a particular workqueue. This
commit adds an in_workqueue_context() function that provides this test.
This new function is only defined when lockdep is enabled, which allows
it to be used as the second argument of rcu_dereference_check().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
In short: change cancel_work_sync(work) to mark this work as "never
queued" upon return.
When cancel_work_sync(work) succeeds, we know that this work can't be
queued or running, and since we own WORK_STRUCT_PENDING nobody can change
the bits in work->data under us. This means we can also clear the "cwq"
part along with the _PENDING bit locklessly before returning. Unless the
work is queued, nobody can assume get_wq_data() is stable even under
cwq->lock.
This change can speed up subsequent cancel/flush requests and, as
Dmitry pointed out, simplifies the usage of work_structs which can be
queued on different workqueues. Consider this pseudo code from the
input subsystem:
    struct workqueue_struct *WQ;
    struct work_struct *WORK;   /* assumed set up elsewhere with INIT_WORK() */

    for (;;) {
        WQ = create_workqueue("input_wq");   /* name is illustrative */
        ...
        if (condition())
            queue_work(WQ, WORK);
        ...
        cancel_work_sync(WORK);
        destroy_workqueue(WQ);
    }
If condition() returns true on one iteration and false on the next,
cancel_work_sync() will crash the kernel because WORK->data still
points to the already destroyed workqueue.
With this patch, code like the above becomes correct.
Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch (as1319) adds kerneldoc and a pointed warning to
flush_scheduled_work().
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Tejun Heo <tj@kernel.org>
flush_delayed_work() always uses keventd_wq for re-queueing,
but it should use the workqueue this dwork was queued on.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit 65a6446434 ("HWPOISON: Allow schedule_on_each_cpu() from
keventd"), which allows schedule_on_each_cpu() to be called from
keventd, added a race condition: schedule_on_each_cpu() may race with
cpu hotplug and end up executing the function twice on a cpu.
Fix it by moving direct execution into the section protected with
get/put_online_cpus(). While at it, update code such that direct
execution is done after works have been scheduled for all other cpus and
drop unnecessary cpu != orig test from flush loop.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add debugobject support to track the lifetime of work_structs.
While at it, remove duplicate definition of
INIT_DELAYED_WORK_ON_STACK().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
HWPOISON: fix invalid page count in printk output
HWPOISON: Allow schedule_on_each_cpu() from keventd
HWPOISON: fix /proc/meminfo alignment
HWPOISON: fix oops on ksm pages
HWPOISON: Fix page count leak in hwpoison late kill in do_swap_page
HWPOISON: return early on non-LRU pages
HWPOISON: Add brief hwpoison description to Documentation
HWPOISON: Clean up PR_MCE_KILL interface
Right now, when calling schedule_on_each_cpu() from keventd, there
is a deadlock because it tries to schedule a work item on the current CPU
too. This happens via lru_add_drain_all() in hwpoison.
Just call the function directly for the current CPU in this case. This
is actually faster too.
Debugging with Fengguang Wu & Max Asbock
Signed-off-by: Andi Kleen <ak@linux.intel.com>