Commit Graph

454 Commits

Author SHA1 Message Date
Ingo Molnar
65c2ce7004 Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:28:56 +02:00
Lai Jiangshan
77668c8b55 workqueue: fix a possible race condition between rescuer and pwq-release
There is a race condition between rescuer_thread() and
pwq_unbound_release_workfn().

Even after a pwq is scheduled for rescue, the associated work items
may be consumed by any worker.  If all of them are consumed before the
rescuer gets to them and the pwq's base ref was put due to attribute
change, the pwq may be released while still being linked on
@wq->maydays list making the rescuer dereference already freed pwq
later.

Make send_mayday() pin the target pwq until the rescuer is done with
it.

tj: Updated comment and patch description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v3.10+
2014-04-18 12:33:29 -04:00
Lai Jiangshan
4d595b866d workqueue: make rescuer_thread() empty wq->maydays list before exiting
After a @pwq is scheduled for emergency execution, other workers may
consume the affectd work items before the rescuer gets to them.  This
means that a workqueue many have pwqs queued on @wq->maydays list
while not having any work item pending or in-flight.  If
destroy_workqueue() executes in such condition, the rescuer may exit
without emptying @wq->maydays.

This currently doesn't cause any actual harm.  destroy_workqueue() can
safely destroy all the involved data structures whether @wq->maydays
is populated or not as nobody access the list once the rescuer exits.

However, this is nasty and makes future development difficult.  Let's
update rescuer_thread() so that it empties @wq->maydays after seeing
should_stop to guarantee that the list is empty on rescuer exit.

tj: Updated comment and patch description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v3.10+
2014-04-18 11:04:16 -04:00
Dongsheng Yang
8698a745d8 sched, treewide: Replace hardcoded nice values with MIN_NICE/MAX_NICE
Replace various -20/+19 hardcoded nice values with MIN_NICE/MAX_NICE.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/ff13819fd09b7a5dba5ab5ae797f2e7019bdfa17.1394532288.git.yangds.fnst@cn.fujitsu.com
Cc: devel@driverdev.osuosl.org
Cc: devicetree@vger.kernel.org
Cc: fcoe-devel@open-fcoe.org
Cc: linux390@de.ibm.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: nbd-general@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Cc: openipmi-developer@lists.sourceforge.net
Cc: qla2xxx-upstream@qlogic.com
Cc: linux-arch@vger.kernel.org
[ Consolidated the patches, twiddled the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 12:07:24 +02:00
Daeseok Youn
77f300b198 workqueue: fix bugs in wq_update_unbound_numa() failure path
wq_update_unbound_numa() failure path has the following two bugs.

- alloc_unbound_pwq() is called without holding wq->mutex; however, if
  the allocation fails, it jumps to out_unlock which tries to unlock
  wq->mutex.

- The function should switch to dfl_pwq on failure but didn't do so
  after alloc_unbound_pwq() failure.

Fix it by regrabbing wq->mutex and jumping to use_dfl_pwq on
alloc_unbound_pwq() failure.

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues")
2014-04-16 13:27:57 -04:00
Linus Torvalds
1ead658124 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes from Thomas Gleixner:
 "This assorted collection provides:

   - A new timer based timer broadcast feature for systems which do not
     provide a global accessible timer device.  That allows those
     systems to put CPUs into deep idle states where the per cpu timer
     device stops.

   - A few NOHZ_FULL related improvements to the timer wheel

   - The usual updates to timer devices found in ARM SoCs

   - Small improvements and updates all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  tick: Remove code duplication in tick_handle_periodic()
  tick: Fix spelling mistake in tick_handle_periodic()
  x86: hpet: Use proper destructor for delayed work
  workqueue: Provide destroy_delayed_work_on_stack()
  clocksource: CMT, MTU2, TMU and STI should depend on GENERIC_CLOCKEVENTS
  timer: Remove code redundancy while calling get_nohz_timer_target()
  hrtimer: Rearrange comments in the order struct members are declared
  timer: Use variable head instead of &work_list in __run_timers()
  clocksource: exynos_mct: silence a static checker warning
  arm: zynq: Add support for cpufreq
  arm: zynq: Don't use arm_global_timer with cpufreq
  clocksource/cadence_ttc: Overhaul clocksource frequency adjustment
  clocksource/cadence_ttc: Call clockevents_update_freq() with IRQs enabled
  clocksource: Add Kconfig entries for CMT, MTU2, TMU and STI
  sh: Remove Kconfig entries for TMU, CMT and MTU2
  ARM: shmobile: Remove CMT, TMU and STI Kconfig entries
  clocksource: armada-370-xp: Use atomic access for shared registers
  clocksource: orion: Use atomic access for shared registers
  clocksource: timer-keystone: Delete unnecessary variable
  clocksource: timer-keystone: introduce clocksource driver for Keystone
  ...
2014-04-01 11:00:07 -07:00
Thomas Gleixner
ea2e64f280 workqueue: Provide destroy_delayed_work_on_stack()
If a delayed or deferrable work is on stack we need to tell debug
objects that we are destroying the timer and the work. Otherwise we
leak the tracking object.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20140323141939.911487677@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-25 17:33:42 +01:00
Dongsheng Yang
144818422b workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/6d85138180c00ce86975addab6e34b24b84f00a5.1392103744.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:16:41 +01:00
Lai Jiangshan
5bdfff96c6 workqueue: ensure @task is valid across kthread_stop()
When a kworker should die, the kworkre is notified through WORKER_DIE
flag instead of kthread_should_stop().  This, IIRC, is primarily to
keep the test synchronized inside worker_pool lock.  WORKER_DIE is
first set while holding pool->lock, the lock is dropped and
kthread_stop() is called.

Unfortunately, this means that there's a slight chance that the target
kworker may see WORKER_DIE before kthread_stop() finishes and exits
and frees the target task before or during kthread_stop().

Fix it by pinning the target task before setting WORKER_DIE and
putting it after kthread_stop() is done.

tj: Improved patch description and comment.  Moved pinning above
    WORKER_DIE for better signify what it's protecting.

CC: stable@vger.kernel.org
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-02-18 16:35:20 -05:00
Linus Torvalds
4a2829b976 Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue update from Tejun Heo:
 "Just one patch to add destroy_work_on_stack() annotations to help
  debugobj debugging"

* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
2014-01-21 17:46:31 -08:00
Chuansheng Liu
440a113603 workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-01-11 22:26:33 -05:00
Linus Torvalds
9199c4caa1 Merge tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
 "PCI device hotplug
    - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael
      Wysocki)

  Host bridge drivers
    - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn
      Helgaas)
    - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
      (Jason Gunthorpe)

  Miscellaneous
    - Avoid unnecessary CPU switch when calling .probe() (Alexander
      Duyck)
    - Revert "workqueue: allow work_on_cpu() to be called recursively"
      (Bjorn Helgaas)
    - Disable Bus Master only on kexec reboot (Khalid Aziz)
    - Omit PCI ID macro strings to shorten quirk names for LTO (Michal
      Marek)"

* tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers
  PCI: Disable Bus Master only on kexec reboot
  PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
  PCI: Omit PCI ID macro strings to shorten quirk names
  PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
  Revert "workqueue: allow work_on_cpu() to be called recursively"
  PCI: Avoid unnecessary CPU switch when calling driver .probe() method
2013-12-15 11:45:27 -08:00
Bjorn Helgaas
12997d1a99 Revert "workqueue: allow work_on_cpu() to be called recursively"
This reverts commit c2fda50966.

c2fda50966 removed lockdep annotation from work_on_cpu() to work around
the PCI path that calls work_on_cpu() from within a work_on_cpu() work item
(PF driver .probe() method -> pci_enable_sriov() -> add VFs -> VF driver
.probe method).

961da7fb6b22 ("PCI: Avoid unnecessary CPU switch when calling driver
.probe() method) avoids that recursive work_on_cpu() use in a different
way, so this revert restores the work_on_cpu() lockdep annotation.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
2013-11-25 14:37:22 -07:00
Li Bin
4e8b22bd1a workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues
When one work starts execution, the high bits of work's data contain
pool ID. It can represent a maximum of WORK_OFFQ_POOL_NONE. Pool ID
is assigned WORK_OFFQ_POOL_NONE when the work being initialized
indicating that no pool is associated and get_work_pool() uses it to
check the associated pool. So if worker_pool_assign_id() assigns a
ID greater than or equal WORK_OFFQ_POOL_NONE to a pool, it triggers
leakage, and it may break the non-reentrance guarantee.

This patch fix this issue by modifying the worker_pool_assign_id()
function calling idr_alloc() by setting @end param WORK_OFFQ_POOL_NONE.

Furthermore, in the current implementation, the BUILD_BUG_ON() in
init_workqueues makes no sense. The number of worker pools needed
cannot be determined at compile time, because the number of backing
pools for UNBOUND workqueues is dynamic based on the assigned custom
attributes. So remove it.

tj: Minor comment and indentation updates.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-11-22 18:14:47 -05:00
Li Bin
9ef28a73ff workqueue: fix comment typo for __queue_work()
It seems the "dying" should be "draining" here.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-11-22 18:14:27 -05:00
Tejun Heo
8a2b753844 workqueue: fix ordered workqueues in NUMA setups
An ordered workqueue implements execution ordering by using single
pool_workqueue with max_active == 1.  On a given pool_workqueue, work
items are processed in FIFO order and limiting max_active to 1
enforces the queued work items to be processed one by one.

Unfortunately, 4c16bd327c ("workqueue: implement NUMA affinity for
unbound workqueues") accidentally broke this guarantee by applying
NUMA affinity to ordered workqueues too.  On NUMA setups, an ordered
workqueue would end up with separate pool_workqueues for different
nodes.  Each pool_workqueue still limits max_active to 1 but multiple
work items may be executed concurrently and out of order depending on
which node they are queued to.

Fix it by using dedicated ordered_wq_attrs[] when creating ordered
workqueues.  The new attrs match the unbound ones except that no_numa
is always set thus forcing all NUMA nodes to share the default
pool_workqueue.

While at it, add sanity check in workqueue creation path which
verifies that an ordered workqueues has only the default
pool_workqueue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Libin <huawei.libin@huawei.com>
Cc: stable@vger.kernel.org
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-11-22 18:14:02 -05:00
Oleg Nesterov
9115122806 workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
in create_worker(). Otherwise userland can change ->cpus_allowed
in between.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-11-22 18:13:20 -05:00
Linus Torvalds
2e515bf096 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "The usual trivial updates all over the tree -- mostly typo fixes and
  documentation updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
  doc: Documentation/cputopology.txt fix typo
  treewide: Convert retrun typos to return
  Fix comment typo for init_cma_reserved_pageblock
  Documentation/trace: Correcting and extending tracepoint documentation
  mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
  power: Documentation: Update s2ram link
  doc: fix a typo in Documentation/00-INDEX
  Documentation/printk-formats.txt: No casts needed for u64/s64
  doc: Fix typo "is is" in Documentations
  treewide: Fix printks with 0x%#
  zram: doc fixes
  Documentation/kmemcheck: update kmemcheck documentation
  doc: documentation/hwspinlock.txt fix typo
  PM / Hibernate: add section for resume options
  doc: filesystems : Fix typo in Documentations/filesystems
  scsi/megaraid fixed several typos in comments
  ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
  treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
  page_isolation: Fix a comment typo in test_pages_isolated()
  doc: fix a typo about irq affinity
  ...
2013-09-06 09:36:28 -07:00
Linus Torvalds
9ee52a1633 Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Nothing interesting.  All are doc / comment updates"

* 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Correct/Drop references to gcwq in Documentation
  workqueue: Fix manage_workers() RETURNS description
  workqueue: Comment correction in file header
  workqueue: mark WQ_NON_REENTRANT deprecated
2013-09-03 18:19:21 -07:00
Linus Torvalds
542a086ac7 Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg KH:
 "Here's the big driver core pull request for 3.12-rc1.

  Lots of tiny changes here fixing up the way sysfs attributes are
  created, to try to make drivers simpler, and fix a whole class race
  conditions with creations of device attributes after the device was
  announced to userspace.

  All the various pieces are acked by the different subsystem
  maintainers"

* tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits)
  firmware loader: fix pending_fw_head list corruption
  drivers/base/memory.c: introduce help macro to_memory_block
  dynamic debug: line queries failing due to uninitialized local variable
  sysfs: sysfs_create_groups returns a value.
  debugfs: provide debugfs_create_x64() when disabled
  rbd: convert bus code to use bus_groups
  firmware: dcdbas: use binary attribute groups
  sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled
  driver core: add #include <linux/sysfs.h> to core files.
  HID: convert bus code to use dev_groups
  Input: serio: convert bus code to use drv_groups
  Input: gameport: convert bus code to use drv_groups
  driver core: firmware: use __ATTR_RW()
  driver core: core: use DEVICE_ATTR_RO
  driver core: bus: use DRIVER_ATTR_WO()
  driver core: create write-only attribute macros for devices and drivers
  sysfs: create __ATTR_WO()
  driver-core: platform: convert bus code to use dev_groups
  workqueue: convert bus code to use dev_groups
  MEI: convert bus code to use dev_groups
  ...
2013-09-03 11:37:15 -07:00
Tejun Heo
b22ce2785d workqueue: cond_resched() after processing each work item
If !PREEMPT, a kworker running work items back to back can hog CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine.  Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs.  The two would deadlock.

Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports.  With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.

Fix it by invoking cond_resched() after executing each work item.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jamie Liu <jamieliu@google.com>
References: http://thread.gmane.org/gmane.linux.kernel/1552567
Cc: stable@vger.kernel.org
--
 kernel/workqueue.c |    9 +++++++++
 1 file changed, 9 insertions(+)
2013-08-29 09:19:28 -04:00
Greg Kroah-Hartman
1a6661dafd workqueue: convert bus code to use dev_groups
The dev_attrs field of struct bus_type is going away soon, dev_groups
should be used instead.  This converts the workqueue bus code to use
the correct field.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-23 14:38:12 -07:00
Libin
2d498db981 workqueue: Fix manage_workers() RETURNS description
No functional change. The comment of function manage_workers()
RETURNS description is obvious wrong, same as the CONTEXT.
Fix it.

Signed-off-by: Libin <huawei.libin@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-21 10:32:09 -04:00
Libin
b11895c458 workqueue: Comment correction in file header
No functional change. There are two worker pools for each cpu in
current implementation (one for normal work items and the other for
high priority ones).

tj: Whitespace adjustments.

Signed-off-by: Libin <huawei.libin@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-21 10:32:09 -04:00
Yacine Belkadi
d185af300f workqueue: fix some scripts/kernel-doc warnings
When building the htmldocs (in verbose mode), scripts/kernel-doc reports the
following type of warnings:

Warning(kernel/workqueue.c:653): No description found for return value of
'get_work_pool'

Fix them by:
- Using "Return:" sections to introduce descriptions of return values
- Adding some missing descriptions

Signed-off-by: Yacine Belkadi <yacine.belkadi.1@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-08-20 12:57:25 +02:00