Commit Graph

445 Commits

Author SHA1 Message Date
Linus Torvalds
4a2829b976 Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue update from Tejun Heo:
 "Just one patch to add destroy_work_on_stack() annotations to help
  debugobj debugging"

* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
2014-01-21 17:46:31 -08:00
Chuansheng Liu
440a113603 workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-01-11 22:26:33 -05:00
Linus Torvalds
9199c4caa1 Merge tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
 "PCI device hotplug
    - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael
      Wysocki)

  Host bridge drivers
    - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn
      Helgaas)
    - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
      (Jason Gunthorpe)

  Miscellaneous
    - Avoid unnecessary CPU switch when calling .probe() (Alexander
      Duyck)
    - Revert "workqueue: allow work_on_cpu() to be called recursively"
      (Bjorn Helgaas)
    - Disable Bus Master only on kexec reboot (Khalid Aziz)
    - Omit PCI ID macro strings to shorten quirk names for LTO (Michal
      Marek)"

* tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers
  PCI: Disable Bus Master only on kexec reboot
  PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
  PCI: Omit PCI ID macro strings to shorten quirk names
  PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
  Revert "workqueue: allow work_on_cpu() to be called recursively"
  PCI: Avoid unnecessary CPU switch when calling driver .probe() method
2013-12-15 11:45:27 -08:00
Bjorn Helgaas
12997d1a99 Revert "workqueue: allow work_on_cpu() to be called recursively"
This reverts commit c2fda50966.

c2fda50966 removed lockdep annotation from work_on_cpu() to work around
the PCI path that calls work_on_cpu() from within a work_on_cpu() work item
(PF driver .probe() method -> pci_enable_sriov() -> add VFs -> VF driver
.probe method).

961da7fb6b22 ("PCI: Avoid unnecessary CPU switch when calling driver
.probe() method) avoids that recursive work_on_cpu() use in a different
way, so this revert restores the work_on_cpu() lockdep annotation.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
2013-11-25 14:37:22 -07:00
Li Bin
4e8b22bd1a workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues
When one work starts execution, the high bits of work's data contain
pool ID. It can represent a maximum of WORK_OFFQ_POOL_NONE. Pool ID
is assigned WORK_OFFQ_POOL_NONE when the work being initialized
indicating that no pool is associated and get_work_pool() uses it to
check the associated pool. So if worker_pool_assign_id() assigns a
ID greater than or equal WORK_OFFQ_POOL_NONE to a pool, it triggers
leakage, and it may break the non-reentrance guarantee.

This patch fix this issue by modifying the worker_pool_assign_id()
function calling idr_alloc() by setting @end param WORK_OFFQ_POOL_NONE.

Furthermore, in the current implementation, the BUILD_BUG_ON() in
init_workqueues makes no sense. The number of worker pools needed
cannot be determined at compile time, because the number of backing
pools for UNBOUND workqueues is dynamic based on the assigned custom
attributes. So remove it.

tj: Minor comment and indentation updates.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-11-22 18:14:47 -05:00
Li Bin
9ef28a73ff workqueue: fix comment typo for __queue_work()
It seems the "dying" should be "draining" here.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-11-22 18:14:27 -05:00
Tejun Heo
8a2b753844 workqueue: fix ordered workqueues in NUMA setups
An ordered workqueue implements execution ordering by using single
pool_workqueue with max_active == 1.  On a given pool_workqueue, work
items are processed in FIFO order and limiting max_active to 1
enforces the queued work items to be processed one by one.

Unfortunately, 4c16bd327c ("workqueue: implement NUMA affinity for
unbound workqueues") accidentally broke this guarantee by applying
NUMA affinity to ordered workqueues too.  On NUMA setups, an ordered
workqueue would end up with separate pool_workqueues for different
nodes.  Each pool_workqueue still limits max_active to 1 but multiple
work items may be executed concurrently and out of order depending on
which node they are queued to.

Fix it by using dedicated ordered_wq_attrs[] when creating ordered
workqueues.  The new attrs match the unbound ones except that no_numa
is always set thus forcing all NUMA nodes to share the default
pool_workqueue.

While at it, add sanity check in workqueue creation path which
verifies that an ordered workqueues has only the default
pool_workqueue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Libin <huawei.libin@huawei.com>
Cc: stable@vger.kernel.org
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-11-22 18:14:02 -05:00
Oleg Nesterov
9115122806 workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
in create_worker(). Otherwise userland can change ->cpus_allowed
in between.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-11-22 18:13:20 -05:00
Linus Torvalds
2e515bf096 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "The usual trivial updates all over the tree -- mostly typo fixes and
  documentation updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
  doc: Documentation/cputopology.txt fix typo
  treewide: Convert retrun typos to return
  Fix comment typo for init_cma_reserved_pageblock
  Documentation/trace: Correcting and extending tracepoint documentation
  mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
  power: Documentation: Update s2ram link
  doc: fix a typo in Documentation/00-INDEX
  Documentation/printk-formats.txt: No casts needed for u64/s64
  doc: Fix typo "is is" in Documentations
  treewide: Fix printks with 0x%#
  zram: doc fixes
  Documentation/kmemcheck: update kmemcheck documentation
  doc: documentation/hwspinlock.txt fix typo
  PM / Hibernate: add section for resume options
  doc: filesystems : Fix typo in Documentations/filesystems
  scsi/megaraid fixed several typos in comments
  ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
  treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
  page_isolation: Fix a comment typo in test_pages_isolated()
  doc: fix a typo about irq affinity
  ...
2013-09-06 09:36:28 -07:00
Linus Torvalds
9ee52a1633 Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Nothing interesting.  All are doc / comment updates"

* 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Correct/Drop references to gcwq in Documentation
  workqueue: Fix manage_workers() RETURNS description
  workqueue: Comment correction in file header
  workqueue: mark WQ_NON_REENTRANT deprecated
2013-09-03 18:19:21 -07:00
Linus Torvalds
542a086ac7 Merge tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg KH:
 "Here's the big driver core pull request for 3.12-rc1.

  Lots of tiny changes here fixing up the way sysfs attributes are
  created, to try to make drivers simpler, and fix a whole class race
  conditions with creations of device attributes after the device was
  announced to userspace.

  All the various pieces are acked by the different subsystem
  maintainers"

* tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits)
  firmware loader: fix pending_fw_head list corruption
  drivers/base/memory.c: introduce help macro to_memory_block
  dynamic debug: line queries failing due to uninitialized local variable
  sysfs: sysfs_create_groups returns a value.
  debugfs: provide debugfs_create_x64() when disabled
  rbd: convert bus code to use bus_groups
  firmware: dcdbas: use binary attribute groups
  sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled
  driver core: add #include <linux/sysfs.h> to core files.
  HID: convert bus code to use dev_groups
  Input: serio: convert bus code to use drv_groups
  Input: gameport: convert bus code to use drv_groups
  driver core: firmware: use __ATTR_RW()
  driver core: core: use DEVICE_ATTR_RO
  driver core: bus: use DRIVER_ATTR_WO()
  driver core: create write-only attribute macros for devices and drivers
  sysfs: create __ATTR_WO()
  driver-core: platform: convert bus code to use dev_groups
  workqueue: convert bus code to use dev_groups
  MEI: convert bus code to use dev_groups
  ...
2013-09-03 11:37:15 -07:00
Tejun Heo
b22ce2785d workqueue: cond_resched() after processing each work item
If !PREEMPT, a kworker running work items back to back can hog CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine.  Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs.  The two would deadlock.

Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports.  With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.

Fix it by invoking cond_resched() after executing each work item.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jamie Liu <jamieliu@google.com>
References: http://thread.gmane.org/gmane.linux.kernel/1552567
Cc: stable@vger.kernel.org
--
 kernel/workqueue.c |    9 +++++++++
 1 file changed, 9 insertions(+)
2013-08-29 09:19:28 -04:00
Greg Kroah-Hartman
1a6661dafd workqueue: convert bus code to use dev_groups
The dev_attrs field of struct bus_type is going away soon, dev_groups
should be used instead.  This converts the workqueue bus code to use
the correct field.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-23 14:38:12 -07:00
Libin
2d498db981 workqueue: Fix manage_workers() RETURNS description
No functional change. The comment of function manage_workers()
RETURNS description is obvious wrong, same as the CONTEXT.
Fix it.

Signed-off-by: Libin <huawei.libin@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-21 10:32:09 -04:00
Libin
b11895c458 workqueue: Comment correction in file header
No functional change. There are two worker pools for each cpu in
current implementation (one for normal work items and the other for
high priority ones).

tj: Whitespace adjustments.

Signed-off-by: Libin <huawei.libin@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-21 10:32:09 -04:00
Yacine Belkadi
d185af300f workqueue: fix some scripts/kernel-doc warnings
When building the htmldocs (in verbose mode), scripts/kernel-doc reports the
following type of warnings:

Warning(kernel/workqueue.c:653): No description found for return value of
'get_work_pool'

Fix them by:
- Using "Return:" sections to introduce descriptions of return values
- Adding some missing descriptions

Signed-off-by: Yacine Belkadi <yacine.belkadi.1@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-08-20 12:57:25 +02:00
Linus Torvalds
4264bc1371 Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull two workqueue fixes from Tejun Heo:
 "A lockdep notation update so that nested work_on_cpu() invocations
  don't lead to spurious lockdep warnings and fix for an unbound attr
  bug which made what's shown in sysfs deviate from the actual ones.
  Both patches have pretty limited scope"

* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: copy workqueue_attrs with all fields
  workqueue: allow work_on_cpu() to be called recursively
2013-08-06 13:58:34 -07:00
Shaohua Li
2865a8fb44 workqueue: copy workqueue_attrs with all fields
$echo '0' > /sys/bus/workqueue/devices/xxx/numa
 $cat /sys/bus/workqueue/devices/xxx/numa

I got 1. It should be 0, the reason is copy_workqueue_attrs() called
in apply_workqueue_attrs() doesn't copy no_numa field.

Fix it by making copy_workqueue_attrs() copy ->no_numa too.  This
would also make get_unbound_pool() set a pool's ->no_numa attribute
according to the workqueue attributes used when the pool was created.
While harmelss, as ->no_numa isn't a pool attribute, this is a bit
confusing.  Clear it explicitly.

tj: Updated description and comments a bit.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
2013-08-01 08:36:40 -04:00
Lai Jiangshan
c2fda50966 workqueue: allow work_on_cpu() to be called recursively
If the @fn call work_on_cpu() again, the lockdep will complain:

> [ INFO: possible recursive locking detected ]
> 3.11.0-rc1-lockdep-fix-a #6 Not tainted
> ---------------------------------------------
> kworker/0:1/142 is trying to acquire lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81077100>] flush_work+0x0/0xb0
>
> but task is already holding lock:
>  ((&wfc.work)){+.+.+.}, at: [<ffffffff81075dd9>] process_one_work+0x169/0x610
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock((&wfc.work));
>   lock((&wfc.work));
>
>  *** DEADLOCK ***

It is false-positive lockdep report. In this sutiation,
the two "wfc"s of the two work_on_cpu() are different,
they are both on stack. flush_work() can't be deadlock.

To fix this, we need to avoid the lockdep checking in this case,
thus we instroduce a internal __flush_work() which skip the lockdep.

tj: Minor comment adjustment.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reported-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-07-24 12:24:25 -04:00
Paul Gortmaker
0db0628d90 kernel: delete __cpuinit usage from all core kernel files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the uses of the __cpuinit macros from C files in
the core kernel directories (kernel, init, lib, mm, and include)
that don't really have a specific maintainer.

[1] https://lkml.org/lkml/2013/5/20/589

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:59 -04:00
Linus Torvalds
f317ff9eed Merge branch 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo:
 "Surprisingly, Lai and I didn't break too many things implementing
  custom pools and stuff last time around and there aren't any follow-up
  changes necessary at this point.

  The only change in this pull request is Viresh's patches to make some
  per-cpu workqueues to behave as unbound workqueues dependent on a boot
  param whose default can be configured via a config option.  This leads
  to higher processing overhead / lower bandwidth as more work items are
  bounced across CPUs; however, it can lead to noticeable powersave in
  certain configurations - ~10% w/ idlish constant workload on a
  big.LITTLE configuration according to Viresh.

  This is because per-cpu workqueues interfere with how the scheduler
  perceives whether or not each CPU is idle by forcing pinned tasks on
  them, which makes the scheduler's power-aware scheduling decisions
  less effective.

  Its effectiveness is likely less pronounced on homogenous
  configurations and this type of optimization can probably be made
  automatic; however, the changes are pretty minimal and the affected
  workqueues are clearly marked, so it's an easy gain for some
  configurations for the time being with pretty unintrusive changes."

* 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  fbcon: queue work on power efficient wq
  block: queue work on power efficient wq
  PHYLIB: queue work on system_power_efficient_wq
  workqueue: Add system wide power_efficient workqueues
  workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues
2013-07-02 19:53:30 -07:00
Tejun Heo
1be0c25da5 workqueue: don't perform NUMA-aware allocations on offline nodes in wq_numa_init()
wq_numa_init() builds per-node cpumasks which are later used to make
unbound workqueues NUMA-aware.  The cpumasks are allocated using
alloc_cpumask_var_node() for all possible nodes.  Unfortunately, on
machines with off-line nodes, this leads to NUMA-aware allocations on
existing bug offline nodes, which in turn triggers BUG in the memory
allocation code.

Fix it by using NUMA_NO_NODE for cpumask allocations for offline
nodes.

  kernel BUG at include/linux/gfp.h:323!
  invalid opcode: 0000 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0+ #1
  Hardware name: ProLiant BL465c G7, BIOS A19 12/10/2011
  task: ffff880234608000 ti: ffff880234602000 task.ti: ffff880234602000
  RIP: 0010:[<ffffffff8117495d>]  [<ffffffff8117495d>] new_slab+0x2ad/0x340
  RSP: 0000:ffff880234603bf8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff880237404b40 RCX: 00000000000000d0
  RDX: 0000000000000001 RSI: 0000000000000003 RDI: 00000000002052d0
  RBP: ffff880234603c28 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000001 R11: ffffffff812e3aa8 R12: 0000000000000001
  R13: ffff8802378161c0 R14: 0000000000030027 R15: 00000000000040d0
  FS:  0000000000000000(0000) GS:ffff880237800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: ffff88043fdff000 CR3: 00000000018d5000 CR4: 00000000000007f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Stack:
   ffff880234603c28 0000000000000001 00000000000000d0 ffff8802378161c0
   ffff880237404b40 ffff880237404b40 ffff880234603d28 ffffffff815edba1
   ffff880237816140 0000000000000000 ffff88023740e1c0
  Call Trace:
   [<ffffffff815edba1>] __slab_alloc+0x330/0x4f2
   [<ffffffff81174b25>] kmem_cache_alloc_node_trace+0xa5/0x200
   [<ffffffff812e3aa8>] alloc_cpumask_var_node+0x28/0x90
   [<ffffffff81a0bdb3>] wq_numa_init+0x10d/0x1be
   [<ffffffff81a0bec8>] init_workqueues+0x64/0x341
   [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
   [<ffffffff819f1f31>] kernel_init_freeable+0xb7/0x1ec
   [<ffffffff815d50de>] kernel_init+0xe/0xf0
   [<ffffffff815ff89c>] ret_from_fork+0x7c/0xb0
  Code: 45  84 ac 00 00 00 f0 41 80 4d 00 40 e9 f6 fe ff ff 66 0f 1f 84 00 00 00 00 00 e8 eb 4b ff ff 49 89 c5 e9 05 fe ff ff <0f> 0b 4c 8b 73 38 44 89 ff 81 cf 00 00 20 00 4c 89 f6 48 c1 ee

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-Tested-by: Lingzhu Xiang <lxiang@redhat.com>
2013-05-15 14:24:24 -07:00
Marc Dionne
ad7b1f841f workqueue: Make schedule_work() available again to non GPL modules
Commit 8425e3d5bd ("workqueue: inline trivial wrappers") changed
schedule_work() and schedule_delayed_work() to inline wrappers,
but these rely on some symbols that are EXPORT_SYMBOL_GPL, while
the original functions were EXPORT_SYMBOL.  This has the effect of
changing the licensing requirement for these functions and making
them unavailable to non GPL modules.

Make them available again by removing the restriction on the
required symbols.

Signed-off-by: Marc Dionne <marc.dionne@your-file-system.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-14 11:52:51 -07:00
Joonsoo Kim
8f174b1175 workqueue: correct handling of the pool spin_lock
When we fail to mutex_trylock(), we release the pool spin_lock and do
mutex_lock(). After that, we should regrab the pool spin_lock, but,
regrabbing is missed in current code. So correct it.

Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-14 11:48:15 -07:00
Viresh Kumar
0668106ca3 workqueue: Add system wide power_efficient workqueues
This patch adds system wide workqueues aligned towards power saving. This is
done by allocating them with WQ_UNBOUND flag if 'wq_power_efficient' is set to
'true'.

tj: updated comments a bit.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-14 10:50:06 -07:00