Commit Graph

1885 Commits

Author SHA1 Message Date
Linus Torvalds
46d9be3e5e Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "A lot of activities on workqueue side this time.  The changes achieve
  the followings.

   - WQ_UNBOUND workqueues - the workqueues which are per-cpu - are
     updated to be able to interface with multiple backend worker pools.
     This involved a lot of churning but the end result seems actually
     neater as unbound workqueues are now a lot closer to per-cpu ones.

   - The ability to interface with multiple backend worker pools are
     used to implement unbound workqueues with custom attributes.
     Currently the supported attributes are the nice level and CPU
     affinity.  It may be expanded to include cgroup association in
     future.  The attributes can be specified either by calling
     apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if
     the workqueue in question is exported through sysfs.

     The backend worker pools are keyed by the actual attributes and
     shared by any workqueues which share the same attributes.  When
     attributes of a workqueue are changed, the workqueue binds to the
     worker pool with the specified attributes while leaving the work
     items which are already executing in its previous worker pools
     alone.

     This allows converting custom worker pool implementations which
     want worker attribute tuning to use workqueues.  The writeback pool
     is already converted in block tree and there are a couple others
     are likely to follow including btrfs io workers.

   - WQ_UNBOUND's ability to bind to multiple worker pools is also used
     to make it NUMA-aware.  Because there's no association between work
     item issuer and the specific worker assigned to execute it, before
     this change, using unbound workqueue led to unnecessary cross-node
     bouncing and it couldn't be helped by autonuma as it requires tasks
     to have implicit node affinity and workers are assigned randomly.

     After these changes, an unbound workqueue now binds to multiple
     NUMA-affine worker pools so that queued work items are executed in
     the same node.  This is turned on by default but can be disabled
     system-wide or for individual workqueues.

     Crypto was requesting NUMA affinity as encrypting data across
     different nodes can contribute noticeable overhead and doing it
     per-cpu was too limiting for certain cases and IO throughput could
     be bottlenecked by one CPU being fully occupied while others have
     idle cycles.

  While the new features required a lot of changes including
  restructuring locking, it didn't complicate the execution paths much.
  The unbound workqueue handling is now closer to per-cpu ones and the
  new features are implemented by simply associating a workqueue with
  different sets of backend worker pools without changing queue,
  execution or flush paths.

  As such, even though the amount of change is very high, I feel
  relatively safe in that it isn't likely to cause subtle issues with
  basic correctness of work item execution and handling.  If something
  is wrong, it's likely to show up as being associated with worker pools
  with the wrong attributes or OOPS while workqueue attributes are being
  changed or during CPU hotplug.

  While this creates more backend worker pools, it doesn't add too many
  more workers unless, of course, there are many workqueues with unique
  combinations of attributes.  Assuming everything else is the same,
  NUMA awareness costs an extra worker pool per NUMA node with online
  CPUs.

  There are also a couple things which are being routed outside the
  workqueue tree.

   - block tree pulled in workqueue for-3.10 so that writeback worker
     pool can be converted to unbound workqueue with sysfs control
     exposed.  This simplifies the code, makes writeback workers
     NUMA-aware and allows tuning nice level and CPU affinity via sysfs.

   - The conversion to workqueue means that there's no 1:1 association
     between a specific worker, which makes writeback folks unhappy as
     they want to be able to tell which filesystem caused a problem from
     backtrace on systems with many filesystems mounted.  This is
     resolved by allowing work items to set debug info string which is
     printed when the task is dumped.  As this change involves unifying
     implementations of dump_stack() and friends in arch codes, it's
     being routed through Andrew's -mm tree."

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
  workqueue: use kmem_cache_free() instead of kfree()
  workqueue: avoid false negative WARN_ON() in destroy_workqueue()
  workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
  workqueue: implement NUMA affinity for unbound workqueues
  workqueue: introduce put_pwq_unlocked()
  workqueue: introduce numa_pwq_tbl_install()
  workqueue: use NUMA-aware allocation for pool_workqueues
  workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
  workqueue: map an unbound workqueues to multiple per-node pool_workqueues
  workqueue: move hot fields of workqueue_struct to the end
  workqueue: make workqueue->name[] fixed len
  workqueue: add workqueue->unbound_attrs
  workqueue: determine NUMA node of workers accourding to the allowed cpumask
  workqueue: drop 'H' from kworker names of unbound worker pools
  workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
  workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
  workqueue: fix memory leak in apply_workqueue_attrs()
  workqueue: fix unbound workqueue attrs hashing / comparison
  workqueue: fix race condition in unbound workqueue free path
  workqueue: remove pwq_lock which is no longer used
  ...
2013-04-29 19:07:40 -07:00
Linus Torvalds
73154383f0 Merge branch 'akpm' (incoming from Andrew)
Merge first batch of fixes from Andrew Morton:

 - A couple of kthread changes

 - A few minor audit patches

 - A number of fbdev patches.  Florian remains AWOL so I'm picking up
   some of these.

 - A few kbuild things

 - ocfs2 updates

 - Almost all of the MM queue

(And in the meantime, I already have the second big batch from Andrew
pending in my mailbox ;^)

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (149 commits)
  memcg: take reference before releasing rcu_read_lock
  mem hotunplug: fix kfree() of bootmem memory
  mmKconfig: add an option to disable bounce
  mm, nobootmem: do memset() after memblock_reserve()
  mm, nobootmem: clean-up of free_low_memory_core_early()
  fs/buffer.c: remove unnecessary init operation after allocating buffer_head.
  numa, cpu hotplug: change links of CPU and node when changing node number by onlining CPU
  mm: fix memory_hotplug.c printk format warning
  mm: swap: mark swap pages writeback before queueing for direct IO
  swap: redirty page if page write fails on swap file
  mm, memcg: give exiting processes access to memory reserves
  thp: fix huge zero page logic for page with pfn == 0
  memcg: avoid accessing memcg after releasing reference
  fs: fix fsync() error reporting
  memblock: fix missing comment of memblock_insert_region()
  mm: Remove unused parameter of pages_correctly_reserved()
  firmware, memmap: fix firmware_map_entry leak
  mm/vmstat: add note on safety of drain_zonestat
  mm: thp: add split tail pages to shrink page list in page reclaim
  mm: allow for outstanding swap writeback accounting
  ...
2013-04-29 17:29:08 -07:00
Linus Torvalds
7b053842b9 Merge tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
 "In user visible terms just a couple of enhancements here, though there
  was a moderate amount of refactoring required in order to support the
  register cache sync performance improvements.

   - Support for block and asynchronous I/O during register cache
     syncing; this provides a use case dependant performance
     improvement.
   - Additional debugfs information on the memory consuption and
     register set"

* tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (23 commits)
  regmap: don't corrupt work buffer in _regmap_raw_write()
  regmap: cache: Fix format specifier in dev_dbg
  regmap: cache: Make regcache_sync_block_raw static
  regmap: cache: Write consecutive registers in a single block write
  regmap: cache: Split raw and non-raw syncs
  regmap: cache: Factor out block sync
  regmap: cache: Factor out reg_present support from rbtree cache
  regmap: cache: Use raw I/O to sync rbtrees if we can
  regmap: core: Provide regmap_can_raw_write() operation
  regmap: cache: Provide a get address of value operation
  regmap: Cut down on the average # of nodes in the rbtree cache
  regmap: core: Make raw write available to regcache
  regmap: core: Warn on invalid operation combinations
  regmap: irq: Clarify error message when we fail to request primary IRQ
  regmap: rbtree Expose total memory consumption in the rbtree debugfs entry
  regmap: debugfs: Add a registers `range' file
  regmap: debugfs: Simplify calculation of `c->max_reg'
  regmap: cache: Store caches in native register format where possible
  regmap: core: Split out in place value parsing
  regmap: cache: Use regcache_get_value() to check if we updated
  ...
2013-04-29 16:31:26 -07:00
Yasuaki Ishimatsu
3464046824 numa, cpu hotplug: change links of CPU and node when changing node number by onlining CPU
When booting x86 system contains memoryless node, node numbers of CPUs
on memoryless node were changed to nearest online node number by
init_cpu_to_node() because the node is not online.

In my system, node numbers of cpu#30-44 and 75-89 were changed from 2 to
0 as follows:

  $ numactl --hardware
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40
  41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 75 76 77 78 79 80 81 82
  83 84 85 86 87 88 89
  node 0 size: 32394 MB
  node 0 free: 27898 MB
  node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66
  67 68 69 70 71 72 73 74
  node 1 size: 32768 MB
  node 1 free: 30335 MB

If we hot add memory to memoryless node and offine/online all CPUs on
the node, node numbers of these CPUs are changed to correct node numbers
by srat_detect_node() because the node become online.

In this case, node numbers of cpu#30-44 and 75-89 were changed from 0 to
2 in my system as follows:

  $ numactl --hardware
  available: 3 nodes (0-2)
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 45 46 47 48 49 50 51 52 53 54 55
  56 57 58 59
  node 0 size: 32394 MB
  node 0 free: 27218 MB
  node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66
  67 68 69 70 71 72 73 74
  node 1 size: 32768 MB
  node 1 free: 30014 MB
  node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 75 76 77 78 79 80 81
  82 83 84 85 86 87 88 89
  node 2 size: 16384 MB
  node 2 free: 16384 MB

But "cpu to node" and "node to cpu" links were not changed as follows:

  $ ls /sys/devices/system/cpu/cpu30/|grep node
  node0
  $ ls /sys/devices/system/node/node0/|grep cpu30
  cpu30

"numactl --hardware" shows that cpu30 belongs to node 2.  But sysfs
links does not change.

This patch changes "cpu to node" and "node to cpu" links when node
number changed by onlining CPU.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:39 -07:00
Tang Chen
6056d619a8 mm: Remove unused parameter of pages_correctly_reserved()
nr_pages is not used in pages_correctly_reserved().
So remove it.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:38 -07:00
David Rientjes
4edd7ceff0 mm, hotplug: avoid compiling memory hotremove functions when disabled
__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE.  PowerPC
pseries will return -EOPNOTSUPP if unsupported.

Adding an #ifdef causes several other functions it depends on to also
become unnecessary, which saves in .text when disabled (it's disabled in
most defconfigs besides powerpc, including x86).  remove_memory_block()
becomes static since it is not referenced outside of
drivers/base/memory.c.

Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
and disabled.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:37 -07:00
Andrew Morton
6e259e7dc4 drivers/base/node.c: switch to register_hotmemory_notifier()
Squishes a warning which my change to hotplug_memory_notifier() added.

I want to keep that warning, because it is punishment for failnig to check
the hotplug_memory_notifier() return value.

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:36 -07:00
Mark Brown
38a817965d Merge remote-tracking branch 'regmap/topic/range' into regmap-next 2013-04-16 16:05:50 +01:00
Mark Brown
7f47db4006 Merge remote-tracking branch 'regmap/topic/irq' into regmap-next 2013-04-16 16:05:48 +01:00
Mark Brown
5b3b448475 Merge remote-tracking branch 'regmap/topic/cache' into regmap-next 2013-04-16 16:05:46 +01:00
Mark Brown
b508c80c55 Merge remote-tracking branch 'regmap/topic/async' into regmap-next 2013-04-16 16:05:44 +01:00
Stephen Warren
5a08d15602 regmap: don't corrupt work buffer in _regmap_raw_write()
_regmap_raw_write() contains code to call regcache_write() to write
values to the cache. That code calls memcpy() to copy the value data to
the start of the work_buf. However, at least when _regmap_raw_write() is
called from _regmap_bus_raw_write(), the value data is in the work_buf,
and this memcpy() operation may over-write part of that value data,
depending on the value of reg_bytes + pad_bytes. At least when using
reg_bytes==1 and pad_bytes==0, corruption of the value data does occur.

To solve this, remove the memcpy() operation, and modify the subsequent
.parse_val() call to parse the original value buffer directly.

At least in the case of 8-bit register address and 16-bit values, and
writes of single registers at a time, this memcpy-then-parse combination
used to cancel each-other out; for a work-buffer containing xx 89 03,
the memcpy changed it to 89 03 03, and the parse_val changed it back to
89 89 03, thus leaving the value uncorrupted. This appears completely
accidental though. Since commit 8a819ff "regmap: core: Split out in
place value parsing", .parse_val only returns the parsed value, and does
not modify the buffer, and hence does not (accidentally) undo the
corruption caused by memcpy(). This caused bogus values to get written
to HW, thus preventing e.g. audio playback on systems with a WM8903
CODEC. This patch fixes that.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-04-16 16:05:03 +01:00
Mark Brown
60f7110e36 Merge tag 'v3.9-rc7' into regmap-cache
Linux 3.9-rc7
2013-04-16 16:02:41 +01:00
Greg Kroah-Hartman
0d1d392f01 Merge 3.9-rc7 into driver-core-next
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-14 18:37:05 -07:00
Ulf Hansson
fa180eb448 PM / Runtime: Idle devices asynchronously after probe|release
Putting devices into idle|suspend in a synchronous manner means we are
waiting for each device to become idle|suspended before the probe|release
is fully done.

This patch switch to use the asynchronous runtime PM API:s instead and
thus improves the parallelism since we can move on and handle the next
device in queue in an earlier phase.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-11 12:42:52 -07:00
Greg Kroah-Hartman
4e4098a3e0 driver core: handle user namespaces properly with the uid/gid devtmpfs change
Now that devtmpfs is caring about uid/gid, we need to use the correct
internal types so users who have USER_NS enabled will have things work
properly for them.

Thanks to Eric for pointing this out, and the patch review.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-11 11:43:29 -07:00
Ming Lei
d81c8d19da driver core: devtmpfs: fix compile failure with CONFIG_UIDGID_STRICT_TYPE_CHECKS
If CONFIG_UIDGID_STRICT_TYPE_CHECKS is enalbed, the below compile
failure will be triggered:

	drivers/base/devtmpfs.c: In function 'handle_create':
	drivers/base/devtmpfs.c:214:19: error: incompatible types when assigning to type 'kuid_t' from type 'uid_t'
	drivers/base/devtmpfs.c:215:19: error: incompatible types when assigning to type 'kgid_t' from type 'gid_t'
	make[2]: *** [drivers/base/devtmpfs.o] Error 1

This patch fixes the compile failure.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-10 09:36:06 -07:00
Greg Kroah-Hartman
c3a3042090 devtmpfs: add base.h include
This fixes a sparse warning, and is a good idea given that the
devtmpfs_init() prototype is in this file.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-10 09:00:15 -07:00
Mark Brown
51a246aa5c regmap: Back out work buffer fix
This reverts commit bc8ce4 (regmap: don't corrupt work buffer in
_regmap_raw_write()) since it turns out that it can cause issues when
taken in isolation from the other changes in -next that lead to its
discovery.  On the basis that nobody noticed the problems for quite some
time without that subsequent work let's drop it from v3.9.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-04-09 18:03:25 +01:00
Kay Sievers
3c2670e651 driver core: add uid and gid to devtmpfs
Some drivers want to tell userspace what uid and gid should be used for
their device nodes, so allow that information to percolate through the
driver core to userspace in order to make this happen.  This means that
some systems (i.e.  Android and friends) will not need to even run a
udev-like daemon for their device node manager and can just rely in
devtmpfs fully, reducing their footprint even more.

Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-08 08:21:48 -07:00
Stratos Karafotis
9659293c17 regmap: cache: Fix format specifier in dev_dbg
Fix format specifier in dev_dbg and suppress the following warning

drivers/base/regmap/regcache.c: In function
‘regcache_sync_block_raw_flush’:
drivers/base/regmap/regcache.c:593:2: warning: format ‘%d’ expects
argument of type ‘int’, but argument 4 has type ‘size_t’ [-Wformat]

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-04-05 11:28:12 +01:00
Sachin Kamat
f52687afb8 regmap: cache: Make regcache_sync_block_raw static
regcache_sync_block_raw is used only in this file. Hence make it static.
Silences the following warning:
drivers/base/regmap/regcache.c:608:5: warning:
symbol 'regcache_sync_block_raw' was not declared. Should it be static?

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-04-05 11:28:12 +01:00
Linus Torvalds
d08d528dc1 Merge tag 'pm+acpi-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:

 - Revert of a recent cpuidle change that caused Nehalem machines to
   hang on boot from Alex Shi.

 - USB power management fix addressing a crash in the port device
   object's release routine from Rafael J Wysocki.

 - Device PM QoS fix for a potential deadlock related to sysfs interface
   from Rafael J Wysocki.

 - Fix for a cpufreq crash when the /cpus Device Tree node is missing
   from Paolo Pisati.

 - Fix for a build issue on ia64 related to the Boot Graphics Resource
   Table (BGRT) from Tony Luck.

 - Two fixes for ACPI handles being set incorrectly for device objects
   that don't correspond to any ACPI namespace nodes in the I2C and SPI
   subsystems from Rafael J Wysocki.

 - Fix for compiler warnings related to CONFIG_PM_DEVFREQ being unset
   from Rajagopal Venkat.

 - Fix for a symbol definition typo in cpufreq_governor.h from Borislav
   Petkov.

* tag 'pm+acpi-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / BGRT: Don't let users configure BGRT on non X86 systems
  cpuidle / ACPI: recover percpu ACPI processor cstate
  ACPI / I2C: Use parent's ACPI_HANDLE() in acpi_i2c_register_devices()
  cpufreq: Correct header guards typo
  ACPI / SPI: Use parent's ACPI_HANDLE() in acpi_register_spi_devices()
  cpufreq: check OF node /cpus presence before dereferencing it
  PM / devfreq: Fix compiler warnings for CONFIG_PM_DEVFREQ unset
  PM / QoS: Avoid possible deadlock related to sysfs access
  USB / PM: Don't try to hide PM QoS flags from usb_port_device_release()
2013-04-04 15:56:28 -07:00
Arnd Bergmann
bcfb87fb75 sysfs: fix crash_notes_size build warning
commit eca4549f57 "sysfs: Add crash_notes_size to export percpu
note size" adds a printk that outputs a size_t value as %lu
when it should be %zu, resulting in this warning.

drivers/base/cpu.c: In function 'show_crash_notes_size':
drivers/base/cpu.c:142:2: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat=]

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-03 11:09:03 -07:00
Tejun Heo
229641a6f1 Merge tag 'v3.9-rc5' into wq/for-3.10
Writeback conversion to workqueue will be based on top of wq/for-3.10
branch to take advantage of custom attrs and NUMA support for unbound
workqueues.  Mainline currently contains two commits which result in
non-trivial merge conflicts with wq/for-3.10 and because
block/for-3.10/core is based on v3.9-rc3 which contains one of the
conflicting commits, we need a pre-merge-window merge anyway.  Let's
pull v3.9-rc5 into wq/for-3.10 so that the block tree doesn't suffer
from workqueue merge conflicts.

The two conflicts and their resolutions:

* e68035fb65 ("workqueue: convert to idr_alloc()") in mainline changes
  worker_pool_assign_id() to use idr_alloc() instead of the old idr
  interface.  worker_pool_assign_id() goes through multiple locking
  changes in wq/for-3.10 causing the following conflict.

  static int worker_pool_assign_id(struct worker_pool *pool)
  {
	  int ret;

  <<<<<<< HEAD
	  lockdep_assert_held(&wq_pool_mutex);

	  do {
		  if (!idr_pre_get(&worker_pool_idr, GFP_KERNEL))
			  return -ENOMEM;
		  ret = idr_get_new(&worker_pool_idr, pool, &pool->id);
	  } while (ret == -EAGAIN);
  =======
	  mutex_lock(&worker_pool_idr_mutex);
	  ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL);
	  if (ret >= 0)
		  pool->id = ret;
	  mutex_unlock(&worker_pool_idr_mutex);
  >>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89

	  return ret < 0 ? ret : 0;
  }

  We want locking from the former and idr_alloc() usage from the
  latter, which can be combined to the following.

  static int worker_pool_assign_id(struct worker_pool *pool)
  {
	  int ret;

	  lockdep_assert_held(&wq_pool_mutex);

	  ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL);
	  if (ret >= 0) {
		  pool->id = ret;
		  return 0;
	  }
	  return ret;
   }

* eb2834285c ("workqueue: fix possible pool stall bug in
  wq_unbind_fn()") updated wq_unbind_fn() such that it has single
  larger for_each_std_worker_pool() loop instead of two separate loops
  with a schedule() call inbetween.  wq/for-3.10 renamed
  pool->assoc_mutex to pool->manager_mutex causing the following
  conflict (earlier function body and comments omitted for brevity).

  static void wq_unbind_fn(struct work_struct *work)
  {
  ...
		  spin_unlock_irq(&pool->lock);
  <<<<<<< HEAD
		  mutex_unlock(&pool->manager_mutex);
	  }
  =======
		  mutex_unlock(&pool->assoc_mutex);
  >>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89

		  schedule();

  <<<<<<< HEAD
	  for_each_cpu_worker_pool(pool, cpu)
  =======
  >>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89
		  atomic_set(&pool->nr_running, 0);

		  spin_lock_irq(&pool->lock);
		  wake_up_worker(pool);
		  spin_unlock_irq(&pool->lock);
	  }
  }

  The resolution is mostly trivial.  We want the control flow of the
  latter with the rename of the former.

  static void wq_unbind_fn(struct work_struct *work)
  {
  ...
		  spin_unlock_irq(&pool->lock);
		  mutex_unlock(&pool->manager_mutex);

		  schedule();

		  atomic_set(&pool->nr_running, 0);

		  spin_lock_irq(&pool->lock);
		  wake_up_worker(pool);
		  spin_unlock_irq(&pool->lock);
	  }
  }

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-04-01 18:45:36 -07:00