When the requested range is outside of the root range the logic in
__reserve_region_with_split will cause an infinite recursion which will
overflow the stack as seen in the warning bellow.
This particular stack overflow was caused by requesting the
(100000000-107ffffff) range while the root range was (0-ffffffff). In
this case __request_resource would return the whole root range as
conflict range (i.e. 0-ffffffff). Then, the logic in
__reserve_region_with_split would continue the recursion requesting the
new range as (conflict->end+1, end) which incidentally in this case
equals the originally requested range.
This patch aborts looking for an usable range when the request does not
intersect with the root range. When the request partially overlaps with
the root range, it ajust the request to fall in the root range and then
continues with the new request.
When the request is modified or aborted errors and a stack trace are
logged to allow catching the errors in the upper layers.
[ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89()
[ 5.975150] Modules linked in:
[ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46
[ 5.985324] Call Trace:
[ 5.987759] [<c1039dfc>] ? console_unlock+0x17b/0x18d
[ 5.992891] [<c1039620>] warn_slowpath_common+0x48/0x5d
[ 5.998194] [<c1031758>] ? sub_preempt_count+0x63/0x89
[ 6.003412] [<c1039644>] warn_slowpath_null+0xf/0x13
[ 6.008453] [<c1031758>] sub_preempt_count+0x63/0x89
[ 6.013499] [<c14d60c4>] _raw_spin_unlock+0x27/0x3f
[ 6.018453] [<c10c6349>] add_partial+0x36/0x3b
[ 6.022973] [<c10c7c0a>] deactivate_slab+0x96/0xb4
[ 6.027842] [<c14cf9d9>] __slab_alloc.isra.54.constprop.63+0x204/0x241
[ 6.034456] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38
[ 6.039842] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38
[ 6.045232] [<c10c7dc9>] kmem_cache_alloc_trace+0x51/0xb0
[ 6.050710] [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38
[ 6.056100] [<c103f78f>] kzalloc.constprop.5+0x29/0x38
[ 6.061320] [<c17b45e9>] __reserve_region_with_split+0x1c/0xd1
[ 6.067230] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1
...
[ 7.179057] [<c17b4693>] __reserve_region_with_split+0xc6/0xd1
[ 7.184970] [<c17b4779>] reserve_region_with_split+0x30/0x42
[ 7.190709] [<c17a8ebf>] e820_reserve_resources_late+0xd1/0xe9
[ 7.196623] [<c17c9526>] pcibios_resource_survey+0x23/0x2a
[ 7.202184] [<c17cad8a>] pcibios_init+0x23/0x35
[ 7.206789] [<c17ca574>] pci_subsys_init+0x3f/0x44
[ 7.211659] [<c1002088>] do_one_initcall+0x72/0x122
[ 7.216615] [<c17ca535>] ? pci_legacy_init+0x3d/0x3d
[ 7.221659] [<c17a27ff>] kernel_init+0xa6/0x118
[ 7.226265] [<c17a2759>] ? start_kernel+0x334/0x334
[ 7.231223] [<c14d7482>] kernel_thread_helper+0x6/0x10
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
register_sysctl_table() is a strange function, as it makes internal
allocations (a header) to register a sysctl_table. This header is a
handle to the table that is created, and can be used to unregister the
table. But if the table is permanent and never unregistered, the header
acts the same as a static variable.
Unfortunately, this allocation of memory that is never expected to be
freed fools kmemleak in thinking that we have leaked memory. For those
sysctl tables that are never unregistered, and have no pointer referencing
them, kmemleak will think that these are memory leaks:
unreferenced object 0xffff880079fb9d40 (size 192):
comm "swapper/0", pid 0, jiffies 4294667316 (age 12614.152s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8146b590>] kmemleak_alloc+0x73/0x98
[<ffffffff8110a935>] kmemleak_alloc_recursive.constprop.42+0x16/0x18
[<ffffffff8110b852>] __kmalloc+0x107/0x153
[<ffffffff8116fa72>] kzalloc.constprop.8+0xe/0x10
[<ffffffff811703c9>] __register_sysctl_paths+0xe1/0x160
[<ffffffff81170463>] register_sysctl_paths+0x1b/0x1d
[<ffffffff8117047d>] register_sysctl_table+0x18/0x1a
[<ffffffff81afb0a1>] sysctl_init+0x10/0x14
[<ffffffff81b05a6f>] proc_sys_init+0x2f/0x31
[<ffffffff81b0584c>] proc_root_init+0xa5/0xa7
[<ffffffff81ae5b7e>] start_kernel+0x3d0/0x40a
[<ffffffff81ae52a7>] x86_64_start_reservations+0xae/0xb2
[<ffffffff81ae53ad>] x86_64_start_kernel+0x102/0x111
[<ffffffffffffffff>] 0xffffffffffffffff
The sysctl_base_table used by sysctl itself is one such instance that
registers the table to never be unregistered.
Use kmemleak_not_leak() to suppress the kmemleak false positive.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The last line of vmcoreinfo note does not end with \n. Parsing all the
lines in note becomes easier if all lines end with \n instead of trying to
special case the last line.
I know at least one tool, vmcore-dmesg in kexec-tools tree which made the
assumption that all lines end with \n. I think it is a good idea to fix
it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function dup_task() may fail at the following function calls in the
following order.
0) alloc_task_struct_node()
1) alloc_thread_info_node()
2) arch_dup_task_struct()
Error by 0) is not a matter, it can just return. But error by 1) requires
releasing task_struct allocated by 0) before it returns. Likewise, error
by 2) requires releasing task_struct and thread_info allocated by 0) and
1).
The existing error handling calls free_task_struct() and
free_thread_info() which do not only release task_struct and thread_info,
but also call architecture specific arch_release_task_struct() and
arch_release_thread_info().
The problem is that task_struct and thread_info are not fully initialized
yet at this point, but arch_release_task_struct() and
arch_release_thread_info() are called with them.
For example, x86 defines its own arch_release_task_struct() that releases
a task_xstate. If alloc_thread_info_node() fails in dup_task(),
arch_release_task_struct() is called with task_struct which is just
allocated and filled with garbage in this error handling.
This actually happened with tools/testing/fault-injection/failcmd.sh
# env FAILCMD_TYPE=fail_page_alloc \
./tools/testing/fault-injection/failcmd.sh --times=100 \
--min-order=0 --ignore-gfp-wait=0 \
-- make -C tools/testing/selftests/ run_tests
In order to fix this issue, make free_{task_struct,thread_info}() not to
call arch_release_{task_struct,thread_info}() and call
arch_release_{task_struct,thread_info}() implicitly where needed.
Default arch_release_task_struct() and arch_release_thread_info() are
defined as empty by default. So this change only affects the
architectures which implement their own arch_release_task_struct() or
arch_release_thread_info() as listed below.
arch_release_task_struct(): x86, sh
arch_release_thread_info(): mn10300, tile
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Salman Qazi <sqazi@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current code can be replaced by vma_pages(). So use it to simplify
the code.
[akpm@linux-foundation.org: initialise `len' at its definition site]
Signed-off-by: Huang Shijie <shijie8@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The system deadlocks (at least since 2.6.10) when
call_usermodehelper(UMH_WAIT_EXEC) request triggers
call_usermodehelper(UMH_WAIT_PROC) request.
This is because "khelper thread is waiting for the worker thread at
wait_for_completion() in do_fork() since the worker thread was created
with CLONE_VFORK flag" and "the worker thread cannot call complete()
because do_execve() is blocked at UMH_WAIT_PROC request" and "the khelper
thread cannot start processing UMH_WAIT_PROC request because the khelper
thread is waiting for the worker thread at wait_for_completion() in
do_fork()".
The easiest example to observe this deadlock is to use a corrupted
/sbin/hotplug binary (like shown below).
# : > /tmp/dummy
# chmod 755 /tmp/dummy
# echo /tmp/dummy > /proc/sys/kernel/hotplug
# modprobe whatever
call_usermodehelper("/tmp/dummy", UMH_WAIT_EXEC) is called from
kobject_uevent_env() in lib/kobject_uevent.c upon loading/unloading a
module. do_execve("/tmp/dummy") triggers a call to
request_module("binfmt-0000") from search_binary_handler() which in turn
calls call_usermodehelper(UMH_WAIT_PROC).
In order to avoid deadlock, as a for-now and easy-to-backport solution, do
not try to call wait_for_completion() in call_usermodehelper_exec() if the
worker thread was created by khelper thread with CLONE_VFORK flag. Future
and fundamental solution might be replacing singleton khelper thread with
some workqueue so that recursive calls up to max_active dependency loop
can be handled without deadlock.
[akpm@linux-foundation.org: add comment to kmod_thread_locker]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vprintk_emit() prefix parsing should only be done for internal kernel
messages. This allows existing behavior to be kept in all cases.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current form of a KERN_<LEVEL> is "<.>".
Add printk_get_level and printk_skip_level functions to handle these
formats.
These functions centralize tests of KERN_<LEVEL> so a future modification
can change the KERN_<LEVEL> style and shorten the number of bytes consumed
by these headers.
[akpm@linux-foundation.org: fix build error and warning]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On the suspend/resume path the boot CPU does not go though an
offline->online transition. This breaks the NMI detector post-resume
since it depends on PMU state that is lost when the system gets
suspended.
Fix this by forcing a CPU offline->online transition for the lockup
detector on the boot CPU during resume.
To provide more context, we enable NMI watchdog on Chrome OS. We have
seen several reports of systems freezing up completely which indicated
that the NMI watchdog was not firing for some reason.
Debugging further, we found a simple way of repro'ing system freezes --
issuing the command 'tasket 1 sh -c "echo nmilockup > /proc/breakme"'
after the system has been suspended/resumed one or more times.
With this patch in place, the system freeze result in panics, as
expected.
These panics provide a nice stack trace for us to debug the actual issue
causing the freeze.
[akpm@linux-foundation.org: fiddle with code comment]
[akpm@linux-foundation.org: make lockup_detector_bootcpu_resume() conditional on CONFIG_SUSPEND]
[akpm@linux-foundation.org: fix section errors]
Signed-off-by: Sameer Nanda <snanda@chromium.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
panic_lock is meant to ensure that panic processing takes place only on
one cpu; if any of the other cpus encounter a panic, they will spin
waiting to be shut down.
However, this causes a regression in this scenario:
1. Cpu 0 encounters a panic and acquires the panic_lock
and proceeds with the panic processing.
2. There is an interrupt on cpu 0 that also encounters
an error condition and invokes panic.
3. This second invocation fails to acquire the panic_lock
and enters the infinite while loop in panic_smp_self_stop.
Thus all panic processing is stopped, and the cpu is stuck for eternity
in the while(1) inside panic_smp_self_stop.
To address this, disable local interrupts with local_irq_disable before
acquiring the panic_lock. This will prevent interrupt handlers from
executing during the panic processing, thus avoiding this particular
problem.
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just setting the "error" to error number is enough on failure and It
doesn't require to set "error" variable to zero in each switch case,
since it was already initialized with zero. And also removed return 0
in switch case with break statement
Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recently, glibc made a change to suppress sign-conversion warnings in
FD_SET (glibc commit ceb9e56b3d1). This uncovered an issue with the
kernel's definition of __NFDBITS if applications #include
<linux/types.h> after including <sys/select.h>. A build failure would
be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
flags to gcc.
It was suggested that the kernel should either match the glibc
definition of __NFDBITS or remove that entirely. The current in-kernel
uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
uses of the related __FDELT and __FDMASK defines. Given that, we'll
continue the cleanup that was started with commit 8b3d1cda4f
("posix_types: Remove fd_set macros") and drop the remaining unused
macros.
Additionally, linux/time.h has similar macros defined that expand to
nothing so we'll remove those at the same time.
Reported-by: Jeff Law <law@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
[ .. and fix up whitespace as per akpm ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull scheduler changes from Ingo Molnar:
"The biggest change is a performance improvement on SMP systems:
| 4 socket 40 core + SMT Westmere box, single 30 sec tbench
| runs, higher is better:
|
| clients 1 2 4 8 16 32 64 128
|..........................................................................
| pre 30 41 118 645 3769 6214 12233 14312
| post 299 603 1211 2418 4697 6847 11606 14557
|
| A nice increase in performance.
which speedup is particularly noticeable on heavily interacting
few-tasks workloads, so the changes should help desktop-style Xorg
workloads and interactivity as well, on multi-core CPUs.
There are also cpuset suspend behavior fixes/restructuring and various
smaller tweaks."
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix race in task_group()
sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task
sched: Reset loop counters if all tasks are pinned and we need to redo load balance
sched: Reorder 'struct lb_env' members to reduce its size
sched: Improve scalability via 'CPU buddies', which withstand random perturbations
cpusets: Remove/update outdated comments
cpusets, hotplug: Restructure functions that are invoked during hotplug
cpusets, hotplug: Implement cpuset tree traversal in a helper function
CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume
sched/x86: Remove broken power estimation
Pull driver core changes from Greg Kroah-Hartman:
"Here's the big driver core pull request for 3.6-rc1.
Unlike 3.5, this kernel should be a lot tamer, with the printk changes
now settled down. All we have here is some extcon driver updates, w1
driver updates, a few printk cleanups that weren't needed for 3.5, but
are good to have now, and some other minor fixes/changes in the driver
core.
All of these have been in the linux-next releases for a while now.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
printk: Export struct log size and member offsets through vmcoreinfo
Drivers: hv: Change the hex constant to a decimal constant
driver core: don't trigger uevent after failure
extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device
sysfs: fail dentry revalidation after namespace change fix
sysfs: fail dentry revalidation after namespace change
extcon: spelling of detach in function doc
extcon: arizona: Stop microphone detection if we give up on it
extcon: arizona: Update cable reporting calls and split headset
PM / Runtime: Do not increment device usage counts before probing
kmsg - do not flush partial lines when the console is busy
kmsg - export "continuation record" flag to /dev/kmsg
kmsg - avoid warning for CONFIG_PRINTK=n compilations
kmsg - properly print over-long continuation lines
driver-core: Use kobj_to_dev instead of re-implementing it
driver-core: Move kobj_to_dev from genhd.h to device.h
driver core: Move deferred devices to the end of dpm_list before probing
driver core: move uevent call to driver_register
driver core: fix shutdown races with probe/remove(v3)
Extcon: Arizona: Add driver for Wolfson Arizona class devices
...
Pull staging tree patches from Greg Kroah-Hartman:
"Here's the big staging tree merge for the 3.6-rc1 merge window.
There are some patches in here outside of drivers/staging/, notibly
the iio code (which is still stradeling the staging / not staging
boundry), the pstore code, and the tracing code. All of these have
gotten acks from the various subsystem maintainers to be included in
this tree. The pstore and tracing patches are related, and are coming
here as they replace one of the android staging drivers.
Otherwise, the normal staging mess. Lots of cleanups and a few new
drivers (some iio drivers, and the large csr wireless driver
abomination.)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fixed up trivial conflicts in drivers/staging/comedi/drivers/s626.h and
drivers/staging/gdm72xx/netlink_k.c
* tag 'staging-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1108 commits)
staging: csr: delete a bunch of unused library functions
staging: csr: remove csr_utf16.c
staging: csr: remove csr_pmem.h
staging: csr: remove CsrPmemAlloc
staging: csr: remove CsrPmemFree()
staging: csr: remove CsrMemAllocDma()
staging: csr: remove CsrMemCalloc()
staging: csr: remove CsrMemAlloc()
staging: csr: remove CsrMemFree() and CsrMemFreeDma()
staging: csr: remove csr_util.h
staging: csr: remove CsrOffSetOf()
stating: csr: remove unneeded #includes in csr_util.c
staging: csr: make CsrUInt16ToHex static
staging: csr: remove CsrMemCpy()
staging: csr: remove CsrStrLen()
staging: csr: remove CsrVsnprintf()
staging: csr: remove CsrStrDup
staging: csr: remove CsrStrChr()
staging: csr: remove CsrStrNCmp
staging: csr: remove CsrStrCmp
...
Pull first round of SCSI updates from James Bottomley:
"The most important feature of this patch set is the new async
infrastructure that makes sure async_synchronize_full() synchronizes
all domains and allows us to remove all the hacks (like having
scsi_complete_async_scans() in the device base code) and means that
the async infrastructure will "just work" in future.
The rest is assorted driver updates (aacraid, bnx2fc, virto-scsi,
megaraid, bfa, lpfc, qla2xxx, qla4xxx) plus a lot of infrastructure
work in sas and FC.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (97 commits)
[SCSI] Revert "[SCSI] fix async probe regression"
[SCSI] cleanup usages of scsi_complete_async_scans
[SCSI] queue async scan work to an async_schedule domain
[SCSI] async: make async_synchronize_full() flush all work regardless of domain
[SCSI] async: introduce 'async_domain' type
[SCSI] bfa: Fix to set correct return error codes and misc cleanup.
[SCSI] aacraid: Series 7 Async. (performance) mode support
[SCSI] aha152x: Allow use on 64bit systems
[SCSI] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
[SCSI] bfa: squelch lockdep complaint with a spin_lock_init
[SCSI] qla2xxx: remove unnecessary reads of PCI_CAP_ID_EXP
[SCSI] qla4xxx: remove unnecessary read of PCI_CAP_ID_EXP
[SCSI] ufs: fix incorrect return value about SUCCESS and FAILED
[SCSI] ufs: reverse the ufshcd_is_device_present logic
[SCSI] ufs: use module_pci_driver
[SCSI] usb-storage: update usb devices for write cache quirk in quirk list.
[SCSI] usb-storage: add support for write cache quirk
[SCSI] set to WCE if usb cache quirk is present.
[SCSI] virtio-scsi: hotplug support for virtio-scsi
[SCSI] virtio-scsi: split scatterlist per target
...
Pull cgroup changes from Tejun Heo:
"Nothing too interesting. A minor bug fix and some cleanups."
* 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Update remount documentation
cgroup: cgroup_rm_files() was calling simple_unlink() with the wrong inode
cgroup: Remove populate() documentation
cgroup: remove hierarchy_mutex
Pull workqueue changes from Tejun Heo:
"There are three major changes.
- WQ_HIGHPRI has been reimplemented so that high priority work items
are served by worker threads with -20 nice value from dedicated
highpri worker pools.
- CPU hotplug support has been reimplemented such that idle workers
are kept across CPU hotplug events. This makes CPU hotplug cheaper
(for PM) and makes the code simpler.
- flush_kthread_work() has been reimplemented so that a work item can
be freed while executing. This removes an annoying behavior
difference between kthread_worker and workqueue."
* 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix spurious CPU locality WARN from process_one_work()
kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed
kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation
workqueue: simplify CPU hotplug code
workqueue: remove CPU offline trustee
workqueue: don't butcher idle workers on an offline CPU
workqueue: reimplement CPU online rebinding to handle idle workers
workqueue: drop @bind from create_worker()
workqueue: use mutex for global_cwq manager exclusion
workqueue: ROGUE workers are UNBOUND workers
workqueue: drop CPU_DYING notifier operation
workqueue: perform cpu down operations from low priority cpu_notifier()
workqueue: reimplement WQ_HIGHPRI using a separate worker_pool
workqueue: introduce NR_WORKER_POOLS and for_each_worker_pool()
workqueue: separate out worker_pool flags
workqueue: use @pool instead of @gcwq or @cpu where applicable
workqueue: factor out worker_pool from global_cwq
workqueue: don't use WQ_HIGHPRI for unbound workqueues
Pull PCI changes from Bjorn Helgaas:
"Host bridge hotplug:
- Add MMCONFIG support for hot-added host bridges (Jiang Liu)
Device hotplug:
- Move fixups from __init to __devinit (Sebastian Andrzej Siewior)
- Call FINAL fixups for hot-added devices, too (Myron Stowe)
- Factor out generic code for P2P bridge hot-add (Yinghai Lu)
- Remove all functions in a slot, not just those with _EJx (Amos
Kong)
Dynamic resource management:
- Track bus number allocation (struct resource tree per domain)
(Yinghai Lu)
- Make P2P bridge 1K I/O windows work with resource reassignment
(Bjorn Helgaas, Yinghai Lu)
- Disable decoding while updating 64-bit BARs (Bjorn Helgaas)
Power management:
- Add PCIe runtime D3cold support (Huang Ying)
Virtualization:
- Add VFIO infrastructure (ACS, DMA source ID quirks) (Alex
Williamson)
- Add quirks for devices with broken INTx masking (Jan Kiszka)
Miscellaneous:
- Fix some PCI Express capability version issues (Myron Stowe)
- Factor out some arch code with a weak, generic, pcibios_setup()
(Myron Stowe)"
* tag 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (122 commits)
PCI: hotplug: ensure a consistent return value in error case
PCI: fix undefined reference to 'pci_fixup_final_inited'
PCI: build resource code for M68K architecture
PCI: pciehp: remove unused pciehp_get_max_lnk_width(), pciehp_get_cur_lnk_width()
PCI: reorder __pci_assign_resource() (no change)
PCI: fix truncation of resource size to 32 bits
PCI: acpiphp: merge acpiphp_debug and debug
PCI: acpiphp: remove unused res_lock
sparc/PCI: replace pci_cfg_fake_ranges() with pci_read_bridge_bases()
PCI: call final fixups hot-added devices
PCI: move final fixups from __init to __devinit
x86/PCI: move final fixups from __init to __devinit
MIPS/PCI: move final fixups from __init to __devinit
PCI: support sizing P2P bridge I/O windows with 1K granularity
PCI: reimplement P2P bridge 1K I/O windows (Intel P64H2)
PCI: disable MEM decoding while updating 64-bit MEM BARs
PCI: leave MEM and IO decoding disabled during 64-bit BAR sizing, too
PCI: never discard enable/suspend/resume_early/resume fixups
PCI: release temporary reference in __nv_msi_ht_cap_quirk()
PCI: restructure 'pci_do_fixups()'
...