In 65f63384 "xen: improve error handling in do_suspend" I said:
- xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
nested in the obvious way.
and changed the ordering of the calls as so:
BEFORE AFTER
xs_suspend dpm_suspend_noirq
dpm_suspend_noirq xs_suspend
*SUSPEND* *SUSPEND*
dpm_resume_noirq dpm_resume_noirq
xs_resume xs_resume
Clearly this is not an improvement and I was talking rubbish.
In particular the new ordering is susceptible to a hang if a xenstore write is
in progress at the point at which the suspend kicks in. When the suspend
process calls xs_suspend it tries to take the request_mutex but if a write is
in progress it could be looping in xenbus_xs.c:read_reply() waiting for
something to arrive on &xs_state.reply_list while holding the request_mutex
(taken in the caller of read_reply).
However if we have done dpm_suspend_noirq before xs_suspend then we won't get
any more xenstore interrupts and process_msg() will never be woken up to add
anything to the reply_list.
Fix this by calling xs_suspend before dpm_suspend_noirq. If dpm_suspend_noirq
fails then make sure we go through the xs_suspend_cancel() code path.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
xen: try harder to balloon up under memory pressure.
Xen balloon: fix totalram_pages counting.
xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
xen: improve error handling in do_suspend.
xen: don't leak IRQs over suspend/resume.
xen: call clock resume notifier on all CPUs
xen: use iret for return from 64b kernel to 32b usermode
xen: don't call dpm_resume_noirq() with interrupts disabled.
xen: register runstate info for boot CPU early
xen: register runstate on secondary CPUs
xen: register timer interrupt with IRQF_TIMER
xen: correctly restore pfn_to_mfn_list_list after resume
xen: restore runstate_info even if !have_vcpu_info_placement
xen: re-register runstate area earlier on resume.
xen: wait up to 5 minutes for device connetion
xen: improvement to wait_for_devices()
xen: fix is_disconnected_device/exists_disconnected_device
xen/xenbus: make DEVICE_ATTR()s static
Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).
However it is possible that ballooning has in fact failed due to
memory pressure in the host and therefore it is desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up and therefore there is temporary memory
pressure while things stabilise. You would not expect a well behaved
toolstack to ask a domain to balloon to more than its allocation nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.
This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.
Also if we partially succeed in increasing the reservation
(i.e. receive less pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Change totalram_pages when a single page is added/removed to the
ballooned list. This avoid totalram_pages to be set erroneously to
max_pfn at boot.
Signed-off-by: Gianluca Guida <gianluca.guida@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
I have observed cases where the implicit stop_machine_destroy() done by
stop_machine() hangs while destroying the workqueues, specifically in
kthread_stop(). This seems to be because timer ticks are not restarted
until after stop_machine() returns.
Fortunately stop_machine provides a facility to pre-create/post-destroy
the workqueues so use this to ensure that workqueues are only destroyed
after everything is really up and running again.
I only actually observed this failure with 2.6.30. It seems that newer
kernels are somehow more robust against doing kthread_stop() without timer
interrupts (I tried some backports of some likely looking candidates but
did not track down the commit which added this robustness). However this
change seems like a reasonable belt&braces thing to do.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
The existing error handling has a few issues:
- If freeze_processes() fails it exits with shutting_down = SHUTDOWN_SUSPEND.
- If dpm_suspend_noirq() fails it exits without resuming xenbus.
- If stop_machine() fails it exits without resuming xenbus or calling
dpm_resume_end().
- xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
nested in the obvious way.
Fix by ensuring each failure case goto's the correct label. Treat a failure of
stop_machine() as a cancelled suspend in order to follow the correct resume
path.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
On resume irq_info[*].evtchn is reset to 0 since event channel mappings
are not preserved over suspend/resume. The other contents of irq_info
is preserved to allow rebind_evtchn_irq() to function.
However when a device resumes it will try to unbind from the
previous IRQ (e.g. blkfront goes blkfront_resume() -> blkif_free() ->
unbind_from_irqhandler() -> unbind_from_irq()). This will fail due to the
check for VALID_EVTCHN in unbind_from_irq() and the IRQ is leaked. The
device will then continue to resume and allocate a new IRQ, eventually
leading to find_unbound_irq() panic()ing.
Fix this by changing unbind_from_irq() to handle teardown of interrupts
which have type!=IRQT_UNBOUND but are not currently bound to a specific
event channel.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
dpm_resume_noirq() takes a mutex, so it can't be called from a no-interrupt
context. Don't call it from within the stop-machine function, but just
afterwards, since we're resuming anyway, regardless of what happened.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Increases the device timeout from 10s to 5 minutes, giving the user a
visual indication during that time in case there are problems. The patch
is a backport of changesets 144 and 150 in the Xenbits tree.
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When printing a warning about a timed-out device, print the
current state of both ends of the device connection (i.e., backend as
well as frontend). This backports half of changeset 146 from the
Xenbits tree.
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
The logic of is_disconnected_device/exists_disconnected_device is wrong
in that they are used to test whether a device is trying to connect (i.e.
connecting). For this reason the patch fixes them to not consider a
Closing or Closed device to be connecting. At the same time the patch
also renames the functions according to what they really do; you could
say a closed device is "disconnected" (the old name), but not "connecting"
(the new name).
This patch is a backport of changeset 909 from the Xenbits tree.
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Move xen_domain and related tests out of asm-x86 to xen/xen.h so they
can be included whenever they are necessary.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
trivial: fix typo in aic7xxx comment
trivial: fix comment typo in drivers/ata/pata_hpt37x.c
trivial: typo in kernel-parameters.txt
trivial: fix typo in tracing documentation
trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
trivial: remove unnecessary semicolons
trivial: Fix duplicated word "options" in comment
trivial: kbuild: remove extraneous blank line after declaration of usage()
trivial: improve help text for mm debug config options
trivial: doc: hpfall: accept disk device to unload as argument
trivial: doc: hpfall: reduce risk that hpfall can do harm
trivial: SubmittingPatches: Fix reference to renumbered step
trivial: fix typos "man[ae]g?ment" -> "management"
trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
trivial: fix missing printk space in amd_k7_smp_check
trivial: fix typo s/ketymap/keymap/ in comment
trivial: fix typo "to to" in multiple files
trivial: fix typos in comments s/DGBU/DBGU/
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...
Fix trivial conflict as by Tejun Heo in kernel/sched.c
-fstack-protector uses a special per-cpu "stack canary" value.
gcc generates special code in each function to test the canary to make
sure that the function's stack hasn't been overrun.
On x86-64, this is simply an offset of %gs, which is the usual per-cpu
base segment register, so setting it up simply requires loading %gs's
base as normal.
On i386, the stack protector segment is %gs (rather than the usual kernel
percpu %fs segment register). This requires setting up the full kernel
GDT and then loading %gs accordingly. We also need to make sure %gs is
initialized when bringing up secondary cpus too.
To keep things consistent, we do the full GDT/segment register setup on
both architectures.
Because we need to avoid -fstack-protected code before setting up the GDT
and because there's no way to disable it on a per-function basis, several
files need to have stack-protector inhibited.
[ Impact: allow Xen booting with stack-protector enabled ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Conflicts:
arch/sparc/kernel/smp_64.c
arch/x86/kernel/cpu/perf_counter.c
arch/x86/kernel/setup_percpu.c
drivers/cpufreq/cpufreq_ondemand.c
mm/percpu.c
Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
The init_IRQ() function is now called with slab allocator initialized.
Therefore, we must not use the bootmem allocator in xen_init_IRQ().
Fixes the following boot-time warning:
------------[ cut here ]------------
WARNING: at mm/bootmem.c:535 alloc_arch_preferred_bootmem+0x27/0x45()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.30 #1
Call Trace:
[<ffffffff8102d6e3>] ? warn_slowpath_common+0x73/0xb0
[<ffffffff810210d9>] ? pvclock_clocksource_read+0x49/0x90
[<ffffffff812e522f>] ? alloc_arch_preferred_bootmem+0x27/0x45
[<ffffffff812e5761>] ? ___alloc_bootmem_nopanic+0x39/0xc9
[<ffffffff812e57fa>] ? ___alloc_bootmem+0x9/0x2f
[<ffffffff812e9e21>] ? xen_init_IRQ+0x25/0x61
[<ffffffff812d69ee>] ? start_kernel+0x1b5/0x29e
---[ end trace 4eaa2a86a8e2da22 ]---
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: lists@nerdbynature.de
Cc: jeremy.fitzhardinge@citrix.com
LKML-Reference: <1246438278.22417.28.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Percpu variable definition is about to be updated such that all percpu
symbols including the static ones must be unique. Update percpu
variable definitions accordingly.
* as,cfq: rename ioc_count uniquely
* cpufreq: rename cpu_dbs_info uniquely
* xen: move nesting_count out of xen_evtchn_do_upcall() and rename it
* mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
rename it
* ipv4,6: rename cookie_scratch uniquely
* x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
pmc_irq_entry and nmi_entry to pmc_nmi_entry
* perf_counter: rename disable_count to perf_disable_count
* ftrace: rename test_event_disable to ftrace_test_event_disable
* kmemleak: rename test_pointer to kmemleak_test_pointer
* mce: rename next_interval to mce_next_interval
[ Impact: percpu usage cleanups, no duplicate static percpu var names ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>