I have received a report about the BUG_ON() in free_basic_memory_bitmaps()
triggering mysteriously during an aborted s2disk hibernation attempt.
The only way I can explain that is that /dev/snapshot was first
opened for writing (resume mode), then closed and then opened again
for reading and closed again without freezing tasks. In that case
the first invocation of snapshot_open() would set the free_bitmaps
flag in snapshot_state, which is a static variable. That flag
wouldn't be cleared later and the second invocation of snapshot_open()
would just leave it like that, so the subsequent snapshot_release()
would see data->frozen set and free_basic_memory_bitmaps() would be
called unnecessarily.
To prevent that from happening clear data->free_bitmaps in
snapshot_open() when the file is being opened for reading (hibernate
mode).
In addition to that, replace the BUG_ON() in free_basic_memory_bitmaps()
with a WARN_ON() as the kernel can continue just fine if the condition
checked by that macro occurs.
Fixes: aab1728915 (PM / hibernate: Fix user space driven resume regression)
Reported-by: Oliver Lorenz <olli@olorenz.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
When system has a lot of highmem (e.g. 16GiB using a 32 bits kernel),
the code to calculate how much memory we need to preallocate in
normal zone may cause overflow. As Leon has analysed:
It looks that during computing 'alloc' variable there is overflow:
alloc = (3943404 - 1970542) - 1978280 = -5418 (signed)
And this function goes to err_out.
Fix this by avoiding that overflow.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60817
Reported-and-tested-by: Leon Drugi <eyak@wp.pl>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
software_resume is being called after deferred_probe_initcall in
drivers base. If the probing of the device that contains the resume
image is deferred, and the system has been instructed to wait for
it to show up, this wait will occur in software_resume. This causes
a deadlock.
Move software_resume into late_initcall_sync so that it happens
after all the other late_initcalls.
Signed-off-by: Russ Dill <Russ.Dill@ti.com>
Acked-by: Pavel Machek <Pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The snapshot_data structure used internally by the hibernate user
space interface code in user.c has three char fields that are used
to store boolean values. Change their data type to bool and use
true and false instead of 1 and 0, respectively, in assignments
involving those fields.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rather than hard-lock the kernel, dump the suspend/resume thread stack
and panic() to capture a message in pstore when a driver takes too long
to suspend/resume. Default suspend/resume watchdog timeout is set to 12
seconds to be longer than the usbhid 10 second timeout, but could be
changed at compile time.
Exclude from the watchdog the time spent waiting for children that
are resumed asynchronously and time every device, whether or not they
resumed synchronously.
This patch is targeted for mobile devices where a suspend/resume lockup
could cause a system reboot. Information about failing device can be
retrieved in subsequent boot session by mounting pstore and inspecting
the log. Laptops with EFI-enabled pstore could also benefit from
this feature.
The hardware watchdog timer is likely suspended during this time and
couldn't be relied upon. The soft-lockup detector would eventually tell
that tasks are not scheduled, but would provide little context as to why.
The patch hence uses system timer and assumes it is still active while the
devices are suspended/resumed.
This feature can be enabled/disabled during kernel configuration.
This change is based on earlier work by San Mehat.
Signed-off-by: Benoit Goby <benoit@android.com>
Signed-off-by: Zoran Markovic <zoran.markovic@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Recent commit 8fd37a4 (PM / hibernate: Create memory bitmaps after
freezing user space) broke the resume part of the user space driven
hibernation (s2disk), because I forgot that the resume utility
loaded the image into memory without freezing user space (it still
freezes tasks after loading the image). This means that during user
space driven resume we need to create the memory bitmaps at the
"device open" time rather than at the "freeze tasks" time, so make
that happen (that's a special case anyway, so it needs to be treated
in a special way).
Reported-and-tested-by: Ronald <ronald645@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ACPI and power management fixes from Rafael Wysocki:
"All of these commits are fixes that have emerged recently and some of
them fix bugs introduced during this merge window.
Specifics:
1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events
After the recent ACPIPHP changes we've seen some interesting
breakage on a system that triggers device check notifications
during boot for non-existing devices. Although those
notifications are really spurious, we should be able to deal with
them nevertheless and that shouldn't introduce too much overhead.
Four commits to make that work properly.
2) Memory hotplug and hibernation mutual exclusion rework
This was maent to be a cleanup, but it happens to fix a classical
ABBA deadlock between system suspend/hibernation and ACPI memory
hotplug which is possible if they are started roughly at the same
time. Three commits rework memory hotplug so that it doesn't
acquire pm_mutex and make hibernation use device_hotplug_lock
which prevents it from racing with memory hotplug.
3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix
The ACPI LPSS driver crashes during boot on Apple Macbook Air with
Haswell that has slightly unusual BIOS configuration in which one
of the LPSS device's _CRS method doesn't return all of the
information expected by the driver. Fix from Mika Westerberg, for
stable.
4) ACPICA fix related to Store->ArgX operation
AML interpreter fix for obscure breakage that causes AML to be
executed incorrectly on some machines (observed in practice).
From Bob Moore.
5) ACPI core fix for PCI ACPI device objects lookup
There still are cases in which there is more than one ACPI device
object matching a given PCI device and we don't choose the one
that the BIOS expects us to choose, so this makes the lookup take
more criteria into account in those cases.
6) Fix to prevent cpuidle from crashing in some rare cases
If the result of cpuidle_get_driver() is NULL, which can happen on
some systems, cpuidle_driver_ref() will crash trying to use that
pointer and the Daniel Fu's fix prevents that from happening.
7) cpufreq fixes related to CPU hotplug
Stephen Boyd reported a number of concurrency problems with
cpufreq related to CPU hotplug which are addressed by a series of
fixes from Srivatsa S Bhat and Viresh Kumar.
8) cpufreq fix for time conversion in time_in_state attribute
Time conversion carried out by cpufreq when user space attempts to
read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state
won't work correcty if cputime_t doesn't map directly to jiffies.
Fix from Andreas Schwab.
9) Revert of a troublesome cpufreq commit
Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) was intended to address some known concurrency
problems in cpufreq related to the ordering of transitions, but
unfortunately it introduced several problems of its own, so I
decided to revert it now and address the original problems later
in a more robust way.
10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle.
11) cpufreq fixes related to system suspend/resume
The recent cpufreq changes that made it preserve CPU sysfs
attributes over suspend/resume cycles introduced a possible NULL
pointer dereference that caused it to crash during the second
attempt to suspend. Three commits from Srivatsa S Bhat fix that
problem and a couple of related issues.
12) cpufreq locking fix
cpufreq_policy_restore() should acquire the lock for reading, but
it acquires it for writing. Fix from Lan Tianyu"
* tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
cpufreq: Restructure if/else block to avoid unintended behavior
cpufreq: Fix crash in cpufreq-stats during suspend/resume
intel_pstate: Add Haswell CPU models
Revert "cpufreq: make sure frequency transitions are serialized"
cpufreq: Use signed type for 'ret' variable, to store negative error values
cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
cpufreq: Split __cpufreq_remove_dev() into two parts
cpufreq: Fix wrong time unit conversion
cpufreq: serialize calls to __cpufreq_governor()
cpufreq: don't allow governor limits to be changed when it is disabled
ACPI / bind: Prefer device objects with _STA to those without it
ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks
ACPI / hotplug / PCI: Use _OST to notify firmware about notify status
ACPI / hotplug / PCI: Avoid doing too much for spurious notifies
ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field.
ACPI / hotplug / PCI: Don't trim devices before scanning the namespace
...
Pull x86/asmlinkage changes from Ingo Molnar:
"As a preparation for Andi Kleen's LTO patchset (link time
optimizations using GCC's -flto which build time optimization has
steadily increased in quality over the past few years and might
eventually be usable for the kernel too) this tree includes a handful
of preparatory patches that make function calling convention
annotations consistent again:
- Mark every function without arguments (or 64bit only) that is used
by assembly code with asmlinkage()
- Mark every function with parameters or variables that is used by
assembly code as __visible.
For the vanilla kernel this has documentation, consistency and
debuggability advantages, for the time being"
* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asmlinkage: Fix warning in xen asmlinkage change
x86, asmlinkage, vdso: Mark vdso variables __visible
x86, asmlinkage, power: Make various symbols used by the suspend asm code visible
x86, asmlinkage: Make dump_stack visible
x86, asmlinkage: Make 64bit checksum functions visible
x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
x86, asmlinkage, apm: Make APM data structure used from assembler visible
x86, asmlinkage: Make syscall tables visible
x86, asmlinkage: Make several variables used from assembler/linker script visible
x86, asmlinkage: Make kprobes code visible and fix assembler code
x86, asmlinkage: Make various syscalls asmlinkage
x86, asmlinkage: Make 32bit/64bit __switch_to visible
x86, asmlinkage: Make _*_start_kernel visible
x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
x86, asmlinkage: Change dotraplinkage into __visible on 32bit
x86: Fix sys_call_table type in asm/syscall.h
Since all of the memory hotplug operations have to be carried out
under device_hotplug_lock, they won't need to acquire pm_mutex if
device_hotplug_lock is held around hibernation.
For this reason, make the hibernation code acquire
device_hotplug_lock after freezing user space processes and
release it before thawing them. At the same tim drop the
lock_system_sleep() and unlock_system_sleep() calls from
lock_memory_hotplug() and unlock_memory_hotplug(), respectively.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
The hibernation core uses special memory bitmaps during image
creation and restoration and traditionally those bitmaps are
allocated before freezing tasks, because in the past GFP_KERNEL
allocations might not work after all tasks had been frozen.
However, this is an anachronism, because hibernation_snapshot()
now calls hibernate_preallocate_memory() which allocates memory
for the image upfront anyway, so the memory bitmaps may be
allocated after freezing user space safely.
For this reason, move all of the create_basic_memory_bitmaps()
calls after freeze_processes() and all of the corresponding
free_basic_memory_bitmaps() calls before thaw_processes().
This will allow us to hold device_hotplug_lock around hibernation
without the need to worry about freezing issues with user space
processes attempting to acquire it via sysfs attributes after the
creation of memory bitmaps and before the freezing of tasks.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
pm_qos_update_request_timeout() updates a qos and then schedules
a delayed work item to bring the qos back down to the default
after the timeout. When the work item runs, pm_qos_work_fn() will
call pm_qos_update_request() and deadlock because it tries to
cancel itself via cancel_delayed_work_sync(). Future callers of
that qos will also hang waiting to cancel the work that is
canceling itself. Let's extract the little bit of code that does
the real work of pm_qos_update_request() and call it from the
work function so that we don't deadlock.
Before ed1ac6e (PM: don't use [delayed_]work_pending()) this didn't
happen because the work function wouldn't try to cancel itself.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: 3.9+ <stable@vger.kernel.org> # 3.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Calling freeze_processes sets a global flag that will cause any
process that calls try_to_freeze to enter the refrigerator. It
skips sending a signal to the current task, but if the current
task ever hits try_to_freeze, all threads will be frozen and the
system will deadlock.
Set a new flag, PF_SUSPEND_TASK, on the task that calls
freeze_processes. The flag notifies the freezer that the thread
is involved in suspend and should not be frozen. Also add a
WARN_ON in thaw_processes if the caller does not have the
PF_SUSPEND_TASK flag set to catch if a different task calls
thaw_processes than the one that called freeze_processes, leaving
a task with PF_SUSPEND_TASK permanently set on it.
Threads that spawn off a task with PF_SUSPEND_TASK set (which
swsusp does) will also have PF_SUSPEND_TASK set, preventing them
from freezing while they are helping with suspend, but they need
to be dead by the time suspend is triggered, otherwise they may
run when userspace is expected to be frozen. Add a WARN_ON in
thaw_processes if more than one thread has the PF_SUSPEND_TASK
flag set.
Reported-and-tested-by: Michael Leun <lkml20130126@newton.leun.net>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change where ftrace is disabled and re-enabled during system
suspend/resume to allow tracing of device driver pm callbacks.
Ftrace will now be turned off when suspend reaches
disable_nonboot_cpus() instead of at the very beginning of system
suspend.
Ftrace was disabled during suspend/resume back in 2008 by
Steven Rostedt as he discovered there was a conflict in the
enable_nonboot_cpus() call (see commit f42ac38 "ftrace: disable
tracing for suspend to ram"). This change preserves his fix by
disabling ftrace, but only at the function where it is known
to cause problems.
The new change allows tracing of the device level code for better
debug.
[rjw: Changelog]
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Prevent automatic system suspend from happening during system
shutdown by making try_to_suspend() check system_state and return
immediately if it is not SYSTEM_RUNNING.
This prevents the following breakage from happening (scenario from
Zhang Yanmin):
Kernel starts shutdown and calls all device driver's shutdown
callback. When a driver's shutdown is called, the last wakelock is
released and suspend-to-ram starts. However, as some driver's shut
down callbacks already shut down devices and disabled runtime pm,
the suspend-to-ram calls driver's suspend callback without noticing
that device is already off and causes crash.
[rjw: Changelog]
Signed-off-by: Liu ShuoX <shuox.liu@intel.com>
Cc: 3.5+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge first patch-bomb from Andrew Morton:
- various misc bits
- I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
distracted. There has been quite a bit of activity.
- About half the MM queue
- Some backlight bits
- Various lib/ updates
- checkpatch updates
- zillions more little rtc patches
- ptrace
- signals
- exec
- procfs
- rapidio
- nbd
- aoe
- pps
- memstick
- tools/testing/selftests updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
tools/testing/selftests: don't assume the x bit is set on scripts
selftests: add .gitignore for kcmp
selftests: fix clean target in kcmp Makefile
selftests: add .gitignore for vm
selftests: add hugetlbfstest
self-test: fix make clean
selftests: exit 1 on failure
kernel/resource.c: remove the unneeded assignment in function __find_resource
aio: fix wrong comment in aio_complete()
drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
drivers/memstick/host/r592.c: convert to module_pci_driver
drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
pps-gpio: add device-tree binding and support
drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
drivers/parport/share.c: use kzalloc
Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
aoe: update internal version number to v83
aoe: update copyright date
aoe: perform I/O completions in parallel
...