Commit 2aede851dd (PM / Hibernate: Freeze
kernel threads after preallocating memory) moved the freezing of kernel
threads to hibernation_snapshot() function.
So now, if the call to hibernation_snapshot() returns early due to a
successful hibernation test, the caller has to thaw processes to ensure
that the system gets back to its original state.
But in SNAPSHOT_CREATE_IMAGE hibernation ioctl, the caller does not thaw
processes in case hibernation_snapshot() returned due to a successful
freezer test. Fix this issue. But note we still send the value of 'in_suspend'
(which is now 0) to userspace, because we are not in an error path per-se,
and moreover, the value of in_suspend correctly depicts the situation here.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Use threads for LZO compression/decompression on hibernate/thaw.
Improve buffering on hibernate/thaw.
Calculate/verify CRC32 of the image pages on hibernate/thaw.
In my testing, this improved write/read speed by a factor of about two.
Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers. Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out. Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation. If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).
Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.
Unfortunately, this requires the memory preallocation to be done
before the "prepare" stage of device freeze, so after this change the
only way drivers can allocate additional memory for their freeze
routines in a clean way is to use PM notifiers.
Reported-by: Christoph <cr2005@u-club.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Martin reports that on his system hibernation occasionally fails due
to the lack of memory, because the radeon driver apparently allocates
too much of it during the device freeze stage. It turns out that the
amount of memory allocated by radeon during hibernation (and
presumably during system suspend too) depends on the utilization of
the GPU (e.g. hibernating while there are two KDE 4 sessions with
compositing enabled causes radeon to allocate more memory than for
one KDE 4 session).
In principle it should be possible to use image_size to make the
memory preallocation mechanism free enough memory for the radeon
driver, but in practice it is not easy to guess the right value
because of the way the preallocation code uses image_size. For this
reason, it seems reasonable to allow users to control the amount of
memory reserved for driver allocations made after the hibernate
preallocation, which currently is constant and amounts to 1 MB.
Introduce a new sysfs file, /sys/power/reserved_size, whose value
will be used as the amount of memory to reserve for the
post-preallocation reservations made by device drivers, in bytes.
For backwards compatibility, set its default (and initial) value to
the currently used number (1 MB).
References: https://bugzilla.kernel.org/show_bug.cgi?id=34102
Reported-and-tested-by: Martin Steigerwald <Martin@Lichtvoll.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The default hibernation image size is currently hard coded and euqal
to 500 MB, which is not a reasonable default on many contemporary
systems. Make it equal 2/5 of the total RAM size (this is slightly
below the maximum, i.e. 1/2 of the total RAM size, and seems to be
generally suitable).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: M. Vefa Bicakci <bicave@superonline.com>
Compress hibernation image with LZO in order to save on I/O and
therefore time to hibernate/thaw.
[rjw: Added hibernate=nocompress command line option instead of just
nocompress which would be confusing, fixed a couple of compiler
warnings, fixed kerneldoc comments, minor cleanups.]
Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Move block I/O operations to a separate file. It is because it will
be used later not only by the swap writer.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Remove support of reads with offset. This means snapshot_read/write_next
now does not accept count parameter. It allows to clean up the functions
and snapshot handle which no longer needs to care about offsets.
/dev/snapshot handler is converted to simple_{read_from,write_to}_buffer
which take care of offsets.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Since the hibernation code is now going to use allocations of memory
to make enough room for the image, it can also use the page frames
allocated at this stage as image page frames. The low-level
hibernation code needs to be rearranged for this purpose, but it
allows us to avoid freeing a great number of pages and allocating
these same pages once again later, so it generally is worth doing.
[rev. 2: Take highmem into account correctly.]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Change the name of kernel/power/disk.c to kernel/power/hibernate.c
in analogy with the file names introduced by the changes that
separated the suspend to RAM and standby funtionality from the
common PM functions.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Move the suspend to RAM and standby code from kernel/power/main.c
to two separate files, kernel/power/suspend.c containing the basic
functions and kernel/power/suspend_test.c containing the automatic
suspend test facility based on the RTC clock alarm.
There are no changes in functionality related to these modifications.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
A future patch is going to modify the memory shrinking code so that
it will make memory allocations to free memory instead of using an
artificial memory shrinking mechanism for that. For this purpose it
is convenient to move swsusp_shrink_memory() from
kernel/power/swsusp.c to kernel/power/snapshot.c, because the new
memory-shrinking code is going to use things that are local to
kernel/power/snapshot.c .
[rev. 2: Make some functions static and remove their headers from
kernel/power/power.h]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
This patch implements devices state save/restore before after kexec.
This patch together with features in kexec_jump patch can be used for
following:
- A simple hibernation implementation without ACPI support. You can kexec a
hibernating kernel, save the memory image of original system and shutdown
the system. When resuming, you restore the memory image of original system
via ordinary kexec load then jump back.
- Kernel/system debug through making system snapshot. You can make system
snapshot, jump back, do some thing and make another system snapshot.
- Cooperative multi-kernel/system. With kexec jump, you can switch between
several kernels/systems quickly without boot process except the first time.
This appears like swap a whole kernel/system out/in.
- A general method to call program in physical mode (paging turning
off). This can be used to invoke BIOS code under Linux.
The following user-space tools can be used with kexec jump:
- kexec-tools needs to be patched to support kexec jump. The patches
and the precompiled kexec can be download from the following URL:
source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
- makedumpfile with patches are used as memory image saving tool, it
can exclude free pages from original kernel memory image file. The
patches and the precompiled makedumpfile can be download from the
following URL:
source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10
- An initramfs image can be used as the root file system of kexeced
kernel. An initramfs image built with "BuildRoot" can be downloaded
from the following URL:
initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
All user space tools above are included in the initramfs image.
Usage example of simple hibernation:
1. Compile and install patched kernel with following options selected:
CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y
CONFIG_HIBERNATION=y
CONFIG_KEXEC_JUMP=y
2. Build an initramfs image contains kexec-tool and makedumpfile, or
download the pre-built initramfs image, called rootfs.gz in
following text.
3. Prepare a partition to save memory image of original kernel, called
hibernating partition in following text.
4. Boot kernel compiled in step 1 (kernel A).
5. In the kernel A, load kernel compiled in step 1 (kernel B) with
/sbin/kexec. The shell command line can be as follow:
/sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000
--mem-max=0xffffff --initrd=rootfs.gz
6. Boot the kernel B with following shell command line:
/sbin/kexec -e
7. The kernel B will boot as normal kexec. In kernel B the memory
image of kernel A can be saved into hibernating partition as
follow:
jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='`
echo $jump_back_entry > kexec_jump_back_entry
cp /proc/vmcore dump.elf
Then you can shutdown the machine as normal.
8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
root file system.
9. In kernel C, load the memory image of kernel A as follow:
/sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf
10. Jump back to the kernel A as follow:
/sbin/kexec -e
Then, kernel A is resumed.
Implementation point:
To support jumping between two kernels, before jumping to (executing)
the new kernel and jumping back to the original kernel, the devices
are put into quiescent state, and the state of devices and CPU is
saved. After jumping back from kexeced kernel and jumping to the new
kernel, the state of devices and CPU are restored accordingly. The
devices/CPU state save/restore code of software suspend is called to
implement corresponding function.
Known issues:
- Because the segment number supported by sys_kexec_load is limited,
hibernation image with many segments may not be load. This is
planned to be eliminated by adding a new flag to sys_kexec_load to
make a image can be loaded with multiple sys_kexec_load invoking.
Now, only the i386 architecture is supported.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the freezer optional for suspend to allow the
system to work (or not work) like the original PMU suspend.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Move the low level restore code to kernel/power/disk.c , since the
corresponding low level hibernation code is already there.
Make restore fail if device_power_down(PMSG_PRETHAW) returns an
error.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch (as1008b) converts the PM notifier routines from inline
calls to out-of-line code. It also prevents pm_chain_head from
being created when CONFIG_PM_SLEEP isn't enabled, and EXPORTs the
notifier registration and unregistration routines.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Make it possible to test the hibernation core code with the help of the
/sys/power/pm_test attribute introduced for suspend testing in the previous
patch.
Writing an appropriate string to this file causes the hibernation code to work
in one of the test modes defined as follows:
freezer
- test the freezing of processes
devices
- test the freezing of processes and suspending of devices
platform
- test the freezing of processes, suspending of devices and platform global
control methods(*)
processors
- test the freezing of processes, suspending of devices, platform global
control methods(*) and the disabling of nonboot CPUs
core
- test the freezing of processes, suspending of devices, platform global
control methods(*), the disabling of nonboot CPUs and suspending of
platform/system devices
(*) - the platform global control methods are only available on ACPI systems
and are only tested if the hibernation mode is set to "platform"
Then, if a hibernation is started by normal means, the hibernation core will
perform its normal operations up to the point indicated by given test level.
Next, it will wait for 5 seconds and carry out the resume operations needed to
transition the system back to the fully functional state.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
Introduce sysfs attribute /sys/power/pm_test allowing one to test the suspend
core code. Namely, writing one of the strings:
freezer
devices
platform
processors
core
to this file causes the suspend code to work in one of the test modes defined as
follows:
freezer
- test the freezing of processes
devices
- test the freezing of processes and suspending of devices
platform
- test the freezing of processes, suspending of devices and platform global
control methods(*)
processors
- test the freezing of processes, suspending of devices, platform global
control methods and the disabling of nonboot CPUs
core
- test the freezing of processes, suspending of devices, platform global
control methods, the disabling of nonboot CPUs and suspending of
platform/system devices
(*) These are ACPI global control methods on ACPI systems
Then, if a suspend is started by normal means, the suspend core will perform
its normal operations up to the point indicated by given test level. Next, it
will wait for 5 seconds and carry out the resume operations needed to transition
the system back to the fully functional state.
Writing "none" to /sys/power/pm_test turns the testing off.
When open for reading, /sys/power/pm_test contains a space-separated list of all
available tests (including "none" that represents the normal functionality) in
which the current test level is indicated by square brackets.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch moves the prototypes of count_highmem_pages() and
restore_highmem() to kernel/power/power.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Move the definitions of hibernation ioctls to a separate header file in
include/linux, which can be exported to the user space.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
Three ioctl numbers belonging to the hibernation userland interface,
SNAPSHOT_ATOMIC_SNAPSHOT, SNAPSHOT_AVAIL_SWAP, SNAPSHOT_GET_SWAP_PAGE,
are defined in a wrong way (eg. not portable). Provide new ioctl numbers for
these ioctls and mark the existing ones as deprecated.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
Mark the SNAPSHOT_SET_SWAP_FILE ioctl belonging to the hibernation userland
interface as deprecated.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
Modify the hibernation userland interface by adding two new ioctls to it,
SNAPSHOT_PLATFORM_SUPPORT and SNAPSHOT_POWER_OFF, that can be used,
respectively, to switch the hibernation platform support on/off and to make the
kernel transition the system to the hibernation state (eg. ACPI S4) using the
platform (eg. ACPI) driver.
These ioctls are intended to replace the misdesigned SNAPSHOT_PMOPS ioctl,
which from now is regarded as obsolete and will be removed in the future.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
Add a new ioctl, SNAPSHOT_GET_IMAGE_SIZE, returning the size of the (just
created) hibernation image, to the hibernation userland interface.
This ioctl is necessary so that the userland utilities using the interface need
not access the hibernation image header, owned by the kernel, in order to obtain
the size of the image.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>