1. Do not allocate memory for buffers from emergency pools, unless
absolutely required. Do not warn about and do not retry non-essential
failed allocations.
2. Do not check the amount of free pages left on every single page
write, but wait until one map is completely populated and then check.
3. Set maximum number of pages for read buffering consistently, instead
of inadvertently depending on the size of the sector type.
4. Fix copyright line, which I missed when I submitted the hibernation
threading patch.
5. Dispense with bit shifting arithmetic to improve readability.
6. Really recalculate the number of pages required to be free after all
allocations have been done.
7. Fix calculation of pages required for read buffering. Only count in
pages that do not belong to high memory.
Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Hibernation regression fix, since 3.2.
Calculate the number of required free pages based on non-high memory
pages only, because that is where the buffers will come from.
Commit 081a9d043c introduced a new buffer
page allocation logic during hibernation, in order to improve the
performance. The amount of pages allocated was calculated based on total
amount of pages available, although only non-high memory pages are
usable for this purpose. This caused hibernation code to attempt to over
allocate pages on platforms that have high memory, which led to hangs.
Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@suse.de>
For compressed image, the space required is not known until
we finish compressing and writing all pages.
This patch drops the check, and if swap space is not enough
finally, system can still restore to normal after writing
swap fails for compressed images.
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
kill_bdev as well, so brd doesn't have to open code it. Reduce
buffer_head.h requirement accordingly.
Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving. The small comment replacing it says enough.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use threads for LZO compression/decompression on hibernate/thaw.
Improve buffering on hibernate/thaw.
Calculate/verify CRC32 of the image pages on hibernate/thaw.
In my testing, this improved write/read speed by a factor of about two.
Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
block: ensure that completion error gets properly traced
blktrace: add missing probe argument to block_bio_complete
block cfq: don't use atomic_t for cfq_group
block cfq: don't use atomic_t for cfq_queue
block: trace event block fix unassigned field
block: add internal hd part table references
block: fix accounting bug on cross partition merges
kref: add kref_test_and_get
bio-integrity: mark kintegrityd_wq highpri and CPU intensive
block: make kblockd_workqueue smarter
Revert "sd: implement sd_check_events()"
block: Clean up exit_io_context() source code.
Fix compile warnings due to missing removal of a 'ret' variable
fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
cfq-iosched: don't check cfqg in choose_service_tree()
fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
cdrom: export cdrom_check_events()
sd: implement sd_check_events()
sr: implement sr_check_events()
...
Conflicts:
MAINTAINERS
arch/arm/mach-omap2/pm24xx.c
drivers/scsi/bfa/bfa_fcpim.c
Needed to update to apply fixes for which the old branch was too
outdated.
Commit 3624eb0 (PM / Hibernate: Modify signature used to mark swap)
attempted to modify hibernate signature used to mark swap partitions
containing hibernation images, so that old kernels don't try to
handle compressed images. However, this change broke resume from
hibernation on Fedora 14 that apparently doesn't pass the resume=
argument to the kernel and tries to trigger resume from early user
space. This doesn't work, because the signature is now different,
so the old signature has to be restored to avoid the problem.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=22732 .
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Reported-by: Zhang Rui <rui.zhang@intel.com>
Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
This is a fix for reading LZO compressed image using async I/O.
Essentially, instead of having just one page into which we keep
reading blocks from swap, we allocate enough of them to cover the
largest compressed size and then let block I/O pick them all up. Once
we have them all (and here we wait), we decompress them, as usual.
Obviously, the very first block we still pick up synchronously,
because we need to know the size of the lot before we pick up the
rest.
Also fixed the copyright line, which I've forgotten before.
Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
After recent blkdev_get() modifications, open_by_devnum() and
open_bdev_exclusive() are simple wrappers around blkdev_get().
Replace them with blkdev_get_by_dev() and blkdev_get_by_path().
blkdev_get_by_dev() is identical to open_by_devnum().
blkdev_get_by_path() is slightly different in that it doesn't
automatically add %FMODE_EXCL to @mode.
All users are converted. Most conversions are mechanical and don't
introduce any behavior difference. There are several exceptions.
* btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
reason to OR it explicitly on blkdev_put().
* gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
sb->s_mode.
* With the above changes, sb->s_mode now always should contain
FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect
errors.
The new blkdev_get_*() functions are with proper docbook comments.
While at it, add function description to blkdev_get() too.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Joern Engel <joern@lazybastard.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: reiserfs-devel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Over time, block layer has accumulated a set of APIs dealing with bdev
open, close, claim and release.
* blkdev_get/put() are the primary open and close functions.
* bd_claim/release() deal with exclusive open.
* open/close_bdev_exclusive() are combination of open and claim and
the other way around, respectively.
* bd_link/unlink_disk_holder() to create and remove holder/slave
symlinks.
* open_by_devnum() wraps bdget() + blkdev_get().
The interface is a bit confusing and the decoupling of open and claim
makes it impossible to properly guarantee exclusive access as
in-kernel open + claim sequence can disturb the existing exclusive
open even before the block layer knows the current open if for another
exclusive access. Reorganize the interface such that,
* blkdev_get() is extended to include exclusive access management.
@holder argument is added and, if is @FMODE_EXCL specified, it will
gain exclusive access atomically w.r.t. other exclusive accesses.
* blkdev_put() is similarly extended. It now takes @mode argument and
if @FMODE_EXCL is set, it releases an exclusive access. Also, when
the last exclusive claim is released, the holder/slave symlinks are
removed automatically.
* bd_claim/release() and close_bdev_exclusive() are no longer
necessary and either made static or removed.
* bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
is no longer necessary and removed.
* open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
and blkdev_get(). It also has an unexpected extra bdev_read_only()
test which probably should be moved into blkdev_get().
* open_by_devnum() is modified to take @holder argument and pass it to
blkdev_get().
Most of bdev open/close operations are unified into blkdev_get/put()
and most exclusive accesses are tested atomically at the open time (as
it should). This cleans up code and removes some, both valid and
invalid, but unnecessary all the same, corner cases.
open_bdev_exclusive() and open_by_devnum() can use further cleanup -
rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
special features. Well, let's leave them for another day.
Most conversions are straight-forward. drbd conversion is a bit more
involved as there was some reordering, but the logic should stay the
same.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: dm-devel@redhat.com
Cc: drbd-dev@lists.linbit.com
Cc: Leo Chen <leochen@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Joern Engel <joern@logfs.org>
Cc: reiserfs-devel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Since we are adding compression to the kernel's hibernate code,
change signature used by it to mark swap spaces, so that earlier
kernels don't attempt to restore compressed images they cannot
handle.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
The hibernate resume code checks if there is an image to resume from
on every boot and, if the kernel is built with CONFIG_PM_DEBUG set
and the image is not present, it prints some scary messages
suggesting there was a boot error of some sort. Apparently, some
users are confused by them, so make them look less scary and adjust
the other hibernate resume debug messages to match them.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Compress hibernation image with LZO in order to save on I/O and
therefore time to hibernate/thaw.
[rjw: Added hibernate=nocompress command line option instead of just
nocompress which would be confusing, fixed a couple of compiler
warnings, fixed kerneldoc comments, minor cleanups.]
Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Please revert 2.6.36-rc commit d2997b1042
"hibernation: freeze swap at hibernation". It complicated matters by
adding a second swap allocation path, just for hibernation; without in any
way fixing the issue that it was intended to address - page reclaim after
fixing the hibernation image might free swap from a page already imaged as
swapcache, letting its swap be reallocated to store a different page of
the image: resulting in data corruption if the imaged page were freed as
clean then swapped back in. Pages freed to si->swap_map were still in
danger of being reallocated by the alternative allocation path.
I guess it inadvertently fixed slow SSD swap allocation for hibernation,
as reported by Nigel Cunningham: by missing out the discards that occur on
the usual swap allocation path; but that was unintentional, and needs a
separate fix.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ondrej Zary <linux@rainbow-software.org>
Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When taking a memory snapshot in hibernate_snapshot(), all (directly
called) memory allocations use GFP_ATOMIC. Hence swap misusage during
hibernation never occurs.
But from a pessimistic point of view, there is no guarantee that no page
allcation has __GFP_WAIT. It is better to have a global indication "we
enter hibernation, don't use swap!".
This patch tries to freeze new-swap-allocation during hibernation. (All
user processes are frozenm so swapin is not a concern).
This way, no updates will happen to swap_map[] between
hibernate_snapshot() and save_image(). Swap is thawed when swsusp_free()
is called. We can be assured that swap corruption will not occur.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ondrej Zary <linux@rainbow-software.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a few typos in kernel/power/swap.c. Fix them.
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Move all the swap processing into one function. It will make swap
calls from a non-swap code easier.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The first sector knowledge is swap-only specific. Move it into the
swap handle. This will be needed for later non-swap specific code
moving into snapshot.c.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Move block I/O operations to a separate file. It is because it will
be used later not only by the swap writer.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Remove support of reads with offset. This means snapshot_read/write_next
now does not accept count parameter. It allows to clean up the functions
and snapshot handle which no longer needs to care about offsets.
/dev/snapshot handler is converted to simple_{read_from,write_to}_buffer
which take care of offsets.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>