Commit Graph

214 Commits

Author SHA1 Message Date
Jan Kara
169ebd9013 writeback: Avoid iput() from flusher thread
Doing iput() from flusher thread (writeback_sb_inodes()) can create problems
because iput() can do a lot of work - for example truncate the inode if it's
the last iput on unlinked file. Some filesystems depend on flusher thread
progressing (e.g. because they need to flush delay allocated blocks to reduce
allocation uncertainty) and so flusher thread doing truncate creates
interesting dependencies and possibilities for deadlocks.

We get rid of iput() in flusher thread by using the fact that I_SYNC inode
flag effectively pins the inode in memory. So if we take care to either hold
i_lock or have I_SYNC set, we can get away without taking inode reference
in writeback_sb_inodes().

As a side effect of these changes, we also fix possible use-after-free in
wb_writeback() because inode_wait_for_writeback() call could try to reacquire
i_lock on the inode that was already free.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:41 +08:00
Jan Kara
4f8ad655db writeback: Refactor writeback_single_inode()
The code in writeback_single_inode() is relatively complex. The list requeing
logic makes sense only for flusher thread but not really for sync_inode() or
write_inode_now() callers. Also when we want to get rid of inode references
held by flusher thread, we will need a special I_SYNC handling there.

So separate part of writeback_single_inode() which does the real writeback work
into __writeback_single_inode() and make writeback_single_inode() do only stuff
necessary for callers writing only one inode, moving the special list handling
into writeback_sb_inodes(). As a sideeffect this fixes a possible race where we
could skip some inode during sync(2) because other writer refiled it from b_io
to b_dirty list. Also I_SYNC handling is moved into the callers of
__writeback_single_inode() to make locking easier.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:40 +08:00
Jan Kara
f0d07b7ffd writeback: Remove wb->list_lock from writeback_single_inode()
writeback_single_inode() doesn't need wb->list_lock for anything on entry now.
So remove the requirement. This makes locking of writeback_single_inode()
temporarily awkward (entering with i_lock, returning with i_lock and
wb->list_lock) but it will be sanitized in the next patch.

Also inode_wait_for_writeback() doesn't need wb->list_lock for anything. It was
just taking it to make usage convenient for callers but with
writeback_single_inode() changing it's not very convenient anymore. So remove
the lock from that function.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:39 +08:00
Jan Kara
ccb26b5a65 writeback: Separate inode requeueing after writeback
Move inode requeueing after inode has been written out into a separate
function.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:39 +08:00
Jan Kara
6290be1c1d writeback: Move I_DIRTY_PAGES handling
Instead of clearing I_DIRTY_PAGES and resetting it when we didn't succeed in
writing them all, just clear the bit only when we succeeded writing all the
pages. We also move the clearing of the bit close to other i_state handling to
separate it from writeback list handling. This is desirable because list
handling will differ for flusher thread and other writeback_single_inode()
callers in future. No filesystem plays any tricks with I_DIRTY_PAGES (like
checking it in ->writepages or ->write_inode implementation) so this movement
is safe.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:39 +08:00
Jan Kara
cc1676d917 writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
When writeback_single_inode() is called on inode which has I_SYNC already
set while doing WB_SYNC_NONE, inode is moved to b_more_io list. However
this makes sense only if the caller is flusher thread. For other callers of
writeback_single_inode() it doesn't really make sense and may be even wrong
- flusher thread may be doing WB_SYNC_ALL writeback in parallel.

So we move requeueing from writeback_single_inode() to writeback_sb_inodes().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:38 +08:00
Jan Kara
365b94ae67 writeback: Move clearing of I_SYNC into inode_sync_complete()
Move clearing of I_SYNC into inode_sync_complete().  It is more logical to have
clearing of I_SYNC bit and waking of waiters in one place. Also later we will
have two places needing to clear I_SYNC and wake up waiters so this allows them
to use the common helper. Moving of I_SYNC clearing to a later stage of
writeback_single_inode() is safe since we hold i_lock all the time.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:38 +08:00
Linus Torvalds
529b73fc0a Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull trivial writeback fixes from Wu Fengguang:
 "They've been tested in linux-next for 20 days actually."

* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Remove outdated comment
  fs: Remove bogus wait in write_inode_now()
2012-03-28 10:07:27 -07:00
Linus Torvalds
11bcb32848 Merge tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
 "Fix up files in fs/ and lib/ dirs to only use module.h if they really
  need it.

  These are trivial in scope vs the work done previously.  We now have
  things where any few remaining cleanups can be farmed out to arch or
  subsystem maintainers, and I have done so when possible.  What is
  remaining here represents the bits that don't clearly lie within a
  single arch/subsystem boundary, like the fs dir and the lib dir.

  Some duplicate includes arising from overlapping fixes from
  independent subsystem maintainer submissions are also quashed."

Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).

* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  lib: reduce the use of module.h wherever possible
  fs: reduce the use of module.h wherever possible
  includecheck: delete any duplicate instances of module.h
2012-03-24 10:24:31 -07:00
Jan Kara
697e6fed9f writeback: Remove outdated comment
The comment is hopelessly outdated and misplaced. We no longer have 'bdi'
part of writeback work, the comment about blockdev super is outdated,
comment about throttling as well. Information about list handling is in
more detail at queue_io(). So just move the bit about older_than_this to
close to move_expired_inodes() and remove the rest.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-03-21 15:27:08 +08:00
Jan Kara
f469ec9c5b fs: Remove bogus wait in write_inode_now()
inode_sync_wait() in write_inode_now() is just bogus. That function waits for
I_SYNC bit to be cleared but writeback_single_inode() clears the bit on return
so the wait is effectivelly a nop unless someone else submits the inode for
writeback again. All the waiting write_inode_now() needs is achieved by using
WB_SYNC_ALL writeback mode.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-03-21 15:26:47 +08:00
Linus Torvalds
69a7aebcf0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "It's indeed trivial -- mostly documentation updates and a bunch of
  typo fixes from Masanari.

  There are also several linux/version.h include removals from Jesper."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
  kcore: fix spelling in read_kcore() comment
  constify struct pci_dev * in obvious cases
  Revert "char: Fix typo in viotape.c"
  init: fix wording error in mm_init comment
  usb: gadget: Kconfig: fix typo for 'different'
  Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
  writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
  writeback: fix typo in the writeback_control comment
  Documentation: Fix multiple typo in Documentation
  tpm_tis: fix tis_lock with respect to RCU
  Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
  Doc: Update numastat.txt
  qla4xxx: Add missing spaces to error messages
  compiler.h: Fix typo
  security: struct security_operations kerneldoc fix
  Documentation: broken URL in libata.tmpl
  Documentation: broken URL in filesystems.tmpl
  mtd: simplify return logic in do_map_probe()
  mm: fix comment typo of truncate_inode_pages_range
  power: bq27x00: Fix typos in comment
  ...
2012-03-20 21:12:50 -07:00
Fengguang Wu
c097b2ca51 writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-03-07 16:08:46 +01:00
Paul Gortmaker
630d9c4727 fs: reduce the use of module.h wherever possible
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include.  Fix up any implicit
include dependencies that were being masked by module.h along
the way.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-28 19:31:58 -05:00
Wu Fengguang
15eb77a07c writeback: fix NULL bdi->dev in trace writeback_single_inode
bdi_prune_sb() resets sb->s_bdi to default_backing_dev_info when the
tearing down the original bdi. Fix trace_writeback_single_inode to
use sb->s_bdi=default_backing_dev_info rather than bdi->dev=NULL for a
teared down bdi.

Cc: <stable@kernel.org>
Reported-by: Rabin Vincent <rabin@rab.in>
Tested-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2012-02-01 16:53:40 +08:00
Linus Torvalds
001a541ea9 Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
  writeback: balanced_rate cannot exceed write bandwidth
  writeback: do strict bdi dirty_exceeded
  writeback: avoid tiny dirty poll intervals
  writeback: max, min and target dirty pause time
  writeback: dirty ratelimit - think time compensation
  btrfs: fix dirtied pages accounting on sub-page writes
  writeback: fix dirtied pages accounting on redirty
  writeback: fix dirtied pages accounting on sub-page writes
  writeback: charge leaked page dirties to active tasks
  writeback: Include all dirty inodes in background writeback
2012-01-10 16:59:59 -08:00
Linus Torvalds
eb59c505f8 Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
  PM / Hibernate: Implement compat_ioctl for /dev/snapshot
  PM / Freezer: fix return value of freezable_schedule_timeout_killable()
  PM / shmobile: Allow the A4R domain to be turned off at run time
  PM / input / touchscreen: Make st1232 use device PM QoS constraints
  PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
  PM / shmobile: Remove the stay_on flag from SH7372's PM domains
  PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
  PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
  PM: Drop generic_subsys_pm_ops
  PM / Sleep: Remove forward-only callbacks from AMBA bus type
  PM / Sleep: Remove forward-only callbacks from platform bus type
  PM: Run the driver callback directly if the subsystem one is not there
  PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
  PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
  PM / Sleep: Merge internal functions in generic_ops.c
  PM / Sleep: Simplify generic system suspend callbacks
  PM / Hibernate: Remove deprecated hibernation snapshot ioctls
  PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
  ARM: S3C64XX: Implement basic power domain support
  PM / shmobile: Use common always on power domain governor
  ...

Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit
2012-01-08 13:10:57 -08:00
Wu Fengguang
bc31b86a59 writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c
Fix compile error

 fs/fs-writeback.c:515:33: error: ‘PAGE_CACHE_SHIFT’ undeclared (first use in this function)

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2012-01-08 10:35:19 +08:00
Al Viro
ff01bb4832 fs: move code out of buffer.c
Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:07 -05:00
Rafael J. Wysocki
b7ba68c4a0 Merge branch 'pm-sleep' into pm-for-linus
* pm-sleep: (51 commits)
  PM: Drop generic_subsys_pm_ops
  PM / Sleep: Remove forward-only callbacks from AMBA bus type
  PM / Sleep: Remove forward-only callbacks from platform bus type
  PM: Run the driver callback directly if the subsystem one is not there
  PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
  PM / Sleep: Merge internal functions in generic_ops.c
  PM / Sleep: Simplify generic system suspend callbacks
  PM / Hibernate: Remove deprecated hibernation snapshot ioctls
  PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
  PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly
  PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep()
  PM / Sleep: Make [un]lock_system_sleep() generic
  PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs
  PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()
  PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else'
  Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
  PM / Sleep: Unify diagnostic messages from device suspend/resume
  ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR
  PM / Hibernate: Remove deprecated hibernation test modes
  PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path
  ...

Conflicts:
	kernel/kmod.c
2011-12-25 23:42:20 +01:00
Rafael J. Wysocki
b00f4dc5ff Merge branch 'master' into pm-sleep
* master: (848 commits)
  SELinux: Fix RCU deref check warning in sel_netport_insert()
  binary_sysctl(): fix memory leak
  mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
  ipmi_watchdog: restore settings when BMC reset
  oom: fix integer overflow of points in oom_badness
  memcg: keep root group unchanged if creation fails
  nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
  nilfs2: unbreak compat ioctl
  cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
  evm: prevent racing during tfm allocation
  evm: key must be set once during initialization
  mmc: vub300: fix type of firmware_rom_wait_states module parameter
  Revert "mmc: enable runtime PM by default"
  mmc: sdhci: remove "state" argument from sdhci_suspend_host
  x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
  IB/qib: Correct sense on freectxts increment and decrement
  RDMA/cma: Verify private data length
  cgroups: fix a css_set not found bug in cgroup_attach_proc
  oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
  Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
  ...

Conflicts:
	kernel/cgroup_freezer.c
2011-12-21 21:59:45 +01:00
Jan Kara
1bc36b6426 writeback: Include all dirty inodes in background writeback
Current livelock avoidance code makes background work to include only inodes
that were dirtied before background writeback has started. However background
writeback can be running for a long time and thus excluding newly dirtied
inodes can eventually exclude significant portion of dirty inodes making
background writeback inefficient. Since background writeback avoids livelocking
the flusher thread by yielding to any other work, there is no real reason why
background work should not include all dirty inodes so change the logic in
wb_writeback().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-12-18 14:20:18 +08:00
Wu Fengguang
b3bba872dd writeback: show writeback reason with __print_symbolic
This makes the binary trace understandable by trace-cmd.

CC: Dave Chinner <david@fromorbit.com>
CC: Curt Wohlgemuth <curtw@google.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-12-18 14:20:17 +08:00
Marcos Paulo de Souza
786228ab30 writeback: Fix issue on make htmldocs
Document the @reason parameter to make "make htmldocs" happy.

Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Marcos Paulo de Souza <marcos.mage@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-11-29 15:50:28 +08:00
Tejun Heo
8a32c441c1 freezer: implement and use kthread_freezable_should_stop()
Writeback and thinkpad_acpi have been using thaw_process() to prevent
deadlock between the freezer and kthread_stop(); unfortunately, this
is inherently racy - nothing prevents freezing from happening between
thaw_process() and kthread_stop().

This patch implements kthread_freezable_should_stop() which enters
refrigerator if necessary but is guaranteed to return if
kthread_stop() is invoked.  Both thaw_process() users are converted to
use the new function.

Note that this deadlock condition exists for many of freezable
kthreads.  They need to be converted to use the new should_stop or
freezable workqueue.

Tested with synthetic test case.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21 12:32:23 -08:00