Commit Graph

39 Commits

Author SHA1 Message Date
Christoph Hellwig 83ba7b071f writeback: simplify the write back thread queue
First remove items from work_list as soon as we start working on them.  This
means we don't have to track any pending or visited state and can get
rid of all the RCU magic freeing the work items - we can simply free
them once the operation has finished.  Second use a real completion for
tracking synchronous requests - if the caller sets the completion pointer
we complete it, otherwise use it as a boolean indicator that we can free
the work item directly.  Third unify struct wb_writeback_args and struct
bdi_work into a single data structure, wb_writeback_work.  Previous we
set all parameters into a struct wb_writeback_args, copied it into
struct bdi_work, copied it again on the stack to use it there.  Instead
of just allocate one structure dynamically or on the stack and use it
all the way through the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-07-06 08:59:53 +02:00
Christoph Hellwig 9c3a8ee8a1 writeback: remove writeback_inodes_wbc
This was just an odd wrapper around writeback_inodes_wb.  Removing this
also allows to get rid of the bdi member of struct writeback_control
which was rather out of place there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-07-06 08:54:03 +02:00
Jens Axboe 6423104b6a writeback: fixups for !dirty_writeback_centisecs
Commit 69b62d01 fixed up most of the places where we would enter
busy schedule() spins when disabling the periodic background
writeback. This fixes up the sb timer so that it doesn't get
hammered on with the delay disabled, and ensures that it gets
rearmed if needed when /proc/sys/vm/dirty_writeback_centisecs
gets modified.

bdi_forker_task() also needs to check for !dirty_writeback_centisecs
and use schedule() appropriately, fix that up too.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 20:00:35 +02:00
Jörn Engel 5129a469a9 Catch filesystems lacking s_bdi
noop_backing_dev_info is used only as a flag to mark filesystems that
don't have any backing store, like tmpfs, procfs, spufs, etc.

Signed-off-by: Joern Engel <joern@logfs.org>

Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
to the noop_backing_dev_info is not legal and will not result in
them being flushed, but we already catch this condition in
__mark_inode_dirty() when checking for a registered bdi.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-25 08:54:42 +02:00
Jens Axboe c3c532061e bdi: add helper function for doing init and register of a bdi for a file system
Pretty trivial helper, just sets up the bdi and registers it. An atomic
sequence count is used to ensure that the registered sysfs names are
unique.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 11:39:36 +02:00
Anton Blanchard 1442145373 backing-dev: Handle class_create() failure
I hit this when we had a bug in IDR for a few days. Basically sysfs would
fail to create new inodes since it uses an IDR and therefore class_create would
fail.

While we are unlikely to see this fail we may as well handle it instead of
oopsing.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-02 09:46:55 +02:00
OGAWA Hirofumi bf7ec5bb61 flusher: Fix PF_FROZEN race
To touch task->flags directly is racy. thaw_process() still has race
(changing non_current->flags, but this is another issue) though, I think
it's much better off.

So, use thaw_process() instead.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 13:49:43 +01:00
Romit Dasgupta c62b17a58a Thaw refrigerated bdi flusher threads before invoking kthread_stop on them
Unfreezes the bdi flusher task when the said task needs to exit.

Steps to reproduce this.
1) Mount a file system from MMC/SD card.
2) Unmount the file system. This creates a flusher task.
3) Attempt suspend to RAM. System is unresponsive.

This is because the bdi flusher thread is already in the refrigerator and will
remain so until it is thawed. The MMC driver suspend routine call stack will
ultimately issue a 'kthread_stop' on the bdi flusher thread and will block
until the flusher thread is exited. Since the bdi flusher thread is in the
refrigerator it never cleans up until thawed.

Signed-off-by: Romit Dasgupta <romit@ti.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-12 13:08:11 +01:00
Jens Axboe 8c4db3355b backing-dev: bdi sb prune should be in the unregister path, not destroy
Commit 592b09a42f was different from
the tested path, in that it moved the bdi super_block prune from
unregister to destroy context. This doesn't fully fix the sync hang
bug on unexpected device removal, as need to prune the bdi cache
pointer before killing flusher thread.

Tested-by: Artur Skawina <art.08.09@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-03 20:18:44 +01:00
Jens Axboe 592b09a42f backing-dev: ensure that a removed bdi no longer has super_block referencing it
When the bdi is being removed, we have to ensure that no super_blocks
currently have that cached in sb->s_bdi. Normally this is ensured by
the sb having a longer life span than the bdi, but if the device is
suddenly yanked, we have to kill this reference. sb->s_bdi is pointed
to freed memory at that point.

This fixes a problem with sync(1) hanging when a USB stick is pulled
without cleanly umounting it first.

Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-29 11:46:12 +01:00
Wu Fengguang 961515f613 writeback: kill space in debugfs item name
The space is not script friendly, kill it.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-09 13:01:27 +02:00
Jens Axboe ce5f8e7795 writeback: splice dirty inode entries to default bdi on bdi_destroy()
We cannot safely ensure that the inodes are all gone at this point
in time, and we must not destroy this bdi with inodes having off it.
So just splice our entries to the default bdi since that one will
always persist.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16 15:18:52 +02:00
Jens Axboe cfc4ba5365 writeback: use RCU to protect bdi_list
Now that bdi_writeback_all() no longer handles integrity writeback,
it doesn't have to block anymore. This means that we can switch
bdi_list reader side protection to RCU.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16 15:18:51 +02:00
Jens Axboe 500b067c5e writeback: check for registered bdi in flusher add and inode dirty
Also a debugging aid. We want to catch dirty inodes being added to
backing devices that don't do writeback.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:26 +02:00
Jens Axboe d993831fa7 writeback: add name to backing_dev_info
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:26 +02:00
Jens Axboe f09b00d3e7 writeback: add some debug inode list counters to bdi stats
Add some debug entries to be able to inspect the internal state of
the writeback details.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:26 +02:00
Jens Axboe 03ba3782e8 writeback: switch to per-bdi threads for flushing data
This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  1      0 608848   2652 375372    0    0     0 71024  604    24  1 10 48 42
 0  1      0 549644   2712 433736    0    0     0 60692  505    27  1  8 48 44
 1  0      0 476928   2784 505192    0    0     4 29540  553    24  0  9 53 37
 0  1      0 457972   2808 524008    0    0     0 54876  331    16  0  4 38 58
 0  1      0 366128   2928 614284    0    0     4 92168  710    58  0 13 53 34
 0  1      0 295092   3000 684140    0    0     0 62924  572    23  0  9 53 37
 0  1      0 236592   3064 741704    0    0     4 58256  523    17  0  8 48 44
 0  1      0 165608   3132 811464    0    0     0 57460  560    21  0  8 54 38
 0  1      0 102952   3200 873164    0    0     4 74748  540    29  1 10 48 41
 0  1      0  48604   3252 926472    0    0     0 53248  469    29  0  7 47 45

where vanilla tends to fluctuate a lot in the creation phase:

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  1      0 678716   5792 303380    0    0     0 74064  565    50  1 11 52 36
 1  0      0 662488   5864 319396    0    0     4   352  302   329  0  2 47 51
 0  1      0 599312   5924 381468    0    0     0 78164  516    55  0  9 51 40
 0  1      0 519952   6008 459516    0    0     4 78156  622    56  1 11 52 37
 1  1      0 436640   6092 541632    0    0     0 82244  622    54  0 11 48 41
 0  1      0 436640   6092 541660    0    0     0     8  152    39  0  0 51 49
 0  1      0 332224   6200 644252    0    0     4 102800  728    46  1 13 49 36
 1  0      0 274492   6260 701056    0    0     4 12328  459    49  0  7 50 43
 0  1      0 211220   6324 763356    0    0     0 106940  515    37  1 10 51 39
 1  0      0 160412   6376 813468    0    0     0  8224  415    43  0  6 49 45
 1  1      0  85980   6452 886556    0    0     4 113516  575    39  1 11 54 34
 0  2      0  85968   6452 886620    0    0     0  1640  158   211  0  0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.

A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:25 +02:00
Jens Axboe 66f3b8e2e1 writeback: move dirty inodes from super_block to backing_dev_info
This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:25 +02:00
Jens Axboe 8aa7e847d8 Fix congestion_wait() sync/async vs read/write confusion
Commit 1faa16d228 accidentally broke
the bdi congestion wait queue logic, causing us to wait on congestion
for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-10 20:31:53 +02:00
Jens Axboe 1faa16d228 block: change the request allocation/congestion logic to be sync/async based
This makes sure that we never wait on async IO for sync requests, instead
of doing the split on writes vs reads.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 08:04:53 -07:00
Jens Axboe 26160158d3 Move the default_backing_dev_info out of readahead.c and into backing-dev.c
It really makes no sense to have it in readahead.c, so move it where
it belongs.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-03-26 11:01:33 +01:00
Linus Torvalds f94181da71 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: fix rcutorture bug
  rcu: eliminate synchronize_rcu_xxx macro
  rcu: make treercu safe for suspend and resume
  rcu: fix rcutree grace-period-latency bug on small systems
  futex: catch certain assymetric (get|put)_futex_key calls
  futex: make futex_(get|put)_key() calls symmetric
  locking, percpu counters: introduce separate lock classes
  swiotlb: clean up EXPORT_SYMBOL usage
  swiotlb: remove unnecessary declaration
  swiotlb: replace architecture-specific swiotlb.h with linux/swiotlb.h
  swiotlb: add support for systems with highmem
  swiotlb: store phys address in io_tlb_orig_addr array
  swiotlb: add hwdev to swiotlb_phys_to_bus() / swiotlb_sg_to_bus()
2009-01-06 17:10:04 -08:00
David Rientjes 364aeb2849 mm: change dirty limit type specifiers to unsigned long
The background dirty and dirty limits are better defined with type
specifiers of unsigned long since negative writeback thresholds are not
possible.

These values, as returned by get_dirty_limits(), are normally compared
with ZVC values to determine whether writeback shall commence or be
throttled.  Such page counts cannot be negative, so declaring the page
limits as signed is unnecessary.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:02 -08:00
Ingo Molnar fdbc0450df Merge branches 'core/futexes', 'core/locking', 'core/rcu' and 'linus' into core/urgent 2009-01-06 09:32:11 +01:00
Peter Zijlstra ea319518ba locking, percpu counters: introduce separate lock classes
Impact: fix lockdep false positives

Classify percpu_counter instances similar to regular lock objects --
that is, per instantiation site.

The networking code has increased its use of percpu_counters, which
leads to false positives if they are treated as a single class.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 13:43:00 +01:00