There is no reason to export two functions for entering the
refrigerator. Calling refrigerator() instead of try_to_freeze()
doesn't save anything noticeable or removes any race condition.
* Rename refrigerator() to __refrigerator() and make it return bool
indicating whether it scheduled out for freezing.
* Update try_to_freeze() to return bool and relay the return value of
__refrigerator() if freezing().
* Convert all refrigerator() users to try_to_freeze().
* Update documentation accordingly.
* While at it, add might_sleep() to try_to_freeze().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
Checkpoint generation interval of nilfs goes wrong after user has
changed the interval parameter with nilfs-tune tool.
segctord starting. Construction interval = 5 seconds,
CP frequency < 30 seconds
segctord starting. Construction interval = 0 seconds,
CP frequency < 30 seconds
This turned out to be caused by a trivial bug in initialization code
of log writer. This will fix it.
Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This replaces nilfs_mdt_mark_buffer_dirty and nilfs_btnode_mark_dirty
macros with mark_buffer_dirty and gets rid of nilfs_mark_buffer_dirty,
an own mark buffer dirty function.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
In the current nilfs, page cache for btree nodes and meta data files
do not set a valid back pointer to the host inode in mapping->host.
This will change it so that every address space in nilfs uses
mapping->host to hold its host inode.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This uses list_first_entry macro instead of list_entry if it's used to
get the first entry.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
The super root block is newly-allocated each time it is written back
to disk, so unused portion of the block should be cleared.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
The size of super root structure depends on inode size, so
NILFS_SR_BYTES macro should be a function of the inode size. This
fixes the issue.
Even though a different size value will be written for a possible
future filesystem with extended inode, but fortunately this does not
break disk format compatibility.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Previously, nilfs was cloning pages for mmapped region to freeze their
data and ensure consistency of checksum during writeback cycles. A
private page allocator was used for this page cloning. But, we no
longer need to do that since clear_page_dirty_for_io function sets up
pte so that vm_ops->page_mkwrite function is called right before the
mmapped pages are modified and nilfs_page_mkwrite function can safely
wait for the pages to be written back to disk.
So, this stops making a copy of mmapped pages during writeback, and
eliminates the private page allocation and deallocation functions from
nilfs.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This directly uses sb->s_fs_info to keep a nilfs filesystem object and
fully removes the intermediate nilfs_sb_info structure. With this
change, the hierarchy of on-memory structures of nilfs will be
simplified as follows:
Before:
super_block
-> nilfs_sb_info
-> the_nilfs
-> cptree --+-> nilfs_root (current file system)
+-> nilfs_root (snapshot A)
+-> nilfs_root (snapshot B)
:
-> nilfs_sc_info (log writer structure)
After:
super_block
-> the_nilfs
-> cptree --+-> nilfs_root (current file system)
+-> nilfs_root (snapshot A)
+-> nilfs_root (snapshot B)
:
-> nilfs_sc_info (log writer structure)
The reason why we didn't design so from the beginning is because the
initial shape also differed from the above. The early hierachy was
composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a
shared nilfs object. On the kernel 2.6.37, it was changed to the
current shape in order to unify super block instances into one per
device, and this cleanup became applicable as the result.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Removes sci->sc_sbi which is a back pointer to nilfs_sb_info struct
from log writer object (nilfs_sc_info).
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Log writer is held by the nilfs_sb_info structure. This moves it into
nilfs object and replaces all uses of NILFS_SC() accessor.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Moves s_inode_lock spinlock and s_dirty_files list to nilfs object
from nilfs_sb_info structure.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This moves four parameter variables on nilfs_sb_info s_resuid,
s_resgid, s_interval and s_watermark to the nilfs object.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
According to the report from Jiro SEKIBA titled "regression in
2.6.37?" (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and
later kernels, lscp command no longer displays "i" flag on checkpoints
that snapshot operations or garbage collection created.
This is a regression of nilfs2 checkpointing function, and it's
critical since it broke behavior of a part of nilfs2 applications.
For instance, snapshot manager of TimeBrowse gets to create
meaningless snapshots continuously; snapshot creation triggers another
checkpoint, but applications cannot distinguish whether the new
checkpoint contains meaningful changes or not without the i-flag.
This patch fixes the regression and brings that application behavior
back to normal.
Reported-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jiro SEKIBA <jir@unicus.jp>
Cc: stable <stable@kernel.org> [2.6.37]
nilfs_dat_inode function was a wrapper to switch between normal dat
inode and gcdat, a clone of the dat inode for garbage collection.
This function got obsolete when the gcdat inode was removed, and now
we can access the dat inode directly from a nilfs object. So, we will
unfold the wrapper and remove it.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Nilfs does not allocate new blocks on disk until they are actually
written to. To implement fiemap, we need to deal with such blocks.
To allow successive fiemap patch to distinguish mapped but unallocated
regions, this marks buffer heads of those new blocks as delayed and
clears the flag after the blocks are written to disk.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Some functions using nilfs bmap routines can wrongly return invalid
argument error (i.e. -EINVAL) that bmap returns as an internal code
for btree corruption.
This fixes the issue by catching and converting the internal EINVAL to
EIO and calling nilfs_error function inside bmap routines.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
To help developers and applications gain visibility into writeback
behaviour this patch adds two counters to /proc/vmstat.
# grep nr_dirtied /proc/vmstat
nr_dirtied 3747
# grep nr_written /proc/vmstat
nr_written 3618
These entries allow user apps to understand writeback behaviour over time
and learn how it is impacting their performance. Currently there is no
way to inspect dirty and writeback speed over time. It's not possible for
nr_dirty/nr_writeback.
These entries are necessary to give visibility into writeback behaviour.
We have /proc/diskstats which lets us understand the io in the block
layer. We have blktrace for more in depth understanding. We have
e2fsprogs and debugsfs to give insight into the file systems behaviour,
but we don't offer our users the ability understand what writeback is
doing. There is no way to know how active it is over the whole system, if
it's falling behind or to quantify it's efforts. With these values
exported users can easily see how much data applications are sending
through writeback and also at what rates writeback is processing this
data. Comparing the rates of change between the two allow developers to
see when writeback is not able to keep up with incoming traffic and the
rate of dirty memory being sent to the IO back end. This allows folks to
understand their io workloads and track kernel issues. Non kernel
engineers at Google often use these counters to solve puzzling performance
problems.
Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written
Patch #5 add writeback thresholds to /proc/vmstat
Currently these values are in debugfs. But they should be promoted to
/proc since they are useful for developers who are writing databases
and file servers and are not debugging the kernel.
The output is as below:
# grep threshold /proc/vmstat
nr_pages_dirty_threshold 409111
nr_pages_dirty_background_threshold 818223
This patch:
This allows code outside of the mm core to safely manipulate page
writeback state and not worry about the other accounting. Not using these
routines means that some code will lose track of the accounting and we get
bugs.
Modify nilfs2 to use interface.
Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
insert sparse annotations to fix following sparse warning.
fs/nilfs2/segment.c:2681:3: warning: context imbalance in 'nilfs_segctor_kill_thread' - unexpected unlock
nilfs_segctor_kill_thread is only called inside sc_state_lock lock.
sparse doesn't detect the context and warn "unexpected unlock".
__acquires/__releases pretend to lock/unlock the sc_state_lock for sparse.
Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Nilfs hasn't supported the freeze/thaw feature because it didn't work
due to the peculiar design that multiple super block instances could
be allocated for a device. This limitation was removed by the patch
"nilfs2: do not allocate multiple super block instances for a device".
So now this adds the freeze/thaw support to nilfs.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Nilfs object holds a back pointer to a writable super block instance
in nilfs->ns_writer, and this became eliminable since sb is now made
per device and all inodes have a valid pointer to it.
This deletes the ns_writer pointer and a reader/writer semaphore
protecting it.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This applies prepared rollback function and redirect function of
metadata file to DAT file, and eliminates GCDAT inode.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>