* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits)
nilfs2: support contiguous lookup of blocks
nilfs2: add sync_page method to page caches of meta data
nilfs2: use device's backing_dev_info for btree node caches
nilfs2: return EBUSY against delete request on snapshot
nilfs2: modify list of unsupported features in caveats
nilfs2: enable sync_page method
nilfs2: set bio unplug flag for the last bio in segment
nilfs2: allow future expansion of metadata read out via get info ioctl
NILFS2: Pagecache usage optimization on NILFS2
nilfs2: remove nilfs_btree_operations from btree mapping
nilfs2: remove nilfs_direct_operations from direct mapping
nilfs2: remove bmap pointer operations
nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct
nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function
nilfs2: move get block functions in bmap.c into btree codes
nilfs2: remove nilfs_bmap_delete_block
nilfs2: remove nilfs_bmap_put_block
nilfs2: remove header file for segment list operations
nilfs2: eliminate removal list of segments
nilfs2: add sufile function that can modify multiple segment usages
...
This will remove every bd_mount_sem use in nilfs.
The intended exclusion control was replaced by the previous patch
("nilfs2: correct exclusion control in nilfs_remount function") for
nilfs_remount(), and this patch will replace remains with a new mutex
that this inserts in nilfs object.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
nilfs_remount() changes mount state of a superblock instance. Even
though nilfs accesses other superblock instances during mount or
remount, the mount state was not properly protected in
nilfs_remount().
Moreover, nilfs_remount() has a lock order reversal problem;
nilfs_get_sb() holds:
1. bdev->bd_mount_sem
2. sb->s_umount (sget acquires)
and nilfs_remount() holds:
1. sb->s_umount (locked by the caller in vfs)
2. bdev->bd_mount_sem
To avoid these problems, this patch divides a semaphore protecting
super block instances from nilfs->ns_sem, and applies it to the mount
state protection in nilfs_remount().
With this change, bd_mount_sem use is removed from nilfs_remount() and
the lock order reversal will be resolved. And the new rw-semaphore,
nilfs->ns_super_sem will properly protect the mount state except the
modification from nilfs_error function.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This simplifies the test function passed on the remaining sget()
callsite in nilfs.
Instead of checking mount type (i.e. ro-mount/rw-mount/snapshot mount)
in the test function passed to sget(), this patch first looks up the
nilfs_sb_info struct which the given mount type matches, and then
acquires the super block instance holding the nilfs_sb_info.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This stops using sget() for checking if an r/w-mount or an r/o-mount
exists on the device. This elimination uses a back pointer to the
current mount added to nilfs object.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This will change the way to obtain nilfs object in nilfs_get_sb()
function.
Previously, a preliminary sget() call was performed, and the nilfs
object was acquired from a super block instance found by the sget()
call.
This patch, instead, instroduces a new dedicated function
find_or_create_nilfs(); as the name implies, the function finds an
existent nilfs object from a global list or creates a new one if no
object is found on the device.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The following EBUSY case in nilfs_get_sb() is meaningless. Indeed,
this error code is never returned to the caller.
if (!s->s_root) {
...
} else if (!(s->s_flags & MS_RDONLY)) {
err = -EBUSY;
}
This simply removes the else case.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The call to ->write_super from __sync_filesystem will go away, so make
sure nilfs2 performs the same actions from inside ->sync_fs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move BKL into ->put_super from the only caller. A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.
[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We just did a full fs writeout using sync_filesystem before, and if
that's not enough for the filesystem it can perform it's own writeout
in ->put_super, which many filesystems already do.
Move a call to foofs_write_super into every foofs_put_super for now to
guarantee identical behaviour until it's cleaned up by the individual
filesystem maintainers.
Exceptions:
- affs already has identical copy & pasted code at the beginning of
affs_put_super so no need to do it twice.
- xfs does the right thing without it and I have changes pending for
the xfs tree touching this are so I don't really need conflicts
here..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
block: add request clone interface (v2)
floppy: fix hibernation
ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
fs/bio.c: add missing __user annotation
block: prevent possible io_context->refcount overflow
Add serial number support for virtio_blk, V4a
block: Add missing bounce_pfn stacking and fix comments
Revert "block: Fix bounce limit setting in DM"
cciss: decode unit attention in SCSI error handling code
cciss: Remove no longer needed sendcmd reject processing code
cciss: change SCSI error handling routines to work with interrupts enabled.
cciss: separate error processing and command retrying code in sendcmd_withirq_core()
cciss: factor out fix target status processing code from sendcmd functions
cciss: simplify interface of sendcmd() and sendcmd_withirq()
cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
block: needs to set the residual length of a bidi request
Revert "block: implement blkdev_readpages"
block: Fix bounce limit setting in DM
Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
...
Manually fix conflicts with tracing updates in:
block/blk-sysfs.c
drivers/ide/ide-atapi.c
drivers/ide/ide-cd.c
drivers/ide/ide-floppy.c
drivers/ide/ide-tape.c
include/trace/events/block.h
kernel/trace/blktrace.c
Although get_block() callback function can return extent of contiguous
blocks with bh->b_size, nilfs_get_block() function did not support
this feature.
This adds contiguous lookup feature to the block mapping codes of
nilfs, and allows the nilfs_get_blocks() function to return the extent
information by applying the feature.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This applies block_sync_page() function to the sync_page method of
page caches for meta data files, gc page caches, and btree node
buffers. This is a companion patch of ("nilfs2: enable sync_page
mothod") which applied the function for data pages.
This allows lock_page() for those meta data to unplug pending bio
requests.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Previously, default_backing_dev_info was used for the mapping of btree
node caches. This uses device dependent backing_dev_info to allow
detailed control of the device for the btree node pages.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This helps userland programs like the rmcp command to distinguish
error codes returned against a checkpoint removal request.
Previously -EPERM was returned, and not discriminable from real
permission errors. This also allows removal of the latest checkpoint
because the deletion leads to create a new checkpoint, and thus it's
harmless for the filesystem.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This adds a missing sync_page method which unplugs bio requests when
waiting for page locks. This will improve read performance of nilfs.
Here is a measurement result using dd command.
Without this patch:
# mount -t nilfs2 /dev/sde1 /test
# dd if=/test/aaa of=/dev/null bs=512k
1024+0 records in
1024+0 records out
536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s
With this patch:
# mount -t nilfs2 /dev/sde1 /test
# dd if=/test/aaa of=/dev/null bs=512k
1024+0 records in
1024+0 records out
536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This sets BIO_RW_UNPLUG flag on the last bio of each segment during
write. The last bio should be unplugged immediately because the
caller waits for the completion after the submission.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Nilfs has some ioctl commands to read out metadata from meta data
files:
- NILFS_IOCTL_GET_CPINFO for checkpoint file,
- NILFS_IOCTL_GET_SUINFO for segment usage file, and
- NILFS_IOCTL_GET_VINFO for Disk Address Transalation (DAT) file,
respectively.
Every routine on these metadata files is implemented so that it allows
future expansion of on-disk format. But, the above ioctl commands do
not support expansion even though nilfs_argv structure can handle
arbitrary size for data exchanged via ioctl.
This allows future expansion of the following structures which give
basic format of the "get information" ioctls:
- struct nilfs_cpinfo
- struct nilfs_suinfo
- struct nilfs_vinfo
So, this introduces forward compatility of such ioctl commands.
In this patch, a sanity check in nilfs_ioctl_get_info() function is
changed to accept larger data structure [1], and metadata read
routines are rewritten so that they become compatible for larger
structures; the routines will just ignore the remaining fields which
the current version of nilfs doesn't know.
[1] The ioctl function already has another upper limit (PAGE_SIZE
against a structure, which appears in nilfs_ioctl_wrap_copy
function), and this will not cause security problem.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Hi,
I introduced "is_partially_uptodate" aops for NILFS2.
A page can have multiple buffers and even if a page is not uptodate, some buffers
can be uptodate on pagesize != blocksize environment.
This aops checks that all buffers which correspond to a part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to HDD even if a page is not uptodate because the portion we
want to read are uptodate.
"block_is_partially_uptodate" function is already used by ext2/3/4.
With the following patch random read/write mixed workloads or random read after
random write workloads can be optimized and we can get performance improvement.
I did a performance test using the sysbench.
1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --fil
e-rw-ratio=1 run
-2.6.30-rc5
Test execution summary:
total time: 151.2907s
total number of events: 200000
total time taken by event execution: 2409.8387
per-request statistics:
min: 0.0000s
avg: 0.0120s
max: 0.9306s
approx. 95 percentile: 0.0439s
Threads fairness:
events (avg/stddev): 12500.0000/238.52
execution time (avg/stddev): 150.6149/0.01
-2.6.30-rc5-patched
Test execution summary:
total time: 140.8828s
total number of events: 200000
total time taken by event execution: 2240.8577
per-request statistics:
min: 0.0000s
avg: 0.0112s
max: 0.8750s
approx. 95 percentile: 0.0418s
Threads fairness:
events (avg/stddev): 12500.0000/218.43
execution time (avg/stddev): 140.0536/0.01
arch: ia64
pagesize: 16k
Thanks.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Previously, the bmap codes of nilfs used three types of function
tables. The abuse of indirect function calls decreased source
readability and suffered many indirect jumps which would confuse
branch prediction of processors.
This eliminates one type of the function tables,
nilfs_bmap_ptr_operations, which was used to dispatch low level
pointer operations of the nilfs bmap.
This adds a new integer variable "b_ptr_type" to nilfs_bmap struct,
and uses the value to select the pointer operations.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This will cut off 16 bytes from the nilfs_bmap struct which is
embedded in the on-memory inode of nilfs.
The b_high field was never used, and the b_low field stores a constant
value which can be determined by whether the inode uses btree for
block mapping or not.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This indirect function is set to NULL only for gc cache inodes, but
the gc cache inodes never call this function.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>