Commit Graph

96 Commits

Author SHA1 Message Date
Linus Torvalds 9c7cb99a82 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits)
  nilfs2: support contiguous lookup of blocks
  nilfs2: add sync_page method to page caches of meta data
  nilfs2: use device's backing_dev_info for btree node caches
  nilfs2: return EBUSY against delete request on snapshot
  nilfs2: modify list of unsupported features in caveats
  nilfs2: enable sync_page method
  nilfs2: set bio unplug flag for the last bio in segment
  nilfs2: allow future expansion of metadata read out via get info ioctl
  NILFS2: Pagecache usage optimization on NILFS2
  nilfs2: remove nilfs_btree_operations from btree mapping
  nilfs2: remove nilfs_direct_operations from direct mapping
  nilfs2: remove bmap pointer operations
  nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct
  nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function
  nilfs2: move get block functions in bmap.c into btree codes
  nilfs2: remove nilfs_bmap_delete_block
  nilfs2: remove nilfs_bmap_put_block
  nilfs2: remove header file for segment list operations
  nilfs2: eliminate removal list of segments
  nilfs2: add sufile function that can modify multiple segment usages
  ...
2009-06-15 09:13:49 -07:00
Ryusuke Konishi aa7dfb8954 nilfs2: get rid of bd_mount_sem use from nilfs
This will remove every bd_mount_sem use in nilfs.

The intended exclusion control was replaced by the previous patch
("nilfs2: correct exclusion control in nilfs_remount function") for
nilfs_remount(), and this patch will replace remains with a new mutex
that this inserts in nilfs object.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:18 -04:00
Ryusuke Konishi e59399d010 nilfs2: correct exclusion control in nilfs_remount function
nilfs_remount() changes mount state of a superblock instance.  Even
though nilfs accesses other superblock instances during mount or
remount, the mount state was not properly protected in
nilfs_remount().

Moreover, nilfs_remount() has a lock order reversal problem;
nilfs_get_sb() holds:

  1. bdev->bd_mount_sem
  2. sb->s_umount  (sget acquires)

and nilfs_remount() holds:

  1. sb->s_umount  (locked by the caller in vfs)
  2. bdev->bd_mount_sem

To avoid these problems, this patch divides a semaphore protecting
super block instances from nilfs->ns_sem, and applies it to the mount
state protection in nilfs_remount().

With this change, bd_mount_sem use is removed from nilfs_remount() and
the lock order reversal will be resolved.  And the new rw-semaphore,
nilfs->ns_super_sem will properly protect the mount state except the
modification from nilfs_error function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:18 -04:00
Ryusuke Konishi 6dd4740662 nilfs2: simplify remaining sget() use
This simplifies the test function passed on the remaining sget()
callsite in nilfs.

Instead of checking mount type (i.e. ro-mount/rw-mount/snapshot mount)
in the test function passed to sget(), this patch first looks up the
nilfs_sb_info struct which the given mount type matches, and then
acquires the super block instance holding the nilfs_sb_info.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:18 -04:00
Ryusuke Konishi 3f82ff5516 nilfs2: get rid of sget use for checking if current mount is present
This stops using sget() for checking if an r/w-mount or an r/o-mount
exists on the device.  This elimination uses a back pointer to the
current mount added to nilfs object.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:17 -04:00
Ryusuke Konishi 33c8e57c86 nilfs2: get rid of sget use for acquiring nilfs object
This will change the way to obtain nilfs object in nilfs_get_sb()
function.

Previously, a preliminary sget() call was performed, and the nilfs
object was acquired from a super block instance found by the sget()
call.

This patch, instead, instroduces a new dedicated function
find_or_create_nilfs(); as the name implies, the function finds an
existent nilfs object from a global list or creates a new one if no
object is found on the device.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:17 -04:00
Ryusuke Konishi 81fc20bd0e nilfs2: remove meaningless EBUSY case from nilfs_get_sb function
The following EBUSY case in nilfs_get_sb() is meaningless.  Indeed,
this error code is never returned to the caller.

    if (!s->s_root) {
          ...
    } else if (!(s->s_flags & MS_RDONLY)) {
        err = -EBUSY;
    }

This simply removes the else case.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:17 -04:00
Christoph Hellwig d731e06323 nilfs2: call nilfs2_write_super from nilfs2_sync_fs
The call to ->write_super from __sync_filesystem will go away, so make
sure nilfs2 performs the same actions from inside ->sync_fs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:17 -04:00
Alessio Igor Bogani 337eb00a2c Push BKL down into ->remount_fs()
[xfs, btrfs, capifs, shmem don't need BKL, exempt]

Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:11 -04:00
Christoph Hellwig 6cfd014842 push BKL down into ->put_super
Move BKL into ->put_super from the only caller.  A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment.  Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.

[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:07 -04:00
Christoph Hellwig 8c85e12512 remove ->write_super call in generic_shutdown_super
We just did a full fs writeout using sync_filesystem before, and if
that's not enough for the filesystem it can perform it's own writeout
in ->put_super, which many filesystems already do.

Move a call to foofs_write_super into every foofs_put_super for now to
guarantee identical behaviour until it's cleaned up by the individual
filesystem maintainers.

Exceptions:

 - affs already has identical copy & pasted code at the beginning of
   affs_put_super so no need to do it twice.
 - xfs does the right thing without it and I have changes pending for
   the xfs tree touching this are so I don't really need conflicts
   here..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:06 -04:00
Linus Torvalds c9059598ea Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
  block: add request clone interface (v2)
  floppy: fix hibernation
  ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
  fs/bio.c: add missing __user annotation
  block: prevent possible io_context->refcount overflow
  Add serial number support for virtio_blk, V4a
  block: Add missing bounce_pfn stacking and fix comments
  Revert "block: Fix bounce limit setting in DM"
  cciss: decode unit attention in SCSI error handling code
  cciss: Remove no longer needed sendcmd reject processing code
  cciss: change SCSI error handling routines to work with interrupts enabled.
  cciss: separate error processing and command retrying code in sendcmd_withirq_core()
  cciss: factor out fix target status processing code from sendcmd functions
  cciss: simplify interface of sendcmd() and sendcmd_withirq()
  cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
  cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
  block: needs to set the residual length of a bidi request
  Revert "block: implement blkdev_readpages"
  block: Fix bounce limit setting in DM
  Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
  ...

Manually fix conflicts with tracing updates in:
	block/blk-sysfs.c
	drivers/ide/ide-atapi.c
	drivers/ide/ide-cd.c
	drivers/ide/ide-floppy.c
	drivers/ide/ide-tape.c
	include/trace/events/block.h
	kernel/trace/blktrace.c
2009-06-11 11:10:35 -07:00
Ryusuke Konishi c3a7abf06c nilfs2: support contiguous lookup of blocks
Although get_block() callback function can return extent of contiguous
blocks with bh->b_size, nilfs_get_block() function did not support
this feature.

This adds contiguous lookup feature to the block mapping codes of
nilfs, and allows the nilfs_get_blocks() function to return the extent
information by applying the feature.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:12 +09:00
Ryusuke Konishi fa032744ad nilfs2: add sync_page method to page caches of meta data
This applies block_sync_page() function to the sync_page method of
page caches for meta data files, gc page caches, and btree node
buffers.  This is a companion patch of ("nilfs2: enable sync_page
mothod") which applied the function for data pages.

This allows lock_page() for those meta data to unplug pending bio
requests.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:12 +09:00
Ryusuke Konishi a53b4751ae nilfs2: use device's backing_dev_info for btree node caches
Previously, default_backing_dev_info was used for the mapping of btree
node caches.  This uses device dependent backing_dev_info to allow
detailed control of the device for the btree node pages.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:12 +09:00
Ryusuke Konishi 30c25be71f nilfs2: return EBUSY against delete request on snapshot
This helps userland programs like the rmcp command to distinguish
error codes returned against a checkpoint removal request.

Previously -EPERM was returned, and not discriminable from real
permission errors.  This also allows removal of the latest checkpoint
because the deletion leads to create a new checkpoint, and thus it's
harmless for the filesystem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:12 +09:00
Ryusuke Konishi e85dc1d529 nilfs2: enable sync_page method
This adds a missing sync_page method which unplugs bio requests when
waiting for page locks. This will improve read performance of nilfs.

Here is a measurement result using dd command.

Without this patch:

 # mount -t nilfs2 /dev/sde1 /test
 # dd if=/test/aaa of=/dev/null bs=512k
 1024+0 records in
 1024+0 records out
 536870912 bytes (537 MB) copied, 6.00688 seconds, 89.4 MB/s

With this patch:

 # mount -t nilfs2 /dev/sde1 /test
 # dd if=/test/aaa of=/dev/null bs=512k
 1024+0 records in
 1024+0 records out
 536870912 bytes (537 MB) copied, 3.54998 seconds, 151 MB/s

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:11 +09:00
Ryusuke Konishi 30bda0b8ae nilfs2: set bio unplug flag for the last bio in segment
This sets BIO_RW_UNPLUG flag on the last bio of each segment during
write.  The last bio should be unplugged immediately because the
caller waits for the completion after the submission.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:11 +09:00
Ryusuke Konishi 003ff182fd nilfs2: allow future expansion of metadata read out via get info ioctl
Nilfs has some ioctl commands to read out metadata from meta data
files:

 - NILFS_IOCTL_GET_CPINFO for checkpoint file,
 - NILFS_IOCTL_GET_SUINFO for segment usage file, and
 - NILFS_IOCTL_GET_VINFO for Disk Address Transalation (DAT) file,
   respectively.

Every routine on these metadata files is implemented so that it allows
future expansion of on-disk format.  But, the above ioctl commands do
not support expansion even though nilfs_argv structure can handle
arbitrary size for data exchanged via ioctl.

This allows future expansion of the following structures which give
basic format of the "get information" ioctls:

 - struct nilfs_cpinfo
 - struct nilfs_suinfo
 - struct nilfs_vinfo

So, this introduces forward compatility of such ioctl commands.

In this patch, a sanity check in nilfs_ioctl_get_info() function is
changed to accept larger data structure [1], and metadata read
routines are rewritten so that they become compatible for larger
structures; the routines will just ignore the remaining fields which
the current version of nilfs doesn't know.

[1] The ioctl function already has another upper limit (PAGE_SIZE
    against a structure, which appears in nilfs_ioctl_wrap_copy
    function), and this will not cause security problem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:11 +09:00
Hisashi Hifumi 258ef67e24 NILFS2: Pagecache usage optimization on NILFS2
Hi,

I introduced "is_partially_uptodate" aops for NILFS2.

A page can have multiple buffers and even if a page is not uptodate, some buffers
can be uptodate on pagesize != blocksize environment.
This aops checks that all buffers which correspond to a part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to HDD even if a page is not uptodate because the portion we
want to read are uptodate.
"block_is_partially_uptodate" function is already used by ext2/3/4.
With the following patch random read/write mixed workloads or random read after
random write workloads can be optimized and we can get performance improvement.

I did a performance test using the sysbench.

1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --fil
e-rw-ratio=1 run

-2.6.30-rc5

Test execution summary:
    total time:                          151.2907s
    total number of events:              200000
    total time taken by event execution: 2409.8387
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0120s
         max:                            0.9306s
         approx.  95 percentile:         0.0439s

Threads fairness:
    events (avg/stddev):           12500.0000/238.52
    execution time (avg/stddev):   150.6149/0.01

-2.6.30-rc5-patched

Test execution summary:
    total time:                          140.8828s
    total number of events:              200000
    total time taken by event execution: 2240.8577
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0112s
         max:                            0.8750s
         approx.  95 percentile:         0.0418s

Threads fairness:
    events (avg/stddev):           12500.0000/218.43
    execution time (avg/stddev):   140.0536/0.01

arch: ia64
pagesize: 16k

Thanks.

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:11 +09:00
Ryusuke Konishi 7cde31d7d6 nilfs2: remove nilfs_btree_operations from btree mapping
will remove indirect function calls using nilfs_btree_operations
table.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:11 +09:00
Ryusuke Konishi 355c6b6103 nilfs2: remove nilfs_direct_operations from direct mapping
will remove indirect function calls using nilfs_direct_operations
table.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:11 +09:00
Ryusuke Konishi d4b961576d nilfs2: remove bmap pointer operations
Previously, the bmap codes of nilfs used three types of function
tables.  The abuse of indirect function calls decreased source
readability and suffered many indirect jumps which would confuse
branch prediction of processors.

This eliminates one type of the function tables,
nilfs_bmap_ptr_operations, which was used to dispatch low level
pointer operations of the nilfs bmap.

This adds a new integer variable "b_ptr_type" to nilfs_bmap struct,
and uses the value to select the pointer operations.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:10 +09:00
Ryusuke Konishi 3033342a0b nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct
This will cut off 16 bytes from the nilfs_bmap struct which is
embedded in the on-memory inode of nilfs.

The b_high field was never used, and the b_low field stores a constant
value which can be determined by whether the inode uses btree for
block mapping or not.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:10 +09:00
Ryusuke Konishi e473c1f265 nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function
This indirect function is set to NULL only for gc cache inodes, but
the gc cache inodes never call this function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:10 +09:00