Commit Graph

22146 Commits

Author SHA1 Message Date
Dave Chinner
0444d76ae6 fs: don't use igrab() while holding i_lock
Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph‥

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock now that the
inode_lock is gone and igrab() uses the i_lock itself.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-29 07:50:34 -07:00
Linus Torvalds
c5850150d0 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: stop using the page cache to back the buffer cache
  xfs: register the inode cache shrinker before quotachecks
  xfs: xfs_trans_read_buf() should return an error on failure
  xfs: introduce inode cluster buffer trylocks for xfs_iflush
  vmap: flush vmap aliases when mapping fails
  xfs: preallocation transactions do not need to be synchronous

Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_buf.c due to plug removal.
2011-03-28 15:51:02 -07:00
Linus Torvalds
7f5fe3ec8e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: write lock requested keys
  eCryptfs: move ecryptfs_find_auth_tok_for_sig() call before mutex_lock
  eCryptfs: verify authentication tokens before their use
  eCryptfs: modified size of keysig in the ecryptfs_key_sig structure
  eCryptfs: removed num_global_auth_toks from ecryptfs_mount_crypt_stat
  eCryptfs: ecryptfs_keyring_auth_tok_for_sig() bug fix
  eCryptfs: Unlock page in write_begin error path
  ecryptfs: modify write path to encrypt page in writepage
  eCryptfs: Remove ECRYPTFS_NEW_FILE crypt stat flag
  eCryptfs: Remove unnecessary grow_file() function
2011-03-28 15:43:25 -07:00
Linus Torvalds
212a17ab87 Merge branch 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (45 commits)
  Btrfs: fix __btrfs_map_block on 32 bit machines
  btrfs: fix possible deadlock by clearing __GFP_FS flag
  btrfs: check link counter overflow in link(2)
  btrfs: don't mess with i_nlink of unlocked inode in rename()
  Btrfs: check return value of btrfs_alloc_path()
  Btrfs: fix OOPS of empty filesystem after balance
  Btrfs: fix memory leak of empty filesystem after balance
  Btrfs: fix return value of setflags ioctl
  Btrfs: fix uncheck memory allocations
  btrfs: make inode ref log recovery faster
  Btrfs: add btrfs_trim_fs() to handle FITRIM
  Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes
  Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP
  Btrfs: make update_reserved_bytes() public
  btrfs: return EXDEV when linking from different subvolumes
  Btrfs: Per file/directory controls for COW and compression
  Btrfs: add datacow flag in inode flag
  btrfs: use GFP_NOFS instead of GFP_KERNEL
  Btrfs: check return value of read_tree_block()
  btrfs: properly access unaligned checksum buffer
  ...

Fix up trivial conflicts in fs/btrfs/volumes.c due to plug removal in
the block layer.
2011-03-28 15:31:05 -07:00
Linus Torvalds
03e4970c10 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (39 commits)
  Treat writes as new when holes span across page boundaries
  fs,ocfs2: Move o2net_get_func_run_time under CONFIG_OCFS2_FS_STATS.
  ocfs2/dlm: Move kmalloc() outside the spinlock
  ocfs2: Make the left masklogs compat.
  ocfs2: Remove masklog ML_AIO.
  ocfs2: Remove masklog ML_UPTODATE.
  ocfs2: Remove masklog ML_BH_IO.
  ocfs2: Remove masklog ML_JOURNAL.
  ocfs2: Remove masklog ML_EXPORT.
  ocfs2: Remove masklog ML_DCACHE.
  ocfs2: Remove masklog ML_NAMEI.
  ocfs2: Remove mlog(0) from fs/ocfs2/dir.c
  ocfs2: remove NAMEI from symlink.c
  ocfs2: Remove masklog ML_QUOTA.
  ocfs2: Remove mlog(0) from quota_local.c.
  ocfs2: Remove masklog ML_RESERVATIONS.
  ocfs2: Remove masklog ML_XATTR.
  ocfs2: Remove masklog ML_SUPER.
  ocfs2: Remove mlog(0) from fs/ocfs2/heartbeat.c
  ocfs2: Remove mlog(0) from fs/ocfs2/slot_map.c
  ...

Fix up trivial conflict in fs/ocfs2/super.c
2011-03-28 13:03:31 -07:00
Goldwyn Rodrigues
272b62c1f0 Treat writes as new when holes span across page boundaries
When a hole spans across page boundaries, the next write forces
a read of the block. This could end up reading existing garbage
data from the disk in ocfs2_map_page_blocks. This leads to
non-zero holes. In order to avoid this, mark the writes as new
when the holes span across page boundaries.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: jlbec <jlbec@evilplan.org>
2011-03-28 09:44:58 -07:00
Joel Becker
99bdc3880c Merge branch 'mlog_replace_for_39' of git://repo.or.cz/taoma-kernel into ocfs2-merge-window-fix 2011-03-28 09:44:26 -07:00
Rakib Mullick
ed59992e8d fs,ocfs2: Move o2net_get_func_run_time under CONFIG_OCFS2_FS_STATS.
When CONFIG_DEBUG_FS=y and CONFIG_OCFS2_FS_STATS=n, we get the
following warning:

fs/ocfs2/cluster/tcp.c:213:16: warning: ‘o2net_get_func_run_time’
defined but not used

Since o2net_get_func_run_time is only called from
o2net_update_recv_stats, so move it under CONFIG_OCFS2_FS_STATS.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: jlbec <jlbec@evilplan.org>
2011-03-28 09:43:28 -07:00
Linus Torvalds
1788c208aa Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Ensure that rpc_release_resources_task() can be called twice.
  NFS: Don't leak RPC clients in NFSv4 secinfo negotiation
  NFS: Fix a hang in the writeback path
2011-03-28 07:52:58 -07:00
Chris Mason
d9d0487932 Btrfs: fix __btrfs_map_block on 32 bit machines
Recent changes for discard support didn't compile,
this fixes them not to try and % 64 bit numbers.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:59 -04:00
Miao Xie
1561deda68 btrfs: fix possible deadlock by clearing __GFP_FS flag
Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's page may cause
deadlock.
  Task1
  open()
    ...
    btrfs_search_slot()
      ...
      btrfs_cow_block()
	...
	alloc_page()
	  wait for reclaiming
					shrink_slab()
					  ...
					  shrink_icache_memory()
					    ...
					    btrfs_evict_inode()
					      ...
					      btrfs_search_slot()

If the path is locked by task1, the deadlock happens.

So the btree's page cache is different with the file's page cache, it can not
allocate pages by GFP_HIGHUSER_MOVABLE flag, we must clear __GFP_FS flag in
GFP_HIGHUSER_MOVABLE flag.

Reported-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:58 -04:00
Al Viro
c055e99eea btrfs: check link counter overflow in link(2)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:56 -04:00
Al Viro
92986796d8 btrfs: don't mess with i_nlink of unlocked inode in rename()
old_inode is not locked; it's not safe to play with its link
count.  Instead of bumping it and calling btrfs_unlink_inode(),
add a variant of the latter that does not do btrfs_drop_nlink()/
btrfs_update_inode(), call it instead of btrfs_inc_nlink()/
btrfs_unlink_inode() and do btrfs_update_inode() ourselves.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:55 -04:00
Tsutomu Itoh
c2db1073fd Btrfs: check return value of btrfs_alloc_path()
Adding the check on the return value of btrfs_alloc_path() to several places.
And, some of callers are modified by this change.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:54 -04:00
liubo
c59021f846 Btrfs: fix OOPS of empty filesystem after balance
btrfs will remove unused block groups after balance.
When a empty filesystem is balanced, the block group with tag "DATA" may be
dropped, and after umount and mount again, it will not find "DATA" space_info
and lead to OOPS.
So we initial the necessary space_infos(DATA, SYSTEM, METADATA) to avoid OOPS.

Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:53 -04:00
liubo
9f7c43c967 Btrfs: fix memory leak of empty filesystem after balance
After Josef's patch(commit 3c14874acc),
btrfs will exclude super bytes when reading block groups(by marking a extent
state UPTODATE).  However, these bytes do not get freed while balance remove
unused block groups, and we won't process those removed ones any more, when
we do umount and unload the btrfs module,  btrfs hits a memory leak.

This patch add the missing free operation.

Reproduce steps:
$ mkfs.btrfs disk
$ mount disk /mnt/btrfs -o loop
$ btrfs filesystem balance /mnt/btrfs
$ umount /mnt/btrfs
$ rmmod btrfs

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:52 -04:00
liubo
2d4e6f6ad2 Btrfs: fix return value of setflags ioctl
setflags ioctl should return error when any checks fail.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:51 -04:00
Yoshinori Sano
dac97e516c Btrfs: fix uncheck memory allocations
To make Btrfs code more robust, several return value checks where memory
allocation can fail are introduced. I use BUG_ON where I don't know how
to handle the error properly, which increases the number of using the
notorious BUG_ON, though.

Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:49 -04:00
liubo
c622ae6085 btrfs: make inode ref log recovery faster
When we recover from crash via write-ahead log tree and process
the inode refs, for each btrfs_inode_ref item, we will
1) check if we already have a perfect match in fs/file tree, if
   we have, then we're done.
2) search the corresponding back reference in fs/file tree, and
   check all the names in this back reference to see if they are
   also in the log to avoid conflict corners.
3) recover the logged inode refs to fs/file tree.

In current btrfs, however,
- for 2)'s check, once is enough, since the checked back reference
  will remain unchanged after processing all the inode refs belonged
  to the key.
- it has no need to do another 1) between 2) and 3).

I've made a small test to show how it improves,

$dd if=/dev/zero of=foobar bs=4K count=1
$sync
$make 100 hard links continuously, like ln foobar link_i
$fsync foobar
$echo b > /proc/sysrq-trigger
after reboot
$time mount DEV PATH

without patch:
real    0m0.285s
user    0m0.001s
sys     0m0.009s

with patch:
real    0m0.123s
user    0m0.000s
sys     0m0.010s

Changelog v1->v2:
- fix double free - pointed by David Sterba
Changelog v2->v3:
- adjust free order

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:48 -04:00
Li Dongyang
f7039b1d5c Btrfs: add btrfs_trim_fs() to handle FITRIM
We take an free extent out from allocator, trim it, then put it back,
but before we trim the block group, we should make sure the block group is
cached, so plus a little change to make cache_block_group() run without a
transaction.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:47 -04:00
Li Dongyang
5378e60734 Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes
Callers of btrfs_discard_extent() should check if we are mounted with -o discard,
as we want to make fitrim to work even the fs is not mounted with -o discard.
Also we should use REQ_DISCARD to map the free extent to get a full mapping,
last we only return errors if
1. the error is not a EOPNOTSUPP
2. no device supports discard

Signed-off-by: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:46 -04:00
Li Dongyang
fce3bb9a1b Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP
btrfs_map_block() will only return a single stripe length, but we want the
full extent be mapped to each disk when we are trimming the extent,
so we add length to btrfs_bio_stripe and fill it if we are mapping for REQ_DISCARD.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:45 -04:00
Li Dongyang
b4d00d569a Btrfs: make update_reserved_bytes() public
Make the function public as we should update the reserved extents calculations
after taking out an extent for trimming.

Signed-off-by: Li Dongyang <lidongyang@novell.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:43 -04:00
Mark Fasheh
3ab3564f01 btrfs: return EXDEV when linking from different subvolumes
btrfs_link returns EPERM if a cross-subvolume link is attempted.

However, in this case I believe EXDEV to be the more appropriate value.
>From the link(2) man page:

EXDEV  oldpath and newpath are not on the same mounted file system.  (Linux
       permits a file system to be mounted at multiple points, but link()
       does not work across different mount points, even if the same file
       system is mounted on both.)

This matters because an application may have different behaviors based on
return codes.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:42 -04:00
Liu Bo
75e7cb7fe0 Btrfs: Per file/directory controls for COW and compression
Data compression and data cow are controlled across the entire FS by mount
options right now.  ioctls are needed to set this on a per file or per
directory basis.  This has been proposed previously, but VFS developers
wanted us to use generic ioctls rather than btrfs-specific ones.

According to Chris's comment, there should be just one true compression
method(probably LZO) stored in the super.  However, before this, we would
wait for that one method is stable enough to be adopted into the super.
So I list it as a long term goal, and just store it in ram today.

After applying this patch, we can use the generic "FS_IOC_SETFLAGS" ioctl to
control file and directory's datacow and compression attribute.

NOTE:
 - The compression type is selected by such rules:
   If we mount btrfs with compress options, ie, zlib/lzo, the type is it.
   Otherwise, we'll use the default compress type (zlib today).

v1->v2:
- rebase to the latest btrfs.
v2->v3:
- fix a problem, i.e. when a file is set NOCOW via mount option, then this NOCOW
  will be screwed by inheritance from parent directory.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-03-28 05:37:41 -04:00