Commit Graph

519 Commits

Author SHA1 Message Date
Dave Chinner 0444d76ae6 fs: don't use igrab() while holding i_lock
Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph‥

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock now that the
inode_lock is gone and igrab() uses the i_lock itself.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-29 07:50:34 -07:00
Sage Weil 147851d2dc ceph: rename dentry_release -> d_release, fix comment
Just for consistency's sake.  Fix obsolete comment too.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21 12:24:26 -07:00
Henry C Chang 49bcb93236 ceph: add request to the tail of unsafe write list
In sync_write_wait(), we assume that the newest request is at the
tail of unsafe write list. We should maintain the semantics here.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21 12:24:25 -07:00
Henry C Chang 78a255654f ceph: remove request from unsafe list if it is canceled/timed out
This fixes the list corruption warning like this:

------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81()
Hardware name: X8DTU
list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130).
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan]
Pid: 10977, comm: smbd Tainted: G        W  2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1
Call Trace:
[<ffffffff8105753c>] warn_slowpath_common+0x7c/0x94
[<ffffffff810575ab>] warn_slowpath_fmt+0x41/0x43
[<ffffffff812351a3>] __list_add+0x68/0x81
[<ffffffffa014799d>] ceph_aio_write+0x614/0x8a2 [ceph]
[<ffffffff8111d2a0>] do_sync_write+0xe8/0x125
[<ffffffff81075a1f>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811f21ec>] ? selinux_file_permission+0x5c/0xb3
[<ffffffff811e8521>] ? security_file_permission+0x16/0x18
[<ffffffff8111d864>] vfs_write+0xae/0x10b
[<ffffffff8111d91b>] sys_pwrite64+0x5a/0x76
[<ffffffff81012d32>] system_call_fastpath+0x16/0x1b
---[ end trace 08573eb9f07ff6f4 ]---

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21 12:24:24 -07:00
Sage Weil 80456f8672 ceph: move readahead default to fs/ceph from libceph
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21 12:24:23 -07:00
Yehuda Sadeh ad1fee96cb ceph: add ino32 mount option
The ino32 mount option forces the ceph fs to report 32 bit
ino values.  This is useful for 64 bit kernels with 32 bit userspace.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2011-03-21 12:24:22 -07:00
Sage Weil 21f3b5f1bb ceph: remove debugfs debug cruft
Whoops!

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21 12:24:20 -07:00
Sage Weil 09adc80c61 ceph: preserve I_COMPLETE across rename
d_move puts the renamed dentry at the end of d_subdirs, screwing with our
cached dentry directory offsets.  We were just clearing I_COMPLETE to avoid
any possibility of trouble.  However, assigning the renamed dentry an
offset at the end of the directory (to match it's new d_subdirs position)
is sufficient to maintain correct behavior and hold onto I_COMPLETE.

This is especially important for workloads like rsync, which renames files
into place.  Before, we would lose I_COMPLETE and do MDS lookups for each
file.  With this patch we only talk to the MDS on create and rename.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-15 09:14:03 -07:00
Al Viro 0eb980e317 ceph: fix d_revalidate oopsen on NFS exports
can't blindly check nd->flags in ->d_revalidate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-10 03:44:05 -05:00
Sage Weil 455cec0abf ceph: no .snap inside of snapped namespace
Otherwise you can do things like

# mkdir .snap/foo
# cd .snap/foo/.snap
# ls
<badness>

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04 12:25:09 -08:00
Sage Weil 16a8b70a5a ceph: do not clear I_COMPLETE from d_release
First, this was racy anyway: d_release isn't called until well after the
dentry is unhashed.  Second, this runs afoul of the recent dcache change
that clears d_parent prior to calling d_release (949854d0), causing a NULL
pointer dereference.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 10:09:52 -08:00
Sage Weil b545cc1505 ceph: do not set I_COMPLETE
Do not set the I_COMPLETE flag on directories until we resolve races with
dcache pruning.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 10:09:51 -08:00
Sage Weil 9bde178d05 Revert "ceph: keep reference to parent inode on ceph_dentry"
This reverts commit 97d79b403e.

This fails to account for d_parent changes due to rename or disconnected
dentries due to submounts or NFS reexports.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-03 10:09:50 -08:00
Linus Torvalds 8bd89ca220 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: keep reference to parent inode on ceph_dentry
  ceph: queue cap_snaps once per realm
  libceph: fix socket write error handling
  libceph: fix socket read error handling
2011-02-21 15:01:38 -08:00
Yehuda Sadeh 97d79b403e ceph: keep reference to parent inode on ceph_dentry
When creating a new dentry we now hold a reference to the parent
inode in the ceph_dentry.  This is required due to the new RCU
changes from 949854d0, which set dentry->d_parent to NULL in d_kill before
calling the ->release() callback.  If/when that behavior is changed, we can
revert this hack.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-02-19 19:59:14 -08:00
Sage Weil e8e1ba96b2 ceph: queue cap_snaps once per realm
We were forming a dirty list, and then queueing cap_snaps for each realm
_and_ its children, regardless of whether the children were already in the
dirty list.  This meant we did it twice for some realms.  Which in turn
meant we corrupted mdsc->snap_flush_list when the cap_snap was re-added to
the list it was already on, and could trigger an infinite loop.

We were also using recursion to do reach all the children, a no-no when
stack is limited.

Instead, (re)queue any children on the dirty list, avoiding processing
anything twice and avoiding any recursion.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-02-04 20:45:58 -08:00
Linus Torvalds b12ece7d85 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: avoid picking MDS that is not active
  ceph: avoid immediate cap check after import
  ceph: fix flushing of caps vs cap import
  ceph: fix erroneous cap flush to non-auth mds
  ceph: fix cap_wanted_delay_{min,max} mount option initialization
  ceph: fix xattr rbtree search
  ceph: fix getattr on directory when using norbytes
2011-01-28 12:12:58 +10:00
Sage Weil d66bbd441c ceph: avoid picking MDS that is not active
Ignore replication or auth frag data if it indicates an MDS that is not
active.  This can happen if the MDS shuts down and the client has stale
data about the namespace distribution across the MDS cluster.  If that's
the case, fall back to directing the request based on the auth cap (which
should always be accurate).

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-25 08:16:37 -08:00
Sage Weil 7e57b81c76 ceph: avoid immediate cap check after import
The NODELAY flag avoids the heuristics that delay cap (issued/wanted)
release.  There's no reason for that after we import a cap, and it kills
whatever benefit we get from those delays.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-19 09:23:26 -08:00
Sage Weil 088b3f5e9e ceph: fix flushing of caps vs cap import
If we are mid-flush and a cap is migrated to another node, we need to
resend the cap flush message to the new MDS, and do so with the original
flush_seq to avoid leaking across a sync boundary.  Previously we didn't
redo the flush (we only flushed newly dirty data), which would cause a
later sync to hang forever.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-19 09:23:25 -08:00
Sage Weil 24be0c4810 ceph: fix erroneous cap flush to non-auth mds
The int flushing is global and not clear on each iteration of the loop,
which can cause a second flush of caps to any MDSs with ids greater than
the auth.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-19 09:23:24 -08:00
Sage Weil 50aac4fec5 ceph: fix cap_wanted_delay_{min,max} mount option initialization
These were initialized to 0 instead of the default, fallout from the RBD
refactor in 3d14c5d2b6.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-19 09:23:22 -08:00
Sage Weil 17db143fc0 ceph: fix xattr rbtree search
Fix xattr name comparison in rbtree search for strings that share a prefix.
The *name argument is null terminated, but the xattr name is not, so we
need to use strncmp, but that means adjusting for the case where name is
a prefix of xattr->name.

The corresponding case in __set_xattr() already handles this properly
(although in that case *name is also not null terminated).

Reported-by: Sergiy Kibrik <sakib@meta.ua>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-13 15:50:11 -08:00
Yehuda Sadeh 1c1266bb91 ceph: fix getattr on directory when using norbytes
The norbytes mount option was broken, and when doing getattr
on a directory it return the rbytes instead of the number of
entities. This commit fixes it.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-13 15:50:06 -08:00
Linus Torvalds a170315420 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: fix cleanup when trying to mount inexistent image
  net/ceph: make ceph_msgr_wq non-reentrant
  ceph: fsc->*_wq's aren't used in memory reclaim path
  ceph: Always free allocated memory in osdmap_decode()
  ceph: Makefile: Remove unnessary code
  ceph: associate requests with opening sessions
  ceph: drop redundant r_mds field
  ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
  ceph: add dir_layout to inode
2011-01-13 10:25:24 -08:00