Nothing we do here sleeps, so just do it under d_lock and avoid the dget/
dput entirely.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Sage Weil <sage@newdream.net>
We now set d_fsdata unconditionally on all dentries prior to setting up
the d_ops, so all of these checks are unnecessary.
Signed-off-by: Sage Weil <sage@newdream.net>
Ceph attempts to use the dcache to satisfy negative lookups and readdir
when the entire directory contents are in cache. Disable this behavior
until lingering bugs in this code are shaken out; we'll re-enable these
hooks once things are fully stable.
Signed-off-by: Sage Weil <sage@newdream.net>
Commit 06222e491e got the if wrong so that
it always evaluates as true. This is semantically harmless, but makes
SEEK_CUR and SEEK_SET needlessly query the server.
Rewrite the if to explicitly enumerate the cases we DO need a valid i_size
to make this code less fragile.
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
We have been using i_lock to protect all kinds of data structures in the
ceph_inode_info struct, including lists of inodes that we need to iterate
over while avoiding races with inode destruction. That requires grabbing
a reference to the inode with the list lock protected, but igrab() now
takes i_lock to check the inode flags.
Changing the list lock ordering would be a painful process.
However, using a ceph-specific i_ceph_lock in the ceph inode instead of
i_lock is a simple mechanical change and avoids the ordering constraints
imposed by igrab().
Reported-by: Amon Ott <a.ott@m-privacy.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Set up d_fsdata on the root dentry. This fixes a NULL pointer dereference
in ceph_d_prune on umount. It also means we can eventually strip out all
of the conditional checks on d_fsdata because it is now set unconditionally
(prior to setting up the d_ops).
Fix the ceph_d_prune debug print while we're here.
Signed-off-by: Sage Weil <sage@newdream.net>
If we queue a work item that calls iput(), make sure we ihold() before
attempting to queue work. Otherwise our queued work might miraculously run
before we notice the queue_work() succeeded and call ihold(), allowing the
inode to be destroyed.
That is, instead of
if (queue_work(...))
ihold();
we need to do
ihold();
if (!queue_work(...))
iput();
Reported-by: Amon Ott <a.ott@m-privacy.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Quiet the following sparse noise:
warning: symbol 'get_nonsnap_parent' was not declared. Should it be static?
warning: symbol 'done_closing_sessions' was not declared. Should it be static?
Local functions don't need external visability. Make them static.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
We used to use a flag on the directory inode to track whether the dcache
contents for a directory were a complete cached copy. Switch to a dentry
flag CEPH_D_COMPLETE that is safely updated by ->d_prune().
Signed-off-by: Sage Weil <sage@newdream.net>
When the VFS prunes a dentry from the cache, clear the D_COMPLETE flag
on the parent dentry. Do this for the live and snapshotted namespaces. Do
not bother for the .snap dir contents, since we do not cache that.
Signed-off-by: Sage Weil <sage@newdream.net>
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
ceph_release_page_vector() kfrees the vector; we shouldn't do it here too.
Reported-by: Jeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: Sage Weil <sage@newdream.net>
Previously we were validating the passed-in stripe unit, object size,
and stripe count against each other (and not testing most other stuff).
Instead, make sure that the composed previous layout and new values are valid,
and only send the new values to the MDS. This lets users change the
pool without setting the whole layout, for instance.
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
This reverts commit c9af9fb68e.
We need to block and truncate all pages in order to reliably invalidate
them. Otherwise, we could:
- have some uptodate pages in the cache
- queue an invalidate
- write(2) locks some pages
- invalidate_work skips them
- write(2) only overwrites part of the page
- page now dirty and uptodate
-> partial leakage of invalidated data
It's not entirely clear why we started skipping locked pages in the first
place. I just ran this through fsx and didn't see any problems.
Signed-off-by: Sage Weil <sage@newdream.net>
The pool allocation failures are masked by the pool; there is no need to
spam the console about them. (That's the whole point of having the pool
in the first place.)
Mark msg allocations whose failure is safely handled as such.
Signed-off-by: Sage Weil <sage@newdream.net>
This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.
Signed-off-by: Sage Weil <sage@newdream.net>
The 'rsize' mount option limits the maximum size of an individual
read(ahead) operation that is sent off to an OSD. This is distinct from
'rasize', which controls the size of the readahead window.
Signed-off-by: Sage Weil <sage@newdream.net>