Remove the if and else conditional because the code is in mainline and there
is no need in it being there.
Also, Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.
Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Associate request with sessions that aren't yep open. This makes the
debugfs mdsc request list more informative.
Signed-off-by: Sage Weil <sage@newdream.net>
The r_mds field is redundant, since we can find the same information at
r_session->s_mds, and when r_session is NULL then r_mds is meaningless.
Signed-off-by: Sage Weil <sage@newdream.net>
This implements the DIRLAYOUTHASH protocol feature, which passes the dir
layout over the wire from the MDS. This gives the client knowledge
of the correct hash function to use for mapping dentries among dir
fragments.
Note that if this feature is _not_ present on the client but is on the
MDS, the client may misdirect requests. This will result in a forward
and degrade performance. It may also result in inaccurate NFS filehandle
generation, which will prevent fh resolution when the inode is not present
in the client cache and the parent directories have been fragmented.
Signed-off-by: Sage Weil <sage@newdream.net>
Add a ceph_dir_layout to the inode, and calculate dentry hash values based
on the parent directory's specified dir_hash function. This is needed
because the old default Linux dcache hash function is extremely week and
leads to a poor distribution of files among dir fragments.
Signed-off-by: Sage Weil <sage@newdream.net>
For read operation, we have to set the argument _write_ of get_user_pages
to 1 since we will write data to pages. Also, we need to SetPageDirty before
releasing these pages.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
The fh_to_dentry etc. methods use ceph_init_dentry(), which assumes that
d_parent is defined. It isn't for those callers, so check!
Signed-off-by: Sage Weil <sage@newdream.net>
The user buffer may be 512-byte aligned, not page-aligned. We were
assuming the buffer was page-aligned and only accounting for
non-page-aligned io offsets.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
last may be NULL, but we dereference it in the else branch without
checking. Normally it doesn't trigger because last == NULL when fpos == 2,
but it could happen on a newly opened dir if the user seeks forward.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix readdir EOVERFLOW on 32-bit archs
ceph: fix frag offset for non-leftmost frags
ceph: fix dangling pointer
ceph: explicitly specify page alignment in network messages
ceph: make page alignment explicit in osd interface
ceph: fix comment, remove extraneous args
ceph: fix update of ctime from MDS
ceph: fix version check on racing inode updates
ceph: fix uid/gid on resent mds requests
ceph: fix rdcache_gen usage and invalidate
ceph: re-request max_size if cap auth changes
ceph: only let auth caps update max_size
ceph: fix open for write on clustered mds
ceph: fix bad pointer dereference in ceph_fill_trace
ceph: fix small seq message skipping
Revert "ceph: update issue_seq on cap grant"
One of the readdir filldir_t callers was passing the raw ceph 64-bit ino
instead of the hashed 32-bit one, producing an EOVERFLOW in the filler
callback. Fix this by calling the ceph_vino_to_ino() helper to do the
conversion.
Reported-by: Jan Smets <jan.smets@alcatel-lucent.com>
Tested-by: Jan Smets <jan.smets@alcatel-lucent.com>
Signed-off-by: Sage Weil <sage@newdream.net>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.
Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We start at offset 2 for the leftmost frag, and 0 for subsequent frags.
When we reach the end (rightmost), we go back to 2. This fixes readdir on
fragmented (large) directories.
Signed-off-by: Sage Weil <sage@newdream.net>
We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched. This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO). We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.
Explicitly specify the page alignment when setting up OSD IO requests.
Signed-off-by: Sage Weil <sage@newdream.net>
The client can have a newer ctime than the MDS due to AUTH_EXCL and
XATTR_EXCL caps as well; update the check in ceph_fill_file_time
appropriately.
This fixes cases where ctime/mtime goes backward under the right sequence
of local updates (e.g. chmod) and mds replies (e.g. subsequent stat that
goes to the MDS).
Signed-off-by: Sage Weil <sage@newdream.net>
We may get updates on the same inode from multiple MDSs; generally we only
pay attention if the update is newer than what we already have. The
exception is when an MDS sense unstable information, in which case we
always update.
The old > check got this wrong when our version was odd (e.g. 3) and the
reply version was even (e.g. 2): the older stale (v2) info would be
applied. Fixed and clarified the comment.
Signed-off-by: Sage Weil <sage@newdream.net>
MDS requests can be rebuilt and resent in non-process context, but were
filling in uid/gid from current_fsuid/gid. Put that information in the
request struct on request setup.
This fixes incorrect (and root) uid/gid getting set for requests that
are forwarded between MDSs, usually due to metadata migrations.
Signed-off-by: Sage Weil <sage@newdream.net>
We used to use rdcache_gen to indicate whether we "might" have cached
pages. Now we just look at the mapping to determine that. However, some
old behavior remains from that transition.
First, rdcache_gen == 0 no longer means we have no pages. That can happen
at any time (presumably when we carry FILE_CACHE). We should not reset it
to zero, and we should not check that it is zero.
That means that the only purpose for rdcache_revoking is to resolve races
between new issues of FILE_CACHE and an async invalidate. If they are
equal, we should invalidate. On success, we decrement rdcache_revoking,
so that it is no longer equal to rdcache_gen. Similarly, if we success
in doing a sync invalidate, set revoking = gen - 1. (This is a small
optimization to avoid doing unnecessary invalidate work and does not
affect correctness.)
Signed-off-by: Sage Weil <sage@newdream.net>
If the auth cap migrates to another MDS, clear requested_max_size so that
we resend any pending max_size increase requests. This fixes potential
hangs on writes that extend a file and race with an cap migration between
MDSs.
Signed-off-by: Sage Weil <sage@newdream.net>