Commit Graph

40 Commits

Author SHA1 Message Date
Sage Weil
be655596b3 ceph: use i_ceph_lock instead of i_lock
We have been using i_lock to protect all kinds of data structures in the
ceph_inode_info struct, including lists of inodes that we need to iterate
over while avoiding races with inode destruction.  That requires grabbing
a reference to the inode with the list lock protected, but igrab() now
takes i_lock to check the inode flags.

Changing the list lock ordering would be a painful process.

However, using a ceph-specific i_ceph_lock in the ceph inode instead of
i_lock is a simple mechanical change and avoids the ordering constraints
imposed by igrab().

Reported-by: Amon Ott <a.ott@m-privacy.de>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-07 10:46:44 -08:00
Sage Weil
3395734067 libceph: fix double-free of page vector
ceph_release_page_vector() kfrees the vector; we shouldn't do it here too.

Reported-by: Jeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:17 -07:00
Sage Weil
0d66a487c1 ceph: implement (optional) max read size
The 'rsize' mount option limits the maximum size of an individual
read(ahead) operation that is sent off to an OSD.  This is distinct from
'rasize', which controls the size of the readahead window.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Sage Weil
7c272194e6 ceph: make readpages fully async
When we get a ->readpages() aop, submit async reads for all page ranges
in the provided page list.  Lock the pages immediately, so that VFS/MM
will block until the reads complete.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:14 -07:00
Sage Weil
70b666c3b4 ceph: use ihold when we already have an inode ref
We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-06-07 21:34:11 -07:00
Sage Weil
9d6fcb081a ceph: check return value for start_request in writepages
Since we pass the nofail arg, we should never get an error; BUG if we do.
(And fix the function to not return an error if __map_request fails.)

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19 11:25:05 -07:00
Sage Weil
6b4a3b517a ceph: remove useless check
rc is only ever 0 or negative in this method.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-19 11:25:05 -07:00
Henry C Chang
8c71897be2 ceph: handle ceph_osdc_new_request failure in ceph_writepages_start
We should unlock the page and return -ENOMEM if ceph_osdc_new_request
failed.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-05-03 09:28:12 -07:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Dave Chinner
0444d76ae6 fs: don't use igrab() while holding i_lock
Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph‥

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock now that the
inode_lock is gone and igrab() uses the i_lock itself.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-29 07:50:34 -07:00
Linus Torvalds
76db8ac45f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix readdir EOVERFLOW on 32-bit archs
  ceph: fix frag offset for non-leftmost frags
  ceph: fix dangling pointer
  ceph: explicitly specify page alignment in network messages
  ceph: make page alignment explicit in osd interface
  ceph: fix comment, remove extraneous args
  ceph: fix update of ctime from MDS
  ceph: fix version check on racing inode updates
  ceph: fix uid/gid on resent mds requests
  ceph: fix rdcache_gen usage and invalidate
  ceph: re-request max_size if cap auth changes
  ceph: only let auth caps update max_size
  ceph: fix open for write on clustered mds
  ceph: fix bad pointer dereference in ceph_fill_trace
  ceph: fix small seq message skipping
  Revert "ceph: update issue_seq on cap grant"
2010-11-19 15:32:22 -08:00
Sage Weil
b7495fc2ff ceph: make page alignment explicit in osd interface
We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched.  This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO).  We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.

Explicitly specify the page alignment when setting up OSD IO requests.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-09 12:43:12 -08:00
Wu Fengguang
1b430beee5 writeback: remove nonblocking/encountered_congestion references
This removes more dead code that was somehow missed by commit 0d99519efe
(writeback: remove unused nonblocking and congestion checks).  There are
no behavior change except for the removal of two entries from one of the
ext4 tracing interface.

The nonblocking checks in ->writepages are no longer used because the
flusher now prefer to block on get_request_wait() than to skip inodes on
IO congestion.  The latter will lead to more seeky IO.

The nonblocking checks in ->writepage are no longer used because it's
redundant with the WB_SYNC_NONE check.

We no long set ->nonblocking in VM page out and page migration, because
a) it's effectively redundant with WB_SYNC_NONE in current code
b) it's old semantic of "Don't get stuck on request queues" is mis-behavior:
   that would skip some dirty inodes on congestion and page out others, which
   is unfair in terms of LRU age.

Inspired by Christoph Hellwig. Thanks!

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: Steve French <sfrench@samba.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:05 -07:00
Yehuda Sadeh
3d14c5d2b6 ceph: factor out libceph from Ceph file system
This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20 15:37:28 -07:00
Sage Weil
ae00d4f37f ceph: fix cap_snap and realm split
The cap_snap creation/queueing relies on both the current i_head_snapc
_and_ the i_snap_realm pointers being correct, so that the new cap_snap
can properly reference the old context and the new i_head_snapc can be
updated to reference the new snaprealm's context.  To fix this, we:

 - move inodes completely to the new (split) realm so that i_snap_realm
   is correct, and
 - generate the new snapc's _before_ queueing the cap_snaps in
   ceph_update_snap_trace().

Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-16 16:26:51 -07:00
Sage Weil
a77d9f7dce ceph: fix file offset wrapping at 4GB on 32-bit archs
Cast the value before shifting so that we don't run out of bits with a
32-bit unsigned long.  This fixes wrapping of high file offsets into the
low 4GB of a file on disk, and the subsequent data corruption for large
files.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-11 10:55:25 -07:00
Sage Weil
7d8cb26d7d ceph: maintain i_head_snapc when any caps are dirty, not just for data
We used to use i_head_snapc to keep track of which snapc the current epoch
of dirty data was dirtied under.  It is used by queue_cap_snap to set up
the cap_snap.  However, since we queue cap snaps for any dirty caps, not
just for dirty file data, we need to keep a valid i_head_snapc anytime
we have dirty|flushing caps.  This fixes a NULL pointer deref in
queue_cap_snap when writing back dirty caps without data (e.g.,
snaptest-authwb.sh).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-24 16:24:18 -07:00
Michael Rubin
679ceace84 mm: exporting account_page_dirty
This allows code outside of the mm core to safely manipulate page state
and not worry about the other accounting. Not using these routines means
that some code will lose track of the accounting and we get bugs. This
has happened once already.

Signed-off-by: Michael Rubin <mrubin@google.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22 15:16:51 -07:00
Sage Weil
213c99ee0c ceph: whitespace cleanup
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-03 10:25:11 -07:00
Sage Weil
2962507ca2 ceph: perform lazy reads when file mode and caps permit
If the file mode is marked as "lazy," perform cached/buffered reads when
the caps permit it.  Adjust the rdcache_gen and invalidation logic
accordingly so that we manage our cache based on the FILE_CACHE -or-
FILE_LAZYIO cap bits.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01 20:11:39 -07:00
Cheng Renquan
640ef79d27 ceph: use ceph_sb_to_client instead of ceph_client
ceph_sb_to_client and ceph_client are really identical, we need to dump
one; while function ceph_client is confusing with "struct ceph_client",
ceph_sb_to_client's definition is more clear; so we'd better switch all
call to ceph_sb_to_client.

  -static inline struct ceph_client *ceph_client(struct super_block *sb)
  -{
  -	return sb->s_fs_info;
  -}

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:17 -07:00
Yehuda Sadeh
31459fe4b2 ceph: use __page_cache_alloc and add_to_page_cache_lru
Following Nick Piggin patches in btrfs, pagecache pages should be
allocated with __page_cache_alloc, so they obey pagecache memory
policies.

Also, using add_to_page_cache_lru instead of using a private
pagevec where applicable.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 15:25:12 -07:00
Sage Weil
54ad023ba8 ceph: don't use writeback_control in writepages completion
The ->writepages writeback_control is not still valid in the writepages
completion.  We were touching it solely to adjust pages_skipped when there
was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials),
causing an oops in the writeback code shortly thereafter.  Updating
pages_skipped on error isn't correct anyway, so let's just rip out this
(clearly broken) code to pass the wbc to the completion.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-05 21:31:40 -07:00
Sage Weil
7ff899da02 ceph: fix lockless caps check
The __ variant requires caller to hold i_lock.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-03 10:49:25 -07:00
Linus Torvalds
96e35b40c0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: use separate class for ceph sockets' sk_lock
  ceph: reserve one more caps space when doing readdir
  ceph: queue_cap_snap should always queue dirty context
  ceph: fix dentry reference leak in dcache readdir
  ceph: decode v5 of osdmap (pool names) [protocol change]
  ceph: fix ack counter reset on connection reset
  ceph: fix leaked inode ref due to snap metadata writeback race
  ceph: fix snap context reference leaks
  ceph: allow writeback of snapped pages older than 'oldest' snapc
  ceph: fix dentry rehashing on virtual .snap dir
2010-04-14 18:45:31 -07:00