Commit Graph

320385 Commits

Author SHA1 Message Date
Linus Torvalds cc8362b1f6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph changes from Sage Weil:
 "Lots of stuff this time around:

   - lots of cleanup and refactoring in the libceph messenger code, and
     many hard to hit races and bugs closed as a result.
   - lots of cleanup and refactoring in the rbd code from Alex Elder,
     mostly in preparation for the layering functionality that will be
     coming in 3.7.
   - some misc rbd cleanups from Josh Durgin that are finally going
     upstream
   - support for CRUSH tunables (used by newer clusters to improve the
     data placement)
   - some cleanup in our use of d_parent that Al brought up a while back
   - a random collection of fixes across the tree

  There is another patch coming that fixes up our ->atomic_open()
  behavior, but I'm going to hammer on it a bit more before sending it."

Fix up conflicts due to commits that were already committed earlier in
drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits)
  rbd: create rbd_refresh_helper()
  rbd: return obj version in __rbd_refresh_header()
  rbd: fixes in rbd_header_from_disk()
  rbd: always pass ops array to rbd_req_sync_op()
  rbd: pass null version pointer in add_snap()
  rbd: make rbd_create_rw_ops() return a pointer
  rbd: have __rbd_add_snap_dev() return a pointer
  libceph: recheck con state after allocating incoming message
  libceph: change ceph_con_in_msg_alloc convention to be less weird
  libceph: avoid dropping con mutex before fault
  libceph: verify state after retaking con lock after dispatch
  libceph: revoke mon_client messages on session restart
  libceph: fix handling of immediate socket connect failure
  ceph: update MAINTAINERS file
  libceph: be less chatty about stray replies
  libceph: clear all flags on con_close
  libceph: clean up con flags
  libceph: replace connection state bits with states
  libceph: drop unnecessary CLOSED check in socket state change callback
  libceph: close socket directly from ceph_con_close()
  ...
2012-07-31 14:35:28 -07:00
Linus Torvalds 2e3ee61348 Merge tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback updates from Wu Fengguang:
 "Use time based periods to age the writeback proportions, which can
  adapt equally well to fast/slow devices."

Fix up trivial conflict in comment in fs/sync.c

* tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Fix some comment errors
  block: Convert BDI proportion calculations to flexible proportions
  lib: Fix possible deadlock in flexible proportion code
  lib: Proportions with flexible period
2012-07-30 22:14:04 -07:00
Linus Torvalds 1fad1e9a74 Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Features include:
   - More preparatory patches for modularising NFSv2/v3/v4.  Split out
     the various NFSv2/v3/v4-specific code into separate files
   - More preparation for the NFSv4 migration code
   - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold
     parameters
   - pNFS fast failover when the data servers are down
   - Various cleanups and debugging patches"

* tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
  nfs: fix fl_type tests in NFSv4 code
  NFS: fix pnfs regression with directio writes
  NFS: fix pnfs regression with directio reads
  sunrpc: clnt: Add missing braces
  nfs: fix stub return type warnings
  NFS: exit_nfs_v4() shouldn't be an __exit function
  SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
  NFS: Split out NFS v4 client functions
  NFS: Split out the NFS v4 filesystem types
  NFS: Create a single nfs_clone_super() function
  NFS: Split out NFS v4 server creating code
  NFS: Initialize the NFS v4 client from init_nfs_v4()
  NFS: Move the v4 getroot code to nfs4getroot.c
  NFS: Split out NFS v4 file operations
  NFS: Initialize v4 sysctls from nfs_init_v4()
  NFS: Create an init_nfs_v4() function
  NFS: Split out NFS v4 inode operations
  NFS: Split out NFS v3 inode operations
  NFS: Split out NFS v2 inode operations
  NFS: Clean up nfs4_proc_setclientid() and friends
  ...
2012-07-30 19:16:57 -07:00
Linus Torvalds bbeb0af25f Merge tag 'mfd-for-linus-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
Pull MFD fix from Samuel Ortiz:
 "This one fixes an s5m8767 regulator build breakage due to a merge
  conflict caused by the MFD s5m API changes."

* tag 'mfd-for-linus-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
  regulator: Fix an s5m8767 build failure
2012-07-30 19:06:25 -07:00
Linus Torvalds 6df419e45d Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
 "This is the first part of the media patches for v3.6.

  This patch series contain:
   - new DVB frontend: rtl2832
   - new video drivers: adv7393
   - some unused files got removed
   - a selection API cleanup between V4L2 and V4L2 subdev API's
   - a major redesign at v4l-ioctl2, in order to clean it up
   - several driver fixes and improvements."

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (174 commits)
  v4l: Export v4l2-common.h in include/linux/Kbuild
  media: Revert "[media] Terratec Cinergy S2 USB HD Rev.2"
  [media] media: Use pr_info not homegrown pr_reg macro
  [media] Terratec Cinergy S2 USB HD Rev.2
  [media] v4l: Correct conflicting V4L2 subdev selection API documentation
  [media] Feature removal: V4L2 selections API target and flag definitions
  [media] v4l: Unify selection flags documentation
  [media] v4l: Unify selection flags
  [media] v4l: Common documentation for selection targets
  [media] v4l: Unify selection targets across V4L2 and V4L2 subdev interfaces
  [media] v4l: Remove "_ACTUAL" from subdev selection API target definition names
  [media] V4L: Remove "_ACTIVE" from the selection target name definitions
  [media] media: dvb-usb: print mac address via native %pM
  [media] s5p-tv: Use module_i2c_driver in sii9234_drv.c file
  [media] media: gpio-ir-recv: add allowed_protos for platform data
  [media] s5p-jpeg: Use module_platform_driver in jpeg-core.c file
  [media] saa7134: fix spelling of detach in label
  [media] cx88-blackbird: replace ioctl by unlocked_ioctl
  [media] cx88: don't use current_norm
  [media] cx88: fix a number of v4l2-compliance violations
  ...
2012-07-30 19:03:41 -07:00
Alex Elder 1fe5e99321 rbd: create rbd_refresh_helper()
Create a simple helper that handles the common case of calling
__rbd_refresh_header() while holding the ctl_mutex.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-30 18:21:46 -07:00
Alex Elder b813623ab9 rbd: return obj version in __rbd_refresh_header()
Add a new parameter to __rbd_refresh_header() through which the
version of the header object is passed back to the caller.  In most
cases this isn't needed.  The main motivation is to normalize
(almost) all calls to __rbd_refresh_header() so they are all
wrapped immediately by mutex_lock()/mutex_unlock().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-30 18:21:46 -07:00
Alex Elder ccece235d3 rbd: fixes in rbd_header_from_disk()
This fixes a few issues in rbd_header_from_disk():
    - There is a check intended to catch overflow, but it's wrong in
      two ways.
	- First, the type we don't want to overflow is size_t, not
	  unsigned int, and there is now a SIZE_MAX we can use for
	  use with that type.
	- Second, we're allocating the snapshot ids and snapshot
	  image sizes separately (each has type u64; on disk they
          grouped together as a rbd_image_header_ondisk structure).
	  So we can use the size of u64 in this overflow check.
    - If there are no snapshots, then there should be no snapshot
      names.  Enforce this, and issue a warning if we encounter a
      header with no snapshots but a non-zero snap_names_len.
    - When saving the snapshot names into the header, be more direct
      in defining the offset in the on-disk structure from which
      they're being copied by using "snap_count" rather than "i"
      in the array index.
    - If an error occurs, the "snapc" and "snap_names" fields are
      freed at the end of the function.  Make those fields be null
      pointers after they're freed, to be explicit that they are
      no longer valid.
    - Finally, move the definition of the local variable "i" to the
      innermost scope in which it's needed.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-30 18:21:46 -07:00
Alex Elder 913d2fdcf6 rbd: always pass ops array to rbd_req_sync_op()
All of the callers of rbd_req_sync_op() except one pass a non-null
"ops" pointer.  The only one that does not is rbd_req_sync_read(),
which passes CEPH_OSD_OP_READ as its "opcode" and, CEPH_OSD_FLAG_READ
for "flags".

By allocating the ops array in rbd_req_sync_read() and moving the
special case code for the null ops pointer into it, it becomes
clear that much of that code is not even necessary.

In addition, the "opcode" argument to rbd_req_sync_op() is never
actually used, so get rid of that.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-30 18:21:46 -07:00
Alex Elder d67d4be56a rbd: pass null version pointer in add_snap()
rbd_header_add_snap() passes the address of a version variable to
rbd_req_sync_exec(), but it ignores the result.  Just pass a null
pointer instead.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-30 18:21:46 -07:00
Alex Elder 57cfc1060f rbd: make rbd_create_rw_ops() return a pointer
Either rbd_create_rw_ops() will succeed, or it will fail because a
memory allocation failed.  Have it just return a valid pointer or
null rather than stuffing a pointer into a provided address and
returning an errno.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-30 18:21:46 -07:00
Alex Elder 4e891e0af0 rbd: have __rbd_add_snap_dev() return a pointer
It's not obvious whether the snapshot pointer whose address is
provided to __rbd_add_snap_dev() will be assigned by that function.
Change it to return the snapshot, or a pointer-coded errno in the
event of a failure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-07-30 18:21:45 -07:00
Sage Weil 6139919133 libceph: recheck con state after allocating incoming message
We drop the lock when calling the ->alloc_msg() con op, which means
we need to (a) not clobber con->in_msg without the mutex held, and (b)
we need to verify that we are still in the OPEN state when we retake
it to avoid causing any mayhem.  If the state does change, -EAGAIN
will get us back to con_work() and loop.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:19:45 -07:00
Sage Weil 4740a623d2 libceph: change ceph_con_in_msg_alloc convention to be less weird
This function's calling convention is very limiting.  In particular,
we can't return any error other than ENOMEM (and only implicitly),
which is a problem (see next patch).

Instead, return an normal 0 or error code, and make the skip a pointer
output parameter.  Drop the useless in_hdr argument (we have the con
pointer).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:19:30 -07:00
Sage Weil 8636ea672f libceph: avoid dropping con mutex before fault
The ceph_fault() function takes the con mutex, so we should avoid
dropping it before calling it.  This fixes a potential race with
another thread calling ceph_con_close(), or _open(), or similar (we
don't reverify con->state after retaking the lock).

Add annotation so that lockdep realizes we will drop the mutex before
returning.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:17:13 -07:00
Sage Weil 7b862e07b1 libceph: verify state after retaking con lock after dispatch
We drop the con mutex when delivering a message.  When we retake the
lock, we need to verify we are still in the OPEN state before
preparing to read the next tag, or else we risk stepping on a
connection that has been closed.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:16:56 -07:00
Sage Weil 4f471e4a9c libceph: revoke mon_client messages on session restart
Revoke all mon_client messages when we shut down the old connection.
This is mostly moot since we are re-using the same ceph_connection,
but it is cleaner.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:16:40 -07:00
Sage Weil 8007b8d626 libceph: fix handling of immediate socket connect failure
If the connect() call immediately fails such that sock == NULL, we
still need con_close_socket() to reset our socket state to CLOSED.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:16:16 -07:00
Sage Weil 09d9032751 ceph: update MAINTAINERS file
* shiny new inktank.com email addresses
 * add include/linux/crush directory (previous oversight)

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:16:03 -07:00
Sage Weil 756a16a5d5 libceph: be less chatty about stray replies
There are many (normal) conditions that can lead to us getting
unexpected replies, include cluster topology changes, osd failures,
and timeouts.  There's no need to spam the console about it.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-07-30 18:16:03 -07:00
Sage Weil 43c7427d10 libceph: clear all flags on con_close
Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 18:16:02 -07:00
Sage Weil 4a86169208 libceph: clean up con flags
Rename flags with CON_FLAG prefix, move the definitions into the c file,
and (better) document their meaning.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 18:16:01 -07:00
Sage Weil 8dacc7da69 libceph: replace connection state bits with states
Use a simple set of 6 enumerated values for the socket states (CON_STATE_*)
and use those instead of the state bits.  All of the con->state checks are
now under the protection of the con mutex, so this is safe.  It also
simplifies many of the state checks because we can check for anything other
than the expected state instead of various bits for races we can think of.

This appears to hold up well to stress testing both with and without socket
failure injection on the server side.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 18:16:00 -07:00
Sage Weil d7353dd5aa libceph: drop unnecessary CLOSED check in socket state change callback
If we are CLOSED, the socket is closed and we won't get these.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 18:15:59 -07:00
Sage Weil ee76e0736d libceph: close socket directly from ceph_con_close()
It is simpler to do this immediately, since we already hold the con mutex.
It also avoids the need to deal with a not-quite-CLOSED socket in con_work.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-30 18:15:58 -07:00