We need to flush the msgr workqueue during mon_client shutdown to
ensure that any work affecting our embedded ceph_connection is
finished so that we can be safely destroyed.
Previously, we were flushing the work queue after osd_client
shutdown and before mon_client shutdown to ensure that any osd
connection refs to authorizers are flushed. Remove the redundant
flush, and document in the comment that the mon_client flush is
needed to cover that case as well.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
(cherry picked from commit f3dea7edd3)
The bug can cause NULL pointer dereference in write_partial_msg_pages
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@inktank.com>
(cherry picked from commit 43643528cc)
There were a few direct calls to ceph_con_{get,put}() instead of the con
ops from osd_client.c. This is a bug since those ops aren't defined to
be ceph_con_get/put.
This breaks refcounting on the ceph_osd structs that contain the
ceph_connections, and could lead to all manner of strangeness.
The purpose of the ->get and ->put methods in a ceph connection are
to allow the connection to indicate it has a reference to something
external to the messaging system, *not* to indicate something
external has a reference to the connection.
[elder@inktank.com: added that last sentence]
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Alex Elder <elder@inktank.com>
(cherry picked from commit 0d47766f14)
In ceph_osdc_release_request(), a reference to the r_reply message
is dropped. But just after that, that same message is revoked if it
was in use to receive an incoming reply. Reorder these so we are
sure we hold a reference until we're actually done with the message.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ab8cb34a4b)
Pull ceph updates from Sage Weil:
"There are some updates and cleanups to the CRUSH placement code, a bug
fix with incremental maps, several cleanups and fixes from Josh Durgin
in the RBD block device code, a series of cleanups and bug fixes from
Alex Elder in the messenger code, and some miscellaneous bounds
checking and gfp cleanups/fixes."
Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the
networking people preferring "unsigned int" over just "unsigned".
* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits)
libceph: fix pg_temp updates
libceph: avoid unregistering osd request when not registered
ceph: add auth buf in prepare_write_connect()
ceph: rename prepare_connect_authorizer()
ceph: return pointer from prepare_connect_authorizer()
ceph: use info returned by get_authorizer
ceph: have get_authorizer methods return pointers
ceph: ensure auth ops are defined before use
ceph: messenger: reduce args to create_authorizer
ceph: define ceph_auth_handshake type
ceph: messenger: check return from get_authorizer
ceph: messenger: rework prepare_connect_authorizer()
ceph: messenger: check prepare_write_connect() result
ceph: don't set WRITE_PENDING too early
ceph: drop msgr argument from prepare_write_connect()
ceph: messenger: send banner in process_connect()
ceph: messenger: reset connection kvec caller
libceph: don't reset kvec in prepare_write_banner()
ceph: ignore preferred_osd field
ceph: fully initialize new layout
...
Usually, we are adding pg_temp entries or removing them. Occasionally they
update. In that case, osdmap_apply_incremental() was failing because the
rbtree entry already exists.
Fix by removing the existing entry before inserting a new one.
Fixes http://tracker.newdream.net/issues/2446
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
There is a race between two __unregister_request() callers: the
reply path and the ceph_osdc_wait_request(). If we get a reply
*and* the timeout expires at roughly the same time, both callers
will try to unregister the request, and the second one will do bad
things.
Simply check if the request is still already unregistered; if so,
return immediately and do nothing.
Fixes http://tracker.newdream.net/issues/2420
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Move the addition of the authorizer buffer to a connection's
out_kvec out of get_connect_authorizer() and into its caller. This
way, the caller--prepare_write_connect()--can avoid adding the
connect header to out_kvec before it has been fully initialized.
Prior to this patch, it was possible for a connect header to be
sent over the wire before the authorizer protocol or buffer length
fields were initialized. An authorizer buffer associated with that
header could also be queued to send only after the connection header
that describes it was on the wire.
Fixes http://tracker.newdream.net/issues/2424
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Change the name of prepare_connect_authorizer(). The next
patch is going to make this function no longer add anything to the
connection's out_kvec, so it will no longer fit the pattern of
the rest of the prepare_connect_*() functions.
In addition, pass the address of a variable that will hold the
authorization protocol to use. Move the assignment of that to the
connection's out_connect structure into prepare_write_connect().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Change prepare_connect_authorizer() so it returns a pointer (or
pointer-coded error).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Rather than passing a bunch of arguments to be filled in with the
content of the ceph_auth_handshake buffer now returned by the
get_authorizer method, just use the returned information in the
caller, and drop the unnecessary arguments.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Have the get_authorizer auth_client method return a ceph_auth
pointer rather than an integer, pointer-encoding any returned
error value. This is to pave the way for making use of the
returned value in an upcoming patch.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
In the create_authorizer method for both the mds and osd clients,
the auth_client->ops pointer is blindly dereferenced. There is no
obvious guarantee that this pointer has been assigned. And
furthermore, even if the ops pointer is non-null there is definitely
no guarantee that the create_authorizer or destroy_authorizer
methods are defined.
Add checks in both routines to make sure they are defined (non-null)
before use. Add similar checks in a few other spots in these files
while we're at it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Make use of the new ceph_auth_handshake structure in order to reduce
the number of arguments passed to the create_authorizor method in
ceph_auth_client_ops. Use a local variable of that type as a
shorthand in the get_authorizer method definitions.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The definitions for the ceph_mds_session and ceph_osd both contain
five fields related only to "authorizers." Encapsulate those fields
into their own struct type, allowing for better isolation in some
upcoming patches.
Fix the #includes in "linux/ceph/osd_client.h" to lay out their more
complete canonical path.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
In prepare_connect_authorizer(), a connection's get_authorizer
method is called but ignores its return value. This function can
return an error, so check for it and return it if that ever occurs.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Change prepare_connect_authorizer() so it returns without dropping
the connection mutex if the connection has no get_authorizer method.
Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning
authorization protocols.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
prepare_write_connect() can return an error, but only one of its
callers checks for it. All the rest are in functions that already
return errors, so it should be fine to return the error if one
gets returned.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
prepare_write_connect() prepares a connect message, then sets
WRITE_PENDING on the connection. Then *after* this, it calls
prepare_connect_authorizer(), which updates the content of the
connection buffer already queued for sending. It's also possible it
will result in prepare_write_connect() returning -EAGAIN despite the
WRITE_PENDING big getting set.
Fix this by preparing the connect authorizer first, setting the
WRITE_PENDING bit only after that is done.
Partially addresses http://tracker.newdream.net/issues/2424
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
In all cases, the value passed as the msgr argument to
prepare_write_connect() is just con->msgr. Just get the msgr
value from the ceph connection and drop the unneeded argument.
The only msgr passed to prepare_write_banner() is also therefore
just the one from con->msgr, so change that function to drop the
msgr argument as well.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
prepare_write_connect() has an argument indicating whether a banner
should be sent out before sending out a connection message. It's
only ever set in one of its callers, so move the code that arranges
to send the banner into that caller and drop the "include_banner"
argument from prepare_write_connect().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reset a connection's kvec fields in the caller rather than in
prepare_write_connect(). This ends up repeating a few lines of
code but it's improving the separation between distinct operations
on the connection, which we can take advantage of later.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Move the kvec reset for a connection out of prepare_write_banner and
into its only caller.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Make the second argument to read_partial() be the ending input byte
position rather than the beginning offset it now represents. This
amounts to moving the addition "to + size" into the caller.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
read_partial() always increases whatever "to" value is supplied by
adding the requested size to it, and that's the only thing it does
with that pointed-to value.
Do that pointer advance in the caller (and then only when the
updated value will be subsequently used), and change the "to"
parameter to be an in-only and non-pointer value.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>