Commit Graph

446 Commits

Author SHA1 Message Date
Alex Elder c851c49591 libceph: record bio length
The bio is the only data item type that doesn't record its full
length.  Fix that.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:28 -07:00
Alex Elder f759ebb968 libceph: skip message if too big to receive
We know the length of our message buffers.  If we get a message
that's too long, just dump it and ignore it.  If skip was set
then con->in_msg won't be valid, so be careful not to dereference
a null pointer in the process.

This resolves:
    http://tracker.ceph.com/issues/4664

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:27 -07:00
Alex Elder ea96571f7b libceph: fix possible CONFIG_BLOCK build problem
This patch:
    15a0d7b libceph: record message data length
did not enclose some bio-specific code inside CONFIG_BLOCK as
it should have.  Fix that.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:26 -07:00
Alex Elder 5476492fba libceph: kill off osd request r_data_in and r_data_out
Finally!  Convert the osd op data pointers into real structures, and
make the switch over to using them instead of having all ops share
the in and/or out data structures in the osd request.

Set up a new function to traverse the set of ops and release any
data associated with them (pages).

This and the patches leading up to it resolve:
    http://tracker.ceph.com/issues/4657

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:25 -07:00
Alex Elder ec9123c567 libceph: set the data pointers when encoding ops
Still using the osd request r_data_in and r_data_out pointer, but
we're basically only referring to it via the data pointers in the
osd ops.  And we're transferring that information to the request
or reply message only when the op indicates it's needed, in
osd_req_encode_op().

To avoid a forward reference, ceph_osdc_msg_data_set() was moved up
in the file.

Don't bother calling ceph_osd_data_init(), in ceph_osd_alloc(),
because the ops array will already be zeroed anyway.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:24 -07:00
Alex Elder a4ce40a9a7 libceph: combine initializing and setting osd data
This ends up being a rather large patch but what it's doing is
somewhat straightforward.

Basically, this is replacing two calls with one.  The first of the
two calls is initializing a struct ceph_osd_data with data (either a
page array, a page list, or a bio list); the second is setting an
osd request op so it associates that data with one of the op's
parameters.  In place of those two will be a single function that
initializes the op directly.

That means we sort of fan out a set of the needed functions:
    - extent ops with pages data
    - extent ops with pagelist data
    - extent ops with bio list data
and
    - class ops with page data for receiving a response

We also have define another one, but it's only used internally:
    - class ops with pagelist data for request parameters

Note that we *still* haven't gotten rid of the osd request's
r_data_in and r_data_out fields.  All the osd ops refer to them for
their data.  For now, these data fields are pointers assigned to the
appropriate r_data_* field when these new functions are called.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:23 -07:00
Alex Elder 39b44cbe86 libceph: set message data when building osd request
All calls of ceph_osdc_start_request() are preceded (in the case of
rbd, almost) immediately by a call to ceph_osdc_build_request().

Move the build calls at the top of ceph_osdc_start_request() out of
there and into the ceph_osdc_build_request().  Nothing prevents
moving these calls to the top of ceph_osdc_build_request(), either
(and we're going to want them there in the next patch) so put them
at the top.

This and the next patch are related to:
    http://tracker.ceph.com/issues/4657

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:22 -07:00
Alex Elder e65550fd94 libceph: move ceph_osdc_build_request()
This simply moves ceph_osdc_build_request() later in its source
file without any change.  Done as a separate patch to facilitate
review of the change in the next patch.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:21 -07:00
Alex Elder 5f562df5f5 libceph: format class info at init time
An object class method is formatted using a pagelist which contains
the class name, the method name, and the data concatenated into an
osd request's outbound data.

Currently when a class op is initialized in osd_req_op_cls_init(),
the lengths of and pointers to these three items are recorded.
Later, when the op is getting formatted into the request message, a
new pagelist is created and that is when these items get copied into
the pagelist.

This patch makes it so the pagelist to hold these items is created
when the op is initialized instead.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:19 -07:00
Alex Elder c99d2d4abb libceph: specify osd op by index in request
An osd request now holds all of its source op structures, and every
place that initializes one of these is in fact initializing one
of the entries in the the osd request's array.

So rather than supplying the address of the op to initialize, have
caller specify the osd request and an indication of which op it
would like to initialize.  This better hides the details the
op structure (and faciltates moving the data pointers they use).

Since osd_req_op_init() is a common routine, and it's not used
outside the osd client code, give it static scope.  Also make
it return the address of the specified op (so all the other
init routines don't have to repeat that code).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:15 -07:00
Alex Elder 8c042b0df9 libceph: add data pointers in osd op structures
An extent type osd operation currently implies that there will
be corresponding data supplied in the data portion of the request
(for write) or response (for read) message.  Similarly, an osd class
method operation implies a data item will be supplied to receive
the response data from the operation.

Add a ceph_osd_data pointer to each of those structures, and assign
it to point to eithre the incoming or the outgoing data structure in
the osd message.  The data is not always available when an op is
initially set up, so add two new functions to allow setting them
after the op has been initialized.

Begin to make use of the data item pointer available in the osd
operation rather than the request data in or out structure in
places where it's convenient.  Add some assertions to verify
pointers are always set the way they're expected to be.

This is a sort of stepping stone toward really moving the data
into the osd request ops, to allow for some validation before
making that jump.

This is the first in a series of patches that resolve:
    http://tracker.ceph.com/issues/4657

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:14 -07:00
Alex Elder 54d5064912 libceph: rename data out field in osd request op
There are fields "indata" and "indata_len" defined the ceph osd
request op structure.  The "in" part is with from the point of view
of the osd server, but is a little confusing here on the client
side.  Change their names to use "request" instead of "in" to
indicate that it defines data provided with the request (as opposed
the data returned in the response).

Rename the local variable in osd_req_encode_op() to match.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:13 -07:00
Alex Elder 79528734f3 libceph: keep source rather than message osd op array
An osd request keeps a pointer to the osd operations (ops) array
that it builds in its request message.

In order to allow each op in the array to have its own distinct
data, we will need to keep track of each op's data, and that
information does not go over the wire.

As long as we're tracking the data we might as well just track the
entire (source) op definition for each of the ops.  And if we're
doing that, we'll have no more need to keep a pointer to the
wire-encoded version.

This patch makes the array of source ops be kept with the osd
request structure, and uses that instead of the version encoded in
the message in places where that was previously used.  The array
will be embedded in the request structure, and the maximum number of
ops we ever actually use is currently 2.  So reduce CEPH_OSD_MAX_OP
to 2 to reduce the size of the structure.

The result of doing this sort of ripples back up, and as a result
various function parameters and local variables become unnecessary.

Make r_num_ops be unsigned, and move the definition of struct
ceph_osd_req_op earlier to ensure it's defined where needed.

It does not yet add per-op data, that's coming soon.

This resolves:
    http://tracker.ceph.com/issues/4656

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:12 -07:00
Alex Elder 23c08a9cb2 libceph: define ceph_osd_data_length()
One more osd data helper, which returns the length of the
data item, regardless of its type.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:09 -07:00
Alex Elder c54d47bfad libceph: define a few more helpers
Define ceph_osd_data_init() and ceph_osd_data_release() to clean up
a little code.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:08 -07:00
Alex Elder 43bfe5de9f libceph: define osd data initialization helpers
Define and use functions that encapsulate the initializion of a
ceph_osd_data structure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:06 -07:00
Alex Elder 9fc6e06471 libceph: compute incoming bytes once
This is a simple change, extracting the number of incoming data
bytes just once in handle_reply().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:05 -07:00
Alex Elder 98fa5dd883 libceph: provide data length when preparing message
In prepare_message_data(), the length used to initialize the cursor
is taken from the header of the message provided.  I'm working
toward not using the header data length field to determine length in
outbound messages, and this is a step in that direction.  For
inbound messages this will be set to be the actual number of bytes
that are arriving (which may be less than the total size of the data
buffer available).

This resolves:
    http://tracker.ceph.com/issues/4589

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:03 -07:00
Alex Elder e5975c7c8e ceph: build osd request message later for writepages
Hold off building the osd request message in ceph_writepages_start()
until just before it will be submitted to the osd client for
execution.

We'll still create the request and allocate the page pointer array
after we learn we have at least one page to write.  A local variable
will be used to keep track of the allocated array of pages.  Wait
until just before submitting the request for assigning that page
array pointer to the request message.

Create ands use a new function osd_req_op_extent_update() whose
purpose is to serve this one spot where the length value supplied
when an osd request's op was initially formatted might need to get
changed (reduced, never increased) before submitting the request.

Previously, ceph_writepages_start() assigned the message header's
data length because of this update.  That's no longer necessary,
because ceph_osdc_build_request() will recalculate the right
value to use based on the content of the ops in the request.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:02 -07:00
Alex Elder 02ee07d300 libceph: hold off building osd request
Defer building the osd request until just before submitting it in
all callers except ceph_writepages_start().  (That caller will be
handed in the next patch.)

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:01 -07:00
Alex Elder acead002b2 libceph: don't build request in ceph_osdc_new_request()
This patch moves the call to ceph_osdc_build_request() out of
ceph_osdc_new_request() and into its caller.

This is in order to defer formatting osd operation information into
the request message until just before request is started.

The only unusual (ab)user of ceph_osdc_build_request() is
ceph_writepages_start(), where the final length of write request may
change (downward) based on the current inode size or the oldest
snapshot context with dirty data for the inode.

The remaining callers don't change anything in the request after has
been built.

This means the ops array is now supplied by the caller.  It also
means there is no need to pass the mtime to ceph_osdc_new_request()
(it gets provided to ceph_osdc_build_request()).  And rather than
passing a do_sync flag, have the number of ops in the ops array
supplied imply adding a second STARTSYNC operation after the READ or
WRITE requested.

This and some of the patches that follow are related to having the
messenger (only) be responsible for filling the content of the
message header, as described here:
    http://tracker.ceph.com/issues/4589

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:58 -07:00
Alex Elder a193080481 libceph: record message data length
Keep track of the length of the data portion for a message in a
separate field in the ceph_msg structure.  This information has
been maintained in wire byte order in the message header, but
that's going to change soon.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:57 -07:00
Alex Elder ace6d3a96f libceph: drop ceph_osd_request->r_con_filling_msg
A field in an osd request keeps track of whether a connection is
currently filling the request's reply message.  This patch gets rid
of that field.

An osd request includes two messages--a request and a reply--and
they're both associated with the connection that existed to its
the target osd at the time the request was created.

An osd request can be dropped early, even when it's in flight.
And at that time both messages are released.  It's possible the
reply message has been supplied to its connection to receive
an incoming response message at the time the osd request gets
dropped.  So ceph_osdc_release_request() revokes that message
from the connection before releasing it so things get cleaned up
properly.

Previously this may have caused a problem, because the connection
that a message was associated with might have gone away before the
revoke request.  And to avoid any problems using that connection,
the osd client held a reference to it when it supplies its response
message.

However since this commit:
    38941f80 libceph: have messages point to their connection
all messages hold a reference to the connection they are associated
with whenever the connection is actively operating on the message
(i.e. while the message is queued to send or sending, and when it
data is being received into it).  And if a message has no connection
associated with it, ceph_msg_revoke_incoming() won't do anything
when asked to revoke it.

As a result, there is no need to keep an additional reference to the
connection associated with a message when we hand the message to the
messenger when it calls our alloc_msg() method to receive something.
If the connection *were* operating on it, it would have its own
reference, and if not, there's no work to be done when we need to
revoke it.

So get rid of the osd request's r_con_filling_msg field.

This resolves:
    http://tracker.ceph.com/issues/4647

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:54 -07:00
Alex Elder ef4859d647 libceph: define ceph_decode_pgid() only once
There are two basically identical definitions of __decode_pgid()
in libceph, one in "net/ceph/osdmap.c" and the other in
"net/ceph/osd_client.c".  Get rid of both, and instead define
a single inline version in "include/linux/ceph/osdmap.h".

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:52 -07:00
Alex Elder 8058fd4503 libceph: drop mutex on error in handle_reply()
The osd client mutex is acquired just before getting a reference to
a request in handle_reply().  However the error paths after that
don't drop the mutex before returning as they should.

Drop the mutex after dropping the request reference.  Also add a
bad_mutex label at that point and use it so the failed request
lookup case can be handled with the rest.

This resolves:
    http://tracker.ceph.com/issues/4615

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-01 21:17:51 -07:00