Pull nfsd updates from Bruce Fields:
"This consists mostly of nfsd container work:
Scott Mayhew revived an old api that communicates with a userspace
daemon to manage some on-disk state that's used to track clients
across server reboots. We've been using a usermode_helper upcall for
that, but it's tough to run those with the right namespaces, so a
daemon is much friendlier to container use cases.
Trond fixed nfsd's handling of user credentials in user namespaces. He
also contributed patches that allow containers to support different
sets of NFS protocol versions.
The only remaining container bug I'm aware of is that the NFS reply
cache is shared between all containers. If anyone's aware of other
gaps in our container support, let me know.
The rest of this is miscellaneous bugfixes"
* tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits)
nfsd: update callback done processing
locks: move checks from locks_free_lock() to locks_release_private()
nfsd: fh_drop_write in nfsd_unlink
nfsd: allow fh_want_write to be called twice
nfsd: knfsd must use the container user namespace
SUNRPC: rsi_parse() should use the current user namespace
SUNRPC: Fix the server AUTH_UNIX userspace mappings
lockd: Pass the user cred from knfsd when starting the lockd server
SUNRPC: Temporary sockets should inherit the cred from their parent
SUNRPC: Cache the process user cred in the RPC server listener
nfsd: Allow containers to set supported nfs versions
nfsd: Add custom rpcbind callbacks for knfsd
SUNRPC: Allow further customisation of RPC program registration
SUNRPC: Clean up generic dispatcher code
SUNRPC: Add a callback to initialise server requests
SUNRPC/nfs: Fix return value for nfs4_callback_compound()
nfsd: handle legacy client tracking records sent by nfsdcld
nfsd: re-order client tracking method selection
nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld
nfsd: un-deprecate nfsdcld
...
When converting kuids to AUTH_UNIX creds, etc we will want to use the
same user namespace as the process that created the rpc client.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Add the 'softerr' rpc client flag that sets the RPC_TASK_TIMEOUT
flag on all new rpc tasks that are attached to that rpc client.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Add variables to track RPC level errors so that we can distinguish
between issue that arose in the RPC transport layer as opposed to
those arising from the reply message.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Clean up the RPC task sleep interfaces by replacing the task->tk_timeout
'hidden parameter' to rpc_sleep_on() with a new function that takes an
absolute timeout.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Convert the transport callback to actually put the request to sleep
instead of just setting a timeout. This is in preparation for
rpc_sleep_on_timeout().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The RPC_TASK_KILLED flag should really not be set from another context
because it can clobber data in the struct task when task->tk_flags is
changed non-atomically.
Let's therefore swap out RPC_TASK_KILLED with an atomic flag, and add
a function to set that flag and safely wake up the task.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In order to be able to interpret uids and gids correctly in knfsd, we
should cache the user namespace of the process that created the RPC
server's listener. To do so, we refcount the credential of that process.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add a callback to allow customisation of the rpcbind registration.
When clients have the ability to turn on and off version support,
we want to allow them to also prevent registration of those
versions with the rpc portmapper.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Add a callback to help initialise server requests before they are
processed. This will allow us to clean up the NFS server version
support, and to make it container safe.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
RPC server procedures are normally expected to return a __be32 encoded
status value of type 'enum rpc_accept_stat', however at least one function
wants to return an authentication status of type 'enum rpc_auth_stat'
in the case where authentication fails.
This patch adds functionality to allow this.
Fixes: a4e187d83d ("NFS: Don't drop CB requests with invalid principals")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit 009a82f643.
The ability to optimise here relies on compiler being able to optimise
away tail calls to avoid stack overflows. Unfortunately, we are seeing
reports of problems, so let's just revert.
Reported-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
In cases where we know the task is not sleeping, try to optimise
away the indirect call to task->tk_action() by replacing it with
a direct call.
Only change tail calls, to allow gcc to perform tail call
elimination.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If a layout segment gets invalidated while a pNFS I/O operation
is queued for transmission, then we ideally want to abort
immediately. This is particularly the case when there is a large
number of I/O related RPCs queued in the RPC layer, and the layout
segment gets invalidated due to an ENOSPC error, or an EACCES (because
the client was fenced). We may end up forced to spam the MDS with a
lot of otherwise unnecessary LAYOUTERRORs after that I/O fails.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
udp_poll() checks the struct file for the O_NONBLOCK flag, so we must not
call it with a NULL file pointer.
Fixes: 0ffe86f480 ("SUNRPC: Use poll() to fix up the socket requeue races")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Currently rpc_inline_rcv_pages() uses au_rslack to estimate the
size of the upper layer reply header. This is fine for auth flavors
where au_verfsize == au_rslack.
However, some auth flavors have more going on. krb5i for example has
two more words after the verifier, and another blob following the
RPC message. The calculation involving au_rslack pushes the upper
layer reply header too far into the rcv_buf.
au_rslack is still valuable: it's the amount of buffer space needed
for the reply, and is used when allocating the reply buffer. We'll
keep that.
But, add a new field that can be used to properly estimate the
location of the upper layer header in each RPC reply, based on the
auth flavor in use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
au_verfsize will be needed for a non-flavor-specific computation
in a subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
prepare_reply_buffer() and its NFSv4 equivalents expose the details
of the RPC header and the auth slack values to upper layer
consumers, creating a layering violation, and duplicating code.
Remedy these issues by adding a new RPC client API that hides those
details from upper layers in a common helper function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The key action of xdr_buf_trim() is that it shortens buf->len, the
length of the xdr_buf's content. The other actions -- shortening the
head, pages, and tail components -- are actually not necessary. In
particular, changing the size of those components can corrupt the
RPC message contained in the buffer. This is an accident waiting to
happen rather than a current bug, as far as we know.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
- Recover some instruction count because I'm about to introduce a
few xdr_inline_decode call sites
- Replace dprintk() call sites with trace points
- Reduce the hot path so it fits in fewer cachelines
I've also renamed it rpc_decode_header() to match everything else
in the RPC client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>