Pull NFS client updates from Trond Myklebust:
"Highlights include:
- stable fix for a bug in nfs3_list_one_acl()
- speed up NFS path walks by supporting LOOKUP_RCU
- more read/write code cleanups
- pNFS fixes for layout return on close
- fixes for the RCU handling in the rpcsec_gss code
- more NFS/RDMA fixes"
* tag 'nfs-for-3.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
nfs: reject changes to resvport and sharecache during remount
NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error
SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred
NFS: fix two problems in lookup_revalidate in RCU-walk
NFS: allow lockless access to access_cache
NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU
NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU
NFS: support RCU_WALK in nfs_permission()
sunrpc/auth: allow lockless (rcu) lookup of credential cache.
NFS: prepare for RCU-walk support but pushing tests later in code.
NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used.
NFS: add checks for returned value of try_module_get()
nfs: clear_request_commit while holding i_lock
pnfs: add pnfs_put_lseg_async
pnfs: find swapped pages on pnfs commit lists too
nfs: fix comment and add warn_on for PG_INODE_REF
nfs: check wait_on_bit_lock err in page_group_lock
sunrpc: remove "ec" argument from encrypt_v2 operation
sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c
sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c
...
Pull nfsd updates from Bruce Fields:
"This includes a major rewrite of the NFSv4 state code, which has
always depended on a single mutex. As an example, open creates are no
longer serialized, fixing a performance regression on NFSv3->NFSv4
upgrades. Thanks to Jeff, Trond, and Benny, and to Christoph for
review.
Also some RDMA fixes from Chuck Lever and Steve Wise, and
miscellaneous fixes from Kinglong Mee and others"
* 'for-3.17' of git://linux-nfs.org/~bfields/linux: (167 commits)
svcrdma: remove rdma_create_qp() failure recovery logic
nfsd: add some comments to the nfsd4 object definitions
nfsd: remove the client_mutex and the nfs4_lock/unlock_state wrappers
nfsd: remove nfs4_lock_state: nfs4_state_shutdown_net
nfsd: remove nfs4_lock_state: nfs4_laundromat
nfsd: Remove nfs4_lock_state(): reclaim_complete()
nfsd: Remove nfs4_lock_state(): setclientid, setclientid_confirm, renew
nfsd: Remove nfs4_lock_state(): exchange_id, create/destroy_session()
nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm
nfsd: Remove nfs4_lock_state(): nfsd4_delegreturn()
nfsd: Remove nfs4_lock_state(): nfsd4_open_downgrade + nfsd4_close
nfsd: Remove nfs4_lock_state(): nfsd4_lock/locku/lockt()
nfsd: Remove nfs4_lock_state(): nfsd4_release_lockowner
nfsd: Remove nfs4_lock_state(): nfsd4_test_stateid/nfsd4_free_stateid
nfsd: Remove nfs4_lock_state(): nfs4_preprocess_stateid_op()
nfsd: remove old fault injection infrastructure
nfsd: add more granular locking to *_delegations fault injectors
nfsd: add more granular locking to forget_openowners fault injector
nfsd: add more granular locking to forget_locks fault injector
nfsd: add a list_head arg to nfsd_foreach_client_lock
...
The new flag RPCAUTH_LOOKUP_RCU to credential lookup avoids locking,
does not take a reference on the returned credential, and returns
-ECHILD if a simple lookup was not possible.
The returned value can only be used within an rcu_read_lock protected
region.
The main user of this is the new rpc_lookup_cred_nonblock() which
returns a pointer to the current credential which is only rcu-safe (no
ref-count held), and might return -ECHILD if allocation was required.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (916 commits)
xprtrdma: Handle additional connection events
xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro
xprtrdma: Make rpcrdma_ep_disconnect() return void
xprtrdma: Schedule reply tasklet once per upcall
xprtrdma: Allocate each struct rpcrdma_mw separately
xprtrdma: Rename frmr_wr
xprtrdma: Disable completions for LOCAL_INV Work Requests
xprtrdma: Disable completions for FAST_REG_MR Work Requests
xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external()
xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request
xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect
xprtrdma: Properly handle exhaustion of the rb_mws list
xprtrdma: Chain together all MWs in same buffer pool
xprtrdma: Back off rkey when FAST_REG_MR fails
xprtrdma: Unclutter struct rpcrdma_mr_seg
xprtrdma: Don't invalidate FRMRs if registration fails
xprtrdma: On disconnect, don't ignore pending CQEs
xprtrdma: Update rkeys after transport reconnect
xprtrdma: Limit data payload size for ALLPHYSICAL
xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs
...
Clean up.
RPCRDMA_PERSISTENT_REGISTRATION was a compile-time switch between
RPCRDMA_REGISTER mode and RPCRDMA_ALLPHYSICAL mode. Since
RPCRDMA_REGISTER has been removed, there's no need for the extra
conditional compilation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The RDMA credit limit controls how many concurrent RPCs are allowed
per connection.
An NFS/RDMA client and server exchange their credit limits in the
RPC/RDMA headers. The Linux client and the Solaris client and server
allow 32 credits. The Linux server allows only 16, which limits its
performance.
Set the server's default credit limit to 32, like the other well-
known implementations, so the out-of-the-shrinkwrap performance of
the Linux server is better.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The current code always selects XPRT_TRANSPORT_BC_TCP for the back
channel, even when the forward channel was not TCP (eg, RDMA). When
a 4.1 mount is attempted with RDMA, the server panics in the TCP BC
code when trying to send CB_NULL.
Instead, construct the transport protocol number from the forward
channel transport or'd with XPRT_TRANSPORT_BC. Transports that do
not support bi-directional RPC will not have registered a "BC"
transport, causing create_backchannel_client() to fail immediately.
Fixes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=265
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It is currently not possible for various wait_on_bit functions
to implement a timeout.
While the "action" function that is called to do the waiting
could certainly use schedule_timeout(), there is no way to carry
forward the remaining timeout after a false wake-up.
As false-wakeups a clearly possible at least due to possible
hash collisions in bit_waitqueue(), this is a real problem.
The 'action' function is currently passed a pointer to the word
containing the bit being waited on. No current action functions
use this pointer. So changing it to something else will be a
little noisy but will have no immediate effect.
This patch changes the 'action' function to take a pointer to
the "struct wait_bit_key", which contains a pointer to the word
containing the bit so nothing is really lost.
It also adds a 'private' field to "struct wait_bit_key", which
is initialized to zero.
An action function can now implement a timeout with something
like
static int timed_out_waiter(struct wait_bit_key *key)
{
unsigned long waited;
if (key->private == 0) {
key->private = jiffies;
if (key->private == 0)
key->private -= 1;
}
waited = jiffies - key->private;
if (waited > 10 * HZ)
return -EAGAIN;
schedule_timeout(waited - 10 * HZ);
return 0;
}
If any other need for context in a waiter were found it would be
easy to use ->private for some other purpose, or even extend
"struct wait_bit_key".
My particular need is to support timeouts in nfs_release_page()
to avoid deadlocks with loopback mounted NFS.
While wait_on_bit_timeout() would be a cleaner interface, it
will not meet my need. I need the timeout to be sensitive to
the state of the connection with the server, which could change.
So I need to use an 'action' interface.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
...and add an new rpc_auth function to call it when it exists. This
is only applicable for AUTH_GSS mechanisms, so we only specify this
for those sorts of credentials.
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If rpc.gssd sends us an acceptor name string trailing the context token,
stash it as part of the context.
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
rq_usedeferral and rq_splice_ok are used as 0 and 1, just defined to bool.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- massive cleanup of the NFS read/write code by Anna and Dros
- support multiple NFS read/write requests per page in order to deal
with non-page aligned pNFS striping. Also cleans up the r/wsize <
page size code nicely.
- stable fix for ensuring inode is declared uptodate only after all
the attributes have been checked.
- stable fix for a kernel Oops when remounting
- NFS over RDMA client fixes
- move the pNFS files layout driver into its own subdirectory"
* tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
NFS: populate ->net in mount data when remounting
pnfs: fix lockup caused by pnfs_generic_pg_test
NFSv4.1: Fix typo in dprintk
NFSv4.1: Comment is now wrong and redundant to code
NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state
xprtrdma: Disconnect on registration failure
xprtrdma: Remove BUG_ON() call sites
xprtrdma: Avoid deadlock when credit window is reset
SUNRPC: Move congestion window constants to header file
xprtrdma: Reset connection timeout after successful reconnect
xprtrdma: Use macros for reconnection timeout constants
xprtrdma: Allocate missing pagelist
xprtrdma: Remove Tavor MTU setting
xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
xprtrdma: Reduce the number of hardway buffer allocations
xprtrdma: Limit work done by completion handler
xprtrmda: Reduce calls to ib_poll_cq() in completion handlers
xprtrmda: Reduce lock contention in completion handlers
xprtrdma: Split the completion queue
xprtrdma: Make rpcrdma_ep_destroy() return void
...
Pull nfsd updates from Bruce Fields:
"The largest piece is a long-overdue rewrite of the xdr code to remove
some annoying limitations: for example, there was no way to return
ACLs larger than 4K, and readdir results were returned only in 4k
chunks, limiting performance on large directories.
Also:
- part of Neil Brown's work to make NFS work reliably over the
loopback interface (so client and server can run on the same
machine without deadlocks). The rest of it is coming through
other trees.
- cleanup and bugfixes for some of the server RDMA code, from
Steve Wise.
- Various cleanup of NFSv4 state code in preparation for an
overhaul of the locking, from Jeff, Trond, and Benny.
- smaller bugfixes and cleanup from Christoph Hellwig and
Kinglong Mee.
Thanks to everyone!
This summer looks likely to be busier than usual for knfsd. Hopefully
we won't break it too badly; testing definitely welcomed"
* 'for-3.16' of git://linux-nfs.org/~bfields/linux: (100 commits)
nfsd4: fix FREE_STATEID lockowner leak
svcrdma: Fence LOCAL_INV work requests
svcrdma: refactor marshalling logic
nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry
nfs4: remove unused CHANGE_SECURITY_LABEL
nfsd4: kill READ64
nfsd4: kill READ32
nfsd4: simplify server xdr->next_page use
nfsd4: hash deleg stateid only on successful nfs4_set_delegation
nfsd4: rename recall_lock to state_lock
nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound
nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open
nfsd4: use recall_lock for delegation hashing
nfsd: fix laundromat next-run-time calculation
nfsd: make nfsd4_encode_fattr static
SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING
nfsd: remove unused function nfsd_read_file
nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer
NFSD: Error out when getting more than one fsloc/secinfo/uuid
NFSD: Using type of uint32_t for ex_nflavors instead of int
...
This patch refactors the NFSRDMA server marshalling logic to
remove the intermediary map structures. It also fixes an existing bug
where the NFSRDMA server was not minding the device fast register page
list length limitations.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
I would like to use one of the RPC client's congestion algorithm
constants in transport-specific code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it
once in the auth code, where this kind of estimate should be made. And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
After this we can handle for example getattr of very large ACLs.
Read, readdir, readlink are still special cases with their own limits.
Also we can't handle a new operation starting close to the end of a
page.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This will be used in the server side in a few cases:
- when certain operations (read, readdir, readlink) fail after
encoding a partial response.
- when we run out of space after encoding a partial response.
- in readlink, where we initially reserve PAGE_SIZE bytes for
data, then truncate to the actual size.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If an incoming NFS request is coming from the local host, then
nfsd will need to perform some special handling. So detect that
possibility and make the source visible in rq_local.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
An NFS/RDMA client's source port is meaningless for RDMA transports.
The transport layer typically sets the source port value on the
connection to a random ephemeral port.
Currently, NFS server administrators must specify the "insecure"
export option to enable clients to access exports via RDMA.
But this means NFS clients can access such an export via IP using an
ephemeral port, which may not be desirable.
This patch eliminates the need to specify the "insecure" export
option to allow NFS/RDMA clients access to an export.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=250
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>