Commit Graph

1084 Commits

Author SHA1 Message Date
Linus Torvalds
4710e78940 Merge tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Bugfix:
   - Fix build issues on architectures that don't provide 64-bit cmpxchg

  Cleanups:
   - Fix a spelling mistake"

* tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: fix spelling mistake, EACCESS -> EACCES
  SUNRPC: Use atomic(64)_t for seq_send(64)
2018-11-04 08:20:09 -08:00
Paul Burton
c3be6577d8 SUNRPC: Use atomic(64)_t for seq_send(64)
The seq_send & seq_send64 fields in struct krb5_ctx are used as
atomically incrementing counters. This is implemented using cmpxchg() &
cmpxchg64() to implement what amount to custom versions of
atomic_fetch_inc() & atomic64_fetch_inc().

Besides the duplication, using cmpxchg64() has another major drawback in
that some 32 bit architectures don't provide it. As such commit
571ed1fd23 ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
resulted in build failures for some architectures.

Change seq_send to be an atomic_t and seq_send64 to be an atomic64_t,
then use atomic(64)_* functions to manipulate the values. The atomic64_t
type & associated functions are provided even on architectures which
lack real 64 bit atomic memory access via CONFIG_GENERIC_ATOMIC64 which
uses spinlocks to serialize access. This fixes the build failures for
architectures lacking cmpxchg64().

A potential alternative that was raised would be to provide cmpxchg64()
on the 32 bit architectures that currently lack it, using spinlocks.
However this would provide a version of cmpxchg64() with semantics a
little different to the implementations on architectures with real 64
bit atomics - the spinlock-based implementation would only work if all
access to the memory used with cmpxchg64() is *always* performed using
cmpxchg64(). That is not currently a requirement for users of
cmpxchg64(), and making it one seems questionable. As such avoiding
cmpxchg64() outside of architecture-specific code seems best,
particularly in cases where atomic64_t seems like a better fit anyway.

The CONFIG_GENERIC_ATOMIC64 implementation of atomic64_* functions will
use spinlocks & so faces the same issue, but with the key difference
that the memory backing an atomic64_t ought to always be accessed via
the atomic64_* functions anyway making the issue moot.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 571ed1fd23 ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-01 13:55:24 -04:00
Linus Torvalds
310c7585e8 Merge tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Olga added support for the NFSv4.2 asynchronous copy protocol. We
  already supported COPY, by copying a limited amount of data and then
  returning a short result, letting the client resend. The asynchronous
  protocol should offer better performance at the expense of some
  complexity.

  The other highlight is Trond's work to convert the duplicate reply
  cache to a red-black tree, and to move it and some other server caches
  to RCU. (Previously these have meant taking global spinlocks on every
  RPC)

  Otherwise, some RDMA work and miscellaneous bugfixes"

* tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux: (30 commits)
  lockd: fix access beyond unterminated strings in prints
  nfsd: Fix an Oops in free_session()
  nfsd: correctly decrement odstate refcount in error path
  svcrdma: Increase the default connection credit limit
  svcrdma: Remove try_module_get from backchannel
  svcrdma: Remove ->release_rqst call in bc reply handler
  svcrdma: Reduce max_send_sges
  nfsd: fix fall-through annotations
  knfsd: Improve lookup performance in the duplicate reply cache using an rbtree
  knfsd: Further simplify the cache lookup
  knfsd: Simplify NFS duplicate replay cache
  knfsd: Remove dead code from nfsd_cache_lookup
  SUNRPC: Simplify TCP receive code
  SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock
  SUNRPC: Remove non-RCU protected lookup
  NFS: Fix up a typo in nfs_dns_ent_put
  NFS: Lockless DNS lookups
  knfsd: Lockless lookup of NFSv4 identities.
  SUNRPC: Lockless server RPCSEC_GSS context lookup
  knfsd: Allow lockless lookups of the exports
  ...
2018-10-30 13:03:29 -07:00
Chuck Lever
3ae2cefb61 svcrdma: Increase the default connection credit limit
Reduce queuing on clients by allowing more credits by default.

64 is the default NFSv4.1 slot table size on Linux clients. This
size prevents the credit limit from putting RPC requests to sleep
again after they have already slept waiting for a session slot.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
1863d77f15 SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock
Now that the reader functions are all RCU protected, use a regular
spinlock rather than a reader/writer lock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
d48cf356a1 SUNRPC: Remove non-RCU protected lookup
Clean up the cache code by removing the non-RCU protected lookup.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:58:04 -04:00
Trond Myklebust
ae74136b4b SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock
Instead of the reader/writer spinlock, allow cache lookups to use RCU
for looking up entries. This is more efficient since modifications can
occur while other entries are being looked up.

Note that for now, we keep the reader/writer spinlock until all users
have been converted to use RCU-safe freeing of their cache entries.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-29 16:57:59 -04:00
Linus Torvalds
c7a2c49ea6 Merge tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - Fix the NFSv4.1 r/wsize sanity checking
   - Reset the RPC/RDMA credit grant properly after a disconnect
   - Fix a missed page unlock after pg_doio()

  Features and optimisations:
   - Overhaul of the RPC client socket code to eliminate a locking
     bottleneck and reduce the latency when transmitting lots of
     requests in parallel.
   - Allow parallelisation of the RPCSEC_GSS encoding of an RPC request.
   - Convert the RPC client socket receive code to use iovec_iter() for
     improved efficiency.
   - Convert several NFS and RPC lookup operations to use RCU instead of
     taking global locks.
   - Avoid the need for BH-safe locks in the RPC/RDMA back channel.

  Bugfixes and cleanups:
   - Fix lock recovery during NFSv4 delegation recalls
   - Fix the NFSv4 + NFSv4.1 "lookup revalidate + open file" case.
   - Fixes for the RPC connection metrics
   - Various RPC client layer cleanups to consolidate stream based
     sockets
   - RPC/RDMA connection cleanups
   - Simplify the RPC/RDMA cleanup after memory operation failures
   - Clean ups for NFS v4.2 copy completion and NFSv4 open state
     reclaim"

* tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (97 commits)
  SUNRPC: Convert the auth cred cache to use refcount_t
  SUNRPC: Convert auth creds to use refcount_t
  SUNRPC: Simplify lookup code
  SUNRPC: Clean up the AUTH cache code
  NFS: change sign of nfs_fh length
  sunrpc: safely reallow resvport min/max inversion
  nfs: remove redundant call to nfs_context_set_write_error()
  nfs: Fix a missed page unlock after pg_doio()
  SUNRPC: Fix a compile warning for cmpxchg64()
  NFSv4.x: fix lock recovery during delegation recall
  SUNRPC: use cmpxchg64() in gss_seq_send64_fetch_and_inc()
  xprtrdma: Squelch a sparse warning
  xprtrdma: Clean up xprt_rdma_disconnect_inject
  xprtrdma: Add documenting comments
  xprtrdma: Report when there were zero posted Receives
  xprtrdma: Move rb_flags initialization
  xprtrdma: Don't disable BH's in backchannel server
  xprtrdma: Remove memory address of "ep" from an error message
  xprtrdma: Rename rpcrdma_qp_async_error_upcall
  xprtrdma: Simplify RPC wake-ups on connect
  ...
2018-10-26 13:05:26 -07:00
Trond Myklebust
331bc71cb1 SUNRPC: Convert the auth cred cache to use refcount_t
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-23 12:24:33 -04:00
Trond Myklebust
79b1818102 SUNRPC: Convert auth creds to use refcount_t
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-23 12:24:33 -04:00
Trond Myklebust
07d02a67b7 SUNRPC: Simplify lookup code
We no longer need to worry about whether or not the entry is hashed in
order to figure out if the contents are valid. We only care whether or
not the refcount is non-zero.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-23 12:24:33 -04:00
Trond Myklebust
608a0ab2f5 SUNRPC: Add lockless lookup of the server's auth domain
Avoid taking the global auth_domain_lock in most lookups of the auth domain
by adding an RCU protected lookup.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-10-03 11:32:59 -04:00
Trond Myklebust
571ed1fd23 SUNRPC: Replace krb5_seq_lock with a lockless scheme
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:18 -04:00
Trond Myklebust
ec846469ba SUNRPC: Unexport xdr_partial_copy_from_skb()
It is no longer used outside of net/sunrpc/socklib.c

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
550aebfe1c SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
277e4ab7d5 SUNRPC: Simplify TCP receive code by switching to using iterators
Most of this code should also be reusable with other socket types.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
9d96acbc7f SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()
Add a bvec array to struct xdr_buf, and have the client allocate it
when we need to receive data into pages.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
431f6eb357 SUNRPC: Add a label for RPC calls that require allocation on receive
If the RPC call relies on the receive call allocating pages as buffers,
then let's label it so that we
a) Don't leak memory by allocating pages for requests that do not expect
   this behaviour
b) Can optimise for the common case where calls do not require allocation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
f42f7c2830 SUNRPC: Fix priority queue fairness
Fix up the priority queue to not batch by owner, but by queue, so that
we allow '1 << priority' elements to be dequeued before switching to
the next priority queue.
The owner field is still used to wake up requests in round robin order
by owner to avoid single processes hogging the RPC layer by loading the
queues.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
95f7691daa SUNRPC: Convert xprt receive queue to use an rbtree
If the server is slow, we can find ourselves with quite a lot of entries
on the receive queue. Converting the search from an O(n) to O(log(n))
can make a significant difference, particularly since we have to hold
a number of locks while searching.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
adfa71446d SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
c544577dad SUNRPC: Clean up transport write space handling
Treat socket write space handling in the same way we now treat transport
congestion: by denying the XPRT_LOCK until the transport signals that it
has free buffer space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
36bd7de949 SUNRPC: Turn off throttling of RPC slots for TCP sockets
The theory was that we would need to grab the socket lock anyway, so we
might as well use it to gate the allocation of RPC slots for a TCP
socket.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
75891f502f SUNRPC: Support for congestion control when queuing is enabled
Both RDMA and UDP transports require the request to get a "congestion control"
credit before they can be transmitted. Right now, this is done when
the request locks the socket. We'd like it to happen when a request attempts
to be transmitted for the first time.
In order to support retransmission of requests that already hold such
credits, we also want to ensure that they get queued first, so that we
don't deadlock with requests that have yet to obtain a credit.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
918f3c1fe8 SUNRPC: Improve latency for interactive tasks
One of the intentions with the priority queues was to ensure that no
single process can hog the transport. The field task->tk_owner therefore
identifies the RPC call's origin, and is intended to allow the RPC layer
to organise queues for fairness.
This commit therefore modifies the transmit queue to group requests
by task->tk_owner, and ensures that we round robin among those groups.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00