Commit Graph

1196 Commits

Author SHA1 Message Date
Linus Torvalds
29a47f456d Merge tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - fix handling of backchannel binding in BIND_CONN_TO_SESSION

  Bugfixes:
   - Fix a credential use-after-free issue in pnfs_roc()
   - Fix potential posix_acl refcnt leak in nfs3_set_acl
   - defer slow parts of rpc_free_client() to a workqueue
   - Fix an Oopsable race in __nfs_list_for_each_server()
   - Fix trace point use-after-free race
   - Regression: the RDMA client no longer responds to server disconnect
     requests
   - Fix return values of xdr_stream_encode_item_{present, absent}
   - _pnfs_return_layout() must always wait for layoutreturn completion

  Cleanups:
   - Remove unreachable error conditions"

* tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a race in __nfs_list_for_each_server()
  NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
  SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
  NFSv4: Remove unreachable error condition due to rpc_run_task()
  SUNRPC: Remove unreachable error condition
  xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}
  xprtrdma: Fix trace point use-after-free race
  xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler()
  nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
  NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
  NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
2020-05-02 11:24:01 -07:00
Olga Kornievskaia
dff58530c4 NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
Currently, if the client sends BIND_CONN_TO_SESSION with
NFS4_CDFC4_FORE_OR_BOTH but only gets NFS4_CDFS4_FORE back it ignores
that it wasn't able to enable a backchannel.

To make sure, the client sends BIND_CONN_TO_SESSION as the first
operation on the connections (ie., no other session compounds haven't
been sent before), and if the client's request to bind the backchannel
is not satisfied, then reset the connection and retry.

Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-28 15:58:38 -04:00
NeilBrown
7c4310ff56 SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
The rpciod workqueue is on the write-out path for freeing dirty memory,
so it is important that it never block waiting for memory to be
allocated - this can lead to a deadlock.

rpc_execute() - which is often called by an rpciod work item - calls
rcp_task_release_client() which can lead to rpc_free_client().

rpc_free_client() makes two calls which could potentially block wating
for memory allocation.

rpc_clnt_debugfs_unregister() calls into debugfs and will block while
any of the debugfs files are being accessed.  In particular it can block
while any of the 'open' methods are being called and all of these use
malloc for one thing or another.  So this can deadlock if the memory
allocation waits for NFS to complete some writes via rpciod.

rpc_clnt_remove_pipedir() can take the inode_lock() and while it isn't
obvious that memory allocations can happen while the lock it held, it is
safer to assume they might and to not let rpciod call
rpc_clnt_remove_pipedir().

So this patch moves these two calls (together with the final kfree() and
rpciod_down()) into a work-item to be run from the system work-queue.
rpciod can continue its important work, and the final stages of the free
can happen whenever they happen.

I have seen this deadlock on a 4.12 based kernel where debugfs used
synchronize_srcu() when removing objects.  synchronize_srcu() requires a
workqueue and there were no free workther threads and none could be
allocated.  While debugsfs no longer uses SRCU, I believe the deadlock
is still possible.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-28 15:58:38 -04:00
Chuck Lever
23cf1ee1f1 svcrdma: Fix leak of svc_rdma_recv_ctxt objects
Utilize the xpo_release_rqst transport method to ensure that each
rqstp's svc_rdma_recv_ctxt object is released even when the server
cannot return a Reply for that rqstp.

Without this fix, each RPC whose Reply cannot be sent leaks one
svc_rdma_recv_ctxt. This is a 2.5KB structure, a 4KB DMA-mapped
Receive buffer, and any pages that might be part of the Reply
message.

The leak is infrequent unless the network fabric is unreliable or
Kerberos is in use, as GSS sequence window overruns, which result
in connection loss, are more common on fast transports.

Fixes: 3a88092ee3 ("svcrdma: Preserve Receive buffer until svc_rdma_sendto")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-04-17 12:40:38 -04:00
Linus Torvalds
04de788e61 Merge tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - Fix a page leak in nfs_destroy_unlinked_subrequests()

   - Fix use-after-free issues in nfs_pageio_add_request()

   - Fix new mount code constant_table array definitions

   - finish_automount() requires us to hold 2 refs to the mount record

  Features:
   - Improve the accuracy of telldir/seekdir by using 64-bit cookies
     when possible.

   - Allow one RDMA active connection and several zombie connections to
     prevent blocking if the remote server is unresponsive.

   - Limit the size of the NFS access cache by default

   - Reduce the number of references to credentials that are taken by
     NFS

   - pNFS files and flexfiles drivers now support per-layout segment
     COMMIT lists.

   - Enable partial-file layout segments in the pNFS/flexfiles driver.

   - Add support for CB_RECALL_ANY to the pNFS flexfiles layout type

   - pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
     the DS using the layouterror mechanism.

  Bugfixes and cleanups:
   - SUNRPC: Fix krb5p regressions

   - Don't specify NFS version in "UDP not supported" error

   - nfsroot: set tcp as the default transport protocol

   - pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()

   - alloc_nfs_open_context() must use the file cred when available

   - Fix locking when dereferencing the delegation cred

   - Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails

   - Various clean ups of the NFS O_DIRECT commit code

   - Clean up RDMA connect/disconnect

   - Replace zero-length arrays with C99-style flexible arrays"

* tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits)
  NFS: Clean up process of marking inode stale.
  SUNRPC: Don't start a timer on an already queued rpc task
  NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
  NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
  NFS: Beware when dereferencing the delegation cred
  NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
  NFS: finish_automount() requires us to hold 2 refs to the mount record
  NFS: Fix a few constant_table array definitions
  NFS: Try to join page groups before an O_DIRECT retransmission
  NFS: Refactor nfs_lock_and_join_requests()
  NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
  NFS: Clean up nfs_lock_and_join_requests()
  NFS: Remove the redundant function nfs_pgio_has_mirroring()
  NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
  NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
  NFS: Fix use-after-free issues in nfs_pageio_add_request()
  NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
  NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
  NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
  pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
  ...
2020-04-07 13:51:39 -07:00
J. Bruce Fields
9a81ef42b2 SUNRPC/cache: don't allow invalid entries to be flushed
Trond points out in commit 277f27e2f2 ("SUNRPC/cache: Allow
garbage collection of invalid cache entries") that we allow invalid
cache entries to persist indefinitely. That fix, however,
reintroduces the problem fixed by Kinglong Mee's commit d6fc8821c2
("SUNRPC/Cache: Always treat the invalid cache as unexpired"), where
an invalid cache entry is immediately removed by a flush before
mountd responds to it. The result is that the server thread that
should be waiting for mountd to fill in that entry instead gets an
-ETIMEDOUT return from cache_check(). Symptoms are the server
becoming unresponsive after a restart, reproduceable by running
pynfs 4.1 test REBT5.

Instead, take a compromise approach: allow invalid cache entries to
be removed after they expire, but not to be removed by a cache
flush.

Fixes: 277f27e2f2 ("SUNRPC/cache: Allow garbage collection ... ")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-27 12:17:31 -04:00
Trond Myklebust
277f27e2f2 SUNRPC/cache: Allow garbage collection of invalid cache entries
If the cache entry never gets initialised, we want the garbage
collector to be able to evict it. Otherwise if the upcall daemon
fails to initialise the entry, we end up never expiring it.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[ cel: resolved a merge conflict ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:34 -04:00
Trond Myklebust
65286b883c nfsd: export upcalls must not return ESTALE when mountd is down
If the rpc.mountd daemon goes down, then that should not cause all
exports to start failing with ESTALE errors. Let's explicitly
distinguish between the cache upcall cases that need to time out,
and those that do not.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
9e55eef4ab SUNRPC: Refactor xs_sendpages()
Re-locate xs_sendpages() so that it can be shared with server code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
0dabe948f2 svcrdma: Avoid DMA mapping small RPC Replies
On some platforms, DMA mapping part of a page is more costly than
copying bytes. Indeed, not involving the I/O MMU can help the
RPC/RDMA transport scale better for tiny I/Os across more RDMA
devices. This is because interaction with the I/O MMU is eliminated
for each of these small I/Os. Without the explicit unmapping, the
NIC no longer needs to do a costly internal TLB shoot down for
buffers that are just a handful of bytes.

Since pull-up is now a more a frequent operation, I've introduced a
trace point in the pull-up path. It can be used for debugging or
user-space tools that count pull-up frequency.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
aee4b74a3f svcrdma: Fix double sync of transport header buffer
Performance optimization: Avoid syncing the transport buffer twice
when Reply buffer pull-up is necessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
6fd5034db4 svcrdma: Refactor chunk list encoders
Same idea as the receive-side changes I did a while back: use
xdr_stream helpers rather than open-coding the XDR chunk list
encoders. This builds the Reply transport header from beginning to
end without backtracking.

As additional clean-ups, fill in documenting comments for the XDR
encoders and sprinkle some trace points in the new encoding
functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:33 -04:00
Chuck Lever
5c266df527 SUNRPC: Add encoders for list item discriminators
Clean up. These are taken from the client-side RPC/RDMA transport
to a more global header file so they can be used elsewhere.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
4554755ed8 svcrdma: Update synopsis of svc_rdma_map_reply_msg()
Preparing for subsequent patches, no behavior change expected.

Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto()
path. This enables passing more information about Requester-
provided Write and Reply chunks into those lower-level functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
6fa5785e78 svcrdma: Update synopsis of svc_rdma_send_reply_chunk()
Preparing for subsequent patches, no behavior change expected.

Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto()
path. This enables passing more information about Requester-
provided Write and Reply chunks into the lower-level send
functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
2fe8c44633 svcrdma: De-duplicate code that locates Write and Reply chunks
Cache the locations of the Requester-provided Write list and Reply
chunk so that the Send path doesn't need to parse the Call header
again.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
e604aad2ca svcrdma: Use struct xdr_stream to decode ingress transport headers
The logic that checks incoming network headers has to be scrupulous.

De-duplicate: replace open-coded buffer overflow checks with the use
of xdr_stream helpers that are used most everywhere else XDR
decoding is done.

One minor change to the sanity checks: instead of checking the
length of individual segments, cap the length of the whole chunk
to be sure it can fit in the set of pages available in rq_pages.
This should be a better test of whether the server can handle the
chunks in each request.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:32 -04:00
Chuck Lever
96f194b715 SUNRPC: Add xdr_pad_size() helper
Introduce a helper function to compute the XDR pad size of a
variable-length XDR object.

Clean up: Replace open-coded calculation of XDR pad sizes.
I'm sure I haven't found every instance of this calculation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:31 -04:00
Chuck Lever
412055398b nfsd: Fix NFSv4 READ on RDMA when using readv
svcrdma expects that the payload falls precisely into the xdr_buf
page vector. This does not seem to be the case for
nfsd4_encode_readv().

This code is called only when fops->splice_read is missing or when
RQ_SPLICE_OK is clear, so it's not a noticeable problem in many
common cases.

Add new transport method: ->xpo_read_payload so that when a READ
payload does not fit exactly in rq_res's page vector, the XDR
encoder can inform the RPC transport exactly where that payload is,
without the payload's XDR pad.

That way, when a Write chunk is present, the transport knows what
byte range in the Reply message is supposed to be matched with the
chunk.

Note that the Linux NFS server implementation of NFS/RDMA can
currently handle only one Write chunk per RPC-over-RDMA message.
This simplifies the implementation of this fix.

Fixes: b042098063 ("nfsd4: allow exotic read compounds")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:31 -04:00
Gustavo A. R. Silva
469aef23aa sunrpc: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16 12:04:31 -04:00
Chuck Lever
8d6bda7f23 SUNRPC: Remove xdr_buf_read_mic()
Clean up: this function is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 10:18:45 -04:00
Trond Myklebust
7eac52648a SUNRPC: Add a flag to avoid reference counts on credentials
Add a flag to signal to the RPC layer that the credential is already
pinned for the duration of the RPC call.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:28 -04:00
Linus Torvalds
f43574d0ac Merge tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Puyll NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Fix memory leaks and corruption in readdir # v2.6.37+
   - Directory page cache needs to be locked when read # v2.6.37+

  New features:
   - Convert NFS to use the new mount API
   - Add "softreval" mount option to let clients use cache if server goes down
   - Add a config option to compile without UDP support
   - Limit the number of inactive delegations the client can cache at once
   - Improved readdir concurrency using iterate_shared()

  Other bugfixes and cleanups:
   - More 64-bit time conversions
   - Add additional diagnostic tracepoints
   - Check for holes in swapfiles, and add dependency on CONFIG_SWAP
   - Various xprtrdma cleanups to prepare for 5.7's changes
   - Several fixes for NFS writeback and commit handling
   - Fix acls over krb5i/krb5p mounts
   - Recover from premature loss of openstateids
   - Fix NFS v3 chacl and chmod bug
   - Compare creds using cred_fscmp()
   - Use kmemdup_nul() in more places
   - Optimize readdir cache page invalidation
   - Lease renewal and recovery fixes"

* tag 'nfs-for-5.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (93 commits)
  NFSv4.0: nfs4_do_fsinfo() should not do implicit lease renewals
  NFSv4: try lease recovery on NFS4ERR_EXPIRED
  NFS: Fix memory leaks
  nfs: optimise readdir cache page invalidation
  NFS: Switch readdir to using iterate_shared()
  NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
  NFS: Directory page cache pages need to be locked when read
  NFS: Fix memory leaks and corruption in readdir
  SUNRPC: Use kmemdup_nul() in rpc_parse_scope_id()
  NFS: Replace various occurrences of kstrndup() with kmemdup_nul()
  NFSv4: Limit the total number of cached delegations
  NFSv4: Add accounting for the number of active delegations held
  NFSv4: Try to return the delegation immediately when marked for return on close
  NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
  NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
  NFS: nfs_find_open_context() should use cred_fscmp()
  NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
  NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
  NFS: remove unused macros
  nfs: Return EINVAL rather than ERANGE for mount parse errors
  ...
2020-02-07 17:39:56 -08:00
Alexey Dobriyan
97a32539b9 proc: convert everything to "struct proc_ops"
The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
seq_file.h.

Conversion rule is:

	llseek		=> proc_lseek
	unlocked_ioctl	=> proc_ioctl

	xxx		=> proc_xxx

	delete ".owner = THIS_MODULE" line

[akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
[sfr@canb.auug.org.au: fix kernel/sched/psi.c]
  Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:26 +00:00
Trond Myklebust
b32d285539 SUNRPC: Remove broken gss_mech_list_pseudoflavors()
Remove gss_mech_list_pseudoflavors() and its callers. This is part of
an unused API, and could leak an RCU reference if it were ever called.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00