Pull NFS client fixes from Anna Schumaker:
- Add FMODE_CAN_ODIRECT support to NFSv4 so opens don't fail
- Fix trunking detection & cl_max_connect setting
- Avoid pnfs_update_layout() livelocks
- Don't keep retrying pNFS if the server replies with NFS4ERR_UNAVAILABLE
* tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x file
sunrpc: set cl_max_connect when cloning an rpc_clnt
pNFS: Avoid a live lock condition in pnfs_update_layout()
pNFS: Don't keep retrying if the server replied NFS4ERR_LAYOUTUNAVAILABLE
Pull nfsd fixes from Chuck Lever:
"Notable changes:
- There is now a backup maintainer for NFSD
Notable fixes:
- Prevent array overruns in svc_rdma_build_writes()
- Prevent buffer overruns when encoding NFSv3 READDIR results
- Fix a potential UAF in nfsd_file_put()"
* tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer()
SUNRPC: Clean up xdr_get_next_encode_buffer()
SUNRPC: Clean up xdr_commit_encode()
SUNRPC: Optimize xdr_reserve_space()
SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
SUNRPC: Trap RDMA segment overflows
NFSD: Fix potential use-after-free in nfsd_file_put()
MAINTAINERS: reciprocal co-maintainership for file locking and nfsd
To make the code easier to read, remove visual clutter by changing
the declared type of @p.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
The value of @p is not used until the "location of the next item" is
computed. Help human readers by moving its initial assignment to the
paragraph where that value is used and by clarifying the antecedents
in the documenting comment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Both the kvec::iov_len field and the third parameter of memcpy() and
memmove() are size_t. There's no reason for the implicit conversion
from size_t to int and back. Change the type of @shift to make the
code easier to read and understand.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Transitioning between encode buffers is quite infrequent. It happens
about 1 time in 400 calls to xdr_reserve_space(), measured on NFSD
with a typical build/test workload.
Force the compiler to remove that code from xdr_reserve_space(),
which is a hot path on both the server and the client. This change
reduces the size of xdr_reserve_space() from 10 cache lines to 2
when compiled with -Os.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
I found that NFSD's new NFSv3 READDIRPLUS XDR encoder was screwing up
right at the end of the page array. xdr_get_next_encode_buffer() does
not compute the value of xdr->end correctly:
* The check to see if we're on the final available page in xdr->buf
needs to account for the space consumed by @nbytes.
* The new xdr->end value needs to account for the portion of @nbytes
that is to be encoded into the previous buffer.
Fixes: 2825a7f907 ("nfsd4: allow encoding across page boundaries")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
If the initial attempt at trunking detection using the krb5i auth flavor
fails with -EACCES, -NFS4ERR_CLID_INUSE, or -NFS4ERR_WRONGSEC, then the
NFS client tries again using auth_sys, cloning the rpc_clnt in the
process. If this second attempt at trunking detection succeeds, then
the resulting nfs_client->cl_rpcclient winds up having cl_max_connect=0
and subsequent attempts to add additional transport connections to the
rpc_clnt will fail with a message similar to the following being logged:
[502044.312640] SUNRPC: reached max allowed number (0) did not add
transport to server: 192.168.122.3
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: dc48e0abee ("SUNRPC enforce creation of no more than max_connect xprts")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Prevent svc_rdma_build_writes() from walking off the end of a Write
chunk's segment array. Caught with KASAN.
The test that this fix replaces is invalid, and might have been left
over from an earlier prototype of the PCL work.
Fixes: 7a1cbfa180 ("svcrdma: Use parsed chunk lists to construct RDMA Writes")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add support for 'dacl' and 'sacl' attributes
Bugfixes and Cleanups:
- Fixes for reporting mapping errors
- Fixes for memory allocation errors
- Improve warning message when locks are lost
- Update documentation for the nfs4_unique_id parameter
- Add an explanation of NFSv4 client identifiers
- Ensure the i_size attribute is written to the fscache storage
- Fix freeing uninitialized nfs4_labels
- Better handling when xprtrdma bc_serv is NULL
- Mark qualified async operations as MOVEABLE tasks"
* tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4.1 mark qualified async operations as MOVEABLE tasks
xprtrdma: treat all calls not a bcall when bc_serv is NULL
NFSv4: Fix free of uninitialized nfs4_label on referral lookup.
NFS: Pass i_size to fscache_unuse_cookie() when a file is released
Documentation: Add an explanation of NFSv4 client identifiers
NFS: update documentation for the nfs4_unique_id parameter
NFS: Improve warning message when locks are lost.
NFSv4.1: Enable access to the NFSv4.1 'dacl' and 'sacl' attributes
NFSv4: Add encoders/decoders for the NFSv4.1 dacl and sacl attributes
NFSv4: Specify the type of ACL to cache
NFSv4: Don't hold the layoutget locks across multiple RPC calls
pNFS/files: Fall back to I/O through the MDS on non-fatal layout errors
NFS: Further fixes to the writeback error handling
NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout
NFS: Memory allocation failures are not server fatal errors
NFS: Don't report errors from nfs_pageio_complete() more than once
NFS: Do not report flush errors in nfs_write_end()
NFS: Don't report ENOSPC write errors twice
NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYS
NFS: Do not report EINTR/ERESTARTSYS as mapping errors
Pull rdma updates from Jason Gunthorpe:
"Small collection of incremental improvement patches:
- Minor code cleanup patches, comment improvements, etc from static
tools
- Clean the some of the kernel caps, reducing the historical stealth
uAPI leftovers
- Bug fixes and minor changes for rdmavt, hns, rxe, irdma
- Remove unimplemented cruft from rxe
- Reorganize UMR QP code in mlx5 to avoid going through the IB verbs
layer
- flush_workqueue(system_unbound_wq) removal
- Ensure rxe waits for objects to be unused before allowing the core
to free them
- Several rc quality bug fixes for hfi1"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits)
RDMA/rtrs-clt: Fix one kernel-doc comment
RDMA/hfi1: Remove all traces of diagpkt support
RDMA/hfi1: Consolidate software versions
RDMA/hfi1: Remove pointless driver version
RDMA/hfi1: Fix potential integer multiplication overflow errors
RDMA/hfi1: Prevent panic when SDMA is disabled
RDMA/hfi1: Prevent use of lock before it is initialized
RDMA/rxe: Fix an error handling path in rxe_get_mcg()
IB/core: Fix typo in comment
RDMA/core: Fix typo in comment
IB/hf1: Fix typo in comment
IB/qib: Fix typo in comment
IB/iser: Fix typo in comment
RDMA/mlx4: Avoid flush_scheduled_work() usage
IB/isert: Avoid flush_scheduled_work() usage
RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()
RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()
RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()
RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
RDMA/irdma: Add SW mechanism to generate completions on error
...
Pull nfsd updates from Chuck Lever:
"We introduce 'courteous server' in this release. Previously NFSD would
purge open and lock state for an unresponsive client after one lease
period (typically 90 seconds). Now, after one lease period, another
client can open and lock those files and the unresponsive client's
lease is purged; otherwise if the unresponsive client's open and lock
state is uncontended, the server retains that open and lock state for
up to 24 hours, allowing the client's workload to resume after a
lengthy network partition.
A longstanding issue with NFSv4 file creation is also addressed.
Previously a file creation can fail internally, returning an error to
the client, but leave the newly created file in place as an artifact.
The file creation code path has been reorganized so that internal
failures and race conditions are less likely to result in an unwanted
file creation.
A fault injector has been added to help exercise paths that are run
during kernel metadata cache invalidation. These caches contain
information maintained by user space about exported filesystems. Many
of our test workloads do not trigger cache invalidation.
There is one patch that is needed to support PREEMPT_RT and a fix for
an ancient 'sleep while spin-locked' splat that seems to have become
easier to hit since v5.18-rc3"
* tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits)
NFSD: nfsd_file_put() can sleep
NFSD: Add documenting comment for nfsd4_release_lockowner()
NFSD: Modernize nfsd4_release_lockowner()
NFSD: Fix possible sleep during nfsd4_release_lockowner()
nfsd: destroy percpu stats counters after reply cache shutdown
nfsd: Fix null-ptr-deref in nfsd_fill_super()
nfsd: Unregister the cld notifier when laundry_wq create failed
SUNRPC: Use RMW bitops in single-threaded hot paths
NFSD: Clean up the show_nf_flags() macro
NFSD: Trace filecache opens
NFSD: Move documenting comment for nfsd4_process_open2()
NFSD: Fix whitespace
NFSD: Remove dprintk call sites from tail of nfsd4_open()
NFSD: Instantiate a struct file when creating a regular NFSv4 file
NFSD: Clean up nfsd_open_verified()
NFSD: Remove do_nfsd_create()
NFSD: Refactor NFSv4 OPEN(CREATE)
NFSD: Refactor NFSv3 CREATE
NFSD: Refactor nfsd_create_setattr()
NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
...
I noticed CPU pipeline stalls while using perf.
Once an svc thread is scheduled and executing an RPC, no other
processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are
not needed outside the svc thread scheduler.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: There is one caller. The @cpu argument can be made
implicit now that a get_cpu/put_cpu pair is no longer needed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
svc_xprt_enqueue() disables preemption via get_cpu() and then asks
for a pool of a specific CPU (current) via svc_pool_for_cpu().
While preemption is disabled, svc_xprt_enqueue() acquires
svc_pool::sp_lock with bottom-halfs disabled, which can sleep on
PREEMPT_RT.
Disabling preemption is not required here. The pool is protected with a
lock so the following list access is safe even cross-CPU. The following
iteration through svc_pool::sp_all_threads is under RCU-readlock and
remaining operations within the loop are atomic and do not rely on
disabled-preemption.
Use raw_smp_processor_id() as the argument for the requested CPU in
svc_pool_for_cpu().
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cache deferral injection stress-tests the cache deferral logic as
well as upper layer protocol deferred request handlers. This
facility is for developers and professional testers to ensure
coverage of the rqst deferral code paths. To date, we haven't
had an adequate way to ensure these code paths are covered
during testing, short of temporary code changes to force their
use.
A file called /sys/kernel/debug/fail_sunrpc/ignore-cache-wait
enables administrators to disable cache deferral injection while
allowing other types of sunrpc errors to be injected. The default
setting is that cache deferral injection is enabled (ignore=false).
To enable support for cache deferral injection,
CONFIG_FAULT_INJECTION, CONFIG_FAULT_INJECTION_DEBUG_FS, and
CONFIG_SUNRPC_DEBUG must all be set to "Y".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Ensure that the gssproxy client connects to the server from the gssproxy
daemon process context so that the AF_LOCAL socket connection is done
using the correct path and namespaces.
Fixes: 1d658336b0 ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>