Commit Graph

1381 Commits

Author SHA1 Message Date
Linus Torvalds
66b8345585 Merge tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Add NFSv4.2 xattr tracepoints
   - Replace xprtiod WQ in rpcrdma
   - Flexfiles cancels I/O on layout recall or revoke

  Bugfixes and Cleanups:
   - Directly use ida_alloc() / ida_free()
   - Don't open-code max_t()
   - Prefer using strscpy over strlcpy
   - Remove unused forward declarations
   - Always return layout states on flexfiles layout return
   - Have LISTXATTR treat NFS4ERR_NOXATTR as an empty reply instead of
     error
   - Allow more xprtrdma memory allocations to fail without triggering a
     reclaim
   - Various other xprtrdma clean ups
   - Fix rpc_killall_tasks() races"

* tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
  NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked
  SUNRPC: Add API to force the client to disconnect
  SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls
  SUNRPC: Fix races with rpc_killall_tasks()
  xprtrdma: Fix uninitialized variable
  xprtrdma: Prevent memory allocations from driving a reclaim
  xprtrdma: Memory allocation should be allowed to fail during connect
  xprtrdma: MR-related memory allocation should be allowed to fail
  xprtrdma: Clean up synopsis of rpcrdma_regbuf_alloc()
  xprtrdma: Clean up synopsis of rpcrdma_req_create()
  svcrdma: Clean up RPCRDMA_DEF_GFP
  SUNRPC: Replace the use of the xprtiod WQ in rpcrdma
  NFSv4.2: Add a tracepoint for listxattr
  NFSv4.2: Add tracepoints for getxattr, setxattr, and removexattr
  NFSv4.2: Move TRACE_DEFINE_ENUM(NFS4_CONTENT_*) under CONFIG_NFS_V4_2
  NFSv4.2: Add special handling for LISTXATTR receiving NFS4ERR_NOXATTR
  nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration
  NFSv4: remove nfs4_renewd_prepare_shutdown() declaration
  fs/nfs/pnfs_nfs.c: fix spelling typo and syntax error in comment
  NFSv4/pNFS: Always return layout stats on layout return for flexfiles
  ...
2022-10-13 09:58:42 -07:00
Linus Torvalds
30c999937f Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Debuggability:

   - Change most occurances of BUG_ON() to WARN_ON_ONCE()

   - Reorganize & fix TASK_ state comparisons, turn it into a bitmap

   - Update/fix misc scheduler debugging facilities

  Load-balancing & regular scheduling:

   - Improve the behavior of the scheduler in presence of lot of
     SCHED_IDLE tasks - in particular they should not impact other
     scheduling classes.

   - Optimize task load tracking, cleanups & fixes

   - Clean up & simplify misc load-balancing code

  Freezer:

   - Rewrite the core freezer to behave better wrt thawing and be
     simpler in general, by replacing PF_FROZEN with TASK_FROZEN &
     fixing/adjusting all the fallout.

  Deadline scheduler:

   - Fix the DL capacity-aware code

   - Factor out dl_task_is_earliest_deadline() &
     replenish_dl_new_period()

   - Relax/optimize locking in task_non_contending()

  Cleanups:

   - Factor out the update_current_exec_runtime() helper

   - Various cleanups, simplifications"

* tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  sched: Fix more TASK_state comparisons
  sched: Fix TASK_state comparisons
  sched/fair: Move call to list_last_entry() in detach_tasks
  sched/fair: Cleanup loop_max and loop_break
  sched/fair: Make sure to try to detach at least one movable task
  sched: Show PF_flag holes
  freezer,sched: Rewrite core freezer logic
  sched: Widen TAKS_state literals
  sched/wait: Add wait_event_state()
  sched/completion: Add wait_for_completion_state()
  sched: Add TASK_ANY for wait_task_inactive()
  sched: Change wait_task_inactive()s match_state
  freezer,umh: Clean up freezer/initrd interaction
  freezer: Have {,un}lock_system_sleep() save/restore flags
  sched: Rename task_running() to task_on_cpu()
  sched/fair: Cleanup for SIS_PROP
  sched/fair: Default to false in test_idle_cores()
  sched/fair: Remove useless check in select_idle_core()
  sched/fair: Avoid double search on same cpu
  sched/fair: Remove redundant check in select_idle_smt()
  ...
2022-10-10 09:10:28 -07:00
Trond Myklebust
dc4c430485 SUNRPC: Add API to force the client to disconnect
Allow the caller to force a disconnection of the RPC client so that we
can clear any pending requests that are buffered in the socket.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-06 09:52:09 -04:00
Trond Myklebust
f8423909ec SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls
Add the helper rpc_cancel_tasks(), which uses a caller-defined selection
function to define a set of in-flight RPC calls to cancel. This is
mainly intended for pNFS drivers which are subject to a layout recall,
and which may therefore want to cancel all pending I/O using that layout
in order to redrive it after the layout recall has been satisfied.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-06 09:52:09 -04:00
Trond Myklebust
39494194f9 SUNRPC: Fix races with rpc_killall_tasks()
Ensure that we immediately call rpc_exit_task() after waking up, and
that the tk_rpc_status cannot get clobbered by some other function.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-06 09:52:09 -04:00
Chuck Lever
98124f5bd6 NFSD: Refactor common code out of dirlist helpers
The dust has settled a bit and it's become obvious what code is
totally common between nfsd_init_dirlist_pages() and
nfsd3_init_dirlist_pages(). Move that common code to SUNRPC.

The new helper brackets the existing xdr_init_decode_pages() API.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:47 -04:00
Chuck Lever
103cc1fafe SUNRPC: Parametrize how much of argsize should be zeroed
Currently, SUNRPC clears the whole of .pc_argsize before processing
each incoming RPC transaction. Add an extra parameter to struct
svc_procedure to enable upper layers to reduce the amount of each
operation's argument structure that is zeroed by SUNRPC.

The size of struct nfsd4_compoundargs, in particular, is a lot to
clear on each incoming RPC Call. A subsequent patch will cut this
down to something closer to what NFSv2 and NFSv3 uses.

This patch should cause no behavior changes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:42 -04:00
Chuck Lever
1242a87da0 SUNRPC: Fix svcxdr_init_encode's buflen calculation
Commit 2825a7f907 ("nfsd4: allow encoding across page boundaries")
added an explicit computation of the remaining length in the rq_res
XDR buffer.

The computation appears to suffer from an "off-by-one" bug. Because
buflen is too large by one page, XDR encoding can run off the end of
the send buffer by eventually trying to use the struct page address
in rq_page_end, which always contains NULL.

Fixes: bddfdbcddb ("NFSD: Extract the svcxdr_init_encode() helper")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:26 -04:00
Chuck Lever
90bfc37b5a SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation
Ensure that stream-based argument decoding can't go past the actual
end of the receive buffer. xdr_init_decode's calculation of the
value of xdr->end over-estimates the end of the buffer because the
Linux kernel RPC server code does not remove the size of the RPC
header from rqstp->rq_arg before calling the upper layer's
dispatcher.

The server-side still uses the svc_getnl() macros to decode the
RPC call header. These macros reduce the length of the head iov
but do not update the total length of the message in the buffer
(buf->len).

A proper fix for this would be to replace the use of svc_getnl() and
friends in the RPC header decoder, but that would be a large and
invasive change that would be difficult to backport.

Fixes: 5191955d6f ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:25 -04:00
Peter Zijlstra
f5d39b0208 freezer,sched: Rewrite core freezer logic
Rewrite the core freezer to behave better wrt thawing and be simpler
in general.

By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
ensured frozen tasks stay frozen until thawed and don't randomly wake
up early, as is currently possible.

As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
two PF_flags (yay!).

Specifically; the current scheme works a little like:

	freezer_do_not_count();
	schedule();
	freezer_count();

And either the task is blocked, or it lands in try_to_freezer()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.

However, on thawing, once pm_freezing is cleared, freezer_count()
stops working, and any random/spurious wakeup will let a task run
before its time.

That is, thawing tries to thaw things in explicit order; kernel
threads and workqueues before doing bringing SMP back before userspace
etc.. However due to the above mentioned races it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.

This can be a fatal problem in asymmetric ISA architectures (eg ARMv9)
where the userspace task requires a special CPU to run.

As said; replace this with a special task state TASK_FROZEN and add
the following state transitions:

	TASK_FREEZABLE	-> TASK_FROZEN
	__TASK_STOPPED	-> TASK_FROZEN
	__TASK_TRACED	-> TASK_FROZEN

The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL
(IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state
is already required to deal with spurious wakeups and the freezer
causes one such when thawing the task (since the original state is
lost).

The special __TASK_{STOPPED,TRACED} states *can* be restored since
their canonical state is in ->jobctl.

With this, frozen tasks need an explicit TASK_FROZEN wakeup and are
free of undue (early / spurious) wakeups.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
2022-09-07 21:53:50 +02:00
Linus Torvalds
aeb6e6ac18 Merge tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - pNFS/flexfiles: Fix infinite looping when the RDMA connection
     errors out

  Bugfixes:
   - NFS: fix port value parsing
   - SUNRPC: Reinitialise the backchannel request buffers before reuse
   - SUNRPC: fix expiry of auth creds
   - NFSv4: Fix races in the legacy idmapper upcall
   - NFS: O_DIRECT fixes from Jeff Layton
   - NFSv4.1: Fix OP_SEQUENCE error handling
   - SUNRPC: Fix an RPC/RDMA performance regression
   - NFS: Fix case insensitive renames
   - NFSv4/pnfs: Fix a use-after-free bug in open
   - NFSv4.1: RECLAIM_COMPLETE must handle EACCES

  Features:
   - NFSv4.1: session trunking enhancements
   - NFSv4.2: READ_PLUS performance optimisations
   - NFS: relax the rules for rsize/wsize mount options
   - NFS: don't unhash dentry during unlink/rename
   - SUNRPC: Fail faster on bad verifier
   - NFS/SUNRPC: Various tracing improvements"

* tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits)
  NFS: Improve readpage/writepage tracing
  NFS: Improve O_DIRECT tracing
  NFS: Improve write error tracing
  NFS: don't unhash dentry during unlink/rename
  NFSv4/pnfs: Fix a use-after-free bug in open
  NFS: nfs_async_write_reschedule_io must not recurse into the writeback code
  SUNRPC: Don't reuse bvec on retransmission of the request
  SUNRPC: Reinitialise the backchannel request buffers before reuse
  NFSv4.1: RECLAIM_COMPLETE must handle EACCES
  NFSv4.1 probe offline transports for trunking on session creation
  SUNRPC create a function that probes only offline transports
  SUNRPC export xprt_iter_rewind function
  SUNRPC restructure rpc_clnt_setup_test_and_add_xprt
  NFSv4.1 remove xprt from xprt_switch if session trunking test fails
  SUNRPC create an rpc function that allows xprt removal from rpc_clnt
  SUNRPC enable back offline transports in trunking discovery
  SUNRPC create an iterator to list only OFFLINE xprts
  NFSv4.1 offline trunkable transports on DESTROY_SESSION
  SUNRPC add function to offline remove trunkable transports
  SUNRPC expose functions for offline remote xprt functionality
  ...
2022-08-10 14:04:32 -07:00
Linus Torvalds
e394ff83bb Merge tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
 "Work on 'courteous server', which was introduced in 5.19, continues
  apace. This release introduces a more flexible limit on the number of
  NFSv4 clients that NFSD allows, now that NFSv4 clients can remain in
  courtesy state long after the lease expiration timeout. The client
  limit is adjusted based on the physical memory size of the server.

  The NFSD filecache is a cache of files held open by NFSv4 clients or
  recently touched by NFSv2 or NFSv3 clients. This cache had some
  significant scalability constraints that have been relieved in this
  release. Thanks to all who contributed to this work.

  A data corruption bug found during the most recent NFS bake-a-thon
  that involves NFSv3 and NFSv4 clients writing the same file has been
  addressed in this release.

  This release includes several improvements in CPU scalability for
  NFSv4 operations. In addition, Neil Brown provided patches that
  simplify locking during file lookup, creation, rename, and removal
  that enables subsequent work on making these operations more scalable.
  We expect to see that work materialize in the next release.

  There are also numerous single-patch fixes, clean-ups, and the usual
  improvements in observability"

* tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (78 commits)
  lockd: detect and reject lock arguments that overflow
  NFSD: discard fh_locked flag and fh_lock/fh_unlock
  NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
  NFSD: use explicit lock/unlock for directory ops
  NFSD: reduce locking in nfsd_lookup()
  NFSD: only call fh_unlock() once in nfsd_link()
  NFSD: always drop directory lock in nfsd_unlink()
  NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
  NFSD: add posix ACLs to struct nfsd_attrs
  NFSD: add security label to struct nfsd_attrs
  NFSD: set attributes when creating symlinks
  NFSD: introduce struct nfsd_attrs
  NFSD: verify the opened dentry after setting a delegation
  NFSD: drop fh argument from alloc_init_deleg
  NFSD: Move copy offload callback arguments into a separate structure
  NFSD: Add nfsd4_send_cb_offload()
  NFSD: Remove kmalloc from nfsd4_do_async_copy()
  NFSD: Refactor nfsd4_do_copy()
  NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
  NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
  ...
2022-08-09 14:56:49 -07:00
Linus Torvalds
c1c76700a0 Merge tag 'spdx-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
 "Here is the set of SPDX comment updates for 6.0-rc1.

  Nothing huge here, just a number of updated SPDX license tags and
  cleanups based on the review of a number of common patterns in GPLv2
  boilerplate text.

  Also included in here are a few other minor updates, two USB files,
  and one Documentation file update to get the SPDX lines correct.

  All of these have been in the linux-next tree for a very long time"

* tag 'spdx-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (28 commits)
  Documentation: samsung-s3c24xx: Add blank line after SPDX directive
  x86/crypto: Remove stray comment terminator
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_406.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_398.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_391.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_390.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_385.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_320.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_319.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_318.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_298.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_292.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_179.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 2)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 1)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_160.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_152.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_149.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_147.RULE
  treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_133.RULE
  ...
2022-08-04 12:12:54 -07:00
Chuck Lever
c770f31d8f SUNRPC: Fix xdr_encode_bool()
I discovered that xdr_encode_bool() was returning the same address
that was passed in the @p parameter. The documenting comment states
that the intent is to return the address of the next buffer
location, just like the other "xdr_encode_*" helpers.

The result was the encoded results of NFSv3 PATHCONF operations were
not formed correctly.

Fixes: ded04a587f ("NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-07-29 20:08:56 -04:00
Trond Myklebust
72691a269f SUNRPC: Don't reuse bvec on retransmission of the request
If a request is re-encoded and then retransmitted, we need to make sure
that we also re-encode the bvec, in case the page lists have changed.

Fixes: ff053dbbaf ("SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-27 16:26:17 -04:00
Olga Kornievskaia
92cc04f60a SUNRPC create a function that probes only offline transports
For only offline transports, attempt to check connectivity via
a NULL call and, if that succeeds, call a provided session trunking
detection function.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
273d6aed9e SUNRPC export xprt_iter_rewind function
Make xprt_iter_rewind callable outside of xprtmultipath.c

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
497e6464d6 SUNRPC create an rpc function that allows xprt removal from rpc_clnt
Expose a function that allows a removal of xprt from the rpc_clnt.

When called from NFS that's running a trunked transport then don't
decrement the active transport counter.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
9368fd6c75 SUNRPC enable back offline transports in trunking discovery
When we are adding a transport to a xprt_switch that's already on
the list but has been marked OFFLINE, then make the state ONLINE
since it's been tested now.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
95d0d30c66 SUNRPC create an iterator to list only OFFLINE xprts
Create a new iterator helper that will go thru the all the transports
in the switch and return transports that are marked OFFLINE.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
895245ccea SUNRPC add function to offline remove trunkable transports
Iterate thru available transports in the xprt_switch for all
trunkable transports offline and possibly remote them as well.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
7ffcdaa670 SUNRPC expose functions for offline remote xprt functionality
Re-arrange the code that make offline transport and delete transport
callable functions.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Anna Schumaker
29946fbcb2 SUNRPC: Remove xdr_align_data() and xdr_expand_hole()
These functions are no longer needed now that the NFS client places data
and hole segments directly.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23 15:38:29 -04:00
Anna Schumaker
e1bd87608d SUNRPC: Add a function for zeroing out a portion of an xdr_stream
This will be used during READ_PLUS decoding for handling HOLE segments.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23 15:38:29 -04:00
Anna Schumaker
7c4cd5f4d2 SUNRPC: Add a function for directly setting the xdr page len
We need to do this step during READ_PLUS decoding so that we know pages
are the right length and any extra data has been preserved in the tail.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23 15:38:29 -04:00