Commit Graph

47115 Commits

Author SHA1 Message Date
Linus Torvalds
52f6c588c7 Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random updates from Ted Ts'o:
 "Add wait_for_random_bytes() and get_random_*_wait() functions so that
  callers can more safely get random bytes if they can block until the
  CRNG is initialized.

  Also print a warning if get_random_*() is called before the CRNG is
  initialized. By default, only one single-line warning will be printed
  per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
  warning will be printed for each function which tries to get random
  bytes before the CRNG is initialized. This can get spammy for certain
  architecture types, so it is not enabled by default"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: reorder READ_ONCE() in get_random_uXX
  random: suppress spammy warnings about unseeded randomness
  random: warn when kernel uses unseeded randomness
  net/route: use get_random_int for random counter
  net/neighbor: use get_random_u32 for 32-bit hash random
  rhashtable: use get_random_u32 for hash_rnd
  ceph: ensure RNG is seeded before using
  iscsi: ensure RNG is seeded before use
  cifs: use get_random_u32 for 32-bit lock random
  random: add get_random_{bytes,u32,u64,int,long,once}_wait family
  random: add wait_for_random_bytes() API
2017-07-15 12:44:02 -07:00
Linus Torvalds
78dcf73421 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ->s_options removal from Al Viro:
 "Preparations for fsmount/fsopen stuff (coming next cycle). Everything
  gets moved to explicit ->show_options(), killing ->s_options off +
  some cosmetic bits around fs/namespace.c and friends. Basically, the
  stuff needed to work with fsmount series with minimum of conflicts
  with other work.

  It's not strictly required for this merge window, but it would reduce
  the PITA during the coming cycle, so it would be nice to have those
  bits and pieces out of the way"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  isofs: Fix isofs_show_options()
  VFS: Kill off s_options and helpers
  orangefs: Implement show_options
  9p: Implement show_options
  isofs: Implement show_options
  afs: Implement show_options
  affs: Implement show_options
  befs: Implement show_options
  spufs: Implement show_options
  bpf: Implement show_options
  ramfs: Implement show_options
  pstore: Implement show_options
  omfs: Implement show_options
  hugetlbfs: Implement show_options
  VFS: Don't use save/replace_mount_options if not using generic_show_options
  VFS: Provide empty name qstr
  VFS: Make get_filesystem() return the affected filesystem
  VFS: Clean up whitespace in fs/namespace.c and fs/super.c
  Provide a function to create a NUL-terminated string from unterminated data
2017-07-15 12:00:42 -07:00
Linus Torvalds
2173bd0631 Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull network field-by-field copy-in updates from Al Viro:
 "This part of the misc compat queue was held back for review from
  networking folks and since davem has jus ACKed those..."

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get_compat_bpf_fprog(): don't copyin field-by-field
  get_compat_msghdr(): get rid of field-by-field copyin
  copy_msghdr_from_user(): get rid of field-by-field copyin
2017-07-15 11:06:17 -07:00
Linus Torvalds
b86faee6d1 Merge tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Fix -EACCESS on commit to DS handling
   - Fix initialization of nfs_page_array->npages
   - Only invalidate dentries that are actually invalid

  Features:
   - Enable NFSoRDMA transparent state migration
   - Add support for lookup-by-filehandle
   - Add support for nfs re-exporting

  Other bugfixes and cleanups:
   - Christoph cleaned up the way we declare NFS operations
   - Clean up various internal structures
   - Various cleanups to commits
   - Various improvements to error handling
   - Set the dt_type of . and .. entries in NFS v4
   - Make slot allocation more reliable
   - Fix fscache stat printing
   - Fix uninitialized variable warnings
   - Fix potential list overrun in nfs_atomic_open()
   - Fix a race in NFSoRDMA RPC reply handler
   - Fix return size for nfs42_proc_copy()
   - Fix against MAC forgery timing attacks"

* tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (68 commits)
  NFS: Don't run wake_up_bit() when nobody is waiting...
  nfs: add export operations
  nfs4: add NFSv4 LOOKUPP handlers
  nfs: add a nfs_ilookup helper
  nfs: replace d_add with d_splice_alias in atomic_open
  sunrpc: use constant time memory comparison for mac
  NFSv4.2 fix size storage for nfs42_proc_copy
  xprtrdma: Fix documenting comments in frwr_ops.c
  xprtrdma: Replace PAGE_MASK with offset_in_page()
  xprtrdma: FMR does not need list_del_init()
  xprtrdma: Demote "connect" log messages
  NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
  NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
  xprtrdma: Don't defer MR recovery if ro_map fails
  xprtrdma: Fix FRWR invalidation error recovery
  xprtrdma: Fix client lock-up after application signal fires
  xprtrdma: Rename rpcrdma_req::rl_free
  xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
  xprtrdma: Pre-mark remotely invalidated MRs
  xprtrdma: On invalidation failure, remove MWs from rl_registered
  ...
2017-07-13 14:35:37 -07:00
Linus Torvalds
6240300597 Merge tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Chuck's RDMA update overhauls the "call receive" side of the
  RPC-over-RDMA transport to use the new rdma_rw API.

  Christoph cleaned the way nfs operations are declared, removing a
  bunch of function-pointer casts and declaring the operation vectors as
  const.

  Christoph's changes touch both client and server, and both client and
  server pulls this time around should be based on the same commits from
  Christoph"

* tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux: (53 commits)
  svcrdma: fix an incorrect check on -E2BIG and -EINVAL
  nfsd4: factor ctime into change attribute
  svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field
  svcrdma: use offset_in_page() macro
  svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw API
  svcrdma: Clean-up svc_rdma_unmap_dma
  svcrdma: Remove frmr cache
  svcrdma: Remove unused Read completion handlers
  svcrdma: Properly compute .len and .buflen for received RPC Calls
  svcrdma: Use generic RDMA R/W API in RPC Call path
  svcrdma: Add recvfrom helpers to svc_rdma_rw.c
  sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst
  svcrdma: Don't account for Receive queue "starvation"
  svcrdma: Improve Reply chunk sanity checking
  svcrdma: Improve Write chunk sanity checking
  svcrdma: Improve Read chunk sanity checking
  svcrdma: Remove svc_rdma_marshal.c
  svcrdma: Avoid Send Queue overflow
  svcrdma: Squelch disconnection messages
  sunrpc: Disable splice for krb5i
  ...
2017-07-13 13:56:24 -07:00
Jason A. Donenfeld
15a8b93fd5 sunrpc: use constant time memory comparison for mac
Otherwise, we enable a MAC forgery via timing attack.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:14 -04:00
Chuck Lever
6afafa7799 xprtrdma: Fix documenting comments in frwr_ops.c
Clean up.

FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
is used for the last LOCAL_INV WR in a chain, which is always
signaled. The documenting comments should reflect that.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:13 -04:00
Chuck Lever
d933cc3201 xprtrdma: Replace PAGE_MASK with offset_in_page()
Clean up.

Reported by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:13 -04:00
Chuck Lever
e2f6ef0915 xprtrdma: FMR does not need list_del_init()
Clean up.

Commit 38f1932e60 ("xprtrdma: Remove FMRs from the unmap list
after unmapping") utilized list_del_init() to try to prevent some
list corruption. The corruption was actually caused by the reply
handler racing with a signal. Now that MR invalidation is properly
serialized, list_del_init() can safely be replaced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:13 -04:00
Chuck Lever
173b8f49b3 xprtrdma: Demote "connect" log messages
Some have complained about the log messages generated when xprtrdma
opens or closes a connection to a server. When an NFS mount is
mostly idle these can appear every few minutes as the client idles
out the connection and reconnects.

Connection and disconnection is a normal part of operation, and not
exceptional, so change these to dprintk's for now. At some point
all of these will be converted to tracepoints, but that's for
another day.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:12 -04:00
Chuck Lever
1f541895da xprtrdma: Don't defer MR recovery if ro_map fails
Deferred MR recovery does a DMA-unmapping of the MW. However, ro_map
invokes rpcrdma_defer_mr_recovery in some error cases where the MW
has not even been DMA-mapped yet.

Avoid a DMA-unmapping error replacing rpcrdma_defer_mr_recovery.

Also note that if ib_dma_map_sg is asked to map 0 nents, it will
return 0. So the extra "if (i == 0)" check is no longer needed.

Fixes: 42fe28f607 ("xprtrdma: Do not leak an MW during a DMA ...")
Fixes: 505bbe64dd ("xprtrdma: Refactor MR recovery work queues")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:11 -04:00
Chuck Lever
8d75483a23 xprtrdma: Fix FRWR invalidation error recovery
When ib_post_send() fails, all LOCAL_INV WRs past @bad_wr have to be
examined, and the MRs reset by hand.

I'm not sure how the existing code can work by comparing R_keys.
Restructure the logic so that instead it walks the chain of WRs,
starting from the first bad one.

Make sure to wait for completion if at least one WR was actually
posted. Otherwise, if the ib_post_send fails, we can end up
DMA-unmapping the MR while LOCAL_INV operations are in flight.

Commit 7a89f9c626 ("xprtrdma: Honor ->send_request API contract")
added the rdma_disconnect() call site. The disconnect actually
causes more problems than it solves, and SQ overruns happen only as
a result of software bugs. So remove it.

Fixes: d7a21c1bed ("xprtrdma: Reset MRs in frwr_op_unmap_sync()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:11 -04:00
Chuck Lever
431af645cf xprtrdma: Fix client lock-up after application signal fires
After a signal, the RPC client aborts synchronous RPCs running on
behalf of the signaled application.

The server is still executing those RPCs, and will write the results
back into the client's memory when it's done. By the time the server
writes the results, that memory is likely being used for other
purposes. Therefore xprtrdma has to immediately invalidate all
memory regions used by those aborted RPCs to prevent the server's
writes from clobbering that re-used memory.

With FMR memory registration, invalidation takes a relatively long
time. In fact, the invalidation is often still running when the
server tries to write the results into the memory regions that are
being invalidated.

This sets up a race between two processes:

1.  After the signal, xprt_rdma_free calls ro_unmap_safe.
2.  While ro_unmap_safe is still running, the server replies and
    rpcrdma_reply_handler runs, calling ro_unmap_sync.

Both processes invoke ib_unmap_fmr on the same FMR.

The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
the same time, but HCAs generally don't tolerate this. Sometimes
this can result in a system crash.

If the HCA happens to survive, rpcrdma_reply_handler continues. It
removes the rpc_rqst from rq_list and releases the transport_lock.
This enables xprt_rdma_free to run in another process, and the
rpc_rqst is released while rpcrdma_reply_handler is still waiting
for the ib_unmap_fmr call to finish.

But further down in rpcrdma_reply_handler, the transport_lock is
taken again, and "rqst" is dereferenced. If "rqst" has already been
released, this triggers a general protection fault. Since bottom-
halves are disabled, the system locks up.

Address both issues by reversing the order of the xprt_lookup_rqst
call and the ro_unmap_sync call. Introduce a separate lookup
mechanism for rpcrdma_req's to enable calling ro_unmap_sync before
xprt_lookup_rqst. Now the handler takes the transport_lock once
and holds it for the XID lookup and RPC completion.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a7 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:11 -04:00
Chuck Lever
a80d66c9e0 xprtrdma: Rename rpcrdma_req::rl_free
Clean up: I'm about to use the rl_free field for purposes other than
a free list. So use a more generic name.

This is a refactoring change only.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a7 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:10 -04:00
Chuck Lever
451d26e151 xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
There are rare cases where an rpcrdma_req can be re-used (via
rpcrdma_buffer_put) while the RPC reply handler is still running.
This is due to a signal firing at just the wrong instant.

Since commit 9d6b040978 ("xprtrdma: Place registered MWs on a
per-req list"), rpcrdma_mws are self-contained; ie., they fully
describe an MR and scatterlist, and no part of that information is
stored in struct rpcrdma_req.

As part of closing the above race window, pass only the req's list
of registered MRs to ro_unmap_sync, rather than the rpcrdma_req
itself.

Some extra transport header sanity checking is removed. Since the
client depends on its own recollection of what memory had been
registered, there doesn't seem to be a way to abuse this change.

And, the check was not terribly effective. If the client had sent
Read chunks, the "list_empty" test is negative in both of the
removed cases, which are actually looking for Write or Reply
chunks.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a7 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:10 -04:00
Chuck Lever
4b196dc6fe xprtrdma: Pre-mark remotely invalidated MRs
There are rare cases where an rpcrdma_req and its matched
rpcrdma_rep can be re-used, via rpcrdma_buffer_put, while the RPC
reply handler is still using that req. This is typically due to a
signal firing at just the wrong instant.

As part of closing this race window, avoid using the wrong
rpcrdma_rep to detect remotely invalidated MRs. Mark MRs as
invalidated while we are sure the rep is still OK to use.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a7 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:10 -04:00
Chuck Lever
04d25b7d5d xprtrdma: On invalidation failure, remove MWs from rl_registered
Callers assume the ro_unmap_sync and ro_unmap_safe methods empty
the list of registered MRs. Ensure that all paths through
fmr_op_unmap_sync() remove MWs from that list.

Fixes: 9d6b040978 ("xprtrdma: Place registered MWs on a ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:09 -04:00
Trond Myklebust
92ea011f7c SUNRPC: Make slot allocation more reliable
In xprt_alloc_slot(), the spin lock is only needed to provide atomicity
between the atomic_add_unless() failure and the call to xprt_add_backlog().
We do not actually need to hold it across the memory allocation itself.

By dropping the lock, we can use a more resilient GFP_NOFS allocation,
just as we now do in the rest of the RPC client code.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 15:58:04 -04:00
Christoph Hellwig
aa8217d5dc sunrpc: mark all struct svc_version instances as const
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-07-13 15:58:03 -04:00
Christoph Hellwig
b9c744c19c sunrpc: mark all struct svc_procinfo instances as const
struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-07-13 15:58:02 -04:00
Christoph Hellwig
0becc1181c sunrpc: move pc_count out of struct svc_procinfo
pc_count is the only writeable memeber of struct svc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.

This patch moves it into out out struct svc_procinfo, and into a
separate writable array that is pointed to by struct svc_version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-07-13 15:58:02 -04:00
Christoph Hellwig
d16d186721 sunrpc: properly type pc_encode callbacks
Drop the resp argument as it can trivially be derived from the rqstp
argument.  With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-07-13 15:58:00 -04:00
Christoph Hellwig
cc6acc20a6 sunrpc: properly type pc_decode callbacks
Drop the argp argument as it can trivially be derived from the rqstp
argument.  With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-07-13 15:58:00 -04:00
Christoph Hellwig
1150ded804 sunrpc: properly type pc_release callbacks
Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument.  With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-07-13 15:57:59 -04:00
Christoph Hellwig
1c8a5409f3 sunrpc: properly type pc_func callbacks
Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument.  With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-07-13 15:57:59 -04:00