Commit Graph

61792 Commits

Author SHA1 Message Date
Linus Torvalds
a78f7cdddb Merge tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Nine cifs/smb3 fixes:

   - one fix for stable (oops during oplock break)

   - two timestamp fixes including important one for updating mtime at
     close to avoid stale metadata caching issue on dirty files (also
     improves perf by using SMB2_CLOSE_FLAG_POSTQUERY_ATTRIB over the
     wire)

   - two fixes for "modefromsid" mount option for file create (now
     allows mode bits to be set more atomically and accurately on create
     by adding "sd_context" on create when modefromsid specified on
     mount)

   - two fixes for multichannel found in testing this week against
     different servers

   - two small cleanup patches"

* tag '5.5-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: improve check for when we send the security descriptor context on create
  smb3: fix mode passed in on create for modetosid mount option
  cifs: fix possible uninitialized access and race on iface_list
  cifs: Fix lookup of SMB connections on multichannel
  smb3: query attributes on file close
  smb3: remove unused flag passed into close functions
  cifs: remove redundant assignment to pointer pneg_ctxt
  fs: cifs: Fix atime update check vs mtime
  CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
2019-12-08 12:12:18 -08:00
Linus Torvalds
5bf9a06a5f Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs cleanups from Al Viro:
 "No common topic, just three cleanups".

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make __d_alloc() static
  fs/namespace: add __user to open_tree and move_mount syscalls
  fs/fnctl: fix missing __user in fcntl_rw_hint()
2019-12-08 11:08:28 -08:00
Linus Torvalds
95207d554b Merge tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
 "Fix a race condition and a use-after-free error:

   - Fix a UAF when reporting writeback errors

   - Fix a race condition when handling page uptodate on fragmented file
     with blocksize < pagesize"

* tag 'iomap-5.5-merge-14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: stop using ioend after it's been freed in iomap_finish_ioend()
  iomap: fix sub-page uptodate handling
2019-12-07 17:07:18 -08:00
Linus Torvalds
50caca9d7f Merge tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
 "Fix a couple of resource management errors and a hang:

   - fix a crash in the log setup code when log mounting fails

   - fix a hang when allocating space on the realtime device

   - fix a block leak when freeing space on the realtime device"

* tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix mount failure crash on invalid iclog memory access
  xfs: don't check for AG deadlock for realtime files in bunmapi
  xfs: fix realtime file data space leak
2019-12-07 17:05:33 -08:00
Linus Torvalds
316933cf74 Merge tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs update from Mike Marshall:
 "orangefs: posix open permission checking...

  Orangefs has no open, and orangefs checks file permissions on each
  file access. Posix requires that file permissions be checked on open
  and nowhere else. Orangefs-through-the-kernel needs to seem posix
  compliant.

  The VFS opens files, even if the filesystem provides no method. We can
  see if a file was successfully opened for read and or for write by
  looking at file->f_mode.

  When writes are flowing from the page cache, file is no longer
  available. We can trust the VFS to have checked file->f_mode before
  writing to the page cache.

  The mode of a file might change between when it is opened and IO
  commences, or it might be created with an arbitrary mode.

  We'll make sure we don't hit EACCES during the IO stage by using
  UID 0"

[ This is "posixish", but not a great solution in the long run, since a
  proper secure network server shouldn't really trust the client like this.
  But proper and secure POSIX behavior requires an open method and a
  resulting cookie for IO of some kind, or similar.    - Linus ]

* tag 'for-linus-5.5-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: posix open permission checking...
2019-12-07 16:59:25 -08:00
Linus Torvalds
911d137ab0 Merge tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "This is a relatively quiet cycle for nfsd, mainly various bugfixes.

  Possibly most interesting is Trond's fixes for some callback races
  that were due to my incomplete understanding of rpc client shutdown.
  Unfortunately at the last minute I've started noticing a new
  intermittent failure to send callbacks. As the logic seems basically
  correct, I'm leaving Trond's patches in for now, and hope to find a
  fix in the next week so I don't have to revert those patches"

* tag 'nfsd-5.5' of git://linux-nfs.org/~bfields/linux: (24 commits)
  nfsd: depend on CRYPTO_MD5 for legacy client tracking
  NFSD fixing possible null pointer derefering in copy offload
  nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
  nfsd: Ensure CLONE persists data and metadata changes to the target file
  SUNRPC: Fix backchannel latency metrics
  nfsd: restore NFSv3 ACL support
  nfsd: v4 support requires CRYPTO_SHA256
  nfsd: Fix cld_net->cn_tfm initialization
  lockd: remove __KERNEL__ ifdefs
  sunrpc: remove __KERNEL__ ifdefs
  race in exportfs_decode_fh()
  nfsd: Drop LIST_HEAD where the variable it declares is never used.
  nfsd: document callback_wq serialization of callback code
  nfsd: mark cb path down on unknown errors
  nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()
  nfsd: minor 4.1 callback cleanup
  SUNRPC: Fix svcauth_gss_proxy_init()
  SUNRPC: Trace gssproxy upcall results
  sunrpc: fix crash when cache_head become valid before update
  nfsd: remove private bin2hex implementation
  ...
2019-12-07 16:56:00 -08:00
Linus Torvalds
fb9bf40cf0 Merge tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:

   - NFSv4.2 now supports cross device offloaded copy (i.e. offloaded
     copy of a file from one source server to a different target
     server).

   - New RDMA tracepoints for debugging congestion control and Local
     Invalidate WRs.

  Bugfixes and cleanups

   - Drop the NFSv4.1 session slot if nfs4_delegreturn_prepare waits for
     layoutreturn

   - Handle bad/dead sessions correctly in nfs41_sequence_process()

   - Various bugfixes to the delegation return operation.

   - Various bugfixes pertaining to delegations that have been revoked.

   - Cleanups to the NFS timespec code to avoid unnecessary conversions
     between timespec and timespec64.

   - Fix unstable RDMA connections after a reconnect

   - Close race between waking an RDMA sender and posting a receive

   - Wake pending RDMA tasks if connection fails

   - Fix MR list corruption, and clean up MR usage

   - Fix another RPCSEC_GSS issue with MIC buffer space"

* tag 'nfs-for-5.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
  SUNRPC: Capture completion of all RPC tasks
  SUNRPC: Fix another issue with MIC buffer space
  NFS4: Trace lock reclaims
  NFS4: Trace state recovery operation
  NFSv4.2 fix memory leak in nfs42_ssc_open
  NFSv4.2 fix kfree in __nfs42_copy_file_range
  NFS: remove duplicated include from nfs4file.c
  NFSv4: Make _nfs42_proc_copy_notify() static
  NFS: Fallocate should use the nfs4_fattr_bitmap
  NFS: Return -ETXTBSY when attempting to write to a swapfile
  fs: nfs: sysfs: Remove NULL check before kfree
  NFS: remove unneeded semicolon
  NFSv4: add declaration of current_stateid
  NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn
  NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()
  nfsv4: Move NFSPROC4_CLNT_COPY_NOTIFY to end of list
  SUNRPC: Avoid RPC delays when exiting suspend
  NFS: Add a tracepoint in nfs_fh_to_dentry()
  NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done()
  NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn
  ...
2019-12-07 16:50:55 -08:00
Steve French
231e2a0ba5 smb3: improve check for when we send the security descriptor context on create
We had cases in the previous patch where we were sending the security
descriptor context on SMB3 open (file create) in cases when we hadn't
mounted with with "modefromsid" mount option.

Add check for that mount flag before calling ad_sd_context in
open init.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-12-07 17:38:22 -06:00
Linus Torvalds
85190d15f4 pipe: don't use 'pipe_wait() for basic pipe IO
pipe_wait() may be simple, but since it relies on the pipe lock, it
means that we have to do the wakeup while holding the lock.  That's
unfortunate, because the very first thing the waked entity will want to
do is to get the pipe lock for itself.

So get rid of the pipe_wait() usage by simply releasing the pipe lock,
doing the wakeup (if required) and then using wait_event_interruptible()
to wait on the right condition instead.

wait_event_interruptible() handles races on its own by comparing the
wakeup condition before and after adding itself to the wait queue, so
you can use an optimistic unlocked condition for it.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-07 13:53:09 -08:00
Linus Torvalds
a28c8b9db8 pipe: remove 'waiting_writers' merging logic
This code is ancient, and goes back to when we only had a single page
for the pipe buffers.  The exact history is hidden in the mists of time
(ie "before git", and in fact predates the BK repository too).

At that long-ago point in time, it actually helped to try to merge big
back-and-forth pipe reads and writes, and not limit pipe reads to the
single pipe buffer in length just because that was all we had at a time.

However, since then we've expanded the pipe buffers to multiple pages,
and this logic really doesn't seem to make sense.  And a lot of it is
somewhat questionable (ie "hmm, the user asked for a non-blocking read,
but we see that there's a writer pending, so let's wait anyway to get
the extra data that the writer will have").

But more importantly, it makes the "go to sleep" logic much less
obvious, and considering the wakeup issues we've had, I want to make for
less of those kinds of things.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-07 13:21:01 -08:00
Linus Torvalds
f467a6a664 pipe: fix and clarify pipe read wakeup logic
This is the read side version of the previous commit: it simplifies the
logic to only wake up waiting writers when necessary, and makes sure to
use a synchronous wakeup.  This time not so much for GNU make jobserver
reasons (that pipe never fills up), but simply to get the writer going
quickly again.

A bit less verbose commentary this time, if only because I assume that
the write side commentary isn't going to be ignored if you touch this
code.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-07 12:54:26 -08:00
Linus Torvalds
1b6b26ae70 pipe: fix and clarify pipe write wakeup logic
The pipe rework ends up having been extra painful, partly becaused of
actual bugs with ordering and caching of the pipe state, but also
because of subtle performance issues.

In particular, the pipe rework caused the kernel build to inexplicably
slow down.

The reason turns out to be that the GNU make jobserver (which limits the
parallelism of the build) uses a pipe to implement a "token" system: a
parallel submake will read a character from the pipe to get the job
token before starting a new job, and will write a character back to the
pipe when it is done.  The overall job limit is thus easily controlled
by just writing the appropriate number of initial token characters into
the pipe.

But to work well, that really means that the old behavior of write
wakeups being synchronous (WF_SYNC) is very important - when the pipe
writer wakes up a reader, we want the reader to actually get scheduled
immediately.  Otherwise you lose the parallelism of the build.

The pipe rework lost that synchronous wakeup on write, and we had
clearly all forgotten the reasons and rules for it.

This rewrites the pipe write wakeup logic to do the required Wsync
wakeups, but also clarifies the logic and avoids extraneous wakeups.

It also ends up addign a number of comments about what oit does and why,
so that we hopefully don't end up forgetting about this next time we
change this code.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-07 12:14:28 -08:00
Linus Torvalds
ad910e36da pipe: fix poll/select race introduced by the pipe rework
The kernel wait queues have a basic rule to them: you add yourself to
the wait-queue first, and then you check the things that you're going to
wait on.  That avoids the races with the event you're waiting for.

The same goes for poll/select logic: the "poll_wait()" goes first, and
then you check the things you're polling for.

Of course, if you use locking, the ordering doesn't matter since the
lock will serialize with anything that changes the state you're looking
at. That's not the case here, though.

So move the poll_wait() first in pipe_poll(), before you start looking
at the pipe state.

Fixes: 8cefc107ca ("pipe: Use head and tail pointers for the ring, not cursor and length")
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-07 10:41:17 -08:00
Patrick Steinhardt
38a2204f52 nfsd: depend on CRYPTO_MD5 for legacy client tracking
The legacy client tracking infrastructure of nfsd makes use of MD5 to
derive a client's recovery directory name. As the nfsd module doesn't
declare any dependency on CRYPTO_MD5, though, it may fail to allocate
the hash if the kernel was compiled without it. As a result, generation
of client recovery directories will fail with the following error:

    NFSD: unable to generate recoverydir name

The explicit dependency on CRYPTO_MD5 was removed as redundant back in
6aaa67b5f3 (NFSD: Remove redundant "select" clauses in fs/Kconfig
2008-02-11) as it was already implicitly selected via RPCSEC_GSS_KRB5.
This broke when RPCSEC_GSS_KRB5 was made optional for NFSv4 in commit
df486a2590 (NFS: Fix the selection of security flavours in Kconfig) at
a later point.

Fix the issue by adding back an explicit dependency on CRYPTO_MD5.

Fixes: df486a2590 (NFS: Fix the selection of security flavours in Kconfig)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-07 11:28:52 -05:00
Olga Kornievskaia
18f428d4e2 NFSD fixing possible null pointer derefering in copy offload
Static checker revealed possible error path leading to possible
NULL pointer dereferencing.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e0639dc580: ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-12-07 11:28:46 -05:00
David Howells
76f6777c9c pipe: Fix iteration end check in fuse_dev_splice_write()
Fix the iteration end check in fuse_dev_splice_write().  The iterator
position can only be compared with == or != since wrappage may be involved.

Fixes: 8cefc107ca ("pipe: Use head and tail pointers for the ring, not cursor and length")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-06 13:57:04 -08:00
Linus Torvalds
ec057595cb pipe: fix incorrect caching of pipe state over pipe_wait()
Similarly to commit 8f868d68d3 ("pipe: Fix missing mask update after
pipe_wait()") this fixes a case where the pipe rewrite ended up caching
the pipe state incorrectly over a pipe lock drop event.

It wasn't quite as obvious, because you needed to splice data from a
pipe to a file, which is a fairly unusual operation, but it's completely
wrong.

Make sure we load the pipe head/tail/size information only after we've
waited for there to be data in the pipe.

While in that file, also make one of the splice helper functions use the
canonical arghument order for pipe_empty().  That's syntactic - pipe
emptiness is just that head and tail are equal, and thus mixing up head
and tail doesn't really matter.  It's still wrong, though.

Reported-by: David Sterba <dsterba@suse.cz>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-06 12:40:35 -08:00
Steve French
fdef665ba4 smb3: fix mode passed in on create for modetosid mount option
When using the special SID to store the mode bits in an ACE (See
http://technet.microsoft.com/en-us/library/hh509017(v=ws.10).aspx)
which is enabled with mount parm "modefromsid" we were not
passing in the mode via SMB3 create (although chmod was enabled).
SMB3 create allows a security descriptor context to be passed
in (which is more atomic and thus preferable to setting the mode
bits after create via a setinfo).

This patch enables setting the mode bits on create when using
modefromsid mount option.  In addition it fixes an endian
error in the definition of the Control field flags in the SMB3
security descriptor. It also makes the ACE type of the special
SID better match the documentation (and behavior of servers
which use this to store mode bits in SMB3 ACLs).

Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-12-06 14:15:52 -06:00
Linus Torvalds
9feb1af97e Merge tag 'for-linus-20191205' of git://git.kernel.dk/linux-block
Pull more block and io_uring updates from Jens Axboe:
 "I wasn't expecting this to be so big, and if I was, I would have used
  separate branches for this. Going forward I'll be doing separate
  branches for the current tree, just like for the next kernel version
  tree. In any case, this contains:

   - Series from Christoph that fixes an inherent race condition with
     zoned devices and revalidation.

   - null_blk zone size fix (Damien)

   - Fix for a regression in this merge window that caused busy spins by
     sending empty disk uevents (Eric)

   - Fix for a regression in this merge window for bfq stats (Hou)

   - Fix for io_uring creds allocation failure handling (me)

   - io_uring -ERESTARTSYS send/recvmsg fix (me)

   - Series that fixes the need for applications to retain state across
     async request punts for io_uring. This one is a bit larger than I
     would have hoped, but I think it's important we get this fixed for
     5.5.

   - connect(2) improvement for io_uring, handling EINPROGRESS instead
     of having applications needing to poll for it (me)

   - Have io_uring use a hash for poll requests instead of an rbtree.
     This turned out to work much better in practice, so I think we
     should make the switch now. For some workloads, even with a fair
     amount of cancellations, the insertion sort is just too expensive.
     (me)

   - Various little io_uring fixes (me, Jackie, Pavel, LimingWu)

   - Fix for brd unaligned IO, and a warning for the future (Ming)

   - Fix for a bio integrity data leak (Justin)

   - bvec_iter_advance() improvement (Pavel)

   - Xen blkback page unmap fix (SeongJae)

  The major items in here are all well tested, and on the liburing side
  we continue to add regression and feature test cases. We're up to 50
  topic cases now, each with anywhere from 1 to more than 10 cases in
  each"

* tag 'for-linus-20191205' of git://git.kernel.dk/linux-block: (33 commits)
  block: fix memleak of bio integrity data
  io_uring: fix a typo in a comment
  bfq-iosched: Ensure bio->bi_blkg is valid before using it
  io_uring: hook all linked requests via link_list
  io_uring: fix error handling in io_queue_link_head
  io_uring: use hash table for poll command lookups
  io-wq: clear node->next on list deletion
  io_uring: ensure deferred timeouts copy necessary data
  io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUT
  null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED
  brd: warn on un-aligned buffer
  brd: remove max_hw_sectors queue limit
  xen/blkback: Avoid unmapping unmapped grant pages
  io_uring: handle connect -EINPROGRESS like -EAGAIN
  block: set the zone size in blk_revalidate_disk_zones atomically
  block: don't handle bio based drivers in blk_revalidate_disk_zones
  block: allocate the zone bitmaps lazily
  block: replace seq_zones_bitmap with conv_zones_bitmap
  block: simplify blkdev_nr_zones
  block: remove the empty line at the end of blk-zoned.c
  ...
2019-12-06 10:08:59 -08:00
Linus Torvalds
0aecba6173 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
 "Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
  Basically, the problem is that negative pinned dentries require
  careful treatment - unless ->d_lock is locked or parent is held at
  least shared, another thread can make them positive right under us.

  Most of the uses turned out to be safe - the main surprises as far as
  filesystems are concerned were

   - race in dget_parent() fastpath, that might end up with the caller
     observing the returned dentry _negative_, due to insufficient
     barriers. It is positive in memory, but we could end up seeing the
     wrong value of ->d_inode in CPU cache. Fixed.

   - manual checks that result of lookup_one_len_unlocked() is positive
     (and rejection of negatives). Again, insufficient barriers (we
     might end up with inconsistent observed values of ->d_inode and
     ->d_flags). Fixed by switching to a new primitive that does the
     checks itself and returns ERR_PTR(-ENOENT) instead of a negative
     dentry. That way we get rid of boilerplate converting negatives
     into ERR_PTR(-ENOENT) in the callers and have a single place to
     deal with the barrier-related mess - inside fs/namei.c rather than
     in every caller out there.

  The guts of pathname resolution *do* need to be careful - the race
  found by Ritesh is real, as well as several similar races.
  Fortunately, it turns out that we can take care of that with fairly
  local changes in there.

  The tree-wide audit had not been fun, and I hate the idea of repeating
  it. I think the right approach would be to annotate the places where
  we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
  catch regressions. But I'm still not sure what would be the least
  invasive way of doing that and it's clearly the next cycle fodder"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/namei.c: fix missing barriers when checking positivity
  fix dget_parent() fastpath race
  new helper: lookup_positive_unlocked()
  fs/namei.c: pull positivity check into follow_managed()
2019-12-06 09:06:58 -08:00
Linus Torvalds
b0d4beaa5a Merge branch 'next.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull autofs updates from Al Viro:
 "autofs misuses checks for ->d_subdirs emptiness; the cursors are in
  the same lists, resulting in false negatives. It's not needed anyway,
  since autofs maintains counter in struct autofs_info, containing 0 for
  removed ones, 1 for live symlinks and 1 + number of children for live
  directories, which is precisely what we need for those checks.

  This series switches to use of that counter and untangles the crap
  around its uses (it needs not be atomic and there's a bunch of
  completely pointless "defensive" checks).

  This fell out of dcache_readdir work; the main point is to get rid of
  ->d_subdirs abuses in there. I've more followup cleanups, but I hadn't
  run those by Ian yet, so they can go next cycle"

* 'next.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  autofs: don't bother with atomics for ino->count
  autofs_dir_rmdir(): check ino->count for deciding whether it's empty...
  autofs: get rid of pointless checks around ->count handling
  autofs_clear_leaf_automount_flags(): use ino->count instead of ->d_subdirs
2019-12-05 17:11:48 -08:00
Linus Torvalds
da73fcd8cf Merge branch 'pipe-rework' (patches from David Howells)
Merge two fixes for the pipe rework from David Howells:
 "Here are a couple of patches to fix bugs syzbot found in the pipe
  changes:

   - An assertion check will sometimes trip when polling a pipe because
     the ring size and indices used are approximate and may be being
     changed simultaneously.

     An equivalent approximate calculation was done previously, but
     without the assertion check, so I've just dropped the check. To
     make it accurate, the pipe mutex would need to be taken or the spin
     lock could be used - but usage of the spinlock would need to be
     rolled out into splice, iov_iter and other places for that.

   - The index mask and the max_usage values cannot be cached across
     pipe_wait() as F_SETPIPE_SZ could have been called during the wait.
     This can cause pipe_write() to break"

* pipe-rework:
  pipe: Fix missing mask update after pipe_wait()
  pipe: Remove assertion from pipe_poll()
2019-12-05 16:35:53 -08:00
David Howells
8f868d68d3 pipe: Fix missing mask update after pipe_wait()
Fix pipe_write() to not cache the ring index mask and max_usage as their
values are invalidated by calling pipe_wait() because the latter
function drops the pipe lock, thereby allowing F_SETPIPE_SZ change them.
Without this, pipe_write() may subsequently miscalculate the array
indices and pipe fullness, leading to an oops like the following:

  BUG: KASAN: slab-out-of-bounds in pipe_write+0xc25/0xe10 fs/pipe.c:481
  Write of size 8 at addr ffff8880771167a8 by task syz-executor.3/7987
  ...
  CPU: 1 PID: 7987 Comm: syz-executor.3 Not tainted 5.4.0-rc2-syzkaller #0
  ...
  Call Trace:
    pipe_write+0xc25/0xe10 fs/pipe.c:481
    call_write_iter include/linux/fs.h:1895 [inline]
    new_sync_write+0x3fd/0x7e0 fs/read_write.c:483
    __vfs_write+0x94/0x110 fs/read_write.c:496
    vfs_write+0x18a/0x520 fs/read_write.c:558
    ksys_write+0x105/0x220 fs/read_write.c:611
    __do_sys_write fs/read_write.c:623 [inline]
    __se_sys_write fs/read_write.c:620 [inline]
    __x64_sys_write+0x6e/0xb0 fs/read_write.c:620
    do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is not a problem for pipe_read() as the mask is recalculated on
each pass of the loop, after pipe_wait() has been called.

Fixes: 8cefc107ca ("pipe: Use head and tail pointers for the ring, not cursor and length")
Reported-by: syzbot+838eb0878ffd51f27c41@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Eric Biggers <ebiggers@kernel.org>
[ Changed it to use a temporary variable 'mask' to avoid long lines -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-05 15:56:20 -08:00
David Howells
8c7b8c34ae pipe: Remove assertion from pipe_poll()
An assertion check was added to pipe_poll() to make sure that the ring
occupancy isn't seen to overflow the ring size.  However, since no locks
are held when the three values are read, it is possible for F_SETPIPE_SZ
to intervene and muck up the calculation, thereby causing the oops.

Fix this by simply removing the assertion and accepting that the
calculation might be approximate.

Note that the previous code also had a similar issue, though there was
no assertion check, since the occupancy counter and the ring size were
not read with a lock held, so it's possible that the poll check might
have malfunctioned then too.

Also wake up all the waiters so that they can reissue their checks if
there was a competing read or write.

Fixes: 8cefc107ca ("pipe: Use head and tail pointers for the ring, not cursor and length")
Reported-by: syzbot+d37abaade33a934f16f2@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-05 15:33:50 -08:00
Linus Torvalds
3f1266ec70 Merge tag 'gfs2-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 updates from Andreas Gruenbacher:
 "Bob's extensive filesystem withdrawal and recovery testing:
   - don't write log headers after file system withdraw
   - clean up iopen glock mess in gfs2_create_inode
   - close timing window with GLF_INVALIDATE_IN_PROGRESS
   - abort gfs2_freeze if io error is seen
   - don't loop forever in gfs2_freeze if withdrawn
   - fix infinite loop in gfs2_ail1_flush on io error
   - introduce function gfs2_withdrawn
   - fix glock reference problem in gfs2_trans_remove_revoke

  Filesystems with a block size smaller than the page size:
   - fix end-of-file handling in gfs2_page_mkwrite
   - improve mmap write vs. punch_hole consistency

  Other:
   - remove active journal side effect from gfs2_write_log_header
   - multi-block allocations in gfs2_page_mkwrite

  Minor cleanups and coding style fixes:
   - remove duplicate call from gfs2_create_inode
   - make gfs2_log_shutdown static
   - make gfs2_fs_parameters static
   - some whitespace cleanups
   - removed unnecessary semicolon"

* tag 'gfs2-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Don't write log headers after file system withdraw
  gfs2: Remove duplicate call from gfs2_create_inode
  gfs2: clean up iopen glock mess in gfs2_create_inode
  gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS
  gfs2: Abort gfs2_freeze if io error is seen
  gfs2: Don't loop forever in gfs2_freeze if withdrawn
  gfs2: fix infinite loop in gfs2_ail1_flush on io error
  gfs2: Introduce function gfs2_withdrawn
  gfs2: fix glock reference problem in gfs2_trans_remove_revoke
  gfs2: make gfs2_log_shutdown static
  gfs2: Remove active journal side effect from gfs2_write_log_header
  gfs2: Fix end-of-file handling in gfs2_page_mkwrite
  gfs2: Multi-block allocations in gfs2_page_mkwrite
  gfs2: Improve mmap write vs. punch_hole consistency
  gfs2: make gfs2_fs_parameters static
  gfs2: Some whitespace cleanups
  gfs2: removed unnecessary semicolon
2019-12-05 13:20:11 -08:00