Pull nfsd fixes from Chuck Lever:
"Notable bug fixes:
- Ensure SM_NOTIFY doesn't crash the NFS server host
- Ensure NLM locks are cleaned up after client reboot
- Fix a leak of internal NFSv4 lease information"
* tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
lockd: fix failure to cleanup client locks
lockd: fix server crash on reboot of client holding lock
Pull fanotify fix from Jan Kara:
"Fix stale file descriptor in copy_event_to_user"
* tag 'fsnotify_for_v5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: Fix stale file descriptor in copy_event_to_user()
Pull overlayfs fixes from Miklos Szeredi:
"Fix a regression introduced in v5.15, affecting copy up of files with
'noatime' or 'sync' attributes to a tmpfs upper layer"
* tag 'ovl-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: don't fail copy up if no fileattr support on upper
ovl: fix NULL pointer dereference in copy up warning
Pull unicode cleanup from Gabriel Krisman Bertazi:
"A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into
the previous CONFIG_UNICODE. It is -rc material since we don't want to
expose the former symbol on 5.17.
This has been living on linux-next for the past week"
* tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
unicode: clean up the Kconfig symbol confusion
This code calls fd_install(), which gives userspace access to the fd.
If copy_info_records_to_user() then fails, it calls put_unused_fd(fd),
but that does not release the installed file and leaves a stale entry
in the file descriptor table.
Generally you can't trust the fd after a call to fd_install(). The fix
is to delay the fd_install() until everything else has succeeded.
Fortunately it requires CAP_SYS_ADMIN to reach this code, so the
security impact is limited.
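The ordering described above can be illustrated with a small userspace
model. This is only a sketch: the toy_* names and the fixed-size table
are hypothetical stand-ins, not the kernel's fd table API. It shows the
"reserve, do the fallible work, install last" pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FDS 8

/* Toy stand-in for a file descriptor table (hypothetical). */
struct toy_fdtable {
	bool reserved[TOY_MAX_FDS];	/* slot number handed out */
	const void *file[TOY_MAX_FDS];	/* non-NULL once visible to userspace */
};

/* Reserve a slot without publishing anything in it. */
static int toy_get_unused_fd(struct toy_fdtable *t)
{
	for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
		if (!t->reserved[fd]) {
			t->reserved[fd] = true;
			return fd;
		}
	}
	return -1;
}

/* Give back a reserved slot; only valid if nothing was installed. */
static void toy_put_unused_fd(struct toy_fdtable *t, int fd)
{
	t->reserved[fd] = false;
}

/* Publish the file in the slot; this model has no undo for it. */
static void toy_fd_install(struct toy_fdtable *t, int fd, const void *file)
{
	t->file[fd] = file;
}

/*
 * Fixed ordering: every step that can fail runs before the install.
 * copy_ok simulates whether copying the info records succeeded.
 */
static int toy_copy_event(struct toy_fdtable *t, const void *file, bool copy_ok)
{
	int fd = toy_get_unused_fd(t);

	if (fd < 0)
		return -1;
	if (!copy_ok) {
		toy_put_unused_fd(t, fd);	/* safe: nothing installed yet */
		return -1;
	}
	toy_fd_install(t, fd, file);
	return fd;
}
```

With this ordering a failed copy leaves the table exactly as it was,
because the only cleanup needed is returning the reserved slot.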
Fixes: f644bc449b ("fanotify: fix copy_event_to_user() fid error clean up")
Link: https://lore.kernel.org/r/20220128195656.GA26981@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Jan Kara <jack@suse.cz>
We should unregister the table upon module unload; otherwise loading
the binfmt_misc module again fails and can crash the kernel. Also note
that we should keep the value returned by register_sysctl_mount_point()
and release it later, otherwise it will leak.
Also, per Christian's comment, to fully restore the old behavior that
won't break userspace, the check of binfmt_misc_header should be
eliminated.
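A minimal userspace sketch of the lifecycle described above, assuming a
registry with a single slot. All toy_* names are hypothetical and do not
mirror the real sysctl API; the point is only that the handle returned
at load time is kept and released at unload time, and that load does
not fail on a missing handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy registry with one mount-point slot (hypothetical). */
struct toy_header {
	bool in_use;
};

static struct toy_header toy_slot;

/* Returns a handle, or NULL if the path is still registered. */
static struct toy_header *toy_register_mount_point(void)
{
	if (toy_slot.in_use)
		return NULL;
	toy_slot.in_use = true;
	return &toy_slot;
}

static void toy_unregister(struct toy_header *h)
{
	if (h)
		h->in_use = false;
}

/* Module state: keep the handle so that unload can release it. */
static struct toy_header *toy_binfmt_header;

static int toy_module_init(void)
{
	/* No check on the handle here, matching the text above. */
	toy_binfmt_header = toy_register_mount_point();
	return 0;
}

static void toy_module_exit(void)
{
	toy_unregister(toy_binfmt_header);	/* tolerates NULL */
	toy_binfmt_header = NULL;
}
```

Dropping the unregister call in toy_module_exit() makes the second load
get a NULL handle, which is the toy analogue of the failure shown in
the reproduction steps.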
To reproduce:
modprobe binfmt_misc
modprobe -r binfmt_misc
modprobe binfmt_misc
modprobe -r binfmt_misc
modprobe binfmt_misc
resulting in
modprobe: can't load module binfmt_misc (kernel/fs/binfmt_misc.ko): Cannot allocate memory
and an unhappy kernel:
binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
BUG: unable to handle page fault for address: fffffbfff8004802
Call Trace:
init_misc_binfmt+0x2d/0x1000 [binfmt_misc]
Link: https://lkml.kernel.org/r/20220124181812.1869535-2-ztong0001@gmail.com
Fixes: 3ba442d533 ("fs: move binfmt_misc sysctl to its own file")
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull io_uring fixes from Jens Axboe:
"Just two small fixes this time:
- Fix a bug that can lead to node registration taking 1 second, when
it should finish much quicker (Dylan)
- Remove an unused argument from a function (Usama)"
* tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
io_uring: remove unused argument from io_rsrc_node_alloc
io_uring: fix bug in slow unregistering of nodes
Pull ceph fixes from Ilya Dryomov:
"A ZERO_SIZE_PTR dereference fix from Xiubo and two fixes for async
creates interacting with pool namespace-constrained OSD permissions
from Jeff (marked for stable)"
* tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client:
ceph: set pool_ns in new inode layout for async creates
ceph: properly put ceph_string reference after async create attempt
ceph: put the requests/sessions when it fails to alloc memory
The kernel test robot reports that commit c42ff46f97 ("ocfs2: simplify
subdirectory registration with register_sysctl()") is broken, and
results in kernel warning messages like
sysctl table check failed: fs/ocfs2/nm Not a file
sysctl table check failed: fs/ocfs2/nm No proc_handler
sysctl table check failed: fs/ocfs2/nm bogus .mode 0555
and in fact this was already reported back in linux-next, but nobody
seems to have reacted to that report. Possibly that original report
only ever made it to the lkp list.
The problem seems to be that the simplification didn't actually go far
enough, and should have converted the whole directory path to the final
sysctl file, rather than just the two first components.
So take that last step.
Fixes: c42ff46f97 ("ocfs2: simplify subdirectory registration with register_sysctl()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220128065310.GF8421@xsang-OptiPlex-9020/
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/KQ2F6TPJWMDVEXJM4WTUC4DU3EH3YJVT/
Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull fsnotify fixes from Jan Kara:
"Fixes for userspace breakage caused by fsnotify changes ~3 years ago
and one fanotify cleanup"
* tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix fsnotify hooks in pseudo filesystems
fsnotify: invalidate dcache before IN_DELETE event
fanotify: remove variable set but not used
Pull udf and quota fixes from Jan Kara:
"Fixes for crashes in UDF when inode expansion fails and one quota
cleanup"
* tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: cleanup double word in comment
udf: Restore i_lenAlloc when inode expansion fails
udf: Fix NULL ptr deref when converting from inline format
From RFC 7530 Section 16.34.5:
o The server has not recorded an unconfirmed { v, x, c, *, * } and
has recorded a confirmed { v, x, c, *, s }. If the principals of
the record and of SETCLIENTID_CONFIRM do not match, the server
returns NFS4ERR_CLID_INUSE without removing any relevant leased
client state, and without changing recorded callback and
callback_ident values for client { x }.
The current code intends to do what the spec describes above, but
it forgot to set 'old' to NULL, resulting in the confirmed client
being expired.
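The control-flow problem can be sketched in a few lines of userspace C.
This is a toy model, not the nfsd code: the struct, the status values
and the function name are hypothetical. It shows why a pointer that a
shared exit path will expire has to be cleared on the CLID_INUSE branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_OK		0
#define TOY_CLID_INUSE	1	/* stands in for NFS4ERR_CLID_INUSE */

struct toy_client {
	const char *principal;
	bool expired;
};

/*
 * Toy confirm step: 'confirmed' is the recorded client, and
 * caller_principal is the principal of the confirming request.
 */
static int toy_setclientid_confirm(struct toy_client *confirmed,
				   const char *caller_principal)
{
	struct toy_client *old = confirmed;	/* expired on the exit path */
	int status = TOY_OK;

	if (strcmp(old->principal, caller_principal) != 0) {
		status = TOY_CLID_INUSE;
		old = NULL;	/* without this, the exit path expires it */
	}

	/* Shared exit path: expire whatever 'old' still points at. */
	if (old)
		old->expired = true;
	return status;
}
```

On a principal mismatch the confirmed record keeps its state, which is
what the quoted RFC text requires.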
Fixes: 2b63482185 ("nfsd: fix clid_inuse on mount with security change")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bruce Fields <bfields@fieldses.org>
In my testing, we're sometimes hitting the request->fl_flags & FL_EXISTS
case in posix_lock_inode, presumably just by random luck since we're not
actually initializing fl_flags here.
This probably didn't matter before commit 7f024fcd5c ("Keep read and
write fds with each nlm_file") since we wouldn't previously unlock
unless we knew there were locks.
But now it causes lockd to give up on removing more locks.
We could just initialize fl_flags, but really it seems dubious to be
calling vfs_lock_file with random values in some of the fields.
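A small userspace sketch of the hazard, under the assumption that an
unlock request with an "exists" flag set reports an error when no lock
is found. The toy_* names and constants are hypothetical; the sketch
only shows that zeroing the whole request before filling it in removes
the dependence on stack contents.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define TOY_FL_EXISTS	0x1u	/* stands in for FL_EXISTS */
#define TOY_UNLCK	2	/* stands in for F_UNLCK */
#define TOY_ENOENT	2

struct toy_lock {
	unsigned int flags;
	int type;
	long start;
	long end;
};

/* Zero every field first, then set only what the unlock needs. */
static void toy_init_unlock_request(struct toy_lock *req)
{
	memset(req, 0, sizeof(*req));
	req->type = TOY_UNLCK;
	req->start = 0;
	req->end = LONG_MAX;
}

/* Toy unlock: fails only if the caller insisted that a lock exists. */
static int toy_unlock(const struct toy_lock *req, int locks_found)
{
	if (locks_found == 0 && (req->flags & TOY_FL_EXISTS))
		return -TOY_ENOENT;
	return 0;
}
```

With an uninitialized request the result of toy_unlock() depends on
whatever bits happen to be in flags, which matches the "random luck"
described above.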
Fixes: 7f024fcd5c ("Keep read and write fds with each nlm_file")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ cel: fixed checkpatch.pl nit ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Dan reported that he was unable to write to files that had been
asynchronously created when the client's OSD caps are restricted to a
particular namespace.
The issue is that the layout for the new inode is only partially being
filled. Ensure that we populate the pool_ns_data and pool_ns_len in the
iinfo before calling ceph_fill_inode.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/54013
Fixes: 9a8d03ca2e ("ceph: attempt to do async create when possible")
Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When failing to allocate the sessions memory we should make sure
that req1, req2 and the sessions get put. Also, if max_sessions
has decreased, some sessions may be missed being put when the
memory is reallocated with krealloc.
And if max_sessions is 0, krealloc will return ZERO_SIZE_PTR,
which will lead to an access fault when it is dereferenced.
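The two concerns above (a zero-size allocation, and references that are
not dropped on every path) can be sketched in userspace C. This is a
hypothetical model using calloc/free, not the ceph code; the toy_*
names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_session {
	int refs;
};

static void toy_put_session(struct toy_session *s)
{
	if (s)
		s->refs--;
}

/* Never hand out a zero-size allocation; callers test for NULL. */
static struct toy_session **toy_alloc_sessions(size_t max_sessions)
{
	if (max_sessions == 0)
		return NULL;
	return calloc(max_sessions, sizeof(struct toy_session *));
}

/* Single cleanup path: put every collected session, then free. */
static void toy_put_sessions(struct toy_session **sessions, size_t n)
{
	if (!sessions)
		return;
	for (size_t i = 0; i < n; i++)
		toy_put_session(sessions[i]);
	free(sessions);
}
```

Routing both the success and the failure paths through one cleanup
helper makes it harder to forget a put when the array size changes.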
URL: https://tracker.ceph.com/issues/53819
Fixes: e1a4541ec0 ("ceph: flush the mdlog before waiting on unsafe reqs")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Pull NFS client updates from Anna Schumaker:
"New Features:
- Basic handling for case insensitive filesystems
- Initial support for fs_locations and server trunking
Bugfixes and Cleanups:
- Cleanups to how the "struct cred *" is handled for the
nfs_access_entry
- Ensure the server has up-to-date ctimes before hardlinking or
renaming
- Update 'blocks used' after writeback, fallocate, and clone
- nfs_atomic_open() fixes
- Improvements to sunrpc tracing
- Various null check & indenting related cleanups
- Some improvements to the sunrpc sysfs code:
- Use default_groups in kobj_type
- Fix some potential races and reference leaks
- A few tracepoint cleanups in xprtrdma"
[ This should have gone in during the merge window, but didn't. The
original pull request - sent during the merge window - had gotten
marked as spam and discarded due to missing DKIM headers in the email
from Anna. - Linus ]
* tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (35 commits)
SUNRPC: Don't dereference xprt->snd_task if it's a cookie
xprtrdma: Remove definitions of RPCDBG_FACILITY
xprtrdma: Remove final dprintk call sites from xprtrdma
sunrpc: Fix potential race conditions in rpc_sysfs_xprt_state_change()
net/sunrpc: fix reference count leaks in rpc_sysfs_xprt_state_change
NFSv4.1 test and add 4.1 trunking transport
SUNRPC allow for unspecified transport time in rpc_clnt_add_xprt
NFSv4 handle port presence in fs_location server string
NFSv4 expose nfs_parse_server_name function
NFSv4.1 query for fs_location attr on a new file system
NFSv4 store server support for fs_location attribute
NFSv4 remove zero number of fs_locations entries error check
NFSv4: nfs_atomic_open() can race when looking up a non-regular file
NFSv4: Handle case where the lookup of a directory fails
NFSv42: Fallocate and clone should also request 'blocks used'
NFSv4: Allow writebacks to request 'blocks used'
SUNRPC: use default_groups in kobj_type
NFS: use default_groups in kobj_type
NFS: Fix the verifier for case sensitive filesystem in nfs_atomic_open()
NFS: Add a helper to remove case-insensitive aliases
...
Pull btrfs fixes from David Sterba:
"Several fixes for defragmentation that got broken in 5.16 after
refactoring and added subpage support. The observed bugs are excessive
IO or uninterruptible ioctl.
All stable material"
* tag 'for-5.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: update writeback index when starting defrag
btrfs: add back missing dirty page rate limiting to defrag
btrfs: fix deadlock when reserving space during defrag
btrfs: defrag: properly update range->start for autodefrag
btrfs: defrag: fix wrong number of defragged sectors
btrfs: allow defrag to be interruptible
btrfs: fix too long loop when defragging a 1 byte file
When starting a defrag, we should update the writeback index of the
inode's mapping in case it currently has a value beyond the start of the
range we are defragging. This can help performance and often results in
fewer extents after writeback. For example, if the current value
of the writeback index sits somewhere in the middle of a range that
gets dirtied by the defrag, then after writeback we can get two smaller
extents instead of a single, larger extent.
We used to have this before the refactoring in 5.16, but it was removed
without any reason to do so. Originally it was added in kernel 3.1, by
commit 2a0f7f5769 ("Btrfs: fix recursive auto-defrag"), in order to
fix a loop with autodefrag resulting in dirtying and writing pages over
and over, but some testing on current code did not show that happening,
at least with the test described in that commit.
So add back the behaviour, as at the very least it is a nice-to-have
optimization.
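The writeback index adjustment described in the first paragraph amounts
to a single comparison. The following userspace sketch assumes 4 KiB
pages; the struct and names are hypothetical stand-ins for the inode's
mapping.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Toy stand-in for the address space's writeback cursor. */
struct toy_mapping {
	unsigned long writeback_index;
};

/*
 * Pull the writeback cursor back to the start of the defrag range if
 * it currently sits beyond it, so that writeback covers the range in
 * one pass.
 */
static void toy_defrag_start(struct toy_mapping *m,
			     unsigned long long start_byte)
{
	unsigned long start_index =
		(unsigned long)(start_byte >> TOY_PAGE_SHIFT);

	if (start_index < m->writeback_index)
		m->writeback_index = start_index;
}
```

A cursor that is already at or before the start of the range is left
untouched.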
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
CC: stable@vger.kernel.org # 5.16
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A defrag operation can dirty a lot of pages, especially if operating on
the entire file or a large file range. Any task dirtying pages should
periodically call balance_dirty_pages_ratelimited(), as stated in that
function's comments, otherwise they can leave too many dirty pages in
the system. This is what we did before the refactoring in 5.16, and
it should have remained, just like in the buffered write path and
relocation. So restore that behaviour.
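As a rough illustration of the pattern, the loop below dirties pages in
clusters and calls a throttling hook after each cluster. It is a
userspace sketch: toy_balance_dirty_pages_ratelimited() is a
hypothetical counter standing in for the real function, and the cluster
size is an arbitrary parameter.

```c
#include <assert.h>

/* Counts how often the (hypothetical) throttling hook was called. */
static unsigned long toy_throttle_calls;

static void toy_balance_dirty_pages_ratelimited(void)
{
	toy_throttle_calls++;
}

/*
 * Dirty nr_pages in chunks of at most cluster_pages and give the
 * throttling hook a chance to run after every chunk.
 * Returns the number of pages dirtied.
 */
static unsigned long toy_defrag(unsigned long nr_pages,
				unsigned long cluster_pages)
{
	unsigned long done = 0;

	if (cluster_pages == 0)
		return 0;

	while (done < nr_pages) {
		unsigned long n = nr_pages - done;

		if (n > cluster_pages)
			n = cluster_pages;
		done += n;	/* stands in for dirtying n pages */
		toy_balance_dirty_pages_ratelimited();
	}
	return done;
}
```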
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When defragging we can end up collecting a range for defrag that
already has pages under delalloc (dirty), as long as the respective
extent map for their range is not mapped to a hole or a prealloc extent
and is not from an old generation.
Most of the time that is harmless, at least from a functional
perspective. However, it can result in a deadlock:
1) At defrag_collect_targets() we find an extent map that meets all
requirements but there's delalloc for the range it covers, and we add
its range to list of ranges to defrag;
2) The defrag_collect_targets() function is called at defrag_one_range(),
after it locked a range that overlaps the range of the extent map;
3) At defrag_one_range(), while the range is still locked, we call
defrag_one_locked_target() for the range associated to the extent
map we collected at step 1);
4) Then finally at defrag_one_locked_target() we do a call to
btrfs_delalloc_reserve_space(), which will reserve data and metadata
space. If the space reservations cannot be satisfied right away, the
flusher might be kicked in and start flushing delalloc and wait for
the respective ordered extents to complete. If this happens we will
deadlock, because both flushing delalloc and finishing an ordered
extent require locking the range in the inode's io tree, which was
already locked before defrag_collect_targets() was called (step 2).
So fix this by skipping extent maps for which there's already delalloc.
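The filtering rule can be sketched as a predicate over extent maps. This
is a userspace toy with hypothetical names and a simplified struct; the
generation check mentioned earlier is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical view of an extent map. */
struct toy_extent_map {
	unsigned long long start;
	unsigned long long len;
	bool is_hole;
	bool is_prealloc;
	bool has_delalloc;	/* range already has dirty pages */
};

static bool toy_is_defrag_target(const struct toy_extent_map *em)
{
	if (em->is_hole || em->is_prealloc)
		return false;
	/* The added rule: leave ranges with delalloc to writeback. */
	if (em->has_delalloc)
		return false;
	return true;
}

/* Collect pointers to the extent maps that qualify; returns the count. */
static size_t toy_collect_targets(const struct toy_extent_map *ems, size_t n,
				  const struct toy_extent_map **out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_is_defrag_target(&ems[i]))
			out[count++] = &ems[i];
	return count;
}
```

Because ranges with delalloc never reach the reservation step, the
flusher cannot end up waiting on a range the defrag path holds locked.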
Fixes: eb793cf857 ("btrfs: defrag: introduce helper to collect target file extents")
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When we fail to expand an inode from inline format to normal format, we
restore the inode to its original inline format, but we forgot to
set i_lenAlloc back. The mismatch between i_lenAlloc and i_size then
caused further problems such as warnings and lost data down the line.
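A toy model of the error path, assuming that expansion clears the
allocation length before the step that can fail. The struct and field
names are hypothetical simplifications of the inode state; the sketch
only shows that the rollback has to restore every field the expansion
touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified inode state. */
struct toy_inode {
	bool inline_data;		/* data stored inside the inode */
	unsigned long long size;	/* plays the role of i_size */
	unsigned long long len_alloc;	/* plays the role of i_lenAlloc */
};

/*
 * Try to convert an inline inode to normal format. write_ok simulates
 * whether writing the data out to a block succeeded.
 */
static int toy_expand_inline(struct toy_inode *inode, bool write_ok)
{
	inode->inline_data = false;
	inode->len_alloc = 0;

	if (!write_ok) {
		/* Roll back both fields, not just the format flag. */
		inode->inline_data = true;
		inode->len_alloc = inode->size;
		return -1;
	}
	return 0;
}
```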
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 7e49b6f248 ("udf: Convert UDF to new truncate calling sequence")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>