Commit 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
will have access to a positive dentry.
This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.
To fix the regression in pseudo filesystems, convert d_delete() calls
to d_drop() (see commit 46c46f8df9 ("devpts_pty_kill(): don't bother
with d_delete()") and move the fsnotify hook after d_drop().
Add a missing fsnotify_unlink() hook in nfsdfs that was found during
the audit of fsnotify hooks in pseudo filesystems.
Note that the fsnotify hooks in simple_recursive_removal() follow
d_invalidate(), so they require no change.
Link: https://lore.kernel.org/r/20220120215305.282577-2-amir73il@gmail.com
Reported-by: Ivan Delalande <colona@arista.com>
Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Fixes: 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When configfs_lookup() is executing list_for_each_entry(),
it is possible that configfs_dir_lseek() is calling list_del().
Some unfortunate interleavings of them can cause a kernel NULL
pointer dereference error
Thread 1 Thread 2
//configfs_dir_lseek() //configfs_lookup()
list_del(&cursor->s_sibling);
list_for_each_entry(sd, ...)
Fix this by grabbing configfs_dirent_lock in configfs_lookup()
while iterating ->s_children.
Signed-off-by: Sishuai Gong <sishuai@purdue.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Instead of appending new text attribute data at the offset specified by the
write() system call, only pass the newly written data to the .store()
callback.
Reported-by: Bodo Stroesser <bostroesser@gmail.com>
Tested-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commit 7fe1e79b59 ("configfs: implement the .read_iter and .write_iter
methods") changed the simple_read_from_buffer() calls into copy_to_iter()
calls and the simple_write_to_buffer() calls into copy_from_iter() calls.
The simple*buffer() methods update the file offset (*ppos) but the read
and write iterators not yet. Make the read and write iterators update the
file offset (iocb->ki_pos).
This patch has been tested as follows:
# modprobe target_core_user
# dd if=/sys/kernel/config/target/dbroot bs=1
/var/target
12+0 records in
12+0 records out
12 bytes copied, 9.5539e-05 s, 126 kB/s
# cd /sys/kernel/config/acpi/table
# mkdir test
# cd test
# dmesg -c >/dev/null; printf 'SSDT\x8\0\0\0abcdefghijklmnopqrstuvwxyz' | dd of=aml bs=1; dmesg -c
34+0 records in
34+0 records out
34 bytes copied, 0.010627 s, 3.2 kB/s
[ 261.056551] ACPI configfs: invalid table length
Reported-by: Yanko Kaneti <yaneti@declera.com>
Cc: Yanko Kaneti <yaneti@declera.com>
Fixes: 7fe1e79b59 ("configfs: implement the .read_iter and .write_iter methods")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull configfs updates from Christoph Hellwig:
- fix a memleak in configfs_release_bin_file (Chung-Chiang Cheng)
- implement the .read_iter and .write_iter (Bart Van Assche)
- minor cleanups (Bart Van Assche, me)
* tag 'configfs-5.13' of git://git.infradead.org/users/hch/configfs:
configfs: simplify configfs_release_bin_file
configfs: fix memleak in configfs_release_bin_file
configfs: implement the .read_iter and .write_iter methods
configfs: drop pointless kerneldoc comments
configfs: fix the kerneldoc comment for configfs_create_bin_file
When reading binary attributes in progress, buffer->bin_buffer is setup in
configfs_read_bin_file() but never freed.
Fixes: 03607ace80 ("configfs: implement binary attributes")
Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com>
[hch: move the vfree rather than duplicating it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Configfs is one of the few filesystems that does not yet support the
.read_iter and .write_iter callbacks. This patch adds support for these
methods in configfs.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[hch: split out a separate fix]
Signed-off-by: Christoph Hellwig <hch@lst.de>
file.c has a bunch of kerneldoc comments for static functions that do not
document any API but just list what is done. Drop them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Mention the correct argument name.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commit b0841eefd9 ("configfs: provide exclusion between IO and removals")
uses ->frag_dead to mark the fragment state, thus no bothering with extra
refcount on config_item when opening a file. The configfs_get_config_item
was removed in __configfs_open_file, but not with config_item_put. So the
refcount on config_item will lost its balance, causing use-after-free
issues in some occasions like this:
Test:
1. Mount configfs on /config with read-only items:
drwxrwx--- 289 root root 0 2021-04-01 11:55 /config
drwxr-xr-x 2 root root 0 2021-04-01 11:54 /config/a
--w--w--w- 1 root root 4096 2021-04-01 11:53 /config/a/1.txt
......
2. Then run:
for file in /config
do
echo $file
grep -R 'key' $file
done
3. __configfs_open_file will be called in parallel, the first one
got called will do:
if (file->f_mode & FMODE_READ) {
if (!(inode->i_mode & S_IRUGO))
goto out_put_module;
config_item_put(buffer->item);
kref_put()
package_details_release()
kfree()
the other one will run into use-after-free issues like this:
BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0
Read of size 8 at addr fffffff155f02480 by task grep/13096
CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: G W 4.14.116-kasan #1
TGID: 13096 Comm: grep
Call trace:
dump_stack+0x118/0x160
kasan_report+0x22c/0x294
__asan_load8+0x80/0x88
__configfs_open_file+0x1bc/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48
Allocated by task 2138:
kasan_kmalloc+0xe0/0x1ac
kmem_cache_alloc_trace+0x334/0x394
packages_make_item+0x4c/0x180
configfs_mkdir+0x358/0x740
vfs_mkdir2+0x1bc/0x2e8
SyS_mkdirat+0x154/0x23c
el0_svc_naked+0x34/0x38
Freed by task 13096:
kasan_slab_free+0xb8/0x194
kfree+0x13c/0x910
package_details_release+0x524/0x56c
kref_put+0xc4/0x104
config_item_put+0x24/0x34
__configfs_open_file+0x35c/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48
el0_svc_naked+0x34/0x38
To fix this issue, remove the config_item_put in
__configfs_open_file to balance the refcount of config_item.
Fixes: b0841eefd9 ("configfs: provide exclusion between IO and removals")
Signed-off-by: Daiyue Zhang <zhangdaiyue1@huawei.com>
Signed-off-by: Yi Chen <chenyi77@huawei.com>
Signed-off-by: Ge Qiu <qiuge@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.
As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesnt't leave us with additional inode methods.
Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The two helpers inode_permission() and generic_permission() are used by
the vfs to perform basic permission checking by verifying that the
caller is privileged over an inode. In order to handle idmapped mounts
we extend the two helpers with an additional user namespace argument.
On idmapped mounts the two helpers will make sure to map the inode
according to the mount's user namespace and then peform identical
permission checks to inode_permission() and generic_permission(). If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Add explanation for 'frag' parameter to avoid kernel-doc issue:
fs/configfs/dir.c:277: warning: Function parameter or member 'frag' not
described in 'configfs_create_dir'
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull documentation updates from Jonathan Corbet:
"A fair amount of stuff this time around, dominated by yet another
massive set from Mauro toward the completion of the RST conversion. I
*really* hope we are getting close to the end of this. Meanwhile,
those patches reach pretty far afield to update document references
around the tree; there should be no actual code changes there. There
will be, alas, more of the usual trivial merge conflicts.
Beyond that we have more translations, improvements to the sphinx
scripting, a number of additions to the sysctl documentation, and lots
of fixes"
* tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
Documentation: fixes to the maintainer-entry-profile template
zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
tracing: Fix events.rst section numbering
docs: acpi: fix old http link and improve document format
docs: filesystems: add info about efivars content
Documentation: LSM: Correct the basic LSM description
mailmap: change email for Ricardo Ribalda
docs: sysctl/kernel: document unaligned controls
Documentation: admin-guide: update bug-hunting.rst
docs: sysctl/kernel: document ngroups_max
nvdimm: fixes to maintainter-entry-profile
Documentation/features: Correct RISC-V kprobes support entry
Documentation/features: Refresh the arch support status files
Revert "docs: sysctl/kernel: document ngroups_max"
docs: move locking-specific documents to locking/
docs: move digsig docs to the security book
docs: move the kref doc into the core-api book
docs: add IRQ documentation at the core-api book
docs: debugging-via-ohci1394.txt: add it to the core-api book
docs: fix references for ipmi.rst file
...
- Add a SPDX header;
- Adjust document and section titles;
- Use copyright symbol;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.
Also, as this file is alone on its own dir, and it doesn't
seem too likely that other documents will follow it, let's
move it to the filesystems/ root documentation dir.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/c2424ec2ad4d735751434ff7f52144c44aa02d5a.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
configfs_rmdir() invokes configfs_get_config_item(), which returns a
reference of the specified config_item object to "parent_item" with
increased refcnt.
When configfs_rmdir() returns, local variable "parent_item" becomes
invalid, so the refcount should be decreased to keep refcount balanced.
The reference counting issue happens in one exception handling path of
configfs_rmdir(). When down_write_killable() fails, the function forgets
to decrease the refcnt increased by configfs_get_config_item(), causing
a refcnt leak.
Fix this issue by calling config_item_put() when down_write_killable()
fails.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Push clamping timestamps into notify_change(), so in-kernel
callers like nfsd and overlayfs will get similar timestamp
set behavior as utimes.
AV: get rid of clamping in ->setattr() instances; we don't need
to bother with that there, with notify_change() doing normalization
in all cases now (it already did for implicit case, since current_time()
clamps).
Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 42e729b9dd ("utimes: Clamp the timestamps before update")
Cc: stable@vger.kernel.org # v5.4
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>