Commit Graph

1389 Commits

Author SHA1 Message Date
Linus Torvalds
13d88ac54d Merge tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fanotify fsid updates from Christian Brauner:
 "This work is part of the plan to enable fanotify to serve as a drop-in
  replacement for inotify. While inotify is availabe on all filesystems,
  fanotify currently isn't.

  In order to support fanotify on all filesystems two things are needed:

   (1) all filesystems need to support AT_HANDLE_FID

   (2) all filesystems need to report a non-zero f_fsid

  This contains (1) and allows filesystems to encode non-decodable file
  handlers for fanotify without implementing any exportfs operations by
  encoding a file id of type FILEID_INO64_GEN from i_ino and
  i_generation.

  Filesystems that want to opt out of encoding non-decodable file ids
  for fanotify that don't support NFS export can do so by providing an
  empty export_operations struct.

  This also partially addresses (2) by generating f_fsid for simple
  filesystems as well as freevxfs. Remaining filesystems will be dealt
  with by separate patches.

  Finally, this contains the patch from the current exportfs maintainers
  which moves exportfs under vfs with Chuck, Jeff, and Amir as
  maintainers and vfs.git as tree"

* tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  MAINTAINERS: create an entry for exportfs
  fs: fix build error with CONFIG_EXPORTFS=m or not defined
  freevxfs: derive f_fsid from bdev->bd_dev
  fs: report f_fsid from s_dev for "simple" filesystems
  exportfs: support encoding non-decodeable file handles by default
  exportfs: define FILEID_INO64_GEN* file handle types
  exportfs: make ->encode_fh() a mandatory method for NFS export
  exportfs: add helpers to check if filesystem can encode/decode file handles
2023-11-07 12:11:26 -08:00
Linus Torvalds
14ab6d425e Merge tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode time accessor updates from Christian Brauner:
 "This finishes the conversion of all inode time fields to accessor
  functions as discussed on list. Changing timestamps manually as we
  used to do before is error prone. Using accessors function makes this
  robust.

  It does not contain the switch of the time fields to discrete 64 bit
  integers to replace struct timespec and free up space in struct inode.
  But after this, the switch can be trivially made and the patch should
  only affect the vfs if we decide to do it"

* tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits)
  fs: rename inode i_atime and i_mtime fields
  security: convert to new timestamp accessors
  selinux: convert to new timestamp accessors
  apparmor: convert to new timestamp accessors
  sunrpc: convert to new timestamp accessors
  mm: convert to new timestamp accessors
  bpf: convert to new timestamp accessors
  ipc: convert to new timestamp accessors
  linux: convert to new timestamp accessors
  zonefs: convert to new timestamp accessors
  xfs: convert to new timestamp accessors
  vboxsf: convert to new timestamp accessors
  ufs: convert to new timestamp accessors
  udf: convert to new timestamp accessors
  ubifs: convert to new timestamp accessors
  tracefs: convert to new timestamp accessors
  sysv: convert to new timestamp accessors
  squashfs: convert to new timestamp accessors
  server: convert to new timestamp accessors
  client: convert to new timestamp accessors
  ...
2023-10-30 09:47:13 -10:00
Amir Goldstein
41d1ddd271 exportfs: define FILEID_INO64_GEN* file handle types
Similar to the common FILEID_INO32* file handle types, define common
FILEID_INO64* file handle types.

The type values of FILEID_INO64_GEN and FILEID_INO64_GEN_PARENT are the
values returned by fuse and xfs for 64bit ino encoded file handle types.

Note that these type value are filesystem specific and they do not define
a universal file handle format, for example:
fuse encodes FILEID_INO64_GEN as [ino-hi32,ino-lo32,gen] and xfs encodes
FILEID_INO64_GEN as [hostr-order-ino64,gen] (a.k.a xfs_fid64).

The FILEID_INO64_GEN fhandle type is going to be used for file ids for
fanotify from filesystems that do not support NFS export.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231023180801.2953446-4-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28 16:16:18 +02:00
Jeff Layton
3c0d5df2d0 fuse: convert to new timestamp accessors
Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-37-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-18 14:08:21 +02:00
Wedson Almeida Filho
34271edb18 fuse: move fuse_xattr_handlers to .rodata
This makes it harder for accidental or malicious changes to
fuse_xattr_handlers at runtime.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
Link: https://lore.kernel.org/r/20230930050033.41174-12-wedsonaf@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-09 16:24:18 +02:00
Linus Torvalds
9e310ea5c8 Merge tag 'fuse-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:

 - Revert non-waiting FLUSH due to a regression

 - Fix a lookup counter leak in readdirplus

 - Add an option to allow shared mmaps in no-cache mode

 - Add btime support and statx intrastructure to the protocol

 - Invalidate positive/negative dentry on failed create/delete

* tag 'fuse-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: conditionally fill kstat in fuse_do_statx()
  fuse: invalidate dentry on EEXIST creates or ENOENT deletes
  fuse: cache btime
  fuse: implement statx
  fuse: add ATTR_TIMEOUT macro
  fuse: add STATX request
  fuse: handle empty request_mask in statx
  fuse: write back dirty pages before direct write in direct_io_relax mode
  fuse: add a new fuse init flag to relax restrictions in no cache mode
  fuse: invalidate page cache pages before direct write
  fuse: nlookup missing decrement in fuse_direntplus_link
  Revert "fuse: in fuse_flush only wait if someone wants the return code"
2023-09-05 12:45:55 -07:00
Linus Torvalds
b96a3e9142 Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has doe some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow ->huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio->swap instead of page->private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page->private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
2023-08-29 14:25:26 -07:00
Bernd Schubert
f73016b63b fuse: conditionally fill kstat in fuse_do_statx()
The code path

fuse_update_attributes
    fuse_update_get_attr
        fuse_do_statx

has the risk to use a NULL pointer for struct kstat *stat, although current
callers of fuse_update_attributes() only set request_mask to values that
will trigger the call of fuse_do_getattr(), which already handles the NULL
pointer.  Future updates might miss that fuse_do_statx() does not handle it
it is safer to add a condition already right now.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Fixes: d3045530bd ("fuse: implement statx")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-29 14:58:48 +02:00
Linus Torvalds
615e95831e Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs timestamp updates from Christian Brauner:
 "This adds VFS support for multi-grain timestamps and converts tmpfs,
  xfs, ext4, and btrfs to use them. This carries acks from all relevant
  filesystems.

  The VFS always uses coarse-grained timestamps when updating the ctime
  and mtime after a change. This has the benefit of allowing filesystems
  to optimize away a lot of metadata updates, down to around 1 per
  jiffy, even when a file is under heavy writes.

  Unfortunately, this has always been an issue when we're exporting via
  NFSv3, which relies on timestamps to validate caches. A lot of changes
  can happen in a jiffy, so timestamps aren't sufficient to help the
  client decide to invalidate the cache.

  Even with NFSv4, a lot of exported filesystems don't properly support
  a change attribute and are subject to the same problems with timestamp
  granularity. Other applications have similar issues with timestamps
  (e.g., backup applications).

  If we were to always use fine-grained timestamps, that would improve
  the situation, but that becomes rather expensive, as the underlying
  filesystem would have to log a lot more metadata updates.

  This introduces fine-grained timestamps that are used when they are
  actively queried.

  This uses the 31st bit of the ctime tv_nsec field to indicate that
  something has queried the inode for the mtime or ctime. When this flag
  is set, on the next mtime or ctime update, the kernel will fetch a
  fine-grained timestamp instead of the usual coarse-grained one.

  As POSIX generally mandates that when the mtime changes, the ctime
  must also change the kernel always stores normalized ctime values, so
  only the first 30 bits of the tv_nsec field are ever used.

  Filesytems can opt into this behavior by setting the FS_MGTIME flag in
  the fstype. Filesystems that don't set this flag will continue to use
  coarse-grained timestamps.

  Various preparatory changes, fixes and cleanups are included:

   - Fixup all relevant places where POSIX requires updating ctime
     together with mtime. This is a wide-range of places and all
     maintainers provided necessary Acks.

   - Add new accessors for inode->i_ctime directly and change all
     callers to rely on them. Plain accesses to inode->i_ctime are now
     gone and it is accordingly rename to inode->__i_ctime and commented
     as requiring accessors.

   - Extend generic_fillattr() to pass in a request mask mirroring in a
     sense the statx() uapi. This allows callers to pass in a request
     mask to only get a subset of attributes filled in.

   - Rework timestamp updates so it's possible to drop the @now
     parameter the update_time() inode operation and associated helpers.

   - Add inode_update_timestamps() and convert all filesystems to it
     removing a bunch of open-coding"

* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
  btrfs: convert to multigrain timestamps
  ext4: switch to multigrain timestamps
  xfs: switch to multigrain timestamps
  tmpfs: add support for multigrain timestamps
  fs: add infrastructure for multigrain timestamps
  fs: drop the timespec64 argument from update_time
  xfs: have xfs_vn_update_time gets its own timestamp
  fat: make fat_update_time get its own timestamp
  fat: remove i_version handling from fat_update_time
  ubifs: have ubifs_update_time use inode_update_timestamps
  btrfs: have it use inode_update_timestamps
  fs: drop the timespec64 arg from generic_update_time
  fs: pass the request_mask to generic_fillattr
  fs: remove silly warning from current_time
  gfs2: fix timestamp handling on quota inodes
  fs: rename i_ctime field to __i_ctime
  selinux: convert to ctime accessor functions
  security: convert to ctime accessor functions
  apparmor: convert to ctime accessor functions
  sunrpc: convert to ctime accessor functions
  ...
2023-08-28 09:31:32 -07:00
Matthew Wilcox (Oracle)
1d024e7a8d mm: remove enum page_entry_size
Remove the unnecessary encoding of page order into an enum and pass the
page order directly.  That lets us get rid of pe_order().

The switch constructs have to be changed to if/else constructs to prevent
GCC from warning on builds with 3-level page tables where PMD_ORDER and
PUD_ORDER have the same value.

If you are looking at this commit because your driver stopped compiling,
look at the previous commit as well and audit your driver to be sure it
doesn't depend on mmap_lock being held in its ->huge_fault method.

[willy@infradead.org: use "order %u" to match the (non dev_t) style]
  Link: https://lkml.kernel.org/r/ZOUYekbtTv+n8hYf@casper.infradead.org
Link: https://lkml.kernel.org/r/20230818202335.2739663-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:30 -07:00
Jiachen Zhang
7d875e6685 fuse: invalidate dentry on EEXIST creates or ENOENT deletes
The EEXIST errors returned from server are strong sign that a local
negative dentry should be invalidated.  Similarly, The ENOENT errors from
server can also be a sign of revalidate failure.

This commit invalidates dentries on EEXIST creates and ENOENT deletes by
calling fuse_invalidate_entry(), which improves the consistency with no
performance degradation.

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-21 12:14:59 +02:00
Miklos Szeredi
972f4c46d0 fuse: cache btime
Not all inode attributes are supported by all filesystems, but for the
basic stats (which are returned by stat(2) and friends) all of them will
have some value, even if that doesn't reflect a real attribute of the file.

Btime is different, in that filesystems are free to report or not report a
value in statx.  If the value is available, then STATX_BTIME bit is set in
stx_mask.

When caching the value of btime, remember the availability of the attribute
as well as the value (if available).  This is done by using the
FUSE_I_BTIME bit in fuse_inode->state to indicate availability, while using
fuse_inode->inval_mask & STATX_BTIME to indicate the state of the cache
itself (i.e. set if cache is invalid, and cleared if cache is valid).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-21 12:14:59 +02:00
Miklos Szeredi
d3045530bd fuse: implement statx
Allow querying btime.  When btime is requested in mask, then FUSE_STATX
request is sent.  Otherwise keep using FUSE_GETATTR.

The userspace interface for statx matches that of the statx(2) API.
However there are limitations on how this interface is used:

 - returned basic stats and btime are used, stx_attributes, etc. are
   ignored

 - always query basic stats and btime, regardless of what was requested

 - requested sync type is ignored, the default is passed to the server

 - if server returns with some attributes missing from the result_mask,
   then no attributes will be cached

 - btime is not cached yet (next patch will fix that)

For new inodes initialize fi->inval_mask to "all invalid", instead of "all
valid" as previously.  Also only clear basic stats from inval_mask when
caching attributes.  This will result in the caching logic not thinking
that btime is cached.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-21 12:14:49 +02:00
Miklos Szeredi
9dc10a54ab fuse: add ATTR_TIMEOUT macro
Next patch will introduce yet another type attribute reply.  Add a macro
that can handle attribute timeouts for all of the structs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-16 12:39:41 +02:00
Miklos Szeredi
8d8f9c4b8d fuse: handle empty request_mask in statx
If no attribute is requested, then don't send request to userspace.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-16 12:39:24 +02:00
Hao Xu
b5a2a3a0b7 fuse: write back dirty pages before direct write in direct_io_relax mode
In direct_io_relax mode, there can be shared mmaped files and thus dirty
pages in its page cache.  Therefore those dirty pages should be written
back to backend before direct io to avoid data loss.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Reviewed-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-16 12:39:24 +02:00
Hao Xu
e78662e818 fuse: add a new fuse init flag to relax restrictions in no cache mode
FOPEN_DIRECT_IO is usually set by fuse daemon to indicate need of strong
coherency, e.g. network filesystems.  Thus shared mmap is disabled since it
leverages page cache and may write to it, which may cause inconsistence.

But FOPEN_DIRECT_IO can be used not for coherency but to reduce memory
footprint as well, e.g. reduce guest memory usage with virtiofs.
Therefore, add a new fuse init flag FUSE_DIRECT_IO_RELAX to relax
restrictions in that mode, currently, it allows shared mmap.  One thing to
note is to make sure it doesn't break coherency in your use case.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-16 12:39:16 +02:00
Hao Xu
80e4f25262 fuse: invalidate page cache pages before direct write
In FOPEN_DIRECT_IO, page cache may still be there for a file since private
mmap is allowed.  Direct write should respect that and invalidate the
corresponding pages so that page cache readers don't get stale data.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Tested-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-16 12:20:37 +02:00
ruanmeisi
b8bd342d50 fuse: nlookup missing decrement in fuse_direntplus_link
During our debugging of glusterfs, we found an Assertion failed error:
inode_lookup >= nlookup, which was caused by the nlookup value in the
kernel being greater than that in the FUSE file system.

The issue was introduced by fuse_direntplus_link, where in the function,
fuse_iget increments nlookup, and if d_splice_alias returns failure,
fuse_direntplus_link returns failure without decrementing nlookup
https://github.com/gluster/glusterfs/pull/4081

Signed-off-by: ruanmeisi <ruan.meisi@zte.com.cn>
Fixes: 0b05b18381 ("fuse: implement NFS-like readdirplus support")
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-16 09:40:48 +02:00
Miklos Szeredi
91ec6c8559 Revert "fuse: in fuse_flush only wait if someone wants the return code"
This reverts commit 5a8bee63b1.

Jürg Billeter reports the following regression:

  Since v6.3-rc1 commit 5a8bee63b1 ("fuse: in fuse_flush only wait if
  someone wants the return code") `fput()` is called asynchronously if a
  file is closed as part of a process exiting, i.e., if there was no
  explicit `close()` before exit.

  If the file was open for writing, also `put_write_access()` is called
  asynchronously as part of the async `fput()`.

  If that newly written file is an executable, attempting to `execve()` the
  new file can fail with `ETXTBSY` if it's called after the writer process
  exited but before the async `fput()` has run.

Reported-and-tested-by: "Jürg Billeter" <j@bitron.ch>
Cc: <stable@vger.kernel.org> # v6.3
Link: https://lore.kernel.org/all/4f66cded234462964899f2a661750d6798a57ec0.camel@bitron.ch/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-08-14 16:23:06 +02:00
Jeff Layton
0d72b92883 fs: pass the request_mask to generic_fillattr
generic_fillattr just fills in the entire stat struct indiscriminately
today, copying data from the inode. There is at least one attribute
(STATX_CHANGE_COOKIE) that can have side effects when it is reported,
and we're looking at adding more with the addition of multigrain
timestamps.

Add a request_mask argument to generic_fillattr and have most callers
just pass in the value that is passed to getattr. Have other callers
(e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of
STATX_CHANGE_COOKIE into generic_fillattr.

Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09 08:56:36 +02:00
Jeff Layton
ceb2d5e92f fuse: convert to ctime accessor functions
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-44-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-24 10:29:59 +02:00
Linus Torvalds
bfa3037d82 Merge tag 'fuse-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "Small but important fixes and a trivial cleanup"

* tag 'fuse-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: ioctl: translate ENOSYS in outarg
  fuse: revalidate: don't invalidate if interrupted
  fuse: Apply flags2 only when userspace set the FUSE_INIT_EXT
  fuse: remove duplicate check for nodeid
  fuse: add feature flag for expire-only
2023-07-19 11:00:27 -07:00
Linus Torvalds
0a1c979c6b Merge tag 'libnvdimm-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull nvdimm and DAX updates from Vishal Verma:
 "This is mostly small cleanups and fixes, with the biggest change being
  the change to the DAX fault handler allowing it to return
  VM_FAULT_HWPOISON.

  Summary:

   - DAX fixes and cleanups including a use after free, extra
     references, and device unregistration, and a redundant variable.

   - Allow the DAX fault handler to return VM_FAULT_HWPOISON

   - A few libnvdimm cleanups such as making some functions and
     variables static where sufficient.

   - Add a few missing prototypes for wrapped functions in
     tools/testing/nvdimm"

* tag 'libnvdimm-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: enable dax fault handler to report VM_FAULT_HWPOISON
  nvdimm: make security_show static
  nvdimm: make nd_class variable static
  dax/kmem: Pass valid argument to memory_group_register_static
  fsdax: remove redundant variable 'error'
  dax: Cleanup extra dax_region references
  dax: Introduce alloc_dev_dax_id()
  dax: Use device_unregister() in unregister_dax_mapping()
  dax: Fix dax_mapping_release() use after free
  tools/testing/nvdimm: Drop empty platform remove function
  libnvdimm: mark 'security_show' static again
  testing: nvdimm: add missing prototypes for wrapped functions
  dax: fix missing-prototype warnings
2023-07-01 08:48:02 -07:00
Linus Torvalds
6e17c6de3d Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:

 - Yosry Ahmed brought back some cgroup v1 stats in OOM logs

 - Yosry has also eliminated cgroup's atomic rstat flushing

 - Nhat Pham adds the new cachestat() syscall. It provides userspace
   with the ability to query pagecache status - a similar concept to
   mincore() but more powerful and with improved usability

 - Mel Gorman provides more optimizations for compaction, reducing the
   prevalence of page rescanning

 - Lorenzo Stoakes has done some maintanance work on the
   get_user_pages() interface

 - Liam Howlett continues with cleanups and maintenance work to the
   maple tree code. Peng Zhang also does some work on maple tree

 - Johannes Weiner has done some cleanup work on the compaction code

 - David Hildenbrand has contributed additional selftests for
   get_user_pages()

 - Thomas Gleixner has contributed some maintenance and optimization
   work for the vmalloc code

 - Baolin Wang has provided some compaction cleanups,

 - SeongJae Park continues maintenance work on the DAMON code

 - Huang Ying has done some maintenance on the swap code's usage of
   device refcounting

 - Christoph Hellwig has some cleanups for the filemap/directio code

 - Ryan Roberts provides two patch series which yield some
   rationalization of the kernel's access to pte entries - use the
   provided APIs rather than open-coding accesses

 - Lorenzo Stoakes has some fixes to the interaction between pagecache
   and directio access to file mappings

 - John Hubbard has a series of fixes to the MM selftesting code

 - ZhangPeng continues the folio conversion campaign

 - Hugh Dickins has been working on the pagetable handling code, mainly
   with a view to reducing the load on the mmap_lock

 - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment
   from 128 to 8

 - Domenico Cerasuolo has improved the zswap reclaim mechanism by
   reorganizing the LRU management

 - Matthew Wilcox provides some fixups to make gfs2 work better with the
   buffer_head code

 - Vishal Moola also has done some folio conversion work

 - Matthew Wilcox has removed the remnants of the pagevec code - their
   functionality is migrated over to struct folio_batch

* tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits)
  mm/hugetlb: remove hugetlb_set_page_subpool()
  mm: nommu: correct the range of mmap_sem_read_lock in task_mem()
  hugetlb: revert use of page_cache_next_miss()
  Revert "page cache: fix page_cache_next/prev_miss off by one"
  mm/vmscan: fix root proactive reclaim unthrottling unbalanced node
  mm: memcg: rename and document global_reclaim()
  mm: kill [add|del]_page_to_lru_list()
  mm: compaction: convert to use a folio in isolate_migratepages_block()
  mm: zswap: fix double invalidate with exclusive loads
  mm: remove unnecessary pagevec includes
  mm: remove references to pagevec
  mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
  mm: remove struct pagevec
  net: convert sunrpc from pagevec to folio_batch
  i915: convert i915_gpu_error to use a folio_batch
  pagevec: rename fbatch_count()
  mm: remove check_move_unevictable_pages()
  drm: convert drm_gem_put_pages() to use a folio_batch
  i915: convert shmem_sg_free_table() to use a folio_batch
  scatterlist: add sg_set_folio()
  ...
2023-06-28 10:28:11 -07:00