Commit Graph

74388 Commits

Author SHA1 Message Date
Linus Torvalds 3acbdbf42e Merge tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax and libnvdimm updates from Dan Williams:
 "The bulk of this is a rework of the dax_operations API after
  discovering the obstacles it posed to the work-in-progress DAX+reflink
  support for XFS and other copy-on-write filesystem mechanics.

  Primarily the need to plumb a block_device through the API to handle
  partition offsets was a sticking point and Christoph untangled that
  dependency in addition to other cleanups to make landing the
  DAX+reflink support easier.

  The DAX_PMEM_COMPAT option has been around for 4 years and not only
  are distributions shipping userspace that understand the current
  configuration API, but some are not even bothering to turn this option
  on anymore, so it seems a good time to remove it per the deprecation
  schedule. Recall that this was added after the device-dax subsystem
  moved from /sys/class/dax to /sys/bus/dax for its sysfs organization.
  All recent functionality depends on /sys/bus/dax.

  Some other miscellaneous cleanups and reflink prep patches are
  included as well.

  Summary:

   - Simplify the dax_operations API:

      - Eliminate bdev_dax_pgoff() in favor of the filesystem
        maintaining and applying a partition offset to all its DAX iomap
        operations.

      - Remove wrappers and device-mapper stacked callbacks for
        ->copy_from_iter() and ->copy_to_iter() in favor of moving
        block_device relative offset responsibility to the
        dax_direct_access() caller.

      - Remove the need for an @bdev in filesystem-DAX infrastructure

      - Remove unused uio helpers copy_from_iter_flushcache() and
        copy_mc_to_iter() as only the non-check_copy_size() versions are
        used for DAX.

   - Prepare XFS for the pending (next merge window) DAX+reflink support

   - Remove deprecated DEV_DAX_PMEM_COMPAT support

   - Cleanup a straggling misuse of the GUID api"

* tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits)
  iomap: Fix error handling in iomap_zero_iter()
  ACPI: NFIT: Import GUID before use
  dax: remove the copy_from_iter and copy_to_iter methods
  dax: remove the DAXDEV_F_SYNC flag
  dax: simplify dax_synchronous and set_dax_synchronous
  uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
  iomap: turn the byte variable in iomap_zero_iter into a ssize_t
  memremap: remove support for external pgmap refcounts
  fsdax: don't require CONFIG_BLOCK
  iomap: build the block based code conditionally
  dax: fix up some of the block device related ifdefs
  fsdax: shift partition offset handling into the file systems
  dax: return the partition offset from fs_dax_get_by_bdev
  iomap: add a IOMAP_DAX flag
  xfs: pass the mapping flags to xfs_bmbt_to_iomap
  xfs: use xfs_direct_write_iomap_ops for DAX zeroing
  xfs: move dax device handling into xfs_{alloc,free}_buftarg
  ext4: cleanup the dax handling in ext4_fill_super
  ext2: cleanup the dax handling in ext2_fill_super
  fsdax: decouple zeroing from the iomap buffered I/O code
  ...
2022-01-12 15:46:11 -08:00
Linus Torvalds 8834147f95 Merge tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache rewrite from David Howells:
 "This is a set of patches that rewrites the fscache driver and the
  cachefiles driver, significantly simplifying the code compared to
  what's upstream, removing the complex operation scheduling and object
  state machine in favour of something much smaller and simpler.

  The series is structured such that the first few patches disable
  fscache use by the network filesystems using it, remove the cachefiles
  driver entirely and as much of the fscache driver as can be got away
  with without causing build failures in the network filesystems.

  The patches after that recreate fscache and then cachefiles,
  attempting to add the pieces in a logical order. Finally, the
  filesystems are reenabled and then the very last patch changes the
  documentation.

  [!] Note: I have dropped the cifs patch for the moment, leaving local
      caching in cifs disabled. I've been having trouble getting that
      working. I think I have it done, but it needs more testing (there
      seem to be some test failures occurring with v5.16 also from
      xfstests), so I propose deferring that patch to the end of the
      merge window.

  WHY REWRITE?
  ============

  Fscache's operation scheduling API was intended to handle sequencing
  of cache operations, which were all required (where possible) to run
  asynchronously in parallel with the operations being done by the
  network filesystem, whilst allowing the cache to be brought online and
  offline and to interrupt service for invalidation.

  With the advent of the tmpfile capacity in the VFS, however, an
  opportunity arises to do invalidation much more simply, without having
  to wait for I/O that's actually in progress: Cachefiles can simply
  create a tmpfile, cut over the file pointer for the backing object
  attached to a cookie and abandon the in-progress I/O, dismissing it
  upon completion.

  Future work here would involve using Omar Sandoval's vfs_link() with
  AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new
  hard link from a tmpfile as currently I have to unlink the old file
  first.

  These patches can also simplify the object state handling as I/O
  operations to the cache don't all have to be brought to a stop in
  order to invalidate a file. To that end, and with an eye on to writing
  a new backing cache model in the future, I've taken the opportunity to
  simplify the indexing structure.

  I've separated the index cookie concept from the file cookie concept
  by C type now. The former is now called a "volume cookie" (struct
  fscache_volume) and there is a container of file cookies. There are
  then just the two levels. All the index cookie levels are collapsed
  into a single volume cookie, and this has a single printable string as
  a key. For instance, an AFS volume would have a key of something like
  "afs,example.com,1000555", combining the filesystem name, cell name
  and volume ID. This is freeform, but must not have '/' chars in it.

  I've also eliminated all pointers back from fscache into the network
  filesystem. This required the duplication of a little bit of data in
  the cookie (cookie key, coherency data and file size), but it's not
  actually that much. This gets rid of problems with making sure we keep
  netfs data structures around so that the cache can access them.

  These patches mean that most of the code that was in the drivers
  before is simply gone and those drivers are now almost entirely new
  code. That being the case, there doesn't seem any particular reason to
  try and maintain bisectability across it. Further, there has to be a
  point in the middle where things are cut over as there's a single
  point everything has to go through (ie. /dev/cachefiles) and it can't
  be in use by two drivers at once.

  ISSUES YET OUTSTANDING
  ======================

  There are some issues still outstanding, unaddressed by this patchset,
  that will need fixing in future patchsets, but that don't stop this
  series from being usable:

  (1) The cachefiles driver needs to stop using the backing filesystem's
      metadata to store information about what parts of the cache are
      populated. This is not reliable with modern extent-based
      filesystems.

      Fixing this is deferred to a separate patchset as it involves
      negotiation with the network filesystem and the VM as to how much
      data to download to fulfil a read - which brings me on to (2)...

  (2) NFS (and CIFS with the dropped patch) do not take account of how
      the cache would like I/O to be structured to meet its granularity
      requirements. Previously, the cache used page granularity, which
      was fine as the network filesystems also dealt in page
      granularity, and the backing filesystem (ext4, xfs or whatever)
      did whatever it did out of sight. However, we now have folios to
      deal with and the cache will now have to store its own metadata to
      track its contents.

      The change I'm looking at making for cachefiles is to store
      content bitmaps in one or more xattrs and making a bit in the map
      correspond to something like a 256KiB block. However, the size of
      an xattr and the fact that they have to be read/updated in one go
      means that I'm looking at covering 1GiB of data per 512-byte map
      and storing each map in an xattr. Cachefiles has the potential to
      grow into a fully fledged filesystem of its very own if I'm not
      careful.

      However, I'm also looking at changing things even more radically
      and going to a different model of how the cache is arranged and
      managed - one that's more akin to the way, say, openafs does
      things - which brings me on to (3)...

  (3) The way cachefilesd does culling is very inefficient for large
      caches and it would be better to move it into the kernel if I can
      as cachefilesd has to keep asking the kernel if it can cull a
      file. Changing the way the backend works would allow this to be
      addressed.

  BITS THAT MAY BE CONTROVERSIAL
  ==============================

  There are some bits I've added that may be controversial:

  (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check
      if a files is already being used by some other kernel service
      (e.g. a duplicate cachefiles cache in the same directory) and
      reject it if it is. This isn't entirely necessary, but it helps
      prevent accidental data corruption.

      I don't want to use S_SWAPFILE as that has other effects, but
      quite possibly swapon() should set S_KERNEL_FILE too.

      Note that it doesn't prevent userspace from interfering, though
      perhaps it should. (I have made it prevent a marked directory from
      being rmdir-able).

  (2) Cachefiles wants to keep the backing file for a cookie open whilst
      we might need to write to it from network filesystem writeback.
      The problem is that the network filesystem unuses its cookie when
      its file is closed, and so we have nothing pinning the cachefiles
      file open and it will get closed automatically after a short time
      to avoid EMFILE/ENFILE problems.

      Reopening the cache file, however, is a problem if this is being
      done due to writeback triggered by exit(). Some filesystems will
      oops if we try to open a file in that context because they want to
      access current->fs or suchlike.

      To get around this, I added the following:

      (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
          filesystem inode to indicate that we have a usage count on the
          cookie caching that inode.

      (B) A flag in struct writeback_control, unpinned_fscache_wb, that
          is set when __writeback_single_inode() clears the last dirty
          page from i_pages - at which point it clears
          I_PINNING_FSCACHE_WB and sets this flag.

          This has to be done here so that clearing I_PINNING_FSCACHE_WB
          can be done atomically with the check of PAGECACHE_TAG_DIRTY
          that clears I_DIRTY_PAGES.

      (C) A function, fscache_set_page_dirty(), which if it is not set,
          sets I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to
          pin the cache resources.

      (D) A function, fscache_unpin_writeback(), to be called by
          ->write_inode() to unuse the cookie.

      (E) A function, fscache_clear_inode_writeback(), to be called when
          the inode is evicted, before clear_inode() is called. This
          cleans up any lingering I_PINNING_FSCACHE_WB.

      The network filesystem can then use these tools to make sure that
      fscache_write_to_cache() can write locally modified data to the
      cache as well as to the server.

      For the future, I'm working on write helpers for netfs lib that
      should allow this facility to be removed by keeping track of the
      dirty regions separately - but that's incomplete at the moment and
      is also going to be affected by folios, one way or another, since
      it deals with pages"

Link: https://lore.kernel.org/all/510611.1641942444@warthog.procyon.org.uk/
Tested-by: Dominique Martinet <asmadeus@codewreck.org> # 9p
Tested-by: kafs-testing@auristor.com # afs
Tested-by: Jeff Layton <jlayton@kernel.org> # ceph
Tested-by: Dave Wysochanski <dwysocha@redhat.com> # nfs
Tested-by: Daire Byrne <daire@dneg.com> # nfs

* tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (67 commits)
  9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
  fscache: Add a tracepoint for cookie use/unuse
  fscache: Rewrite documentation
  ceph: add fscache writeback support
  ceph: conversion to new fscache API
  nfs: Implement cache I/O by accessing the cache directly
  nfs: Convert to new fscache volume/cookie API
  9p: Copy local writes to the cache when writing to the server
  9p: Use fscache indexing rewrite and reenable caching
  afs: Skip truncation on the server of data we haven't written yet
  afs: Copy local writes to the cache when writing to the server
  afs: Convert afs to use the new fscache API
  fscache, cachefiles: Display stat of culling events
  fscache, cachefiles: Display stats of no-space events
  cachefiles: Allow cachefiles to actually function
  fscache, cachefiles: Store the volume coherency data
  cachefiles: Implement the I/O routines
  cachefiles: Implement cookie resize for truncate
  cachefiles: Implement begin and end I/O operation
  cachefiles: Implement backing file wrangling
  ...
2022-01-12 13:45:12 -08:00
Linus Torvalds 8975f89748 Merge tag 'fuse-update-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:

 - Fix a regression introduced in 5.15

 - Extend the size of the FUSE_INIT request to accommodate for more
   flags. There's a slight possibility of a regression for obscure fuse
   servers; if this happens, then more complexity will need to be added
   to the protocol

 - Allow the DAX property to be controlled by the server on a per-inode
   basis in virtiofs

 - Allow sending security context to the server when creating a file or
   directory

* tag 'fuse-update-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  Documentation/filesystem/dax: DAX on virtiofs
  fuse: mark inode DONT_CACHE when per inode DAX hint changes
  fuse: negotiate per inode DAX in FUSE_INIT
  fuse: enable per inode DAX
  fuse: support per inode DAX in fuse protocol
  fuse: make DAX mount option a tri-state
  fuse: add fuse_should_enable_dax() helper
  fuse: Pass correct lend value to filemap_write_and_wait_range()
  fuse: send security context of inode on file
  fuse: extend init flags
2022-01-12 13:30:58 -08:00
Linus Torvalds 1fb38c934c Merge tag 'fs_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF / reiserfs updates from Jan Kara:
 "One UDF fix and one reiserfs cleanup"

* tag 'fs_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fix error handling in udf_new_inode()
  reiserfs: don't use congestion_wait()
2022-01-12 13:28:06 -08:00
Linus Torvalds 3d3d673306 Merge tag 'fsnotify_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify updates from Jan Kara:
 "Support for new FAN_RENAME fanotify event and support for reporting
  child info in directory fanotify events (FAN_REPORT_TARGET_FID)"

* tag 'fsnotify_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: wire up FAN_RENAME event
  fanotify: report old and/or new parent+name in FAN_RENAME event
  fanotify: record either old name new name or both for FAN_RENAME
  fanotify: record old and new parent and name in FAN_RENAME event
  fanotify: support secondary dir fh and name in fanotify_info
  fanotify: use helpers to parcel fanotify_info buffer
  fanotify: use macros to get the offset to fanotify_info buffer
  fsnotify: generate FS_RENAME event with rich information
  fanotify: introduce group flag FAN_REPORT_TARGET_FID
  fsnotify: separate mark iterator type from object type enum
  fsnotify: clarify object type argument
2022-01-12 13:19:35 -08:00
Linus Torvalds f079ab01b5 Merge tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux
Pull iomap updates from Matthew Wilcox:
 "Convert xfs/iomap to use folios.

  This should be all that is needed for XFS to use large folios. There
  is no code in this pull request to create large folios, but no
  additional changes should be needed to XFS or iomap once they are
  created.

  Usually this would have come from Darrick, and we had intended that it
  would come that route. Between the holidays and various things which
  Darrick needed to work on, he asked if I could send things directly.

  There weren't any other iomap patches pending for this release, which
  probably also played a role"

* tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux: (26 commits)
  iomap: Inline __iomap_zero_iter into its caller
  xfs: Support large folios
  iomap: Support large folios in invalidatepage
  iomap: Convert iomap_migrate_page() to use folios
  iomap: Convert iomap_add_to_ioend() to take a folio
  iomap: Simplify iomap_do_writepage()
  iomap: Simplify iomap_writepage_map()
  iomap,xfs: Convert ->discard_page to ->discard_folio
  iomap: Convert iomap_write_end_inline to take a folio
  iomap: Convert iomap_write_begin() and iomap_write_end() to folios
  iomap: Convert __iomap_zero_iter to use a folio
  iomap: Allow iomap_write_begin() to be called with the full length
  iomap: Convert iomap_page_mkwrite to use a folio
  iomap: Convert readahead and readpage to use a folio
  iomap: Convert iomap_read_inline_data to take a folio
  iomap: Use folio offsets instead of page offsets
  iomap: Convert bio completions to use folios
  iomap: Pass the iomap_page into iomap_set_range_uptodate
  iomap: Add iomap_invalidate_folio
  iomap: Convert iomap_releasepage to use a folio
  ...
2022-01-12 12:51:41 -08:00
Linus Torvalds 6020c204be Merge tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache
Pull folio conversion updates from Matthew Wilcox:
 "Convert much of the page cache to use folios

  This stops just short of actually enabling large folios. It converts
  everything that I noticed needs to be converted, but there may still
  be places I've overlooked which still have page size assumptions.

  The big change here is using large entries in the page cache XArray
  instead of many small entries. That only affects shmem for now, but
  it's a pretty big change for shmem since it changes where memory needs
  to be allocated (at split time instead of insertion)"

* tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache: (49 commits)
  mm: Use multi-index entries in the page cache
  XArray: Add xas_advance()
  truncate,shmem: Handle truncates that split large folios
  truncate: Convert invalidate_inode_pages2_range to folios
  fs: Convert vfs_dedupe_file_range_compare to folios
  mm: Remove pagevec_remove_exceptionals()
  mm: Convert find_lock_entries() to use a folio_batch
  filemap: Return only folios from find_get_entries()
  filemap: Convert filemap_get_read_batch() to use a folio_batch
  filemap: Convert filemap_read() to use a folio
  truncate: Add invalidate_complete_folio2()
  truncate: Convert invalidate_inode_pages2_range() to use a folio
  truncate: Skip known-truncated indices
  truncate,shmem: Add truncate_inode_folio()
  shmem: Convert part of shmem_undo_range() to use a folio
  mm: Add unmap_mapping_folio()
  truncate: Add truncate_cleanup_folio()
  filemap: Add filemap_release_folio()
  filemap: Use a folio in filemap_page_mkwrite
  filemap: Use a folio in filemap_map_pages
  ...
2022-01-12 12:37:02 -08:00
Linus Torvalds 6dc69d3d0d Merge tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here is the set of changes for the driver core for 5.17-rc1.

  Lots of little things here, including:

   - kobj_type cleanups

   - auxiliary_bus documentation updates

   - auxiliary_device conversions for some drivers (relevant subsystems
     all have provided acks for these)

   - kernfs lock contention reduction for some workloads

   - other tiny cleanups and changes.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits)
  kobject documentation: remove default_attrs information
  drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb
  debugfs: lockdown: Allow reading debugfs files that are not world readable
  driver core: Make bus notifiers in right order in really_probe()
  driver core: Move driver_sysfs_remove() after driver_sysfs_add()
  firmware: edd: remove empty default_attrs array
  firmware: dmi-sysfs: use default_groups in kobj_type
  qemu_fw_cfg: use default_groups in kobj_type
  firmware: memmap: use default_groups in kobj_type
  sh: sq: use default_groups in kobj_type
  headers/uninline: Uninline single-use function: kobject_has_children()
  devtmpfs: mount with noexec and nosuid
  driver core: Simplify async probe test code by using ktime_ms_delta()
  nilfs2: use default_groups in kobj_type
  kobject: remove kset from struct kset_uevent_ops callbacks
  driver core: make kobj_type constant.
  driver core: platform: document registration-failure requirement
  vdpa/mlx5: Use auxiliary_device driver data helpers
  net/mlx5e: Use auxiliary_device driver data helpers
  soundwire: intel: Use auxiliary_device driver data helpers
  ...
2022-01-12 11:11:34 -08:00
Linus Torvalds d3c8108035 Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:

 - Unify where the struct request handling code is located in the blk-mq
   code (Christoph)

 - Header cleanups (Christoph)

 - Clean up the io_context handling code (Christoph, me)

 - Get rid of ->rq_disk in struct request (Christoph)

 - Error handling fix for add_disk() (Christoph)

 - request allocation cleanusp (Christoph)

 - Documentation updates (Eric, Matthew)

 - Remove trivial crypto unregister helper (Eric)

 - Reduce shared tag overhead (John)

 - Reduce poll_stats memory overhead (me)

 - Known indirect function call for dio (me)

 - Use atomic references for struct request (me)

 - Support request list issue for block and NVMe (me)

 - Improve queue dispatch pinning (Ming)

 - Improve the direct list issue code (Keith)

 - BFQ improvements (Jan)

 - Direct completion helper and use it in mmc block (Sebastian)

 - Use raw spinlock for the blktrace code (Wander)

 - fsync error handling fix (Ye)

 - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me)

* tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits)
  MAINTAINERS: add entries for block layer documentation
  docs: block: remove queue-sysfs.rst
  docs: sysfs-block: document virt_boundary_mask
  docs: sysfs-block: document stable_writes
  docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
  docs: sysfs-block: add contact for nomerges
  docs: sysfs-block: sort alphabetically
  docs: sysfs-block: move to stable directory
  block: don't protect submit_bio_checks by q_usage_counter
  block: fix old-style declaration
  nvme-pci: fix queue_rqs list splitting
  block: introduce rq_list_move
  block: introduce rq_list_for_each_safe macro
  block: move rq_list macros to blk-mq.h
  block: drop needless assignment in set_task_ioprio()
  block: remove unnecessary trailing '\'
  bio.h: fix kernel-doc warnings
  block: check minor range in device_add_disk()
  block: use "unsigned long" for blk_validate_block_size().
  block: fix error unwinding in device_add_disk
  ...
2022-01-12 10:26:52 -08:00
Linus Torvalds 42a7b4ed45 Merge tag 'for-5.17/io_uring-2022-01-11' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:

 - Support for prioritized work completions (Hao)

 - Simplification of reissue (Pavel)

 - Add support for CQE skip (Pavel)

 - Memory leak fix going to 5.15-stable (Pavel)

 - Re-write of internal poll. This both cleans up that code, and gets us
   ready to fix the POLLFREE issue (Pavel)

 - Various cleanups (GuoYong, Pavel, Hao)

* tag 'for-5.17/io_uring-2022-01-11' of git://git.kernel.dk/linux-block: (31 commits)
  io_uring: fix not released cached task refs
  io_uring: remove redundant tab space
  io_uring: remove unused function parameter
  io_uring: use completion batching for poll rem/upd
  io_uring: single shot poll removal optimisation
  io_uring: poll rework
  io_uring: kill poll linking optimisation
  io_uring: move common poll bits
  io_uring: refactor poll update
  io_uring: remove double poll on poll update
  io_uring: code clean for some ctx usage
  io_uring: batch completion in prior_task_list
  io_uring: split io_req_complete_post() and add a helper
  io_uring: add helper for task work execution code
  io_uring: add a priority tw list for irq completion work
  io-wq: add helper to merge two wq_lists
  io_uring: reuse io_req_task_complete for timeouts
  io_uring: tweak iopoll CQE_SKIP event counting
  io_uring: simplify selected buf handling
  io_uring: move up io_put_kbuf() and io_put_rw_kbuf()
  ...
2022-01-12 10:20:35 -08:00
Linus Torvalds f692121142 Merge tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:

 - set_fs removal

 - Devicetree support

 - Many cleanups from Al

 - Various virtio and build related fixes

* tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (31 commits)
  um: virtio_uml: Allow probing from devicetree
  um: Add devicetree support
  um: Extract load file helper from initrd.c
  um: remove set_fs
  hostfs: Fix writeback of dirty pages
  um: Use swap() to make code cleaner
  um: header debriding - sigio.h
  um: header debriding - os.h
  um: header debriding - net_*.h
  um: header debriding - mem_user.h
  um: header debriding - activate_ipi()
  um: common-offsets.h debriding...
  um, x86: bury crypto_tfm_ctx_offset
  um: unexport handle_page_fault()
  um: remove a dangling extern of syscall_trace()
  um: kill unused cpu()
  uml/i386: missing include in barrier.h
  um: stop polluting the namespace with registers.h contents
  logic_io instance of iounmap() needs volatile on argument
  um: move amd64 variant of mmap(2) to arch/x86/um/syscalls_64.c
  ...
2022-01-11 15:26:52 -08:00
Linus Torvalds 5672cdfba4 Merge tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull JFFS2, UBI and UBIFS updates from Richard Weinberger:
 "JFFS2:
   - Fix for a deadlock in jffs2_write_begin()

  UBI:
   - Fixes in comments

  UBIFS:
   - Expose error counters in sysfs
   - Many bugfixes found by Hulk Robot and others"

* tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  jffs2: GC deadlock reading a page that is used in jffs2_write_begin()
  ubifs: read-only if LEB may always be taken in ubifs_garbage_collect
  ubifs: fix double return leb in ubifs_garbage_collect
  ubifs: fix slab-out-of-bounds in ubifs_change_lp
  ubifs: fix snprintf() length check
  ubifs: Document sysfs nodes
  ubifs: Export filesystem error counters
  ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers
  ubifs: Make use of the helper macro kthread_run()
  ubi: Fix a mistake in comment
  ubifs: Fix spelling mistakes
2022-01-11 15:23:27 -08:00
Linus Torvalds 3f67eaed57 Merge tag 'dlm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
 "This set includes the normal collection of minor fixes and cleanups,
  new kmem caches for network messaging structs, a start on some basic
  tracepoints, and some new debugfs files for inserting test messages"

* tag 'dlm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (32 commits)
  fs: dlm: print cluster addr if non-cluster node connects
  fs: dlm: memory cache for lowcomms hotpath
  fs: dlm: memory cache for writequeue_entry
  fs: dlm: memory cache for midcomms hotpath
  fs: dlm: remove wq_alloc mutex
  fs: dlm: use event based wait for pending remove
  fs: dlm: check for pending users filling buffers
  fs: dlm: use list_empty() to check last iteration
  fs: dlm: fix build with CONFIG_IPV6 disabled
  fs: dlm: replace use of socket sk_callback_lock with sock_lock
  fs: dlm: don't call kernel_getpeername() in error_report()
  fs: dlm: fix potential buffer overflow
  fs: dlm:Remove unneeded semicolon
  fs: dlm: remove double list_first_entry call
  fs: dlm: filter user dlm messages for kernel locks
  fs: dlm: add lkb waiters debugfs functionality
  fs: dlm: add lkb debugfs functionality
  fs: dlm: allow create lkb with specific id range
  fs: dlm: add debugfs rawmsg send functionality
  fs: dlm: let handle callback data as void
  ...
2022-01-11 15:21:54 -08:00
Linus Torvalds 8481c323e4 Merge tag 'gfs2-v5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
 "Various minor gfs2 cleanups and fixes"

* tag 'gfs2-v5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: dump inode object for iopen glocks
  gfs2: Fix gfs2_instantiate description
  gfs2: Remove redundant check for GLF_INSTANTIATE_NEEDED
  gfs2: remove redundant set of INSTANTIATE_NEEDED
  gfs2: Fix __gfs2_holder_init function name in kernel-doc comment
2022-01-11 15:20:32 -08:00
Linus Torvalds 1dbfae0113 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Convert ext4 to use the new mount API, and add support for the
  FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls.

  In addition the usual large number of clean ups and bug fixes, in
  particular for the fast_commit feature"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (48 commits)
  ext4: don't use the orphan list when migrating an inode
  ext4: use BUG_ON instead of if condition followed by BUG
  ext4: fix a copy and paste typo
  ext4: set csum seed in tmp inode while migrating to extents
  ext4: remove unnecessary 'offset' assignment
  ext4: remove redundant o_start statement
  ext4: drop an always true check
  ext4: remove unused assignments
  ext4: remove redundant statement
  ext4: remove useless resetting io_end_size in mpage_process_page()
  ext4: allow to change s_last_trim_minblks via sysfs
  ext4: change s_last_trim_minblks type to unsigned long
  ext4: implement support for get/set fs label
  ext4: only set EXT4_MOUNT_QUOTA when journalled quota file is specified
  ext4: don't use kfree() on rcu protected pointer sbi->s_qf_names
  ext4: avoid trim error on fs with small groups
  ext4: fix an use-after-free issue about data=journal writeback mode
  ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits'
  ext4: initialize err_blk before calling __ext4_get_inode_loc
  ext4: fix a possible ABBA deadlock due to busy PA
  ...
2022-01-11 15:07:49 -08:00
Linus Torvalds 11fc88c2e4 Merge tag 'xfs-5.17-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
 "The big new feature here is that the mount code now only bothers to
  try to free stale COW staging extents if the fs unmounted uncleanly.
  This should reduce mount times, particularly on filesystems supporting
  reflink and containing a large number of allocation groups.

  Everything else this cycle are bugfixes, as the iomap folios
  conversion should be plenty enough excitement for anyone. That and I
  ran out of brain bandwidth after Thanksgiving last year.

  Summary:

   - Fix log recovery with da btree buffers when metauuid is in use.

   - Fix type coercion problems in xattr buffer size validation.

   - Fix a bug in online scrub dir leaf bestcount checking.

   - Only run COW recovery when recovering the log.

   - Fix symlink target buffer UAF problems and symlink locking problems
     by not exposing xfs innards to the VFS.

   - Fix incorrect quotaoff lock usage.

   - Don't let transactions cancel cleanly if they have deferred work
     items attached.

   - Fix a UAF when we're deciding if we need to relog an intent item.

   - Reduce kvmalloc overhead for log shadow buffers.

   - Clean up sysfs attr group usage.

   - Fix a bug where scrub's bmap/rmap checking could race with a quota
     file block allocation due to insufficient locking.

   - Teach scrub to complain about invalid project ids"

* tag 'xfs-5.17-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: warn about inodes with project id of -1
  xfs: hold quota inode ILOCK_EXCL until the end of dqalloc
  xfs: Remove redundant assignment of mp
  xfs: reduce kvmalloc overhead for CIL shadow buffers
  xfs: sysfs: use default_groups in kobj_type
  xfs: prevent UAF in xfs_log_item_in_current_chkpt
  xfs: prevent a WARN_ONCE() in xfs_ioc_attr_list()
  xfs: Fix comments mentioning xfs_ialloc
  xfs: check sb_meta_uuid for dabuf buffer recovery
  xfs: fix a bug in the online fsck directory leaf1 bestcount check
  xfs: only run COW extent recovery when there are no live extents
  xfs: don't expose internal symlink metadata buffers to the vfs
  xfs: fix quotaoff mutex usage now that we don't support disabling it
  xfs: shut down filesystem if we xfs_trans_cancel with deferred work items
2022-01-11 15:01:50 -08:00
Linus Torvalds d601e58c5f Merge tag 'for-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "This end of the year branch is intentionally not that exciting. Most
  of the changes are under the hood, but there are some minor user
  visible improvements and several performance improvements too.

  Features:

   - make send work with concurrent block group relocation.

     We're not allowed to prevent send failing or silently producing
     some bad stream but with more fine grained locking and checks it's
     possible. The send vs deduplication exclusion could reuse the same
     logic in the future.

   - new exclusive operation 'balance paused' to allow adding a device
     to filesystem with paused balance

   - new sysfs file for fsid stored in the per-device directory to help
     distinguish devices when seeding is enabled, the fsid may differ
     from the one reported by the filesystem

  Performance improvements:

   - less metadata needed for directory logging, directory deletion is
     20-40% faster

   - in zoned mode, cache zone information during mount to speed up
     repeated queries (about 50% speedup)

   - free space tree entries get indexed and searched by size (latency
     -30%, search run time -30%)

   - less contention in tree node locking when inserting a key and no
     splits are needed (files/sec in fsmark improves by 1-20%)

  Fixes:

   - fix ENOSPC failure when attempting direct IO write into NOCOW range

   - fix deadlock between quota enable and other quota operations

   - global reserve minimum calculations fixed to account for free space
     tree

   - in zoned mode, fix condition for chunk allocation that may not find
     the right zone for reuse and could lead to early ENOSPC

  Core:

   - global reserve stealing got simplified and cleaned up in evict

   - remove async transaction commit based on manual transaction refs,
     reuse existing kthread and mechanisms to let it commit transaction
     before timeout

   - preparatory work for extent tree v2, add wrappers for global tree
     roots, truncation path cleanups

   - remove readahead framework, it's a bit overengineered and used only
     for scrub, and yet it does not cover all its needs, there is
     another readahead built in the b-tree search that is now used,
     performance drop on HDD is about 5% which is acceptable and scrub
     is often throttled anyway, on SSDs there's no reported drop but
     slight improvement

   - self tests report extent tree state when error occurs

   - replace assert with debugging information when an uncommitted
     transaction is found at unmount time

  Other:

   - error handling improvements

   - other cleanups and refactoring"

* tag 'for-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (115 commits)
  btrfs: output more debug messages for uncommitted transaction
  btrfs: respect the max size in the header when activating swap file
  btrfs: fix argument list that the kdoc format and script verified
  btrfs: remove unnecessary parameter type from compression_decompress_bio
  btrfs: selftests: dump extent io tree if extent-io-tree test failed
  btrfs: scrub: cleanup the argument list of scrub_stripe()
  btrfs: scrub: cleanup the argument list of scrub_chunk()
  btrfs: remove reada infrastructure
  btrfs: scrub: use btrfs_path::reada for extent tree readahead
  btrfs: scrub: remove the unnecessary path parameter for scrub_raid56_parity()
  btrfs: refactor unlock_up
  btrfs: skip transaction commit after failure to create subvolume
  btrfs: zoned: fix chunk allocation condition for zoned allocator
  btrfs: add extent allocator hook to decide to allocate chunk or not
  btrfs: zoned: unset dedicated block group on allocation failure
  btrfs: zoned: drop redundant check for REQ_OP_ZONE_APPEND and btrfs_is_zoned
  btrfs: zoned: sink zone check into btrfs_repair_one_zone
  btrfs: zoned: simplify btrfs_check_meta_write_pointer
  btrfs: zoned: encapsulate inode locking for zoned relocation
  btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the device
  ...
2022-01-11 14:53:40 -08:00
Linus Torvalds 9149fe8ba7 Merge tag 'erofs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
 "In this cycle, tail-packing data inline for compressed files is now
  supported so that tail pcluster can be stored and read together with
  inode metadata in order to save data I/O and storage space.

  In addition to that, to prepare for the upcoming subpage, folio and
  fscache features, we also introduce meta buffers to get rid of
  erofs_get_meta_page() since it was too close to the page itself.

  In addition, in order to show supported kernel features and control
  sync decompression strategy, new sysfs nodes are introduced in this
  cycle as well.

  Summary:

   - add sysfs interface and a sysfs node to control sync decompression

   - add tail-packing inline support for compressed files

   - get rid of erofs_get_meta_page()"

* tag 'erofs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: use meta buffers for zmap operations
  erofs: use meta buffers for xattr operations
  erofs: use meta buffers for super operations
  erofs: use meta buffers for inode operations
  erofs: introduce meta buffer operations
  erofs: add on-disk compressed tail-packing inline support
  erofs: support inline data decompression
  erofs: support unaligned data decompression
  erofs: introduce z_erofs_fixup_insize
  erofs: tidy up z_erofs_lz4_decompress
  erofs: clean up erofs_map_blocks tracepoints
  erofs: Replace zero-length array with flexible-array member
  erofs: add sysfs node to control sync decompression strategy
  erofs: add sysfs interface
  erofs: rename lz4_0pading to zero_padding
2022-01-11 14:51:10 -08:00
David Howells d7bdba1c81 9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
In 9p, afs ceph, and nfs, gfpflags_allow_blocking() (which wraps a
test for __GFP_DIRECT_RECLAIM being set) is used to determine if
->releasepage() should wait for the completion of a DIO write to fscache
with something like:

	if (folio_test_fscache(folio)) {
		if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
			return false;
		folio_wait_fscache(folio);
	}

Instead, current_is_kswapd() should be used instead.

Note that this is based on a patch originally by Zhaoyang Huang[1].  In
addition to extending it to the other network filesystems and putting it on
top of my fscache rewrite, it also needs to include linux/swap.h in a bunch
of places.  Can current_is_kswapd() be moved to linux/mm.h?

Changes
=======
ver #5:
 - Dropping the changes for cifs.

Originally-signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <smfrench@gmail.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: linux-cachefs@redhat.com
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/1638952658-20285-1-git-send-email-huangzhaoyang@gmail.com/ [1]
Link: https://lore.kernel.org/r/164021590773.640689.16777975200823659231.stgit@warthog.procyon.org.uk/ # v4
2022-01-11 22:27:42 +00:00
Linus Torvalds 5dfbfe71e3 Merge tag 'fs.idmapped.v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull fs idmapping updates from Christian Brauner:
 "This contains the work to enable the idmapping infrastructure to
  support idmapped mounts of filesystems mounted with an idmapping.

  In addition this contains various cleanups that avoid repeated
  open-coding of the same functionality and simplify the code in quite a
  few places.

  We also finish the renaming of the mapping helpers we started a few
  kernel releases back and move them to a dedicated header to not
  continue polluting the fs header needlessly with low-level idmapping
  helpers. With this series the fs header only contains idmapping
  helpers that interact with fs objects.

  Currently we only support idmapped mounts for filesystems mounted
  without an idmapping themselves. This was a conscious decision
  mentioned in multiple places (cf. [1]).

  As explained at length in [3] it is perfectly fine to extend support
  for idmapped mounts to filesystem's mounted with an idmapping should
  the need arise. The need has been there for some time now (cf. [2]).

  Before we can port any filesystem that is mountable with an idmapping
  to support idmapped mounts in the coming cycles, we need to first
  extend the mapping helpers to account for the filesystem's idmapping.
  This again, is explained at length in our documentation at [3] and
  also in the individual commit messages so here's an overview.

  Currently, the low-level mapping helpers implement the remapping
  algorithms described in [3] in a simplified manner as we could rely on
  the fact that all filesystems supporting idmapped mounts are mounted
  without an idmapping.

  In contrast, filesystems mounted with an idmapping are very likely to
  not use an identity mapping and will instead use a non-identity
  mapping. So the translation step from or into the filesystem's
  idmapping in the remapping algorithm cannot be skipped for such
  filesystems.

  Non-idmapped filesystems and filesystems not supporting idmapped
  mounts are unaffected by this change as the remapping algorithms can
  take the same shortcut as before. If the low-level helpers detect that
  they are dealing with an idmapped mount but the underlying filesystem
  is mounted without an idmapping we can rely on the previous shortcut
  and can continue to skip the translation step from or into the
  filesystem's idmapping. And of course, if the low-level helpers detect
  that they are not dealing with an idmapped mount they can simply
  return the relevant id unchanged; no remapping needs to be performed
  at all.

  These checks guarantee that only the minimal amount of work is
  performed. As before, if idmapped mounts aren't used the low-level
  helpers are idempotent and no work is performed at all"

Link: 2ca4dcc490 ("fs/mount_setattr: tighten permission checks") [1]
Link: https://github.com/containers/podman/issues/10374 [2]
Link: Documentations/filesystems/idmappings.rst [3]
Link: a65e58e791 ("fs: document and rename fsid helpers") [4]

* tag 'fs.idmapped.v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: support mapped mounts of mapped filesystems
  fs: add i_user_ns() helper
  fs: port higher-level mapping helpers
  fs: remove unused low-level mapping helpers
  fs: use low-level mapping helpers
  docs: update mapping documentation
  fs: account for filesystem mappings
  fs: tweak fsuidgid_has_mapping()
  fs: move mapping helpers
  fs: add is_idmapped_mnt() helper
2022-01-11 14:26:55 -08:00
David Howells e6435f1e02 fscache: Add a tracepoint for cookie use/unuse
Add a tracepoint to track fscache_use/unuse_cookie().

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164021588628.640689.12942919367404043608.stgit@warthog.procyon.org.uk/ # v4
2022-01-11 22:13:01 +00:00
Jeff Layton 1702e79734 ceph: add fscache writeback support
When updating the backing store from the pagecache (a'la writepage or
writepages), write to the cache first. This allows us to keep caching
files even when they are being written, as long as we have appropriate
caps.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20211129162907.149445-3-jlayton@kernel.org/ # v1
Link: https://lore.kernel.org/r/20211207134451.66296-3-jlayton@kernel.org/ # v2
Link: https://lore.kernel.org/r/163906985808.143852.1383891557313186623.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967190257.1823006.16713609520911954804.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021585020.640689.6765214932458435472.stgit@warthog.procyon.org.uk/ # v4
2022-01-11 22:13:01 +00:00
Jeff Layton 400e1286c0 ceph: conversion to new fscache API
Now that the fscache API has been reworked and simplified, change ceph
over to use it.

With the old API, we would only instantiate a cookie when the file was
open for reads. Change it to instantiate the cookie when the inode is
instantiated and call use/unuse when the file is opened/closed.

Also, ensure we resize the cached data on truncates, and invalidate the
cache in response to the appropriate events. This will allow us to
plumb in write support later.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20211129162907.149445-2-jlayton@kernel.org/ # v1
Link: https://lore.kernel.org/r/20211207134451.66296-2-jlayton@kernel.org/ # v2
Link: https://lore.kernel.org/r/163906984277.143852.14697110691303589000.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967188351.1823006.5065634844099079351.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021581427.640689.14128682147127509264.stgit@warthog.procyon.org.uk/ # v4
2022-01-11 22:13:01 +00:00
Jan Kara 68514dacf2 select: Fix indefinitely sleeping task in poll_schedule_timeout()
A task can end up indefinitely sleeping in do_select() ->
poll_schedule_timeout() when the following race happens:

  TASK1 (thread1)             TASK2                   TASK1 (thread2)
  do_select()
    setup poll_wqueues table
    with 'fd'
                              write data to 'fd'
                                pollwake()
                                  table->triggered = 1
                                                      closes 'fd' thread1 is
                                                        waiting for
    poll_schedule_timeout()
      - sees table->triggered
      table->triggered = 0
      return -EINTR
    loop back in do_select()

But at this point when TASK1 loops back, the fdget() in the setup of
poll_wqueues fails.  So now so we never find 'fd' is ready for reading
and sleep in poll_schedule_timeout() indefinitely.

Treat an fd that got closed as a fd on which some event happened.  This
makes sure cannot block indefinitely in do_select().

Another option would be to return -EBADF in this case but that has a
potential of subtly breaking applications that excercise this behavior
and it happens to work for them.  So returning fd as active seems like a
safer choice.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-11 09:03:05 -08:00
Bob Peterson 74382e277a gfs2: dump inode object for iopen glocks
Before this patch, glock dumps would not dump the gl_object for iopen
glocks. This information can help us debug problems related to eviction:
when AN iopen glock is blocked we can see the status of its underlying
inode and its flags, etc.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-01-11 16:52:44 +01:00