Commit Graph

6724 Commits

Author SHA1 Message Date
Linus Torvalds
a93e884edf Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.3-rc1.

  There's a lot of changes this development cycle, most of the work
  falls into two different categories:

   - fw_devlink fixes and updates. This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.

   - driver core changes to work to make struct bus_type able to be
     moved into read-only memory (i.e. const) The recent work with Rust
     has pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only
     making things safer overall. This is the contuation of that work
     (started last release with kobject changes) in moving struct
     bus_type to be constant. We didn't quite make it for this release,
     but the remaining patches will be finished up for the release after
     this one, but the groundwork has been laid for this effort.

  Other than that we have in here:

   - debugfs memory leak fixes in some subsystems

   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.

   - cacheinfo rework and fixes

   - Other tiny fixes, full details are in the shortlog

  All of these have been in linux-next for a while with no reported
  problems"

[ Geert Uytterhoeven points out that that last sentence isn't true, and
  that there's a pending report that has a fix that is queued up - Linus ]

* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
  debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
  OPP: fix error checking in opp_migrate_dentry()
  debugfs: update comment of debugfs_rename()
  i3c: fix device.h kernel-doc warnings
  dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
  driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
  Revert "driver core: add error handling for devtmpfs_create_node()"
  Revert "devtmpfs: add debug info to handle()"
  Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
  driver core: cpu: don't hand-override the uevent bus_type callback.
  devtmpfs: remove return value of devtmpfs_delete_node()
  devtmpfs: add debug info to handle()
  driver core: add error handling for devtmpfs_create_node()
  driver core: bus: update my copyright notice
  driver core: bus: add bus_get_dev_root() function
  driver core: bus: constify bus_unregister()
  driver core: bus: constify some internal functions
  driver core: bus: constify bus_get_kset()
  driver core: bus: constify bus_register/unregister_notifier()
  driver core: remove private pointer from struct bus_type
  ...
2023-02-24 12:58:55 -08:00
Linus Torvalds
3822a7c409 Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds
307e14c039 Merge tag '6.3-rc-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client updates from Steve French:
 "The largest subset of this is from David Howells et al: making the
  cifs/smb3 driver pass iov_iters down to the lowest layers, directly to
  the network transport rather than passing lists of pages around,
  helping multiple areas:

   - Pin user pages, thereby fixing the race between concurrent DIO read
     and fork, where the pages containing the DIO read buffer may end up
     belonging to the child process and not the parent - with the result
     that the parent might not see the retrieved data.

   - cifs shouldn't take refs on pages extracted from non-user-backed
     iterators (eg. KVEC). With these changes, cifs will apply the
     appropriate cleanup.

   - Making it easier to transition to using folios in cifs rather than
     pages by dealing with them through BVEC and XARRAY iterators.

   - Allowing cifs to use the new splice function

  The remainder are:

   - fixes for stable, including various fixes for uninitialized memory,
     wrong length field causing mount issue to very old servers,
     important directory lease fixes and reconnect fixes

   - cleanups (unused code removal, change one element array usage, and
     a change form strtobool to kstrtobool, and Kconfig cleanups)

   - SMBDIRECT (RDMA) fixes including iov_iter integration and UAF fixes

   - reconnect fixes

   - multichannel fixes, including improving channel allocation (to
     least used channel)

   - remove the last use of lock_page_killable by moving to
     folio_lock_killable"

* tag '6.3-rc-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (46 commits)
  update internal module version number for cifs.ko
  cifs: update ip_addr for ses only for primary chan setup
  cifs: use tcon allocation functions even for dummy tcon
  cifs: use the least loaded channel for sending requests
  cifs: DIO to/from KVEC-type iterators should now work
  cifs: Remove unused code
  cifs: Build the RDMA SGE list directly from an iterator
  cifs: Change the I/O paths to use an iterator rather than a page list
  cifs: Add a function to read into an iter from a socket
  cifs: Add some helper functions
  cifs: Add a function to Hash the contents of an iterator
  cifs: Add a function to build an RDMA SGE list from an iterator
  netfs: Add a function to extract an iterator into a scatterlist
  netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator
  cifs: Implement splice_read to pass down ITER_BVEC not ITER_PIPE
  splice: Export filemap/direct_splice_read()
  iov_iter: Add a function to extract a page list from an iterator
  iov_iter: Define flags to qualify page extraction.
  splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE
  splice: Add a func to do a splice from a buffered file without ITER_PIPE
  ...
2023-02-22 17:12:44 -08:00
Linus Torvalds
6861eaf791 Merge tag 'ata-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA updates from Damien Le Moal:

 - Small cleanup of the pata_octeon driver to drop a useless platform
   callback (Uwe)

 - Simplify ata_scsi_cmd_error_handler() code using the fact that
   ap->ops->error_handler is NULL most of the time (Wenchao)

 - Several patches improving libata error handling. This is in
   preparation for supporting the command duration limits (CDL) feature.
   The changes allow handling corner cases of ATA NCQ errors which do
   not happen with regular drives but will be triggered with CDL drives
   (Niklas)

 - Simplify the qc_fill_rtf operation (me)

 - Improve SCSI command translation for REPORT_SUPPORTED_OPERATION_CODES
   command (me)

 - Cleanup of libata FUA handling.

   This falls short of enabling FUA for ATA drives that support it by
   default as there were concerns that old drives would break. The
   series however fixes several issues with the FUA support to ensure
   that FUA is reported as being supported only for drives that can
   handle all possible write cases (NCQ and non-NCQ). A check in the
   block layer is also added to ensure that we never see read FUA
   commands (current behavior) (me)

 - Several patches to move the old PARIDE (parallel port IDE) driver to
   libata as pata_parport. Given that this driver also needs protocol
   modules, the driver code resides in its own pata_parport directoy
   under drivers/ata (Ondrej)

* tag 'ata-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: pata_parport: Fix ida_alloc return value error check
  drivers/block: Move PARIDE protocol modules to drivers/ata/pata_parport
  drivers/block: Remove PARIDE core and high-level protocols
  ata: pata_parport: add driver (PARIDE replacement)
  ata: libata: exclude FUA support for known buggy drives
  ata: libata: Fix FUA handling in ata_build_rw_tf()
  ata: libata: cleanup fua support detection
  ata: libata: Rename and cleanup ata_rwcmd_protocol()
  ata: libata: Introduce ata_ncq_supported()
  block: add a sanity check for non-write flush/fua bios
  ata: libata-scsi: improve ata_scsiop_maint_in()
  ata: libata-scsi: do not overwrite SCSI ML and status bytes
  ata: libata: move NCQ related ATA_DFLAGs
  ata: libata: respect successfully completed commands during errors
  ata: libata: read the shared status for successful NCQ commands once
  ata: libata: simplify qc_fill_rtf port operation interface
  ata: scsi: rename flag ATA_QCFLAG_FAILED to ATA_QCFLAG_EH
  ata: libata-eh: Cleanup ata_scsi_cmd_error_handler()
  ata: octeon: Drop empty platform remove function
2023-02-22 13:35:51 -08:00
Linus Torvalds
9e58df973d Merge tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core:

   - Move the interrupt affinity spreading mechanism into lib/group_cpus
     so it can be used for similar spreading requirements, e.g. in the
     block multi-queue code

     This also contains a first usecase in the block multi-queue code
     which Jens asked to take along with the librarization

   - Improve irqdomain locking to close a number race conditions which
     can be observed with massive parallel device driver probing

   - Enforce and document the semantics of disable_irq() which cannot be
     invoked safely from non-sleepable context

   - Move the IPI multiplexing code from the Apple AIC driver into the
     core, so it can be reused by RISCV

  Drivers:

   - Plug OF node refcounting leaks in various drivers

   - Correctly mark level triggered interrupts in the Broadcom L2
     drivers

   - The usual small fixes and improvements

   - No new drivers for the record!"

* tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts
  irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts
  irqdomain: Switch to per-domain locking
  irqchip/mvebu-odmi: Use irq_domain_create_hierarchy()
  irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy()
  irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy()
  irqchip/gic-v3-its: Use irq_domain_create_hierarchy()
  irqchip/gic-v2m: Use irq_domain_create_hierarchy()
  irqchip/alpine-msi: Use irq_domain_add_hierarchy()
  x86/uv: Use irq_domain_create_hierarchy()
  x86/ioapic: Use irq_domain_create_hierarchy()
  irqdomain: Clean up irq_domain_push/pop_irq()
  irqdomain: Drop leftover brackets
  irqdomain: Drop dead domain-name assignment
  irqdomain: Drop revmap mutex
  irqdomain: Fix domain registration race
  irqdomain: Fix mapping-creation race
  irqdomain: Refactor __irq_domain_alloc_irqs()
  irqdomain: Look for existing mapping only once
  irqdomain: Drop bogus fwspec-mapping error handling
  ...
2023-02-21 10:03:48 -08:00
David Howells
f62e52d127 iov_iter: Define flags to qualify page extraction.
Define flags to qualify page extraction to pass into iov_iter_*_pages*()
rather than passing in FOLL_* flags.

For now only a flag to allow peer-to-peer DMA is supported.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-20 17:25:43 -06:00
Linus Torvalds
5b0ed59649 Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:

 - NVMe updates via Christoph:
      - Small improvements to the logging functionality (Amit Engel)
      - Authentication cleanups (Hannes Reinecke)
      - Cleanup and optimize the DMA mapping cod in the PCIe driver
        (Keith Busch)
      - Work around the command effects for Format NVM (Keith Busch)
      - Misc cleanups (Keith Busch, Christoph Hellwig)
      - Fix and cleanup freeing single sgl (Keith Busch)

 - MD updates via Song:
      - Fix a rare crash during the takeover process
      - Don't update recovery_cp when curr_resync is ACTIVE
      - Free writes_pending in md_stop
      - Change active_io to percpu

 - Updates to drbd, inching us closer to unifying the out-of-tree driver
   with the in-tree one (Andreas, Christoph, Lars, Robert)

 - BFQ update adding support for multi-actuator drives (Paolo, Federico,
   Davide)

 - Make brd compliant with REQ_NOWAIT (me)

 - Fix for IOPOLL and queue entering, fixing stalled IO waiting on
   timeouts (me)

 - Fix for REQ_NOWAIT with multiple bios (me)

 - Fix memory leak in blktrace cleanup (Greg)

 - Clean up sbitmap and fix a potential hang (Kemeng)

 - Clean up some bits in BFQ, and fix a bug in the request injection
   (Kemeng)

 - Clean up the request allocation and issue code, and fix some bugs
   related to that (Kemeng)

 - ublk updates and fixes:
      - Add support for unprivileged ublk (Ming)
      - Improve device deletion handling (Ming)
      - Misc (Liu, Ziyang)

 - s390 dasd fixes (Alexander, Qiheng)

 - Improve utility of request caching and fixes (Anuj, Xiao)

 - zoned cleanups (Pankaj)

 - More constification for kobjs (Thomas)

 - blk-iocost cleanups (Yu)

 - Remove bio splitting from drivers that don't need it (Christoph)

 - Switch blk-cgroups to use struct gendisk. Some of this is now
   incomplete as select late reverts were done. (Christoph)

 - Add bvec initialization helpers, and convert callers to use that
   rather than open-coding it (Christoph)

 - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
   Matthew, Ulf, Zhong)

* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
  brd: use radix_tree_maybe_preload instead of radix_tree_preload
  block: use proper return value from bio_failfast()
  block: bio-integrity: Copy flags when bio_integrity_payload is cloned
  block: Fix io statistics for cgroup in throttle path
  brd: mark as nowait compatible
  brd: check for REQ_NOWAIT and set correct page allocation mask
  brd: return 0/-error from brd_insert_page()
  block: sync mixed merged request's failfast with 1st bio's
  Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
  Revert "blk-cgroup: pass a gendisk to blkg_lookup"
  Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
  Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
  Revert "blk-cgroup: move the cgroup information to struct gendisk"
  nvme-pci: remove iod use_sgls
  nvme-pci: fix freeing single sgl
  block: ublk: check IO buffer based on flag need_get_data
  s390/dasd: Fix potential memleak in dasd_eckd_init()
  s390/dasd: sort out physical vs virtual pointers usage
  block: Remove the ALLOC_CACHE_SLACK constant
  block: make kobj_type structures constant
  ...
2023-02-20 14:27:21 -08:00
Linus Torvalds
c1ef500307 Merge tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux
Pull io_uring ITER_UBUF conversion from Jens Axboe:
 "Since we now have ITER_UBUF available, switch to using it for single
  ranges as it's more efficient than ITER_IOVEC for that"

* tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux:
  block: use iter_ubuf for single range
  iov_iter: move iter_ubuf check inside restore WARN
  io_uring: use iter_ubuf for single range imports
  io_uring: switch network send/recv to ITER_UBUF
  iov: add import_ubuf()
2023-02-20 14:03:57 -08:00
Thomas Gleixner
6f3ee0e22b Merge tag 'irqchip-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:

   - New and improved irqdomain locking, closing a number of races that
     became apparent now that we are able to probe drivers in parallel

   - A bunch of OF node refcounting bugs have been fixed

   - We now have a new IPI mux, lifted from the Apple AIC code and
     made common. It is expected that riscv will eventually benefit
     from it

   - Two small fixes for the Broadcom L2 drivers

   - Various cleanups and minor bug fixes

Link: https://lore.kernel.org/r/20230218143452.3817627-1-maz@kernel.org
2023-02-19 00:07:56 +01:00
Jens Axboe
f3ca738624 block: use proper return value from bio_failfast()
kernel test robot complains about a type mismatch:

   block/blk-merge.c:984:42: sparse:     expected restricted blk_opf_t const [usertype] ff
   block/blk-merge.c:984:42: sparse:     got unsigned int
   block/blk-merge.c:1010:42: sparse: sparse: incorrect type in initializer (different base types) @@     expected restricted blk_opf_t const [usertype] ff @@     got unsigned int @@
   block/blk-merge.c:1010:42: sparse:     expected restricted blk_opf_t const [usertype] ff
   block/blk-merge.c:1010:42: sparse:     got unsigned int

because bio_failfast() is return an unsigned int rather than the
appropriate blk_opt_f type. Fix it up.

Fixes: 3ce6a11598 ("block: sync mixed merged request's failfast with 1st bio's")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302170743.GXypM9Rt-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16 19:39:15 -07:00
Martin K. Petersen
b6a4bdcda4 block: bio-integrity: Copy flags when bio_integrity_payload is cloned
Make sure to copy the flags when a bio_integrity_payload is cloned.
Otherwise per-I/O properties such as IP checksum flag will not be
passed down to the HBA driver. Since the integrity buffer is owned by
the original bio, the BIP_BLOCK_INTEGRITY flag needs to be masked off
to avoid a double free in the completion path.

Fixes: aae7df5019 ("block: Integrity checksum flag")
Fixes: b1f0138857 ("block: Relocate bio integrity flags")
Reported-by: Saurav Kashyap <skashyap@marvell.com>
Tested-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230215171801.21062-1-martin.petersen@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16 11:05:41 -07:00
Jinke Han
0f7c8f0f79 block: Fix io statistics for cgroup in throttle path
In the current code, io statistics are missing for cgroup when bio
was throttled by blk-throttle. Fix it by moving the unreaching code
to submit_bio_noacct_nocheck.

Fixes: 3f98c75371 ("block: don't check bio in blk_throtl_dispatch_work_fn")
Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230216032250.74230-1-hanjinke.666@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16 11:04:11 -07:00
Ming Lei
3ce6a11598 block: sync mixed merged request's failfast with 1st bio's
We support mixed merge for requests/bios with different fastfail
settings. When request fails, each time we only handle the portion
with same failfast setting, then bios with failfast can be failed
immediately, and bios without failfast can be retried.

The idea is pretty good, but the current implementation has several
defects:

1) initially RA bio doesn't set failfast, however bio merge code
doesn't consider this point, and just check its failfast setting for
deciding if mixed merge is required. Fix this issue by adding helper
of bio_failfast().

2) when merging bio to request front, if this request is mixed
merged, we have to sync request's faifast setting with 1st bio's
failfast. Fix it by calling blk_update_mixed_merge().

3) when merging bio to request back, if this request is mixed
merged, we have to mark the bio as failfast, because blk_update_request
simply updates request failfast with 1st bio's failfast. Fix
it by calling blk_update_mixed_merge().

Fixes one normal EXT4 READ IO failure issue, because it is observed
that the normal READ IO is merged with RA IO, and the mixed merged
request has different failfast setting with 1st bio's, so finally
the normal READ IO doesn't get retried.

Cc: Tejun Heo <tj@kernel.org>
Fixes: 80a761fd33 ("block: implement mixed merge of different failfast requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230209125527.667004-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16 07:49:27 -07:00
Christoph Hellwig
fd8f8ede23 block: export bio_split_rw
bio_split_rw can be used by file systems to split and incoming write
bio into multiple bios fitting the hardware limit for use as ZONE_APPEND
bios.  Export it for initial use in btrfs.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:50 +01:00
Christoph Hellwig
a06377c5d0 Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
This reverts commit 84d7d462b1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-14 14:24:09 -07:00
Christoph Hellwig
9a9c261e6b Revert "blk-cgroup: pass a gendisk to blkg_lookup"
This reverts commit 821e840c08ad83736eced4037cdad864e95e2584.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-14 14:24:09 -07:00
Christoph Hellwig
b6553bef8c Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
This reverts commit 178fa7d498.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-14 14:24:09 -07:00
Christoph Hellwig
b4e94f9c2c Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
This reverts commit c43332fe02 as it is not
needed without moving to disk references in the blkg.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-14 14:24:09 -07:00
Christoph Hellwig
1231039db3 Revert "blk-cgroup: move the cgroup information to struct gendisk"
This reverts commit 3f13ab7c80 as a patch
it depends on caused a few problems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-14 14:24:09 -07:00
Bart Van Assche
9af9935494 block: Remove the ALLOC_CACHE_SLACK constant
Commit b99182c501 ("bio: add pcpu caching for non-polling bio_put")
removed the code that uses this constant. Hence also remove the constant
itself.

Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230209230135.3475829-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09 17:03:36 -07:00
Thomas Weißschuh
5f6224175f block: make kobj_type structures constant
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20230208-kobj_type-block-v1-1-0b3eafd7d983@weissschuh.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09 09:38:16 -07:00
Xiao Ni
23f3e3272e block: Merge bio before checking ->cached_rq
It checks if plug->cached_rq is empty before merging bio. But the merge action
doesn't have relationship with plug->cached_rq, it trys to merge bio with
requests within plug->mq_list. Now it checks if ->cached_rq is empty before
merging bio. If it's empty, it will miss the merge chances. So move the merge
function before checking ->cached_rq.

Signed-off-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230209031930.27354-1-xni@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09 08:11:25 -07:00
Christoph Hellwig
dcb5220143 Revert "blk-cgroup: simplify blkg freeing from initialization failure paths"
It turns out this was too soon.  blkg_conf_prep does to funky locking games
with the queue lock for this to work properly.

This reverts commit 27b642b07a.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230209053523.437927-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09 08:11:11 -07:00
Christoph Hellwig
c43332fe02 blk-cgroup: delay calling blkcg_exit_disk until disk_release
While del_gendisk ensures there is no outstanding I/O on the queue,
it can't prevent block layer users from building new I/O.

This leads to a NULL ->root_blkg reference in bio_associate_blkg when
allocating a new bio on a shut down file system.  Delay freeing the
blk-cgroup subsystems from del_gendisk until disk_release to make
sure the blkg and throttle information is still avaіlable for bio
submitters, even if those bios will immediately fail.

This now can cause a case where disk_release is called on a disk
that hasn't been added.  That's mostly harmless, except for a case
in blk_throttl_exit that now needs to check for a NULL ->td pointer.

Fixes: 178fa7d498 ("blk-cgroup: delay blk-cgroup initialization until add_disk")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230208063514.171485-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09 08:10:45 -07:00
Yu Kuai
f37bf75ca7 block, bfq: cleanup 'bfqg->online'
After commit dfd6200a09 ("blk-cgroup: support to track if policy is
online"), there is no need to do this again in bfq.

However, 'pd->online' is not protected by 'bfqd->lock', in order to make
sure bfq won't see that 'pd->online' is still set after bfq_pd_offline(),
clear it before bfq_pd_offline() is called. This is fine because other
polices doesn't use 'pd->online' and bfq_pd_offline() will move active
bfqq to root cgroup anyway.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230202134913.2364549-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-07 10:20:59 -07:00