Commit Graph

1073303 Commits

Author SHA1 Message Date
Trond Myklebust e0caaf75d4 NFS: LOOKUP_DIRECTORY is also ok with symlinks
Commit ac795161c9 (NFSv4: Handle case where the lookup of a directory
fails) [1], part of Linux since 5.17-rc2, introduced a regression, where
a symbolic link on an NFS mount to a directory on another NFS does not
resolve(?) the first time it is accessed:

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: ac795161c9 ("NFSv4: Handle case where the lookup of a directory fails")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-14 14:58:48 -05:00
Trond Myklebust 9d047bf68f NFS: Remove an incorrect revalidation in nfs4_update_changeattr_locked()
In nfs4_update_changeattr_locked(), we don't need to set the
NFS_INO_REVAL_PAGECACHE flag, because we already know the value of the
change attribute, and we're already flagging the size. In fact, this
forces us to revalidate the change attribute a second time for no good
reason.
This extra flag appears to have been introduced as part of the xattr
feature, when update_changeattr_locked() was converted for use by the
xattr code.

Fixes: 1b523ca972 ("nfs: modify update_changeattr to deal with regular files")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-14 14:58:48 -05:00
Linus Torvalds 754e0b0e35 Linux 5.17-rc4 2022-02-13 12:13:30 -08:00
Linus Torvalds e89d3a4671 Merge tag 'kbuild-fixes-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:

 - Fix the truncated path issue for HAVE_GCC_PLUGINS test in Kconfig

 - Move -Wunsligned-access to W=1 builds to avoid sprinkling warnings
   for the latest Clang

 - Fix missing fclose() in Kconfig

 - Fix Kconfig to touch dep headers correctly when KCONFIG_AUTOCONFIG is
   overridden.

* tag 'kbuild-fixes-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: fix failing to generate auto.conf
  kconfig: fix missing fclose() on error paths
  Makefile.extrawarn: Move -Wunaligned-access to W=1
  kconfig: let 'shell' return enough output for deep path names
2022-02-13 11:58:11 -08:00
Linus Torvalds c5d714aa6d Merge tag 'irq-urgent-2022-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "Interrupt chip driver fixes:

   - Don't install an hotplug notifier for GICV3-ITS on systems which do
     not need it to prevent a warning in the notifier about inconsistent
     state

   - Add the missing device tree matching for the T-HEAD PLIC variant so
     the related SoC is properly supported"

* tag 'irq-urgent-2022-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/sifive-plic: Add missing thead,c900-plic match string
  dt-bindings: update riscv plic compatible string
  irqchip/gic-v3-its: Skip HP notifier when no ITS is registered
2022-02-13 10:06:40 -08:00
Linus Torvalds 42964a18f8 Merge tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
 "Fix a case where objtool would mistakenly warn about instructions
  being unreachable"

* tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
2022-02-13 09:43:34 -08:00
Linus Torvalds 6f35736723 Merge tag 'sched_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
 "Fix a NULL-ptr dereference when recalculating a sched entity's weight"

* tag 'sched_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix fault in reweight_entity
2022-02-13 09:27:26 -08:00
Linus Torvalds f5e02656b1 Merge tag 'perf_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
 "Prevent cgroup event list corruption when switching events"

* tag 'perf_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix list corruption in perf_cgroup_switch()
2022-02-13 09:25:26 -08:00
Linus Torvalds 808f0ab221 Merge tag 'x86_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
 "Prevent softlockups when tearing down large SGX enclaves"

* tag 'x86_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Silence softlockup detection when releasing large enclaves
2022-02-13 09:22:52 -08:00
Linus Torvalds e9c25787db Merge tag '5.17-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Three small smb3 reconnect fixes and an error log clarification"

* tag '5.17-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: mark sessions for reconnection in helper function
  cifs: call helper functions for marking channels for reconnect
  cifs: call cifs_reconnect when a connection is marked
  [smb3] improve error message when mount options conflict with posix
2022-02-13 09:16:45 -08:00
Thomas Gleixner 1e34064b60 Merge tag 'irqchip-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:

 - Don't register a hotplug notifier on GICv3 systems that advertise
   LPI support, but have no ITS to make use of it

 - Add missing DT matching for the thead,c900-plic variant of the
   SiFive PLIC

Link: https://lore.kernel.org/r/20220211110038.1179155-1-maz@kernel.org
2022-02-13 14:16:23 +01:00
Linus Torvalds b81b1829e7 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Two minor fixes in the lpfc driver. One changing the classification of
  trace messages and the other fixing a build issue when NVME_FC is
  disabled"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Reduce log messages seen after firmware download
  scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabled
2022-02-12 10:29:02 -08:00
Linus Torvalds 080eba785f Merge tag 'char-misc-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are a small number of char/misc driver fixes for 5.17-rc4 for
  reported issues. They contain:

   - phy driver fixes

   - iio driver fix

   - eeprom driver fix

   - speakup regression fix

   - fastrpc fix

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  iio: buffer: Fix file related error handling in IIO_BUFFER_GET_FD_IOCTL
  speakup-dectlk: Restore pitch setting
  bus: mhi: pci_generic: Add mru_default for Cinterion MV31-W
  bus: mhi: pci_generic: Add mru_default for Foxconn SDX55
  eeprom: ee1004: limit i2c reads to I2C_SMBUS_BLOCK_MAX
  misc: fastrpc: avoid double fput() on failed usercopy
  phy: dphy: Correct clk_pre parameter
  phy: phy-mtk-tphy: Fix duplicated argument in phy-mtk-tphy
  phy: stm32: fix a refcount leak in stm32_usbphyc_pll_enable()
  phy: xilinx: zynqmp: Fix bus width setting for SGMII
  phy: cadence: Sierra: fix error handling bugs in probe()
  phy: ti: Fix missing sentinel for clk_div_table
  phy: broadcom: Kconfig: Fix PHY_BRCM_USB config option
  phy: usb: Leave some clocks running during suspend
2022-02-12 10:16:32 -08:00
Linus Torvalds dcd72f5466 Merge tag 'staging-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pullstaging driver fixes from Greg KH:
 "Here are two staging driver fixes for 5.17-rc4.  These are:

   - fbtft error path fix

   - vc04_services rcu dereference fix

  Both of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: fbtft: Fix error path in fbtft_driver_module_init()
  staging: vc04_services: Fix RCU dereference check
2022-02-12 10:10:35 -08:00
Linus Torvalds 522e7d03f7 Merge tag 'tty-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
 "Here are four small tty/serial fixes for 5.17-rc4.  They are:

   - 8250_pericom change revert to fix a reported regression

   - two speculation fixes for vt_ioctl

   - n_tty regression fix for polling

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  vt_ioctl: add array_index_nospec to VT_ACTIVATE
  vt_ioctl: fix array_index_nospec in vt_setactivate
  serial: 8250_pericom: Revert "Re-enable higher baud rates"
  n_tty: wake up poll(POLLRDNORM) on receiving data
2022-02-12 10:01:55 -08:00
Linus Torvalds 8518737899 Merge tag 'usb-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are some small USB driver fixes for 5.17-rc4 that resolve some
  reported issues and add new device ids:

   - usb-serial new device ids

   - ulpi cleanup fixes

   - f_fs use-after-free fix

   - dwc3 driver fixes

   - ax88179_178a usb network driver fix

   - usb gadget fixes

  There is a revert at the end of this series to resolve a build problem
  that 0-day found yesterday. Most of these have been in linux-next,
  except for the last few, and all have now passed 0-day tests"

* tag 'usb-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  Revert "usb: dwc2: drd: fix soft connect when gadget is unconfigured"
  usb: dwc2: drd: fix soft connect when gadget is unconfigured
  usb: gadget: rndis: check size of RNDIS_MSG_SET command
  USB: gadget: validate interface OS descriptor requests
  usb: core: Unregister device on component_add() failure
  net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup
  usb: dwc3: gadget: Prevent core from processing stale TRBs
  USB: serial: cp210x: add CPI Bulk Coin Recycler id
  USB: serial: cp210x: add NCR Retail IO box id
  USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320
  usb: gadget: f_uac2: Define specific wTerminalType
  usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
  usb: raw-gadget: fix handling of dual-direction-capable endpoints
  usb: usb251xb: add boost-up property support
  usb: ulpi: Call of_node_put correctly
  usb: ulpi: Move of_node_put to ulpi_dev_release
  USB: serial: option: add ZTE MF286D modem
  USB: serial: ch341: add support for GW Instek USB2.0-Serial devices
  usb: f_fs: Fix use-after-free for epfile
  usb: dwc3: xilinx: fix uninitialized return value
2022-02-12 09:56:18 -08:00
Linus Torvalds a4fd49cdb5 Merge tag 's390-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
 "Maintainers and reviewers changes:

    - Add Alexander Gordeev as maintainer for s390.

    - Christian Borntraeger will focus on s390 KVM maintainership and
      stays as s390 reviewer.

  Fixes:

   - Fix clang build of modules loader KUnit test.

   - Fix kernel panic in CIO code on FCES path-event when no driver is
     attached to a device or the driver does not provide the path_event
     function"

* tag 's390-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cio: verify the driver availability for path_event call
  s390/module: fix building test_modules_helpers.o with clang
  MAINTAINERS: downgrade myself to Reviewer for s390
  MAINTAINERS: add Alexander Gordeev as maintainer for s390
2022-02-12 09:12:44 -08:00
Linus Torvalds 4a387c98b3 Merge tag 'for-linus-5.17a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:

 - Two small cleanups

 - Another fix for addressing the EFI framebuffer above 4GB when running
   as Xen dom0

 - A patch to let Xen guests use reserved bits in MSI- and IO-APIC-
   registers for extended APIC-IDs the same way KVM guests are doing it
   already

* tag 'for-linus-5.17a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pci: Make use of the helper macro LIST_HEAD()
  xen/x2apic: Fix inconsistent indenting
  xen/x86: detect support for extended destination ID
  xen/x86: obtain full video frame buffer address for Dom0 also under EFI
2022-02-12 09:08:57 -08:00
Linus Torvalds eef8cffcab Merge tag 'seccomp-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
 "This fixes a corner case of fatal SIGSYS being ignored since v5.15.
  Along with the signal fix is a change to seccomp so that seeing
  another syscall after a fatal filter result will cause seccomp to kill
  the process harder.

  Summary:

   - Force HANDLER_EXIT even for SIGNAL_UNKILLABLE

   - Make seccomp self-destruct after fatal filter results

   - Update seccomp samples for easier behavioral demonstration"

* tag 'seccomp-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  samples/seccomp: Adjust sample to also provide kill option
  seccomp: Invalidate seccomp mode to catch death failures
  signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE
2022-02-12 09:04:05 -08:00
Linus Torvalds 9917ff5f31 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "5 patches.

  Subsystems affected by this patch series: binfmt, procfs, and mm
  (vmscan, memcg, and kfence)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kfence: make test case compatible with run time set sample interval
  mm: memcg: synchronize objcg lists with a dedicated spinlock
  mm: vmscan: remove deadlock due to throttling failing to make progress
  fs/proc: task_mmu.c: don't read mapcount for migration entry
  fs/binfmt_elf: fix PT_LOAD p_align values for loaders
2022-02-12 08:57:37 -08:00
Jing Leng 1b9e740a81 kconfig: fix failing to generate auto.conf
When the KCONFIG_AUTOCONFIG is specified (e.g. export \
KCONFIG_AUTOCONFIG=output/config/auto.conf), the directory of
include/config/ will not be created, so kconfig can't create deps
files in it and auto.conf can't be generated.

Signed-off-by: Jing Leng <jleng@ambarella.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-02-12 23:24:19 +09:00
Greg Kroah-Hartman 736e8d8904 Revert "usb: dwc2: drd: fix soft connect when gadget is unconfigured"
This reverts commit 269cbcf7b7.

It causes build errors as reported by the kernel test robot.

Link: https://lore.kernel.org/r/202202112236.AwoOTtHO-lkp@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 269cbcf7b7 ("usb: dwc2: drd: fix soft connect when gadget is unconfigured")
Cc: stable@kernel.org
Cc: Amelie Delaunay <amelie.delaunay@foss.st.com>
Cc: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
Cc: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-12 10:08:54 +01:00
Peng Liu 8913c61001 kfence: make test case compatible with run time set sample interval
The parameter kfence_sample_interval can be set via boot parameter and
late shell command, which is convenient for automated tests and KFENCE
parameter optimization.  However, KFENCE test case just uses
compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, which will make KFENCE test
case not run as users desired.  Export kfence_sample_interval, so that
KFENCE test case can use run-time-set sample interval.

Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Christian Knig <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-11 17:55:00 -08:00
Roman Gushchin 0764db9b49 mm: memcg: synchronize objcg lists with a dedicated spinlock
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:

  LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
          WARNING: possible circular locking dependency detected
          5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
          ------------------------------------------------------
          mmap1/202299 is trying to acquire lock:
          00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
          but task is already holding lock:
          00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
          which lock already depends on the new lock.
          the existing dependency chain (in reverse order) is:
          -> #1 (&sighand->siglock){-.-.}-{2:2}:
                 __lock_acquire+0x604/0xbd8
                 lock_acquire.part.0+0xe2/0x238
                 lock_acquire+0xb0/0x200
                 _raw_spin_lock_irqsave+0x6a/0xd8
                 __lock_task_sighand+0x90/0x190
                 cgroup_freeze_task+0x2e/0x90
                 cgroup_migrate_execute+0x11c/0x608
                 cgroup_update_dfl_csses+0x246/0x270
                 cgroup_subtree_control_write+0x238/0x518
                 kernfs_fop_write_iter+0x13e/0x1e0
                 new_sync_write+0x100/0x190
                 vfs_write+0x22c/0x2d8
                 ksys_write+0x6c/0xf8
                 __do_syscall+0x1da/0x208
                 system_call+0x82/0xb0
          -> #0 (css_set_lock){..-.}-{2:2}:
                 check_prev_add+0xe0/0xed8
                 validate_chain+0x736/0xb20
                 __lock_acquire+0x604/0xbd8
                 lock_acquire.part.0+0xe2/0x238
                 lock_acquire+0xb0/0x200
                 _raw_spin_lock_irqsave+0x6a/0xd8
                 obj_cgroup_release+0x4a/0xe0
                 percpu_ref_put_many.constprop.0+0x150/0x168
                 drain_obj_stock+0x94/0xe8
                 refill_obj_stock+0x94/0x278
                 obj_cgroup_charge+0x164/0x1d8
                 kmem_cache_alloc+0xac/0x528
                 __sigqueue_alloc+0x150/0x308
                 __send_signal+0x260/0x550
                 send_signal+0x7e/0x348
                 force_sig_info_to_task+0x104/0x180
                 force_sig_fault+0x48/0x58
                 __do_pgm_check+0x120/0x1f0
                 pgm_check_handler+0x11e/0x180
          other info that might help us debug this:
           Possible unsafe locking scenario:
                 CPU0                    CPU1
                 ----                    ----
            lock(&sighand->siglock);
                                         lock(css_set_lock);
                                         lock(&sighand->siglock);
            lock(css_set_lock);
           *** DEADLOCK ***
          2 locks held by mmap1/202299:
           #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
           #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
          stack backtrace:
          CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
          Hardware name: IBM 3906 M04 704 (LPAR)
          Call Trace:
            dump_stack_lvl+0x76/0x98
            check_noncircular+0x136/0x158
            check_prev_add+0xe0/0xed8
            validate_chain+0x736/0xb20
            __lock_acquire+0x604/0xbd8
            lock_acquire.part.0+0xe2/0x238
            lock_acquire+0xb0/0x200
            _raw_spin_lock_irqsave+0x6a/0xd8
            obj_cgroup_release+0x4a/0xe0
            percpu_ref_put_many.constprop.0+0x150/0x168
            drain_obj_stock+0x94/0xe8
            refill_obj_stock+0x94/0x278
            obj_cgroup_charge+0x164/0x1d8
            kmem_cache_alloc+0xac/0x528
            __sigqueue_alloc+0x150/0x308
            __send_signal+0x260/0x550
            send_signal+0x7e/0x348
            force_sig_info_to_task+0x104/0x180
            force_sig_fault+0x48/0x58
            __do_pgm_check+0x120/0x1f0
            pgm_check_handler+0x11e/0x180
          INFO: lockdep is turned off.

In this example a slab allocation from __send_signal() caused a
refilling and draining of a percpu objcg stock, resulted in a releasing
of another non-related objcg.  Objcg release path requires taking the
css_set_lock, which is used to synchronize objcg lists.

This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a
task).

In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.

To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.

Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059954 ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-11 17:55:00 -08:00
Mel Gorman b485c6f1f9 mm: vmscan: remove deadlock due to throttling failing to make progress
A soft lockup bug in kcompactd was reported in a private bugzilla with
the following visible in dmesg;

  watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479]
  watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479]

The machine had 256G of RAM with no swap and an earlier failed
allocation indicated that node 0 where kcompactd was run was potentially
unreclaimable;

  Node 0 active_anon:29355112kB inactive_anon:2913528kB active_file:0kB
    inactive_file:0kB unevictable:64kB isolated(anon):0kB isolated(file):0kB
    mapped:8kB dirty:0kB writeback:0kB shmem:26780kB shmem_thp:
    0kB shmem_pmdmapped: 0kB anon_thp: 23480320kB writeback_tmp:0kB
    kernel_stack:2272kB pagetables:24500kB all_unreclaimable? yes

Vlastimil Babka investigated a crash dump and found that a task
migrating pages was trying to drain PCP lists;

  PID: 52922  TASK: ffff969f820e5000  CPU: 19  COMMAND: "kworker/u128:3"
  Call Trace:
     __schedule
     schedule
     schedule_timeout
     wait_for_completion
     __flush_work
     __drain_all_pages
     __alloc_pages_slowpath.constprop.114
     __alloc_pages
     alloc_migration_target
     migrate_pages
     migrate_to_node
     do_migrate_pages
     cpuset_migrate_mm_workfn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

This failure is specific to CONFIG_PREEMPT=n builds.  The root of the
problem is that kcompact0 is not rescheduling on a CPU while a task that
has isolated a large number of the pages from the LRU is waiting on
kcompact0 to reschedule so the pages can be released.  While
shrink_inactive_list() only loops once around too_many_isolated, reclaim
can continue without rescheduling if sc->skipped_deactivate == 1 which
could happen if there was no file LRU and the inactive anon list was not
low.

Link: https://lkml.kernel.org/r/20220203100326.GD3301@suse.de
Fixes: d818fca1ca ("mm/vmscan: throttle reclaim and compaction when too may pages are isolated")
Signed-off-by: Mel Gorman <mgorman@suse.de>
Debugged-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-11 17:55:00 -08:00