Commit Graph

1212 Commits

Author SHA1 Message Date
Tao Huang
495fe343ce Merge tag 'v6.1.99'
This is the 6.1.99 stable release

* tag 'v6.1.99': (1975 commits)
  Linux 6.1.99
  Revert "usb: xhci: prevent potential failure in handle_tx_event() for Transfer events without TRB"
  Linux 6.1.98
  nilfs2: fix incorrect inode allocation from reserved inodes
  null_blk: Do not allow runt zone with zone capacity smaller then zone size
  spi: cadence: Ensure data lines set to low during dummy-cycle period
  nfc/nci: Add the inconsistency check between the input data length and count
  kbuild: fix short log for AS in link-vmlinux.sh
  nvmet: fix a possible leak when destroy a ctrl during qp establishment
  platform/x86: touchscreen_dmi: Add info for the EZpad 6s Pro
  platform/x86: touchscreen_dmi: Add info for GlobalSpace SolT IVW 11.6" tablet
  regmap-i2c: Subtract reg size from max_write
  nvme: adjust multiples of NVME_CTRL_PAGE_SIZE in offset
  dma-mapping: benchmark: avoid needless copy_to_user if benchmark fails
  nvme-multipath: find NUMA path only for online numa-node
  ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897
  fs/ntfs3: Mark volume as dirty if xattr is broken
  i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr
  clk: mediatek: mt8183: Only enable runtime PM on mt8183-mfgcfg
  clk: mediatek: clk-mtk: Register MFG notifier in mtk_clk_simple_probe()
  ...

Change-Id: Ibf9c2caa3bbffb7a960e82ec6c2b0b497753778c

Conflicts:
	arch/arm64/boot/dts/rockchip/rk3328.dtsi
	drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
	drivers/phy/rockchip/phy-rockchip-snps-pcie3.c
	drivers/pinctrl/pinctrl-rockchip.c
	drivers/usb/gadget/function/u_audio.c
	include/linux/usb/quirks.h
	mm/cma.c
	sound/soc/rockchip/rockchip_i2s_tdm.c
2024-10-25 17:51:39 +08:00
Tao Huang
b453658077 Merge tag 'v6.1.84'
This is the 6.1.84 stable release

* tag 'v6.1.84': (1865 commits)
  Linux 6.1.84
  tools/resolve_btfids: fix build with musl libc
  USB: core: Fix deadlock in usb_deauthorize_interface()
  x86/sev: Skip ROM range scans and validation for SEV-SNP guests
  scsi: libsas: Fix disk not being scanned in after being removed
  scsi: libsas: Add a helper sas_get_sas_addr_and_dev_type()
  scsi: lpfc: Correct size for wqe for memset()
  scsi: lpfc: Correct size for cmdwqe/rspwqe for memset()
  tls: fix use-after-free on failed backlog decryption
  x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
  scsi: qla2xxx: Delay I/O Abort on PCI error
  scsi: qla2xxx: Change debug message during driver unload
  scsi: qla2xxx: Fix double free of fcport
  scsi: qla2xxx: Fix command flush on cable pull
  scsi: qla2xxx: NVME|FCP prefer flag not being honored
  scsi: qla2xxx: Update manufacturer detail
  scsi: qla2xxx: Split FCE|EFT trace control
  scsi: qla2xxx: Fix N2N stuck connection
  scsi: qla2xxx: Prevent command send on chip reset
  usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
  ...

Change-Id: If6edd552c88012d97f5eefc5e1d97a4f1683f171

Conflicts:
	drivers/gpu/drm/bridge/sii902x.c
	drivers/gpu/drm/rockchip/rockchip_lvds.c
	drivers/media/i2c/imx335.c
	drivers/usb/dwc3/gadget.c
	drivers/usb/host/xhci-plat.c
	sound/soc/rockchip/rockchip_i2s_tdm.c
2024-08-17 17:42:29 +08:00
Tao Huang
fce55f8eb2 Merge tag 'v6.1.75'
This is the 6.1.75 stable release

* tag 'v6.1.75': (2623 commits)
  Linux 6.1.75
  Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d""
  arm64: dts: armada-3720-turris-mox: set irq type for RTC
  Revert "KEYS: encrypted: Add check for strsep"
  riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping
  block: Remove special-casing of compound pages
  i2c: s3c24xx: fix transferring more than one message in polling mode
  i2c: s3c24xx: fix read transfers in polling mode
  ipv6: mcast: fix data-race in ipv6_mc_down / mld_ifc_work
  selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes
  mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure
  loop: fix the the direct I/O support check when used on top of block devices
  ethtool: netlink: Add missing ethnl_ops_begin/complete
  kdb: Fix a potential buffer overflow in kdb_local()
  ipvs: avoid stat macros calls from preemptible context
  netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description
  netfilter: nf_tables: skip dead set elements in netlink dump
  netfilter: nf_tables: do not allow mismatch field size and set key length
  netfilter: bridge: replace physindev with physinif in nf_bridge_info
  netfilter: propagate net to nf_bridge_get_physindev
  ...

Conflicts:
	drivers/clk/rockchip/clk-rk3568.c
	drivers/devfreq/event/rockchip-dfi.c
	drivers/gpu/drm/rockchip/rockchip_drm_vop.c
	drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
	drivers/i2c/busses/i2c-rk3x.c
	drivers/i2c/i2c-core-base.c
	drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
	drivers/nvme/host/nvme.h

Change-Id: I9649ece83925659bca59cced0be24f0bd165822a
Signed-off-by: Sandy Huang <hjc@rock-chips.com>
Signed-off-by: Tao Huang <huangtao@rock-chips.com>
2024-05-08 11:14:32 +08:00
Anna-Maria Behnsen
bccc8d1550 PM: s2idle: Make sure CPUs will wakeup directly on resume
commit 3c89a068bfd0698a5478f4cf39493595ef757d5e upstream.

s2idle works like a regular suspend with freezing processes and freezing
devices. All CPUs except the control CPU go into idle. Once this is
completed the control CPU kicks all other CPUs out of idle, so that they
reenter the idle loop and then enter s2idle state. The control CPU then
issues an swait() on the suspend state and therefore enters the idle loop
as well.

Due to being kicked out of idle, the other CPUs leave their NOHZ states,
which means the tick is active and the corresponding hrtimer is programmed
to the next jiffie.

On entering s2idle the CPUs shut down their local clockevent device to
prevent wakeups. The last CPU which enters s2idle shuts down its local
clockevent and freezes timekeeping.

On resume, one of the CPUs receives the wakeup interrupt, unfreezes
timekeeping and its local clockevent and starts the resume process. At that
point all other CPUs are still in s2idle with their clockevents switched
off. They only resume when they are kicked by another CPU or after resuming
devices and then receiving a device interrupt.

That means there is no guarantee that all CPUs will wakeup directly on
resume. As a consequence there is no guarantee that timers which are queued
on those CPUs and should expire directly after resume, are handled. Also
timer list timers which are remotely queued to one of those CPUs after
resume will not result in a reprogramming IPI as the tick is
active. Queueing a hrtimer will also not result in a reprogramming IPI
because the first hrtimer event is already in the past.

The recent introduction of the timer pull model (7ee988770326 ("timers:
Implement the hierarchical pull model")) amplifies this problem, if the
current migrator is one of the non woken up CPUs. When a non pinned timer
list timer is queued and the queuing CPU goes idle, it relies on the still
suspended migrator CPU to expire the timer which will happen by chance.

The problem exists since commit 8d89835b04 ("PM: suspend: Do not pause
cpuidle in the suspend-to-idle path"). There the cpuidle_pause() call which
in turn invoked a wakeup for all idle CPUs was moved to a later point in
the resume process. This might not be reached or reached very late because
it waits on a timer of a still suspended CPU.

Address this by kicking all CPUs out of idle after the control CPU returns
from swait() so that they resume their timers and restore consistent system
state.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218641
Fixes: 8d89835b04 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: 5.16+ <stable@kernel.org> # 5.16+
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-17 11:18:22 +02:00
Maulik Shah
a7b6523f92 PM: suspend: Set mem_sleep_current during kernel command line setup
[ Upstream commit 9bc4ffd32ef8943f5c5a42c9637cfd04771d021b ]

psci_init_system_suspend() invokes suspend_set_ops() very early during
bootup even before kernel command line for mem_sleep_default is setup.
This leads to kernel command line mem_sleep_default=s2idle not working
as mem_sleep_current gets changed to deep via suspend_set_ops() and never
changes back to s2idle.

Set mem_sleep_current along with mem_sleep_default during kernel command
line setup as default suspend mode.

Fixes: faf7ec4a92 ("drivers: firmware: psci: add system suspend support")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-03 15:19:28 +02:00
Hongchen Zhang
3a081586c7 PM: hibernate: Enforce ordering during image compression/decompression
commit 71cd7e80cfde548959952eac7063aeaea1f2e1c6 upstream.

An S4 (suspend to disk) test on the LoongArch 3A6000 platform sometimes
fails with the following error messaged in the dmesg log:

	Invalid LZO compressed length

That happens because when compressing/decompressing the image, the
synchronization between the control thread and the compress/decompress/crc
thread is based on a relaxed ordering interface, which is unreliable, and the
following situation may occur:

CPU 0					CPU 1
save_image_lzo				lzo_compress_threadfn
					  atomic_set(&d->stop, 1);
  atomic_read(&data[thr].stop)
  data[thr].cmp = data[thr].cmp_len;
	  				  WRITE data[thr].cmp_len

Then CPU0 gets a stale cmp_len and writes it to disk. During resume from S4,
wrong cmp_len is loaded.

To maintain data consistency between the two threads, use the acquire/release
variants of atomic set and read operations.

Fixes: 081a9d043c ("PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Co-developed-by: Weihao Li <liweihao@loongson.cn>
Signed-off-by: Weihao Li <liweihao@loongson.cn>
[ rjw: Subject rewrite and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-31 16:16:58 -08:00
XiaoDong Huang
e6b4866a98 PM / sleep: support mem_lite/mem_ultra mode
Change-Id: Iab2e11a2f63b1a3f38c1c9b6c4b675ee2660f244
Signed-off-by: XiaoDong Huang <derrick.huang@rock-chips.com>
2023-12-07 09:44:59 +08:00
Brian Geffon
d150294818 PM: hibernate: Clean up sync_read handling in snapshot_write_next()
commit d08970df1980476f27936e24d452550f3e9e92e1 upstream.

In snapshot_write_next(), sync_read is set and unset in three different
spots unnecessiarly. As a result there is a subtle bug where the first
page after the meta data has been loaded unconditionally sets sync_read
to 0. If this first PFN was actually a highmem page, then the returned
buffer will be the global "buffer," and the page needs to be loaded
synchronously.

That is, I'm not sure we can always assume the following to be safe:

	handle->buffer = get_buffer(&orig_bm, &ca);
	handle->sync_read = 0;

Because get_buffer() can call get_highmem_page_buffer() which can
return 'buffer'.

The easiest way to address this is just set sync_read before
snapshot_write_next() returns if handle->buffer == buffer.

Signed-off-by: Brian Geffon <bgeffon@google.com>
Fixes: 8357376d3d ("[PATCH] swsusp: Improve handling of highmem")
Cc: All applicable <stable@vger.kernel.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28 17:07:11 +00:00
Brian Geffon
567c6f6495 PM: hibernate: Use __get_safe_page() rather than touching the list
commit f0c7183008b41e92fa676406d87f18773724b48b upstream.

We found at least one situation where the safe pages list was empty and
get_buffer() would gladly try to use a NULL pointer.

Signed-off-by: Brian Geffon <bgeffon@google.com>
Fixes: 8357376d3d ("[PATCH] swsusp: Improve handling of highmem")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28 17:07:11 +00:00
Chungkai Yang
9a2c57fd32 PM: QoS: Restore support for default value on frequency QoS
commit 3a8395b565b5b4f019b3dc182be4c4541eb35ac8 upstream.

Commit 8d36694245 ("PM: QoS: Add check to make sure CPU freq is
non-negative") makes sure CPU freq is non-negative to avoid negative
value converting to unsigned data type. However, when the value is
PM_QOS_DEFAULT_VALUE, pm_qos_update_target specifically uses
c->default_value which is set to FREQ_QOS_MIN/MAX_DEFAULT_VALUE when
cpufreq_policy_alloc is executed, for this case handling.

Adding check for PM_QOS_DEFAULT_VALUE to let default setting work will
fix this problem.

Fixes: 8d36694245 ("PM: QoS: Add check to make sure CPU freq is non-negative")
Link: https://lore.kernel.org/lkml/20230626035144.19717-1-Chung-kai.Yang@mediatek.com/
Link: https://lore.kernel.org/lkml/20230627071727.16646-1-Chung-kai.Yang@mediatek.com/
Link: https://lore.kernel.org/lkml/CAJZ5v0gxNOWhC58PHeUhW_tgf6d1fGJVZ1x91zkDdht11yUv-A@mail.gmail.com/
Signed-off-by: Chungkai Yang <Chung-kai.Yang@mediatek.com>
Cc: 6.0+ <stable@vger.kernel.org> # 6.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:45 +02:00
Chen Yu
72f3217aa1 PM: hibernate: Do not get block device exclusively in test_resume mode
[ Upstream commit 5904de0d735bbb3b4afe9375c5b4f9748f882945 ]

The system refused to do a test_resume because it found that the
swap device has already been taken by someone else. Specifically,
the swsusp_check()->blkdev_get_by_dev(FMODE_EXCL) is supposed to
do this check.

Steps to reproduce:
 dd if=/dev/zero of=/swapfile bs=$(cat /proc/meminfo |
       awk '/MemTotal/ {print $2}') count=1024 conv=notrunc
 mkswap /swapfile
 swapon /swapfile
 swap-offset /swapfile
 echo 34816 > /sys/power/resume_offset
 echo test_resume > /sys/power/disk
 echo disk > /sys/power/state

 PM: Using 3 thread(s) for compression
 PM: Compressing and saving image data (293150 pages)...
 PM: Image saving progress:   0%
 PM: Image saving progress:  10%
 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 ata1.00: configured for UDMA/100
 ata2: SATA link down (SStatus 0 SControl 300)
 ata5: SATA link down (SStatus 0 SControl 300)
 ata6: SATA link down (SStatus 0 SControl 300)
 ata3: SATA link down (SStatus 0 SControl 300)
 ata4: SATA link down (SStatus 0 SControl 300)
 PM: Image saving progress:  20%
 PM: Image saving progress:  30%
 PM: Image saving progress:  40%
 PM: Image saving progress:  50%
 pcieport 0000:00:02.5: pciehp: Slot(0-5): No device found
 PM: Image saving progress:  60%
 PM: Image saving progress:  70%
 PM: Image saving progress:  80%
 PM: Image saving progress:  90%
 PM: Image saving done
 PM: hibernation: Wrote 1172600 kbytes in 2.70 seconds (434.29 MB/s)
 PM: S|
 PM: hibernation: Basic memory bitmaps freed
 PM: Image not found (code -16)

This is because when using the swapfile as the hibernation storage,
the block device where the swapfile is located has already been mounted
by the OS distribution(usually mounted as the rootfs). This is not
an issue for normal hibernation, because software_resume()->swsusp_check()
happens before the block device(rootfs) mount. But it is a problem for the
test_resume mode. Because when test_resume happens, the block device has
been mounted already.

Thus remove the FMODE_EXCL for test_resume mode. This would not be a
problem because in test_resume stage, the processes have already been
frozen, and the race condition described in
Commit 39fbef4b0f ("PM: hibernate: Get block device exclusively in swsusp_check()")
is unlikely to happen.

Fixes: 39fbef4b0f ("PM: hibernate: Get block device exclusively in swsusp_check()")
Reported-by: Yifan Li <yifan2.li@intel.com>
Suggested-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Tested-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 23:03:38 +09:00
Chen Yu
208ba216cc PM: hibernate: Turn snapshot_test into global variable
[ Upstream commit 08169a162f97819d3e5b4a342bb9cf5137787154 ]

There is need to check snapshot_test and open block device
in different mode, so as to avoid the race condition.

No functional changes intended.

Suggested-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Stable-dep-of: 5904de0d735b ("PM: hibernate: Do not get block device exclusively in test_resume mode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 23:03:38 +09:00
Greg Kroah-Hartman
5100c4efc3 PM: EM: fix memory leak with using debugfs_lookup()
[ Upstream commit a0e8c13ccd6a9a636d27353da62c2410c4eca337 ]

When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10 09:33:53 +01:00
Rafael J. Wysocki
f2173508b1 PM: sleep: Avoid using pr_cont() in the tasks freezing code
commit a449dfbfc0894676ad0aa1873383265047529e3a upstream.

Using pr_cont() in the tasks freezing code related to system-wide
suspend and hibernation is problematic, because the continuation
messages printed there are susceptible to interspersing with other
unrelated messages which results in output that is hard to
understand.

Address this issue by modifying try_to_freeze_tasks() to print
messages that don't require continuations and adjusting its
callers accordingly.

Reported-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-03 11:52:23 +01:00
xiongxin
574b10475d PM: hibernate: Fix mistake in kerneldoc comment
[ Upstream commit 6e5d7300cbe7c3541bc31f16db3e9266e6027b4b ]

The actual maximum image size formula in hibernate_preallocate_memory()
is as follows:

max_size = (count - (size + PAGES_FOR_IO)) / 2
	    - 2 * DIV_ROUND_UP(reserved_size, PAGE_SIZE);

but the one in the kerneldoc comment of the function is different and
incorrect.

Fixes: ddeb648708 ("PM / Hibernate: Add sysfs knob to control size of memory for drivers")
Signed-off-by: xiongxin <xiongxin@kylinos.cn>
[ rjw: Subject and changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:31:55 +01:00
Mario Limonciello
85850af4fc PM: hibernate: Allow hybrid sleep to work with s2idle
Hybrid sleep is currently hardcoded to only operate with S3 even
on systems that might not support it.

Instead of assuming this mode is what the user wants to use, for
hybrid sleep follow the setting of `mem_sleep_current` which
will respect mem_sleep_default kernel command line and policy
decisions made by the presence of the FADT low power idle bit.

Fixes: 81d45bdf89 ("PM / hibernate: Untangle power_down()")
Reported-and-tested-by: kolAflash <kolAflash@kolahilft.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216574
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-25 14:53:19 +02:00
Linus Torvalds
30c999937f Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Debuggability:

   - Change most occurances of BUG_ON() to WARN_ON_ONCE()

   - Reorganize & fix TASK_ state comparisons, turn it into a bitmap

   - Update/fix misc scheduler debugging facilities

  Load-balancing & regular scheduling:

   - Improve the behavior of the scheduler in presence of lot of
     SCHED_IDLE tasks - in particular they should not impact other
     scheduling classes.

   - Optimize task load tracking, cleanups & fixes

   - Clean up & simplify misc load-balancing code

  Freezer:

   - Rewrite the core freezer to behave better wrt thawing and be
     simpler in general, by replacing PF_FROZEN with TASK_FROZEN &
     fixing/adjusting all the fallout.

  Deadline scheduler:

   - Fix the DL capacity-aware code

   - Factor out dl_task_is_earliest_deadline() &
     replenish_dl_new_period()

   - Relax/optimize locking in task_non_contending()

  Cleanups:

   - Factor out the update_current_exec_runtime() helper

   - Various cleanups, simplifications"

* tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  sched: Fix more TASK_state comparisons
  sched: Fix TASK_state comparisons
  sched/fair: Move call to list_last_entry() in detach_tasks
  sched/fair: Cleanup loop_max and loop_break
  sched/fair: Make sure to try to detach at least one movable task
  sched: Show PF_flag holes
  freezer,sched: Rewrite core freezer logic
  sched: Widen TAKS_state literals
  sched/wait: Add wait_event_state()
  sched/completion: Add wait_for_completion_state()
  sched: Add TASK_ANY for wait_task_inactive()
  sched: Change wait_task_inactive()s match_state
  freezer,umh: Clean up freezer/initrd interaction
  freezer: Have {,un}lock_system_sleep() save/restore flags
  sched: Rename task_running() to task_on_cpu()
  sched/fair: Cleanup for SIS_PROP
  sched/fair: Default to false in test_idle_cores()
  sched/fair: Remove useless check in select_idle_core()
  sched/fair: Avoid double search on same cpu
  sched/fair: Remove redundant check in select_idle_smt()
  ...
2022-10-10 09:10:28 -07:00
Mario Limonciello
811d59fdf5 ACPI: s2idle: Add a new ->check() callback for platform_s2idle_ops
On some platforms it is found that Linux more aggressively enters s2idle
than Windows enters Modern Standby and this uncovers some synchronization
issues for the platform.  To aid in debugging this class of problems in
the future, add support for an extra optional callback intended for
drivers to emit extra debugging.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220829162953.5947-2-mario.limonciello@amd.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-09-09 17:37:40 +02:00
Peter Zijlstra
f5d39b0208 freezer,sched: Rewrite core freezer logic
Rewrite the core freezer to behave better wrt thawing and be simpler
in general.

By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
ensured frozen tasks stay frozen until thawed and don't randomly wake
up early, as is currently possible.

As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
two PF_flags (yay!).

Specifically; the current scheme works a little like:

	freezer_do_not_count();
	schedule();
	freezer_count();

And either the task is blocked, or it lands in try_to_freezer()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.

However, on thawing, once pm_freezing is cleared, freezer_count()
stops working, and any random/spurious wakeup will let a task run
before its time.

That is, thawing tries to thaw things in explicit order; kernel
threads and workqueues before doing bringing SMP back before userspace
etc.. However due to the above mentioned races it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.

This can be a fatal problem in asymmetric ISA architectures (eg ARMv9)
where the userspace task requires a special CPU to run.

As said; replace this with a special task state TASK_FROZEN and add
the following state transitions:

	TASK_FREEZABLE	-> TASK_FROZEN
	__TASK_STOPPED	-> TASK_FROZEN
	__TASK_TRACED	-> TASK_FROZEN

The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL
(IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state
is already required to deal with spurious wakeups and the freezer
causes one such when thawing the task (since the original state is
lost).

The special __TASK_{STOPPED,TRACED} states *can* be restored since
their canonical state is in ->jobctl.

With this, frozen tasks need an explicit TASK_FROZEN wakeup and are
free of undue (early / spurious) wakeups.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
2022-09-07 21:53:50 +02:00
Peter Zijlstra
5950e5d574 freezer: Have {,un}lock_system_sleep() save/restore flags
Rafael explained that the reason for having both PF_NOFREEZE and
PF_FREEZER_SKIP is that {,un}lock_system_sleep() is callable from
kthread context that has previously called set_freezable().

In preparation of merging the flags, have {,un}lock_system_slee() save
and restore current->flags.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220822114648.725003428@infradead.org
2022-09-07 21:53:48 +02:00
Linus Torvalds
228dfe98a3 Merge tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
 "Here is the large set of char and misc and other driver subsystem
  changes for 6.0-rc1.

  Highlights include:

   - large set of IIO driver updates, additions, and cleanups

   - new habanalabs device support added (loads of register maps much
     like GPUs have)

   - soundwire driver updates

   - phy driver updates

   - slimbus driver updates

   - tiny virt driver fixes and updates

   - misc driver fixes and updates

   - interconnect driver updates

   - hwtracing driver updates

   - fpga driver updates

   - extcon driver updates

   - firmware driver updates

   - counter driver update

   - mhi driver fixes and updates

   - binder driver fixes and updates

   - speakup driver fixes

  All of these have been in linux-next for a while without any reported
  problems"

* tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (634 commits)
  drivers: lkdtm: fix clang -Wformat warning
  char: remove VR41XX related char driver
  misc: Mark MICROCODE_MINOR unused
  spmi: trace: fix stack-out-of-bound access in SPMI tracing functions
  dt-bindings: iio: adc: Add compatible for MT8188
  iio: light: isl29028: Fix the warning in isl29028_remove()
  iio: accel: sca3300: Extend the trigger buffer from 16 to 32 bytes
  iio: fix iio_format_avail_range() printing for none IIO_VAL_INT
  iio: adc: max1027: unlock on error path in max1027_read_single_value()
  iio: proximity: sx9324: add empty line in front of bullet list
  iio: magnetometer: hmc5843: Remove duplicate 'the'
  iio: magn: yas530: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: magnetometer: ak8974: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: veml6030: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: vcnl4035: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: vcnl4000: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros
  iio: light: tsl2591: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
  iio: light: tsl2583: Use DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
  iio: light: isl29028: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr()
  iio: light: gp2ap002: Switch to DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr()
  ...
2022-08-04 11:05:48 -07:00
Linus Torvalds
c013d0af81 Merge tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:

 - Improve the type checking of request flags (Bart)

 - Ensure queue mapping for a single queues always picks the right queue
   (Bart)

 - Sanitize the io priority handling (Jan)

 - rq-qos race fix (Jinke)

 - Reserved tags handling improvements (John)

 - Separate memory alignment from file/disk offset aligment for O_DIRECT
   (Keith)

 - Add new ublk driver, userspace block driver using io_uring for
   communication with the userspace backend (Ming)

 - Use try_cmpxchg() to cleanup the code in various spots (Uros)

 - Finally remove bdevname() (Christoph)

 - Clean up the zoned device handling (Christoph)

 - Clean up independent access range support (Christoph)

 - Clean up and improve block sysfs handling (Christoph)

 - Clean up and improve teardown of block devices.

   This turns the usual two step process into something that is simpler
   to implement and handle in block drivers (Christoph)

 - Clean up chunk size handling (Christoph)

 - Misc cleanups and fixes (Bart, Bo, Dan, GuoYong, Jason, Keith, Liu,
   Ming, Sebastian, Yang, Ying)

* tag 'for-5.20/block-2022-07-29' of git://git.kernel.dk/linux-block: (178 commits)
  ublk_drv: fix double shift bug
  ublk_drv: make sure that correct flags(features) returned to userspace
  ublk_drv: fix error handling of ublk_add_dev
  ublk_drv: fix lockdep warning
  block: remove __blk_get_queue
  block: call blk_mq_exit_queue from disk_release for never added disks
  blk-mq: fix error handling in __blk_mq_alloc_disk
  ublk: defer disk allocation
  ublk: rewrite ublk_ctrl_get_queue_affinity to not rely on hctx->cpumask
  ublk: fold __ublk_create_dev into ublk_ctrl_add_dev
  ublk: cleanup ublk_ctrl_uring_cmd
  ublk: simplify ublk_ch_open and ublk_ch_release
  ublk: remove the empty open and release block device operations
  ublk: remove UBLK_IO_F_PREFLUSH
  ublk: add a MAINTAINERS entry
  block: don't allow the same type rq_qos add more than once
  mmc: fix disk/queue leak in case of adding disk failure
  ublk_drv: fix an IS_ERR() vs NULL check
  ublk: remove UBLK_IO_F_INTEGRITY
  ublk_drv: remove unneeded semicolon
  ...
2022-08-02 13:46:35 -07:00
Rafael J. Wysocki
aa727b7b4b Merge branches 'pm-devfreq', 'pm-qos', 'pm-tools' and 'pm-docs'
Merge devfreq changes, PM QoS change, and power management tools and
documentation changes for v5.20-rc1:

 - Add new devfreq driver for Mediatek CCI (Cache Coherent
   Interconnect) (Johnson Wang).

 - Convert the Samsung Exynos SoC Bus bindings to DT schema of
   exynos-bus.c (Krzysztof Kozlowski).

 - Address kernel-doc warnings by adding the description for unused
   fucntion parameters in devfreq core (Mauro Carvalho Chehab).

 - Use NULL to pass a null pointer rather than zero according to the
   function propotype in imx-bus.c (Colin Ian King).

 - Print error message instead of error interger value in
   tegra30-devfreq.c (Dmitry Osipenko).

 - Add checks to prevent setting negative frequency QoS limits for
   CPUs (Shivnandan Kumar).

 - Update the pm-graph suite of utilities to the latest revision 5.9
   including multiple improvements (Todd Brandt).

 - Drop pme_interrupt reference from the PCI power management
   documentation (Mario Limonciello).

* pm-devfreq:
  PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
  PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
  PM / devfreq: shut up kernel-doc warnings
  dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
  PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
  dt-bindings: interconnect: Add MediaTek CCI dt-bindings

* pm-qos:
  PM: QoS: Add check to make sure CPU freq is non-negative

* pm-tools:
  pm-graph v5.9

* pm-docs:
  Documentation: PM: Drop pme_interrupt reference
2022-07-29 19:46:00 +02:00
Rafael J. Wysocki
954a83fc60 Merge branches 'pm-core', 'pm-sleep', 'powercap', 'pm-domains' and 'pm-em'
Merge core device power management changes for v5.20-rc1:

 - Extend support for wakeirq to callback wrappers used during system
   suspend and resume (Ulf Hansson).

 - Defer waiting for device probe before loading a hibernation image
   till the first actual device access to avoid possible deadlocks
   reported by syzbot (Tetsuo Handa).

 - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
   Helgaas).

 - Add Raptor Lake-P to the list of processors supported by the Intel
   RAPL driver (George D Sworo).

 - Add Alder Lake-N and Raptor Lake-P to the list of processors for
   which Power Limit4 is supported in the Intel RAPL driver (Sumeet
   Pawnikar).

 - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
   attempting to remove it (Hsin-Yi Wang).

 - Change the Energy Model code to represent power in micro-Watts and
   adjust its users accordingly (Lukasz Luba).

* pm-core:
  PM: runtime: Extend support for wakeirq for force_suspend|resume

* pm-sleep:
  PM: hibernate: defer device probing when resuming from hibernation
  PM: wakeup: Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP

* powercap:
  powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
  powercap: intel_rapl: Add support for RAPTORLAKE_P

* pm-domains:
  PM: domains: Ensure genpd_debugfs_dir exists before remove

* pm-em:
  cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
  firmware: arm_scmi: Get detailed power scale from perf
  Documentation: EM: Switch to micro-Watts scale
  PM: EM: convert power field to micro-Watts precision and align drivers
2022-07-29 19:33:13 +02:00
Shivnandan Kumar
8d36694245 PM: QoS: Add check to make sure CPU freq is non-negative
CPU frequency should never be negative.

If some client driver calls freq_qos_update_request with a
negative value which will be very high in absolute terms,
then frequency QoS sets max CPU freq at fmax as it considers
it's absolute value but it will add plist node with negative
priority.

plist node has priority from INT_MIN (highest) to INT_MAX(lowest).
Once priority is set as negative, another client will not be able
to reduce CPU frequency.

Adding check to make sure CPU freq is non-negative will fix
this problem.

Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-26 20:48:33 +02:00