Fix a possible memory leak in the power capping subsystem (Joe Hattori).
* pm-powercap:
powercap: call put_device() on an error path in powercap_register_control_type()
powercap_register_control_type() calls device_register(), but does not
release the refcount of the device when it fails.
Call put_device() before returning an error to balance the refcount.
Since the kfree(control_type) will be done by powercap_release(), remove
the lines in powercap_register_control_type() before returning the error.
This bug was found by an experimental verifier that I am developing.
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Link: https://patch.msgid.link/20250110010554.1583411-1-joe@pf.is.s.u-tokyo.ac.jp
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The continual trickle of small conversion patches is grating on me, and
is really not helping. Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:
/*
* .remove_new() is a relic from a prototype conversion of .remove().
* New drivers are supposed to implement .remove(). Once all drivers are
* converted to not use .remove_new any more, it will be dropped.
*/
This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.
I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.
Then I just removed the old (sic) .remove_new member function, and this
is the end result. No more unnecessary conversion noise.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The caller of the function dev_pm_qos_add_request() checks again a non
zero value but dev_pm_qos_add_request() can return '1' if the request
already exists. Therefore, the setup function fails while the QoS
request actually did not failed.
Fix that by changing the check against a negative value like all the
other callers of the function.
Fixes: e446556173 ("powercap/drivers/dtpm: Add dtpm devfreq with energy model support")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20241018021205.46460-1-yuancan@huawei.com
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The hardware definition of every TPMI feature contains a major and minor
version. When there is a change in the MMIO offset or change in the
definition of a field, hardware will change major version. For addition
of new fields without modifying existing MMIO offsets or fields, only
the minor version is changed.
If the driver has not been updated to recognize a new hardware major
version, it cannot provide the RAPL interface to users due to possible
register layout incompatibilities. However, the driver does not need to
be updated every time the hardware minor version changes because in that
case it will just miss some new functionality exposed by the hardware.
The current implementation causes the driver to refuse to work for any
hardware version change which is unnecessarily restrictive.
If there is a minor version mismatch, log an information message and
continue, but if there is a major version mismatch, log a warning and
exit (as before).
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240930081801.28502-4-rui.zhang@intel.com
Fixes: 9eef7f9da9 ("powercap: intel_rapl: Introduce RAPL TPMI interface driver")
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
on AMD processors that support extended CPUID leaf 0x80000026, the
topology_logical_die_id() macros, no longer returns package id, instead it
returns the CCD (Core Complex Die) id. This leads to the energy-pkg
event scope to be modified to CCD instead of package.
For more historical context, please refer to commit 32fb480e0a
("powercap/intel_rapl: Support multi-die/package"), which initially changed
the RAPL scope from package to die for all systems, as Intel systems
with Die enumeration have RAPL scope as die, and those without die
enumeration are not affected. So, all systems(Intel, AMD, Hygon), worked
correctly with topology_logical_die_id() until recently, but this changed
after the "0x80000026 leaf" commit mentioned above.
Future multi-die Intel systems will have package scope RAPL counters,
but they will be using TPMI RAPL interface, which is not affected by
this change.
Replacing topology_logical_die_id() with topology_physical_package_id()
conditionally only for AMD and Hygon fixes the energy-pkg event.
On an AMD 2 socket 8 CCD Zen4 server:
Before:
linux$ ls /sys/class/powercap/
intel-rapl intel-rapl:4 intel-rapl:8:0 intel-rapl:d
intel-rapl:0 intel-rapl:4:0 intel-rapl:9 intel-rapl:d:0
intel-rapl:0:0 intel-rapl:5 intel-rapl:9:0 intel-rapl:e
intel-rapl:1 intel-rapl:5:0 intel-rapl:a intel-rapl:e:0
intel-rapl:1:0 intel-rapl:6 intel-rapl:a:0 intel-rapl:f
intel-rapl:2 intel-rapl:6:0 intel-rapl:b intel-rapl:f:0
intel-rapl:2:0 intel-rapl:7 intel-rapl:b:0
intel-rapl:3 intel-rapl:7:0 intel-rapl:c
intel-rapl:3:0 intel-rapl:8 intel-rapl:c:0
After:
linux$ ls /sys/class/powercap/
intel-rapl intel-rapl:0 intel-rapl:0:0 intel-rapl:1 intel-rapl:1:0
Only one sysfs entry per-event per-package is created after this change.
Fixes: 63edbaa48a ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
Reported-by: Michael Larabel <michael@michaellarabel.com>
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20240730044917.4680-3-Dhananjay.Ugwekar@amd.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The if condition !A || A && B can be simplified to !A || B.
Fixes the following Coccinelle/coccicheck warning reported by
excluded_middle.cocci:
WARNING !A || A && B is equivalent to !A || B
Compile-tested only.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce two new APIs rapl_package_add_pmu()/rapl_package_remove_pmu().
RAPL driver can invoke these APIs to expose its supported energy
counters via perf PMU. The new RAPL PMU is fully compatible with current
MSR RAPL PMU, including using the same PMU name and events
name/id/unit/scale, etc.
For example, use below command
perf stat -e power/energy-pkg/ -e power/energy-ram/ FOO
to get the energy consumption if power/energy-pkg/ and power/energy-ram/
events are available in the "perf list" output.
This does not introduce any conflict because TPMI RAPL is the only user
of these APIs currently, and it never co-exists with MSR RAPL.
Note that RAPL Packages can be probed/removed dynamically, and the
events supported by each TPMI RAPL device can be different. Thus the
RAPL PMU support is done on demand, which means
1. PMU is registered only if it is needed by a RAPL Package. PMU events
for unsupported counters are not exposed.
2. PMU is unregistered and registered when a new RAPL Package is probed
and supports new counters that are not supported by current PMU.
For example, on a dual-package system using TPMI RAPL, it is possible
that Package 1 behaves as TPMI domain root and supports Psys domain.
In this case, register PMU without Psys event when probing Package 0,
and re-register the PMU with Psys event when probing Package 1.
3. PMU is unregistered when all registered RAPL Packages don't need PMU.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.
Use cpumask_weight_and() to avoid the need for a temporary cpumask on
the stack.
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull power management updates from Rafael Wysocki:
"From the functional perspective, the most significant change here is
the addition of support for Energy Models that can be updated
dynamically at run time.
There is also the addition of LZ4 compression support for hibernation,
the new preferred core support in amd-pstate, new platforms support in
the Intel RAPL driver, new model-specific EPP handling in intel_pstate
and more.
Apart from that, the cpufreq default transition delay is reduced from
10 ms to 2 ms (along with some related adjustments), the system
suspend statistics code undergoes a significant rework and there is a
usual bunch of fixes and code cleanups all over.
Specifics:
- Allow the Energy Model to be updated dynamically (Lukasz Luba)
- Add support for LZ4 compression algorithm to the hibernation image
creation and loading code (Nikhil V)
- Fix and clean up system suspend statistics collection (Rafael
Wysocki)
- Simplify device suspend and resume handling in the power management
core code (Rafael Wysocki)
- Fix PCI hibernation support description (Yiwei Lin)
- Make hibernation take set_memory_ro() return values into account as
appropriate (Christophe Leroy)
- Set mem_sleep_current during kernel command line setup to avoid an
ordering issue with handling it (Maulik Shah)
- Fix wake IRQs handling when pm_runtime_force_suspend() is used as a
driver's system suspend callback (Qingliang Li)
- Simplify pm_runtime_get_if_active() usage and add a replacement for
pm_runtime_put_autosuspend() (Sakari Ailus)
- Add a tracepoint for runtime_status changes tracking (Vilas Bhat)
- Fix section title markdown in the runtime PM documentation (Yiwei
Lin)
- Enable preferred core support in the amd-pstate cpufreq driver
(Meng Li)
- Fix min_perf assignment in amd_pstate_adjust_perf() and make the
min/max limit perf values in amd-pstate always stay within the
(highest perf, lowest perf) range (Tor Vic, Meng Li)
- Allow intel_pstate to assign model-specific values to strings used
in the EPP sysfs interface and make it do so on Meteor Lake
(Srinivas Pandruvada)
- Drop long-unused cpudata::prev_cummulative_iowait from the
intel_pstate cpufreq driver (Jiri Slaby)
- Prevent scaling_cur_freq from exceeding scaling_max_freq when the
latter is an inefficient frequency (Shivnandan Kumar)
- Change default transition delay in cpufreq to 2ms (Qais Yousef)
- Remove references to 10ms minimum sampling rate from comments in
the cpufreq code (Pierre Gondois)
- Honour transition_latency over transition_delay_us in cpufreq (Qais
Yousef)
- Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar)
- General enhancements / cleanups to ARM cpufreq drivers (tianyu2,
Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia
Belova)
- Update cpufreq-dt-platdev to block/approve devices (Richard Acayan)
- Make the SCMI cpufreq driver get a transition delay value from
firmware (Pierre Gondois)
- Prevent the haltpoll cpuidle governor from shrinking guest
poll_limit_ns below grow_start (Parshuram Sangle)
- Avoid potential overflow in integer multiplication when computing
cpuidle state parameters (C Cheng)
- Adjust MWAIT hint target C-state computation in the ACPI cpuidle
driver and in intel_idle to return a correct value for C0 (He
Rongguang)
- Address multiple issues in the TPMI RAPL driver and add support for
new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui)
- Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel
Lezcano)
- Fix kernel-doc for dtpm_create_hierarchy() (Yang Li)
- Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth
Norway Ananda)
- Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil)
- Fix a couple of warnings in the OPP core code related to W=1 builds
(Viresh Kumar)
- Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh
Kumar)
- Extend dev_pm_opp_data with turbo support (Sibi Sankar)
- dt-bindings: drop maxItems from inner items (David Heidelberg)"
* tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits)
dt-bindings: opp: drop maxItems from inner items
OPP: debugfs: Fix warning around icc_get_name()
OPP: debugfs: Fix warning with W=1 builds
cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h
OPP: Extend dev_pm_opp_data with turbo support
Fix cpupower-frequency-info.1 man page typo
cpufreq: scmi: Set transition_delay_us
firmware: arm_scmi: Populate fast channel rate_limit
firmware: arm_scmi: Populate perf commands rate_limit
cpuidle: ACPI/intel: fix MWAIT hint target C-state computation
PM: sleep: wakeirq: fix wake irq warning in system suspend
powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function
cpufreq: Don't unregister cpufreq cooling on CPU hotplug
PM: suspend: Set mem_sleep_current during kernel command line setup
cpufreq: Honour transition_latency over transition_delay_us
cpufreq: Limit resolving a frequency to policy min/max
Documentation: PM: Fix runtime_pm.rst markdown syntax
cpufreq: amd-pstate: adjust min/max limit perf
cpufreq: Remove references to 10ms min sampling rate
cpufreq: intel_pstate: Update default EPPs for Meteor Lake
...