Commit Graph

235 Commits

Author SHA1 Message Date
Ville Syrjälä
a60ec4485f powercap: intel_rapl: Downgrade BIOS locked limits pr_warn() to pr_debug()
Before the refactoring the pr_warn() only triggered when
someone explicitly tried to write to a BIOS locked limit.
After the refactoring the warning is also triggering during
system resume. The user can't do anything about this so
printing scary warnings doesn't make sense

Keep the printk but make it pr_debug() instead of pr_warn()
to make it clear it's not a serious issue.

Fixes: 9050a9cd5e ("powercap: intel_rapl: Cleanup Power Limits support")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: 6.5+ <stable@vger.kernel.org> # 6.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-24 22:07:07 +02:00
Srinivas Pandruvada
081690e941 powercap: intel_rapl: Fix invalid setting of Power Limit 4
System runs at minimum performance, once powercap RAPL package domain
enabled flag is changed from 1 to 0 to 1.

Setting RAPL package domain enabled flag to 0, results in setting of
power limit 4 (PL4) MSR 0x601 to 0. This implies disabling PL4 limit.
The PL4 limit controls the peak power. So setting 0, results in some
undesirable performance, which depends on hardware implementation.

Even worse, when the enabled flag is set to 1 again. This will set PL4
MSR value to 0x01, which means reduce peak power to 0.125W. This will
force system to run at the lowest possible performance on every PL4
supported system.

Setting enabled flag should only affect the "enable" bit, not other
bits. Here it is changing power limit.

This is caused by a change which assumes that there is an enable bit in
the PL4 MSR like other power limits. Although PL4 enable/disable bit is
present with TPMI RAPL interface, it is not present with the MSR
interface.

There is a rapl_primitive_info defined for non existent PL4 enable bit
and then it is used with the commit 9050a9cd5e ("powercap: intel_rapl:
Cleanup Power Limits support") to enable PL4. This is wrong, hence remove
this rapl primitive for PL4. Also in the function
rapl_detect_powerlimit(), PL_ENABLE is used to check for the presence of
power limits. Replace PL_ENABLE with PL_LIMIT, as PL_LIMIT must be
present. Without this change, PL4 controls will not be available in the
sysfs once rapl primitive for PL4 is removed.

Fixes: 9050a9cd5e ("powercap: intel_rapl: Cleanup Power Limits support")
Suggested-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: 6.5+ <stable@vger.kernel.org> # 6.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-06 22:21:22 +02:00
Linus Torvalds
ccc5e98177 Merge tag 'pm-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "These rework cpuidle governors to call tick_nohz_get_sleep_length()
  less often and fix one of them, rework hibernation to avoid storing
  pages filled with zeros in hibernation images, switch over some
  cpufreq drivers to use void remove callbacks, fix and clean up
  multiple cpufreq drivers, fix the devfreq core, update the cpupower
  utility and make other assorted improvements.

  Specifics:

   - Rework the menu and teo cpuidle governors to avoid calling
     tick_nohz_get_sleep_length(), which is likely to become quite
     expensive going forward, too often and improve making decisions
     regarding whether or not to stop the scheduler tick in the teo
     governor (Rafael Wysocki)

   - Improve the performance of cpufreq_stats_create_table() in some
     cases (Liao Chang)

   - Fix two issues in the amd-pstate-ut cpufreq driver (Swapnil Sapkal)

   - Use clamp() helper macro to improve the code readability in
     cpufreq_verify_within_limits() (Liao Chang)

   - Set stale CPU frequency to minimum in intel_pstate (Doug Smythies)

   - Migrate cpufreq drivers for various platforms to use void remove
     callback (Yangtao Li)

   - Add online/offline/exit hooks for Tegra driver (Sumit Gupta)

   - Explicitly include correct DT includes in cpufreq (Rob Herring)

   - Frequency domain updates for qcom-hw driver (Neil Armstrong)

   - Modify AMD pstate driver return the highest_perf value (Meng Li)

   - Generic cleanups for cppc, mediatek and powernow driver (Liao
     Chang, Konrad Dybcio)

   - Add more platforms to cpufreq-arm driver's blocklist
     (AngeloGioacchino Del Regno and Konrad Dybcio)

   - brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva)

   - Add device PM helpers to allow a device to remain powered-on during
     system-wide transitions (Ulf Hansson)

   - Rework hibernation memory snapshotting to avoid storing pages
     filled with zeros in hibernation image files (Brian Geffon)

   - Add check to make sure that CPU latency QoS constraints do not use
     negative values (Clive Lin)

   - Optimize rp->domains memory allocation in the Intel RAPL power
     capping driver (xiongxin)

   - Remove recursion while parsing zones in the arm_scmi power capping
     driver (Cristian Marussi)

   - Fix memory leak in devfreq_dev_release() (Boris Brezillon)

   - Rewrite devfreq_monitor_start() kerneldoc comment (Manivannan
     Sadhasivam)

   - Explicitly include correct DT includes in devfreq (Rob Herring)

   - Remove unsued pm_runtime_update_max_time_suspended() extern
     declaration (YueHaibing)

   - Add turbo-boost support to cpupower (Wyes Karny)

   - Add support for amd_pstate mode change to cpupower (Wyes Karny)

   - Fix 'cpupower idle_set' command to accept only numeric values of
     arguments (Likhitha Korrapati)

   - Clean up OPP code and add new frequency related APIs to it (Viresh
     Kumar, Manivannan Sadhasivam)

   - Convert ti cpufreq/opp bindings to json schema (Nishanth Menon)"

* tag 'pm-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
  cpufreq: tegra194: remove opp table in exit hook
  cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
  cpufreq: tegra194: add online/offline hooks
  cpuidle: teo: Avoid unnecessary variable assignments
  cpufreq: qcom-cpufreq-hw: add support for 4 freq domains
  dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain
  cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver
  cpufreq: amd-pstate-ut: Remove module parameter access
  cpufreq: Use clamp() helper macro to improve the code readability
  PM: sleep: Add helpers to allow a device to remain powered-on
  PM: QoS: Add check to make sure CPU latency is non-negative
  PM: runtime: Remove unsued extern declaration of pm_runtime_update_max_time_suspended()
  cpufreq: intel_pstate: set stale CPU frequency to minimum
  cpufreq: stats: Improve the performance of cpufreq_stats_create_table()
  dt-bindings: cpufreq: Convert ti-cpufreq to json schema
  dt-bindings: opp: Convert ti-omap5-opp-supply to json schema
  OPP: Fix argument name in doc comment
  cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases
  cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie
  cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.
  ...
2023-08-28 18:04:39 -07:00
Linus Torvalds
1a7c611546 Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:

 - AMD IBS improvements

 - Intel PMU driver updates

 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events

 - Micro-optimize software events and the ring-buffer code

 - Misc cleanups & fixes

* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
  perf/x86/intel: Add Crestmont PMU
  x86/cpu: Update Hybrids
  x86/cpu: Fix Crestmont uarch
  x86/cpu: Fix Gracemont uarch
  perf: Remove unused extern declaration arch_perf_get_page_size()
  perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
  locking/arch: Avoid variable shadowing in local_try_cmpxchg()
  perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
  perf/x86: Use local64_try_cmpxchg
  perf/amd: Prevent grouping of IBS events
2023-08-28 16:35:01 -07:00
Peter Zijlstra
882cdb06b6 x86/cpu: Fix Gracemont uarch
Alderlake N is an E-core only product using Gracemont
micro-architecture. It fits the pre-existing naming scheme perfectly
fine, adhere to it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230807150405.686834933@infradead.org
2023-08-09 21:51:06 +02:00
Rafael J. Wysocki
9784239529 Merge back earlier power capping changes for v6.6. 2023-08-04 22:48:58 +02:00
xiongxin
2fa00769b1 powercap: intel_rapl: Optimize rp->domains memory allocation
In the memory allocation of rp->domains in rapl_detect_domains(), there
is an additional memory of struct rapl_domain allocated, optimize the
code here to save sizeof(struct rapl_domain) bytes of memory.

Test in Intel NUC (i5-1135G7).

Signed-off-by: xiongxin <xiongxin@kylinos.cn>
Tested-by: xiongxin <xiongxin@kylinos.cn>
Reviewed-by: Srinivas Pandruvada<srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-01 13:54:00 +02:00
Zhang Rui
16e95a62ee powercap: intel_rapl: Fix a sparse warning in TPMI interface
Depends on the interface used, the RAPL registers can be either MSR
indexes or memory mapped IO addresses. Current RAPL common code uses u64
to save both MSR and memory mapped IO registers. With this, when
handling register address with an __iomem annotation, it triggers a
sparse warning like below:

sparse warnings: (new ones prefixed by >>)
>> drivers/powercap/intel_rapl_tpmi.c:141:41: sparse: sparse: incorrect type in initializer (different address spaces) @@     expected unsigned long long [usertype] *tpmi_rapl_regs @@     got void [noderef] __iomem * @@
   drivers/powercap/intel_rapl_tpmi.c:141:41: sparse:     expected unsigned long long [usertype] *tpmi_rapl_regs
   drivers/powercap/intel_rapl_tpmi.c:141:41: sparse:     got void [noderef] __iomem *

Fix the problem by using a union to save the registers instead.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307031405.dy3druuy-lkp@intel.com/
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-01 13:45:08 +02:00
Cristian Marussi
3e767d6850 powercap: arm_scmi: Remove recursion while parsing zones
Powercap zones can be defined as arranged in a hierarchy of trees and when
registering a zone with powercap_register_zone(), the kernel powercap
subsystem expects this to happen starting from the root zones down to the
leaves; on the other side, de-registration by powercap_deregister_zone()
must begin from the leaf zones.

Available SCMI powercap zones are retrieved dynamically from the platform
at probe time and, while any defined hierarchy between the zones is
described properly in the zones descriptor, the platform returns the
availables zones with no particular well-defined order: as a consequence,
the trees possibly composing the hierarchy of zones have to be somehow
walked properly to register the retrieved zones from the root.

Currently the ARM SCMI Powercap driver walks the zones using a recursive
algorithm; this approach, even though correct and tested can lead to kernel
stack overflow when processing a returned hierarchy of zones composed by
particularly high trees.

Avoid possible kernel stack overflow by substituting the recursive approach
with an iterative one supported by a dynamically allocated stack-like data
structure.

Fixes: b55eef5226 ("powercap: arm_scmi: Add SCMI Powercap based driver")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-20 20:27:10 +02:00
Linus Torvalds
e4c8d01865 Merge tag 'soc-drivers-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
 "Nothing surprising in the SoC specific drivers, with the usual
  updates:

   - Added or improved SoC driver support for Tegra234, Exynos4121,
     RK3588, as well as multiple Mediatek and Qualcomm chips

   - SCMI firmware gains support for multiple SMC/HVC transport and
     version 3.2 of the protocol

   - Cleanups amd minor changes for the reset controller, memory
     controller, firmware and sram drivers

   - Minor changes to amd/xilinx, samsung, tegra, nxp, ti, qualcomm,
     amlogic and renesas SoC specific drivers"

* tag 'soc-drivers-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (118 commits)
  dt-bindings: interrupt-controller: Convert Amlogic Meson GPIO interrupt controller binding
  MAINTAINERS: add PHY-related files to Amlogic SoC file list
  drivers: meson: secure-pwrc: always enable DMA domain
  tee: optee: Use kmemdup() to replace kmalloc + memcpy
  soc: qcom: geni-se: Do not bother about enable/disable of interrupts in secondary sequencer
  dt-bindings: sram: qcom,imem: document qdu1000
  soc: qcom: icc-bwmon: Fix MSM8998 count unit
  dt-bindings: soc: qcom,rpmh-rsc: Require power-domains
  soc: qcom: socinfo: Add Soc ID for IPQ5300
  dt-bindings: arm: qcom,ids: add SoC ID for IPQ5300
  soc: qcom: Fix a IS_ERR() vs NULL bug in probe
  soc: qcom: socinfo: Add support for new fields in revision 19
  soc: qcom: socinfo: Add support for new fields in revision 18
  dt-bindings: firmware: scm: Add compatible for SDX75
  soc: qcom: mdt_loader: Fix split image detection
  dt-bindings: memory-controllers: drop unneeded quotes
  soc: rockchip: dtpm: use C99 array init syntax
  firmware: tegra: bpmp: Add support for DRAM MRQ GSCs
  soc/tegra: pmc: Use devm_clk_notifier_register()
  soc/tegra: pmc: Simplify debugfs initialization
  ...
2023-06-29 15:22:19 -07:00
Dan Carpenter
49776c712e powercap: RAPL: Fix a NULL vs IS_ERR() bug
The devm_ioremap_resource() function returns error pointers on error,
it never returns NULL.  Update the check accordingly.

Fixes: 9eef7f9da9 ("powercap: intel_rapl: Introduce RAPL TPMI interface driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 19:51:21 +02:00
Zhang Rui
4658fe81b3 powercap: RAPL: Fix CONFIG_IOSF_MBI dependency
After commit 3382388d71 ("intel_rapl: abstract RAPL common code"),
accessing to IOSF_MBI interface is done in the RAPL common code.

Thus it is the CONFIG_INTEL_RAPL_CORE that has dependency of
CONFIG_IOSF_MBI, while CONFIG_INTEL_RAPL_MSR does not.

This problem was not exposed previously because all the previous RAPL
common code users, aka, the RAPL MSR and MMIO I/F drivers, have
CONFIG_IOSF_MBI selected.

Fix the CONFIG_IOSF_MBI dependency in RAPL code. This also fixes a build
time failure when the RAPL TPMI I/F driver is introduced without
selecting CONFIG_IOSF_MBI.

x86_64-linux-ld: vmlinux.o: in function `set_floor_freq_atom':
intel_rapl_common.c:(.text+0x2dac9b8): undefined reference to `iosf_mbi_write'
x86_64-linux-ld: intel_rapl_common.c:(.text+0x2daca66): undefined reference to `iosf_mbi_read'

Reference to iosf_mbi.h is also removed from the RAPL MSR I/F driver.

Fixes: 3382388d71 ("intel_rapl: abstract RAPL common code")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20230601213246.3271412-1-arnd@kernel.org
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 19:48:40 +02:00
Sumeet Pawnikar
d05b5e0baf powercap: RAPL: fix invalid initialization for pl4_supported field
The current initialization of the struct x86_cpu_id via
pl4_support_ids[] is partial and wrong. It is initializing
"stepping" field with "X86_FEATURE_ANY" instead of "feature" field.

Use X86_MATCH_INTEL_FAM6_MODEL macro instead of initializing
each field of the struct x86_cpu_id for pl4_supported list of CPUs.
This X86_MATCH_INTEL_FAM6_MODEL macro internally uses another macro
X86_MATCH_VENDOR_FAM_MODEL_FEATURE for X86 based CPU matching with
appropriate initialized values.

Reported-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/lkml/28ead36b-2d9e-1a36-6f4e-04684e420260@intel.com
Fixes: eb52bc2ae5 ("powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC")
Fixes: b08b95cf30 ("powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P")
Fixes: 5157559069 ("powercap: RAPL: Add Power Limit4 support for RaptorLake")
Fixes: 1cc5b9a411 ("powercap: Add Power Limit4 support for Alder Lake SoC")
Fixes: 8365a898fe ("powercap: Add Power Limit4 support")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 19:44:52 +02:00
Cristian Marussi
aaffb4cacd powercap: arm_scmi: Add support for disabling powercaps on a zone
Add support to disable/enable powercapping on a zone.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20230531152039.2363181-4-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2023-06-06 14:05:10 +01:00
Zhang Rui
9eef7f9da9 powercap: intel_rapl: Introduce RAPL TPMI interface driver
The TPMI (Topology Aware Register and PM Capsule Interface) provides a
flexible, extendable and PCIe enumerable MMIO interface for PM features.

Intel RAPL (Running Average Power Limit) is one of the features that
benefit from this. Using TPMI Interface has advantage over traditional MSR
(Model Specific Register) interface, where a thread needs to be scheduled
on the target CPU to read or write. Also the RAPL features vary between
CPU models, and hence lot of model specific code. Here TPMI provides an
architectural interface by providing hierarchical tables and fields,
which will not need any model specific implementation.

TPMI interface uses a PCI VSEC structure to expose the location of MMIO
interface for PM feature enumeration and control.

The Intel VSEC driver parses VSEC structures present in the PCI
configuration space of the given device and creates an auxiliary device
object for each of them. In particular, it creates an auxiliary device
object representing TPMI that can be bound to by an auxiliary driver.

Then the TPMI enumeration driver binds to the TPMI auxiliary device
object created by the Intel VSEC driver, parses the PM Feature Structure
(PFS) present in the TPMI MMIO region and creates device nodes for PM
features described in the PFS.

This RAPL TPMI Interface driver binds the RAPL auxiliary device created
by the TPMI enumeration driver and expose the RAPL control to userspace
via powercap sysfs class.

RAPL TPMI details are published in the following document:
https://github.com/intel/tpmi_power_management/blob/main/RAPL_TPMI_public_disclosure_FINAL.docx

Note, for now, the RAPL TPMI Interface and RAPL MSR Interface cannot
co-exists on the same platform (RAPL TPMI Interface is not supported on
any platforms in the CPU model list for RAPL MSR Interface). Thus
register the RAPL TPMI powercap control type with name "intel-rapl",
the same as RAPL MSR Interface, so that it is transparent to userspace.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
e12dee18b8 powercap: intel_rapl: Introduce core support for TPMI interface
Compared with existing RAPL MSR/MMIO Interface, the RAPL TPMI Interface
1. has per Power Limit register, thus has per Power Limit Lock and
   Enable bit.
2. doesn't have Power Limit Clamp bit.
3. the Power Limit Lock and Enable bits have different bit offsets.
These mean RAPL TPMI Interface needs its own primitive information.

RAPL TPMI Interface also has per domain unit register but with a
different register layout. This requires a TPMI specific rapl_defaults
call to decode the unit register.

Introduce the RAPL core support for TPMI Interface.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
b4288ce788 powercap: intel_rapl: Introduce RAPL I/F type
Different RAPL Interfaces may have different primitive information and
rapl_defaults calls.

To better distinguish this difference in the RAPL framework code,
introduce a new enum to represent different types of RAPL Interfaces.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
bf44b9011d powercap: intel_rapl: Make cpu optional for rapl_package
MSR RAPL Interface always removes a rapl_package when all the CPUs in
that rapl_package are offlined. This is because it relies on an online
CPU to access the MSR.

But for RAPL Interface using MMIO registers, when all the cpus within
the rapl_package are offlined,
1. the register can still be accessed
2. monitoring and setting the Power Pimits for the rapl_package is still
   meaningful because of uncore power.

This means that, a valid rapl_package doesn't rely on one or more cpus
being onlined.

For this sense, make cpu optional for rapl_package. A rapl_package can
be registered either using a CPU id to represent the physical
package/die, or using the physical package id directly.

Note that, the thermal throttling interrupt is not disabled via
MSR_IA32_PACKAGE_THERM_INTERRUPT for such rapl_package at the moment.
If it is still needed in the future, this can be achieved by selecting
an onlined CPU using the physical package id.

Note that, processor_thermal_rapl, the current MMIO RAPL Interface
driver, can also be converted to register using a package id instead.
But this is not done right now because processor_thermal_rapl driver
works on single-package systems only, and offlining the only package
will not happen. So keep the previous logic.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
693c1d7868 powercap: intel_rapl: Remove redundant cpu parameter
For rapl_packages that rely on online CPUs to work, rp->lead_cpu always
has a valid CPU id.

Remove the redundant cpu parameter in rapl_check_domain(),
rapl_detect_domains() and .check_unit() callbacks.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
f442bd2742 powercap: intel_rapl: Add support for lock bit per Power Limit
With RAPL MSR/MMIO Interface, each RAPL domain has one Power Limit
register. Each Power Limit register has one lock bit which tells the OS
if the power limit register can be used or not.
Depending on the number of power limits supported by the power limit
register, the lock bit may apply to one or more power limits.

With RAPL TPMI Interface, each RAPL domain has multiple Power Limits,
and each Power Limit has its own register, with a lock bit.

To handle this, introduce support for lock bit per Power Limit.

For existing RAPL MSR/MMIO Interfaces, the lock bit in the Power Limit
register applies to all the Power Limits controlled by this register.

Remove the per domain DOMAIN_STATE_BIOS_LOCKED flag at the same time
because it can be replaced by the per Power Limit lock.

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
9050a9cd5e powercap: intel_rapl: Cleanup Power Limits support
The same set of operations are shared by different Powert Limits,
including Power Limit get/set, Power Limit enable/disable, clamping
enable/disable, time window get/set, and max power get/set, etc.

But the same operation for different Power Limit has different
primitives because they use different registers/register bits.

A lot of dirty/duplicate code was introduced to handle this difference.

Introduce a universal way to issue Power Limit operations.
Instead of using hardcoded primitive name directly, use Power Limit id
+ operation type, and hide all the Power Limit difference details in a
central place, get_pl_prim(). Two helpers, rapl_read_pl_data() and
rapl_write_pl_data(), are introduced at the same time to simplify the
code for issuing Power Limit operations.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
a38f300bb2 powercap: intel_rapl: Use bitmap for Power Limits
Currently, a RAPL package is registered with the number of Power Limits
supported in each RAPL domain. But this doesn't tell which Power Limits
are available. Using the number of Power Limits supported to guess the
availability of each Power Limit is fragile.

Use bitmap to represent the availability of each Power Limit.

Note that PL1 is mandatory thus it does not need to be set explicitly by
the RAPL Interface drivers.

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
045610c383 powercap: intel_rapl: Change primitive order
The same set of operations are shared by different Powert Limits,
including Power Limit get/set, Power Limit enable/disable, clamping
enable/disable, time window get/set, and max power get/set, etc.

But the same operation for different Power Limit has different
primitives because they use different registers/register bits.

A lot of dirty/duplicate code was introduced to handle this difference.

Instead of using hardcoded primitive name directly, using Power Limit id
+ operation type is much cleaner.

For this sense, move POWER_LIMIT1/POWER_LIMIT2/POWER_LIMIT4 to the
beginning of enum rapl_primitives so that they can be reused as
Power Limit ids.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
11edbe5c66 powercap: intel_rapl: Use index to initialize primitive information
Currently, the RAPL primitive information array is required to be
initialized in the order of enum rapl_primitives.
This can break easily, especially when different RAPL Interfaces may
support different sets of primitives.

Convert the code to initialize the primitive information using array
index explicitly.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Zhang Rui
cb532e728e powercap: intel_rapl: Support per domain energy/power/time unit
RAPL MSR/MMIO Interface has package scope unit register but some RAPL
domains like Dram/Psys may use a fixed energy unit value instead of the
default unit value on certain platforms.
RAPL TPMI Interface supports per domain unit register.

For the above reasons, add support for per domain unit register and per
domain energy/power/time unit.

When per domain unit register is not available, use the package scope
unit register as the per domain unit register for each RAPL domain so
that this change is transparent to MSR/MMIO Interface.

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00