Commit Graph

3100 Commits

Author SHA1 Message Date
Tang Yizhou
1e81d3e06d cpufreq: Fix a comment in cpufreq_policy_free
Make the comment in blocking_notifier_call_chain() easier to
understand.

Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-01 20:02:11 +01:00
Xiongfeng Wang
2c1b5a8466 cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()
When I hot added a CPU, I found 'cpufreq' directory was not created
below /sys/devices/system/cpu/cpuX/.

It is because get_cpu_device() failed in add_cpu_dev_symlink().

cpufreq_add_dev() is the .add_dev callback of a CPU subsys interface.
It will be called when the CPU device registered into the system.
The call chain is as follows:

  register_cpu()
  ->device_register()
   ->device_add()
    ->bus_probe_device()
     ->cpufreq_add_dev()

But only after the CPU device has been registered, we can get the
CPU device by get_cpu_device(), otherwise it will return NULL.

Since we already have the CPU device in cpufreq_add_dev(), pass
it to add_cpu_dev_symlink().

I noticed that the 'kobj' of the CPU device has been added into
the system before cpufreq_add_dev().

Fixes: 2f0ba790df ("cpufreq: Fix creation of symbolic links to policy directories")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-01 19:50:56 +01:00
Srinivas Pandruvada
03c83982a0 cpufreq: intel_pstate: ITMT support for overclocked system
On systems with overclocking enabled, CPPC Highest Performance can be
hard coded to 0xff. In this case even if we have cores with different
highest performance, ITMT can't be enabled as the current implementation
depends on CPPC Highest Performance.

On such systems we can use MSR_HWP_CAPABILITIES maximum performance field
when CPPC.Highest Performance is 0xff.

Due to legacy reasons, we can't solely depend on MSR_HWP_CAPABILITIES as
in some older systems CPPC Highest Performance is the only way to identify
different performing cores.

Reported-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-23 14:11:18 +01:00
Rafael J. Wysocki
ed38eb49d1 cpufreq: intel_pstate: Fix active mode offline/online EPP handling
After commit 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and
->online callbacks") the EPP value set by the "performance" scaling
algorithm in the active mode is not restored after an offline/online
cycle which replaces it with the saved EPP value coming from user
space.

Address this issue by forcing intel_pstate_hwp_set() to set a new
EPP value when it runs first time after online.

Fixes: 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
Link: https://lore.kernel.org/linux-pm/adc7132c8655bd4d1c8b6129578e931a14fe1db2.camel@linux.intel.com/
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-23 14:02:11 +01:00
Adamos Ttofari
cd23f02f16 cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
Commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers
support in no-HWP mode") enabled the use of Intel P-State driver
for Ice Lake servers.

But it doesn't cover the case when OS can't control P-States.

Therefore, for Ice Lake server, if MSR_MISC_PWR_MGMT bits 8 or 18
are enabled, then the Intel P-State driver should exit as OS can't
control P-States.

Fixes: fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-23 14:02:10 +01:00
Srinivas Pandruvada
074d0cdfbb cpufreq: intel_pstate: Clear HWP Status during HWP Interrupt enable
It is possible that some performance excursions happened before OS boot
or enable HWP interrupts. So clear MSR_HWP_STATUS bits when we enable
HWP interrupt. In this way a next excursion will results in a HWP
interrupt.

The status bits of MSR_HWP_STATUS must be cleared (0) by software so
that a new status condition change will cause the hardware to set the
bit again and issue the notification.

Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:48:47 +01:00
Srinivas Pandruvada
5521055670 cpufreq: intel_pstate: Fix unchecked MSR 0x773 access
It is possible that on some platforms HWP interrupts are disabled. In
that case accessing MSR 0x773 will result in warning.

So check X86_FEATURE_HWP_NOTIFY feature to access MSR 0x773. The other
places in code where this MSR is accessed, already checks this feature
except during disable path called during cpufreq offline and suspend
callbacks.

Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:48:47 +01:00
Rafael J. Wysocki
dbea75fe18 cpufreq: intel_pstate: Clear HWP desired on suspend/shutdown and offline
Commit a365ab6b9d ("cpufreq: intel_pstate: Implement the
->adjust_perf() callback") caused intel_pstate to use nonzero HWP
desired values in certain usage scenarios, but it did not prevent
them from being leaked into the confugirations in which HWP desired
is expected to be 0.

The failing scenarios are switching the driver from the passive
mode to the active mode and starting a new kernel via kexec() while
intel_pstate is running in the passive mode.

To address this issue, ensure that HWP desired will be cleared on
offline and suspend/shutdown.

Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Tested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:48:47 +01:00
Rafael J. Wysocki
bf56b90797 Merge branches 'pm-em' and 'powercap'
Merge Energy Model and power capping updates for 5.16-rc1:

 - Add support for inefficient operating performance points to the
   Energy Model and modify cpufreq to use them properly (Vincent
   Donnefort).

 - Rearrange the DTPM framework code to simplify it and make it easier
   to follow (Daniel Lezcano).

 - Fix power intialization in DTPM (Daniel Lezcano).

 - Add CPU load consideration when estimating the instaneous power
   consumption in DTPM (Daniel Lezcano).

* pm-em:
  cpufreq: mediatek-hw: Fix cpufreq_table_find_index_dl() call
  PM: EM: Mark inefficiencies in CPUFreq
  cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
  cpufreq: Introducing CPUFREQ_RELATION_E
  cpufreq: Add an interface to mark inefficient frequencies
  cpufreq: Make policy min/max hard requirements
  PM: EM: Allow skipping inefficient states
  PM: EM: Extend em_perf_domain with a flag field
  PM: EM: Mark inefficient states
  PM: EM: Fix inefficient states detection

* powercap:
  powercap/drivers/dtpm: Fix power limit initialization
  powercap/drivers/dtpm: Scale the power with the load
  powercap/drivers/dtpm: Use container_of instead of a private data field
  powercap/drivers/dtpm: Simplify the dtpm table
  powercap/drivers/dtpm: Encapsulate even more the code
2021-11-02 19:31:28 +01:00
Rafael J. Wysocki
19ea8a0dd4 Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq updates for 5.16-rc1 from Viresh Kumar:

"- Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).

 - Fix the parameter usage of the newly added perf-domain API (Hector
   Yuan).

 - Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
   Guenter Roeck, and Arnd Bergmann)."

* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: Fix parameter in parse_perf_domain()
  cpufreq: tegra186/tegra194: Handle errors in BPMP response
  cpufreq: remove useless INIT_LIST_HEAD()
  cpufreq: s3c244x: add fallthrough comments for switch
  cpufreq: vexpress: Drop unused variable
2021-11-02 17:55:31 +01:00
Zhang Rui
c72bcf0ab8 cpufreq: intel_pstate: Fix cpu->pstate.turbo_freq initialization
Fix a problem in active mode that cpu->pstate.turbo_freq is initialized
only if HWP-to-frequency scaling factor is refined.

In passive mode, this problem is not exposed, because
cpu->pstate.turbo_freq is set again, later in
intel_cpufreq_cpu_init()->intel_pstate_get_hwp_cap().

Fixes: eb3693f052 ("cpufreq: intel_pstate: hybrid: CPU-specific scaling factor")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-26 16:00:50 +02:00
Vincent Donnefort
6215a5de9e cpufreq: mediatek-hw: Fix cpufreq_table_find_index_dl() call
The new cpufreq table flag RELATION_E introduced a new "efficient"
parameter for the cpufreq_table_find*() functions.

Fixes: 1f39fa0dcc (cpufreq: Introducing CPUFREQ_RELATION_E)
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-07 19:21:54 +02:00
Vincent Donnefort
b894d20e68 cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
Let the governors schedutil, conservative and ondemand to work, if possible
on efficient frequencies only.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Vincent Donnefort
1f39fa0dcc cpufreq: Introducing CPUFREQ_RELATION_E
This newly introduced flag can be applied by a governor to a CPUFreq
relation, when looking for a frequency within the policy table. The
resolution would then only walk through efficient frequencies.

Even with the flag set, the policy max limit will still be honoured. If no
efficient frequencies can be found within the limits of the policy, an
inefficient one would be returned.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Vincent Donnefort
1517176906 cpufreq: Make policy min/max hard requirements
When applying the policy min/max limits, the requested frequency is
simply clamped to not be out of range. It means, however, if one of the
boundaries isn't an available frequency, the frequency resolution can
return a value out of those limits, depending on the relation used.

e.g. freq{0,1,2} being available frequencies.

          freq0  policy->min  freq1  policy->max   freq2
            |        |          |        |           |
          17kHz     18kHz     19kHz     20kHz      21kHz

     __resolve_freq(21kHz, CPUFREQ_RELATION_L) -> 21kHz (out of bounds)
     __resolve_freq(17kHz, CPUFREQ_RELATION_H) -> 17kHz (out of bounds)

If, during the policy init, we resolve the requested min/max to existing
frequencies, we ensure that any CPUFREQ_RELATION_* would resolve to a
frequency which is inside the policy min/max range.

Making the policy limits rigid helps to introduce the inefficient
frequencies support. Resolving an inefficient frequency to an efficient
one should not transgress policy->max (which can be set for thermal
reason) and having a value we can trust simplify this comparison.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:33:05 +02:00
Srinivas Pandruvada
57577c996d cpufreq: intel_pstate: Process HWP Guaranteed change notification
It is possible that HWP guaranteed ratio is changed in response to
change in power and thermal limits. For example when Intel Speed Select
performance profile is changed or there is change in TDP, hardware can
send notifications. It is possible that the guaranteed ratio is
increased. This creates an issue when turbo is disabled, as the old
limits set in MSR_HWP_REQUEST are still lower and hardware will clip
to older limits.

This change enables HWP interrupt and process HWP interrupts. When
guaranteed is changed, calls cpufreq_update_policy() so that driver
callbacks are called to update to new HWP limits. This callback
is called from a delayed workqueue of 10ms to avoid frequent updates.

Although the scope of IA32_HWP_INTERRUPT is per logical cpu, on some
plaforms interrupt is generated on all CPUs. This is particularly a
problem during initialization, when the driver didn't allocated
data for other CPUs. So this change uses a cpumask of enabled CPUs and
process interrupts on those CPUs only.

When the cpufreq offline() or suspend() callback is called, HWP interrupt
is disabled on those CPUs and also cancels any pending work item.

Spin lock is used to protect data and processing shared with interrupt
handler. Here READ_ONCE(), WRITE_ONCE() macros are used to designate
shared data, even though spin lock act as an optimization barrier here.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: pablomh@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 15:30:44 +02:00
Mikko Perttunen
c2ace21f93 cpufreq: tegra186/tegra194: Handle errors in BPMP response
The return value from tegra_bpmp_transfer indicates the success or
failure of the IPC transaction with BPMP. If the transaction
succeeded, we also need to check the actual command's result code.
Add code to do this.

While at it, explicitly handle missing CPU clusters, which can
occur on floorswept chips. This worked before as well, but
possibly only by accident.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-10-04 12:31:36 +05:30
Han Wang
6065a67267 cpufreq: remove useless INIT_LIST_HEAD()
list cpu_data_list has been inited staticly through LIST_HEAD,
so there's no need to call another INIT_LIST_HEAD. Simply remove
it from cppc_cpufreq_init.

Signed-off-by: Han Wang <zjuwanghan@outlook.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-10-04 12:27:44 +05:30
Arnd Bergmann
08ef8d35a8 cpufreq: s3c244x: add fallthrough comments for switch
Apparently nobody has so far caught this warning, I hit it in randconfig
build testing:

drivers/cpufreq/s3c2440-cpufreq.c: In function 's3c2440_cpufreq_setdivs':
drivers/cpufreq/s3c2440-cpufreq.c:175:10: error: this statement may fall through [-Werror=implicit-fallthrough=]
   camdiv |= S3C2440_CAMDIVN_HCLK3_HALF;
          ^
drivers/cpufreq/s3c2440-cpufreq.c:176:2: note: here
  case 3:
  ^~~~
drivers/cpufreq/s3c2440-cpufreq.c:181:10: error: this statement may fall through [-Werror=implicit-fallthrough=]
   camdiv |= S3C2440_CAMDIVN_HCLK4_HALF;
          ^
drivers/cpufreq/s3c2440-cpufreq.c:182:2: note: here
  case 4:
  ^~~~

Both look like the fallthrough is intentional, so add the new
"fallthrough;" keyword.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-10-04 11:43:14 +05:30
Guenter Roeck
45b2bb6620 cpufreq: vexpress: Drop unused variable
arm:allmodconfig fails to build with the following error.

drivers/cpufreq/vexpress-spc-cpufreq.c:454:13: error:
					unused variable 'cur_cluster'

Remove the unused variable.

Fixes: bb8c26d938 ("cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag")
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2021-10-04 11:25:21 +05:30
Linus Torvalds
4357f03d66 Merge tag 'pm-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix two cpufreq issues, one in the intel_pstate driver and one
  in the core.

  Specifics:

   - Prevent intel_pstate from avoiding to use HWP, even if instructed
     to do so via the kernel command line, when HWP has been enabled
     already by the platform firmware (Doug Smythies).

   - Prevent use-after-free from occurring in the schedutil cpufreq
     governor on exit by fixing a core helper function that attempts to
     access memory associated with a kobject after calling kobject_put()
     on it (James Morse)"

* tag 'pm-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
  cpufreq: intel_pstate: Override parameters if HWP forced by BIOS
2021-09-17 12:05:04 -07:00
Guenter Roeck
b60cee5bae cpufreq: vexpress: Drop unused variable
arm:allmodconfig fails to build with the following error.

  drivers/cpufreq/vexpress-spc-cpufreq.c:454:13: error:
					unused variable 'cur_cluster'

Remove the unused variable.

Fixes: bb8c26d938 ("cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag")
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-16 11:29:27 -07:00
James Morse
cdef119660 cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
Since commit e5c6b312ce ("cpufreq: schedutil: Use kobject release()
method to free sugov_tunables") kobject_put() has kfree()d the
attr_set before gov_attr_set_put() returns.

kobject_put() isn't the last user of attr_set in gov_attr_set_put(),
the subsequent mutex_destroy() triggers a use-after-free:
| BUG: KASAN: use-after-free in mutex_is_locked+0x20/0x60
| Read of size 8 at addr ffff000800ca4250 by task cpuhp/2/20
|
| CPU: 2 PID: 20 Comm: cpuhp/2 Not tainted 5.15.0-rc1 #12369
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development
| Platform, BIOS EDK II Jul 30 2018
| Call trace:
|  dump_backtrace+0x0/0x380
|  show_stack+0x1c/0x30
|  dump_stack_lvl+0x8c/0xb8
|  print_address_description.constprop.0+0x74/0x2b8
|  kasan_report+0x1f4/0x210
|  kasan_check_range+0xfc/0x1a4
|  __kasan_check_read+0x38/0x60
|  mutex_is_locked+0x20/0x60
|  mutex_destroy+0x80/0x100
|  gov_attr_set_put+0xfc/0x150
|  sugov_exit+0x78/0x190
|  cpufreq_offline.isra.0+0x2c0/0x660
|  cpuhp_cpufreq_offline+0x14/0x24
|  cpuhp_invoke_callback+0x430/0x6d0
|  cpuhp_thread_fun+0x1b0/0x624
|  smpboot_thread_fn+0x5e0/0xa6c
|  kthread+0x3a0/0x450
|  ret_from_fork+0x10/0x20

Swap the order of the calls.

Fixes: e5c6b312ce ("cpufreq: schedutil: Use kobject release() method to free sugov_tunables")
Cc: 4.7+ <stable@vger.kernel.org> # 4.7+
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-14 19:01:36 +02:00
Doug Smythies
d9a7e9df73 cpufreq: intel_pstate: Override parameters if HWP forced by BIOS
If HWP has been already been enabled by BIOS, it may be
necessary to override some kernel command line parameters.
Once it has been enabled it requires a reset to be disabled.

Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-13 19:26:08 +02:00
Rafael J. Wysocki
be2d24336f Merge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em'
* pm-cpufreq:
  cpufreq: intel_pstate: hybrid: Rework HWP calibration
  ACPI: CPPC: Introduce cppc_get_nominal_perf()

* pm-sleep:
  PM: sleep: core: Avoid setting power.must_resume to false
  PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()

* pm-em:
  Documentation: power: include kernel-doc in Energy Model doc
  PM: EM: fix kernel-doc comments
2021-09-10 20:26:08 +02:00