In policy initialization, policy->max and policy->cpuinfo.max_freq are
always set to the value calculated from caps->nominal_perf.
This will cause the frequency stay on base frequency even if the policy
is already boosted when a CPU is going online.
Fix this by using policy->boost_enabled to determine which value should
be set.
Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20250117101457.1530653-4-zhenglifeng1@huawei.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
kthread_create() creates a kthread without running it yet. kthread_run()
creates a kthread and runs it.
On the other hand, kthread_create_worker() creates a kthread worker and
runs it.
This difference in behaviours is confusing. Also there is no way to
create a kthread worker and affine it using kthread_bind_mask() or
kthread_affine_preferred() before starting it.
Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]()
that behaves just like kthread_run(). kthread_create_worker[_on_cpu]()
will now only create a kthread worker without starting it.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
cppc_get_cpu_power() return 0 if the policy is NULL. Then in
em_create_perf_table(), the later zero check for power is not valid
as power is uninitialized. As Quentin pointed out, kernel energy model
core check the return value of active_power() first, so if the callback
failed it should tell the core. So return -EINVAL to fix it.
Fixes: a78e720756 ("cpufreq: CPPC: Fix possible null-ptr-deref for cpufreq_cpu_get_raw()")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
cpufreq_cpu_get_raw() may return NULL if the cpu is not in
policy->cpus cpu mask and it will cause null pointer dereference,
so check NULL for cppc_get_cpu_cost().
Fixes: 740fcdc2c2 ("cpufreq: CPPC: Register EM based on efficiency class information")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
cpufreq_cpu_get_raw() may return NULL if the cpu is not in
policy->cpus cpu mask and it will cause null pointer dereference.
Fixes: 740fcdc2c2 ("cpufreq: CPPC: Register EM based on efficiency class information")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Since commit 6c8d750f97 ("cpufreq / cppc: Work around for Hisilicon CPPC
cpufreq"), we introduce a workround for HiSilicon platforms that do not
support performance feedback counters, whereas they can get the actual
frequency from the desired perf register. Later on, FIE is disabled in
that workaround as well.
Now the workround can be handled by the common code. Desired perf would be
read and converted to frequency if feedback counters don't change. FIE
would be disabled if the CPPC regs are in PCC region.
Hence, the workaround is no longer needed and can be safely removed, in an
effort to consolidate the driver procedure.
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Huisong Li <lihuisong@huawei.com>
[ Viresh: Move fie_disabled withing CONFIG option to fix warning ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The CPPC performance feedback counters could be 0 or unchanged when the
target cpu is in a low-power idle state, e.g. power-gated or clock-gated.
When the counters are 0, cppc_cpufreq_get_rate() returns 0 KHz, which makes
cpufreq_online() get a false error and fail to generate a cpufreq policy.
When the counters are unchanged, the existing cppc_perf_from_fbctrs()
returns a cached desired perf, but some platforms may update the real
frequency back to the desired perf reg.
For the above cases in cppc_cpufreq_get_rate(), get the latest desired perf
from the CPPC reg to reflect the frequency because some platforms may
update the actual frequency back there; if failed, use the cached desired
perf.
Fixes: 6a4fec4f6d ("cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.")
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
There is a corner case where the desired_perf is exactly same as the old
perf, but the actual current freq is not.
This happens during S3 while the cpufreq governor is set to powersave.
During cpufreq resume process, the booting CPU's new_freq obtained via
.get() is the highest frequency, while the policy->cur and
cpu->perf_ctrls.desired_perf are set to the lowest level (powersave
governor). This causes the warning: "CPU frequency out of sync:", and
the cpufreq core sets policy->cur to new_freq.
Then the governor->limits() calls cppc_cpufreq_set_target() to
configures the CPU frequency and returns directly because the
desired_perf converted from target_freq is same as the
cpu->perf_ctrls.desired_perf and both are the lowest_perf.
Since target_freq and policy->cur have been already compared in
__cpufreq_driver_target(), there's no need to compare them again here.
Drop the comparison.
Signed-off-by: Riwen Lu <luriwen@kylinos.cn>
[ Viresh: Updated commit message / subject ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
cppc_cpufreq_get_rate() and hisi_cppc_cpufreq_get_rate() can be called from
different places with various parameters. So cpufreq_cpu_get() can return
null as 'policy' in some circumstances.
Fix this bug by adding null return check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: a28b2bfc09 ("cppc_cpufreq: replace per-cpu data array with a list")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Move and rename cppc_cpufreq_perf_to_khz() and cppc_cpufreq_khz_to_perf() to
use them outside cppc_cpufreq in topology_init_cpu_capacity_cppc().
Modify the interface to use struct cppc_perf_caps *caps instead of
struct cppc_cpudata *cpu_data as we only use the fields of cppc_perf_caps.
cppc_cpufreq was converting the lowest and nominal freq from MHz to kHz
before using them. We move this conversion inside cppc_perf_to_khz and
cppc_khz_to_perf to make them generic and usable outside cppc_cpufreq.
No functional change
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/r/20231211104855.558096-6-vincent.guittot@linaro.org
The function cppc_freq_invariance_init() may failed to create
kworker_fie, make it more robust by setting fie_disabled to FIE_DISBALED
to prevent an invalid pointer dereference in kthread_destroy_worker(),
which called from cppc_freq_invariance_exit().
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The cpufreq framework used to use the zero of return value to reflect
the cppc_cpufreq_get_rate() had failed to get current frequecy and treat
all positive integer to be succeed. Since cppc_get_perf_ctrs() returns a
negative integer in error case, so it is better to convert the value to
zero as the return value of cppc_cpufreq_get_rate().
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The fields of the _CPC object are unsigned 32-bits values.
To avoid overflows while using _CPC's values, add 'u64' casts.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
PCC regions utilize a mailbox to set/retrieve register values used by
the CPPC code. This is fine as long as the operations are
infrequent. With the FIE code enabled though the overhead can range
from 2-11% of system CPU overhead (ex: as measured by top) on Arm
based machines.
So, before enabling FIE assure none of the registers used by
cppc_get_perf_ctrs() are in the PCC region. Finally, add a module
parameter which can override the PCC region detection at boot or
module reload.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make acpi_cpc_valid() check if ACPI is disabled, so that its callers
don't need to check that separately. This will also cause the AMD
pstate driver to refuse to load right away when ACPI is disabled.
Also update the warning message in amd_pstate_init() to mention the
ACPI disabled case for completeness.
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
[ rjw: Subject edits, new changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Building the cppc_cpufreq driver with for arm64 with
CONFIG_ENERGY_MODEL=n triggers the following warnings:
drivers/cpufreq/cppc_cpufreq.c:550:12: error: ‘cppc_get_cpu_cost’ defined but not used
[-Werror=unused-function]
550 | static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz,
| ^~~~~~~~~~~~~~~~~
drivers/cpufreq/cppc_cpufreq.c:481:12: error: ‘cppc_get_cpu_power’ defined but not used
[-Werror=unused-function]
481 | static int cppc_get_cpu_power(struct device *cpu_dev,
| ^~~~~~~~~~~~~~~~~~
Move the Energy Model related functions into specific guards.
This allows to fix the warning and prevent doing extra work
when the Energy Model is not present.
Fixes: 740fcdc2c2 ("cpufreq: CPPC: Register EM based on efficiency class information")
Reported-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Tested-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If CONFIG_ACPI_CPPC_CPUFREQ_FIE is not set, building fails:
drivers/cpufreq/cppc_cpufreq.c: In function ‘populate_efficiency_class’:
drivers/cpufreq/cppc_cpufreq.c:584:2: error: ‘cppc_cpufreq_driver’ undeclared (first use in this function); did you mean ‘cpufreq_driver’?
cppc_cpufreq_driver.register_em = cppc_cpufreq_register_em;
^~~~~~~~~~~~~~~~~~~
cpufreq_driver
Make declare of cppc_cpufreq_driver out of CONFIG_ACPI_CPPC_CPUFREQ_FIE
to fix this.
Fixes: 740fcdc2c2 ("cpufreq: CPPC: Register EM based on efficiency class information")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The communication mean of the _CPC desired performance can be
PCC, System Memory, System IO, or Functional Fixed Hardware (FFH).
PCC, SystemMemory and SystemIo address spaces are available from any
CPU. Thus, dvfs_possible_from_any_cpu should be enabled in such case.
For FFH, let the FFH implementation do smp_call_function_*() calls.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The communication mean of the _CPC desired performance can be
PCC, System Memory, System IO, or Functional Fixed Hardware.
commit b7898fda5b ("cpufreq: Support for fast frequency switching")
fast_switching is 'for switching CPU frequencies from interrupt
context'.
Writes to SystemMemory and SystemIo are fast and suitable this.
This is not the case for PCC and might not be the case for FFH.
Enable fast_switching for the cppc_cpufreq driver in above cases.
Add cppc_allow_fast_switch() to check the desired performance
register address space and set fast_switching accordingly.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Performance states and energy consumption values are not advertised
in ACPI. In the GicC structure of the MADT table, the "Processor
Power Efficiency Class field" (called efficiency class from now)
allows to describe the relative energy efficiency of CPUs.
To leverage the EM and EAS, the CPPC driver creates a set of
artificial performance states and registers them in the Energy Model
(EM), such as:
- Every 20 capacity unit, a performance state is created.
- The energy cost of each performance state gradually increases.
No power value is generated as only the cost is used in the EM.
During task placement, a task can raise the frequency of its whole
pd. This can make EAS place a task on a pd with CPUs that are
individually less energy efficient.
As cost values are artificial, and to place tasks on CPUs with the
lower efficiency class, a gap in cost values is generated for adjacent
efficiency classes.
E.g.:
- efficiency class = 0, capacity is in [0-1024], so cost values
are in [0: 51] (one performance state every 20 capacity unit)
- efficiency class = 1, capacity is in [0-1024], cost values
are in [1*gap+0: 1*gap+51].
The value of the cost gap is chosen to absorb a the energy of 4 CPUs
at their maximum capacity. This means that between:
1- a pd of 4 CPUs, each of them being used at almost their full
capacity. Their efficiency class is N.
2- a CPU using almost none of its capacity. Its efficiency class is
N+1
EAS will choose the first option.
This patch also populates the (struct cpufreq_driver).register_em
callback if the valid efficiency_class ACPI values are provided.
Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>