Add a new field to be set when the CPUFREQ_GOV_STRICT_TARGET flag is
set for the current governor to struct cpufreq_policy, so that the
drivers needing to check CPUFREQ_GOV_STRICT_TARGET do not have to
access the governor object during every frequency transition.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Introduce a new governor flag, CPUFREQ_GOV_STRICT_TARGET, for the
governors that want the target frequency to be set exactly to the
given value without leaving any room for adjustments on the hardware
side and set this flag for the powersave and performance governors.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
A new cpufreq governor flag will be added subsequently, so replace
the bool dynamic_switching fleid in struct cpufreq_governor with a
flags field and introduce CPUFREQ_GOV_DYNAMIC_SWITCHING to set for
the "dynamic switching" governors instead of it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Add a helper function to test the flags of the cpufreq driver in use
againt a given flags mask.
In particular, this will be needed to test the
CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag in the schedutil
governor.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Generally, a cpufreq driver may need to update some internal upper
and lower frequency boundaries on policy max and min changes,
respectively, but currently this does not work if the target
frequency does not change along with the policy limit.
Namely, if the target frequency does not change along with the
policy min or max, the "target_freq == policy->cur" check in
__cpufreq_driver_target() prevents driver callbacks from being
invoked and they do not even have a chance to update the
corresponding internal boundary.
This particularly affects the "powersave" and "performance"
governors that always set the target frequency to one of the
policy limits and it never changes when the other limit is updated.
To allow cpufreq the drivers needing to update internal frequency
boundaries on policy limits changes to avoid this issue, introduce
a new driver flag, CPUFREQ_NEED_UPDATE_LIMITS, that (when set) will
neutralize the check mentioned above.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Compared to other arch_* functions, arch_set_freq_scale() has an atypical
weak definition that can be replaced by a strong architecture specific
implementation.
The more typical support for architectural functions involves defining
an empty stub in a header file if the symbol is not already defined in
architecture code. Some examples involve:
- #define arch_scale_freq_capacity topology_get_freq_scale
- #define arch_scale_freq_invariant topology_scale_freq_invariant
- #define arch_scale_cpu_capacity topology_get_cpu_scale
- #define arch_update_cpu_topology topology_update_cpu_topology
- #define arch_scale_thermal_pressure topology_get_thermal_pressure
- #define arch_set_thermal_pressure topology_set_thermal_pressure
Bring arch_set_freq_scale() in line with these functions by renaming it to
topology_set_freq_scale() in the arch topology driver, and by defining the
arch_set_freq_scale symbol to point to the new function for arm and arm64.
While there are other users of the arch_topology driver, this patch defines
arch_set_freq_scale for arm and arm64 only, due to their existing
definitions of arch_scale_freq_capacity. This is the getter function of the
frequency invariance scale factor and without a getter function, the
setter function - arch_set_freq_scale() has not purpose.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com> (BL_SWITCHER and topology parts)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The passed cpumask arguments to arch_set_freq_scale() and
arch_freq_counters_available() are only iterated over, so reflect this
in the prototype. This also allows to pass system cpumasks like
cpu_online_mask without getting a warning.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that the update of the FI scale factor is done in cpufreq core for
selected functions - target(), target_index() and fast_switch(),
we can provide feedback to the task scheduler and architecture code
on whether cpufreq supports FI.
For this purpose provide an external function to expose whether the
cpufreq drivers support FI, by using a static key.
The logic behind the enablement of cpufreq-based invariance is as
follows:
- cpufreq-based invariance is disabled by default
- cpufreq-based invariance is enabled if any of the callbacks
above is implemented while the unsupported setpolicy() is not
The cpufreq_supports_freq_invariance() function only returns whether
cpufreq is instrumented with the arch_set_freq_scale() calls that
result in support for frequency invariance. Due to the lack of knowledge
on whether the implementation of arch_set_freq_scale() actually results
in the setting of a scale factor based on cpufreq information, it is up
to the architecture code to ensure the setting and provision of the
scale factor to the scheduler.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The relation can't be invalid here, so if it turns out to be invalid,
just WARN_ON_ONCE() and return 0.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Allow intel_pstate to work in the passive mode with HWP enabled and
make it set the HWP minimum performance limit (HWP floor) to the
P-state value given by the target frequency supplied by the cpufreq
governor, so as to prevent the HWP algorithm and the CPU scheduler
from working against each other, at least when the schedutil governor
is in use, and update the intel_pstate documentation accordingly.
Among other things, this allows utilization clamps to be taken
into account, at least to a certain extent, when intel_pstate is
in use and makes it more likely that sufficient capacity for
deadline tasks will be provided.
After this change, the resulting behavior of an HWP system with
intel_pstate in the passive mode should be close to the behavior
of the analogous non-HWP system with intel_pstate in the passive
mode, except that the HWP algorithm is generally allowed to make the
CPU run at a frequency above the floor P-state set by intel_pstate in
the entire available range of P-states, while without HWP a CPU can
run in a P-state above the requested one if the latter falls into the
range of turbo P-states (referred to as the turbo range) or if the
P-states of all CPUs in one package are coordinated with each other
at the hardware level.
[Note that in principle the HWP floor may not be taken into account
by the processor if it falls into the turbo range, in which case the
processor has a license to choose any P-state, either below or above
the HWP floor, just like a non-HWP processor in the case when the
target P-state falls into the turbo range.]
With this change applied, intel_pstate in the passive mode assumes
complete control over the HWP request MSR and concurrent changes of
that MSR (eg. via the direct MSR access interface) are overridden by
it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Pull ARM cpufreq driver changes for v5.9-rc1 from Viresh Kumar:
"Here are the details:
- Adaptive voltage scaling (AVS) support and minor cleanups for
brcmstb driver (Florian Fainelli and Markus Mayer).
- A new tegra driver and cleanup for the existing one (Sumit Gupta and
Jon Hunter).
- Bandwidth level support for Qcom driver along with OPP changes (Sibi
Sankar).
- Cleanups to sti, cpufreq-dt, ap806, CPPC drivers (Viresh Kumar, Lee
Jones, Ivan Kokshaysky, Sven Auhagen, and Xin Hao).
- Make schedutil default governor for ARM (Valentin Schneider).
- Fix dependency issues for imx (Walter Lozano).
- Cleanup around cached_resolved_idx in cpufreq core (Viresh Kumar)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: make schedutil the default for arm and arm64
cpufreq: cached_resolved_idx can not be negative
cpufreq: Add Tegra194 cpufreq driver
dt-bindings: arm: Add NVIDIA Tegra194 CPU Complex binding
cpufreq: imx: Select NVMEM_IMX_OCOTP
cpufreq: sti-cpufreq: Fix some formatting and misspelling issues
cpufreq: tegra186: Simplify probe return path
cpufreq: CPPC: Reuse caps variable in few routines
cpufreq: ap806: fix cpufreq driver needs ap cpu clk
cpufreq: cppc: Reorder code and remove apply_hisi_workaround variable
cpufreq: dt: fix oops on armada37xx
cpufreq: brcmstb-avs-cpufreq: send S2_ENTER / S2_EXIT commands to AVS
cpufreq: brcmstb-avs-cpufreq: Support polling AVS firmware
cpufreq: brcmstb-avs-cpufreq: more flexible interface for __issue_avs_command()
cpufreq: qcom: Disable fast switch when scaling DDR/L3
cpufreq: qcom: Update the bandwidth levels on frequency change
OPP: Add and export helper to set bandwidth
cpufreq: blacklist SC7180 in cpufreq-dt-platdev
cpufreq: blacklist SDM845 in cpufreq-dt-platdev
It is not possible for cached_resolved_idx to be invalid here as the
cpufreq core always sets index to a positive value.
Change its type to unsigned int and fix qcom usage a bit.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Currently, most CPUFreq governors are registered at the core_initcall
time when the given governor is the default one, and the module_init
time otherwise.
In preparation for letting users specify the default governor on the
kernel command line, change all of them to be registered at the
core_initcall unconditionally, as it is already the case for the
schedutil and performance governors. This will allow us to assume
that builtin governors have been registered before the built-in
CPUFreq drivers probe.
And since all governors have similar init/exit patterns now, introduce
two new macros, cpufreq_governor_{init,exit}(), to factorize the code.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Macro 'for_each_active_policy()' is defined internally. To avoid some
cpufreq driver needing this macro to iterate over all the policies in
'.set_boost' callback, we redefine '.set_boost' to act on only one
policy and pass the policy as an argument.
'cpufreq_boost_trigger_state()' iterates over all the policies to set
boost for the system.
This is preparation for adding SW BOOST support for CPPC.
To protect Boost enable/disable by sysfs from CPU online/offline,
add 'cpu_hotplug_lock' before calling '.set_boost' for each CPU.
Also move the lock from 'set_boost()' to 'store_cpb()' in
acpi_cpufreq.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add weak function to return the hardware maximum frequency of a CPU,
with the default implementation returning cpuinfo.max_freq, which is
the best information we can generically get from the cpufreq framework.
The default can be overwritten by a strong function in platforms
that want to provide an alternative implementation, with more accurate
information, obtained either from hardware or firmware.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The cpufreq_global_kobject is only used internally by cpufreq.c
after commit 2361be2366 ("cpufreq: Don't create empty
/sys/devices/system/cpu/cpufreq directory").
Make it static.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
[ rjw: Add empty line after cpufreq_global_kobject definition ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the process of modifying a cpufreq policy, the cpufreq core makes
a copy of it including all of the internals which is stored on the
CPU stack. Because struct cpufreq_policy is relatively large, this
may cause the size of the stack frame to exceed the 2 KB limit and
so the GCC complains when -Wframe-larger-than= is used.
In fact, it is not necessary to copy the entire policy structure
in order to modify it, however.
First, because cpufreq_set_policy() obtains the min and max policy
limits from frequency QoS now, it is not necessary to pass the limits
to it from the callers. The only things that need to be passed to it
from there are the new governor pointer or (if there is a built-in
governor in the driver) the "policy" value representing the governor
choice. They both can be passed as individual arguments, though, so
make cpufreq_set_policy() take them this way and rework its callers
accordingly. This avoids making copies of cpufreq policies in the
callers of cpufreq_set_policy().
Second, cpufreq_set_policy() still needs to pass the new policy
data to the ->verify() callback of the cpufreq driver whose task
is to sanitize the min and max policy limits. It still does not
need to make a full copy of struct cpufreq_policy for this purpose,
but it needs to pass a few items from it to the driver in case they
are needed (different drivers have different needs in that respect
and all of them have to be covered). For this reason, introduce
struct cpufreq_policy_data to hold copies of the members of
struct cpufreq_policy used by the existing ->verify() driver
callbacks and pass a pointer to a temporary structure of that
type to ->verify() (instead of passing a pointer to full struct
cpufreq_policy to it).
While at it, notice that intel_pstate and longrun don't really need
to verify the "policy" value in struct cpufreq_policy, so drop those
check from them to avoid copying "policy" into struct
cpufreq_policy_data (which allows it to be slightly smaller).
Also while at it fix up white space in a couple of places and make
cpufreq_set_policy() static (as it can be so).
Fixes: 3000ce3c52 ("cpufreq: Use per-policy frequency QoS")
Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
The scheduler code calling cpufreq_update_util() may run during CPU
offline on the target CPU after the IRQ work lists have been flushed
for it, so the target CPU should be prevented from running code that
may queue up an IRQ work item on it at that point.
Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
is set for at least one cpufreq policy in the system, because that
allows the CPU going offline to run the utilization update callback
of the cpufreq governor on behalf of another (online) CPU in some
cases.
If that happens, the cpufreq governor callback may queue up an IRQ
work on the CPU running it, which is going offline, and the IRQ work
may not be flushed after that point. Moreover, that IRQ work cannot
be flushed until the "offlining" CPU goes back online, so if any
other CPU calls irq_work_sync() to wait for the completion of that
IRQ work, it will have to wait until the "offlining" CPU is back
online and that may not happen forever. In particular, a system-wide
deadlock may occur during CPU online as a result of that.
The failing scenario is as follows. CPU0 is the boot CPU, so it
creates a cpufreq policy and becomes the "leader" of it
(policy->cpu). It cannot go offline, because it is the boot CPU.
Next, other CPUs join the cpufreq policy as they go online and they
leave it when they go offline. The last CPU to go offline, say CPU3,
may queue up an IRQ work while running the governor callback on
behalf of CPU0 after leaving the cpufreq policy because of the
dvfs_possible_from_any_cpu effect described above. Then, CPU0 is
the only online CPU in the system and the stale IRQ work is still
queued on CPU3. When, say, CPU1 goes back online, it will run
irq_work_sync() to wait for that IRQ work to complete and so it
will wait for CPU3 to go back online (which may never happen even
in principle), but (worse yet) CPU0 is waiting for CPU1 at that
point too and a system-wide deadlock occurs.
To address this problem notice that CPUs which cannot run cpufreq
utilization update code for themselves (for example, because they
have left the cpufreq policies that they belonged to), should also
be prevented from running that code on behalf of the other CPUs that
belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so
in that case the cpufreq_update_util_data pointer of the CPU running
the code must not be NULL as well as for the CPU which is the target
of the cpufreq utilization update in progress.
Accordingly, change cpufreq_this_cpu_can_update() into a regular
function in kernel/sched/cpufreq.c (instead of a static inline in a
header file) and make it check the cpufreq_update_util_data pointer
of the local CPU if dvfs_possible_from_any_cpu is set for the target
cpufreq policy.
Also update the schedutil governor to do the
cpufreq_this_cpu_can_update() check in the non-fast-switch
case too to avoid the stale IRQ work issues.
Fixes: 99d14d0e16 ("cpufreq: Process remote callbacks from any CPU if the platform permits")
Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
Reported-by: Anson Huang <anson.huang@nxp.com>
Tested-by: Anson Huang <anson.huang@nxp.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace the CPU device PM QoS used for the management of min and max
frequency constraints in cpufreq (and its users) with per-policy
frequency QoS to avoid problems with cpufreq policies covering
more then one CPU.
Namely, a cpufreq driver is registered with the subsys interface
which calls cpufreq_add_dev() for each CPU, starting from CPU0, so
currently the PM QoS notifiers are added to the first CPU in the
policy (i.e. CPU0 in the majority of cases).
In turn, when the cpufreq driver is unregistered, the subsys interface
doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0,
and the PM QoS notifiers are only removed when cpufreq_remove_dev() is
called for the last CPU in the policy, say CPUx, which as a rule is
not CPU0 if the policy covers more than one CPU. Then, the PM QoS
notifiers cannot be removed, because CPUx does not have them, and
they are still there in the device PM QoS notifiers list of CPU0,
which prevents new PM QoS notifiers from being registered for CPU0
on the next attempt to register the cpufreq driver.
The same issue occurs when the first CPU in the policy goes offline
before unregistering the driver.
After this change it does not matter which CPU is the policy CPU at
the driver registration time and whether or not it is online all the
time, because the frequency QoS is per policy and not per CPU.
Fixes: 67d874c3b2 ("cpufreq: Register notifiers with the PM QoS framework")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reported-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Diagnosed-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f
Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
No driver makes reference to these events now, remove them and the code
related to them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This effectively reverts some changes made by commit f9f41e3ef9
("cpufreq: Remove policy create/remove notifiers").
We have a new use case for policy create/remove notifiers (for
allocating/freeing QoS requests per policy), so add them back.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-cpufreq:
cpufreq: Make cpufreq_generic_init() return void
cpufreq: imx-cpufreq-dt: Add i.MX8MN support
cpufreq: Add QoS requests for userspace constraints
cpufreq: intel_pstate: Reuse refresh_frequency_limits()
cpufreq: Register notifiers with the PM QoS framework
PM / QoS: Add support for MIN/MAX frequency constraints
PM / QOS: Pass request type to dev_pm_qos_read_value()
PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value()
PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier()
It always returns 0 (success) and its return type should really be void.
Over that, many drivers have added error handling code based on its
return value, which is not required at all.
Change its return type to void and update all the callers.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>