It looks like the name of struct pstate_adjust_policy was updated
without updating its kerneldoc comment accordingly, so fix that
mistake.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The PID algorithm used by the intel_pstate driver tends to drive
performance to the minimum for workloads with utilization below the
setpoint, which is undesirable, so replace it with a modified
"proportional" algorithm on Atom.
The new algorithm will set the new P-state to be 1.25 times the
available maximum times the (frequency-invariant) utilization during
the previous sampling period except when the target P-state computed
this way is lower than the average P-state during the previous
sampling period. In the latter case, it will increase the target by
50% of the difference between it and the average P-state to prevent
performance from dropping down too fast in some cases.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Make the comment explaining the meaning of the perf_scaled variable
in get_target_pstate_use_performance() more straightforward.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This is a requirement that MSR MSR_PM_ENABLE must be set to 0x01 before
reading MSR_HWP_CAPABILITIES on a given CPU. If cpufreq init() is
scheduled on a CPU which is not same as policy->cpu or migrates to a
different CPU before calling msr read for MSR_HWP_CAPABILITIES, it
is possible that MSR_PM_ENABLE was not to set to 0x01 on that CPU.
This will cause GP fault. So like other places in this path
rdmsrl_on_cpu should be used instead of rdmsrl.
Moreover the scope of MSR_HWP_CAPABILITIES is on per thread basis, so it
should be read from the same CPU, for which MSR MSR_HWP_REQUEST is
getting set.
dmesg dump or warning:
[ 22.014488] WARNING: CPU: 139 PID: 1 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x68/0x70
[ 22.014492] unchecked MSR access error: RDMSR from 0x771
[ 22.014493] Modules linked in:
[ 22.014507] CPU: 139 PID: 1 Comm: swapper/0 Not tainted 4.7.5+ #1
...
...
[ 22.014516] Call Trace:
[ 22.014542] [<ffffffff813d7dd1>] dump_stack+0x63/0x82
[ 22.014558] [<ffffffff8107bc8b>] __warn+0xcb/0xf0
[ 22.014561] [<ffffffff8107bcff>] warn_slowpath_fmt+0x4f/0x60
[ 22.014563] [<ffffffff810676f8>] ex_handler_rdmsr_unsafe+0x68/0x70
[ 22.014564] [<ffffffff810677d9>] fixup_exception+0x39/0x50
[ 22.014604] [<ffffffff8102e400>] do_general_protection+0x80/0x150
[ 22.014610] [<ffffffff817f9ec8>] general_protection+0x28/0x30
[ 22.014635] [<ffffffff81687940>] ? get_target_pstate_use_performance+0xb0/0xb0
[ 22.014642] [<ffffffff810600c7>] ? native_read_msr+0x7/0x40
[ 22.014657] [<ffffffff81688123>] intel_pstate_hwp_set+0x23/0x130
[ 22.014660] [<ffffffff81688406>] intel_pstate_set_policy+0x1b6/0x340
[ 22.014662] [<ffffffff816829bb>] cpufreq_set_policy+0xeb/0x2c0
[ 22.014664] [<ffffffff81682f39>] cpufreq_init_policy+0x79/0xe0
[ 22.014666] [<ffffffff81682cb0>] ? cpufreq_update_policy+0x120/0x120
[ 22.014669] [<ffffffff816833a6>] cpufreq_online+0x406/0x820
[ 22.014671] [<ffffffff8168381f>] cpufreq_add_dev+0x5f/0x90
[ 22.014717] [<ffffffff81530ac8>] subsys_interface_register+0xb8/0x100
[ 22.014719] [<ffffffff816821bc>] cpufreq_register_driver+0x14c/0x210
[ 22.014749] [<ffffffff81fe1d90>] intel_pstate_init+0x39d/0x4d5
[ 22.014751] [<ffffffff81fe13f2>] ? cpufreq_gov_dbs_init+0x12/0x12
Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch fixes overflow issue when calculating the desired_perf.
Fixes: ad38677df4 (cpufreq: CPPC: Force reporting values in KHz to fix user space interface)
Signed-off-by: Hoan Tran <hotran@apm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that the cpufreq-dt-platdev is used to create the cpufreq-dt platform
device for all OMAP platforms and the platform code that did it
before has been removed, add ti,am33xx and ti,dra7xx to the machine list
in cpufreq-dt-platdev which had relied on the removed platform code to do
this previously.
Fixes: 7694ca6e1d (cpufreq: omap: Use generic platdev driver)
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.7+ <stable@vger.kernel.org> # 4.7+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Modify the P-state selection algorithm for Atom processors to use
the new SCHED_CPUFREQ_IOWAIT flag instead of the questionable
get_cpu_iowait_time_us() function.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Modify the schedutil cpufreq governor to boost the CPU
frequency if the SCHED_CPUFREQ_IOWAIT flag is passed to
it via cpufreq_update_util().
If that happens, the frequency is set to the maximum during
the first update after receiving the SCHED_CPUFREQ_IOWAIT flag
and then the boost is reduced by half during each following update.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Looks-good-to: Steve Muckle <smuckle@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Testing indicates that it is possible to improve performace
significantly without increasing energy consumption too much by
teaching cpufreq governors to bump up the CPU performance level if
the in_iowait flag is set for the task in enqueue_task_fair().
For this purpose, define a new cpufreq_update_util() flag
SCHED_CPUFREQ_IOWAIT and modify enqueue_task_fair() to pass that
flag to cpufreq_update_util() in the in_iowait case. That generally
requires cpufreq_update_util() to be called directly from there,
because update_load_avg() may not be invoked in that case.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Looks-good-to: Steve Muckle <smuckle@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
The schedutil cpufreq governor will be switched from tristate to bool. Fix
defconfigs.
* tag 'samsung-defconfig-schedutil-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: multi_v7_defconfig: Don't attempt to enable schedutil governor as module
ARM: exynos_defconfig: Don't attempt to enable schedutil governor as module
When CPPC is being used by ACPI on arm64, user space tools such as
cpupower report CPU frequency values from sysfs that are incorrect.
What the driver was doing was reporting the values given by ACPI tables
in whatever scale was used to provide them. However, the ACPI spec
defines the CPPC values as unitless abstract numbers. Internal kernel
structures such as struct perf_cap, in contrast, expect these values
to be in KHz. When these struct values get reported via sysfs, the
user space tools also assume they are in KHz, causing them to report
incorrect values (for example, reporting a CPU frequency of 1MHz when
it should be 1.8GHz).
The downside is that this approach has some assumptions:
(1) It relies on SMBIOS3 being used, *and* that the Max Frequency
value for a processor is set to a non-zero value.
(2) It assumes that all processors run at the same speed, or that
the CPPC values have all been scaled to reflect relative speed.
This patch retrieves the largest CPU Max Frequency from a type 4 DMI
record that it can find. This may not be an issue, however, as a
sampling of DMI data on x86 and arm64 indicates there is often only
one such record regardless. Since CPPC is relatively new, it is
unclear if the ACPI ASL will always be written to reflect any sort
of relative performance of processors of differing speeds.
(3) It assumes that performance and frequency both scale linearly.
For arm64 servers, this may be sufficient, but it does rely on
firmware values being set correctly. Hence, other approaches will
be considered in the future.
This has been tested on three arm64 servers, with and without DMI, with
and without CPPC support.
Signed-off-by: Al Stone <ahs3@redhat.com>
Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If a cpufreq driver is registered very early in the boot stage (e.g.
registered from postcore_initcall()), then cpufreq core may generate
kernel warnings for it.
In this case, the CPUs are brought online, then the cpufreq driver is
registered, and then the CPU topology devices are registered. However,
by the time cpufreq_add_dev() gets called, the cpu device isn't stored
in the per-cpu variable (cpu_sys_devices,) which is read by
get_cpu_device().
So the cpufreq core fails to get device for the CPU, for which
cpufreq_add_dev() was called in the first place and we will hit a
WARN_ON(!cpu_dev).
Even if we reuse the 'dev' parameter passed to cpufreq_add_dev() to
avoid that warning, there might be other CPUs online that share the
policy with the cpu for which cpufreq_add_dev() is called. Eventually
get_cpu_device() will return NULL for them as well, and we will hit the
same WARN_ON() again.
In order to fix these issues, change cpufreq core to create links to the
policy for a cpu only when cpufreq_add_dev() is called for that CPU.
Reuse the 'real_cpus' mask to track that as well.
Note that cpufreq_remove_dev() already handles removal of the links for
individual CPUs and cpufreq_add_dev() has aligned with that now.
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For structure types defined in the same file or local header files, find
top-level static structure declarations that have the following
properties:
1. Never reassigned.
2. Address never taken
3. Not passed to a top-level macro call
4. No pointer or array-typed field passed to a function or stored in a
variable.
Declare structures having all of these properties as const.
Done using Coccinelle.
Based on a suggestion by Joe Perches <joe@perches.com>.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq-dt driver is also used for systems with multiple
clock/voltage domains for CPUs, i.e. multiple cpufreq policies in a
system.
And in such cases the platform users may want to enable "governor
tunables per policy". Support that via platform data, as not all users
of the driver would want that behavior.
Reported-by: Juri Lelli <Juri.Lelli@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq DT driver also supports systems that have multiple
clock/voltage domains for CPUs, i.e. multiple policy systems.
The description of the Kconfig entry was never updated after the driver
was modified to support such systems, fix it.
Reported-by: Juri Lelli <Juri.Lelli@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This is leftover from an earlier patch which removed the usage of
platform data but forgot to remove this line. Remove it now.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq-stats code can no longer be built as a module, so it now
appears with square brackets in menuconfig.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 1aefc75b24 (cpufreq: stats: Make the stats code non-modular)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>