Commit Graph

62 Commits

Author SHA1 Message Date
Stratos Karafotis 830bcac4e4 cpufreq: intel_pstate: Remove duplicate CPU ID check
We check the CPU ID during driver init. There is no need
to do it again per logical CPU initialization.

So, remove the duplicate check.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-10 22:49:41 +02:00
Rafael J. Wysocki 5ece239918 Merge back earlier cpufreq material.
Conflicts:
	arch/mips/loongson/lemote-2f/clock.c
	drivers/cpufreq/intel_pstate.c
2014-06-03 15:03:27 +02:00
Doug Smythies bf8102228a intel_pstate: Improve initial busy calculation
This change makes the busy calculation using 64 bit math which prevents
overflow for large values of aperf/mperf.

Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 12:46:01 +02:00
Dirk Brandewie c4ee841f60 intel_pstate: add sample time scaling
The PID assumes that samples are of equal time, which for a deferable
timers this is not true when the system goes idle.  This causes the
PID to take a long time to converge to the min P state and depending
on the pattern of the idle load can make the P state appear stuck.

The hold-off value of three sample times before using the scaling is
to give a grace period for applications that have high performance
requirements and spend a lot of time idle,  The poster child for this
behavior is the ffmpeg benchmark in the Phoronix test suite.

Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 12:45:05 +02:00
Dirk Brandewie f0fe3cd7e1 intel_pstate: Correct rounding in busy calculation
Changing to fixed point math throughout the busy calculation in
commit e66c1768 (Change busy calculation to use fixed point
math.) Introduced some inaccuracies by rounding the busy value at two
points in the calculation.  This change removes roundings and moves
the rounding to the output of the PID where the calculations are
complete and the value returned as an integer.

Fixes: e66c176837 (intel_pstate: Change busy calculation to use fixed point math.)
Reported-by: Doug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 12:44:59 +02:00
Dirk Brandewie adacdf3f2b intel_pstate: Remove C0 tracking
Commit fcb6a15c (intel_pstate: Take core C0 time into account for core
busy calculation) introduced a regression referenced below.  The issue
with "lockup" after suspend that this commit was addressing is now dealt
with in the suspend path.

Fixes: fcb6a15c2e (intel_pstate: Take core C0 time into account for core busy calculation)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=66581
Link: https://bugzilla.kernel.org/show_bug.cgi?id=75121
Reported-by: Doug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 12:44:48 +02:00
Stratos Karafotis 94f89e0760 cpufreq: intel_pstate: Remove unused member name of cpudata
Although, a value is assigned to member name of struct cpudata,
it is never used.

We can safely remove it.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-27 00:52:32 +02:00
Dirk Brandewie d40a63c45b intel_pstate: remove setting P state to MAX on init
Setting the P state of the core to max at init time is a hold over
from early implementation of intel_pstate where intel_pstate disabled
cpufreq and loaded VERY early in the boot sequence.  This was to
ensure that intel_pstate did not affect boot time. This in not needed
now that intel_pstate is a cpufreq driver.

Removing this covers the case where a CPU has gone through a manual
CPU offline/online cycle and the P state is set to MAX on init and the
CPU immediately goes idle.  Due to HW coordination the P state request
on the idle CPU will drag all cores to MAX P state until the load is
reevaluated when to core goes non-idle.

Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-13 17:39:13 +02:00
Dirk Brandewie c7e241df59 intel_pstate: Add CPU IDs for Broadwell processors
Add support for Broadwell processors.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-13 02:54:33 +02:00
Dirk Brandewie 21855ff5bc intel_pstate: Set turbo VID for BayTrail
A documentation update exposed that there is a separate set of VID
values that must be used in the turbo/boost P state range.  Add
enumerating and setting the correct VID for P states in the turbo
range.

Cc: v3.13+ <stable@vger.kernel.org> # v3.13+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-12 01:58:58 +02:00
Stratos Karafotis 6b17ddb2a5 intel_pstate: Remove sample parameter in intel_pstate_calc_busy
Since commit d37e2b7644 ("intel_pstate: remove unneeded sample buffers")
we use only one sample. So, there is no need to pass the sample
pointer to intel_pstate_calc_busy. Instead, get the pointer from
cpudata. Also, remove the unused SAMPLE_COUNT macro.

While at it, reformat the first line in this function.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-07 00:29:51 +02:00
Dirk Brandewie c2294a2f78 intel_pstate: Use del_timer_sync in intel_pstate_cpu_stop
Ensure that no timer callback is running since we are about to free
the timer structure.  We cannot guarantee that the call back is called
on the CPU where the timer is running.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-26 16:39:53 +01:00
Dirk Brandewie bb18008f80 intel_pstate: Set core to min P state during core offline
Change to use the new ->stop_cpu() callback to do clean up during CPU
hotplug. The requested P state for an offline core will be used by the
hardware coordination function to select the package P state. If the
core is under load when it is offlined it will fix the package P state
floor to the requested P state of offline core.

Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 04:04:40 +01:00
Dirk Brandewie d98d099b9f intel_pstate: fix pid_reset to use fixed point values
commit d253d2a526 (Improve accuracy by not truncating until final
result), changed internal variables of the PID to be fixed point
numbers. Update the pid_reset() to reflect this change.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-02 00:35:19 +01:00
Dirk Brandewie d37e2b7644 intel_pstate: remove unneeded sample buffers
Remove unneeded sample buffers, intel_pstate operates on the most
recent sample only.  This save some memory and make the code more
readable.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-02 00:33:46 +01:00
Dirk Brandewie e66c176837 intel_pstate: Change busy calculation to use fixed point math.
Commit fcb6a15c2e (intel_pstate: Take core C0 time into account for
core busy calculation) introduced a regression on some processor SKUs
supported by intel_pstate. This was due to the truncation caused by
using integer math to calculate core busy and C0 percentages.

On a i7-4770K processor operating at 800Mhz going to 100% utilization
the percent busy of the CPU using integer math is 22%, but it actually
is 22.85%.  This value scaled to the current frequency returned 97
which the PID interpreted as no error and did not adjust the P state.

Tested on i7-4770K, i7-2600, i5-3230M.

Fixes: fcb6a15c2e (intel_pstate: Take core C0 time into account for core busy calculation)
References: https://lkml.org/lkml/2014/2/19/626
References: https://bugzilla.kernel.org/show_bug.cgi?id=70941
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-26 00:56:49 +01:00
Dirk Brandewie 61d8d2abc1 intel_pstate: Add support for Baytrail turbo P states
A documentation update exposed the existance of the turbo ratio
register. Update baytrail support to use the turbo range.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-21 01:22:40 +01:00
Dirk Brandewie 4042e7570c intel_pstate: Use LFM bus ratio as min ratio/P state
LFM (max efficiency ratio) is the max frequency at minimum voltage
supported by the processor.  Using LFM as the minimum P state
increases performmance without affecting power. By not using P states
below LFM we avoid using P states that are less power efficient.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-21 01:22:40 +01:00
Dirk Brandewie 709c078e17 intel_pstate: Remove energy reporting from pstate_sample tracepoint
Remove the reporting of energy since it does not provide any useful
information about the state of the driver and will be a maintainance
headache going forward since the RAPL energy units register is not
architectural and subject to change between micro-architectures

References: https://bugzilla.kernel.org/show_bug.cgi?id=69831
Fixes: b69880f9cc (intel_pstate: Add trace point to report internal state.)
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-13 02:11:18 +01:00
Dirk Brandewie fcb6a15c2e intel_pstate: Take core C0 time into account for core busy calculation
Take non-idle time into account when calculating core busy time.
This ensures that intel_pstate will notice a decrease in load.

References: https://bugzilla.kernel.org/show_bug.cgi?id=66581
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-04 21:41:15 +01:00
Dirk Brandewie b69880f9cc intel_pstate: Add trace point to report internal state.
Add perf trace event "power:pstate_sample" to report driver state to
aid in diagnosing issues reported against intel_pstate.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-17 02:00:44 +01:00
Rafael J. Wysocki 51c4c4ce1d Merge back earlier 'pm-cpufreq' material. 2014-01-14 23:12:08 +01:00
Dirk Brandewie 6cbd7ee10e intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.
KVM environments do not support APERF/MPERF MSRs. intel_pstate cannot
operate without these registers.

The previous validity checks in intel_pstate_msrs_not_valid() are
insufficent in nested KVMs.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1046317
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-06 22:16:14 +01:00
Rafael J. Wysocki 72e2adcdcb Merge back earlier 'pm-cpufreq' material. 2014-01-05 15:32:51 +01:00
Rafael J. Wysocki 98a947abdd intel_pstate: Fail initialization if P-state information is missing
If pstate.current_pstate is 0 after the initial
intel_pstate_get_cpu_pstates(), this means that we were unable to
obtain any useful P-state information and there is no reason to
continue, so free memory and return an error in that case.

This fixes the following divide error occuring in a nested KVM
guest:

Intel P-state driver initializing.
Intel pstate controlling: cpu 0
cpufreq: __cpufreq_add_dev: ->get() failed
divide error: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-0.rc4.git5.1.fc21.x86_64 #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti: ffff88001e9bc000
RIP: 0010:[<ffffffff815c551d>]  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
RSP: 0000:ffff88001ee03e18  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000
R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348
FS:  0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0
Stack:
 fffffffb1a454390 ffffffff821a4500 ffff88001a454390 0000000000000100
 ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15 ffffffff82d5ed40
 ffffffff8258cc60 0000000000000000 ffffffff81ac39de 0000000000000000
Call Trace:
 <IRQ>
 [<ffffffff81083e9a>] call_timer_fn+0x8a/0x310
 [<ffffffff81083e15>] ? call_timer_fn+0x5/0x310
 [<ffffffff815c5400>] ? pid_param_set+0x130/0x130
 [<ffffffff81084354>] run_timer_softirq+0x234/0x380
 [<ffffffff8107aee4>] __do_softirq+0x104/0x430
 [<ffffffff8107b5fd>] irq_exit+0xcd/0xe0
 [<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60
 [<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80
 <EOI>
 [<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0
 [<ffffffff81757719>] printk+0x67/0x69
 [<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0
 [<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20
 [<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0
 [<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210
 [<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be
 [<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10
 [<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12
 [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
 [<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0
 [<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287
 [<ffffffff81f638d0>] ? do_early_param+0x88/0x88
 [<ffffffff8174b530>] ? rest_init+0x150/0x150
 [<ffffffff8174b53e>] kernel_init+0xe/0x130
 [<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8174b530>] ? rest_init+0x150/0x150
Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00 00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1 e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa 1f 89
RIP  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
 RSP <ffff88001ee03e18>
---[ end trace f166110ed22cc37a ]---
Kernel panic - not syncing: Fatal exception in interrupt

Reported-and-tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: All applicable <stable@vger.kernel.org>
2013-12-31 13:37:46 +01:00