This allows drivers and other code to get the max reported CPU frequency.
Initial use is for scaling ring frequency with GPU frequency in the i915
driver.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.
How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.
To enabled debugging during runtime, mount debugfs and
$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control
for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append
ddebug_query="module cpufreq +p"
as a boot parameter to the kernel of your choice.
For more detailled instructions, please see
Documentation/dynamic-debug-howto.txt
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
None of the existing cpufreq drivers uses the second argument of
its .suspend() callback (which isn't useful anyway), so remove it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
lock_policy_rwsem_* and unlock_policy_rwsem_* functions are scheduled
to be unexported when 2.6.33. Now there are no other callers of them
out of cpufreq.c, unexport them and make them static.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Multiple modules used to define those which are with identical
functionality and were needlessly replicated among the different cpufreq
drivers. Push them into the header and remove duplication.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-7-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This interface is mainly intended (and implemented) for ACPI _PPC BIOS
frequency limitations, but other cpufreq drivers can also use it for
similar use-cases.
Why is this needed:
Currently it's not obvious why cpufreq got limited.
People see cpufreq/scaling_max_freq reduced, but this could have
happened by:
- any userspace prog writing to scaling_max_freq
- thermal limitations
- hardware (_PPC in ACPI case) limitiations
Therefore export bios_limit (in kHz) to:
- Point the user that it's the BIOS (broken or intended) which limits
frequency
- Export it as a sysfs interface for userspace progs.
While this was a rarely used feature on laptops, there will appear
more and more server implemenations providing "Green IT" features like
allowing the service processor to limit the frequency. People want
to know about HW/BIOS frequency limitations.
All ACPI P-state driven cpufreq drivers are covered with this patch:
- powernow-k8
- powernow-k7
- acpi-cpufreq
Tested with a patched DSDT which limits the first two cores (_PPC returns 1)
via _PPC, exposed by bios_limit:
# echo 2200000 >cpu2/cpufreq/scaling_max_freq
# cat cpu*/cpufreq/scaling_max_freq
2600000
2600000
2200000
2200000
# #scaling_max_freq shows general user/thermal/BIOS limitations
# cat cpu*/cpufreq/bios_limit
2600000
2600000
2800000
2800000
# #bios_limit only shows the HW/BIOS limitation
CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
CC: Len Brown <lenb@kernel.org>
CC: davej@codemonkey.org.uk
CC: linux@dominikbrodowski.net
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
When CONFIG_CPU_FREQ is disabled, cpufreq_get() needs a stub. Used by kvm
(although it looks like a bit of the kvm code could be omitted when
CONFIG_CPU_FREQ is disabled).
arch/x86/built-in.o: In function `kvm_arch_init':
(.text+0x10de7): undefined reference to `cpufreq_get'
(Needed in linux-next's KVM tree, but it's correct in 2.6.32).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Eric Paris <eparis@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently everything in the cpufreq layer is per core based.
This does not reflect reality, for example ondemand on conservative
governors have global sysfs variables.
Introduce a global cpufreq directory and add the kobject to the governor
struct, so that governors can easily access it.
The directory is initialized in the cpufreq_core_init initcall and thus will
always be created if cpufreq is compiled in, even if no cpufreq driver is
active later.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
This reverts commit e088e4c9cd.
Removing the sysfs interface for p4-clockmod was flagged as a
regression in bug 12826.
Course of action:
- Find out the remaining causes of overheating, and fix them
if possible. ACPI should be doing the right thing automatically.
If it isn't, we need to fix that.
- mark p4-clockmod ui as deprecated
- try again with the removal in six months.
It's not really feasible to printk about the deprecation, because
it needs to happen at all the sysfs entry points, which means adding
a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch.
Signed-off-by: Dave Jones <davej@redhat.com>
Impact: use new cpumask API to reduce memory usage
This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS. cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
p4-clockmod has a long history of abuse. It pretends to be a CPU
frequency scaling driver, even though it doesn't actually change
the CPU frequency, but instead just modulates the frequency with
wait-states.
The biggest misconception is that when running at the lower 'frequency'
p4-clockmod is saving power. This isn't the case, as workloads running
slower take longer to complete, preventing the CPU from entering deep C states.
However p4-clockmod does have a purpose. It can prevent overheating.
Having it hooked up to the cpufreq interfaces is the wrong way to achieve
cooling however. It should instead be hooked up to ACPI.
This diff introduces a means for a cpufreq driver to register with the
cpufreq core, but not present a sysfs interface.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.
A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.
Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Currently, affected_cpus shows which CPUs need to have their frequency
coordinated in software. When hardware coordination is in use, the contents
of this file appear the same as when no coordination is required. This can
lead to some confusion among user-space programs, for example, that do not
know that extra coordination is required to force a CPU core to a particular
speed to control power consumption.
To fix this, create a "related_cpus" attribute that always displays the
coordination map regardless of whatever coordination strategy the cpufreq
driver uses (sw or hw). If the cpufreq driver does not provide a value, fall
back to policy->cpus.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Eliminate cpufreq_userspace scaling_setspeed deadlock.
Luming Yu recently uncovered yet another cpufreq related deadlock.
One thread that continuously switches the governors and the other thread that
repeatedly cats the contents of cpufreq directory causes both these threads to
go into a deadlock.
Detailed examination of the deadlock showed the exact flow before the deadlock
as:
Thread 1 Thread 2
________ ________
cats files under /sys/devices/.../cpufreq/
Set governor to userspace
Adds a new sysfs entry for
scaling_setspeed
cats files under /sys/devices/.../cpufreq/
Set governor to performance
Holds cpufreq_rw_sem in write
mode
Sends a STOP notify to
userspace governor
cat /sys/devices/.../cpufreq/scaling_setspeed
Gets a handle on the above sysfs entry with
sysfs_get_active
Blocks while trying to get cpufreq_rw_sem
in read mode
Remove a sysfs entry for
scaling_setspeed
Blocks on sysfs_deactivate
while waiting for earlier
get_active (on other thread)
to drain
At this point both threads go into deadlock and any other thread that tries to
do anything with sysfs cpufreq will also block.
There seems to be no easy way to avoid this deadlock as long as
cpufreq_userspace adds/removes the sysfs entry under same kobject as cpufreq.
Below patch moves scaling_setspeed to cpufreq.c, keeping it always and calling
back the governor on read/write. This is the cleanest fix I could think of,
even though adding two callbacks in governor structure just for this seems
unnecessary.
Note that the change makes scaling_setspeed under /sys/.../cpufreq permanent
and returns <unsupported> when governor is not userspace.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Callsites such as arch/powerpc/oprofile/op_model_cell.c are having to
open-code #ifdef CONFIG_CPU_FREQ only to be able to get at the full definition
of cpufreq_unregister_notifier(), because no empty stub is available for the
!CONFIG_CPU_FREQ case. Let's provide one, to be able to remove such #ifdef's
from the rest of the kernel tree -- those will come in a subsequent patch.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Depending on the transition latency of the HW for cpufreq switches, the
ondemand or conservative governor cannot be used with certain cpufreq
drivers. Still the ondemand should be the default governor on a wide range
of systems. This patch allows this and lets the governor fallback to the
performance governor at cpufreq driver load time, if the driver does not
support fast enough frequency switching.
Main benefit is that on e.g. installation or other systems without
userspace support a working dynamic cpufreq support can be achieved on most
systems by simply loading the cpufreq driver. This is especially essential
for recent x86(_64) laptop hardware which may rely on working dynamic
cpufreq OS support.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
This reverts commit 184c44d204.
As noted by Dave Jones:
"Linus, please revert the above cset. It doesn't seem to be
necessary (it was added to fix a miscompile in 'make allnoconfig'
which doesn't seem to be repeatable with it reverted) and actively
breaks the ARM SA1100 framebuffer driver."
Requested-by: Dave Jones <davej@redhat.com>
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
[PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
[PATCH] i386: type may be unused
[PATCH] i386: Some additional chipset register values validation.
[PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
[PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
[PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
[PATCH] i386: white space fixes in i387.h
[PATCH] i386: Drop noisy e820 debugging printks
[PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
[PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
[PATCH] x86-64: Share identical video.S between i386 and x86-64
[PATCH] x86-64: Remove CONFIG_REORDER
[PATCH] x86-64: Print type and size correctly for unknown compat ioctls
[PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
[PATCH] i386: Little cleanups in smpboot.c
[PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
[PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
[PATCH] i386: Add X86_FEATURE_RDTSCP
[PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
[PATCH] i386: Implement alternative_io for i386
...
Fix up trivial conflict in include/linux/highmem.h manually.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>