Linux tells ICH4 users that they can (manually) invoke
"hpet=force" to enable the undocumented ICH-4M HPET.
The HPET becomes available for both clocksource and clockevents.
But as of ff69f2bba6
(acpi: fix of pmtimer overflow that make Cx states time incorrect)
the HPET may be used via clocksource for idle accounting, and
hpet=force on an ICH4 box hangs boot.
It turns out that touching the MMIO HPET withing
the ARB_DIS part of C3 will hang the hardware.
The fix is to simply move the timer access outside
the ARB_DIS region. This is a no-op on modern hardware
because ARB_DIS is no longer used.
http://bugzilla.kernel.org/show_bug.cgi?id=13087
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Linux-2.6.29 deleted the legacy ACPI idle handler, leaving
the CPU_IDLE handler, which does not track bus master activity.
So delete the unused bm_activity field -- it is confusing to
print an always zero value.
This patch could break programs that parse
/proc/acpi/processor/*/power, since it deletes this
line from that file:
bus master activity: 00000000
http://bugzilla.kernel.org/show_bug.cgi?id=13145
is not fixed by this patch, but provoked this patch.
Signed-off-by: Len Brown <len.brown@intel.com>
The c2 and c3 idle handlers check tsc_halts_in_c()
after every time they return from idle. Um, when?:-)
Move this check to init-time to remove the unnecessary
run-time overhead, and also to have the check complete before
the first entry into the idle handler.
ff69f2bba6
(acpi: fix of pmtimer overflow that make Cx states time incorrect)
replaced the hard-coded use of the PM-timer inside idle,
with ktime_get_readl(), which possibly uses the TSC --
so it is now especially prudent to detect a broken TSC
before entering idle.
http://bugzilla.kernel.org/show_bug.cgi?id=13087
Signed-off-by: Len Brown <len.brown@intel.com>
Add support for Always Running APIC timer, CPUID_0x6_EAX_Bit2.
This bit means the APIC timer continues to run even when CPU is
in deep C-states.
The advantage is that we can use LAPIC timer on these CPUs
always, and there is no need for "slow to read and program"
external timers (HPET/PIT) and the timer broadcast logic
and related code in C-state entry and exit.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Rename acpi_get_register and acpi_set_register to clarify the
purpose of these functions. New names are acpi_read_bit_register
and acpi_write_bit_register.
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Removed locking for reads from the ACPI bit registers in PM1
Status, Enable, Control, and PM2 Control. The lock is not required
when reading the single-bit registers. The acpi_get_register_unlocked
function is no longer needed and has been removed. This will
improve performance for reads on these registers. ACPICA BZ 760.
http://www.acpica.org/bugzilla/show_bug.cgi?id=760
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
We found Cx states time abnormal in our some of machines which have 16
LCPUs, the C0 take too many time while system is really idle when kernel
enabled tickless and highres. powertop output is below:
PowerTOP version 1.9 (C) 2007 Intel Corporation
Cn Avg residency P-states (frequencies)
C0 (cpu running) (40.5%) 2.53 Ghz 0.0%
C1 0.0ms ( 0.0%) 2.53 Ghz 0.0%
C2 128.8ms (59.5%) 2.40 Ghz 0.0%
1.60 Ghz 100.0%
Wakeups-from-idle per second : 4.7 interval: 20.0s
no ACPI power usage estimate available
Top causes for wakeups:
41.4% ( 24.9) <interrupt> : extra timer interrupt
20.2% ( 12.2) <kernel core> : usb_hcd_poll_rh_status
(rh_timer_func)
After tacking detailed for this issue, Yakui and I find it is due to 24
bit PM timer overflows when some of cpu sleep more than 4 seconds. With
tickless kernel, the CPU want to sleep as much as possible when system
idle. But the Cx sleep time are recorded by pmtimer which length is
determined by BIOS. The current Cx time was gotten in the following
function from driver/acpi/processor_idle.c:
static inline u32 ticks_elapsed(u32 t1, u32 t2)
{
if (t2 >= t1)
return (t2 - t1);
else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
return (((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
else
return ((0xFFFFFFFF - t1) + t2);
}
If pmtimer is 24 bits and it take 5 seconds from t1 to t2, in above
function, just about 1 seconds ticks was recorded. So the Cx time will be
reduced about 4 seconds. and this is why we see above powertop output.
To resolve this problem, Yakui and I use ktime_get() to record the Cx
states time instead of PM timer as the following patch. the patch was
tested with i386/x86_64 modes on several platforms.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Tested-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Yakui.zhao <yakui.zhao@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
CPU_IDLE=y has been default for ACPI=y since Nov-2007,
and has shipped in many distributions since then.
Here we delete the CPU_IDLE=n ACPI idle code, since
nobody should be using it, and we don't want to
maintain two versions.
Signed-off-by: Len Brown <len.brown@intel.com>
It is true that BM_RLD needs to be set to enable
bus master activity to wake an older chipset (eg PIIX4) from C3.
This is contrary to the erroneous wording the ACPI 2.0, 3.0
specifications that suggests that BM_RLD is an indicator
rather than a control bit.
ACPI 1.0's correct wording should be restored in ACPI 4.0:
http://www.acpica.org/bugzilla/show_bug.cgi?id=689
But the kernel should not have to clear BM_RLD
when entering a non C3-type state just to set
it again when entering a C3-type C-state.
We should be able to set BM_RLD at boot time
and leave it alone -- removing the overhead of
accessing this IO register from the idle entry path.
Signed-off-by: Len Brown <len.brown@intel.com>
PM1a_STS and PM1b_STS are twins that get OR'd together
on reads, and all writes are repeated to both.
The fields in PM1x_STS are single bits only,
there are no multi-bit fields.
So it is not necessary to lock PM1x_STS reads against
writes because it is impossible to read an intermediate
value of a single bit. It will either be 0 or 1,
even if a write is in progress during the read.
Reads are asynchronous to writes no matter if a lock
is used or not.
Signed-off-by: Len Brown <len.brown@intel.com>
While looking at reducing the amount of architecture namespace pollution
in the generic kernel, I found that asm/irq.h is included in the vast
majority of compilations on ARM (around 650 files.)
Since asm/irq.h includes a sub-architecture include file on ARM, this
causes a negative impact on the ccache's ability to re-use the build
results from other sub-architectures, so we have a desire to reduce the
dependencies on asm/irq.h.
It turns out that a major cause of this is the needless include of
linux/hardirq.h into asm-generic/local.h. The patch below removes this
include, resulting in some 250 to 300 files (around half) of the kernel
then omitting asm/irq.h.
My test builds still succeed, provided two ARM files are fixed
(arch/arm/kernel/traps.c and arch/arm/mm/fault.c) - so there may be
negative impacts for this on other architectures.
Note that x86 does not include asm/irq.h nor linux/hardirq.h in its
asm/local.h, so this patch can be viewed as bringing the generic version
into line with the x86 version.
[kosaki.motohiro@jp.fujitsu.com: add #include <linux/irqflags.h> to acpi/processor_idle.c]
[adobriyan@gmail.com: fix sparc64]
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: reward non-stop TSCs with good TSC-based clocksources, etc.
Add support for CPUID_0x80000007_Bit8 on Intel CPUs as well. This bit means
that the TSC is invariant with C/P/T states and always runs at constant
frequency.
With Intel CPUs, we have 3 classes
* CPUs where TSC runs at constant rate and does not stop n C-states
* CPUs where TSC runs at constant rate, but will stop in deep C-states
* CPUs where TSC rate will vary based on P/T-states and TSC will stop in deep
C-states.
To cover these 3, one feature bit (CONSTANT_TSC) is not enough. So, add a
second bit (NONSTOP_TSC). CONSTANT_TSC indicates that the TSC runs at
constant frequency irrespective of P/T-states, and NONSTOP_TSC indicates
that TSC does not stop in deep C-states.
CPUID_0x8000000_Bit8 indicates both these feature bit can be set.
We still have CONSTANT_TSC _set_ and NONSTOP_TSC _not_set_ on some older Intel
CPUs, based on model checks. We can use TSC on such CPUs for time, as long as
those CPUs do not support/enter deep C-states.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move all the component definitions for drivers to a single shared place,
include/acpi/acpi_drivers.h.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
reflect the actual state entered in dev->last_state, when actaul state entered
is different from intended one.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
pm_idle_save resp. pm_idle_old can be NULL when the restore code in
acpi_processor_cst_has_changed() resp. cpuidle_uninstall_idle_handler()
is called. This can set pm_idle unconditinally to NULL, which causes the
kernel to panic when calling pm_idle in the x86 idle code. This was
covered by an extra check for !pm_idle in the x86 idle code, which was
removed during the x86 idle code refactoring.
Instead of restoring the pm_idle check in the x86 code prevent the
acpi/cpuidle code to set pm_idle to NULL.
Reported by: Dhaval Giani http://lkml.org/lkml/2008/7/2/309
Based on a debug patch from Ingo Molnar
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The acpi idle waits calls local_irq_save and then uses mwait to go into
idle. The tracer gets reenabled at local_irq_save but does not detect that
the idle allows for wake ups.
This patch adds code to disable the tracing when acpi puts the CPU to idle.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
"idle=halt" limits the idle loop to using
the halt instruction. No MWAIT, no IO accesses,
no C-states deeper than C1.
If something is broken in the idle code,
"idle=halt" is a less severe workaround
than "idle=poll" which disables all power savings.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>